From patchwork Wed Jan 8 16:15:29 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yosry Ahmed X-Patchwork-Id: 13931268 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id CB6F9E77188 for ; Wed, 8 Jan 2025 16:15:52 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 5CF0D6B008A; Wed, 8 Jan 2025 11:15:52 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 57F556B0095; Wed, 8 Jan 2025 11:15:52 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 401F66B0096; Wed, 8 Jan 2025 11:15:52 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0013.hostedemail.com [216.40.44.13]) by kanga.kvack.org (Postfix) with ESMTP id 20E996B008A for ; Wed, 8 Jan 2025 11:15:52 -0500 (EST) Received: from smtpin11.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay03.hostedemail.com (Postfix) with ESMTP id 9DA78A0804 for ; Wed, 8 Jan 2025 16:15:51 +0000 (UTC) X-FDA: 82984785702.11.F3375F3 Received: from mail-pl1-f202.google.com (mail-pl1-f202.google.com [209.85.214.202]) by imf07.hostedemail.com (Postfix) with ESMTP id 9796A40021 for ; Wed, 8 Jan 2025 16:15:49 +0000 (UTC) Authentication-Results: imf07.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=J+iLvJZ9; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf07.hostedemail.com: domain of 3tKR-ZwoKCDsvlpovXejbadlldib.Zljifkru-jjhsXZh.lod@flex--yosryahmed.bounces.google.com designates 209.85.214.202 as permitted sender) smtp.mailfrom=3tKR-ZwoKCDsvlpovXejbadlldib.Zljifkru-jjhsXZh.lod@flex--yosryahmed.bounces.google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1736352949; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding:in-reply-to: references:dkim-signature; bh=sk3nztKTZL3vCOZPF5M7nt11YZ4zTaONZRd3xxpiShI=; b=XPBf/5dozv5Hm4jwkf/I1BbeAAl2CwkC14Z/5UYzOPS/xMared4RGXy4l2iv819cylR/T9 LyWV1sKEAH/KOS94q3hZ+aucahcshkEhpZNN+t9YMwuXKknHorr4amd8EIi/ESbccI/Lc0 s2EqOxAu5ZDuLWHknPJpJT9bqdSlOzQ= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1736352949; a=rsa-sha256; cv=none; b=DCflUemgdsUjbmZATZZiuPnYqqgstL/zemrqlsqAuD9Iby18UxNFMz7crsvTfmjBDxeR5c CMNXvGX9+V4St45AHlqu0jwdHv6VA47AnobwEfoL1PM9lc/jVVzM2HtNgLqT4Z5yQHKx6f g0K08C+VBIgenzXHrRxJy5WzFDIlMmo= ARC-Authentication-Results: i=1; imf07.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=J+iLvJZ9; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf07.hostedemail.com: domain of 3tKR-ZwoKCDsvlpovXejbadlldib.Zljifkru-jjhsXZh.lod@flex--yosryahmed.bounces.google.com designates 209.85.214.202 as permitted sender) smtp.mailfrom=3tKR-ZwoKCDsvlpovXejbadlldib.Zljifkru-jjhsXZh.lod@flex--yosryahmed.bounces.google.com Received: by mail-pl1-f202.google.com with SMTP id d9443c01a7336-216405eea1fso235101915ad.0 for ; Wed, 08 Jan 2025 08:15:49 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1736352948; x=1736957748; darn=kvack.org; h=cc:to:from:subject:message-id:mime-version:date:from:to:cc:subject :date:message-id:reply-to; bh=sk3nztKTZL3vCOZPF5M7nt11YZ4zTaONZRd3xxpiShI=; b=J+iLvJZ9F+KSL58HCPzCPqfEbNa18JBLh7F03t291slZ2N5IZloDZENhi/2xiXS/bn WJBNfgKzPlQPn69Hrj7mMy3/QCQ8o4O3GHtzEQCgSmkXpEEpeS0fu/Wqtu484DHEIj3S QLDxMjxaWk1lGElPMy8tg/tiwUd9Lphqb2V9CDi6kYYux+91wrlhAtT+2Z+b1MDSdl3r p+/8inVzjs1aWziyCH/FVybz9xDNHsu6Gm9dtIindtS9w51NLkmSVl0+Dm7sBe4P/M7d GuVHENbBg2JZLr+dFctvEsClD7p6nUPFc8EhtYWySfxBNKdiYfYxnPnluxuQGih6+bUs xC8Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1736352948; x=1736957748; h=cc:to:from:subject:message-id:mime-version:date:x-gm-message-state :from:to:cc:subject:date:message-id:reply-to; bh=sk3nztKTZL3vCOZPF5M7nt11YZ4zTaONZRd3xxpiShI=; b=Y953H+LUkhndZSPLBw91Xvnmqt+KnIh2lcweQhxVHMBjUg3bEDfich2yI4EfC9AIR7 kWadQO3bLPWA9SgAD8jMYxEsAj3K/1zh/leHjUpcY4ypp+I6iL7D37hOytS5qVmq+XJ0 JJHygJH28LvOEUj2/b1pV6kj2CoxHde0EzOfF50DsRdnCytIYWwQtWADfWQ31GnnvNp2 jgODEDd7zcJIOJrb6fPDGivri9Mm3FpxYY46ck6Baf9vq56nTLiTqn2ezQJ1/JLUNt3n YCVVVuldGnLlviOfjX8SQR0p9cQ+znRi8dtbi9Il0VBUHsAiNCNT+i9IK64MQAumWxvp cg0g== X-Forwarded-Encrypted: i=1; AJvYcCW1dCYYy8FDqJdebsVXB70ObXc6NxY+FU1s7l+NDj1KgN7GuvdPkxy2QMQ8TVIk6ZtFUrTz20BgXA==@kvack.org X-Gm-Message-State: AOJu0YxnApVAoclURuLMIr1340/6jVS20zqF+xNezN9wjocZ9EtUUsF0 dBWWv6GvRW7Peq/V32wn33MScFv0onRLGx0Fb0RweuNJQ8PRl7mZGrz6flE2jX6zNqB08yCiTAY M/5fcR3TE6ktjU7tNBg== X-Google-Smtp-Source: AGHT+IF+k0kNGMCgpb1rjKQ5TUAYhvnPEovTtKxpvrG0ZZM5AKCuXp007gpJWla3EMSjurP1+7JDiP7gWOWXbJ0b X-Received: from plbko14.prod.google.com ([2002:a17:903:7ce:b0:206:fee5:fffa]) (user=yosryahmed job=prod-delivery.src-stubby-dispatcher) by 2002:a17:902:ec82:b0:216:2dc5:2310 with SMTP id d9443c01a7336-21a83fcf8bcmr50554015ad.48.1736352948066; Wed, 08 Jan 2025 08:15:48 -0800 (PST) Date: Wed, 8 Jan 2025 16:15:29 +0000 Mime-Version: 1.0 X-Mailer: git-send-email 2.47.1.613.gc27f4b7a9f-goog Message-ID: <20250108161529.3193825-1-yosryahmed@google.com> Subject: [PATCH] mm: zswap: properly synchronize freeing resources during CPU hotunplug From: Yosry Ahmed To: Andrew Morton Cc: Johannes Weiner , Nhat Pham , Chengming Zhou , Vitaly Wool , Barry Song , Sam Sun , Kanchana P Sridhar , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Yosry Ahmed , stable@vger.kernel.org X-Stat-Signature: oxxcw5aa5sycy4bukcpxeqf8crm1ogat X-Rspamd-Queue-Id: 9796A40021 X-Rspam-User: X-Rspamd-Server: rspam01 X-HE-Tag: 1736352949-7875 X-HE-Meta: U2FsdGVkX1+InXWzdPZLjCjh+eHT2dFR3vKwLWJ2bBCg9AiCtEO4j+xGCymDmfqHJjWnGayxpoSL1KTOEVuI8gGb7OZWW1coc6DRB8Vyj74oRx1wsV+Z4k34EJL/XEcj8SYMoCC9//KnS3mnymspDW/KAslxKazf2b7b1m32A70vwE91cLKbk4JuC4HLdjC7qEAnTVdikOcf4lvK+5mdBn/3WUe0oawymgfxZp00w8e4rDTl7h75UAO6siP1a22exeO27o8A4nr6nMlB4eEzMJ4alZDFInGHBwY51C03mnXpoNewveXP2WKRM3cFHWzhuJB9cgOEE+NfvFwIgq4p2dFq5AmPlpraE5l923r1UsngGcgrZAHf6tiJ5v0+wdLq92MaTs622FC3W1JLYjP5IU+cs2+/Bs9+b7LrRtl1jvCBICz0h195WQqxtqyhaZPhiSi/cD++zoq0/HhjmpxyPvGTSqN++XqzmCaSZ2mnjEkq0GjbRwKVAvGC8m2qVRRfrIMgXfqYF1TF3y10nzT3kyde4timO0NYfif18GQt9BZRA6JiQpaT4WN41a6VzJt5eLQ7AjG79FLBNiQDKwci5Az56YwhsZKESjpooucRK55yt7aSWxQLyAZv4oxfSJL8remutXaVABnttf51PFVl7QMztVfPiLPViyvgkY1KdzbG1LXc0WNKjc8vqA7Px6wRApYA3iWKLp7ULUYQ9i1FV3okWYPxTPEcq+oz+jq3qHLt9aeFt7/jCS8XU4/pHKkpjSUkH04/fcjN8kRKW3Fti6VshOf04pur1syrPMJSZzoV4zNrYrqX9U+JCL8+Dm7WgwhmuhqzfezeXtOUJPEJnBzueL82rSG4Zkb7BnBj3lYUxxTa+P4cBlMACU8zz+hQLS9C9Tc4e5U7K4MNCMl0j8hhmzyz/ul+hYeTKeuefAZ0ni8cjuXt2yLXQU4H0RQ3aYXsYVwwAqrGfshJpGQ YyivOl2m fOxXRfUCOBgUn7ZYBXnZscYm1HHkAzxsEyqm5HQqnsxFNUsdNBUGh1LgQI2iPG9ZEYlY15U22WLZS6X4nuH8eBmmEnKLOqhqPFsiyBsMutMygVviLD6gFYGBcfRJoiBvFCP6hqYPXA19wZu4N2aTi4UoiNOjpLDnC+xhFGIQdko6ERxl/4+OV8zdEVQsDOtZ/FWuvXIv17avXscI/qPAdb2hZ3A6WMrq3RJZ3iTZv4qP8EklEdqNNQFs+a+p+aJQvIPwPsauPaDossN4xSp8EBThbvJZ8Ba1hPkSSQUvjeyw/o8hOeQvxOnXQSxRYHLBxaONDfAErsRNgozv5e7+ETtQlvEoqniseFsTgprhEcCoEBJfDq6w1ebf8+QZHb/8ztmxF22Pv9U2a2NXxJBbRWvB9lkETQrjKh1pqWylhAZHI5DsvMs7xVjPiXfKHKe68+nSPIm2EHyBytAamyUC80vQ3FUXF3qZiPjsvOOauwVqxAPUtcLAcdu6JjJzLlVpY8CCEs47Ry/Y1a3Bul0MZp+iu5ADbrzkOL49zTkDzBIZLhfiRiw7KEpvg6YbiLr4C/5mR0+OazBLuw3RvLD0NSJIrPCQq4G8bo5SIXrTXthz1J3NTJSmUhRvOTpL8wz4oSt3OGzDDmVWqqyRSJL3fn8QrWmJBss+cF8qc6Hthh+OJ4gaWo0zsZKy8nQ== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: In zswap_compress() and zswap_decompress(), the per-CPU acomp_ctx of the current CPU at the beginning of the operation is retrieved and used throughout. However, since neither preemption nor migration are disabled, it is possible that the operation continues on a different CPU. If the original CPU is hotunplugged while the acomp_ctx is still in use, we run into a UAF bug as some of the resources attached to the acomp_ctx are freed during hotunplug in zswap_cpu_comp_dead(). The problem was introduced in commit 1ec3b5fe6eec ("mm/zswap: move to use crypto_acomp API for hardware acceleration") when the switch to the crypto_acomp API was made. Prior to that, the per-CPU crypto_comp was retrieved using get_cpu_ptr() which disables preemption and makes sure the CPU cannot go away from under us. Preemption cannot be disabled with the crypto_acomp API as a sleepable context is needed. During CPU hotunplug, hold the acomp_ctx.mutex before freeing any resources, and set acomp_ctx.req to NULL when it is freed. In the compress/decompress paths, after acquiring the acomp_ctx.mutex make sure that acomp_ctx.req is not NULL (i.e. acomp_ctx resources were not freed by CPU hotunplug). Otherwise, retry with the acomp_ctx from the new CPU. This adds proper synchronization to ensure that the acomp_ctx resources are not freed from under compress/decompress paths. Note that the per-CPU acomp_ctx itself (including the mutex) is not freed during CPU hotunplug, only acomp_ctx.req, acomp_ctx.buffer, and acomp_ctx.acomp. So it is safe to acquire the acomp_ctx.mutex of a CPU after it is hotunplugged. Previously a fix was attempted by holding cpus_read_lock() [1]. This would have caused a potential deadlock as it is possible for code already holding the lock to fall into reclaim and enter zswap (causing a deadlock). A fix was also attempted using SRCU for synchronization, but Johannes pointed out that synchronize_srcu() cannot be used in CPU hotplug notifiers [2]. Alternative fixes that were considered/attempted and could have worked: - Refcounting the per-CPU acomp_ctx. This involves complexity in handling the race between the refcount dropping to zero in zswap_[de]compress() and the refcount being re-initialized when the CPU is onlined. - Disabling migration before getting the per-CPU acomp_ctx [3], but that's discouraged and is a much bigger hammer than needed, and could result in subtle performance issues. [1]https://lkml.kernel.org/20241219212437.2714151-1-yosryahmed@google.com/ [2]https://lkml.kernel.org/20250107074724.1756696-2-yosryahmed@google.com/ [3]https://lkml.kernel.org/20250107222236.2715883-2-yosryahmed@google.com/ Fixes: 1ec3b5fe6eec ("mm/zswap: move to use crypto_acomp API for hardware acceleration") Cc: Signed-off-by: Yosry Ahmed Reported-by: Johannes Weiner Closes: https://lore.kernel.org/lkml/20241113213007.GB1564047@cmpxchg.org/ Reported-by: Sam Sun Closes: https://lore.kernel.org/lkml/CAEkJfYMtSdM5HceNsXUDf5haghD5+o2e7Qv4OcuruL4tPg6OaQ@mail.gmail.com/ --- This applies on top of the latest mm-hotfixes-unstable on top of 'Revert "mm: zswap: fix race between [de]compression and CPU hotunplug"' and after 'mm: zswap: disable migration while using per-CPU acomp_ctx' was dropped. --- mm/zswap.c | 42 +++++++++++++++++++++++++++++++++--------- 1 file changed, 33 insertions(+), 9 deletions(-) diff --git a/mm/zswap.c b/mm/zswap.c index f6316b66fb236..4e3148050e093 100644 --- a/mm/zswap.c +++ b/mm/zswap.c @@ -869,17 +869,46 @@ static int zswap_cpu_comp_dead(unsigned int cpu, struct hlist_node *node) struct zswap_pool *pool = hlist_entry(node, struct zswap_pool, node); struct crypto_acomp_ctx *acomp_ctx = per_cpu_ptr(pool->acomp_ctx, cpu); + mutex_lock(&acomp_ctx->mutex); if (!IS_ERR_OR_NULL(acomp_ctx)) { if (!IS_ERR_OR_NULL(acomp_ctx->req)) acomp_request_free(acomp_ctx->req); + acomp_ctx->req = NULL; if (!IS_ERR_OR_NULL(acomp_ctx->acomp)) crypto_free_acomp(acomp_ctx->acomp); kfree(acomp_ctx->buffer); } + mutex_unlock(&acomp_ctx->mutex); return 0; } +static struct crypto_acomp_ctx *acomp_ctx_get_cpu_lock( + struct crypto_acomp_ctx __percpu *acomp_ctx) +{ + struct crypto_acomp_ctx *ctx; + + for (;;) { + ctx = raw_cpu_ptr(acomp_ctx); + mutex_lock(&ctx->mutex); + if (likely(ctx->req)) + return ctx; + /* + * It is possible that we were migrated to a different CPU after + * getting the per-CPU ctx but before the mutex was acquired. If + * the old CPU got offlined, zswap_cpu_comp_dead() could have + * already freed ctx->req (among other things) and set it to + * NULL. Just try again on the new CPU that we ended up on. + */ + mutex_unlock(&ctx->mutex); + } +} + +static void acomp_ctx_put_unlock(struct crypto_acomp_ctx *ctx) +{ + mutex_unlock(&ctx->mutex); +} + static bool zswap_compress(struct page *page, struct zswap_entry *entry, struct zswap_pool *pool) { @@ -893,10 +922,7 @@ static bool zswap_compress(struct page *page, struct zswap_entry *entry, gfp_t gfp; u8 *dst; - acomp_ctx = raw_cpu_ptr(pool->acomp_ctx); - - mutex_lock(&acomp_ctx->mutex); - + acomp_ctx = acomp_ctx_get_cpu_lock(pool->acomp_ctx); dst = acomp_ctx->buffer; sg_init_table(&input, 1); sg_set_page(&input, page, PAGE_SIZE, 0); @@ -949,7 +975,7 @@ static bool zswap_compress(struct page *page, struct zswap_entry *entry, else if (alloc_ret) zswap_reject_alloc_fail++; - mutex_unlock(&acomp_ctx->mutex); + acomp_ctx_put_unlock(acomp_ctx); return comp_ret == 0 && alloc_ret == 0; } @@ -960,9 +986,7 @@ static void zswap_decompress(struct zswap_entry *entry, struct folio *folio) struct crypto_acomp_ctx *acomp_ctx; u8 *src; - acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx); - mutex_lock(&acomp_ctx->mutex); - + acomp_ctx = acomp_ctx_get_cpu_lock(entry->pool->acomp_ctx); src = zpool_map_handle(zpool, entry->handle, ZPOOL_MM_RO); /* * If zpool_map_handle is atomic, we cannot reliably utilize its mapped buffer @@ -986,10 +1010,10 @@ static void zswap_decompress(struct zswap_entry *entry, struct folio *folio) acomp_request_set_params(acomp_ctx->req, &input, &output, entry->length, PAGE_SIZE); BUG_ON(crypto_wait_req(crypto_acomp_decompress(acomp_ctx->req), &acomp_ctx->wait)); BUG_ON(acomp_ctx->req->dlen != PAGE_SIZE); - mutex_unlock(&acomp_ctx->mutex); if (src != acomp_ctx->buffer) zpool_unmap_handle(zpool, entry->handle); + acomp_ctx_put_unlock(acomp_ctx); } /*********************************