From patchwork Wed Jan 8 22:24:41 2025
X-Patchwork-Submitter: Yosry Ahmed
X-Patchwork-Id: 13931662
Date: Wed, 8 Jan 2025 22:24:41 +0000
Message-ID: <20250108222441.3622031-1-yosryahmed@google.com>
Subject: [PATCH v2] mm: zswap: properly synchronize freeing resources during CPU hotunplug
From: Yosry Ahmed
To: Andrew Morton
Cc: Johannes Weiner, Nhat Pham, Chengming Zhou, Vitaly Wool, Barry Song,
 Sam Sun, Kanchana P Sridhar, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 Yosry Ahmed, stable@vger.kernel.org
In zswap_compress() and zswap_decompress(), the per-CPU acomp_ctx of the
current CPU at the beginning of the operation is retrieved and used
throughout. However, since neither preemption nor migration is disabled,
it is possible that the operation continues on a different CPU. If the
original CPU is hotunplugged while the acomp_ctx is still in use, we run
into a UAF bug as some of the resources attached to the acomp_ctx are
freed during hotunplug in zswap_cpu_comp_dead() (i.e. acomp_ctx.buffer,
acomp_ctx.req, or acomp_ctx.acomp).

The problem was introduced in commit 1ec3b5fe6eec ("mm/zswap: move to use
crypto_acomp API for hardware acceleration") when the switch to the
crypto_acomp API was made. Prior to that, the per-CPU crypto_comp was
retrieved using get_cpu_ptr(), which disables preemption and makes sure
the CPU cannot go away from under us. Preemption cannot be disabled with
the crypto_acomp API as a sleepable context is needed.

Use the acomp_ctx.mutex to synchronize the CPU hotplug callbacks that
allocate and free resources with the compression/decompression paths.
Make sure that acomp_ctx.req is NULL when the resources are freed. In the
compression/decompression paths, check if acomp_ctx.req is NULL after
acquiring the mutex (meaning the CPU was offlined) and retry on the new
CPU.

The initialization of acomp_ctx.mutex is moved from the CPU hotplug
callback to the pool initialization where it belongs (where the mutex is
allocated). In addition to adding clarity, this makes sure that CPU
hotplug cannot reinitialize a mutex that is already locked by
compression/decompression.

Previously a fix was attempted by holding cpus_read_lock() [1]. This
would have introduced a potential deadlock, as code that already holds
the lock can fall into reclaim and enter zswap. A fix was also attempted
using SRCU for synchronization, but Johannes pointed out that
synchronize_srcu() cannot be used in CPU hotplug notifiers [2].

Alternative fixes that were considered/attempted and could have worked:

- Refcounting the per-CPU acomp_ctx. This involves complexity in handling
  the race between the refcount dropping to zero in zswap_[de]compress()
  and the refcount being re-initialized when the CPU is onlined.
- Disabling migration before getting the per-CPU acomp_ctx [3], but that
  is discouraged, is a much bigger hammer than needed, and could result
  in subtle performance issues.

[1]https://lkml.kernel.org/20241219212437.2714151-1-yosryahmed@google.com/
[2]https://lkml.kernel.org/20250107074724.1756696-2-yosryahmed@google.com/
[3]https://lkml.kernel.org/20250107222236.2715883-2-yosryahmed@google.com/
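The locking protocol above (teardown clears a sentinel pointer under the
mutex; users lock, check the sentinel, and retry elsewhere if the ctx was
torn down) can be modeled outside the kernel. Below is a minimal,
standalone sketch in userspace C with pthreads; the toy_* names are
hypothetical and not part of the patch, and a plain pointer stands in for
acomp_ctx->req. The actual kernel implementation is in the diff below.

/* Editor's sketch of the protocol only; build with: cc -pthread toy.c */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

struct toy_ctx {
	pthread_mutex_t lock;
	void *req;		/* NULL means "resources were freed" */
};

/* "Hotunplug" side: free the resources and clear the sentinel under the lock. */
static void toy_ctx_dead(struct toy_ctx *ctx)
{
	pthread_mutex_lock(&ctx->lock);
	free(ctx->req);
	ctx->req = NULL;
	pthread_mutex_unlock(&ctx->lock);
}

/*
 * "[De]compression" side: lock a ctx and only use it if it is still
 * initialized; otherwise drop the lock and pick again.  In zswap the
 * "pick" is raw_cpu_ptr(), so the retry lands on the CPU the task now
 * runs on, whose ctx cannot be torn down underneath a running task.
 */
static struct toy_ctx *toy_ctx_get_lock(struct toy_ctx *ctxs, int nr_ctx)
{
	for (;;) {
		struct toy_ctx *ctx = &ctxs[rand() % nr_ctx];	/* "current CPU" */

		pthread_mutex_lock(&ctx->lock);
		if (ctx->req)
			return ctx;
		pthread_mutex_unlock(&ctx->lock);
	}
}

static void toy_ctx_put_unlock(struct toy_ctx *ctx)
{
	pthread_mutex_unlock(&ctx->lock);
}

int main(void)
{
	struct toy_ctx ctxs[2];
	struct toy_ctx *ctx;

	for (int i = 0; i < 2; i++) {
		pthread_mutex_init(&ctxs[i].lock, NULL);
		ctxs[i].req = malloc(16);
	}

	toy_ctx_dead(&ctxs[0]);			/* "offline" ctx 0 */
	ctx = toy_ctx_get_lock(ctxs, 2);	/* retries until it locks ctx 1 */
	printf("locked usable ctx %td\n", ctx - ctxs);
	toy_ctx_put_unlock(ctx);
	return 0;
}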
Fixes: 1ec3b5fe6eec ("mm/zswap: move to use crypto_acomp API for hardware acceleration")
Cc:
Signed-off-by: Yosry Ahmed
Reported-by: Johannes Weiner
Closes: https://lore.kernel.org/lkml/20241113213007.GB1564047@cmpxchg.org/
Reported-by: Sam Sun
Closes: https://lore.kernel.org/lkml/CAEkJfYMtSdM5HceNsXUDf5haghD5+o2e7Qv4OcuruL4tPg6OaQ@mail.gmail.com/
---
This applies on top of the latest mm-hotfixes-unstable, after 'Revert
"mm: zswap: fix race between [de]compression and CPU hotunplug"' was
applied and 'mm: zswap: disable migration while using per-CPU acomp_ctx'
was dropped.

v1 -> v2:
- Move the initialization of the mutex to pool initialization.
- Use the mutex to also synchronize with the CPU hotplug callback (i.e.
  zswap_cpu_comp_prepare()).
- Naming cleanups.
---
 mm/zswap.c | 60 +++++++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 46 insertions(+), 14 deletions(-)

diff --git a/mm/zswap.c b/mm/zswap.c
index f6316b66fb236..4d7e564732267 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -251,7 +251,7 @@ static struct zswap_pool *zswap_pool_create(char *type, char *compressor)
 	struct zswap_pool *pool;
 	char name[38]; /* 'zswap' + 32 char (max) num + \0 */
 	gfp_t gfp = __GFP_NORETRY | __GFP_NOWARN | __GFP_KSWAPD_RECLAIM;
-	int ret;
+	int ret, cpu;
 
 	if (!zswap_has_pool) {
 		/* if either are unset, pool initialization failed, and we
@@ -285,6 +285,9 @@ static struct zswap_pool *zswap_pool_create(char *type, char *compressor)
 		goto error;
 	}
 
+	for_each_possible_cpu(cpu)
+		mutex_init(&per_cpu_ptr(pool->acomp_ctx, cpu)->mutex);
+
 	ret = cpuhp_state_add_instance(CPUHP_MM_ZSWP_POOL_PREPARE,
 				       &pool->node);
 	if (ret)
@@ -821,11 +824,12 @@ static int zswap_cpu_comp_prepare(unsigned int cpu, struct hlist_node *node)
 	struct acomp_req *req;
 	int ret;
 
-	mutex_init(&acomp_ctx->mutex);
-
+	mutex_lock(&acomp_ctx->mutex);
 	acomp_ctx->buffer = kmalloc_node(PAGE_SIZE * 2, GFP_KERNEL, cpu_to_node(cpu));
-	if (!acomp_ctx->buffer)
-		return -ENOMEM;
+	if (!acomp_ctx->buffer) {
+		ret = -ENOMEM;
+		goto buffer_fail;
+	}
 
 	acomp = crypto_alloc_acomp_node(pool->tfm_name, 0, 0, cpu_to_node(cpu));
 	if (IS_ERR(acomp)) {
@@ -844,6 +848,8 @@ static int zswap_cpu_comp_prepare(unsigned int cpu, struct hlist_node *node)
 		ret = -ENOMEM;
 		goto req_fail;
 	}
+
+	/* acomp_ctx->req must be NULL if the acomp_ctx is not fully initialized */
 	acomp_ctx->req = req;
 
 	crypto_init_wait(&acomp_ctx->wait);
@@ -855,12 +861,15 @@ static int zswap_cpu_comp_prepare(unsigned int cpu, struct hlist_node *node)
 	acomp_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG,
 				   crypto_req_done, &acomp_ctx->wait);
 
+	mutex_unlock(&acomp_ctx->mutex);
 	return 0;
 
 req_fail:
 	crypto_free_acomp(acomp_ctx->acomp);
 acomp_fail:
 	kfree(acomp_ctx->buffer);
+buffer_fail:
+	mutex_unlock(&acomp_ctx->mutex);
 	return ret;
 }
 
@@ -869,17 +878,45 @@ static int zswap_cpu_comp_dead(unsigned int cpu, struct hlist_node *node)
 	struct zswap_pool *pool = hlist_entry(node, struct zswap_pool, node);
 	struct crypto_acomp_ctx *acomp_ctx = per_cpu_ptr(pool->acomp_ctx, cpu);
 
+	mutex_lock(&acomp_ctx->mutex);
 	if (!IS_ERR_OR_NULL(acomp_ctx)) {
 		if (!IS_ERR_OR_NULL(acomp_ctx->req))
 			acomp_request_free(acomp_ctx->req);
+		acomp_ctx->req = NULL;
 		if (!IS_ERR_OR_NULL(acomp_ctx->acomp))
 			crypto_free_acomp(acomp_ctx->acomp);
 		kfree(acomp_ctx->buffer);
 	}
+	mutex_unlock(&acomp_ctx->mutex);
 
 	return 0;
 }
 
+static struct crypto_acomp_ctx *acomp_ctx_get_cpu_lock(struct zswap_pool *pool)
+{
+	struct crypto_acomp_ctx *acomp_ctx;
+
+	for (;;) {
+		acomp_ctx = raw_cpu_ptr(pool->acomp_ctx);
+		mutex_lock(&acomp_ctx->mutex);
+		if (likely(acomp_ctx->req))
+			return acomp_ctx;
+		/*
+		 * It is possible that we were migrated to a different CPU after
+		 * getting the per-CPU ctx but before the mutex was acquired. If
+		 * the old CPU got offlined, zswap_cpu_comp_dead() could have
+		 * already freed ctx->req (among other things) and set it to
+		 * NULL. Just try again on the new CPU that we ended up on.
+		 */
+		mutex_unlock(&acomp_ctx->mutex);
+	}
+}
+
+static void acomp_ctx_put_unlock(struct crypto_acomp_ctx *acomp_ctx)
+{
+	mutex_unlock(&acomp_ctx->mutex);
+}
+
 static bool zswap_compress(struct page *page, struct zswap_entry *entry,
 			   struct zswap_pool *pool)
 {
@@ -893,10 +930,7 @@ static bool zswap_compress(struct page *page, struct zswap_entry *entry,
 	gfp_t gfp;
 	u8 *dst;
 
-	acomp_ctx = raw_cpu_ptr(pool->acomp_ctx);
-
-	mutex_lock(&acomp_ctx->mutex);
-
+	acomp_ctx = acomp_ctx_get_cpu_lock(pool);
 	dst = acomp_ctx->buffer;
 	sg_init_table(&input, 1);
 	sg_set_page(&input, page, PAGE_SIZE, 0);
@@ -949,7 +983,7 @@ static bool zswap_compress(struct page *page, struct zswap_entry *entry,
 	else if (alloc_ret)
 		zswap_reject_alloc_fail++;
 
-	mutex_unlock(&acomp_ctx->mutex);
+	acomp_ctx_put_unlock(acomp_ctx);
 	return comp_ret == 0 && alloc_ret == 0;
 }
 
@@ -960,9 +994,7 @@ static void zswap_decompress(struct zswap_entry *entry, struct folio *folio)
 	struct crypto_acomp_ctx *acomp_ctx;
 	u8 *src;
 
-	acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx);
-	mutex_lock(&acomp_ctx->mutex);
-
+	acomp_ctx = acomp_ctx_get_cpu_lock(entry->pool);
 	src = zpool_map_handle(zpool, entry->handle, ZPOOL_MM_RO);
 	/*
 	 * If zpool_map_handle is atomic, we cannot reliably utilize its mapped buffer
@@ -986,10 +1018,10 @@ static void zswap_decompress(struct zswap_entry *entry, struct folio *folio)
 	acomp_request_set_params(acomp_ctx->req, &input, &output, entry->length, PAGE_SIZE);
 	BUG_ON(crypto_wait_req(crypto_acomp_decompress(acomp_ctx->req), &acomp_ctx->wait));
 	BUG_ON(acomp_ctx->req->dlen != PAGE_SIZE);
-	mutex_unlock(&acomp_ctx->mutex);
 
 	if (src != acomp_ctx->buffer)
 		zpool_unmap_handle(zpool, entry->handle);
+	acomp_ctx_put_unlock(acomp_ctx);
 }
 
 /*********************************