mm: zswap: properly synchronize freeing resources during CPU hotunplug

Message ID 20250108161529.3193825-1-yosryahmed@google.com (mailing list archive)
State New
Series mm: zswap: properly synchronize freeing resources during CPU hotunplug

Commit Message

Yosry Ahmed Jan. 8, 2025, 4:15 p.m. UTC
In zswap_compress() and zswap_decompress(), the per-CPU acomp_ctx of the
current CPU is retrieved at the beginning of the operation and used
throughout.  However, since neither preemption nor migration is
disabled, it is possible that the operation continues on a different
CPU.

If the original CPU is hotunplugged while the acomp_ctx is still in use,
we run into a UAF bug as some of the resources attached to the acomp_ctx
are freed during hotunplug in zswap_cpu_comp_dead().

The problem was introduced in commit 1ec3b5fe6eec ("mm/zswap: move to
use crypto_acomp API for hardware acceleration") when the switch to the
crypto_acomp API was made.  Prior to that, the per-CPU crypto_comp was
retrieved using get_cpu_ptr() which disables preemption and makes sure
the CPU cannot go away from under us.  Preemption cannot be disabled
with the crypto_acomp API as a sleepable context is needed.
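
For reference, a simplified sketch of the pre-1ec3b5fe6eec pattern
(approximate, for illustration only):

  /*
   * Old crypto_comp path, simplified: get_cpu_ptr() disables preemption,
   * so the task cannot migrate and the CPU cannot complete hotunplug
   * while the tfm is in use. crypto_comp_compress() did not sleep, so
   * running with preemption disabled was fine.
   */
  struct crypto_comp *tfm = *get_cpu_ptr(pool->tfm);
  ret = crypto_comp_compress(tfm, src, PAGE_SIZE, dst, &dlen);
  put_cpu_ptr(pool->tfm);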

During CPU hotunplug, hold the acomp_ctx.mutex before freeing any
resources, and set acomp_ctx.req to NULL when it is freed. In the
compress/decompress paths, after acquiring the acomp_ctx.mutex make sure
that acomp_ctx.req is not NULL (i.e. acomp_ctx resources were not freed
by CPU hotunplug). Otherwise, retry with the acomp_ctx from the new CPU.

This adds proper synchronization to ensure that the acomp_ctx resources
are not freed from under compress/decompress paths.

Note that the per-CPU acomp_ctx itself (including the mutex) is not
freed during CPU hotunplug, only acomp_ctx.req, acomp_ctx.buffer, and
acomp_ctx.acomp. So it is safe to acquire the acomp_ctx.mutex of a CPU
after it is hotunplugged.

Previously, a fix was attempted by holding cpus_read_lock() [1]. This
would have risked deadlock, as code already holding the lock can fall
into reclaim and enter zswap, where the lock would be taken again. A fix
was also attempted using SRCU for synchronization, but Johannes pointed
out that synchronize_srcu() cannot be used in CPU hotplug notifiers [2].
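
To make the deadlock risk with [1] concrete, a hypothetical call chain
(illustrative only):

  /*
   * cpus_read_lock()            <- some path takes the hotplug lock
   *   kmalloc(GFP_KERNEL)       <- allocation falls into direct reclaim
   *     ...reclaim...
   *       zswap_store()
   *         cpus_read_lock()    <- taken again by the same task; blocks
   *                                forever if a hotplug writer is pending
   */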

Alternative fixes that were considered/attempted and could have worked:
- Refcounting the per-CPU acomp_ctx. This involves complexity in
  handling the race between the refcount dropping to zero in
  zswap_[de]compress() and the refcount being re-initialized when the
  CPU is onlined.
- Disabling migration before getting the per-CPU acomp_ctx [3] (sketched
  below), but that's discouraged, is a much bigger hammer than needed,
  and could result in subtle performance issues.
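
For illustration, a minimal sketch of the migration-disabling
alternative (not the applied fix; assumes migrate_disable() is held
across the whole operation):

  /*
   * Rejected alternative [3], sketched for reference only. Unlike
   * preempt_disable(), migrate_disable() permits sleeping, so the mutex
   * and the sleepable crypto_acomp API remain usable, but the task
   * stays pinned to one CPU for the entire operation.
   */
  migrate_disable();
  acomp_ctx = raw_cpu_ptr(pool->acomp_ctx);
  mutex_lock(&acomp_ctx->mutex);
  /* ... compress/decompress using acomp_ctx ... */
  mutex_unlock(&acomp_ctx->mutex);
  migrate_enable();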

[1] https://lkml.kernel.org/20241219212437.2714151-1-yosryahmed@google.com/
[2] https://lkml.kernel.org/20250107074724.1756696-2-yosryahmed@google.com/
[3] https://lkml.kernel.org/20250107222236.2715883-2-yosryahmed@google.com/

Fixes: 1ec3b5fe6eec ("mm/zswap: move to use crypto_acomp API for hardware acceleration")
Cc: <stable@vger.kernel.org>
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
Reported-by: Johannes Weiner <hannes@cmpxchg.org>
Closes: https://lore.kernel.org/lkml/20241113213007.GB1564047@cmpxchg.org/
Reported-by: Sam Sun <samsun1006219@gmail.com>
Closes: https://lore.kernel.org/lkml/CAEkJfYMtSdM5HceNsXUDf5haghD5+o2e7Qv4OcuruL4tPg6OaQ@mail.gmail.com/
---

This applies on top of the latest mm-hotfixes-unstable on top of 'Revert
"mm: zswap: fix race between [de]compression and CPU hotunplug"' and
after 'mm: zswap: disable migration while using per-CPU acomp_ctx' was
dropped.

---
 mm/zswap.c | 42 +++++++++++++++++++++++++++++++++---------
 1 file changed, 33 insertions(+), 9 deletions(-)

Comments

Sridhar, Kanchana P Jan. 8, 2025, 8:23 p.m. UTC | #1
> [...]
> 
> Note that the per-CPU acomp_ctx itself (including the mutex) is not
> freed during CPU hotunplug, only acomp_ctx.req, acomp_ctx.buffer, and
> acomp_ctx.acomp. So it is safe to acquire the acomp_ctx.mutex of a CPU
> after it is hotunplugged.

The only other fail-proofing I can think of is to initialize the mutex
right after the per-CPU acomp_ctx is allocated in zswap_pool_create(),
decoupling it from CPU onlining. This further clarifies the intent for
the mutex to share the lifetime of the acomp_ctx itself, independent of
CPU hotplug/hotunplug.
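
Roughly something like this (a hypothetical sketch; exact placement in
zswap_pool_create() may differ):

  /*
   * Initialize each per-CPU mutex once at pool creation, instead of in
   * the CPU online callback, so the mutex lives and dies with the
   * acomp_ctx allocation itself.
   */
  pool->acomp_ctx = alloc_percpu(*pool->acomp_ctx);
  if (!pool->acomp_ctx)
          goto error;
  for_each_possible_cpu(cpu)
          mutex_init(&per_cpu_ptr(pool->acomp_ctx, cpu)->mutex);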

Thanks,
Kanchana

Yosry Ahmed Jan. 8, 2025, 8:50 p.m. UTC | #2
On Wed, Jan 8, 2025 at 12:23 PM Sridhar, Kanchana P
<kanchana.p.sridhar@intel.com> wrote:
> [...]
>
> Only other fail-proofing I can think of is to initialize the mutex right after
> the per-cpu acomp_ctx is allocated in zswap_pool_create() and de-couple
> it from the cpu onlining. This further clarifies the intent for this mutex
> to be used at the same lifetime scope as the acomp_ctx itself, independent
> of cpu hotplug/hotunplug.

I mentioned doing this initially, then dismissed it as a mere
readability improvement. However, I think it's actually required for
correctness. It's possible that the CPU becomes online again after
acomp_ctx_get_cpu_lock() decides to retry but before it unlocks the
mutex, in which case the CPU being onlined will reinitialize an
already locked mutex.
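
Concretely, something like this hypothetical interleaving (function
names as in the patch):

  /*
   *   zswap_[de]compress()                CPU hotplug
   *   --------------------                -----------
   *   ctx = raw_cpu_ptr(acomp_ctx);
   *                                       CPU goes offline:
   *                                         zswap_cpu_comp_dead()
   *                                           ctx->req = NULL;
   *   mutex_lock(&ctx->mutex);
   *   ctx->req == NULL -> retry
   *                                       CPU comes back online:
   *                                         zswap_cpu_comp_prepare()
   *                                           mutex_init(&ctx->mutex);
   *   mutex_unlock(&ctx->mutex);          <- unlock of a freshly
   *                                          re-initialized mutex
   */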

I will add that and send a v2 shortly.
Barry Song Jan. 8, 2025, 9:38 p.m. UTC | #3
On Thu, Jan 9, 2025 at 9:23 AM Sridhar, Kanchana P
<kanchana.p.sridhar@intel.com> wrote:
> [...]
>
> Only other fail-proofing I can think of is to initialize the mutex right after
> the per-cpu acomp_ctx is allocated in zswap_pool_create() and de-couple
> it from the cpu onlining. This further clarifies the intent for this mutex
> to be used at the same lifetime scope as the acomp_ctx itself, independent
> of cpu hotplug/hotunplug.

Good catch! That step should have been executed immediately after
calling alloc_percpu(). Initially, the mutex was dynamically
allocated and initialized in zswap_cpu_comp_prepare(). Later, it
was moved to the context and allocated statically. It would be
better to relocate the mutex_init() accordingly.

Thanks
Barry

Patch

diff --git a/mm/zswap.c b/mm/zswap.c
index f6316b66fb236..4e3148050e093 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -869,17 +869,46 @@  static int zswap_cpu_comp_dead(unsigned int cpu, struct hlist_node *node)
 	struct zswap_pool *pool = hlist_entry(node, struct zswap_pool, node);
 	struct crypto_acomp_ctx *acomp_ctx = per_cpu_ptr(pool->acomp_ctx, cpu);
 
+	mutex_lock(&acomp_ctx->mutex);
 	if (!IS_ERR_OR_NULL(acomp_ctx)) {
 		if (!IS_ERR_OR_NULL(acomp_ctx->req))
 			acomp_request_free(acomp_ctx->req);
+		acomp_ctx->req = NULL;
 		if (!IS_ERR_OR_NULL(acomp_ctx->acomp))
 			crypto_free_acomp(acomp_ctx->acomp);
 		kfree(acomp_ctx->buffer);
 	}
+	mutex_unlock(&acomp_ctx->mutex);
 
 	return 0;
 }
 
+static struct crypto_acomp_ctx *acomp_ctx_get_cpu_lock(
+		struct crypto_acomp_ctx __percpu *acomp_ctx)
+{
+	struct crypto_acomp_ctx *ctx;
+
+	for (;;) {
+		ctx = raw_cpu_ptr(acomp_ctx);
+		mutex_lock(&ctx->mutex);
+		if (likely(ctx->req))
+			return ctx;
+		/*
+		 * It is possible that we were migrated to a different CPU after
+		 * getting the per-CPU ctx but before the mutex was acquired. If
+		 * the old CPU got offlined, zswap_cpu_comp_dead() could have
+		 * already freed ctx->req (among other things) and set it to
+		 * NULL. Just try again on the new CPU that we ended up on.
+		 */
+		mutex_unlock(&ctx->mutex);
+	}
+}
+
+static void acomp_ctx_put_unlock(struct crypto_acomp_ctx *ctx)
+{
+	mutex_unlock(&ctx->mutex);
+}
+
 static bool zswap_compress(struct page *page, struct zswap_entry *entry,
 			   struct zswap_pool *pool)
 {
@@ -893,10 +922,7 @@  static bool zswap_compress(struct page *page, struct zswap_entry *entry,
 	gfp_t gfp;
 	u8 *dst;
 
-	acomp_ctx = raw_cpu_ptr(pool->acomp_ctx);
-
-	mutex_lock(&acomp_ctx->mutex);
-
+	acomp_ctx = acomp_ctx_get_cpu_lock(pool->acomp_ctx);
 	dst = acomp_ctx->buffer;
 	sg_init_table(&input, 1);
 	sg_set_page(&input, page, PAGE_SIZE, 0);
@@ -949,7 +975,7 @@  static bool zswap_compress(struct page *page, struct zswap_entry *entry,
 	else if (alloc_ret)
 		zswap_reject_alloc_fail++;
 
-	mutex_unlock(&acomp_ctx->mutex);
+	acomp_ctx_put_unlock(acomp_ctx);
 	return comp_ret == 0 && alloc_ret == 0;
 }
 
@@ -960,9 +986,7 @@  static void zswap_decompress(struct zswap_entry *entry, struct folio *folio)
 	struct crypto_acomp_ctx *acomp_ctx;
 	u8 *src;
 
-	acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx);
-	mutex_lock(&acomp_ctx->mutex);
-
+	acomp_ctx = acomp_ctx_get_cpu_lock(entry->pool->acomp_ctx);
 	src = zpool_map_handle(zpool, entry->handle, ZPOOL_MM_RO);
 	/*
 	 * If zpool_map_handle is atomic, we cannot reliably utilize its mapped buffer
@@ -986,10 +1010,10 @@  static void zswap_decompress(struct zswap_entry *entry, struct folio *folio)
 	acomp_request_set_params(acomp_ctx->req, &input, &output, entry->length, PAGE_SIZE);
 	BUG_ON(crypto_wait_req(crypto_acomp_decompress(acomp_ctx->req), &acomp_ctx->wait));
 	BUG_ON(acomp_ctx->req->dlen != PAGE_SIZE);
-	mutex_unlock(&acomp_ctx->mutex);
 
 	if (src != acomp_ctx->buffer)
 		zpool_unmap_handle(zpool, entry->handle);
+	acomp_ctx_put_unlock(acomp_ctx);
 }
 
 /*********************************