
zswap: fix zswap_frontswap_load() vs zsmalloc::map/unmap() might_sleep() splat

Message ID fae85e4440a8ef6f13192476bd33a4826416fc58.camel@gmx.de (mailing list archive)
State New, archived
Series zswap: fix zswap_frontswap_load() vs zsmalloc::map/unmap() might_sleep() splat

Commit Message

Mike Galbraith Dec. 19, 2020, 10:04 a.m. UTC
Greetings,

Beating on zswap+zsmalloc leads to the splat below; a possible patch
follows it.

[  139.794413] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:935
[  139.794585] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 907, name: avahi-daemon
[  139.794608] 2 locks held by avahi-daemon/907:
[  139.794621]  #0: ffff8881010766d8 (&mm->mmap_lock#2){++++}-{3:3}, at: exc_page_fault+0x15b/0x6e0
[  139.794659]  #1: ffff8881b6b13110 (&zspage->lock){.+.+}-{2:2}, at: zs_map_object+0x89/0x350
[  139.794691] Preemption disabled at:
[  139.794693] [<ffffffff812763a8>] zs_map_object+0x38/0x350
[  139.794719] CPU: 0 PID: 907 Comm: avahi-daemon Kdump: loaded Tainted: G            E     5.10.0.g3644e2d-preempt #4
[  139.794746] Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.20C 09/23/2013
[  139.794766] Call Trace:
[  139.794775]  dump_stack+0x77/0x97
[  139.794788]  ___might_sleep+0x17d/0x260
[  139.794806]  __mutex_lock+0x47/0x9d0
[  139.794823]  ? zswap_frontswap_load+0xcb/0x240
[  139.794841]  ? zs_map_object+0x248/0x350
[  139.794858]  ? zswap_frontswap_load+0xcb/0x240
[  139.794871]  zswap_frontswap_load+0xcb/0x240
[  139.794886]  ? lru_cache_add+0xe9/0x1b0
[  139.794904]  __frontswap_load+0x6e/0xd0
[  139.794921]  swap_readpage+0x70/0x240
[  139.794939]  swap_cluster_readahead+0x1d9/0x280
[  139.794964]  ? swapin_readahead+0x140/0x3e0
[  139.794979]  swapin_readahead+0x140/0x3e0
[  139.794997]  ? lookup_swap_cache+0x5c/0x190
[  139.795016]  ? do_swap_page+0x2f7/0x900
[  139.795030]  do_swap_page+0x2f7/0x900
[  139.795046]  handle_mm_fault+0xa06/0x13b0
[  139.795083]  exc_page_fault+0x2a3/0x6e0
[  139.795110]  ? asm_exc_page_fault+0x1e/0x30
[  139.795139]  ? asm_exc_page_fault+0x8/0x30
[  139.795166]  asm_exc_page_fault+0x1e/0x30
[  139.795191] RIP: 0033:0x560caa4074d0
[  139.795215] Code: 89 05 e4 74 21 00 0f 84 28 06 00 00 e8 d9 13 00 00 e8 b4 12 00 00 44 8b 25 b9 75 21 00 45 85 e4 0f 85 ac 04 00 00 41 83 cf ff <48> 8b 3d b1 74 21 00 44 89 fe e8 d1 f6 ff ff 85 c0 89 c3 0f 88 d9
[  139.795299] RSP: 002b:00007ffeb4ec5ce0 EFLAGS: 00010246
[  139.795316] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00007ff1616e4db4
[  139.795336] RDX: 0000000000000001 RSI: 00007ffeb4ec5c5f RDI: 0000000000000006
[  139.795356] RBP: 0000560cab63be60 R08: 0000560cab656220 R09: 0000560cab670750
[  139.795376] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[  139.795395] R13: 0000560cab63f550 R14: 0000560cab63c000 R15: 00000000ffffffff

(CC Sebastian because RT has the ~same problem at the same spot,
curable the same way, or by fixing the upstream buglet, taking
backports+fix, and nuking the associated RT patch)

mm/zswap: fix zswap_frontswap_load() vs zsmalloc::map/unmap() might_sleep() splat

zsmalloc map/unmap methods use preemption disabling bit spinlocks.  Take the
mutex outside of pool map/unmap methods in zswap_frontswap_load() as is done
in zswap_frontswap_store().

Signed-off-by: Mike Galbraith <efault@gmx.de>
Fixes: 1ec3b5fe6eec ("mm/zswap: move to use crypto_acomp API for hardware acceleration")
---
 mm/zswap.c |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
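For context on why the splat fires: zs_map_object() pins the object via a bit
spinlock, and bit_spin_lock() disables preemption before it spins, keeping it
disabled until the matching bit_spin_unlock() in zs_unmap_object().  Any
sleeping lock taken in between, such as acomp_ctx->mutex in the trace above,
then trips might_sleep() on a PREEMPT kernel.  A rough sketch of the primitive,
paraphrased from memory of include/linux/bit_spinlock.h rather than quoted
verbatim:

static inline void bit_spin_lock(int bitnum, unsigned long *addr)
{
	/* preemption goes off here and stays off until bit_spin_unlock() */
	preempt_disable();
#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK)
	while (unlikely(test_and_set_bit_lock(bitnum, addr))) {
		preempt_enable();
		do {
			cpu_relax();
		} while (test_bit(bitnum, addr));
		preempt_disable();
	}
#endif
	__acquire(bitlock);
}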

Comments

Mike Galbraith Dec. 19, 2020, 10:12 a.m. UTC | #1
(mailer partially munged formatting? resend)

mm/zswap: fix zswap_frontswap_load() vs zsmalloc::map/unmap() might_sleep() splat

zsmalloc map/unmap methods use preemption disabling bit spinlocks.  Take the
mutex outside of pool map/unmap methods in zswap_frontswap_load() as is done
in zswap_frontswap_store().

Signed-off-by: Mike Galbraith <efault@gmx.de>
Fixes: 1ec3b5fe6eec ("mm/zswap: move to use crypto_acomp API for hardware acceleration")
---
 mm/zswap.c |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -1258,20 +1258,20 @@ static int zswap_frontswap_load(unsigned

 	/* decompress */
 	dlen = PAGE_SIZE;
+	acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx);
+	mutex_lock(acomp_ctx->mutex);
 	src = zpool_map_handle(entry->pool->zpool, entry->handle, ZPOOL_MM_RO);
 	if (zpool_evictable(entry->pool->zpool))
 		src += sizeof(struct zswap_header);

-	acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx);
-	mutex_lock(acomp_ctx->mutex);
 	sg_init_one(&input, src, entry->length);
 	sg_init_table(&output, 1);
 	sg_set_page(&output, page, PAGE_SIZE, 0);
 	acomp_request_set_params(acomp_ctx->req, &input, &output, entry->length, dlen);
 	ret = crypto_wait_req(crypto_acomp_decompress(acomp_ctx->req), &acomp_ctx->wait);
-	mutex_unlock(acomp_ctx->mutex);

 	zpool_unmap_handle(entry->pool->zpool, entry->handle);
+	mutex_unlock(acomp_ctx->mutex);
 	BUG_ON(ret);

 freeentry:
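
For comparison, the ordering this mirrors on the store side, as a simplified
sketch of the 5.10-era zswap_frontswap_store() from memory (variable names and
details elided, not a verbatim quote):

	acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx);
	mutex_lock(acomp_ctx->mutex);

	/* compress the page into the per-CPU buffer first ... */
	ret = crypto_wait_req(crypto_acomp_compress(acomp_ctx->req), &acomp_ctx->wait);

	/* ... and only then map the handle, copy the result, and unmap */
	buf = zpool_map_handle(entry->pool->zpool, handle, ZPOOL_MM_RW);
	memcpy(buf + hlen, dst, dlen);
	zpool_unmap_handle(entry->pool->zpool, handle);

	mutex_unlock(acomp_ctx->mutex);

The mutex is taken before zpool_map_handle() and released only after
zpool_unmap_handle(), which is the ordering the patch above restores in the
load path.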
Vitaly Wool Dec. 19, 2020, 10:20 a.m. UTC | #2
Hi Mike,

On Sat, Dec 19, 2020 at 11:12 AM Mike Galbraith <efault@gmx.de> wrote:
>
> (mailer partially munged formatting? resend)
>
> mm/zswap: fix zswap_frontswap_load() vs zsmalloc::map/unmap() might_sleep() splat
>
> zsmalloc map/unmap methods use preemption disabling bit spinlocks.  Take the
> mutex outside of pool map/unmap methods in zswap_frontswap_load() as is done
> in zswap_frontswap_store().

oh wait... So is zsmalloc taking a spin lock in its map callback and
releasing it only in unmap? In this case, I would rather keep zswap as
is, mark zsmalloc as RT unsafe and have zsmalloc maintainer fix it.

Best regards,
   Vitaly

> Signed-off-by: Mike Galbraith <efault@gmx.de>
> Fixes: 1ec3b5fe6eec "mm/zswap: move to use crypto_acomp API for hardware acceleration"
> ---
>  mm/zswap.c |    6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -1258,20 +1258,20 @@ static int zswap_frontswap_load(unsigned
>
>         /* decompress */
>         dlen = PAGE_SIZE;
> +       acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx);
> +       mutex_lock(acomp_ctx->mutex);
>         src = zpool_map_handle(entry->pool->zpool, entry->handle, ZPOOL_MM_RO);
>         if (zpool_evictable(entry->pool->zpool))
>                 src += sizeof(struct zswap_header);
>
> -       acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx);
> -       mutex_lock(acomp_ctx->mutex);
>         sg_init_one(&input, src, entry->length);
>         sg_init_table(&output, 1);
>         sg_set_page(&output, page, PAGE_SIZE, 0);
>         acomp_request_set_params(acomp_ctx->req, &input, &output, entry->length, dlen);
>         ret = crypto_wait_req(crypto_acomp_decompress(acomp_ctx->req), &acomp_ctx->wait);
> -       mutex_unlock(acomp_ctx->mutex);
>
>         zpool_unmap_handle(entry->pool->zpool, entry->handle);
> +       mutex_unlock(acomp_ctx->mutex);
>         BUG_ON(ret);
>
>  freeentry:
>
Mike Galbraith Dec. 19, 2020, 10:27 a.m. UTC | #3
On Sat, 2020-12-19 at 11:20 +0100, Vitaly Wool wrote:
> Hi Mike,
>
> On Sat, Dec 19, 2020 at 11:12 AM Mike Galbraith <efault@gmx.de> wrote:
> >
> > (mailer partially munged formatting? resend)
> >
> > mm/zswap: fix zswap_frontswap_load() vs zsmalloc::map/unmap() might_sleep() splat
> >
> > zsmalloc map/unmap methods use preemption disabling bit spinlocks.  Take the
> > mutex outside of pool map/unmap methods in zswap_frontswap_load() as is done
> > in zswap_frontswap_store().
>
> oh wait... So is zsmalloc taking a spin lock in its map callback and
> releasing it only in unmap? In this case, I would rather keep zswap as
> is, mark zsmalloc as RT unsafe and have zsmalloc maintainer fix it.

The kernel that generated that splat was NOT an RT kernel, it was plain
master.today with a PREEMPT config.

	-Mike
Vitaly Wool Dec. 19, 2020, 10:46 a.m. UTC | #4
On Sat, 19 Dec 2020, 11:27 Mike Galbraith, <efault@gmx.de> wrote:
>
> On Sat, 2020-12-19 at 11:20 +0100, Vitaly Wool wrote:
> > Hi Mike,
> >
> > On Sat, Dec 19, 2020 at 11:12 AM Mike Galbraith <efault@gmx.de> wrote:
> > >
> > > (mailer partially munged formatting? resend)
> > >
> > > mm/zswap: fix zswap_frontswap_load() vs zsmalloc::map/unmap() might_sleep() splat
> > >
> > > zsmalloc map/unmap methods use preemption disabling bit spinlocks.  Take the
> > > mutex outside of pool map/unmap methods in zswap_frontswap_load() as is done
> > > in zswap_frontswap_store().
> >
> > oh wait... So is zsmalloc taking a spin lock in its map callback and
> > releasing it only in unmap? In this case, I would rather keep zswap as
> > is, mark zsmalloc as RT unsafe and have zsmalloc maintainer fix it.
>
> The kernel that generated that splat was NOT an RT kernel, it was plain
> master.today with a PREEMPT config.


I see, thanks. I don't think it makes things better for zsmalloc
though. From what I can see, the offending code is this:

>        /* From now on, migration cannot move the object */
>        pin_tag(handle);

Bit spinlock is taken in pin_tag(). I find the comment above somewhat
misleading: why is it necessary to take a spinlock to prevent
migration? I would guess an atomic flag should normally be enough.

zswap is not broken here, it is zsmalloc that needs to be fixed.

Best regards,
   Vitaly
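
For reference, pin_tag()/unpin_tag() in mm/zsmalloc.c are thin wrappers around
the handle's pin bit; roughly, paraphrased from memory of the v5.10 source
rather than quoted:

#define HANDLE_PIN_BIT	0

static void pin_tag(unsigned long handle)
{
	/* bit_spin_lock() disables preemption on entry */
	bit_spin_lock(HANDLE_PIN_BIT, (unsigned long *)handle);
}

static void unpin_tag(unsigned long handle)
{
	/* preemption stays off until zs_unmap_object() reaches this */
	bit_spin_unlock(HANDLE_PIN_BIT, (unsigned long *)handle);
}

So the lock taken in zs_map_object() is indeed only dropped in
zs_unmap_object().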
Mike Galbraith Dec. 19, 2020, 10:59 a.m. UTC | #5
On Sat, 2020-12-19 at 11:46 +0100, Vitaly Wool wrote:
> On Sat, 19 Dec 2020, 11:27 Mike Galbraith, <efault@gmx.de> wrote:
>
> > The kernel that generated that splat was NOT an RT kernel, it was plain
> > master.today with a PREEMPT config.
>
>
> I see, thanks. I don't think it makes things better for zsmalloc
> though. From what I can see, the offending code is this:
>
> >        /* From now on, migration cannot move the object */
> >        pin_tag(handle);
>
> Bit spinlock is taken in pin_tag(). I find the comment above somewhat
> misleading, why is it necessary to take a spinlock to prevent
> migration? I would guess an atomic flag should normally be enough.
>
> zswap is not broken here, it is zsmalloc that needs to be fixed.

Cool, those damn bit spinlocks going away would be a case of happiness
for RT as well :)

	-Mike
Mike Galbraith Dec. 19, 2020, 11:03 a.m. UTC | #6
(CC zsmalloc maintainers)

On Sat, 2020-12-19 at 11:59 +0100, Mike Galbraith wrote:
> On Sat, 2020-12-19 at 11:46 +0100, Vitaly Wool wrote:
> > On Sat, 19 Dec 2020, 11:27 Mike Galbraith, <efault@gmx.de> wrote:
> >
> > > The kernel that generated that splat was NOT an RT kernel, it was plain
> > > master.today with a PREEMPT config.
> >
> >
> > I see, thanks. I don't think it makes things better for zsmalloc
> > though. From what I can see, the offending code is this:
> >
> > >        /* From now on, migration cannot move the object */
> > >        pin_tag(handle);
> >
> > Bit spinlock is taken in pin_tag(). I find the comment above somewhat
> > misleading, why is it necessary to take a spinlock to prevent
> > migration? I would guess an atomic flag should normally be enough.
> >
> > zswap is not broken here, it is zsmalloc that needs to be fixed.
>
> Cool, those damn bit spinlocks going away would be a case of happiness
> for RT as well :)
>
> 	-Mike

Patch

--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -1258,20 +1258,20 @@  static int zswap_frontswap_load(unsigned

 	/* decompress */
 	dlen = PAGE_SIZE;
+	acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx);
+	mutex_lock(acomp_ctx->mutex);
 	src = zpool_map_handle(entry->pool->zpool, entry->handle, ZPOOL_MM_RO);
 	if (zpool_evictable(entry->pool->zpool))
 		src += sizeof(struct zswap_header);

-	acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx);
-	mutex_lock(acomp_ctx->mutex);
 	sg_init_one(&input, src, entry->length);
 	sg_init_table(&output, 1);
 	sg_set_page(&output, page, PAGE_SIZE, 0);
 	acomp_request_set_params(acomp_ctx->req, &input, &output, entry->length, dlen);
 	ret = crypto_wait_req(crypto_acomp_decompress(acomp_ctx->req), &acomp_ctx->wait);
-	mutex_unlock(acomp_ctx->mutex);

 	zpool_unmap_handle(entry->pool->zpool, entry->handle);
+	mutex_unlock(acomp_ctx->mutex);
 	BUG_ON(ret);

 freeentry: