diff mbox series

[v2] nfs: fix panic when nfs4_ff_layout_prepare_ds() fails

Message ID 5a244749e5bfefd921cb296c9e7b16fe4990f440.1710169883.git.josef@toxicpanda.com (mailing list archive)
State New
Headers show
Series [v2] nfs: fix panic when nfs4_ff_layout_prepare_ds() fails | expand

Commit Message

Josef Bacik March 11, 2024, 3:11 p.m. UTC
We've been seeing the following panic in production

BUG: kernel NULL pointer dereference, address: 0000000000000065
PGD 2f485f067 P4D 2f485f067 PUD 2cc5d8067 PMD 0
RIP: 0010:ff_layout_cancel_io+0x3a/0x90 [nfs_layout_flexfiles]
Call Trace:
 <TASK>
 ? __die+0x78/0xc0
 ? page_fault_oops+0x286/0x380
 ? __rpc_execute+0x2c3/0x470 [sunrpc]
 ? rpc_new_task+0x42/0x1c0 [sunrpc]
 ? exc_page_fault+0x5d/0x110
 ? asm_exc_page_fault+0x22/0x30
 ? ff_layout_free_layoutreturn+0x110/0x110 [nfs_layout_flexfiles]
 ? ff_layout_cancel_io+0x3a/0x90 [nfs_layout_flexfiles]
 ? ff_layout_cancel_io+0x6f/0x90 [nfs_layout_flexfiles]
 pnfs_mark_matching_lsegs_return+0x1b0/0x360 [nfsv4]
 pnfs_error_mark_layout_for_return+0x9e/0x110 [nfsv4]
 ? ff_layout_send_layouterror+0x50/0x160 [nfs_layout_flexfiles]
 nfs4_ff_layout_prepare_ds+0x11f/0x290 [nfs_layout_flexfiles]
 ff_layout_pg_init_write+0xf0/0x1f0 [nfs_layout_flexfiles]
 __nfs_pageio_add_request+0x154/0x6c0 [nfs]
 nfs_pageio_add_request+0x26b/0x380 [nfs]
 nfs_do_writepage+0x111/0x1e0 [nfs]
 nfs_writepages_callback+0xf/0x30 [nfs]
 write_cache_pages+0x17f/0x380
 ? nfs_pageio_init_write+0x50/0x50 [nfs]
 ? nfs_writepages+0x6d/0x210 [nfs]
 ? nfs_writepages+0x6d/0x210 [nfs]
 nfs_writepages+0x125/0x210 [nfs]
 do_writepages+0x67/0x220
 ? generic_perform_write+0x14b/0x210
 filemap_fdatawrite_wbc+0x5b/0x80
 file_write_and_wait_range+0x6d/0xc0
 nfs_file_fsync+0x81/0x170 [nfs]
 ? nfs_file_mmap+0x60/0x60 [nfs]
 __x64_sys_fsync+0x53/0x90
 do_syscall_64+0x3d/0x90
 entry_SYSCALL_64_after_hwframe+0x46/0xb0

Inspecting the core with drgn I was able to pull this

  >>> prog.crashed_thread().stack_trace()[0]
  #0 at 0xffffffffa079657a (ff_layout_cancel_io+0x3a/0x84) in ff_layout_cancel_io at fs/nfs/flexfilelayout/flexfilelayout.c:2021:27
  >>> prog.crashed_thread().stack_trace()[0]['idx']
  (u32)1
  >>> prog.crashed_thread().stack_trace()[0]['flseg'].mirror_array[1].mirror_ds
  (struct nfs4_ff_layout_ds *)0xffffffffffffffed

This is clear from the stack trace, we call nfs4_ff_layout_prepare_ds()
which could error out initializing the mirror_ds, and then we go to
clean it all up and our check is only for if (!mirror->mirror_ds).  This
is inconsistent with the rest of the users of mirror_ds, which have

  if (IS_ERR_OR_NULL(mirror_ds))

to keep from tripping over this exact scenario.  Fix this up in
ff_layout_cancel_io() to make sure we don't panic when we get an error.
I also spot checked all the other instances of checking mirror_ds and we
appear to be doing the correct checks everywhere, only unconditionally
dereferencing mirror_ds when we know it would be valid.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
v1->v2:
- My bad, I missed the formatting of the drgn output and it looked mangled.

 fs/nfs/flexfilelayout/flexfilelayout.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Jeffrey Layton March 11, 2024, 3:17 p.m. UTC | #1
On Mon, 2024-03-11 at 11:11 -0400, Josef Bacik wrote:
> We've been seeing the following panic in production
> 
> BUG: kernel NULL pointer dereference, address: 0000000000000065
> PGD 2f485f067 P4D 2f485f067 PUD 2cc5d8067 PMD 0
> RIP: 0010:ff_layout_cancel_io+0x3a/0x90 [nfs_layout_flexfiles]
> Call Trace:
>  <TASK>
>  ? __die+0x78/0xc0
>  ? page_fault_oops+0x286/0x380
>  ? __rpc_execute+0x2c3/0x470 [sunrpc]
>  ? rpc_new_task+0x42/0x1c0 [sunrpc]
>  ? exc_page_fault+0x5d/0x110
>  ? asm_exc_page_fault+0x22/0x30
>  ? ff_layout_free_layoutreturn+0x110/0x110 [nfs_layout_flexfiles]
>  ? ff_layout_cancel_io+0x3a/0x90 [nfs_layout_flexfiles]
>  ? ff_layout_cancel_io+0x6f/0x90 [nfs_layout_flexfiles]
>  pnfs_mark_matching_lsegs_return+0x1b0/0x360 [nfsv4]
>  pnfs_error_mark_layout_for_return+0x9e/0x110 [nfsv4]
>  ? ff_layout_send_layouterror+0x50/0x160 [nfs_layout_flexfiles]
>  nfs4_ff_layout_prepare_ds+0x11f/0x290 [nfs_layout_flexfiles]
>  ff_layout_pg_init_write+0xf0/0x1f0 [nfs_layout_flexfiles]
>  __nfs_pageio_add_request+0x154/0x6c0 [nfs]
>  nfs_pageio_add_request+0x26b/0x380 [nfs]
>  nfs_do_writepage+0x111/0x1e0 [nfs]
>  nfs_writepages_callback+0xf/0x30 [nfs]
>  write_cache_pages+0x17f/0x380
>  ? nfs_pageio_init_write+0x50/0x50 [nfs]
>  ? nfs_writepages+0x6d/0x210 [nfs]
>  ? nfs_writepages+0x6d/0x210 [nfs]
>  nfs_writepages+0x125/0x210 [nfs]
>  do_writepages+0x67/0x220
>  ? generic_perform_write+0x14b/0x210
>  filemap_fdatawrite_wbc+0x5b/0x80
>  file_write_and_wait_range+0x6d/0xc0
>  nfs_file_fsync+0x81/0x170 [nfs]
>  ? nfs_file_mmap+0x60/0x60 [nfs]
>  __x64_sys_fsync+0x53/0x90
>  do_syscall_64+0x3d/0x90
>  entry_SYSCALL_64_after_hwframe+0x46/0xb0
> 
> Inspecting the core with drgn I was able to pull this
> 
>   >>> prog.crashed_thread().stack_trace()[0]
>   #0 at 0xffffffffa079657a (ff_layout_cancel_io+0x3a/0x84) in ff_layout_cancel_io at fs/nfs/flexfilelayout/flexfilelayout.c:2021:27
>   >>> prog.crashed_thread().stack_trace()[0]['idx']
>   (u32)1
>   >>> prog.crashed_thread().stack_trace()[0]['flseg'].mirror_array[1].mirror_ds
>   (struct nfs4_ff_layout_ds *)0xffffffffffffffed
> 
> This is clear from the stack trace, we call nfs4_ff_layout_prepare_ds()
> which could error out initializing the mirror_ds, and then we go to
> clean it all up and our check is only for if (!mirror->mirror_ds).  This
> is inconsistent with the rest of the users of mirror_ds, which have
> 
>   if (IS_ERR_OR_NULL(mirror_ds))
> 
> to keep from tripping over this exact scenario.  Fix this up in
> ff_layout_cancel_io() to make sure we don't panic when we get an error.
> I also spot checked all the other instances of checking mirror_ds and we
> appear to be doing the correct checks everywhere, only unconditionally
> dereferencing mirror_ds when we know it would be valid.
> 
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> ---
> v1->v2:
> - My bad, I missed the formatting of the drgn output and it looked mangled.
> 
>  fs/nfs/flexfilelayout/flexfilelayout.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/nfs/flexfilelayout/flexfilelayout.c b/fs/nfs/flexfilelayout/flexfilelayout.c
> index ef817a0475ff..3e724cb7ef01 100644
> --- a/fs/nfs/flexfilelayout/flexfilelayout.c
> +++ b/fs/nfs/flexfilelayout/flexfilelayout.c
> @@ -2016,7 +2016,7 @@ static void ff_layout_cancel_io(struct pnfs_layout_segment *lseg)
>  	for (idx = 0; idx < flseg->mirror_array_cnt; idx++) {
>  		mirror = flseg->mirror_array[idx];
>  		mirror_ds = mirror->mirror_ds;
> -		if (!mirror_ds)
> +		if (IS_ERR_OR_NULL(mirror_ds))
>  			continue;
>  		ds = mirror->mirror_ds->ds;
>  		if (!ds)

Nice catch:

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Trond Myklebust March 11, 2024, 4:17 p.m. UTC | #2
On Mon, 2024-03-11 at 11:11 -0400, Josef Bacik wrote:
> We've been seeing the following panic in production
> 
> BUG: kernel NULL pointer dereference, address: 0000000000000065
> PGD 2f485f067 P4D 2f485f067 PUD 2cc5d8067 PMD 0
> RIP: 0010:ff_layout_cancel_io+0x3a/0x90 [nfs_layout_flexfiles]
> Call Trace:
>  <TASK>
>  ? __die+0x78/0xc0
>  ? page_fault_oops+0x286/0x380
>  ? __rpc_execute+0x2c3/0x470 [sunrpc]
>  ? rpc_new_task+0x42/0x1c0 [sunrpc]
>  ? exc_page_fault+0x5d/0x110
>  ? asm_exc_page_fault+0x22/0x30
>  ? ff_layout_free_layoutreturn+0x110/0x110 [nfs_layout_flexfiles]
>  ? ff_layout_cancel_io+0x3a/0x90 [nfs_layout_flexfiles]
>  ? ff_layout_cancel_io+0x6f/0x90 [nfs_layout_flexfiles]
>  pnfs_mark_matching_lsegs_return+0x1b0/0x360 [nfsv4]
>  pnfs_error_mark_layout_for_return+0x9e/0x110 [nfsv4]
>  ? ff_layout_send_layouterror+0x50/0x160 [nfs_layout_flexfiles]
>  nfs4_ff_layout_prepare_ds+0x11f/0x290 [nfs_layout_flexfiles]
>  ff_layout_pg_init_write+0xf0/0x1f0 [nfs_layout_flexfiles]
>  __nfs_pageio_add_request+0x154/0x6c0 [nfs]
>  nfs_pageio_add_request+0x26b/0x380 [nfs]
>  nfs_do_writepage+0x111/0x1e0 [nfs]
>  nfs_writepages_callback+0xf/0x30 [nfs]
>  write_cache_pages+0x17f/0x380
>  ? nfs_pageio_init_write+0x50/0x50 [nfs]
>  ? nfs_writepages+0x6d/0x210 [nfs]
>  ? nfs_writepages+0x6d/0x210 [nfs]
>  nfs_writepages+0x125/0x210 [nfs]
>  do_writepages+0x67/0x220
>  ? generic_perform_write+0x14b/0x210
>  filemap_fdatawrite_wbc+0x5b/0x80
>  file_write_and_wait_range+0x6d/0xc0
>  nfs_file_fsync+0x81/0x170 [nfs]
>  ? nfs_file_mmap+0x60/0x60 [nfs]
>  __x64_sys_fsync+0x53/0x90
>  do_syscall_64+0x3d/0x90
>  entry_SYSCALL_64_after_hwframe+0x46/0xb0
> 
> Inspecting the core with drgn I was able to pull this
> 
>   >>> prog.crashed_thread().stack_trace()[0]
>   #0 at 0xffffffffa079657a (ff_layout_cancel_io+0x3a/0x84) in
> ff_layout_cancel_io at fs/nfs/flexfilelayout/flexfilelayout.c:2021:27
>   >>> prog.crashed_thread().stack_trace()[0]['idx']
>   (u32)1
>   >>>
> prog.crashed_thread().stack_trace()[0]['flseg'].mirror_array[1].mirro
> r_ds
>   (struct nfs4_ff_layout_ds *)0xffffffffffffffed
> 
> This is clear from the stack trace, we call
> nfs4_ff_layout_prepare_ds()
> which could error out initializing the mirror_ds, and then we go to
> clean it all up and our check is only for if (!mirror->mirror_ds). 
> This
> is inconsistent with the rest of the users of mirror_ds, which have
> 
>   if (IS_ERR_OR_NULL(mirror_ds))
> 
> to keep from tripping over this exact scenario.  Fix this up in
> ff_layout_cancel_io() to make sure we don't panic when we get an
> error.
> I also spot checked all the other instances of checking mirror_ds and
> we
> appear to be doing the correct checks everywhere, only
> unconditionally
> dereferencing mirror_ds when we know it would be valid.
> 
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> ---
> v1->v2:
> - My bad, I missed the formatting of the drgn output and it looked
> mangled.
> 
>  fs/nfs/flexfilelayout/flexfilelayout.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/nfs/flexfilelayout/flexfilelayout.c
> b/fs/nfs/flexfilelayout/flexfilelayout.c
> index ef817a0475ff..3e724cb7ef01 100644
> --- a/fs/nfs/flexfilelayout/flexfilelayout.c
> +++ b/fs/nfs/flexfilelayout/flexfilelayout.c
> @@ -2016,7 +2016,7 @@ static void ff_layout_cancel_io(struct
> pnfs_layout_segment *lseg)
>  	for (idx = 0; idx < flseg->mirror_array_cnt; idx++) {
>  		mirror = flseg->mirror_array[idx];
>  		mirror_ds = mirror->mirror_ds;
> -		if (!mirror_ds)
> +		if (IS_ERR_OR_NULL(mirror_ds))
>  			continue;
>  		ds = mirror->mirror_ds->ds;
>  		if (!ds)

That is truly weird... I have the above correct in the original patch,
which is still in the Hammerspace repo. However the port to upstream
appears to have introduced the bug.

Thanks so much for catching this, Josef!
diff mbox series

Patch

diff --git a/fs/nfs/flexfilelayout/flexfilelayout.c b/fs/nfs/flexfilelayout/flexfilelayout.c
index ef817a0475ff..3e724cb7ef01 100644
--- a/fs/nfs/flexfilelayout/flexfilelayout.c
+++ b/fs/nfs/flexfilelayout/flexfilelayout.c
@@ -2016,7 +2016,7 @@  static void ff_layout_cancel_io(struct pnfs_layout_segment *lseg)
 	for (idx = 0; idx < flseg->mirror_array_cnt; idx++) {
 		mirror = flseg->mirror_array[idx];
 		mirror_ds = mirror->mirror_ds;
-		if (!mirror_ds)
+		if (IS_ERR_OR_NULL(mirror_ds))
 			continue;
 		ds = mirror->mirror_ds->ds;
 		if (!ds)