diff mbox series

[1/4] xfs: don't unconditionally null args->pag in xfs_bmap_btalloc_at_eof

Message ID 168263573987.1717721.305790819127740342.stgit@frogsfrogsfrogs (mailing list archive)
State Superseded, archived
Headers show
Series xfs: bug fixes for 6.4-rc1 | expand

Commit Message

Darrick J. Wong April 27, 2023, 10:48 p.m. UTC
From: Darrick J. Wong <djwong@kernel.org>

xfs/170 on a filesystem with su=128k,sw=4 produces this splat:

BUG: kernel NULL pointer dereference, address: 0000000000000010
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 0 P4D 0
Oops: 0002 [#1] PREEMPT SMP
CPU: 1 PID: 4022907 Comm: dd Tainted: G        W          6.3.0-xfsx #2 6ebeeffbe9577d32
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20171121_152543-x86-ol7-bu
RIP: 0010:xfs_perag_rele+0x10/0x70 [xfs]
RSP: 0018:ffffc90001e43858 EFLAGS: 00010217
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000100
RDX: ffffffffa054e717 RSI: 0000000000000005 RDI: 0000000000000000
RBP: ffff888194eea000 R08: 0000000000000000 R09: 0000000000000037
R10: ffff888100ac1cb0 R11: 0000000000000018 R12: 0000000000000000
R13: ffffc90001e43a38 R14: ffff888194eea000 R15: ffff888194eea000
FS:  00007f93d1a0e740(0000) GS:ffff88843fc80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000010 CR3: 000000018a34f000 CR4: 00000000003506e0
Call Trace:
 <TASK>
 xfs_bmap_btalloc+0x1a7/0x5d0 [xfs f85291d6841cbb3dc740083f1f331c0327394518]
 xfs_bmapi_allocate+0xee/0x470 [xfs f85291d6841cbb3dc740083f1f331c0327394518]
 xfs_bmapi_write+0x539/0x9e0 [xfs f85291d6841cbb3dc740083f1f331c0327394518]
 xfs_iomap_write_direct+0x1bb/0x2b0 [xfs f85291d6841cbb3dc740083f1f331c0327394518]
 xfs_direct_write_iomap_begin+0x51c/0x710 [xfs f85291d6841cbb3dc740083f1f331c0327394518]
 iomap_iter+0x132/0x2f0
 __iomap_dio_rw+0x2f8/0x840
 iomap_dio_rw+0xe/0x30
 xfs_file_dio_write_aligned+0xad/0x180 [xfs f85291d6841cbb3dc740083f1f331c0327394518]
 xfs_file_write_iter+0xfb/0x190 [xfs f85291d6841cbb3dc740083f1f331c0327394518]
 vfs_write+0x2eb/0x410
 ksys_write+0x65/0xe0
 do_syscall_64+0x2b/0x80

This crash occurs under the "out_low_space" label.  We grabbed a perag
reference, passed it via args->pag into xfs_bmap_btalloc_at_eof, and
afterwards args->pag is NULL.  Fix the second function not to clobber
args->pag if the caller had passed one in.

Fixes: 85843327094f ("xfs: factor xfs_bmap_btalloc()")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_bmap.c |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

Comments

Dave Chinner April 28, 2023, 1:49 a.m. UTC | #1
On Thu, Apr 27, 2023 at 03:48:59PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> xfs/170 on a filesystem with su=128k,sw=4 produces this splat:
> 
> BUG: kernel NULL pointer dereference, address: 0000000000000010
> #PF: supervisor write access in kernel mode
> #PF: error_code(0x0002) - not-present page
> PGD 0 P4D 0
> Oops: 0002 [#1] PREEMPT SMP
> CPU: 1 PID: 4022907 Comm: dd Tainted: G        W          6.3.0-xfsx #2 6ebeeffbe9577d32
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20171121_152543-x86-ol7-bu
> RIP: 0010:xfs_perag_rele+0x10/0x70 [xfs]
> RSP: 0018:ffffc90001e43858 EFLAGS: 00010217
> RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000100
> RDX: ffffffffa054e717 RSI: 0000000000000005 RDI: 0000000000000000
> RBP: ffff888194eea000 R08: 0000000000000000 R09: 0000000000000037
> R10: ffff888100ac1cb0 R11: 0000000000000018 R12: 0000000000000000
> R13: ffffc90001e43a38 R14: ffff888194eea000 R15: ffff888194eea000
> FS:  00007f93d1a0e740(0000) GS:ffff88843fc80000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000010 CR3: 000000018a34f000 CR4: 00000000003506e0
> Call Trace:
>  <TASK>
>  xfs_bmap_btalloc+0x1a7/0x5d0 [xfs f85291d6841cbb3dc740083f1f331c0327394518]
>  xfs_bmapi_allocate+0xee/0x470 [xfs f85291d6841cbb3dc740083f1f331c0327394518]
>  xfs_bmapi_write+0x539/0x9e0 [xfs f85291d6841cbb3dc740083f1f331c0327394518]
>  xfs_iomap_write_direct+0x1bb/0x2b0 [xfs f85291d6841cbb3dc740083f1f331c0327394518]
>  xfs_direct_write_iomap_begin+0x51c/0x710 [xfs f85291d6841cbb3dc740083f1f331c0327394518]
>  iomap_iter+0x132/0x2f0
>  __iomap_dio_rw+0x2f8/0x840
>  iomap_dio_rw+0xe/0x30
>  xfs_file_dio_write_aligned+0xad/0x180 [xfs f85291d6841cbb3dc740083f1f331c0327394518]
>  xfs_file_write_iter+0xfb/0x190 [xfs f85291d6841cbb3dc740083f1f331c0327394518]
>  vfs_write+0x2eb/0x410
>  ksys_write+0x65/0xe0
>  do_syscall_64+0x2b/0x80
> 
> This crash occurs under the "out_low_space" label.  We grabbed a perag
> reference, passed it via args->pag into xfs_bmap_btalloc_at_eof, and
> afterwards args->pag is NULL.  Fix the second function not to clobber
> args->pag if the caller had passed one in.
> 
> Fixes: 85843327094f ("xfs: factor xfs_bmap_btalloc()")
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> ---
>  fs/xfs/libxfs/xfs_bmap.c |    5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> 
> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> index b512de0540d5..cd8870a16fd1 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -3494,8 +3494,10 @@ xfs_bmap_btalloc_at_eof(
>  		if (!caller_pag)
>  			args->pag = xfs_perag_get(mp, XFS_FSB_TO_AGNO(mp, ap->blkno));
>  		error = xfs_alloc_vextent_exact_bno(args, ap->blkno);
> -		if (!caller_pag)
> +		if (!caller_pag) {
>  			xfs_perag_put(args->pag);
> +			args->pag = NULL;
> +		}
>  		if (error)
>  			return error;
>  
> @@ -3505,7 +3507,6 @@ xfs_bmap_btalloc_at_eof(
>  		 * Exact allocation failed. Reset to try an aligned allocation
>  		 * according to the original allocation specification.
>  		 */
> -		args->pag = NULL;
>  		args->alignment = stripe_align;
>  		args->minlen = nextminlen;
>  		args->minalignslop = 0;

Yup, that'll fix the problem.

Reviewed-by: Dave Chinner <dchinner@redhat.com>

FWIW, I'm working on some patches to take this further by always
providing a caller perag to this function.  That gets rid of all the
conditional code here and it gets rid of the only case where we call
xfs_alloc_vextent_near_bno() without args->pag set so the
conditional caller_pag code goes from there, too. This makes
xfs_alloc_vextent_{near,exact}_bno() identical except for one line
so they can be collapsed. And now that we always specify args->pag
for these functions, we can ignore the agno part of the target block
meaning we can select the perag to allocate from at a much higher
level via setting up args->pag when we initially scan the AGs to
set up args->minlen/maxlen based on the the nearest AG with 
contiguous free space large enough for the whole allocation. This
avoids attempting allocations that are guaranteed to fail before
we fall back to iterating from the target block agno and eventually
finding the same AG we originally found that had enough contiguous
free space in it for a maxlen allocation....

I think I can take it further an make both filestreams and the
normal allocator use the same "this ag" algorithm for EOF and
aligned allocation and then have them both fall back to the same
unaligned allocation attempts, which will then allow a bunch of code
to be collapsed in the xfs_bmap_btalloc() path...

-Dave.
Darrick J. Wong April 28, 2023, 4:30 a.m. UTC | #2
On Fri, Apr 28, 2023 at 11:49:28AM +1000, Dave Chinner wrote:
> On Thu, Apr 27, 2023 at 03:48:59PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> > 
> > xfs/170 on a filesystem with su=128k,sw=4 produces this splat:
> > 
> > BUG: kernel NULL pointer dereference, address: 0000000000000010
> > #PF: supervisor write access in kernel mode
> > #PF: error_code(0x0002) - not-present page
> > PGD 0 P4D 0
> > Oops: 0002 [#1] PREEMPT SMP
> > CPU: 1 PID: 4022907 Comm: dd Tainted: G        W          6.3.0-xfsx #2 6ebeeffbe9577d32
> > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20171121_152543-x86-ol7-bu
> > RIP: 0010:xfs_perag_rele+0x10/0x70 [xfs]
> > RSP: 0018:ffffc90001e43858 EFLAGS: 00010217
> > RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000100
> > RDX: ffffffffa054e717 RSI: 0000000000000005 RDI: 0000000000000000
> > RBP: ffff888194eea000 R08: 0000000000000000 R09: 0000000000000037
> > R10: ffff888100ac1cb0 R11: 0000000000000018 R12: 0000000000000000
> > R13: ffffc90001e43a38 R14: ffff888194eea000 R15: ffff888194eea000
> > FS:  00007f93d1a0e740(0000) GS:ffff88843fc80000(0000) knlGS:0000000000000000
> > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 0000000000000010 CR3: 000000018a34f000 CR4: 00000000003506e0
> > Call Trace:
> >  <TASK>
> >  xfs_bmap_btalloc+0x1a7/0x5d0 [xfs f85291d6841cbb3dc740083f1f331c0327394518]
> >  xfs_bmapi_allocate+0xee/0x470 [xfs f85291d6841cbb3dc740083f1f331c0327394518]
> >  xfs_bmapi_write+0x539/0x9e0 [xfs f85291d6841cbb3dc740083f1f331c0327394518]
> >  xfs_iomap_write_direct+0x1bb/0x2b0 [xfs f85291d6841cbb3dc740083f1f331c0327394518]
> >  xfs_direct_write_iomap_begin+0x51c/0x710 [xfs f85291d6841cbb3dc740083f1f331c0327394518]
> >  iomap_iter+0x132/0x2f0
> >  __iomap_dio_rw+0x2f8/0x840
> >  iomap_dio_rw+0xe/0x30
> >  xfs_file_dio_write_aligned+0xad/0x180 [xfs f85291d6841cbb3dc740083f1f331c0327394518]
> >  xfs_file_write_iter+0xfb/0x190 [xfs f85291d6841cbb3dc740083f1f331c0327394518]
> >  vfs_write+0x2eb/0x410
> >  ksys_write+0x65/0xe0
> >  do_syscall_64+0x2b/0x80
> > 
> > This crash occurs under the "out_low_space" label.  We grabbed a perag
> > reference, passed it via args->pag into xfs_bmap_btalloc_at_eof, and
> > afterwards args->pag is NULL.  Fix the second function not to clobber
> > args->pag if the caller had passed one in.
> > 
> > Fixes: 85843327094f ("xfs: factor xfs_bmap_btalloc()")
> > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > ---
> >  fs/xfs/libxfs/xfs_bmap.c |    5 +++--
> >  1 file changed, 3 insertions(+), 2 deletions(-)
> > 
> > 
> > diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> > index b512de0540d5..cd8870a16fd1 100644
> > --- a/fs/xfs/libxfs/xfs_bmap.c
> > +++ b/fs/xfs/libxfs/xfs_bmap.c
> > @@ -3494,8 +3494,10 @@ xfs_bmap_btalloc_at_eof(
> >  		if (!caller_pag)
> >  			args->pag = xfs_perag_get(mp, XFS_FSB_TO_AGNO(mp, ap->blkno));
> >  		error = xfs_alloc_vextent_exact_bno(args, ap->blkno);
> > -		if (!caller_pag)
> > +		if (!caller_pag) {
> >  			xfs_perag_put(args->pag);
> > +			args->pag = NULL;
> > +		}
> >  		if (error)
> >  			return error;
> >  
> > @@ -3505,7 +3507,6 @@ xfs_bmap_btalloc_at_eof(
> >  		 * Exact allocation failed. Reset to try an aligned allocation
> >  		 * according to the original allocation specification.
> >  		 */
> > -		args->pag = NULL;
> >  		args->alignment = stripe_align;
> >  		args->minlen = nextminlen;
> >  		args->minalignslop = 0;
> 
> Yup, that'll fix the problem.
> 
> Reviewed-by: Dave Chinner <dchinner@redhat.com>
> 
> FWIW, I'm working on some patches to take this further by always
> providing a caller perag to this function.  That gets rid of all the
> conditional code here and it gets rid of the only case where we call
> xfs_alloc_vextent_near_bno() without args->pag set so the
> conditional caller_pag code goes from there, too. This makes
> xfs_alloc_vextent_{near,exact}_bno() identical except for one line
> so they can be collapsed. And now that we always specify args->pag
> for these functions, we can ignore the agno part of the target block
> meaning we can select the perag to allocate from at a much higher
> level via setting up args->pag when we initially scan the AGs to
> set up args->minlen/maxlen based on the the nearest AG with 
> contiguous free space large enough for the whole allocation. This
> avoids attempting allocations that are guaranteed to fail before
> we fall back to iterating from the target block agno and eventually
> finding the same AG we originally found that had enough contiguous
> free space in it for a maxlen allocation....
> 
> I think I can take it further an make both filestreams and the
> normal allocator use the same "this ag" algorithm for EOF and
> aligned allocation and then have them both fall back to the same
> unaligned allocation attempts, which will then allow a bunch of code
> to be collapsed in the xfs_bmap_btalloc() path...

<nod> I was thinking along those lines too when I was writing this
patch, but for now I'll just patch everything back together. :)

That /would/ make the interfaces to those functions a lot more
consistent.

--D

> -Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
diff mbox series

Patch

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index b512de0540d5..cd8870a16fd1 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -3494,8 +3494,10 @@  xfs_bmap_btalloc_at_eof(
 		if (!caller_pag)
 			args->pag = xfs_perag_get(mp, XFS_FSB_TO_AGNO(mp, ap->blkno));
 		error = xfs_alloc_vextent_exact_bno(args, ap->blkno);
-		if (!caller_pag)
+		if (!caller_pag) {
 			xfs_perag_put(args->pag);
+			args->pag = NULL;
+		}
 		if (error)
 			return error;
 
@@ -3505,7 +3507,6 @@  xfs_bmap_btalloc_at_eof(
 		 * Exact allocation failed. Reset to try an aligned allocation
 		 * according to the original allocation specification.
 		 */
-		args->pag = NULL;
 		args->alignment = stripe_align;
 		args->minlen = nextminlen;
 		args->minalignslop = 0;