xfs: reserve blocks for rmapbt changes in xfs_reflink_end_cow

Message ID 20181123175448.GX6792@magnolia (mailing list archive)
State New, archived
Series xfs: reserve blocks for rmapbt changes in xfs_reflink_end_cow

Commit Message

Darrick J. Wong Nov. 23, 2018, 5:54 p.m. UTC
From: Darrick J. Wong <darrick.wong@oracle.com>

In xfs_reflink_end_cow, we have to swap written extents from the CoW
fork into the data fork, which can require extensive rmapbt updates.
The transaction block reservation calculation forgot that part of the
calculation, which led to a shutdown during an end_cow transaction roll
during fsx exercises:

XFS: Assertion failed: tp->t_blk_res >= tp->t_blk_res_used, file: fs/xfs/xfs_trans.c, line: 116
<machine registers snipped>
Call Trace:
 xfs_trans_dup+0x211/0x250 [xfs]
 xfs_trans_roll+0x6d/0x180 [xfs]
 xfs_defer_trans_roll+0x10c/0x3b0 [xfs]
 xfs_defer_finish_noroll+0xdf/0x740 [xfs]
 xfs_defer_finish+0x13/0x70 [xfs]
 xfs_reflink_end_cow+0x2c6/0x680 [xfs]
 xfs_dio_write_end_io+0x115/0x220 [xfs]
 iomap_dio_complete+0x3f/0x130
 iomap_dio_rw+0x3c3/0x420
 xfs_file_dio_aio_write+0x132/0x3c0 [xfs]
 xfs_file_write_iter+0x8b/0xc0 [xfs]
 __vfs_write+0x193/0x1f0
 vfs_write+0xba/0x1c0
 ksys_write+0x52/0xc0
 do_syscall_64+0x50/0x160
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/xfs_reflink.c |   11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

Comments

Brian Foster Nov. 26, 2018, 2:44 p.m. UTC | #1
On Fri, Nov 23, 2018 at 09:54:48AM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> In xfs_reflink_end_cow, we have to swap written extents from the CoW
> fork into the data fork, which can require extensive rmapbt updates.
> The transaction block reservation calculation forgot that part of the
> calculation, which led to a shutdown during an end_cow transaction roll
> during fsx exercises:
> 
> XFS: Assertion failed: tp->t_blk_res >= tp->t_blk_res_used, file: fs/xfs/xfs_trans.c, line: 116
> <machine registers snipped>
> Call Trace:
>  xfs_trans_dup+0x211/0x250 [xfs]
>  xfs_trans_roll+0x6d/0x180 [xfs]
>  xfs_defer_trans_roll+0x10c/0x3b0 [xfs]
>  xfs_defer_finish_noroll+0xdf/0x740 [xfs]
>  xfs_defer_finish+0x13/0x70 [xfs]
>  xfs_reflink_end_cow+0x2c6/0x680 [xfs]
>  xfs_dio_write_end_io+0x115/0x220 [xfs]
>  iomap_dio_complete+0x3f/0x130
>  iomap_dio_rw+0x3c3/0x420
>  xfs_file_dio_aio_write+0x132/0x3c0 [xfs]
>  xfs_file_write_iter+0x8b/0xc0 [xfs]
>  __vfs_write+0x193/0x1f0
>  vfs_write+0xba/0x1c0
>  ksys_write+0x52/0xc0
>  do_syscall_64+0x50/0x160
>  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---

It's a bit interesting that we only seem to use XFS_NRMAPADD_SPACE_RES()
in XFS_SWAP_RMAP_SPACE_RES(), and then the latter (more expectedly) is
only used in the swap extent operation. Any particular reason for that?
IOW, we don't seem to include this res in places where we do extent
allocs and whatnot, which also (defer) rmap updates..

Brian

>  fs/xfs/xfs_reflink.c |   11 ++++++++---
>  1 file changed, 8 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
> index 322a852ce284..c706d7791479 100644
> --- a/fs/xfs/xfs_reflink.c
> +++ b/fs/xfs/xfs_reflink.c
> @@ -663,9 +663,14 @@ xfs_reflink_end_cow(
>  		ASSERT(0);
>  		goto out;
>  	}
> -	resblks = XFS_NEXTENTADD_SPACE_RES(ip->i_mount,
> -			(unsigned int)(end_fsb - offset_fsb),
> -			XFS_DATA_FORK);
> +	if (xfs_sb_version_hasrmapbt(&ip->i_mount->m_sb))
> +		resblks = XFS_SWAP_RMAP_SPACE_RES(ip->i_mount,
> +				(unsigned int)(end_fsb - offset_fsb),
> +				XFS_DATA_FORK);
> +	else
> +		resblks = XFS_NEXTENTADD_SPACE_RES(ip->i_mount,
> +				(unsigned int)(end_fsb - offset_fsb),
> +				XFS_DATA_FORK);
>  	error = xfs_trans_alloc(ip->i_mount, &M_RES(ip->i_mount)->tr_write,
>  			resblks, 0, XFS_TRANS_RESERVE | XFS_TRANS_NOFS, &tp);
>  	if (error)
Darrick J. Wong Nov. 26, 2018, 6:06 p.m. UTC | #2
On Mon, Nov 26, 2018 at 09:44:56AM -0500, Brian Foster wrote:
> On Fri, Nov 23, 2018 at 09:54:48AM -0800, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > In xfs_reflink_end_cow, we have to swap written extents from the CoW
> > fork into the data fork, which can require extensive rmapbt updates.
> > The transaction block reservation calculation forgot that part of the
> > calculation, which led to a shutdown during an end_cow transaction roll
> > during fsx exercises:
> > 
> > XFS: Assertion failed: tp->t_blk_res >= tp->t_blk_res_used, file: fs/xfs/xfs_trans.c, line: 116
> > <machine registers snipped>
> > Call Trace:
> >  xfs_trans_dup+0x211/0x250 [xfs]
> >  xfs_trans_roll+0x6d/0x180 [xfs]
> >  xfs_defer_trans_roll+0x10c/0x3b0 [xfs]
> >  xfs_defer_finish_noroll+0xdf/0x740 [xfs]
> >  xfs_defer_finish+0x13/0x70 [xfs]
> >  xfs_reflink_end_cow+0x2c6/0x680 [xfs]
> >  xfs_dio_write_end_io+0x115/0x220 [xfs]
> >  iomap_dio_complete+0x3f/0x130
> >  iomap_dio_rw+0x3c3/0x420
> >  xfs_file_dio_aio_write+0x132/0x3c0 [xfs]
> >  xfs_file_write_iter+0x8b/0xc0 [xfs]
> >  __vfs_write+0x193/0x1f0
> >  vfs_write+0xba/0x1c0
> >  ksys_write+0x52/0xc0
> >  do_syscall_64+0x50/0x160
> >  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> 
> It's a bit interesting that we only seem to use XFS_NRMAPADD_SPACE_RES()
> in XFS_SWAP_RMAP_SPACE_RES(), and then the latter (more expectedly) is
> only used in the swap extent operation. Any particular reason for that?
> IOW, we don't seem to include this res in places where we do extent
> allocs and whatnot, which also (defer) rmap updates..

<scrubs all the cobwebs out of his brain>

Normally the per-AG reservation is supposed to handle expansions of the
rmap and refcount btrees, so I think this patch isn't correct.

OTOH, looking again at the code, I see...

	offset_fsb = XFS_B_TO_FSBT(ip->i_mount, offset);
	end_fsb = XFS_B_TO_FSB(ip->i_mount, offset + count);

	resblks = XFS_NEXTENTADD_SPACE_RES(ip->i_mount,
			(unsigned int)(end_fsb - offset_fsb),
			XFS_DATA_FORK);

So if, say, blocksize = 4096, offset = 512 and count = 1024, then
offset_fsb = 0 and end_fsb = 2, so this reserves enough blocks for
swapping 2 - 0 blocks, whereas the range covers three different blocks.

Hmm, I guess I'll try that, though the overflow took a while to hit. :)

--D

> Brian
> 
> >  fs/xfs/xfs_reflink.c |   11 ++++++++---
> >  1 file changed, 8 insertions(+), 3 deletions(-)
> > 
> > diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
> > index 322a852ce284..c706d7791479 100644
> > --- a/fs/xfs/xfs_reflink.c
> > +++ b/fs/xfs/xfs_reflink.c
> > @@ -663,9 +663,14 @@ xfs_reflink_end_cow(
> >  		ASSERT(0);
> >  		goto out;
> >  	}
> > -	resblks = XFS_NEXTENTADD_SPACE_RES(ip->i_mount,
> > -			(unsigned int)(end_fsb - offset_fsb),
> > -			XFS_DATA_FORK);
> > +	if (xfs_sb_version_hasrmapbt(&ip->i_mount->m_sb))
> > +		resblks = XFS_SWAP_RMAP_SPACE_RES(ip->i_mount,
> > +				(unsigned int)(end_fsb - offset_fsb),
> > +				XFS_DATA_FORK);
> > +	else
> > +		resblks = XFS_NEXTENTADD_SPACE_RES(ip->i_mount,
> > +				(unsigned int)(end_fsb - offset_fsb),
> > +				XFS_DATA_FORK);
> >  	error = xfs_trans_alloc(ip->i_mount, &M_RES(ip->i_mount)->tr_write,
> >  			resblks, 0, XFS_TRANS_RESERVE | XFS_TRANS_NOFS, &tp);
> >  	if (error)
Brian Foster Nov. 26, 2018, 7:28 p.m. UTC | #3
On Mon, Nov 26, 2018 at 10:06:34AM -0800, Darrick J. Wong wrote:
> On Mon, Nov 26, 2018 at 09:44:56AM -0500, Brian Foster wrote:
> > On Fri, Nov 23, 2018 at 09:54:48AM -0800, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > > 
> > > In xfs_reflink_end_cow, we have to swap written extents from the CoW
> > > fork into the data fork, which can require extensive rmapbt updates.
> > > The transaction block reservation calculation forgot that part of the
> > > calculation, which led to a shutdown during an end_cow transaction roll
> > > during fsx exercises:
> > > 
> > > XFS: Assertion failed: tp->t_blk_res >= tp->t_blk_res_used, file: fs/xfs/xfs_trans.c, line: 116
> > > <machine registers snipped>
> > > Call Trace:
> > >  xfs_trans_dup+0x211/0x250 [xfs]
> > >  xfs_trans_roll+0x6d/0x180 [xfs]
> > >  xfs_defer_trans_roll+0x10c/0x3b0 [xfs]
> > >  xfs_defer_finish_noroll+0xdf/0x740 [xfs]
> > >  xfs_defer_finish+0x13/0x70 [xfs]
> > >  xfs_reflink_end_cow+0x2c6/0x680 [xfs]
> > >  xfs_dio_write_end_io+0x115/0x220 [xfs]
> > >  iomap_dio_complete+0x3f/0x130
> > >  iomap_dio_rw+0x3c3/0x420
> > >  xfs_file_dio_aio_write+0x132/0x3c0 [xfs]
> > >  xfs_file_write_iter+0x8b/0xc0 [xfs]
> > >  __vfs_write+0x193/0x1f0
> > >  vfs_write+0xba/0x1c0
> > >  ksys_write+0x52/0xc0
> > >  do_syscall_64+0x50/0x160
> > >  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > > 
> > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > ---
> > 
> > It's a bit interesting that we only seem to use XFS_NRMAPADD_SPACE_RES()
> > in XFS_SWAP_RMAP_SPACE_RES(), and then the latter (more expectedly) is
> > only used in the swap extent operation. Any particular reason for that?
> > IOW, we don't seem to include this res in places where we do extent
> > allocs and whatnot, which also (defer) rmap updates..
> 
> <scrubs all the cobwebs out of his brain>
> 
> Normally the per-AG reservation is supposed to handle expansions of the
> rmap and refcount btrees, so I think this patch isn't correct.
> 

Ok.

> OTOH, looking again at the code, I see...
> 
> 	offset_fsb = XFS_B_TO_FSBT(ip->i_mount, offset);
> 	end_fsb = XFS_B_TO_FSB(ip->i_mount, offset + count);
> 
> 	resblks = XFS_NEXTENTADD_SPACE_RES(ip->i_mount,
> 			(unsigned int)(end_fsb - offset_fsb),
> 			XFS_DATA_FORK);
> 
> So if, say, blocksize = 4096, offset = 512 and count = 1024, then
> offset_fsb = 0 and end_fsb = 2, so this reserves enough blocks for
> swapping 2 - 0 blocks, whereas the range covers three different blocks.
> 

Isn't this all within a single 4k block? Did you mean a different block
size (though it still looks like 2 blocks even with 1k FSBs)?

Brian

> Hmm, I guess I'll try that, though the overflow took a while to hit. :)
> 
> --D
> 
> > Brian
> > 
> > >  fs/xfs/xfs_reflink.c |   11 ++++++++---
> > >  1 file changed, 8 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
> > > index 322a852ce284..c706d7791479 100644
> > > --- a/fs/xfs/xfs_reflink.c
> > > +++ b/fs/xfs/xfs_reflink.c
> > > @@ -663,9 +663,14 @@ xfs_reflink_end_cow(
> > >  		ASSERT(0);
> > >  		goto out;
> > >  	}
> > > -	resblks = XFS_NEXTENTADD_SPACE_RES(ip->i_mount,
> > > -			(unsigned int)(end_fsb - offset_fsb),
> > > -			XFS_DATA_FORK);
> > > +	if (xfs_sb_version_hasrmapbt(&ip->i_mount->m_sb))
> > > +		resblks = XFS_SWAP_RMAP_SPACE_RES(ip->i_mount,
> > > +				(unsigned int)(end_fsb - offset_fsb),
> > > +				XFS_DATA_FORK);
> > > +	else
> > > +		resblks = XFS_NEXTENTADD_SPACE_RES(ip->i_mount,
> > > +				(unsigned int)(end_fsb - offset_fsb),
> > > +				XFS_DATA_FORK);
> > >  	error = xfs_trans_alloc(ip->i_mount, &M_RES(ip->i_mount)->tr_write,
> > >  			resblks, 0, XFS_TRANS_RESERVE | XFS_TRANS_NOFS, &tp);
> > >  	if (error)
Darrick J. Wong Nov. 26, 2018, 8:26 p.m. UTC | #4
On Mon, Nov 26, 2018 at 02:28:39PM -0500, Brian Foster wrote:
> On Mon, Nov 26, 2018 at 10:06:34AM -0800, Darrick J. Wong wrote:
> > On Mon, Nov 26, 2018 at 09:44:56AM -0500, Brian Foster wrote:
> > > On Fri, Nov 23, 2018 at 09:54:48AM -0800, Darrick J. Wong wrote:
> > > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > > > 
> > > > In xfs_reflink_end_cow, we have to swap written extents from the CoW
> > > > fork into the data fork, which can require extensive rmapbt updates.
> > > > The transaction block reservation calculation forgot that part of the
> > > > calculation, which led to a shutdown during an end_cow transaction roll
> > > > during fsx exercises:
> > > > 
> > > > XFS: Assertion failed: tp->t_blk_res >= tp->t_blk_res_used, file: fs/xfs/xfs_trans.c, line: 116
> > > > <machine registers snipped>
> > > > Call Trace:
> > > >  xfs_trans_dup+0x211/0x250 [xfs]
> > > >  xfs_trans_roll+0x6d/0x180 [xfs]
> > > >  xfs_defer_trans_roll+0x10c/0x3b0 [xfs]
> > > >  xfs_defer_finish_noroll+0xdf/0x740 [xfs]
> > > >  xfs_defer_finish+0x13/0x70 [xfs]
> > > >  xfs_reflink_end_cow+0x2c6/0x680 [xfs]
> > > >  xfs_dio_write_end_io+0x115/0x220 [xfs]
> > > >  iomap_dio_complete+0x3f/0x130
> > > >  iomap_dio_rw+0x3c3/0x420
> > > >  xfs_file_dio_aio_write+0x132/0x3c0 [xfs]
> > > >  xfs_file_write_iter+0x8b/0xc0 [xfs]
> > > >  __vfs_write+0x193/0x1f0
> > > >  vfs_write+0xba/0x1c0
> > > >  ksys_write+0x52/0xc0
> > > >  do_syscall_64+0x50/0x160
> > > >  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > > > 
> > > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > > ---
> > > 
> > > It's a bit interesting that we only seem to use XFS_NRMAPADD_SPACE_RES()
> > > in XFS_SWAP_RMAP_SPACE_RES(), and then the latter (more expectedly) is
> > > only used in the swap extent operation. Any particular reason for that?
> > > IOW, we don't seem to include this res in places where we do extent
> > > allocs and whatnot, which also (defer) rmap updates..
> > 
> > <scrubs all the cobwebs out of his brain>
> > 
> > Normally the per-AG reservation is supposed to handle expansions of the
> > rmap and refcount btrees, so I think this patch isn't correct.
> > 
> 
> Ok.
> 
> > OTOH, looking again at the code, I see...
> > 
> > 	offset_fsb = XFS_B_TO_FSBT(ip->i_mount, offset);
> > 	end_fsb = XFS_B_TO_FSB(ip->i_mount, offset + count);
> > 
> > 	resblks = XFS_NEXTENTADD_SPACE_RES(ip->i_mount,
> > 			(unsigned int)(end_fsb - offset_fsb),
> > 			XFS_DATA_FORK);
> > 
> > So if, say, blocksize = 4096, offset = 512 and count = 1024, then
> > offset_fsb = 0 and end_fsb = 2, so this reserves enough blocks for
> > swapping 2 - 0 blocks, whereas the range covers three different blocks.
> > 
> 
> Isn't this all within a single 4k block? Did you mean a different block
> size (though it still looks like 2 blocks even with 1k FSBs)?

Yeah, I meant 1k blocks, though the same applies if count = 4096.
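
For reference, the rounding discussed in this thread can be checked with a
small standalone sketch. This is not kernel code: b_to_fsbt() and b_to_fsb()
are hypothetical userspace stand-ins that assume XFS_B_TO_FSBT truncates and
XFS_B_TO_FSB rounds up to a filesystem block, and reserved_span() and
touched_blocks() are illustrative helpers, with blocklog = log2(block size).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for XFS_B_TO_FSBT: bytes -> block number, rounded down. */
static inline uint64_t b_to_fsbt(uint64_t bytes, unsigned int blocklog)
{
	return bytes >> blocklog;
}

/* Hypothetical stand-in for XFS_B_TO_FSB: bytes -> block number, rounded up. */
static inline uint64_t b_to_fsb(uint64_t bytes, unsigned int blocklog)
{
	uint64_t blockmask = ((uint64_t)1 << blocklog) - 1;

	return (bytes + blockmask) >> blocklog;
}

/* Block count passed to the reservation macro: end_fsb - offset_fsb. */
static inline uint64_t reserved_span(uint64_t offset, uint64_t count,
				     unsigned int blocklog)
{
	uint64_t offset_fsb = b_to_fsbt(offset, blocklog);
	uint64_t end_fsb = b_to_fsb(offset + count, blocklog);

	return end_fsb - offset_fsb;
}

/* Distinct blocks touched by the byte range [offset, offset + count). */
static inline uint64_t touched_blocks(uint64_t offset, uint64_t count,
				      unsigned int blocklog)
{
	assert(count > 0);	/* an empty range touches no blocks */
	return ((offset + count - 1) >> blocklog) - (offset >> blocklog) + 1;
}
```

Under these assumptions, offset = 512 and count = 1024 with 1k blocks
(blocklog = 10) give reserved_span() == 2 and touched_blocks() == 2; with
count = 4096 both come out to 5, and with 4k blocks (blocklog = 12) and
count = 1024 both are 1.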

--D

> Brian
> 
> > Hmm, I guess I'll try that, though the overflow took a while to hit. :)
> > 
> > --D
> > 
> > > Brian
> > > 
> > > >  fs/xfs/xfs_reflink.c |   11 ++++++++---
> > > >  1 file changed, 8 insertions(+), 3 deletions(-)
> > > > 
> > > > diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
> > > > index 322a852ce284..c706d7791479 100644
> > > > --- a/fs/xfs/xfs_reflink.c
> > > > +++ b/fs/xfs/xfs_reflink.c
> > > > @@ -663,9 +663,14 @@ xfs_reflink_end_cow(
> > > >  		ASSERT(0);
> > > >  		goto out;
> > > >  	}
> > > > -	resblks = XFS_NEXTENTADD_SPACE_RES(ip->i_mount,
> > > > -			(unsigned int)(end_fsb - offset_fsb),
> > > > -			XFS_DATA_FORK);
> > > > +	if (xfs_sb_version_hasrmapbt(&ip->i_mount->m_sb))
> > > > +		resblks = XFS_SWAP_RMAP_SPACE_RES(ip->i_mount,
> > > > +				(unsigned int)(end_fsb - offset_fsb),
> > > > +				XFS_DATA_FORK);
> > > > +	else
> > > > +		resblks = XFS_NEXTENTADD_SPACE_RES(ip->i_mount,
> > > > +				(unsigned int)(end_fsb - offset_fsb),
> > > > +				XFS_DATA_FORK);
> > > >  	error = xfs_trans_alloc(ip->i_mount, &M_RES(ip->i_mount)->tr_write,
> > > >  			resblks, 0, XFS_TRANS_RESERVE | XFS_TRANS_NOFS, &tp);
> > > >  	if (error)

Patch

diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index 322a852ce284..c706d7791479 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -663,9 +663,14 @@  xfs_reflink_end_cow(
 		ASSERT(0);
 		goto out;
 	}
-	resblks = XFS_NEXTENTADD_SPACE_RES(ip->i_mount,
-			(unsigned int)(end_fsb - offset_fsb),
-			XFS_DATA_FORK);
+	if (xfs_sb_version_hasrmapbt(&ip->i_mount->m_sb))
+		resblks = XFS_SWAP_RMAP_SPACE_RES(ip->i_mount,
+				(unsigned int)(end_fsb - offset_fsb),
+				XFS_DATA_FORK);
+	else
+		resblks = XFS_NEXTENTADD_SPACE_RES(ip->i_mount,
+				(unsigned int)(end_fsb - offset_fsb),
+				XFS_DATA_FORK);
 	error = xfs_trans_alloc(ip->i_mount, &M_RES(ip->i_mount)->tr_write,
 			resblks, 0, XFS_TRANS_RESERVE | XFS_TRANS_NOFS, &tp);
 	if (error)