diff mbox series

[03/13] xfs: free RT extents after updating the bmap btree

Message ID 20240327110318.2776850-4-hch@lst.de (mailing list archive)
State Superseded
Headers show
Series [01/13] xfs: make XFS_TRANS_LOWMODE match the other XFS_TRANS_ definitions | expand

Commit Message

Christoph Hellwig March 27, 2024, 11:03 a.m. UTC
Currently xfs_bmap_del_extent_real frees RT extents before updating
the bmap btree, while it frees regular blocks after performing the bmap
btree update.  While this behavior goes back to the original commit,
I can't find any good reason for handling RT extent vs regular block
freeing differently.  We use the same transaction, and unless rmaps
or reflink are enabled (which currently aren't support for RT inodes)
there are no transactions rolls or deferred ops that can rely on this
ordering.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/libxfs/xfs_bmap.c | 26 +++++++++-----------------
 1 file changed, 9 insertions(+), 17 deletions(-)

Comments

Darrick J. Wong March 27, 2024, 2:55 p.m. UTC | #1
On Wed, Mar 27, 2024 at 12:03:08PM +0100, Christoph Hellwig wrote:
> Currently xfs_bmap_del_extent_real frees RT extents before updating
> the bmap btree, while it frees regular blocks after performing the bmap
> btree update.  While this behavior goes back to the original commit,
> I can't find any good reason for handling RT extent vs regular block
> freeing differently.  We use the same transaction, and unless rmaps
> or reflink are enabled (which currently aren't support for RT inodes)
> there are no transactions rolls or deferred ops that can rely on this
> ordering.

...and the realtime rmap/reflink patchsets will want to reuse the data
device's ordering (bmap -> rmap -> refcount -> efi) for the rt volume.

> Signed-off-by: Christoph Hellwig <hch@lst.de>

I'm ok with moving this now since I'm mostly going to pave over it later
anyway :)

Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  fs/xfs/libxfs/xfs_bmap.c | 26 +++++++++-----------------
>  1 file changed, 9 insertions(+), 17 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> index 09d4b730ee9709..282b44deb9f864 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -5107,8 +5107,7 @@ xfs_bmap_del_extent_real(
>  {
>  	xfs_fsblock_t		del_endblock=0;	/* first block past del */
>  	xfs_fileoff_t		del_endoff;	/* first offset past del */
> -	int			do_fx;	/* free extent at end of routine */
> -	int			error;	/* error return value */
> +	int			error = 0;	/* error return value */
>  	struct xfs_bmbt_irec	got;	/* current extent entry */
>  	xfs_fileoff_t		got_endoff;	/* first offset past got */
>  	int			i;	/* temp state */
> @@ -5151,20 +5150,10 @@ xfs_bmap_del_extent_real(
>  		return -ENOSPC;
>  
>  	*logflagsp = XFS_ILOG_CORE;
> -	if (xfs_ifork_is_realtime(ip, whichfork)) {
> -		if (!(bflags & XFS_BMAPI_REMAP)) {
> -			error = xfs_rtfree_blocks(tp, del->br_startblock,
> -					del->br_blockcount);
> -			if (error)
> -				return error;
> -		}
> -
> -		do_fx = 0;
> +	if (xfs_ifork_is_realtime(ip, whichfork))
>  		qfield = XFS_TRANS_DQ_RTBCOUNT;
> -	} else {
> -		do_fx = 1;
> +	else
>  		qfield = XFS_TRANS_DQ_BCOUNT;
> -	}
>  	nblks = del->br_blockcount;
>  
>  	del_endblock = del->br_startblock + del->br_blockcount;
> @@ -5312,18 +5301,21 @@ xfs_bmap_del_extent_real(
>  	/*
>  	 * If we need to, add to list of extents to delete.
>  	 */
> -	if (do_fx && !(bflags & XFS_BMAPI_REMAP)) {
> +	if (!(bflags & XFS_BMAPI_REMAP)) {
>  		if (xfs_is_reflink_inode(ip) && whichfork == XFS_DATA_FORK) {
>  			xfs_refcount_decrease_extent(tp, del);
> +		} else if (xfs_ifork_is_realtime(ip, whichfork)) {
> +			error = xfs_rtfree_blocks(tp, del->br_startblock,
> +					del->br_blockcount);
>  		} else {
>  			error = xfs_free_extent_later(tp, del->br_startblock,
>  					del->br_blockcount, NULL,
>  					XFS_AG_RESV_NONE,
>  					((bflags & XFS_BMAPI_NODISCARD) ||
>  					del->br_state == XFS_EXT_UNWRITTEN));
> -			if (error)
> -				return error;
>  		}
> +		if (error)
> +			return error;
>  	}
>  
>  	/*
> -- 
> 2.39.2
> 
>
Christoph Hellwig March 27, 2024, 5:03 p.m. UTC | #2
On Wed, Mar 27, 2024 at 07:55:40AM -0700, Darrick J. Wong wrote:
> ...and the realtime rmap/reflink patchsets will want to reuse the data
> device's ordering (bmap -> rmap -> refcount -> efi) for the rt volume.

Yes.

> 
> > Signed-off-by: Christoph Hellwig <hch@lst.de>
> 
> I'm ok with moving this now since I'm mostly going to pave over it later
> anyway :)

Actually the rt extent free item and refcount work will fit in very
nicely with this.  Basically the rt branch added here needs another
condition so that's it's skipped when using extent free items for
RT.  And the extent free item branch needs to pass the RT flag
as already done in your series of course.
Dave Chinner March 28, 2024, 4:13 a.m. UTC | #3
On Wed, Mar 27, 2024 at 12:03:08PM +0100, Christoph Hellwig wrote:
> Currently xfs_bmap_del_extent_real frees RT extents before updating
> the bmap btree, while it frees regular blocks after performing the bmap
> btree update.  While this behavior goes back to the original commit,
> I can't find any good reason for handling RT extent vs regular block
> freeing differently. 

It's to do with free space btree manipulations and ENOSPC.  The
truncate for data device extents was originally a two-phase
operation. First it removed the bmapbt record, but because this can
free BMBT extents, it can use up all the free space tree reservation
space. So the transaction gets rolled to commit the BMBT change and
the xfs_bmap_finish() call that frees the data extent runs with a
new transaction reservation that allows different free space btrees
to be logged without overrun.

However, on crash, this could lose the free space because there was
nothing to tell recovery about the extents removed from the BMBT,
hence EFIs were introduced. They tie the extent free operation to the
bmapbt record removal commit for recovery of the second phase of the
extent removal process.

Then RT extents came along. RT extent freeing does not require a
free space btree reservation because the free space metadata is
static and transaction size is bound. Hence we don't need to care if
the BMBT record removal modifies the per-ag free space trees and we
don't need a two-phase extent remove transaction. The only thing we
have to care about is not losing space on crash.

Hence instead of recording the extent for freeing in the bmap list
for xfs_bmap_finish() to process in a new transaction, it simply
freed the rtextent directly. So the original code (from 1994) simply
replaced the "free AG extent later" queueing with a direct free:

@@ -920,7 +937,10 @@ xfs_bmap_del_extent(
               (got.br_startblock == NULLSTARTBLOCK));
        delay = got.br_startblock == NULLSTARTBLOCK;
        if (!delay) {
-               xfs_bmap_add_free(del->br_startblock, del->br_blockcount, flist);
+               if (ip->i_d.di_flags & XFS_DIFLAG_REALTIME)
+                       xfs_rtfree_extent(ip->i_mount, ip->i_transp, del->br_startblock, del->br_blockcount);
+               else
+                       xfs_bmap_add_free(del->br_startblock, del->br_blockcount, flist);
                del_endblock = del->br_startblock + del->br_blockcount;
                if (cur)
                        xfs_bmbt_lookup_eq(cur, got.br_startoff, got.br_startblock, got.br_blockcount);

This code was originally at the start of xfs_dmap_del_extent(), but
the xfs_bmap_add_free() got moved to the end of the function via the
"do_fx" flag (the current code logic) in 1997 because:

commit c4fac74eaa580edcc6b65e977151d73f2b6e9aa5
Author: Doug Doucette <doucette@engr.sgi.com>
Date:   Wed Jul 9 17:34:17 1997 +0000

    Fix xfs_bmap_del_extent, so it can back out of the case where an
    extent is being split, and the space allocation fails.

There was a shutdown occurring because of a case where splitting the
extent record failed because the BMBT split and the filesystem
didn't have enough space for the split to be done. (FWIW, I'm not
sure this can happen anymore.)

The commit backed out the BMBT change on ENOSPC error, and in doing
so I think this actually breaks RT free space tracking. However, it
then returns an ENOSPC error, and we have a dirty transaction in the
RT case so this will shut down the filesysetm when the transaction
is cancelled. Hence the corrupted "bmbt now points at freed rt dev
space" condition never make it to disk, but it's still the wrong way
to handle the issue.

IOWs, this proposed change fixes that "shutdown at ENOSPC on rt
devices" situation that was introduced by the above commit back in
1997.

Nice!

> We use the same transaction, and unless rmaps
> or reflink are enabled (which currently aren't support for RT inodes)
> there are no transactions rolls or deferred ops that can rely on this
> ordering.

Yup, I see no problem there.

Reviewed-by: Dave Chinner <dchinner@redhat.com>
Christoph Hellwig March 29, 2024, 4:14 a.m. UTC | #4
FYI, I've picked up your comments with minor edits for the commit
message, let me know if that is ok with you:

http://git.infradead.org/?p=users/hch/xfs.git;a=commitdiff;h=445d786ae6e50f6631118e0fa378ad0cd72076a9
Dave Chinner March 30, 2024, 9:55 p.m. UTC | #5
On Fri, Mar 29, 2024 at 05:14:26AM +0100, Christoph Hellwig wrote:
> FYI, I've picked up your comments with minor edits for the commit
> message, let me know if that is ok with you:
> 
> http://git.infradead.org/?p=users/hch/xfs.git;a=commitdiff;h=445d786ae6e50f6631118e0fa378ad0cd72076a9

Yes, that's fine. It certainly can't hurt having some of the
historic context of the code in new commits as it avoids the need
for archive spelunking to make sense of the change...

-Dave.
diff mbox series

Patch

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 09d4b730ee9709..282b44deb9f864 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -5107,8 +5107,7 @@  xfs_bmap_del_extent_real(
 {
 	xfs_fsblock_t		del_endblock=0;	/* first block past del */
 	xfs_fileoff_t		del_endoff;	/* first offset past del */
-	int			do_fx;	/* free extent at end of routine */
-	int			error;	/* error return value */
+	int			error = 0;	/* error return value */
 	struct xfs_bmbt_irec	got;	/* current extent entry */
 	xfs_fileoff_t		got_endoff;	/* first offset past got */
 	int			i;	/* temp state */
@@ -5151,20 +5150,10 @@  xfs_bmap_del_extent_real(
 		return -ENOSPC;
 
 	*logflagsp = XFS_ILOG_CORE;
-	if (xfs_ifork_is_realtime(ip, whichfork)) {
-		if (!(bflags & XFS_BMAPI_REMAP)) {
-			error = xfs_rtfree_blocks(tp, del->br_startblock,
-					del->br_blockcount);
-			if (error)
-				return error;
-		}
-
-		do_fx = 0;
+	if (xfs_ifork_is_realtime(ip, whichfork))
 		qfield = XFS_TRANS_DQ_RTBCOUNT;
-	} else {
-		do_fx = 1;
+	else
 		qfield = XFS_TRANS_DQ_BCOUNT;
-	}
 	nblks = del->br_blockcount;
 
 	del_endblock = del->br_startblock + del->br_blockcount;
@@ -5312,18 +5301,21 @@  xfs_bmap_del_extent_real(
 	/*
 	 * If we need to, add to list of extents to delete.
 	 */
-	if (do_fx && !(bflags & XFS_BMAPI_REMAP)) {
+	if (!(bflags & XFS_BMAPI_REMAP)) {
 		if (xfs_is_reflink_inode(ip) && whichfork == XFS_DATA_FORK) {
 			xfs_refcount_decrease_extent(tp, del);
+		} else if (xfs_ifork_is_realtime(ip, whichfork)) {
+			error = xfs_rtfree_blocks(tp, del->br_startblock,
+					del->br_blockcount);
 		} else {
 			error = xfs_free_extent_later(tp, del->br_startblock,
 					del->br_blockcount, NULL,
 					XFS_AG_RESV_NONE,
 					((bflags & XFS_BMAPI_NODISCARD) ||
 					del->br_state == XFS_EXT_UNWRITTEN));
-			if (error)
-				return error;
 		}
+		if (error)
+			return error;
 	}
 
 	/*