diff mbox series

[2/9] xfs: contiguous EOF allocation across AGs

Message ID 20231004001943.349265-3-david@fromorbit.com (mailing list archive)
State New, archived
Headers show
Series xfs: push perags further into allocation routines | expand

Commit Message

Dave Chinner Oct. 4, 2023, 12:19 a.m. UTC
From: Dave Chinner <dchinner@redhat.com>

Currently when we allocate at EOF, we set the initial target to the
location of the inode. Then we call xfs_bmap_adjacent(), which sees
that we are doing an EOF extension, so it moves the target to the
last block of the previous extent. This may be in a different AG to
the inode.

When we then go to select the AG with the best free length, the AG
at the target block might not have sufficient free space for the
full allocation, so we may select a different AG. We then do an
exact BNO allocation with the original target (EOF block), which
reverts back to attempting an allocation in an AG that doesn't have
sufficient contiguous free space available.

This generally leads to allocation failure, and then we fall back to
scanning the AGs for one that the allocation will succeed in. This
scan also results in contended AGS being skipped, so we have no idea
what AG we are going to end up allocating in. For sequential writes,
this results in random extents being located in random places in
non-target AGs.

We want to guarantee that we can allocate in the AG that we have
selected as having the "best contiguous free space" efficiently,
so if we select a different AG, we should move the allocation target
and skip the exact EOF allocation as we know it will not succeed.
i.e. we should start with aligned allocation immediately, knowing it
will likely succeed.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/libxfs/xfs_bmap.c | 32 ++++++++++++++++++++++++++------
 1 file changed, 26 insertions(+), 6 deletions(-)

Comments

Christoph Hellwig Oct. 5, 2023, 9:43 a.m. UTC | #1
Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong Oct. 6, 2023, 5:27 a.m. UTC | #2
On Wed, Oct 04, 2023 at 11:19:36AM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Currently when we allocate at EOF, we set the initial target to the
> location of the inode. Then we call xfs_bmap_adjacent(), which sees
> that we are doing an EOF extension, so it moves the target to the
> last block of the previous extent. This may be in a different AG to
> the inode.
> 
> When we then go to select the AG with the best free length, the AG
> at the target block might not have sufficient free space for the
> full allocation, so we may select a different AG. We then do an
> exact BNO allocation with the original target (EOF block), which
> reverts back to attempting an allocation in an AG that doesn't have
> sufficient contiguous free space available.
> 
> This generally leads to allocation failure, and then we fall back to
> scanning the AGs for one that the allocation will succeed in. This
> scan also results in contended AGS being skipped, so we have no idea
> what AG we are going to end up allocating in. For sequential writes,
> this results in random extents being located in random places in
> non-target AGs.
> 
> We want to guarantee that we can allocate in the AG that we have
> selected as having the "best contiguous free space" efficiently,
> so if we select a different AG, we should move the allocation target
> and skip the exact EOF allocation as we know it will not succeed.
> i.e. we should start with aligned allocation immediately, knowing it
> will likely succeed.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>

Sounds good to me.
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  fs/xfs/libxfs/xfs_bmap.c | 32 ++++++++++++++++++++++++++------
>  1 file changed, 26 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> index e14671414afb..e64ba7e2d13d 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -3252,8 +3252,18 @@ xfs_bmap_btalloc_select_lengths(
>  		if (error && error != -EAGAIN)
>  			break;
>  		error = 0;
> -		if (*blen >= args->maxlen)
> +		if (*blen >= args->maxlen) {
> +			/*
> +			 * We are going to target a different AG than the
> +			 * incoming target, so we need to reset the target and
> +			 * skip exact EOF allocation attempts.
> +			 */
> +			if (agno != startag) {
> +				ap->blkno = XFS_AGB_TO_FSB(mp, agno, 0);
> +				ap->aeof = false;
> +			}
>  			break;
> +		}
>  	}
>  	if (pag)
>  		xfs_perag_rele(pag);
> @@ -3514,10 +3524,10 @@ xfs_bmap_btalloc_aligned(
>  	int			error;
>  
>  	/*
> -	 * If we failed an exact EOF allocation already, stripe
> -	 * alignment will have already been taken into account in
> -	 * args->minlen. Hence we only adjust minlen to try to preserve
> -	 * alignment if no slop has been reserved for alignment
> +	 * If we failed an exact EOF allocation already, stripe alignment will
> +	 * have already been taken into account in args->minlen. Hence we only
> +	 * adjust minlen to try to preserve alignment if no slop has been
> +	 * reserved for alignment
>  	 */
>  	if (args->minalignslop == 0) {
>  		if (blen > stripe_align &&
> @@ -3653,6 +3663,16 @@ xfs_bmap_btalloc_best_length(
>  	ap->blkno = XFS_INO_TO_FSB(args->mp, ap->ip->i_ino);
>  	xfs_bmap_adjacent(ap);
>  
> +	/*
> +	 * We only use stripe alignment for EOF allocations. Hence if it isn't
> +	 * an EOF allocation, clear the stripe alignment. This allows us to
> +	 * skip exact block EOF allocation yet still do stripe aligned
> +	 * allocation if we select a different AG to the
> +	 * exact target block due to a lack of contiguous free space.
> +	 */
> +	if (!ap->aeof)
> +		stripe_align = 0;
> +
>  	/*
>  	 * Search for an allocation group with a single extent large enough for
>  	 * the request.  If one isn't found, then adjust the minimum allocation
> @@ -3675,7 +3695,7 @@ xfs_bmap_btalloc_best_length(
>  	}
>  
>  	/* attempt aligned allocation for new EOF extents */
> -	if (ap->aeof)
> +	if (stripe_align)
>  		error = xfs_bmap_btalloc_aligned(ap, args, blen, stripe_align,
>  				false);
>  	if (error || args->fsbno != NULLFSBLOCK)
> -- 
> 2.40.1
>
diff mbox series

Patch

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index e14671414afb..e64ba7e2d13d 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -3252,8 +3252,18 @@  xfs_bmap_btalloc_select_lengths(
 		if (error && error != -EAGAIN)
 			break;
 		error = 0;
-		if (*blen >= args->maxlen)
+		if (*blen >= args->maxlen) {
+			/*
+			 * We are going to target a different AG than the
+			 * incoming target, so we need to reset the target and
+			 * skip exact EOF allocation attempts.
+			 */
+			if (agno != startag) {
+				ap->blkno = XFS_AGB_TO_FSB(mp, agno, 0);
+				ap->aeof = false;
+			}
 			break;
+		}
 	}
 	if (pag)
 		xfs_perag_rele(pag);
@@ -3514,10 +3524,10 @@  xfs_bmap_btalloc_aligned(
 	int			error;
 
 	/*
-	 * If we failed an exact EOF allocation already, stripe
-	 * alignment will have already been taken into account in
-	 * args->minlen. Hence we only adjust minlen to try to preserve
-	 * alignment if no slop has been reserved for alignment
+	 * If we failed an exact EOF allocation already, stripe alignment will
+	 * have already been taken into account in args->minlen. Hence we only
+	 * adjust minlen to try to preserve alignment if no slop has been
+	 * reserved for alignment
 	 */
 	if (args->minalignslop == 0) {
 		if (blen > stripe_align &&
@@ -3653,6 +3663,16 @@  xfs_bmap_btalloc_best_length(
 	ap->blkno = XFS_INO_TO_FSB(args->mp, ap->ip->i_ino);
 	xfs_bmap_adjacent(ap);
 
+	/*
+	 * We only use stripe alignment for EOF allocations. Hence if it isn't
+	 * an EOF allocation, clear the stripe alignment. This allows us to
+	 * skip exact block EOF allocation yet still do stripe aligned
+	 * allocation if we select a different AG to the
+	 * exact target block due to a lack of contiguous free space.
+	 */
+	if (!ap->aeof)
+		stripe_align = 0;
+
 	/*
 	 * Search for an allocation group with a single extent large enough for
 	 * the request.  If one isn't found, then adjust the minimum allocation
@@ -3675,7 +3695,7 @@  xfs_bmap_btalloc_best_length(
 	}
 
 	/* attempt aligned allocation for new EOF extents */
-	if (ap->aeof)
+	if (stripe_align)
 		error = xfs_bmap_btalloc_aligned(ap, args, blen, stripe_align,
 				false);
 	if (error || args->fsbno != NULLFSBLOCK)