diff mbox series

[1/3] xfs: simplify extent allocation alignment

Message ID 20240306053048.1656747-2-david@fromorbit.com (mailing list archive)
State Superseded
Headers show
Series xfs: forced extent alignment | expand

Commit Message

Dave Chinner March 6, 2024, 5:20 a.m. UTC
From: Dave Chinner <dchinner@redhat.com>

We currently align extent allocation to stripe unit or stripe width.
That is specified by an external parameter to the allocation code,
which then manipulates the xfs_alloc_args alignment configuration in
interesting ways.

The args->alignment field specifies extent start alignment, but
because we may be attempting non-aligned allocation first there are
also slop variables that allow for those allocation attempts to
account for aligned allocation if they fail.

This gets much more complex as we introduce forced allocation
alignment, where extent size hints are used to generate the extent
start alignment. extent size hints currently only affect extent
lengths (via args->prod and args->mod) and so with this change we
will have two different start alignment conditions.

Avoid this complexity by always using args->alignment to indicate
extent start alignment, and always using args->prod/mod to indicate
extent length adjustment.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/libxfs/xfs_alloc.c |  2 +-
 fs/xfs/libxfs/xfs_alloc.h |  2 +-
 fs/xfs/libxfs/xfs_bmap.c  | 96 +++++++++++++++++----------------------
 3 files changed, 44 insertions(+), 56 deletions(-)

Comments

John Garry March 13, 2024, 11:03 a.m. UTC | #1
On 06/03/2024 05:20, Dave Chinner wrote:
>   		return false;
> diff --git a/fs/xfs/libxfs/xfs_alloc.h b/fs/xfs/libxfs/xfs_alloc.h
> index 0b956f8b9d5a..aa2c103d98f0 100644
> --- a/fs/xfs/libxfs/xfs_alloc.h
> +++ b/fs/xfs/libxfs/xfs_alloc.h
> @@ -46,7 +46,7 @@ typedef struct xfs_alloc_arg {
>   	xfs_extlen_t	minleft;	/* min blocks must be left after us */
>   	xfs_extlen_t	total;		/* total blocks needed in xaction */
>   	xfs_extlen_t	alignment;	/* align answer to multiple of this */
> -	xfs_extlen_t	minalignslop;	/* slop for minlen+alignment calcs */
> +	xfs_extlen_t	alignslop;	/* slop for alignment calcs */
>   	xfs_agblock_t	min_agbno;	/* set an agbno range for NEAR allocs */
>   	xfs_agblock_t	max_agbno;	/* ... */
>   	xfs_extlen_t	len;		/* output: actual size of extent */
> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> index 656c95a22f2e..d56c82c07505 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -3295,6 +3295,10 @@ xfs_bmap_select_minlen(
>   	xfs_extlen_t		blen)

Hi Dave,

>   {
>   
> +	/* Adjust best length for extent start alignment. */
> +	if (blen > args->alignment)
> +		blen -= args->alignment;
> +

This change seems to be causing or exposing some issue, in that I find 
that I am being allocated an extent which is aligned to but not a 
multiple of args->alignment.

For my test, I have forcealign=16KB and initially I write 1756 * 4096 = 
7192576B to the file, so I have this:

  EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET        TOTAL
    0: [0..14079]:      42432..56511      0 (42432..56511)   14080

That is 1760 FSBs for extent #0.

Then I write 340992B from offset 7195648, and I find this:
  EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET        TOTAL
    0: [0..14079]:      42432..56511      0 (42432..56511)   14080
    1: [14080..14711]:  177344..177975    0 (177344..177975)   632
    2: [14712..14719]:  350720..350727    1 (171520..171527)     8

extent #1 is 79 FSBs, which is not a multiple of 16KB.

In this case, in xfs_bmap_select_minlen() I find initially blen=80 
args->alignment=4, ->minlen=0, ->maxlen=80. Subsequently blen is reduced 
to 76 and args->minlen is set to 76, and then 
xfs_bmap_btalloc_best_length() -> xfs_alloc_vextent_start_ag() happens 
to find an extent of length 79.

Removing the specific change to modify blen seems to make things ok again.

I will also note something strange which could be the issue, that being 
that xfs_alloc_fix_len() does not fix this up - I thought that was its 
job. Firstly, in this same scenario, in xfs_alloc_space_available() we 
calculate alloc_len = args->minlen + (args->alignment - 1) + 
args->alignslop = 76 + (4 - 1) + 0 = 79, and then args->maxlen = 79. 
Then xfs_alloc_fix_len() allows this as args->len == args->maxlen (=79), 
even though args->prod, mod = 4, 0. To me, that (args->alignment - 1) 
component in calculating alloc_len is odd. I assume it is done as 
default args->alignment == 1.

Anyway, let me know what you think.

Cheers,
John

>   	/*
>   	 * Since we used XFS_ALLOC_FLAG_TRYLOCK in _longest_free_extent(), it is
>   	 * possible that there is enough contiguous free space for this request.
> @@ -3310,6 +3314,7 @@ xfs_bmap_select_minlen(
>   	if (blen < args->maxlen)
>   		return blen;
>   	return args->maxlen;
> +
Dave Chinner March 20, 2024, 4:35 a.m. UTC | #2
On Wed, Mar 13, 2024 at 11:03:18AM +0000, John Garry wrote:
> On 06/03/2024 05:20, Dave Chinner wrote:
> >   		return false;
> > diff --git a/fs/xfs/libxfs/xfs_alloc.h b/fs/xfs/libxfs/xfs_alloc.h
> > index 0b956f8b9d5a..aa2c103d98f0 100644
> > --- a/fs/xfs/libxfs/xfs_alloc.h
> > +++ b/fs/xfs/libxfs/xfs_alloc.h
> > @@ -46,7 +46,7 @@ typedef struct xfs_alloc_arg {
> >   	xfs_extlen_t	minleft;	/* min blocks must be left after us */
> >   	xfs_extlen_t	total;		/* total blocks needed in xaction */
> >   	xfs_extlen_t	alignment;	/* align answer to multiple of this */
> > -	xfs_extlen_t	minalignslop;	/* slop for minlen+alignment calcs */
> > +	xfs_extlen_t	alignslop;	/* slop for alignment calcs */
> >   	xfs_agblock_t	min_agbno;	/* set an agbno range for NEAR allocs */
> >   	xfs_agblock_t	max_agbno;	/* ... */
> >   	xfs_extlen_t	len;		/* output: actual size of extent */
> > diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> > index 656c95a22f2e..d56c82c07505 100644
> > --- a/fs/xfs/libxfs/xfs_bmap.c
> > +++ b/fs/xfs/libxfs/xfs_bmap.c
> > @@ -3295,6 +3295,10 @@ xfs_bmap_select_minlen(
> >   	xfs_extlen_t		blen)
> 
> Hi Dave,
> 
> >   {
> > +	/* Adjust best length for extent start alignment. */
> > +	if (blen > args->alignment)
> > +		blen -= args->alignment;
> > +
> 
> This change seems to be causing or exposing some issue, in that I find that
> I am being allocated an extent which is aligned to but not a multiple of
> args->alignment.

Entirely possible the logic isn't correct ;)

IIRC, I added this because I thought that blen ends up influencing
args->maxlen and nothing else. The alignment isn't taken out of
"blen", it's supposed to be added to args->maxlen.

> For my test, I have forcealign=16KB and initially I write 1756 * 4096 =
> 7192576B to the file, so I have this:
> 
>  EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET        TOTAL
>    0: [0..14079]:      42432..56511      0 (42432..56511)   14080
> 
> That is 1760 FSBs for extent #0.
> 
> Then I write 340992B from offset 7195648, and I find this:
>  EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET        TOTAL
>    0: [0..14079]:      42432..56511      0 (42432..56511)   14080
>    1: [14080..14711]:  177344..177975    0 (177344..177975)   632
>    2: [14712..14719]:  350720..350727    1 (171520..171527)     8
> 
> extent #1 is 79 FSBs, which is not a multiple of 16KB.

> In this case, in xfs_bmap_select_minlen() I find initially blen=80

Ah, so you've hit the corner case of the largest free space being
exactly 80 in length and args->maxlen = 80.

> args->alignment=4, ->minlen=0, ->maxlen=80. Subsequently blen is reduced to
> 76 and args->minlen is set to 76, and then xfs_bmap_btalloc_best_length() ->
> xfs_alloc_vextent_start_ag() happens to find an extent of length 79.

So there's nothing wrong here. We've asked for any extent that
is between 76 and 80 blocks in length to be allocated, and we found
one that is 79 blocks in length.

Finding a 79 block extent instead of an 80 block extent like blen
indicated means that there was either:

- a block moved to the AGFL from that 80 block free extent prior to
  doing the free extent search.
- a block was busy and so was trimmed out via
  xfs_alloc_compute_aligned()
- the front edge wasn't aligned so it took a block off the front of
  the free space to align it. It's this condition that code I added
  above takes into account - an exact match on size does not imply
  that aligned allocation of exactly that size can be done.

Given the front edge is aligned, I'd say it was the latter that
occurred. The question is this: why wasn't the tail edge aligned
down to make the extent length 76 blocks?

> Removing the specific change to modify blen seems to make things ok again.

Right, because now the allocation ends up being set up with
args->minlen = args->maxlen = 80 and the allocation is unable to
align the extent and meet the args->minlen requirement from that
same unaligned 80 block free space. Hence that allocation fails and
we fall back to a different allocation strategy that searches other
AGs for a matching aligned allocation.

IOWs, removing the 'blen -= args->alignment' code simply kicks the
problem down the road until all AGs run out of 80 block contiguous
extents.

This really smells like a tail alignment bug, not a problem with the
allocation setup. Returning an extent that is 76 blocks in length
fulfils the 4 block alignment requirement, so why did tail alignment
fail?

> I will also note something strange which could be the issue, that being that
> xfs_alloc_fix_len() does not fix this up - I thought that was its job.

Yes, it should fix this up.

> Firstly, in this same scenario, in xfs_alloc_space_available() we calculate
> alloc_len = args->minlen + (args->alignment - 1) + args->alignslop = 76 + (4
> - 1) + 0 = 79, and then args->maxlen = 79.

That seems OK, we're doing aligned allocation and this is an ENOSPC
corner case so the aligned allocation should get rounded down in
xfs_alloc_fix_len() or rejected.

One thought I just had is that the args->maxlen adjustment shouldn't
be to "available space" - it should probably be set to args->minlen
because that's the aligned 'alloc_len' we checked available space
against. That would fix this, because then we'd have args->minlen =
args->maxlen = 76.

However, that only addresses this specific case, not the general
case of xfs_alloc_fix_len() failing to tail align the allocated
extent.

> Then xfs_alloc_fix_len() allows
> this as args->len == args->maxlen (=79), even though args->prod, mod = 4, 0.

Yeah, that smells wrong.

I'd suggest that we've never noticed this until now because we
have never guaranteed extent alignment. Hence the occasional
short/unaligned extent being allocated in dark ENOSPC corners was
never an issue for anyone.

However, introducing a new alignment guarantee turns these sorts of
latent non-issues into bugs that need to be fixed. i.e. This is
exactly the sort of rare corner case behaviour I expected to be
flushed out by guaranteeing and then extensively testing allocation
alignments.

If you drop the rlen == args->maxlen check from
xfs_alloc_space_available(), the problem should go away and the
extent gets trimmed to 76 blocks. This shouldn't affect anything
else because maxlen allocations should already be properly aligned -
it's only when something like ENOSPC causes args->maxlen to be
modified to an unaligned value that this issue arises.

In the end, I suspect we'll want to make both changes....

> To me, that (args->alignment - 1) component in calculating alloc_len is odd.
> I assume it is done as default args->alignment == 1.

No, it's done because guaranteeing aligned allocation requires
selecting an aligned region from an unaligned free space. i.e.  when
alignment is 4, then we can need up to 3 additional blocks to
guarantee front alignment for a given length extent.
i.e. we have to over-allocate to guarantee we can trim up
to alignment at the front edge and still guarantee that the extent
is as long as required by args->minlen/maxlen.

Cheers,

Dave.
John Garry March 26, 2024, 4:08 p.m. UTC | #3
On 20/03/2024 04:35, Dave Chinner wrote:

For some reason I never received this mail. I just noticed it on 
lore.kernel.org today by chance.

> On Wed, Mar 13, 2024 at 11:03:18AM +0000, John Garry wrote:
>> On 06/03/2024 05:20, Dave Chinner wrote:
>>>    		return false;
>>> diff --git a/fs/xfs/libxfs/xfs_alloc.h b/fs/xfs/libxfs/xfs_alloc.h
>>> index 0b956f8b9d5a..aa2c103d98f0 100644
>>> --- a/fs/xfs/libxfs/xfs_alloc.h
>>> +++ b/fs/xfs/libxfs/xfs_alloc.h
>>> @@ -46,7 +46,7 @@ typedef struct xfs_alloc_arg {
>>>    	xfs_extlen_t	minleft;	/* min blocks must be left after us */
>>>    	xfs_extlen_t	total;		/* total blocks needed in xaction */
>>>    	xfs_extlen_t	alignment;	/* align answer to multiple of this */
>>> -	xfs_extlen_t	minalignslop;	/* slop for minlen+alignment calcs */
>>> +	xfs_extlen_t	alignslop;	/* slop for alignment calcs */
>>>    	xfs_agblock_t	min_agbno;	/* set an agbno range for NEAR allocs */
>>>    	xfs_agblock_t	max_agbno;	/* ... */
>>>    	xfs_extlen_t	len;		/* output: actual size of extent */
>>> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
>>> index 656c95a22f2e..d56c82c07505 100644
>>> --- a/fs/xfs/libxfs/xfs_bmap.c
>>> +++ b/fs/xfs/libxfs/xfs_bmap.c
>>> @@ -3295,6 +3295,10 @@ xfs_bmap_select_minlen(
>>>    	xfs_extlen_t		blen)
>>
>> Hi Dave,
>>
>>>    {
>>> +	/* Adjust best length for extent start alignment. */
>>> +	if (blen > args->alignment)
>>> +		blen -= args->alignment;
>>> +
>>
>> This change seems to be causing or exposing some issue, in that I find that
>> I am being allocated an extent which is aligned to but not a multiple of
>> args->alignment.
> 
> Entirely possible the logic isn't correct ;)

Out of curiosity, how do you guys normally test all this sort of logic?

I found this issue with the small program which I wrote to generate 
traffic. I could not find anything similar.

> 
> IIRC, I added this because I thought that blen ends up influencing
> args->maxlen and nothing else. The alignment isn't taken out of
> "blen", it's supposed to be added to args->maxlen.
> 
>> For my test, I have forcealign=16KB and initially I write 1756 * 4096 =
>> 7192576B to the file, so I have this:
>>
>>   EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET        TOTAL
>>     0: [0..14079]:      42432..56511      0 (42432..56511)   14080
>>
>> That is 1760 FSBs for extent #0.
>>
>> Then I write 340992B from offset 7195648, and I find this:
>>   EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET        TOTAL
>>     0: [0..14079]:      42432..56511      0 (42432..56511)   14080
>>     1: [14080..14711]:  177344..177975    0 (177344..177975)   632
>>     2: [14712..14719]:  350720..350727    1 (171520..171527)     8
>>
>> extent #1 is 79 FSBs, which is not a multiple of 16KB.
> 
>> In this case, in xfs_bmap_select_minlen() I find initially blen=80
> 
> Ah, so you've hit the corner case of the largest free space being
> exactly 80 in length and args->maxlen = 80.
> 
>> args->alignment=4, ->minlen=0, ->maxlen=80. Subsequently blen is reduced to
>> 76 and args->minlen is set to 76, and then xfs_bmap_btalloc_best_length() ->
>> xfs_alloc_vextent_start_ag() happens to find an extent of length 79.
> 
> So there's nothing wrong here. We've asked for any extent that
> is between 76 and 80 blocks in length to be allocated, and we found
> one that is 79 blocks in length.

Sure

> 
> Finding a 79 block extent instead of an 80 block extent like blen
> indicated means that there was either:
> 
> - a block moved to the AGFL from that 80 block free extent prior to
>    doing the free extent search.
> - a block was busy and so was trimmed out via
>    xfs_alloc_compute_aligned()
> - the front edge wasn't aligned so it took a block off the front of
>    the free space to align it. It's this condition that code I added
>    above takes into account - an exact match on size does not imply
>    that aligned allocation of exactly that size can be done.
> 
> Given the front edge is aligned, I'd say it was the latter that
> occurred. The question is this: why wasn't the tail edge aligned
> down to make the extent length 76 blocks?
> 
>> Removing the specific change to modify blen seems to make things ok again.
> 
> Right, because now the allocation ends up being set up with
> args->minlen = args->maxlen = 80 and the allocation is unable to
> align the extent and meet the args->minlen requirement from that
> same unaligned 80 block free space. Hence that allocation fails and
> we fall back to a different allocation strategy that searches other
> AGs for a matching aligned allocation.

ok

> 
> IOWs, removing the 'blen -= args->alignment' code simply kicks the
> problem down the road until all AGs run out of 80 block contiguous
> extents.

right

> 
> This really smells like a tail alignment bug, not a problem with the
> allocation setup. Returning an extent that is 76 blocks in length
> fulfils the 4 block alignment requirement, so why did tail alignment
> fail?
> 
>> I will also note something strange which could be the issue, that being that
>> xfs_alloc_fix_len() does not fix this up - I thought that was its job.
> 
> Yes, it should fix this up.

As below, xfs_alloc_fix_len() does nothing (useful).

> 
>> Firstly, in this same scenario, in xfs_alloc_space_available() we calculate
>> alloc_len = args->minlen + (args->alignment - 1) + args->alignslop = 76 + (4
>> - 1) + 0 = 79, and then args->maxlen = 79.
> 
> That seems OK, we're doing aligned allocation and this is an ENOSPC
> corner case so the aligned allocation should get rounded down in
> xfs_alloc_fix_len() or rejected.
> 
> One thought I just had is that the args->maxlen adjustment shouldn't
> be to "available space" - it should probably be set to args->minlen
> because that's the aligned 'alloc_len' we checked available space
> against. That would fix this, because then we'd have args->minlen =
> args->maxlen = 76.
> 
> However, that only addresses this specific case, not the general
> case of xfs_alloc_fix_len() failing to tail align the allocated
> extent.
> 
>> Then xfs_alloc_fix_len() allows
>> this as args->len == args->maxlen (=79), even though args->prod, mod = 4, 0.
> 
> Yeah, that smells wrong.

Would it be worth adding a debug assert for prod and mod being honoured 
from the allocator? xfs_alloc_fix_len() does have an assert later on and 
it does not help here.

> 
> I'd suggest that we've never noticed this until now because we
> have never guaranteed extent alignment. Hence the occasional
> short/unaligned extent being allocated in dark ENOSPC corners was
> never an issue for anyone.
> 
> However, introducing a new alignment guarantee turns these sorts of
> latent non-issues into bugs that need to be fixed. i.e. This is
> exactly the sort of rare corner case behaviour I expected to be
> flushed out by guaranteeing and then extensively testing allocation
> alignments.
> 
> If you drop the rlen == args->maxlen check from
> xfs_alloc_space_available(),

I assume that you mean xfs_alloc_fix_len()

> the problem should go away and the
> extent gets trimmed to 76 blocks.

..if so, then, yes, it does. We end up with this:

    0: [0..14079]:      42432..56511      0 (42432..56511)   14080
    1: [14080..14687]:  177344..177951    0 (177344..177951)   608
    2: [14688..14719]:  350720..350751    1 (171520..171551)    32

> This shouldn't affect anything
> else because maxlen allocations should already be properly aligned -
> it's only when something like ENOSPC causes args->maxlen to be
> modified to an unaligned value that this issue arises.
> 
> In the end, I suspect we'll want to make both changes....
> 
>> To me, that (args->alignment - 1) component in calculating alloc_len is odd.
>> I assume it is done as default args->alignment == 1.
> 
> No, it's done because guaranteeing aligned allocation requires
> selecting an aligned region from an unaligned free space. i.e.  when
> alignment is 4, then we can need up to 3 additional blocks to
> guarantee front alignment for a given length extent.
> i.e. we have to over-allocate to guarantee we can trim up
> to alignment at the front edge and still guarantee that the extent
> is as long as required by args->minlen/maxlen.
> 

ok, understood.

Thanks,
John
Dave Chinner April 2, 2024, 5:58 a.m. UTC | #4
On Tue, Mar 26, 2024 at 04:08:04PM +0000, John Garry wrote:
> On 20/03/2024 04:35, Dave Chinner wrote:
> 
> For some reason I never received this mail. I just noticed it on
> lore.kernel.org today by chance.
> 
> > On Wed, Mar 13, 2024 at 11:03:18AM +0000, John Garry wrote:
> > > On 06/03/2024 05:20, Dave Chinner wrote:
> > > >    		return false;
> > > > diff --git a/fs/xfs/libxfs/xfs_alloc.h b/fs/xfs/libxfs/xfs_alloc.h
> > > > index 0b956f8b9d5a..aa2c103d98f0 100644
> > > > --- a/fs/xfs/libxfs/xfs_alloc.h
> > > > +++ b/fs/xfs/libxfs/xfs_alloc.h
> > > > @@ -46,7 +46,7 @@ typedef struct xfs_alloc_arg {
> > > >    	xfs_extlen_t	minleft;	/* min blocks must be left after us */
> > > >    	xfs_extlen_t	total;		/* total blocks needed in xaction */
> > > >    	xfs_extlen_t	alignment;	/* align answer to multiple of this */
> > > > -	xfs_extlen_t	minalignslop;	/* slop for minlen+alignment calcs */
> > > > +	xfs_extlen_t	alignslop;	/* slop for alignment calcs */
> > > >    	xfs_agblock_t	min_agbno;	/* set an agbno range for NEAR allocs */
> > > >    	xfs_agblock_t	max_agbno;	/* ... */
> > > >    	xfs_extlen_t	len;		/* output: actual size of extent */
> > > > diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> > > > index 656c95a22f2e..d56c82c07505 100644
> > > > --- a/fs/xfs/libxfs/xfs_bmap.c
> > > > +++ b/fs/xfs/libxfs/xfs_bmap.c
> > > > @@ -3295,6 +3295,10 @@ xfs_bmap_select_minlen(
> > > >    	xfs_extlen_t		blen)
> > > 
> > > Hi Dave,
> > > 
> > > >    {
> > > > +	/* Adjust best length for extent start alignment. */
> > > > +	if (blen > args->alignment)
> > > > +		blen -= args->alignment;
> > > > +
> > > 
> > > This change seems to be causing or exposing some issue, in that I find that
> > > I am being allocated an extent which is aligned to but not a multiple of
> > > args->alignment.
> > 
> > Entirely possible the logic isn't correct ;)
> 
> Out of curiosity, how do you guys normally test all this sort of logic?

With difficulty.

Exercising all the weird corner cases is really hard because the
combinatory explosion that occurs when you have 20 control
parameters, up to 5 different failure fallback strategies,
behavioural variations with delayed allocation, ENOSPC and AGFL
refilling accounting variations, etc, means it's basically
impossible to enumerate and iterate the behaviour space fully.
And then we have filesystem geometry and application concurrency
to consider, too.

All of the behaviours up to this point in time are best effort - we
don't guarantee allocation policy is followed when there is not
enough free space to execute the preferred policy - we slowly fall
back to mechanisms that are further from the policy but more likely
to succeed. i.e. as we approach ENOSPC, the allocation policies get
"looser" - they are less restrictive and more variable and don't
give as good results as when there is plenty of free space for the
allocation policy to make good decisions from.

As such, I only check that macro-level behaviour when there is lots
of free space is largely correct. e.g. by doing something like
copying a kernel tree onto a new filesystem, then checking inode
locality follows directories, block locality follows inodes, large
files are stripe aligned, extent size hint based inodes appear to
have the correct extent sizes, etc.

I then rely on the ENOSPC tests in fstests to find regressions that
might occur when the filesystem is stressed with little free space
available. These are a whole lot better than they used to be; root
cause analysis of ENOSPC corner case bugs has consumed months of my
working life over the past 20 years....

> I found this issue with the small program which I wrote to generate traffic.
> I could not find anything similar.

That's because it's largely impossible to write a test that is
deterministic and works on all possible test configurations. Even
changing the size of the filesystem even slightly can result in
vastly different but still 100% correct allocation
behaviour....

> > > Firstly, in this same scenario, in xfs_alloc_space_available() we calculate
> > > alloc_len = args->minlen + (args->alignment - 1) + args->alignslop = 76 + (4
> > > - 1) + 0 = 79, and then args->maxlen = 79.
> > 
> > That seems OK, we're doing aligned allocation and this is an ENOSPC
> > corner case so the aligned allocation should get rounded down in
> > xfs_alloc_fix_len() or rejected.
> > 
> > One thought I just had is that the args->maxlen adjustment shouldn't
> > be to "available space" - it should probably be set to args->minlen
> > because that's the aligned 'alloc_len' we checked available space
> > against. That would fix this, because then we'd have args->minlen =
> > args->maxlen = 76.
> > 
> > However, that only addresses this specific case, not the general
> > case of xfs_alloc_fix_len() failing to tail align the allocated
> > extent.
> > 
> > > Then xfs_alloc_fix_len() allows
> > > this as args->len == args->maxlen (=79), even though args->prod, mod = 4, 0.
> > 
> > Yeah, that smells wrong.
> 
> Would it be worth adding a debug assert for prod and mod being honoured from
> the allocator? xfs_alloc_fix_len() does have an assert later on and it does
> not help here.

I don't see any value in that because it's not actually a "fatal"
issue. See above about trading off policy strictness for allocation
success.

Again, this force alignment stuff is a fundamental change in this
behaviour - it wants "hard failure" rather than "trade off" and so
there isn't a general case for asserting that allocation must be
mod/prod aligned. Extent size hints are a -hint-, not a requirement,
and I don't want random assert failures in test systems because
debug kernels start treating hints as "must not fail" requirements.

> > I'd suggest that we've never noticed this until now because we
> > have never guaranteed extent alignment. Hence the occasional
> > short/unaligned extent being allocated in dark ENOSPC corners was
> > never an issue for anyone.
> > 
> > However, introducing a new alignment guarantee turns these sorts of
> > latent non-issues into bugs that need to be fixed. i.e. This is
> > exactly the sort of rare corner case behaviour I expected to be
> > flushed out by guaranteeing and then extensively testing allocation
> > alignments.
> > 
> > If you drop the rlen == args->maxlen check from
> > xfs_alloc_space_available(),
> 
> I assume that you mean xfs_alloc_fix_len()

Yes.

> > the problem should go away and the
> > extent gets trimmed to 76 blocks.
> 
> ..if so, then, yes, it does. We end up with this:
> 
>    0: [0..14079]:      42432..56511      0 (42432..56511)   14080
>    1: [14080..14687]:  177344..177951    0 (177344..177951)   608
>    2: [14688..14719]:  350720..350751    1 (171520..171551)    32

Good, that's how it should work. :) 

I'll update the patchset I have with these fixes.

-Dave.
John Garry April 2, 2024, 7:49 a.m. UTC | #5
>>> the problem should go away and the
>>> extent gets trimmed to 76 blocks.
>> ..if so, then, yes, it does. We end up with this:
>>
>>     0: [0..14079]:      42432..56511      0 (42432..56511)   14080
>>     1: [14080..14687]:  177344..177951    0 (177344..177951)   608
>>     2: [14688..14719]:  350720..350751    1 (171520..171551)    32
> Good, that's how it should work. 
John Garry April 2, 2024, 3:11 p.m. UTC | #6
On 02/04/2024 08:49, John Garry wrote:
> Update:
> So I have some more patches from trying to support both truncate and 
> fallocate + punch/insert/collapse for forcealign.
> 
> I seem to have at least 2x problems:
> - unexpected -ENOSPC in some case

This -ENOSPC seems related to xfs_bmap_select_minlen() again.

I find that it occurs when calling xfs_bmap_select_minlen() and blen == 
maxlen again, like:
blen=64 args->alignment=16, minlen=0, maxlen=64

And then this gives:
args->minlen=48 blen=64

But xfs_alloc_vextent_start_ag() -> xfs_alloc_vextent_iterate_ags() does 
not seem to find something suitable.

I'm continuing to look...
Dave Chinner April 2, 2024, 9:26 p.m. UTC | #7
On Tue, Apr 02, 2024 at 04:11:20PM +0100, John Garry wrote:
> On 02/04/2024 08:49, John Garry wrote:
> > Update:
> > So I have some more patches from trying to support both truncate and
> > fallocate + punch/insert/collapse for forcealign.
> > 
> > I seem to have at least 2x problems:
> > - unexpected -ENOSPC in some case
> 
> This -ENOSPC seems related to xfs_bmap_select_minlen() again.
> 
> I find that it occurs when calling xfs_bmap_select_minlen() and blen ==
> maxlen again, like:
> blen=64 args->alignment=16, minlen=0, maxlen=64
> 
> And then this gives:
> args->minlen=48 blen=64

Which is perfectly reasonable - it fits the bounds specified just
fine, and we'll get a 64 block allocation if that free space is
exactly aligned. Otherwise we'll get a 48 block allocation.

> But xfs_alloc_vextent_start_ag() -> xfs_alloc_vextent_iterate_ags() does not
> seem to find something suitable.

Entirely possible. The AGFL might have needed refilling, so there
really wasn't enough blocks remaining for an aligned allocation to
take place after doing that. That's a real ENOSPC condition, despite
the best length sampling indicating that the allocation could be
done.

Remember, bestlen sampling is racy - it does not hold the AGF
locked from the point of sampling to the point of allocation. Hence
when we finally go to do the allocation after setting it all up, we
might have raced with another allocation that took the free space
sampled during the bestlen pass and so then the allocation fails
despite the setup saying it should succeed.

Fundamentally, if the filesystem's best free space length is the
same size as the requested allocation, *failure is expected* and the
fallback attempts progressively remove restrictions (such as
alignment) to allow sub-optimal extents to be allocated down to
minlen in size. Forced alignment turns off these fallbacks, so you
are going to see hard ENOSPC errors the moment the filesystem runs
out of contiguous free space extents large enough to hold aligned
allocations.

This can happen a -long- way away from a real enospc condition -
depending on alignment constraints, you can start seeing this sort
of behaviour (IIRC my math correctly) at around 70% full. The larger
the alignment and the older the filesystem, the earlier this sort of
ENOSPC event will occur.

Use `xfs_spaceman -c 'freesp'` to dump the free space extent size
histogram. That will quickly tell you if there is no remaining free
extents large enough for an aligned 48 block allocation to succeed.
With an alignment of 16 blocks, this requires at least a 63 block
free space extent to succeed.

IOWs, we should expect ENOSPC to occur long before the filesystem
reports that it is out of space when we are doing forced alignment
allocation.

-Dave.
Dave Chinner April 2, 2024, 11:44 p.m. UTC | #8
On Tue, Apr 02, 2024 at 08:49:09AM +0100, John Garry wrote:
> 
> > > > the problem should go away and the
> > > > extent gets trimmed to 76 blocks.
> > > ..if so, then, yes, it does. We end up with this:
> > > 
> > >     0: [0..14079]:      42432..56511      0 (42432..56511)   14080
> > >     1: [14080..14687]:  177344..177951    0 (177344..177951)   608
> > >     2: [14688..14719]:  350720..350751    1 (171520..171551)    32
> > Good, that's how it should work. 
John Garry April 3, 2024, 8:49 a.m. UTC | #9
On 02/04/2024 22:26, Dave Chinner wrote:
>> And then this gives:
>> args->minlen=48 blen=64
> Which is perfectly reasonable - it fits the bounds specified just
> fine, and we'll get a 64 block allocation if that free space is
> exactly aligned. Otherwise we'll get a 48 block allocation.
> 
>> But xfs_alloc_vextent_start_ag() -> xfs_alloc_vextent_iterate_ags() does not
>> seem to find something suitable.
> Entirely possible. The AGFL might have needed refilling, so there
> really wasn't enough blocks remaining for an aligned allocation to
> take place after doing that. That's a real ENOSPC condition, despite
> the best length sampling indicating that the allocation could be
> done.
> 
> Remember, bestlen sampling is racy - it does not hold the AGF
> locked from the point of sampling to the point of allocation.

ok, I assumed that some lock was held.

> Hence
> when we finally go to do the allocation after setting it all up, we
> might have raced with another allocation that took the free space
> sampled during the bestlen pass and so then the allocation fails
> despite the setup saying it should succeed.

My test is single threaded, so I did not think that would be an issue.

> 
> Fundamentally, if the filesystem's best free space length is the
> same size as the requested allocation,*failure is expected*  and the
> fallback attempts progressively remove restrictions (such as
> alignment) to allow sub-optimal extents to be allocated down to
> minlen in size. Forced alignment turns off these fallbacks, so you
> are going to see hard ENOSPC errors the moment the filesystem runs
> out of contiguous free space extents large enough to hold aligned
> allocations.
> 
> This can happen a -long- way away from a real enospc condition -
> depending on alignment constraints, you can start seeing this sort
> of behaviour (IIRC my math correctly) at around 70% full. The larger
> the alignment and the older the filesystem, the earlier this sort of
> ENOSPC event will occur.

For this scenario, statvfs returns - as a sample - f_blocks=73216, 
f_bfree=19950, f_bavail=19950

So ~27% free.

> 
> Use `xfs_spaceman -c 'freesp'` to dump the free space extent size
> histogram. That will quickly tell you if there is no remaining free
> extents large enough for an aligned 48 block allocation to succeed.
> With an alignment of 16 blocks, this requires at least a 63 block
> free space extent to succeed.

# xfs_spaceman -c 'freesp' /root/mnt2/
    from      to extents  blocks    pct
       4       7       4      25   0.10
      16      31      90    1440   5.77
      32      63      12     400   1.60
      64     127       1      64   0.26
     512    1023       1     640   2.56
   16384   22400       1   22390  89.71

> 
> IOWs, we should expect ENOSPC to occur long before the filesystem
> reports that it is out of space when we are doing forced alignment
> allocation.

For my test, once ENOSPC occurs and statvfs tells us more than 10% space 
free, we exit as something seems wrong. As you say, deducing an error 
for this condition may not be the proper thing to do.

I do also note that if I then manually attempt to write the same data to 
that same empty file after the test exits, it succeeds. So something 
racy. I also notice that the FSB range we scan in 
xfs_alloc_ag_vextent_size() -> xfs_alloc_compute_aligned() ->
xfs_extent_busy_trim() returns busy=1 when ENOSPC occurs - I have not 
checked that further.

Thanks,
John
John Garry April 3, 2024, 11:30 a.m. UTC | #10
On 03/04/2024 00:44, Dave Chinner wrote:
>> I seem to have at least 2x problems:
>> - unexpected -ENOSPC in some case
>> - sometimes misaligned extents from ordered combo of punch, truncate, write
> Punch and truncate do not enforce extent alignment and never have.
> They will trim extents to whatever the new EOF block is supposed to
> be.  This is by design - they are intended to be able to remove
> extents beyond EOF that may have been done due to extent size hints
> and/or speculative delayed allocation to minimise wasted space.
> 
> Again, forced alignment is introducing new constraints, so anything
> that is truncating EOF blocks (punch, truncate, eof block freeing
> during inactivation or blockgc) is going to need to take forced
> extent alignment constraints into account.

Sure

> 
> This is likely something that needs to be done in
> xfs_itruncate_extents_flags() for the truncate/inactivation/blockgc
> cases (i.e. correct calculation of first_unmap_block). Punching and
> insert/collapse are a bit more complex - they'll need their
> start/end offsets to be aligned, the chunk sizes they work in to be
> aligned, and the rounding in xfs_flush_unmap_range() to be forced
> alignment aware.

Ack

> 
>> I don't know if it is a good use of time for me to try to debug, as I guess
>> you could spot problems in the changes almost immediately.
>>
>> Next steps:
>> I would like to send out a new series for XFS support for atomic writes
>> soon, which so far included forcealign support.
>>
>> Please advise on your preference for what I should do, like wait for your
>> forcealign update and then combine into the XFS support for atomic write
>> series. Or just post the doubtful patches now, as above, and go from there?
> I just sent out the forced allocation alignment series for review.

cheers, I'll give them a spin.

> Forced truncate/punch extent alignment will need to be implemented
> and reviewed as a separate patch set...

Below are my changes.

I think that the xfs_is_falloc_aligned() change is sound. As for the 
other two, I'm really not sure.

There is also 
https://lore.kernel.org/linux-xfs/ZeeaKrmVEkcXYjbK@dread.disaster.area/T/#m73430d56d96e60f2908bab9ce3e6a0d56d27929c, 
which still is still a candidate change.

Please check them, below.

Thanks,
John

------>8-------

 From ec86dd3add7153062a612cb1141f36544f34e0cd Mon Sep 17 00:00:00 2001
From: John Garry <john.g.garry@oracle.com>
Date: Wed, 13 Mar 2024 18:14:37 +0000
Subject: [PATCH 1/3] fs: xfs: Update xfs_is_falloc_aligned() mask forcealign

For when forcealign is enabled, we want the alignment mask to cover an
aligned extent, similar to rtvol.

Signed-off-by: John Garry <john.g.garry@oracle.com>
---
  fs/xfs/xfs_file.c | 5 ++++-
  1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 632653e00906..e81e01e6b22b 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -61,7 +61,10 @@ xfs_is_falloc_aligned(
  		}
  		mask = XFS_FSB_TO_B(mp, mp->m_sb.sb_rextsize) - 1;
  	} else {
-		mask = mp->m_sb.sb_blocksize - 1;
+		if (xfs_inode_has_forcealign(ip) && ip->i_extsize > 1)
+			mask = (mp->m_sb.sb_blocksize * ip->i_extsize) - 1;
+		else
+			mask = mp->m_sb.sb_blocksize - 1;
  	}

  	return !((pos | len) & mask);



------8<--------


 From c0866d2a5753f1c487872ef3add4e08c03c22d00 Mon Sep 17 00:00:00 2001
From: John Garry <john.g.garry@oracle.com>
Date: Fri, 22 Mar 2024 11:24:45 +0000
Subject: [PATCH 2/3] fs: xfs: Unmap blocks according to forcealign

For when forcealign is enabled, blocks in an inode need to be unmapped
according to extent alignment, like what is already done for rtvol.

Signed-off-by: John Garry <john.g.garry@oracle.com>
---
  fs/xfs/libxfs/xfs_bmap.c | 41 ++++++++++++++++++++++++++++++++++------
  fs/xfs/xfs_inode.h       |  5 +++++
  2 files changed, 40 insertions(+), 6 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 7a0ef0900097..5dd7a62625db 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -5322,6 +5322,15 @@ xfs_bmap_del_extent_real(
  	return 0;
  }

+/* Return the offset of an block number within an extent for forcealign. */
+static xfs_extlen_t
+xfs_forcealign_extent_offset(
+	struct xfs_inode	*ip,
+	xfs_fsblock_t		bno)
+{
+	return bno & (ip->i_extsize - 1);
+}
+
  /*
   * Unmap (remove) blocks from a file.
   * If nexts is nonzero then the number of extents to remove is limited to
@@ -5344,6 +5353,7 @@ __xfs_bunmapi(
  	struct xfs_bmbt_irec	got;		/* current extent record */
  	struct xfs_ifork	*ifp;		/* inode fork pointer */
  	int			isrt;		/* freeing in rt area */
+	int			isforcealign;	/* freeing for file inode with forcealign */
  	int			logflags;	/* transaction logging flags */
  	xfs_extlen_t		mod;		/* rt extent offset */
  	struct xfs_mount	*mp = ip->i_mount;
@@ -5380,7 +5390,10 @@ __xfs_bunmapi(
  		return 0;
  	}
  	XFS_STATS_INC(mp, xs_blk_unmap);
-	isrt = xfs_ifork_is_realtime(ip, whichfork);
+	isrt = (whichfork == XFS_DATA_FORK) && XFS_IS_REALTIME_INODE(ip);
+	isforcealign = (whichfork == XFS_DATA_FORK) &&
+			xfs_inode_has_forcealign(ip) &&
+			xfs_inode_has_extsize(ip) && ip->i_extsize > 1;
  	end = start + len;

  	if (!xfs_iext_lookup_extent_before(ip, ifp, &end, &icur, &got)) {
@@ -5442,11 +5455,17 @@ __xfs_bunmapi(
  		if (del.br_startoff + del.br_blockcount > end + 1)
  			del.br_blockcount = end + 1 - del.br_startoff;

-		if (!isrt || (flags & XFS_BMAPI_REMAP))
+		if ((!isrt && !isforcealign) || (flags & XFS_BMAPI_REMAP))
  			goto delete;

-		mod = xfs_rtb_to_rtxoff(mp,
-				del.br_startblock + del.br_blockcount);
+		if (isrt)
+			mod = xfs_rtb_to_rtxoff(mp,
+					del.br_startblock + del.br_blockcount);
+		else if (isforcealign)
+			mod = xfs_forcealign_extent_offset(ip,
+					del.br_startblock + del.br_blockcount);
  		if (mod) {
  			/*
  			 * Realtime extent not lined up at the end.
@@ -5494,9 +5513,19 @@ __xfs_bunmapi(
  			goto nodelete;
  		}

-		mod = xfs_rtb_to_rtxoff(mp, del.br_startblock);
+		if (isrt)
+			mod = xfs_rtb_to_rtxoff(mp, del.br_startblock);
+		else if (isforcealign)
+			mod = xfs_forcealign_extent_offset(ip,
+					del.br_startblock);
+
  		if (mod) {
-			xfs_extlen_t off = mp->m_sb.sb_rextsize - mod;
+			xfs_extlen_t off;
+			if (isrt)
+				off = mp->m_sb.sb_rextsize - mod;
+			else if (isforcealign) {
+				off = ip->i_extsize - mod;
+			}

  			/*
  			 * Realtime extent is lined up at the end but not
diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index 065028789473..3f13943ab3a3 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -316,6 +316,11 @@ static inline bool xfs_inode_has_forcealign(struct 
xfs_inode *ip)
  	return ip->i_diflags2 & XFS_DIFLAG2_FORCEALIGN;
  }

+static inline bool xfs_inode_has_extsize(struct xfs_inode *ip)
+{
+	return ip->i_diflags & XFS_DIFLAG_EXTSIZE;
+}
+
  /*
   * Return the buftarg used for data allocations on a given inode.



------>8-------

 From 8cb2b61fa419b961d22609e12f2d941ce976d0f0 Mon Sep 17 00:00:00 2001
From: John Garry <john.g.garry@oracle.com>
Date: Wed, 20 Mar 2024 13:05:39 +0000
Subject: [PATCH 3/3] fs: xfs: Only free full extents for forcealign

Like we already do for rtvol, only free full extents for forcealign in
xfs_free_file_space().

Signed-off-by: John Garry <john.g.garry@oracle.com>
---
  fs/xfs/xfs_bmap_util.c | 7 +++++--
  1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index 178a4865d1ed..e3e6e27a33bf 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -848,8 +848,11 @@ xfs_free_file_space(
  	startoffset_fsb = XFS_B_TO_FSB(mp, offset);
  	endoffset_fsb = XFS_B_TO_FSBT(mp, offset + len);

-	/* We can only free complete realtime extents. */
-	if (XFS_IS_REALTIME_INODE(ip) && mp->m_sb.sb_rextsize > 1) {
+	/* Free only complete extents. */
+	if (xfs_inode_has_forcealign(ip) && ip->i_extsize > 1) {
+		startoffset_fsb = roundup_64(startoffset_fsb, ip->i_extsize);
+		endoffset_fsb = rounddown_64(endoffset_fsb, ip->i_extsize);
+	} else if (XFS_IS_REALTIME_INODE(ip) && mp->m_sb.sb_rextsize > 1) {
  		startoffset_fsb = xfs_rtb_roundup_rtx(mp, startoffset_fsb);
  		endoffset_fsb = xfs_rtb_rounddown_rtx(mp, endoffset_fsb);
  	}

------8<--------
EOM
diff mbox series

Patch

diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index 9da52e92172a..5ed9c8a96e32 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -2395,7 +2395,7 @@  xfs_alloc_space_available(
 	reservation = xfs_ag_resv_needed(pag, args->resv);
 
 	/* do we have enough contiguous free space for the allocation? */
-	alloc_len = args->minlen + (args->alignment - 1) + args->minalignslop;
+	alloc_len = args->minlen + (args->alignment - 1) + args->alignslop;
 	longest = xfs_alloc_longest_free_extent(pag, min_free, reservation);
 	if (longest < alloc_len)
 		return false;
diff --git a/fs/xfs/libxfs/xfs_alloc.h b/fs/xfs/libxfs/xfs_alloc.h
index 0b956f8b9d5a..aa2c103d98f0 100644
--- a/fs/xfs/libxfs/xfs_alloc.h
+++ b/fs/xfs/libxfs/xfs_alloc.h
@@ -46,7 +46,7 @@  typedef struct xfs_alloc_arg {
 	xfs_extlen_t	minleft;	/* min blocks must be left after us */
 	xfs_extlen_t	total;		/* total blocks needed in xaction */
 	xfs_extlen_t	alignment;	/* align answer to multiple of this */
-	xfs_extlen_t	minalignslop;	/* slop for minlen+alignment calcs */
+	xfs_extlen_t	alignslop;	/* slop for alignment calcs */
 	xfs_agblock_t	min_agbno;	/* set an agbno range for NEAR allocs */
 	xfs_agblock_t	max_agbno;	/* ... */
 	xfs_extlen_t	len;		/* output: actual size of extent */
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 656c95a22f2e..d56c82c07505 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -3295,6 +3295,10 @@  xfs_bmap_select_minlen(
 	xfs_extlen_t		blen)
 {
 
+	/* Adjust best length for extent start alignment. */
+	if (blen > args->alignment)
+		blen -= args->alignment;
+
 	/*
 	 * Since we used XFS_ALLOC_FLAG_TRYLOCK in _longest_free_extent(), it is
 	 * possible that there is enough contiguous free space for this request.
@@ -3310,6 +3314,7 @@  xfs_bmap_select_minlen(
 	if (blen < args->maxlen)
 		return blen;
 	return args->maxlen;
+
 }
 
 static int
@@ -3403,35 +3408,43 @@  xfs_bmap_alloc_account(
 	xfs_trans_mod_dquot_byino(ap->tp, ap->ip, fld, ap->length);
 }
 
-static int
+/*
+ * Calculate the extent start alignment and the extent length adjustments that
+ * constrain this allocation.
+ *
+ * Extent start alignment is currently determined by stripe configuration and is
+ * carried in args->alignment, whilst extent length adjustment is determined by
+ * extent size hints and is carried by args->prod and args->mod.
+ *
+ * Low level allocation code is free to either ignore or override these values
+ * as required.
+ */
+static void
 xfs_bmap_compute_alignments(
 	struct xfs_bmalloca	*ap,
 	struct xfs_alloc_arg	*args)
 {
 	struct xfs_mount	*mp = args->mp;
 	xfs_extlen_t		align = 0; /* minimum allocation alignment */
-	int			stripe_align = 0;
 
 	/* stripe alignment for allocation is determined by mount parameters */
 	if (mp->m_swidth && xfs_has_swalloc(mp))
-		stripe_align = mp->m_swidth;
+		args->alignment = mp->m_swidth;
 	else if (mp->m_dalign)
-		stripe_align = mp->m_dalign;
+		args->alignment = mp->m_dalign;
 
 	if (ap->flags & XFS_BMAPI_COWFORK)
 		align = xfs_get_cowextsz_hint(ap->ip);
 	else if (ap->datatype & XFS_ALLOC_USERDATA)
 		align = xfs_get_extsz_hint(ap->ip);
+
 	if (align) {
 		if (xfs_bmap_extsize_align(mp, &ap->got, &ap->prev, align, 0,
 					ap->eof, 0, ap->conv, &ap->offset,
 					&ap->length))
 			ASSERT(0);
 		ASSERT(ap->length);
-	}
 
-	/* apply extent size hints if obtained earlier */
-	if (align) {
 		args->prod = align;
 		div_u64_rem(ap->offset, args->prod, &args->mod);
 		if (args->mod)
@@ -3446,7 +3459,6 @@  xfs_bmap_compute_alignments(
 			args->mod = args->prod - args->mod;
 	}
 
-	return stripe_align;
 }
 
 static void
@@ -3518,7 +3530,7 @@  xfs_bmap_exact_minlen_extent_alloc(
 	args.total = ap->total;
 
 	args.alignment = 1;
-	args.minalignslop = 0;
+	args.alignslop = 0;
 
 	args.minleft = ap->minleft;
 	args.wasdel = ap->wasdel;
@@ -3558,7 +3570,6 @@  xfs_bmap_btalloc_at_eof(
 	struct xfs_bmalloca	*ap,
 	struct xfs_alloc_arg	*args,
 	xfs_extlen_t		blen,
-	int			stripe_align,
 	bool			ag_only)
 {
 	struct xfs_mount	*mp = args->mp;
@@ -3572,23 +3583,15 @@  xfs_bmap_btalloc_at_eof(
 	 * allocation.
 	 */
 	if (ap->offset) {
-		xfs_extlen_t	nextminlen = 0;
+		xfs_extlen_t	alignment = args->alignment;
 
 		/*
-		 * Compute the minlen+alignment for the next case.  Set slop so
-		 * that the value of minlen+alignment+slop doesn't go up between
-		 * the calls.
+		 * Compute the alignment slop for the fallback path so we ensure
+		 * we account for the potential alignemnt space required by the
+		 * fallback paths before we modify the AGF and AGFL here.
 		 */
 		args->alignment = 1;
-		if (blen > stripe_align && blen <= args->maxlen)
-			nextminlen = blen - stripe_align;
-		else
-			nextminlen = args->minlen;
-		if (nextminlen + stripe_align > args->minlen + 1)
-			args->minalignslop = nextminlen + stripe_align -
-					args->minlen - 1;
-		else
-			args->minalignslop = 0;
+		args->alignslop = alignment - args->alignment;
 
 		if (!caller_pag)
 			args->pag = xfs_perag_get(mp, XFS_FSB_TO_AGNO(mp, ap->blkno));
@@ -3606,19 +3609,8 @@  xfs_bmap_btalloc_at_eof(
 		 * Exact allocation failed. Reset to try an aligned allocation
 		 * according to the original allocation specification.
 		 */
-		args->alignment = stripe_align;
-		args->minlen = nextminlen;
-		args->minalignslop = 0;
-	} else {
-		/*
-		 * Adjust minlen to try and preserve alignment if we
-		 * can't guarantee an aligned maxlen extent.
-		 */
-		args->alignment = stripe_align;
-		if (blen > args->alignment &&
-		    blen <= args->maxlen + args->alignment)
-			args->minlen = blen - args->alignment;
-		args->minalignslop = 0;
+		args->alignment = alignment;
+		args->alignslop = 0;
 	}
 
 	if (ag_only) {
@@ -3636,9 +3628,8 @@  xfs_bmap_btalloc_at_eof(
 		return 0;
 
 	/*
-	 * Allocation failed, so turn return the allocation args to their
-	 * original non-aligned state so the caller can proceed on allocation
-	 * failure as if this function was never called.
+	 * Aligned allocation failed, so all fallback paths from here drop the
+	 * start alignment requirement as we know it will not succeed.
 	 */
 	args->alignment = 1;
 	return 0;
@@ -3646,7 +3637,9 @@  xfs_bmap_btalloc_at_eof(
 
 /*
  * We have failed multiple allocation attempts so now are in a low space
- * allocation situation. Try a locality first full filesystem minimum length
+ * allocation situation. We give up on any attempt at aligned allocation here.
+ *
+ * Try a locality first full filesystem minimum length
  * allocation whilst still maintaining necessary total block reservation
  * requirements.
  *
@@ -3663,6 +3656,7 @@  xfs_bmap_btalloc_low_space(
 {
 	int			error;
 
+	args->alignment = 1;
 	if (args->minlen > ap->minlen) {
 		args->minlen = ap->minlen;
 		error = xfs_alloc_vextent_start_ag(args, ap->blkno);
@@ -3682,13 +3676,11 @@  xfs_bmap_btalloc_low_space(
 static int
 xfs_bmap_btalloc_filestreams(
 	struct xfs_bmalloca	*ap,
-	struct xfs_alloc_arg	*args,
-	int			stripe_align)
+	struct xfs_alloc_arg	*args)
 {
 	xfs_extlen_t		blen = 0;
 	int			error = 0;
 
-
 	error = xfs_filestream_select_ag(ap, args, &blen);
 	if (error)
 		return error;
@@ -3707,8 +3699,7 @@  xfs_bmap_btalloc_filestreams(
 
 	args->minlen = xfs_bmap_select_minlen(ap, args, blen);
 	if (ap->aeof)
-		error = xfs_bmap_btalloc_at_eof(ap, args, blen, stripe_align,
-				true);
+		error = xfs_bmap_btalloc_at_eof(ap, args, blen, true);
 
 	if (!error && args->fsbno == NULLFSBLOCK)
 		error = xfs_alloc_vextent_near_bno(args, ap->blkno);
@@ -3732,8 +3723,7 @@  xfs_bmap_btalloc_filestreams(
 static int
 xfs_bmap_btalloc_best_length(
 	struct xfs_bmalloca	*ap,
-	struct xfs_alloc_arg	*args,
-	int			stripe_align)
+	struct xfs_alloc_arg	*args)
 {
 	xfs_extlen_t		blen = 0;
 	int			error;
@@ -3757,8 +3747,7 @@  xfs_bmap_btalloc_best_length(
 	 * trying.
 	 */
 	if (ap->aeof && !(ap->tp->t_flags & XFS_TRANS_LOWMODE)) {
-		error = xfs_bmap_btalloc_at_eof(ap, args, blen, stripe_align,
-				false);
+		error = xfs_bmap_btalloc_at_eof(ap, args, blen, false);
 		if (error || args->fsbno != NULLFSBLOCK)
 			return error;
 	}
@@ -3785,27 +3774,26 @@  xfs_bmap_btalloc(
 		.resv		= XFS_AG_RESV_NONE,
 		.datatype	= ap->datatype,
 		.alignment	= 1,
-		.minalignslop	= 0,
+		.alignslop	= 0,
 	};
 	xfs_fileoff_t		orig_offset;
 	xfs_extlen_t		orig_length;
 	int			error;
-	int			stripe_align;
 
 	ASSERT(ap->length);
 	orig_offset = ap->offset;
 	orig_length = ap->length;
 
-	stripe_align = xfs_bmap_compute_alignments(ap, &args);
+	xfs_bmap_compute_alignments(ap, &args);
 
 	/* Trim the allocation back to the maximum an AG can fit. */
 	args.maxlen = min(ap->length, mp->m_ag_max_usable);
 
 	if ((ap->datatype & XFS_ALLOC_USERDATA) &&
 	    xfs_inode_is_filestream(ap->ip))
-		error = xfs_bmap_btalloc_filestreams(ap, &args, stripe_align);
+		error = xfs_bmap_btalloc_filestreams(ap, &args);
 	else
-		error = xfs_bmap_btalloc_best_length(ap, &args, stripe_align);
+		error = xfs_bmap_btalloc_best_length(ap, &args);
 	if (error)
 		return error;