diff mbox series

[13/21] fs: xfs: Make file data allocations observe the 'forcealign' flag

Message ID 20230929102726.2985188-14-john.g.garry@oracle.com (mailing list archive)
State New, archived
Headers show
Series block atomic writes | expand

Commit Message

John Garry Sept. 29, 2023, 10:27 a.m. UTC
From: "Darrick J. Wong" <djwong@kernel.org>

The existing extsize hint code already did the work of expanding file
range mapping requests so that the range is aligned to the hint value.
Now add the code we need to guarantee that the space allocations are
also always aligned.

XXX: still need to check all this with reflink

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Co-developed-by: John Garry <john.g.garry@oracle.com>
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
 fs/xfs/libxfs/xfs_bmap.c | 22 +++++++++++++++++-----
 fs/xfs/xfs_iomap.c       |  4 +++-
 2 files changed, 20 insertions(+), 6 deletions(-)

Comments

Dave Chinner Oct. 3, 2023, 1:42 a.m. UTC | #1
On Fri, Sep 29, 2023 at 10:27:18AM +0000, John Garry wrote:
> From: "Darrick J. Wong" <djwong@kernel.org>
> 
> The existing extsize hint code already did the work of expanding file
> range mapping requests so that the range is aligned to the hint value.
> Now add the code we need to guarantee that the space allocations are
> also always aligned.
> 
> XXX: still need to check all this with reflink
> 
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> Co-developed-by: John Garry <john.g.garry@oracle.com>
> Signed-off-by: John Garry <john.g.garry@oracle.com>
> ---
>  fs/xfs/libxfs/xfs_bmap.c | 22 +++++++++++++++++-----
>  fs/xfs/xfs_iomap.c       |  4 +++-
>  2 files changed, 20 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> index 328134c22104..6c864dc0a6ff 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -3328,6 +3328,19 @@ xfs_bmap_compute_alignments(
>  		align = xfs_get_cowextsz_hint(ap->ip);
>  	else if (ap->datatype & XFS_ALLOC_USERDATA)
>  		align = xfs_get_extsz_hint(ap->ip);
> +
> +	/*
> +	 * xfs_get_cowextsz_hint() returns extsz_hint for when forcealign is
> +	 * set as forcealign and cowextsz_hint are mutually exclusive
> +	 */
> +	if (xfs_inode_forcealign(ap->ip) && align) {
> +		args->alignment = align;
> +		if (stripe_align % align)
> +			stripe_align = align;
> +	} else {
> +		args->alignment = 1;
> +	}

This smells wrong.

If a filesystem has a stripe unit set (hence stripe_align is
non-zero) then any IO that crosses stripe unit boundaries will not
be atomic - they will require multiple IOs to different devices.

Hence if the filesystem has a stripe unit set, then all forced
alignment hints for atomic IO *must* be an exact integer divider
of the stripe unit. hence when an atomic IO bundle is aligned, the
atomic boundaries within the bundle always fall on a stripe unit
boundary and never cross devices.

IOWs, for a striped filesystem, the maximum size/alignment for a
single atomic IO unit is the stripe unit.

This should be enforced when the forced align flag is set on the
inode (i.e. from the ioctl)


> +
>  	if (align) {
>  		if (xfs_bmap_extsize_align(mp, &ap->got, &ap->prev, align, 0,
>  					ap->eof, 0, ap->conv, &ap->offset,
> @@ -3423,7 +3436,6 @@ xfs_bmap_exact_minlen_extent_alloc(
>  	args.minlen = args.maxlen = ap->minlen;
>  	args.total = ap->total;
>  
> -	args.alignment = 1;
>  	args.minalignslop = 0;
>  
>  	args.minleft = ap->minleft;
> @@ -3469,6 +3481,7 @@ xfs_bmap_btalloc_at_eof(
>  {
>  	struct xfs_mount	*mp = args->mp;
>  	struct xfs_perag	*caller_pag = args->pag;
> +	int			orig_alignment = args->alignment;
>  	int			error;
>  
>  	/*
> @@ -3543,10 +3556,10 @@ xfs_bmap_btalloc_at_eof(
>  
>  	/*
>  	 * Allocation failed, so turn return the allocation args to their
> -	 * original non-aligned state so the caller can proceed on allocation
> -	 * failure as if this function was never called.
> +	 * original state so the caller can proceed on allocation failure as
> +	 * if this function was never called.
>  	 */
> -	args->alignment = 1;
> +	args->alignment = orig_alignment;
>  	return 0;
>  }

Urk. Not sure that is right, it's certainly a change of behaviour.

> @@ -3694,7 +3707,6 @@ xfs_bmap_btalloc(
>  		.wasdel		= ap->wasdel,
>  		.resv		= XFS_AG_RESV_NONE,
>  		.datatype	= ap->datatype,
> -		.alignment	= 1,
>  		.minalignslop	= 0,
>  	};
>  	xfs_fileoff_t		orig_offset;
> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> index 18c8f168b153..70fe873951f3 100644
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c
> @@ -181,7 +181,9 @@ xfs_eof_alignment(
>  		 * If mounted with the "-o swalloc" option the alignment is
>  		 * increased from the strip unit size to the stripe width.
>  		 */
> -		if (mp->m_swidth && xfs_has_swalloc(mp))
> +		if (xfs_inode_forcealign(ip))
> +			align = xfs_get_extsz_hint(ip);
> +		else if (mp->m_swidth && xfs_has_swalloc(mp))
>  			align = mp->m_swidth;
>  		else if (mp->m_dalign)
>  			align = mp->m_dalign;

Ah. Now I see. This abuses the stripe alignment code to try to
implement this new inode allocation alignment restriction, rather
than just making the extent size hint alignment mandatory....

Yeah, this can be done better... :)

As it is, I have been working on a series that reworks all this
allocator code to separate out the aligned IO from the exact EOF
allocation case to help clean this up for better perag selection
during allocation. I think that needs to be done first before we go
making the alignment code more intricate like this....

-Dave.
John Garry Oct. 3, 2023, 10:13 a.m. UTC | #2
On 03/10/2023 02:42, Dave Chinner wrote:
> On Fri, Sep 29, 2023 at 10:27:18AM +0000, John Garry wrote:
>> From: "Darrick J. Wong" <djwong@kernel.org>
>>
>> The existing extsize hint code already did the work of expanding file
>> range mapping requests so that the range is aligned to the hint value.
>> Now add the code we need to guarantee that the space allocations are
>> also always aligned.
>>
>> XXX: still need to check all this with reflink
>>
>> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
>> Co-developed-by: John Garry <john.g.garry@oracle.com>
>> Signed-off-by: John Garry <john.g.garry@oracle.com>
>> ---
>>   fs/xfs/libxfs/xfs_bmap.c | 22 +++++++++++++++++-----
>>   fs/xfs/xfs_iomap.c       |  4 +++-
>>   2 files changed, 20 insertions(+), 6 deletions(-)
>>
>> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
>> index 328134c22104..6c864dc0a6ff 100644
>> --- a/fs/xfs/libxfs/xfs_bmap.c
>> +++ b/fs/xfs/libxfs/xfs_bmap.c
>> @@ -3328,6 +3328,19 @@ xfs_bmap_compute_alignments(
>>   		align = xfs_get_cowextsz_hint(ap->ip);
>>   	else if (ap->datatype & XFS_ALLOC_USERDATA)
>>   		align = xfs_get_extsz_hint(ap->ip);
>> +
>> +	/*
>> +	 * xfs_get_cowextsz_hint() returns extsz_hint for when forcealign is
>> +	 * set as forcealign and cowextsz_hint are mutually exclusive
>> +	 */
>> +	if (xfs_inode_forcealign(ap->ip) && align) {
>> +		args->alignment = align;
>> +		if (stripe_align % align)
>> +			stripe_align = align;
>> +	} else {
>> +		args->alignment = 1;
>> +	}
> 
> This smells wrong.
> 
> If a filesystem has a stripe unit set (hence stripe_align is
> non-zero) then any IO that crosses stripe unit boundaries will not
> be atomic - they will require multiple IOs to different devices.
> 
> Hence if the filesystem has a stripe unit set, then all forced
> alignment hints for atomic IO *must* be an exact integer divider
> of the stripe unit. hence when an atomic IO bundle is aligned, the
> atomic boundaries within the bundle always fall on a stripe unit
> boundary and never cross devices.
> 
> IOWs, for a striped filesystem, the maximum size/alignment for a
> single atomic IO unit is the stripe unit.
> 

ok, when I added this I was looking at being robust against wacky 
scenarios when that is not true, like forcealign = stripe alignment * 2.

Please note that this forcealign feature is being added with the view 
that it can be useful for other scenarios, and not just atomic writes.

> This should be enforced when the forced align flag is set on the
> inode (i.e. from the ioctl)

ok, fine.

> 
> 
>> +
>>   	if (align) {
>>   		if (xfs_bmap_extsize_align(mp, &ap->got, &ap->prev, align, 0,
>>   					ap->eof, 0, ap->conv, &ap->offset,
>> @@ -3423,7 +3436,6 @@ xfs_bmap_exact_minlen_extent_alloc(
>>   	args.minlen = args.maxlen = ap->minlen;
>>   	args.total = ap->total;
>>   
>> -	args.alignment = 1;
>>   	args.minalignslop = 0;
>>   
>>   	args.minleft = ap->minleft;
>> @@ -3469,6 +3481,7 @@ xfs_bmap_btalloc_at_eof(
>>   {
>>   	struct xfs_mount	*mp = args->mp;
>>   	struct xfs_perag	*caller_pag = args->pag;
>> +	int			orig_alignment = args->alignment;
>>   	int			error;
>>   
>>   	/*
>> @@ -3543,10 +3556,10 @@ xfs_bmap_btalloc_at_eof(
>>   
>>   	/*
>>   	 * Allocation failed, so turn return the allocation args to their
>> -	 * original non-aligned state so the caller can proceed on allocation
>> -	 * failure as if this function was never called.
>> +	 * original state so the caller can proceed on allocation failure as
>> +	 * if this function was never called.
>>   	 */
>> -	args->alignment = 1;
>> +	args->alignment = orig_alignment;
>>   	return 0;
>>   }
> 
> Urk. Not sure that is right, it's certainly a change of behaviour.

Is it really a change in behaviour? We just restore the args->alignment 
value, which was originally always 1.

As described in the comment, above, args->alignment is temporarily set 
to the stripe align to try to align a new alloc on a stripe boundary.

> 
>> @@ -3694,7 +3707,6 @@ xfs_bmap_btalloc(
>>   		.wasdel		= ap->wasdel,
>>   		.resv		= XFS_AG_RESV_NONE,
>>   		.datatype	= ap->datatype,
>> -		.alignment	= 1,
>>   		.minalignslop	= 0,
>>   	};
>>   	xfs_fileoff_t		orig_offset;
>> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
>> index 18c8f168b153..70fe873951f3 100644
>> --- a/fs/xfs/xfs_iomap.c
>> +++ b/fs/xfs/xfs_iomap.c
>> @@ -181,7 +181,9 @@ xfs_eof_alignment(
>>   		 * If mounted with the "-o swalloc" option the alignment is
>>   		 * increased from the strip unit size to the stripe width.
>>   		 */
>> -		if (mp->m_swidth && xfs_has_swalloc(mp))
>> +		if (xfs_inode_forcealign(ip))
>> +			align = xfs_get_extsz_hint(ip);
>> +		else if (mp->m_swidth && xfs_has_swalloc(mp))
>>   			align = mp->m_swidth;
>>   		else if (mp->m_dalign)
>>   			align = mp->m_dalign;
> 
> Ah. Now I see. This abuses the stripe alignment code to try to
> implement this new inode allocation alignment restriction, rather
> than just making the extent size hint alignment mandatory....
> 
> Yeah, this can be done better... :)
> 
> As it is, I have been working on a series that reworks all this
> allocator code to separate out the aligned IO from the exact EOF
> allocation case to help clean this up for better perag selection
> during allocation. I think that needs to be done first before we go
> making the alignment code more intricate like this....
> 
> -Dave.

ok, fine. I think that we'll just keep this code as is until that code 
you mention appears, apart from enforcing stripe alignment % forcealign 
== 0.

Thanks,
John
diff mbox series

Patch

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 328134c22104..6c864dc0a6ff 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -3328,6 +3328,19 @@  xfs_bmap_compute_alignments(
 		align = xfs_get_cowextsz_hint(ap->ip);
 	else if (ap->datatype & XFS_ALLOC_USERDATA)
 		align = xfs_get_extsz_hint(ap->ip);
+
+	/*
+	 * xfs_get_cowextsz_hint() returns extsz_hint for when forcealign is
+	 * set as forcealign and cowextsz_hint are mutually exclusive
+	 */
+	if (xfs_inode_forcealign(ap->ip) && align) {
+		args->alignment = align;
+		if (stripe_align % align)
+			stripe_align = align;
+	} else {
+		args->alignment = 1;
+	}
+
 	if (align) {
 		if (xfs_bmap_extsize_align(mp, &ap->got, &ap->prev, align, 0,
 					ap->eof, 0, ap->conv, &ap->offset,
@@ -3423,7 +3436,6 @@  xfs_bmap_exact_minlen_extent_alloc(
 	args.minlen = args.maxlen = ap->minlen;
 	args.total = ap->total;
 
-	args.alignment = 1;
 	args.minalignslop = 0;
 
 	args.minleft = ap->minleft;
@@ -3469,6 +3481,7 @@  xfs_bmap_btalloc_at_eof(
 {
 	struct xfs_mount	*mp = args->mp;
 	struct xfs_perag	*caller_pag = args->pag;
+	int			orig_alignment = args->alignment;
 	int			error;
 
 	/*
@@ -3543,10 +3556,10 @@  xfs_bmap_btalloc_at_eof(
 
 	/*
 	 * Allocation failed, so turn return the allocation args to their
-	 * original non-aligned state so the caller can proceed on allocation
-	 * failure as if this function was never called.
+	 * original state so the caller can proceed on allocation failure as
+	 * if this function was never called.
 	 */
-	args->alignment = 1;
+	args->alignment = orig_alignment;
 	return 0;
 }
 
@@ -3694,7 +3707,6 @@  xfs_bmap_btalloc(
 		.wasdel		= ap->wasdel,
 		.resv		= XFS_AG_RESV_NONE,
 		.datatype	= ap->datatype,
-		.alignment	= 1,
 		.minalignslop	= 0,
 	};
 	xfs_fileoff_t		orig_offset;
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 18c8f168b153..70fe873951f3 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -181,7 +181,9 @@  xfs_eof_alignment(
 		 * If mounted with the "-o swalloc" option the alignment is
 		 * increased from the strip unit size to the stripe width.
 		 */
-		if (mp->m_swidth && xfs_has_swalloc(mp))
+		if (xfs_inode_forcealign(ip))
+			align = xfs_get_extsz_hint(ip);
+		else if (mp->m_swidth && xfs_has_swalloc(mp))
 			align = mp->m_swidth;
 		else if (mp->m_dalign)
 			align = mp->m_dalign;