diff mbox series

[v5,09/10] xfs: Allow block allocator to take an alignment hint

Message ID 20250310183946.932054-10-john.g.garry@oracle.com (mailing list archive)
State New
Headers show
Series large atomic writes for xfs with CoW | expand

Commit Message

John Garry March 10, 2025, 6:39 p.m. UTC
When issuing an atomic write by the CoW method, give the block allocator a
hint to align to the extszhint.

This means that we have a better chance to issuing the atomic write via
HW offload next time.

It does mean that the inode extszhint should be set appropriately for the
expected atomic write size.

Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
 fs/xfs/libxfs/xfs_bmap.c | 4 ++++
 fs/xfs/libxfs/xfs_bmap.h | 6 +++++-
 fs/xfs/xfs_reflink.c     | 8 ++++++--
 3 files changed, 15 insertions(+), 3 deletions(-)

Comments

Christoph Hellwig March 12, 2025, 7:42 a.m. UTC | #1
>  	else if (ap->datatype & XFS_ALLOC_USERDATA)
>  		align = xfs_get_extsz_hint(ap->ip);
> +
> +	if (align > 1 && (ap->flags & XFS_BMAPI_EXTSZALIGN))
> +		args->alignment = align;
> +

Add a comment please.

> +/* Try to align allocations to the extent size hint */
> +#define XFS_BMAPI_EXTSZALIGN	(1u << 11)

Shouldn't we be doing this by default for any extent size hint
based allocations?

>  	bool			found;
>  	bool			atomic_sw = flags & XFS_REFLINK_ATOMIC_SW;
> +	uint32_t		bmapi_flags = XFS_BMAPI_COWFORK |
> +					      XFS_BMAPI_PREALLOC;
> +
> +	if (atomic_sw)
> +		bmapi_flags |= XFS_BMAPI_EXTSZALIGN;

Please add a comment why you are doing this.
John Garry March 12, 2025, 8:05 a.m. UTC | #2
On 12/03/2025 07:42, Christoph Hellwig wrote:
>>   	else if (ap->datatype & XFS_ALLOC_USERDATA)
>>   		align = xfs_get_extsz_hint(ap->ip);
>> +
>> +	if (align > 1 && (ap->flags & XFS_BMAPI_EXTSZALIGN))
>> +		args->alignment = align;
>> +
> 
> Add a comment please.

ok

> 
>> +/* Try to align allocations to the extent size hint */
>> +#define XFS_BMAPI_EXTSZALIGN	(1u << 11)
> 
> Shouldn't we be doing this by default for any extent size hint
> based allocations?

I'm not sure.

I think that currently users just expect extszhint to hint at the 
granularity only.

Maybe users don't require alignment and adding an alignment requirement 
just leads to more fragmentation.

> 
>>   	bool			found;
>>   	bool			atomic_sw = flags & XFS_REFLINK_ATOMIC_SW;
>> +	uint32_t		bmapi_flags = XFS_BMAPI_COWFORK |
>> +					      XFS_BMAPI_PREALLOC;
>> +
>> +	if (atomic_sw)
>> +		bmapi_flags |= XFS_BMAPI_EXTSZALIGN;
> 
> Please add a comment why you are doing this.
> 

Sure

Thanks,
John
Christoph Hellwig March 12, 2025, 1:45 p.m. UTC | #3
On Wed, Mar 12, 2025 at 08:05:14AM +0000, John Garry wrote:
> > Shouldn't we be doing this by default for any extent size hint
> > based allocations?
> 
> I'm not sure.
> 
> I think that currently users just expect extszhint to hint at the
> granularity only.
> 
> Maybe users don't require alignment and adding an alignment requirement just
> leads to more fragmentation.

But does it?  Once an extsize hint is set I'd expect that we keep
getting more allocation with it.  And keeping the aligned is the concept
of a buddy allocator which reduces fragmentation.  Because of that I
wonder why we aren't doing that by default.
John Garry March 12, 2025, 2:47 p.m. UTC | #4
On 12/03/2025 13:45, Christoph Hellwig wrote:
> On Wed, Mar 12, 2025 at 08:05:14AM +0000, John Garry wrote:
>>> Shouldn't we be doing this by default for any extent size hint
>>> based allocations?
>>
>> I'm not sure.
>>
>> I think that currently users just expect extszhint to hint at the
>> granularity only.
>>
>> Maybe users don't require alignment and adding an alignment requirement just
>> leads to more fragmentation.
> 
> But does it?  Once an extsize hint is set I'd expect that we keep
> getting more allocation with it. 

Sure, but that value is configurable per inode (so not all inodes may 
have it set)...but then it is also inherited.

> And keeping the aligned is the concept
> of a buddy allocator which reduces fragmentation.  Because of that I
> wonder why we aren't doing that by default.
> 

As I see, we just pay attention to stripe alignment. Dave also 
questioned this - Dave, any further idea on why this?

To me it could sense to use extszhint as the alignment requirement, but 
only if no stripe alignment. We also need to ensure to ignore this in 
low space allocator.

Thanks,
John
Darrick J. Wong March 12, 2025, 4 p.m. UTC | #5
On Wed, Mar 12, 2025 at 06:45:12AM -0700, Christoph Hellwig wrote:
> On Wed, Mar 12, 2025 at 08:05:14AM +0000, John Garry wrote:
> > > Shouldn't we be doing this by default for any extent size hint
> > > based allocations?
> > 
> > I'm not sure.
> > 
> > I think that currently users just expect extszhint to hint at the
> > granularity only.

Yes, the current behavior is that extszhint only affects the granularity
of the file range that's passed into the allocator.  To align the actual
space, you have to set the raid stripe parameters.

I can see how that sorta made sense in the old days -- the fs could get
moved between raid arrays (or the raid array gets reconfigured), so you
want the actual allocations to be aligned to whatever the current
hardware config advertises.  The extent size hint is merely a means to
amortize the cost of allocation/second-guess the delalloc machinery.

> > Maybe users don't require alignment and adding an alignment requirement just
> > leads to more fragmentation.
> 
> But does it?  Once an extsize hint is set I'd expect that we keep
> getting more allocation with it.  And keeping the aligned is the concept
> of a buddy allocator which reduces fragmentation.  Because of that I
> wonder why we aren't doing that by default.

Histerical raisins?

We /could/ let extszhint influence allocation alignment by default, but
then anyone who had (say) a 8k hint on a 32k raid stripe might be
surprised when the allocator behavior changes.

What do you say about logic like this?

	if (software_atomic) {
		/*
		 * align things so we can use hw atomic on the next
		 * overwrite, no matter what hw says
		 */
		args->alignment = ip->i_extsize;
	} else if (raid_stripe) {
		/* otherwise try to align for better raid performance */
		args->alignment = mp->m_dalign;
	} else if (ip->i_extsize) {
		/* if no raid, align to the hint provided */
		args->alignment = ip->i_extsize;
	} else {
		args->alignment = 1;
	}

Hm?  (I'm probably forgetting something...)

--D
John Garry March 12, 2025, 4:28 p.m. UTC | #6
On 12/03/2025 16:00, Darrick J. Wong wrote:
> On Wed, Mar 12, 2025 at 06:45:12AM -0700, Christoph Hellwig wrote:
>> On Wed, Mar 12, 2025 at 08:05:14AM +0000, John Garry wrote:
>>>> Shouldn't we be doing this by default for any extent size hint
>>>> based allocations?
>>>
>>> I'm not sure.
>>>
>>> I think that currently users just expect extszhint to hint at the
>>> granularity only.
> 
> Yes, the current behavior is that extszhint only affects the granularity
> of the file range that's passed into the allocator.  To align the actual
> space, you have to set the raid stripe parameters.
> 
> I can see how that sorta made sense in the old days -- the fs could get
> moved between raid arrays (or the raid array gets reconfigured), so you
> want the actual allocations to be aligned to whatever the current
> hardware config advertises.  The extent size hint is merely a means to
> amortize the cost of allocation/second-guess the delalloc machinery.
> 
>>> Maybe users don't require alignment and adding an alignment requirement just
>>> leads to more fragmentation.
>>
>> But does it?  Once an extsize hint is set I'd expect that we keep
>> getting more allocation with it.  And keeping the aligned is the concept
>> of a buddy allocator which reduces fragmentation.  Because of that I
>> wonder why we aren't doing that by default.
> 
> Histerical raisins?
> 
> We /could/ let extszhint influence allocation alignment by default, but
> then anyone who had (say) a 8k hint on a 32k raid stripe might be
> surprised when the allocator behavior changes.
> 
> What do you say about logic like this?
> 
> 	if (software_atomic) {
> 		/*
> 		 * align things so we can use hw atomic on the next
> 		 * overwrite, no matter what hw says
> 		 */
> 		args->alignment = ip->i_extsize;
 > 	} else if (raid_stripe) {> 		/* otherwise try to align for better 
raid performance */
> 		args->alignment = mp->m_dalign;
> 	} else if (ip->i_extsize) {
> 		/* if no raid, align to the hint provided */
> 		args->alignment = ip->i_extsize;
> 	} else {
> 		args->alignment = 1;
> 	}
> 
> Hm?  (I'm probably forgetting something...)
> 

note that for forcealign support, there was a prep patch to do something 
similar to this:
https://lore.kernel.org/linux-xfs/20240429174746.2132161-5-john.g.garry@oracle.com/


> --D
diff mbox series

Patch

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 63255820b58a..5ef5e4476209 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -3312,6 +3312,10 @@  xfs_bmap_compute_alignments(
 		align = xfs_get_cowextsz_hint(ap->ip);
 	else if (ap->datatype & XFS_ALLOC_USERDATA)
 		align = xfs_get_extsz_hint(ap->ip);
+
+	if (align > 1 && (ap->flags & XFS_BMAPI_EXTSZALIGN))
+		args->alignment = align;
+
 	if (align) {
 		if (xfs_bmap_extsize_align(mp, &ap->got, &ap->prev, align, 0,
 					ap->eof, 0, ap->conv, &ap->offset,
diff --git a/fs/xfs/libxfs/xfs_bmap.h b/fs/xfs/libxfs/xfs_bmap.h
index b4d9c6e0f3f9..d5f2729305fa 100644
--- a/fs/xfs/libxfs/xfs_bmap.h
+++ b/fs/xfs/libxfs/xfs_bmap.h
@@ -87,6 +87,9 @@  struct xfs_bmalloca {
 /* Do not update the rmap btree.  Used for reconstructing bmbt from rmapbt. */
 #define XFS_BMAPI_NORMAP	(1u << 10)
 
+/* Try to align allocations to the extent size hint */
+#define XFS_BMAPI_EXTSZALIGN	(1u << 11)
+
 #define XFS_BMAPI_FLAGS \
 	{ XFS_BMAPI_ENTIRE,	"ENTIRE" }, \
 	{ XFS_BMAPI_METADATA,	"METADATA" }, \
@@ -98,7 +101,8 @@  struct xfs_bmalloca {
 	{ XFS_BMAPI_REMAP,	"REMAP" }, \
 	{ XFS_BMAPI_COWFORK,	"COWFORK" }, \
 	{ XFS_BMAPI_NODISCARD,	"NODISCARD" }, \
-	{ XFS_BMAPI_NORMAP,	"NORMAP" }
+	{ XFS_BMAPI_NORMAP,	"NORMAP" },\
+	{ XFS_BMAPI_EXTSZALIGN,	"EXTSZALIGN" }
 
 
 static inline int xfs_bmapi_aflag(int w)
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index ce1fd58dff35..d577db42baab 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -445,6 +445,11 @@  xfs_reflink_fill_cow_hole(
 	int			error;
 	bool			found;
 	bool			atomic_sw = flags & XFS_REFLINK_ATOMIC_SW;
+	uint32_t		bmapi_flags = XFS_BMAPI_COWFORK |
+					      XFS_BMAPI_PREALLOC;
+
+	if (atomic_sw)
+		bmapi_flags |= XFS_BMAPI_EXTSZALIGN;
 
 	resaligned = xfs_aligned_fsb_count(imap->br_startoff,
 		imap->br_blockcount, xfs_get_cowextsz_hint(ip));
@@ -478,8 +483,7 @@  xfs_reflink_fill_cow_hole(
 	/* Allocate the entire reservation as unwritten blocks. */
 	nimaps = 1;
 	error = xfs_bmapi_write(tp, ip, imap->br_startoff, imap->br_blockcount,
-			XFS_BMAPI_COWFORK | XFS_BMAPI_PREALLOC, 0, cmap,
-			&nimaps);
+			bmapi_flags, 0, cmap, &nimaps);
 	if (error)
 		goto out_trans_cancel;