Message ID | 20250310183946.932054-10-john.g.garry@oracle.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | large atomic writes for xfs with CoW | expand |
> else if (ap->datatype & XFS_ALLOC_USERDATA) > align = xfs_get_extsz_hint(ap->ip); > + > + if (align > 1 && (ap->flags & XFS_BMAPI_EXTSZALIGN)) > + args->alignment = align; > + Add a comment please. > +/* Try to align allocations to the extent size hint */ > +#define XFS_BMAPI_EXTSZALIGN (1u << 11) Shouldn't we be doing this by default for any extent size hint based allocations? > bool found; > bool atomic_sw = flags & XFS_REFLINK_ATOMIC_SW; > + uint32_t bmapi_flags = XFS_BMAPI_COWFORK | > + XFS_BMAPI_PREALLOC; > + > + if (atomic_sw) > + bmapi_flags |= XFS_BMAPI_EXTSZALIGN; Please add a comment why you are doing this.
On 12/03/2025 07:42, Christoph Hellwig wrote: >> else if (ap->datatype & XFS_ALLOC_USERDATA) >> align = xfs_get_extsz_hint(ap->ip); >> + >> + if (align > 1 && (ap->flags & XFS_BMAPI_EXTSZALIGN)) >> + args->alignment = align; >> + > > Add a comment please. ok > >> +/* Try to align allocations to the extent size hint */ >> +#define XFS_BMAPI_EXTSZALIGN (1u << 11) > > Shouldn't we be doing this by default for any extent size hint > based allocations? I'm not sure. I think that currently users just expect extszhint to hint at the granularity only. Maybe users don't require alignment and adding an alignment requirement just leads to more fragmentation. > >> bool found; >> bool atomic_sw = flags & XFS_REFLINK_ATOMIC_SW; >> + uint32_t bmapi_flags = XFS_BMAPI_COWFORK | >> + XFS_BMAPI_PREALLOC; >> + >> + if (atomic_sw) >> + bmapi_flags |= XFS_BMAPI_EXTSZALIGN; > > Please add a comment why you are doing this. > Sure Thanks, John
On Wed, Mar 12, 2025 at 08:05:14AM +0000, John Garry wrote: > > Shouldn't we be doing this by default for any extent size hint > > based allocations? > > I'm not sure. > > I think that currently users just expect extszhint to hint at the > granularity only. > > Maybe users don't require alignment and adding an alignment requirement just > leads to more fragmentation. But does it? Once an extsize hint is set I'd expect that we keep getting more allocation with it. And keeping the aligned is the concept of a buddy allocator which reduces fragmentation. Because of that I wonder why we aren't doing that by default.
On 12/03/2025 13:45, Christoph Hellwig wrote: > On Wed, Mar 12, 2025 at 08:05:14AM +0000, John Garry wrote: >>> Shouldn't we be doing this by default for any extent size hint >>> based allocations? >> >> I'm not sure. >> >> I think that currently users just expect extszhint to hint at the >> granularity only. >> >> Maybe users don't require alignment and adding an alignment requirement just >> leads to more fragmentation. > > But does it? Once an extsize hint is set I'd expect that we keep > getting more allocation with it. Sure, but that value is configurable per inode (so not all inodes may have it set)...but then it is also inherited. > And keeping the aligned is the concept > of a buddy allocator which reduces fragmentation. Because of that I > wonder why we aren't doing that by default. > As I see, we just pay attention to stripe alignment. Dave also questioned this - Dave, any further idea on why this? To me it could sense to use extszhint as the alignment requirement, but only if no stripe alignment. We also need to ensure to ignore this in low space allocator. Thanks, John
On Wed, Mar 12, 2025 at 06:45:12AM -0700, Christoph Hellwig wrote: > On Wed, Mar 12, 2025 at 08:05:14AM +0000, John Garry wrote: > > > Shouldn't we be doing this by default for any extent size hint > > > based allocations? > > > > I'm not sure. > > > > I think that currently users just expect extszhint to hint at the > > granularity only. Yes, the current behavior is that extszhint only affects the granularity of the file range that's passed into the allocator. To align the actual space, you have to set the raid stripe parameters. I can see how that sorta made sense in the old days -- the fs could get moved between raid arrays (or the raid array gets reconfigured), so you want the actual allocations to be aligned to whatever the current hardware config advertises. The extent size hint is merely a means to amortize the cost of allocation/second-guess the delalloc machinery. > > Maybe users don't require alignment and adding an alignment requirement just > > leads to more fragmentation. > > But does it? Once an extsize hint is set I'd expect that we keep > getting more allocation with it. And keeping the aligned is the concept > of a buddy allocator which reduces fragmentation. Because of that I > wonder why we aren't doing that by default. Histerical raisins? We /could/ let extszhint influence allocation alignment by default, but then anyone who had (say) a 8k hint on a 32k raid stripe might be surprised when the allocator behavior changes. What do you say about logic like this? if (software_atomic) { /* * align things so we can use hw atomic on the next * overwrite, no matter what hw says */ args->alignment = ip->i_extsize; } else if (raid_stripe) { /* otherwise try to align for better raid performance */ args->alignment = mp->m_dalign; } else if (ip->i_extsize) { /* if no raid, align to the hint provided */ args->alignment = ip->i_extsize; } else { args->alignment = 1; } Hm? (I'm probably forgetting something...) --D
On 12/03/2025 16:00, Darrick J. Wong wrote: > On Wed, Mar 12, 2025 at 06:45:12AM -0700, Christoph Hellwig wrote: >> On Wed, Mar 12, 2025 at 08:05:14AM +0000, John Garry wrote: >>>> Shouldn't we be doing this by default for any extent size hint >>>> based allocations? >>> >>> I'm not sure. >>> >>> I think that currently users just expect extszhint to hint at the >>> granularity only. > > Yes, the current behavior is that extszhint only affects the granularity > of the file range that's passed into the allocator. To align the actual > space, you have to set the raid stripe parameters. > > I can see how that sorta made sense in the old days -- the fs could get > moved between raid arrays (or the raid array gets reconfigured), so you > want the actual allocations to be aligned to whatever the current > hardware config advertises. The extent size hint is merely a means to > amortize the cost of allocation/second-guess the delalloc machinery. > >>> Maybe users don't require alignment and adding an alignment requirement just >>> leads to more fragmentation. >> >> But does it? Once an extsize hint is set I'd expect that we keep >> getting more allocation with it. And keeping the aligned is the concept >> of a buddy allocator which reduces fragmentation. Because of that I >> wonder why we aren't doing that by default. > > Histerical raisins? > > We /could/ let extszhint influence allocation alignment by default, but > then anyone who had (say) a 8k hint on a 32k raid stripe might be > surprised when the allocator behavior changes. > > What do you say about logic like this? > > if (software_atomic) { > /* > * align things so we can use hw atomic on the next > * overwrite, no matter what hw says > */ > args->alignment = ip->i_extsize; > } else if (raid_stripe) {> /* otherwise try to align for better raid performance */ > args->alignment = mp->m_dalign; > } else if (ip->i_extsize) { > /* if no raid, align to the hint provided */ > args->alignment = ip->i_extsize; > } else { > args->alignment = 1; > } > > Hm? (I'm probably forgetting something...) > note that for forcealign support, there was a prep patch to do something similar to this: https://lore.kernel.org/linux-xfs/20240429174746.2132161-5-john.g.garry@oracle.com/ > --D
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c index 63255820b58a..5ef5e4476209 100644 --- a/fs/xfs/libxfs/xfs_bmap.c +++ b/fs/xfs/libxfs/xfs_bmap.c @@ -3312,6 +3312,10 @@ xfs_bmap_compute_alignments( align = xfs_get_cowextsz_hint(ap->ip); else if (ap->datatype & XFS_ALLOC_USERDATA) align = xfs_get_extsz_hint(ap->ip); + + if (align > 1 && (ap->flags & XFS_BMAPI_EXTSZALIGN)) + args->alignment = align; + if (align) { if (xfs_bmap_extsize_align(mp, &ap->got, &ap->prev, align, 0, ap->eof, 0, ap->conv, &ap->offset, diff --git a/fs/xfs/libxfs/xfs_bmap.h b/fs/xfs/libxfs/xfs_bmap.h index b4d9c6e0f3f9..d5f2729305fa 100644 --- a/fs/xfs/libxfs/xfs_bmap.h +++ b/fs/xfs/libxfs/xfs_bmap.h @@ -87,6 +87,9 @@ struct xfs_bmalloca { /* Do not update the rmap btree. Used for reconstructing bmbt from rmapbt. */ #define XFS_BMAPI_NORMAP (1u << 10) +/* Try to align allocations to the extent size hint */ +#define XFS_BMAPI_EXTSZALIGN (1u << 11) + #define XFS_BMAPI_FLAGS \ { XFS_BMAPI_ENTIRE, "ENTIRE" }, \ { XFS_BMAPI_METADATA, "METADATA" }, \ @@ -98,7 +101,8 @@ struct xfs_bmalloca { { XFS_BMAPI_REMAP, "REMAP" }, \ { XFS_BMAPI_COWFORK, "COWFORK" }, \ { XFS_BMAPI_NODISCARD, "NODISCARD" }, \ - { XFS_BMAPI_NORMAP, "NORMAP" } + { XFS_BMAPI_NORMAP, "NORMAP" },\ + { XFS_BMAPI_EXTSZALIGN, "EXTSZALIGN" } static inline int xfs_bmapi_aflag(int w) diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c index ce1fd58dff35..d577db42baab 100644 --- a/fs/xfs/xfs_reflink.c +++ b/fs/xfs/xfs_reflink.c @@ -445,6 +445,11 @@ xfs_reflink_fill_cow_hole( int error; bool found; bool atomic_sw = flags & XFS_REFLINK_ATOMIC_SW; + uint32_t bmapi_flags = XFS_BMAPI_COWFORK | + XFS_BMAPI_PREALLOC; + + if (atomic_sw) + bmapi_flags |= XFS_BMAPI_EXTSZALIGN; resaligned = xfs_aligned_fsb_count(imap->br_startoff, imap->br_blockcount, xfs_get_cowextsz_hint(ip)); @@ -478,8 +483,7 @@ xfs_reflink_fill_cow_hole( /* Allocate the entire reservation as unwritten blocks. */ nimaps = 1; error = xfs_bmapi_write(tp, ip, imap->br_startoff, imap->br_blockcount, - XFS_BMAPI_COWFORK | XFS_BMAPI_PREALLOC, 0, cmap, - &nimaps); + bmapi_flags, 0, cmap, &nimaps); if (error) goto out_trans_cancel;