[8/8] xfs: do not allocate the entire delalloc extent in xfs_bmapi_write

Message ID 20240408145454.718047-9-hch@lst.de (mailing list archive)
State Superseded
Series [1/8] xfs: fix error returns from xfs_bmapi_write

Commit Message

Christoph Hellwig April 8, 2024, 2:54 p.m. UTC
While trying to convert the entire delalloc extent is a good decision
for regular writeback, as it leads to larger contiguous on-disk extents,
it is rather questionable for other callers of xfs_bmapi_write, as
it forces them to loop creating new transactions just in case there
is no contiguous extent large enough to cover the whole delalloc
reservation.

Change xfs_bmapi_write to only allocate the passed-in range instead.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/libxfs/xfs_bmap.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

Comments

Darrick J. Wong April 9, 2024, 11:16 p.m. UTC | #1
On Mon, Apr 08, 2024 at 04:54:54PM +0200, Christoph Hellwig wrote:
> While trying to convert the entire delalloc extent is a good decision
> for regular writeback, as it leads to larger contiguous on-disk extents,
> it is rather questionable for other callers of xfs_bmapi_write, as
> it forces them to loop creating new transactions just in case there
> is no contiguous extent large enough to cover the whole delalloc
> reservation.
> 
> Change xfs_bmapi_write to only allocate the passed-in range instead.

Looking at this... I guess xfs_map_blocks -> xfs_convert_blocks ->
xfs_bmapi_convert_delalloc -> xfs_bmapi_allocate is now how writeback
converts delalloc extents before scheduling writeout.  This is how the
mass-conversions of large da reservations got done before this series,
and that's still how it works, right?

Whereas xfs_bmapi_write is for targeted conversions only?

> Signed-off-by: Christoph Hellwig <hch@lst.de>

If yes and yes, then:
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  fs/xfs/libxfs/xfs_bmap.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> index 7700a48e013d5a..748809b13113ab 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -4533,8 +4533,9 @@ xfs_bmapi_write(
>  			bma.length = XFS_FILBLKS_MIN(len, XFS_MAX_BMBT_EXTLEN);
>  
>  			if (wasdelay) {
> -				bma.offset = bma.got.br_startoff;
> -				bma.length = bma.got.br_blockcount;
> +				bma.length = XFS_FILBLKS_MIN(bma.length,
> +					bma.got.br_blockcount -
> +					(bno - bma.got.br_startoff));
>  			} else {
>  				if (!eof)
>  					bma.length = XFS_FILBLKS_MIN(bma.length,
> -- 
> 2.39.2
> 
>
Christoph Hellwig April 10, 2024, 4:07 a.m. UTC | #2
On Tue, Apr 09, 2024 at 04:16:01PM -0700, Darrick J. Wong wrote:
> On Mon, Apr 08, 2024 at 04:54:54PM +0200, Christoph Hellwig wrote:
> > While trying to convert the entire delalloc extent is a good decision
> > for regular writeback, as it leads to larger contiguous on-disk extents,
> > it is rather questionable for other callers of xfs_bmapi_write, as
> > it forces them to loop creating new transactions just in case there
> > is no contiguous extent large enough to cover the whole delalloc
> > reservation.
> > 
> > Change xfs_bmapi_write to only allocate the passed-in range instead.
> 
> Looking at this... I guess xfs_map_blocks -> xfs_convert_blocks ->
> xfs_bmapi_convert_delalloc -> xfs_bmapi_allocate is now how writeback
> converts delalloc extents before scheduling writeout.  This is how the
> mass-conversions of large da reservations got done before this series,
> and that's still how it works, right?

Yes and yes.

> Whereas xfs_bmapi_write is for targeted conversions only?

Targeted is one way to describe it; the other way to look at it is
that xfs_bmapi_write is used where the callers want to allocate
real (or unwritten) extents for a range, and it just happens to
convert delalloc as a side-effect because those callers don't want
to deal with delalloc extents.

> 
> > Signed-off-by: Christoph Hellwig <hch@lst.de>
> 
> If yes and yes, then:
> Reviewed-by: Darrick J. Wong <djwong@kernel.org>

So, as mentioned in the cover letter, I'm quite worried about the
new behavior we expose here: we always converted delalloc
extents from the start and tried to convert them to the end,
and this now changes that.  So while the change looks quite
simple, it exposes a lot of previously untested code and behavior.

It's probably the right thing to do but quite risky; let me know
what you think about it.

Patch

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 7700a48e013d5a..748809b13113ab 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -4533,8 +4533,9 @@  xfs_bmapi_write(
 			bma.length = XFS_FILBLKS_MIN(len, XFS_MAX_BMBT_EXTLEN);
 
 			if (wasdelay) {
-				bma.offset = bma.got.br_startoff;
-				bma.length = bma.got.br_blockcount;
+				bma.length = XFS_FILBLKS_MIN(bma.length,
+					bma.got.br_blockcount -
+					(bno - bma.got.br_startoff));
 			} else {
 				if (!eof)
 					bma.length = XFS_FILBLKS_MIN(bma.length,