[2/4] xfs: measure all contiguous previous extents for prealloc size

Message ID 159011598984.76931.15076402801787913960.stgit@magnolia (mailing list archive)
State Superseded, archived
Series: xfs: fix stale disk exposure after crash

Commit Message

Darrick J. Wong May 22, 2020, 2:53 a.m. UTC
From: Darrick J. Wong <darrick.wong@oracle.com>

When we're estimating a new speculative preallocation length for an
extending write, we should walk backwards through the extent list to
determine the number of blocks that are physically and
logically contiguous with the write offset, and use that as an input to
the preallocation size computation.

This way, preallocation length is truly measured by the effectiveness of
the allocator in giving us contiguous allocations without being
influenced by the state of a given extent.  This fixes both the problem
where ZERO_RANGE within EOF can reduce preallocation and the unnecessary
shrinkage of preallocation when delalloc extents are turned into
unwritten extents.

This was found as a regression in xfs/014 after changing delalloc writes
to create unwritten extents during writeback.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/xfs_iomap.c |   37 +++++++++++++++++++++++++------------
 1 file changed, 25 insertions(+), 12 deletions(-)

Comments

Christoph Hellwig May 22, 2020, 6:56 a.m. UTC | #1
On Thu, May 21, 2020 at 07:53:09PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> When we're estimating a new speculative preallocation length for an
> extending write, we should walk backwards through the extent list to
> determine the number of blocks that are physically and
> logically contiguous with the write offset, and use that as an input to
> the preallocation size computation.
> 
> This way, preallocation length is truly measured by the effectiveness of
> the allocator in giving us contiguous allocations without being
> influenced by the state of a given extent.  This fixes both the problem
> where ZERO_RANGE within an EOF can reduce preallocation, and prevents
> the unnecessary shrinkage of preallocation when delalloc extents are
> turned into unwritten extents.
> 
> This was found as a regression in xfs/014 after changing delalloc writes
> to create unwritten extents during writeback.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

A minor nitpick, though:

> +	struct xfs_iext_cursor	ncur = *icur; /* struct copy */
>
> +	struct xfs_bmbt_irec	prev, got;

The comment is pretty pointless, as the struct copy is obvious from
the syntax (and we do it for the xfs_iext_cursor structure in quite a
few other places).

Also please don't add empty lines between the variable declarations.
Brian Foster May 22, 2020, 11:27 a.m. UTC | #2
On Thu, May 21, 2020 at 07:53:09PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> When we're estimating a new speculative preallocation length for an
> extending write, we should walk backwards through the extent list to
> determine the number of blocks that are physically and
> logically contiguous with the write offset, and use that as an input to
> the preallocation size computation.
> 
> This way, preallocation length is truly measured by the effectiveness of
> the allocator in giving us contiguous allocations without being
> influenced by the state of a given extent.  This fixes both the problem
> where ZERO_RANGE within an EOF can reduce preallocation, and prevents
> the unnecessary shrinkage of preallocation when delalloc extents are
> turned into unwritten extents.
> 
> This was found as a regression in xfs/014 after changing delalloc writes
> to create unwritten extents during writeback.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  fs/xfs/xfs_iomap.c |   37 +++++++++++++++++++++++++------------
>  1 file changed, 25 insertions(+), 12 deletions(-)
> 
> 
> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> index ac970b13b1f8..6a308af93893 100644
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c
> @@ -377,15 +377,17 @@ xfs_iomap_prealloc_size(
>  	loff_t			count,
>  	struct xfs_iext_cursor	*icur)
>  {
> +	struct xfs_iext_cursor	ncur = *icur; /* struct copy */
> +	struct xfs_bmbt_irec	prev, got;
>  	struct xfs_mount	*mp = ip->i_mount;
>  	struct xfs_ifork	*ifp = XFS_IFORK_PTR(ip, whichfork);
>  	xfs_fileoff_t		offset_fsb = XFS_B_TO_FSBT(mp, offset);
> -	struct xfs_bmbt_irec	prev;
> -	int			shift = 0;
>  	int64_t			freesp;
>  	xfs_fsblock_t		qblocks;
> -	int			qshift = 0;
>  	xfs_fsblock_t		alloc_blocks = 0;
> +	xfs_extlen_t		plen;
> +	int			shift = 0;
> +	int			qshift = 0;
>  
>  	if (offset + count <= XFS_ISIZE(ip))
>  		return 0;
> @@ -413,16 +415,27 @@ xfs_iomap_prealloc_size(
>  	 * preallocation size.
>  	 *
>  	 * If the extent is a hole, then preallocation is essentially disabled.
> -	 * Otherwise we take the size of the preceding data extent as the basis
> -	 * for the preallocation size. If the size of the extent is greater than
> -	 * half the maximum extent length, then use the current offset as the
> -	 * basis. This ensures that for large files the preallocation size
> -	 * always extends to MAXEXTLEN rather than falling short due to things
> -	 * like stripe unit/width alignment of real extents.
> +	 * Otherwise we take the size of the preceding data extents as the basis
> +	 * for the preallocation size. Note that we don't care if the previous
> +	 * extents are written or not.
> +	 *
> +	 * If the size of the extents is greater than half the maximum extent
> +	 * length, then use the current offset as the basis. This ensures that
> +	 * for large files the preallocation size always extends to MAXEXTLEN
> +	 * rather than falling short due to things like stripe unit/width
> +	 * alignment of real extents.
>  	 */
> -	if (prev.br_blockcount <= (MAXEXTLEN >> 1))
> -		alloc_blocks = prev.br_blockcount << 1;
> -	else
> +	plen = prev.br_blockcount;

If prev is initialized by peeking the previous extent, then it looks
like the first iteration of this loop compares the immediately previous
extent with itself..

> +	while (xfs_iext_prev_extent(ifp, &ncur, &got)) {
> +		if (plen > MAXEXTLEN / 2 ||
> +		    got.br_startoff + got.br_blockcount != prev.br_startoff ||
> +		    got.br_startblock + got.br_blockcount != prev.br_startblock)

We should probably check for nullstartblock (delalloc) extents
explicitly here rather than rely on the calculation to fail.

> +			break;
> +		plen += got.br_blockcount;



> +		prev = got;
> +	}
> +	alloc_blocks = plen * 2;

Why do we replace the bit shifts with division/multiplication? I'd
prefer to see the former for obvious power of 2 operations, even if this
happens to be 32-bit arithmetic. I don't see any particular reason to
change it in this patch.

Brian

> +	if (alloc_blocks > MAXEXTLEN)
>  		alloc_blocks = XFS_B_TO_FSB(mp, offset);
>  	if (!alloc_blocks)
>  		goto check_writeio;
>
Darrick J. Wong May 23, 2020, 12:25 a.m. UTC | #3
On Thu, May 21, 2020 at 11:56:50PM -0700, Christoph Hellwig wrote:
> On Thu, May 21, 2020 at 07:53:09PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > When we're estimating a new speculative preallocation length for an
> > extending write, we should walk backwards through the extent list to
> > determine the number of blocks that are physically and
> > logically contiguous with the write offset, and use that as an input to
> > the preallocation size computation.
> > 
> > This way, preallocation length is truly measured by the effectiveness of
> > the allocator in giving us contiguous allocations without being
> > influenced by the state of a given extent.  This fixes both the problem
> > where ZERO_RANGE within an EOF can reduce preallocation, and prevents
> > the unnecessary shrinkage of preallocation when delalloc extents are
> > turned into unwritten extents.
> > 
> > This was found as a regression in xfs/014 after changing delalloc writes
> > to create unwritten extents during writeback.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Looks good,
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> 
> A minor nitpick, though:
> 
> > +	struct xfs_iext_cursor	ncur = *icur; /* struct copy */
> >
> > +	struct xfs_bmbt_irec	prev, got;
> 
> The comment is pretty pointless, as the struct copy is obvious from
> the syntax (and we do it for the xfs_iext_cursor structure in quite a
> few other places).
> 
> Also please don't add empty lines between the variable declarations.

I didn't... not sure where that came from.

--D
Darrick J. Wong May 23, 2020, 12:27 a.m. UTC | #4
On Fri, May 22, 2020 at 07:27:22AM -0400, Brian Foster wrote:
> On Thu, May 21, 2020 at 07:53:09PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > When we're estimating a new speculative preallocation length for an
> > extending write, we should walk backwards through the extent list to
> > determine the number of number of blocks that are physically and
> > logically contiguous with the write offset, and use that as an input to
> > the preallocation size computation.
> > 
> > This way, preallocation length is truly measured by the effectiveness of
> > the allocator in giving us contiguous allocations without being
> > influenced by the state of a given extent.  This fixes both the problem
> > where ZERO_RANGE within an EOF can reduce preallocation, and prevents
> > the unnecessary shrinkage of preallocation when delalloc extents are
> > turned into unwritten extents.
> > 
> > This was found as a regression in xfs/014 after changing delalloc writes
> > to create unwritten extents during writeback.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> >  fs/xfs/xfs_iomap.c |   37 +++++++++++++++++++++++++------------
> >  1 file changed, 25 insertions(+), 12 deletions(-)
> > 
> > 
> > diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> > index ac970b13b1f8..6a308af93893 100644
> > --- a/fs/xfs/xfs_iomap.c
> > +++ b/fs/xfs/xfs_iomap.c
> > @@ -377,15 +377,17 @@ xfs_iomap_prealloc_size(
> >  	loff_t			count,
> >  	struct xfs_iext_cursor	*icur)
> >  {
> > +	struct xfs_iext_cursor	ncur = *icur; /* struct copy */
> > +	struct xfs_bmbt_irec	prev, got;
> >  	struct xfs_mount	*mp = ip->i_mount;
> >  	struct xfs_ifork	*ifp = XFS_IFORK_PTR(ip, whichfork);
> >  	xfs_fileoff_t		offset_fsb = XFS_B_TO_FSBT(mp, offset);
> > -	struct xfs_bmbt_irec	prev;
> > -	int			shift = 0;
> >  	int64_t			freesp;
> >  	xfs_fsblock_t		qblocks;
> > -	int			qshift = 0;
> >  	xfs_fsblock_t		alloc_blocks = 0;
> > +	xfs_extlen_t		plen;
> > +	int			shift = 0;
> > +	int			qshift = 0;
> >  
> >  	if (offset + count <= XFS_ISIZE(ip))
> >  		return 0;
> > @@ -413,16 +415,27 @@ xfs_iomap_prealloc_size(
> >  	 * preallocation size.
> >  	 *
> >  	 * If the extent is a hole, then preallocation is essentially disabled.
> > -	 * Otherwise we take the size of the preceding data extent as the basis
> > -	 * for the preallocation size. If the size of the extent is greater than
> > -	 * half the maximum extent length, then use the current offset as the
> > -	 * basis. This ensures that for large files the preallocation size
> > -	 * always extends to MAXEXTLEN rather than falling short due to things
> > -	 * like stripe unit/width alignment of real extents.
> > +	 * Otherwise we take the size of the preceding data extents as the basis
> > +	 * for the preallocation size. Note that we don't care if the previous
> > +	 * extents are written or not.
> > +	 *
> > +	 * If the size of the extents is greater than half the maximum extent
> > +	 * length, then use the current offset as the basis. This ensures that
> > +	 * for large files the preallocation size always extends to MAXEXTLEN
> > +	 * rather than falling short due to things like stripe unit/width
> > +	 * alignment of real extents.
> >  	 */
> > -	if (prev.br_blockcount <= (MAXEXTLEN >> 1))
> > -		alloc_blocks = prev.br_blockcount << 1;
> > -	else
> > +	plen = prev.br_blockcount;
> 
> If prev is initialized by peeking the previous extent, then it looks
> like the first iteration of this loop compares the immediately previous
> extent with itself..

D'oh.  I misported that when I was munging patches around.  Since we
copy *icur to ncur, we can replace the previous peek against icur with a
call to xfs_iext_prev_extent on ncur.

> > +	while (xfs_iext_prev_extent(ifp, &ncur, &got)) {
> > +		if (plen > MAXEXTLEN / 2 ||
> > +		    got.br_startoff + got.br_blockcount != prev.br_startoff ||
> > +		    got.br_startblock + got.br_blockcount != prev.br_startblock)
> 
> We should probably check for nullstartblock (delalloc) extents
> explicitly here rather than rely on the calculation to fail.

Ok.

> > +			break;
> > +		plen += got.br_blockcount;
> 
> 
> 
> > +		prev = got;
> > +	}
> > +	alloc_blocks = plen * 2;
> 
> Why do we replace the bit shifts with division/multiplication? I'd
> prefer to see the former for obvious power of 2 operations, even if this
> happens to be 32-bit arithmetic. I don't see any particular reason to
> change it in this patch.

Fair enough.  Thanks for catching those things.

--D

> Brian
> 
> > +	if (alloc_blocks > MAXEXTLEN)
> >  		alloc_blocks = XFS_B_TO_FSB(mp, offset);
> >  	if (!alloc_blocks)
> >  		goto check_writeio;
> > 
>
Christoph Hellwig May 23, 2020, 7:09 a.m. UTC | #5
On Fri, May 22, 2020 at 07:27:22AM -0400, Brian Foster wrote:
> Why do we replace the bit shifts with division/multiplication? I'd
> prefer to see the former for obvious power of 2 operations, even if this
> happens to be 32-bit arithmetic. I don't see any particular reason to
> change it in this patch.

Because it is the natural way to express the operation and matches what
is said in the comments.

Patch

diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index ac970b13b1f8..6a308af93893 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -377,15 +377,17 @@  xfs_iomap_prealloc_size(
 	loff_t			count,
 	struct xfs_iext_cursor	*icur)
 {
+	struct xfs_iext_cursor	ncur = *icur; /* struct copy */
+	struct xfs_bmbt_irec	prev, got;
 	struct xfs_mount	*mp = ip->i_mount;
 	struct xfs_ifork	*ifp = XFS_IFORK_PTR(ip, whichfork);
 	xfs_fileoff_t		offset_fsb = XFS_B_TO_FSBT(mp, offset);
-	struct xfs_bmbt_irec	prev;
-	int			shift = 0;
 	int64_t			freesp;
 	xfs_fsblock_t		qblocks;
-	int			qshift = 0;
 	xfs_fsblock_t		alloc_blocks = 0;
+	xfs_extlen_t		plen;
+	int			shift = 0;
+	int			qshift = 0;
 
 	if (offset + count <= XFS_ISIZE(ip))
 		return 0;
@@ -413,16 +415,27 @@  xfs_iomap_prealloc_size(
 	 * preallocation size.
 	 *
 	 * If the extent is a hole, then preallocation is essentially disabled.
-	 * Otherwise we take the size of the preceding data extent as the basis
-	 * for the preallocation size. If the size of the extent is greater than
-	 * half the maximum extent length, then use the current offset as the
-	 * basis. This ensures that for large files the preallocation size
-	 * always extends to MAXEXTLEN rather than falling short due to things
-	 * like stripe unit/width alignment of real extents.
+	 * Otherwise we take the size of the preceding data extents as the basis
+	 * for the preallocation size. Note that we don't care if the previous
+	 * extents are written or not.
+	 *
+	 * If the size of the extents is greater than half the maximum extent
+	 * length, then use the current offset as the basis. This ensures that
+	 * for large files the preallocation size always extends to MAXEXTLEN
+	 * rather than falling short due to things like stripe unit/width
+	 * alignment of real extents.
 	 */
-	if (prev.br_blockcount <= (MAXEXTLEN >> 1))
-		alloc_blocks = prev.br_blockcount << 1;
-	else
+	plen = prev.br_blockcount;
+	while (xfs_iext_prev_extent(ifp, &ncur, &got)) {
+		if (plen > MAXEXTLEN / 2 ||
+		    got.br_startoff + got.br_blockcount != prev.br_startoff ||
+		    got.br_startblock + got.br_blockcount != prev.br_startblock)
+			break;
+		plen += got.br_blockcount;
+		prev = got;
+	}
+	alloc_blocks = plen * 2;
+	if (alloc_blocks > MAXEXTLEN)
 		alloc_blocks = XFS_B_TO_FSB(mp, offset);
 	if (!alloc_blocks)
 		goto check_writeio;