diff mbox series

[v2] xfs: ensure st_blocks never goes to zero during COW writes

Message ID 20240824033814.1162964-1-hch@lst.de (mailing list archive)
State Superseded, archived
Headers show
Series [v2] xfs: ensure st_blocks never goes to zero during COW writes | expand

Commit Message

Christoph Hellwig Aug. 24, 2024, 3:37 a.m. UTC
COW writes remove the amount overwritten either directly for delalloc
reservations, or in earlier deferred transactions than adding the new
amount back in the bmap map transaction.  This means st_blocks on an
inode where all data is overwritten using the COW path can temporarily
show a 0 st_blocks.  This can easily be reproduced with the pending
zoned device support where all writes use this path and trips the
check in generic/615, but could also happen on a reflink file without
that.

Fix this by temporarily add the pending blocks to be mapped to
i_delayed_blks while the item is queued.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---

Changes since v1:
 - slightly more and slightly improved comments

 fs/xfs/libxfs/xfs_bmap.c |  1 +
 fs/xfs/xfs_bmap_item.c   | 18 ++++++++++++++++++
 2 files changed, 19 insertions(+)

Comments

Darrick J. Wong Aug. 24, 2024, 4:47 a.m. UTC | #1
On Sat, Aug 24, 2024 at 05:37:52AM +0200, Christoph Hellwig wrote:
> COW writes remove the amount overwritten either directly for delalloc
> reservations, or in earlier deferred transactions than adding the new
> amount back in the bmap map transaction.  This means st_blocks on an
> inode where all data is overwritten using the COW path can temporarily
> show a 0 st_blocks.  This can easily be reproduced with the pending
> zoned device support where all writes use this path and trips the
> check in generic/615, but could also happen on a reflink file without
> that.
> 
> Fix this by temporarily add the pending blocks to be mapped to
> i_delayed_blks while the item is queued.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> 
> Changes since v1:
>  - slightly more and slightly improved comments
> 
>  fs/xfs/libxfs/xfs_bmap.c |  1 +
>  fs/xfs/xfs_bmap_item.c   | 18 ++++++++++++++++++
>  2 files changed, 19 insertions(+)
> 
> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> index 7df74c35d9f900..177735c30d273a 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -4847,6 +4847,7 @@ xfs_bmapi_remap(
>  	}
>  
>  	ip->i_nblocks += len;
> +	ip->i_delayed_blks -= len; /* see xfs_bmap_defer_add */
>  	xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
>  
>  	if (ifp->if_format == XFS_DINODE_FMT_BTREE)
> diff --git a/fs/xfs/xfs_bmap_item.c b/fs/xfs/xfs_bmap_item.c
> index e224b49b7cff6d..4b8838e5b5bebc 100644
> --- a/fs/xfs/xfs_bmap_item.c
> +++ b/fs/xfs/xfs_bmap_item.c
> @@ -346,6 +346,18 @@ xfs_bmap_defer_add(
>  	trace_xfs_bmap_defer(bi);
>  
>  	xfs_bmap_update_get_group(tp->t_mountp, bi);
> +
> +	/*
> +	 * Ensure the deferred mapping is pre-recorded in i_delayed_blks.
> +	 *
> +	 * Otherwise stat can report zero blocks for an inode that actually has
> +	 * data when the entire mapping is in the process of being overwritten
> +	 * using the out of place write path. This is undone in after

Nit: "...undone after xfs_bmapi_remap..."

With that fixed,
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> +	 * xfs_bmapi_remap has incremented di_nblocks for a successful
> +	 * operation.
> +	 */
> +	if (bi->bi_type == XFS_BMAP_MAP)
> +		bi->bi_owner->i_delayed_blks += bi->bi_bmap.br_blockcount;
>  	xfs_defer_add(tp, &bi->bi_list, &xfs_bmap_update_defer_type);
>  }
>  
> @@ -367,6 +379,9 @@ xfs_bmap_update_cancel_item(
>  {
>  	struct xfs_bmap_intent		*bi = bi_entry(item);
>  
> +	if (bi->bi_type == XFS_BMAP_MAP)
> +		bi->bi_owner->i_delayed_blks -= bi->bi_bmap.br_blockcount;
> +
>  	xfs_bmap_update_put_group(bi);
>  	kmem_cache_free(xfs_bmap_intent_cache, bi);
>  }
> @@ -464,6 +479,9 @@ xfs_bui_recover_work(
>  	bi->bi_owner = *ipp;
>  	xfs_bmap_update_get_group(mp, bi);
>  
> +	/* see xfs_bmap_defer_add for details */
> +	if (bi->bi_type == XFS_BMAP_MAP)
> +		bi->bi_owner->i_delayed_blks += bi->bi_bmap.br_blockcount;
>  	xfs_defer_add_item(dfp, &bi->bi_list);
>  	return bi;
>  }
> -- 
> 2.43.0
> 
>
Christoph Hellwig Aug. 24, 2024, 4:50 a.m. UTC | #2
On Fri, Aug 23, 2024 at 09:47:18PM -0700, Darrick J. Wong wrote:
> > +	 * data when the entire mapping is in the process of being overwritten
> > +	 * using the out of place write path. This is undone in after
> 
> Nit: "...undone after xfs_bmapi_remap..."

It's actually undone inside xfs_bmapi_remap, and has to because that
clears the blockcount to zero.
Darrick J. Wong Aug. 24, 2024, 6:31 a.m. UTC | #3
On Sat, Aug 24, 2024 at 06:50:04AM +0200, Christoph Hellwig wrote:
> On Fri, Aug 23, 2024 at 09:47:18PM -0700, Darrick J. Wong wrote:
> > > +	 * data when the entire mapping is in the process of being overwritten
> > > +	 * using the out of place write path. This is undone in after
> > 
> > Nit: "...undone after xfs_bmapi_remap..."
> 
> It's actually undone inside xfs_bmapi_remap, and has to because that
> clears the blockcount to zero.

"This is undone inside xfs_bmapi_remap after it has incremented
di_nblocks for a successful operation."

(and you can still add the rvb)

--D
diff mbox series

Patch

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 7df74c35d9f900..177735c30d273a 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -4847,6 +4847,7 @@  xfs_bmapi_remap(
 	}
 
 	ip->i_nblocks += len;
+	ip->i_delayed_blks -= len; /* see xfs_bmap_defer_add */
 	xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
 
 	if (ifp->if_format == XFS_DINODE_FMT_BTREE)
diff --git a/fs/xfs/xfs_bmap_item.c b/fs/xfs/xfs_bmap_item.c
index e224b49b7cff6d..4b8838e5b5bebc 100644
--- a/fs/xfs/xfs_bmap_item.c
+++ b/fs/xfs/xfs_bmap_item.c
@@ -346,6 +346,18 @@  xfs_bmap_defer_add(
 	trace_xfs_bmap_defer(bi);
 
 	xfs_bmap_update_get_group(tp->t_mountp, bi);
+
+	/*
+	 * Ensure the deferred mapping is pre-recorded in i_delayed_blks.
+	 *
+	 * Otherwise stat can report zero blocks for an inode that actually has
+	 * data when the entire mapping is in the process of being overwritten
+	 * using the out of place write path. This is undone in after
+	 * xfs_bmapi_remap has incremented di_nblocks for a successful
+	 * operation.
+	 */
+	if (bi->bi_type == XFS_BMAP_MAP)
+		bi->bi_owner->i_delayed_blks += bi->bi_bmap.br_blockcount;
 	xfs_defer_add(tp, &bi->bi_list, &xfs_bmap_update_defer_type);
 }
 
@@ -367,6 +379,9 @@  xfs_bmap_update_cancel_item(
 {
 	struct xfs_bmap_intent		*bi = bi_entry(item);
 
+	if (bi->bi_type == XFS_BMAP_MAP)
+		bi->bi_owner->i_delayed_blks -= bi->bi_bmap.br_blockcount;
+
 	xfs_bmap_update_put_group(bi);
 	kmem_cache_free(xfs_bmap_intent_cache, bi);
 }
@@ -464,6 +479,9 @@  xfs_bui_recover_work(
 	bi->bi_owner = *ipp;
 	xfs_bmap_update_get_group(mp, bi);
 
+	/* see xfs_bmap_defer_add for details */
+	if (bi->bi_type == XFS_BMAP_MAP)
+		bi->bi_owner->i_delayed_blks += bi->bi_bmap.br_blockcount;
 	xfs_defer_add_item(dfp, &bi->bi_list);
 	return bi;
 }