Message ID | 20240824033814.1162964-1-hch@lst.de (mailing list archive) |
---|---|
State | Superseded, archived |
Headers | show |
Series | [v2] xfs: ensure st_blocks never goes to zero during COW writes | expand |
On Sat, Aug 24, 2024 at 05:37:52AM +0200, Christoph Hellwig wrote: > COW writes remove the amount overwritten either directly for delalloc > reservations, or in earlier deferred transactions than adding the new > amount back in the bmap map transaction. This means st_blocks on an > inode where all data is overwritten using the COW path can temporarily > show a 0 st_blocks. This can easily be reproduced with the pending > zoned device support where all writes use this path and trips the > check in generic/615, but could also happen on a reflink file without > that. > > Fix this by temporarily add the pending blocks to be mapped to > i_delayed_blks while the item is queued. > > Signed-off-by: Christoph Hellwig <hch@lst.de> > --- > > Changes since v1: > - slightly more and slightly improved comments > > fs/xfs/libxfs/xfs_bmap.c | 1 + > fs/xfs/xfs_bmap_item.c | 18 ++++++++++++++++++ > 2 files changed, 19 insertions(+) > > diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c > index 7df74c35d9f900..177735c30d273a 100644 > --- a/fs/xfs/libxfs/xfs_bmap.c > +++ b/fs/xfs/libxfs/xfs_bmap.c > @@ -4847,6 +4847,7 @@ xfs_bmapi_remap( > } > > ip->i_nblocks += len; > + ip->i_delayed_blks -= len; /* see xfs_bmap_defer_add */ > xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE); > > if (ifp->if_format == XFS_DINODE_FMT_BTREE) > diff --git a/fs/xfs/xfs_bmap_item.c b/fs/xfs/xfs_bmap_item.c > index e224b49b7cff6d..4b8838e5b5bebc 100644 > --- a/fs/xfs/xfs_bmap_item.c > +++ b/fs/xfs/xfs_bmap_item.c > @@ -346,6 +346,18 @@ xfs_bmap_defer_add( > trace_xfs_bmap_defer(bi); > > xfs_bmap_update_get_group(tp->t_mountp, bi); > + > + /* > + * Ensure the deferred mapping is pre-recorded in i_delayed_blks. > + * > + * Otherwise stat can report zero blocks for an inode that actually has > + * data when the entire mapping is in the process of being overwritten > + * using the out of place write path. This is undone in after Nit: "...undone after xfs_bmapi_remap..." With that fixed, Reviewed-by: Darrick J. Wong <djwong@kernel.org> --D > + * xfs_bmapi_remap has incremented di_nblocks for a successful > + * operation. > + */ > + if (bi->bi_type == XFS_BMAP_MAP) > + bi->bi_owner->i_delayed_blks += bi->bi_bmap.br_blockcount; > xfs_defer_add(tp, &bi->bi_list, &xfs_bmap_update_defer_type); > } > > @@ -367,6 +379,9 @@ xfs_bmap_update_cancel_item( > { > struct xfs_bmap_intent *bi = bi_entry(item); > > + if (bi->bi_type == XFS_BMAP_MAP) > + bi->bi_owner->i_delayed_blks -= bi->bi_bmap.br_blockcount; > + > xfs_bmap_update_put_group(bi); > kmem_cache_free(xfs_bmap_intent_cache, bi); > } > @@ -464,6 +479,9 @@ xfs_bui_recover_work( > bi->bi_owner = *ipp; > xfs_bmap_update_get_group(mp, bi); > > + /* see xfs_bmap_defer_add for details */ > + if (bi->bi_type == XFS_BMAP_MAP) > + bi->bi_owner->i_delayed_blks += bi->bi_bmap.br_blockcount; > xfs_defer_add_item(dfp, &bi->bi_list); > return bi; > } > -- > 2.43.0 > >
On Fri, Aug 23, 2024 at 09:47:18PM -0700, Darrick J. Wong wrote: > > + * data when the entire mapping is in the process of being overwritten > > + * using the out of place write path. This is undone in after > > Nit: "...undone after xfs_bmapi_remap..." It's actually undone inside xfs_bmapi_remap, and has to because that clears the blockcount to zero.
On Sat, Aug 24, 2024 at 06:50:04AM +0200, Christoph Hellwig wrote: > On Fri, Aug 23, 2024 at 09:47:18PM -0700, Darrick J. Wong wrote: > > > + * data when the entire mapping is in the process of being overwritten > > > + * using the out of place write path. This is undone in after > > > > Nit: "...undone after xfs_bmapi_remap..." > > It's actually undone inside xfs_bmapi_remap, and has to because that > clears the blockcount to zero. "This is undone inside xfs_bmapi_remap after it has incremented di_nblocks for a successful operation." (and you can still add the rvb) --D
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c index 7df74c35d9f900..177735c30d273a 100644 --- a/fs/xfs/libxfs/xfs_bmap.c +++ b/fs/xfs/libxfs/xfs_bmap.c @@ -4847,6 +4847,7 @@ xfs_bmapi_remap( } ip->i_nblocks += len; + ip->i_delayed_blks -= len; /* see xfs_bmap_defer_add */ xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE); if (ifp->if_format == XFS_DINODE_FMT_BTREE) diff --git a/fs/xfs/xfs_bmap_item.c b/fs/xfs/xfs_bmap_item.c index e224b49b7cff6d..4b8838e5b5bebc 100644 --- a/fs/xfs/xfs_bmap_item.c +++ b/fs/xfs/xfs_bmap_item.c @@ -346,6 +346,18 @@ xfs_bmap_defer_add( trace_xfs_bmap_defer(bi); xfs_bmap_update_get_group(tp->t_mountp, bi); + + /* + * Ensure the deferred mapping is pre-recorded in i_delayed_blks. + * + * Otherwise stat can report zero blocks for an inode that actually has + * data when the entire mapping is in the process of being overwritten + * using the out of place write path. This is undone in after + * xfs_bmapi_remap has incremented di_nblocks for a successful + * operation. + */ + if (bi->bi_type == XFS_BMAP_MAP) + bi->bi_owner->i_delayed_blks += bi->bi_bmap.br_blockcount; xfs_defer_add(tp, &bi->bi_list, &xfs_bmap_update_defer_type); } @@ -367,6 +379,9 @@ xfs_bmap_update_cancel_item( { struct xfs_bmap_intent *bi = bi_entry(item); + if (bi->bi_type == XFS_BMAP_MAP) + bi->bi_owner->i_delayed_blks -= bi->bi_bmap.br_blockcount; + xfs_bmap_update_put_group(bi); kmem_cache_free(xfs_bmap_intent_cache, bi); } @@ -464,6 +479,9 @@ xfs_bui_recover_work( bi->bi_owner = *ipp; xfs_bmap_update_get_group(mp, bi); + /* see xfs_bmap_defer_add for details */ + if (bi->bi_type == XFS_BMAP_MAP) + bi->bi_owner->i_delayed_blks += bi->bi_bmap.br_blockcount; xfs_defer_add_item(dfp, &bi->bi_list); return bi; }
COW writes remove the amount overwritten either directly for delalloc reservations, or in earlier deferred transactions than adding the new amount back in the bmap map transaction. This means st_blocks on an inode where all data is overwritten using the COW path can temporarily show a 0 st_blocks. This can easily be reproduced with the pending zoned device support where all writes use this path and trips the check in generic/615, but could also happen on a reflink file without that. Fix this by temporarily add the pending blocks to be mapped to i_delayed_blks while the item is queued. Signed-off-by: Christoph Hellwig <hch@lst.de> --- Changes since v1: - slightly more and slightly improved comments fs/xfs/libxfs/xfs_bmap.c | 1 + fs/xfs/xfs_bmap_item.c | 18 ++++++++++++++++++ 2 files changed, 19 insertions(+)