Message ID | 20240529095206.2568162-4-yi.zhang@huaweicloud.com (mailing list archive) |
---|---|
State | New |
Series | iomap/xfs: fix stale data exposure when truncating realtime inodes |
> -		const struct iomap_ops *ops)
> +iomap_truncate_page(struct inode *inode, loff_t pos, unsigned int blocksize,
> +		bool *did_zero, const struct iomap_ops *ops)
>  {
> -	unsigned int blocksize = i_blocksize(inode);
> -	unsigned int off = pos & (blocksize - 1);
> +	unsigned int off = rem_u64(pos, blocksize);
> 
>  	/* Block boundary? Nothing to do */
>  	if (!off)

Instead of passing yet another argument here, can we just kill
iomap_truncate_page?

I.e. just open code the rem_u64 and 0 offset check in the only caller
and call iomap_zero_range. Same for the DAX variant and its two
callers.
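For reference, Christoph's suggestion amounts to something like the
following in the XFS caller. This is an illustrative sketch, not a
posted patch; it assumes the rem_u64() helper added earlier in this
series and the existing dax_zero_range()/iomap_zero_range() entry
points, and the exact xfs_truncate_page() signature is guessed from
the hunks quoted in this thread:

```c
/*
 * Sketch of the open-coded variant: do the remainder and zero-offset
 * check in the caller and call the zero-range helpers directly, so
 * the truncate_page wrappers can be removed. Hypothetical code.
 */
int
xfs_truncate_page(
	struct xfs_inode	*ip,
	loff_t			pos,
	bool			*did_zero)
{
	struct inode		*inode = VFS_I(ip);
	unsigned int		blocksize = i_blocksize(inode);
	unsigned int		off = rem_u64(pos, blocksize);

	/* Block boundary? Nothing to do. */
	if (!off)
		return 0;

	if (IS_DAX(inode))
		return dax_zero_range(inode, pos, blocksize - off, did_zero,
				&xfs_dax_write_iomap_ops);
	return iomap_zero_range(inode, pos, blocksize - off, did_zero,
			&xfs_buffered_write_iomap_ops);
}
```

This shape removes the wrappers, at the cost Zhang Yi raises below:
every filesystem caller repeats the remainder and length calculation.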
On Fri, May 31, 2024 at 05:39:10AM -0700, Christoph Hellwig wrote:
> > -		const struct iomap_ops *ops)
> > +iomap_truncate_page(struct inode *inode, loff_t pos, unsigned int blocksize,
> > +		bool *did_zero, const struct iomap_ops *ops)
> >  {
> > -	unsigned int blocksize = i_blocksize(inode);
> > -	unsigned int off = pos & (blocksize - 1);
> > +	unsigned int off = rem_u64(pos, blocksize);
> > 
> >  	/* Block boundary? Nothing to do */
> >  	if (!off)
> 
> Instead of passing yet another argument here, can we just kill
> iomap_truncate_page?
> 
> I.e. just open code the rem_u64 and 0 offset check in the only caller
> and call iomap_zero_range. Same for the DAX variant and its two
> callers.
> 

Hey Christoph,

I've wondered the same about killing off iomap_truncate_page(), but
JFYI, one of the several prototypes I have around of other potential
ways to address this problem slightly splits truncate page off from
being a straight zero range wrapper. A quick diff [2] of that is
inlined below for reference (only lightly tested, may be busted, etc.).

The idea is that IIRC truncate_page() was really the only zero range
user that might actually encounter dirty cache over unwritten mappings,
so given that and the typically constrained range size of truncate
page, just let it be more aggressive about bypassing the unwritten
mapping optimization in iomap_zero_iter(). Just something else to
consider, and this is definitely not something you'd want to do for
zero range proper.

Brian

P.S. I think the last patches I actually posted around this were here
[1], but I also have multiple versions of that selective flush approach
ported to iomap instead of being XFS specific as well.

[1] https://lore.kernel.org/linux-xfs/20221104182358.2007475-1-bfoster@redhat.com/
    https://lore.kernel.org/linux-xfs/20221128173945.3953659-1-bfoster@redhat.com/

[2] truncate page POC:

--- 8< ---

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index c5802a459334..a261e732ea05 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -1380,7 +1380,8 @@ iomap_file_unshare(struct inode *inode, loff_t pos, loff_t len,
 }
 EXPORT_SYMBOL_GPL(iomap_file_unshare);
 
-static loff_t iomap_zero_iter(struct iomap_iter *iter, bool *did_zero)
+static loff_t iomap_zero_iter(struct iomap_iter *iter, bool *did_zero,
+		bool dirty_cache)
 {
 	const struct iomap *srcmap = iomap_iter_srcmap(iter);
 	loff_t pos = iter->pos;
@@ -1388,7 +1389,8 @@ static loff_t iomap_zero_iter(struct iomap_iter *iter, bool *did_zero)
 	loff_t written = 0;
 
 	/* already zeroed? we're done. */
-	if (srcmap->type == IOMAP_HOLE || srcmap->type == IOMAP_UNWRITTEN)
+	if (srcmap->type == IOMAP_HOLE ||
+	    (!dirty_cache && srcmap->type == IOMAP_UNWRITTEN))
 		return length;
 
 	do {
@@ -1439,7 +1441,7 @@ iomap_zero_range(struct inode *inode, loff_t pos, loff_t len, bool *did_zero,
 	int ret;
 
 	while ((ret = iomap_iter(&iter, ops)) > 0)
-		iter.processed = iomap_zero_iter(&iter, did_zero);
+		iter.processed = iomap_zero_iter(&iter, did_zero, false);
 	return ret;
 }
 EXPORT_SYMBOL_GPL(iomap_zero_range);
@@ -1450,11 +1452,29 @@ iomap_truncate_page(struct inode *inode, loff_t pos, bool *did_zero,
 {
 	unsigned int blocksize = i_blocksize(inode);
 	unsigned int off = pos & (blocksize - 1);
+	struct iomap_iter iter = {
+		.inode		= inode,
+		.pos		= pos,
+		.len		= blocksize - off,
+		.flags		= IOMAP_ZERO,
+	};
+	loff_t end;
+	int ret;
+	bool dirty_cache = false;
 
 	/* Block boundary? Nothing to do */
 	if (!off)
 		return 0;
-	return iomap_zero_range(inode, pos, blocksize - off, did_zero, ops);
+
+	/* overwrite unwritten ranges if any part of the range is dirty for
+	 * truncate page */
+	end = iter.pos + iter.len - 1;
+	if (filemap_range_needs_writeback(inode->i_mapping, iter.pos, end))
+		dirty_cache = true;
+
+	while ((ret = iomap_iter(&iter, ops)) > 0)
+		iter.processed = iomap_zero_iter(&iter, did_zero, dirty_cache);
+	return ret;
 }
 EXPORT_SYMBOL_GPL(iomap_truncate_page);
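To make the dirty-cache-over-unwritten case concrete, here is a rough
userspace sketch of the pattern Brian describes. It is illustrative
only, not taken from the thread: the file name and sizes are made up,
and whether stale bytes actually leak depends on the filesystem and
writeback timing:

```c
/*
 * Illustrative sketch of "dirty page cache over an unwritten extent";
 * not a guaranteed reproducer.
 */
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	char buf[4096];
	int fd = open("testfile", O_CREAT | O_RDWR | O_TRUNC, 0644);

	memset(buf, 0xab, sizeof(buf));

	/* Preallocate: blocks are allocated but marked unwritten. */
	posix_fallocate(fd, 0, 65536);

	/* Buffered write: dirties page cache; extent is still unwritten. */
	pwrite(fd, buf, sizeof(buf), 0);

	/*
	 * Truncate into the dirty block. If the zeroing path trusts the
	 * unwritten mapping and skips the page cache, the 0xab bytes
	 * beyond the new EOF survive in the dirty page and can surface
	 * later as stale data.
	 */
	ftruncate(fd, 1);

	close(fd);
	return 0;
}
```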
On 2024/5/31 20:39, Christoph Hellwig wrote:
>> -		const struct iomap_ops *ops)
>> +iomap_truncate_page(struct inode *inode, loff_t pos, unsigned int blocksize,
>> +		bool *did_zero, const struct iomap_ops *ops)
>>  {
>> -	unsigned int blocksize = i_blocksize(inode);
>> -	unsigned int off = pos & (blocksize - 1);
>> +	unsigned int off = rem_u64(pos, blocksize);
>> 
>>  	/* Block boundary? Nothing to do */
>>  	if (!off)
> 
> Instead of passing yet another argument here, can we just kill
> iomap_truncate_page?
> 
> I.e. just open code the rem_u64 and 0 offset check in the only caller
> and call iomap_zero_range. Same for the DAX variant and its two
> callers.
> 

Yeah, we could drop iomap_truncate_page() and dax_truncate_page(), but
that means we would have to open code the zeroing length calculation,
or add a filesystem-private helper for it, in every filesystem. Right
now there are only two callers, xfs and ext2, so that looks fine, but
if more callers appear in the future this could become repetitive. If
we keep the helpers, filesystems don't have to pay attention to these
details.

Thanks,
Yi.
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index a3a5d4eb7289..30b49d6e9d48 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -17,6 +17,7 @@
 #include <linux/bio.h>
 #include <linux/sched/signal.h>
 #include <linux/migrate.h>
+#include <linux/math64.h>
 #include "trace.h"
 
 #include "../internal.h"
@@ -1483,11 +1484,10 @@ iomap_zero_range(struct inode *inode, loff_t pos, loff_t len, bool *did_zero,
 EXPORT_SYMBOL_GPL(iomap_zero_range);
 
 int
-iomap_truncate_page(struct inode *inode, loff_t pos, bool *did_zero,
-		const struct iomap_ops *ops)
+iomap_truncate_page(struct inode *inode, loff_t pos, unsigned int blocksize,
+		bool *did_zero, const struct iomap_ops *ops)
 {
-	unsigned int blocksize = i_blocksize(inode);
-	unsigned int off = pos & (blocksize - 1);
+	unsigned int off = rem_u64(pos, blocksize);
 
 	/* Block boundary? Nothing to do */
 	if (!off)
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 378342673925..32306804b01b 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -1471,10 +1471,11 @@ xfs_truncate_page(
 	bool		*did_zero)
 {
 	struct inode		*inode = VFS_I(ip);
+	unsigned int		blocksize = i_blocksize(inode);
 
 	if (IS_DAX(inode))
 		return dax_truncate_page(inode, pos, did_zero,
 				&xfs_dax_write_iomap_ops);
-	return iomap_truncate_page(inode, pos, did_zero,
+	return iomap_truncate_page(inode, pos, blocksize, did_zero,
 			&xfs_buffered_write_iomap_ops);
 }
 
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 6fc1c858013d..d67bf86ec582 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -273,8 +273,8 @@ int iomap_file_unshare(struct inode *inode, loff_t pos, loff_t len,
 		const struct iomap_ops *ops);
 int iomap_zero_range(struct inode *inode, loff_t pos, loff_t len,
 		bool *did_zero, const struct iomap_ops *ops);
-int iomap_truncate_page(struct inode *inode, loff_t pos, bool *did_zero,
-		const struct iomap_ops *ops);
+int iomap_truncate_page(struct inode *inode, loff_t pos, unsigned int blocksize,
+		bool *did_zero, const struct iomap_ops *ops);
 vm_fault_t iomap_page_mkwrite(struct vm_fault *vmf,
 		const struct iomap_ops *ops);
 int iomap_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
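A note on the rem_u64() switch in the buffered-io.c hunk (hence the new
linux/math64.h include): `pos & (blocksize - 1)` only computes the
remainder when blocksize is a power of two, while the point of passing
blocksize explicitly is presumably to let callers such as XFS pass a
realtime allocation unit, which need not be one. A standalone
illustration of the difference, with arbitrarily chosen values:

```c
/*
 * Why the mask trick breaks for non-power-of-two sizes: illustrative
 * userspace snippet, values chosen arbitrarily (28672 = 7 * 4096, a
 * plausible realtime extent size that is not a power of two).
 */
#include <stdio.h>
#include <stdint.h>

int main(void)
{
	uint64_t pos = 100000;
	uint32_t blocksize = 28672;

	/* Mask trick: prints 1696, which is wrong here. */
	printf("mask:      %llu\n",
	       (unsigned long long)(pos & (blocksize - 1)));
	/* True remainder, what rem_u64() computes: prints 13984. */
	printf("remainder: %llu\n",
	       (unsigned long long)(pos % blocksize));
	return 0;
}
```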