Message ID | 20240822145910.188974-3-bfoster@redhat.com (mailing list archive)
---|---
State | Superseded, archived
Series | iomap: flush dirty cache over unwritten mappings on zero range
On Thu, Aug 22, 2024 at 10:59:10AM -0400, Brian Foster wrote:
> Note that we also flush for hole mappings because iomap_zero_range()
> is used for partial folio zeroing in some cases. For example, if a
> folio straddles EOF on a sub-page FSB size fs, the post-eof portion
> is hole-backed and dirtied/written via mapped write, and then i_size
> increases before writeback can occur (which otherwise zeroes the
> post-eof portion of the EOF folio), then the folio becomes
> inconsistent with disk until reclaimed.

Eww. I'm not sure iomap_zero_range is the right way to handle this
even if that's what we have now and it kinda works.

> +	/*
> +	 * We can skip pre-zeroed mappings so long as either the mapping was
> +	 * clean before we started or we've flushed at least once since.
> +	 * Otherwise we don't know whether the current mapping had dirty
> +	 * pagecache, so flush it now, stale the current mapping, and proceed
> +	 * from there.
> +	 */
> +	if (srcmap->type == IOMAP_HOLE || srcmap->type == IOMAP_UNWRITTEN) {

.. at the very least the above needs to be documented here as a big fat
reminder, though.

Otherwise the series looks sensible to me.
On Mon, Aug 26, 2024 at 11:11:42PM -0700, Christoph Hellwig wrote:
> On Thu, Aug 22, 2024 at 10:59:10AM -0400, Brian Foster wrote:
> > Note that we also flush for hole mappings because iomap_zero_range()
> > is used for partial folio zeroing in some cases. For example, if a
> > folio straddles EOF on a sub-page FSB size fs, the post-eof portion
> > is hole-backed and dirtied/written via mapped write, and then i_size
> > increases before writeback can occur (which otherwise zeroes the
> > post-eof portion of the EOF folio), then the folio becomes
> > inconsistent with disk until reclaimed.
>
> Eww. I'm not sure iomap_zero_range is the right way to handle this
> even if that's what we have now and it kinda works.
>

Yeah, I agree with that. That was one of the minor appeals (to me) of
the prototype I posted a while back that customizes
iomap_truncate_page() to do unconditional zeroing instead of being an
almost pointless wrapper for iomap_zero_range().

I don't quite recall if that explicitly handled the hole case since it
was quickly hacked together, but the general idea is that it could be
more purpose built for these partial zeroing cases than zero range
might be. There are some other caveats with that approach that would
still require some variant of zero range, however, so I'm not
immediately clear on what the ideal high level solution looks like
quite yet. I'd like to get the existing mechanism working correctly,
improve the test situation and go from there.

> > +	/*
> > +	 * We can skip pre-zeroed mappings so long as either the mapping was
> > +	 * clean before we started or we've flushed at least once since.
> > +	 * Otherwise we don't know whether the current mapping had dirty
> > +	 * pagecache, so flush it now, stale the current mapping, and proceed
> > +	 * from there.
> > +	 */
> > +	if (srcmap->type == IOMAP_HOLE || srcmap->type == IOMAP_UNWRITTEN) {
>
> .. at the very least the above needs to be documented here as a big fat
> reminder, though.
>

Sure, I'll include that explanation in the code comments in v2. Thanks.

Brian

> Otherwise the series looks sensible to me.
>
On Tue, Aug 27, 2024 at 10:23:10AM -0400, Brian Foster wrote:
> Yeah, I agree with that. That was one of the minor appeals (to me) of
> the prototype I posted a while back that customizes
> iomap_truncate_page() to do unconditional zeroing instead of being an
> almost pointless wrapper for iomap_zero_range().

I only very vaguely remember that, you don't happen to have a pointer
to that?
On Tue, Aug 27, 2024 at 09:32:35PM -0700, Christoph Hellwig wrote:
> On Tue, Aug 27, 2024 at 10:23:10AM -0400, Brian Foster wrote:
> > Yeah, I agree with that. That was one of the minor appeals (to me) of
> > the prototype I posted a while back that customizes
> > iomap_truncate_page() to do unconditional zeroing instead of being an
> > almost pointless wrapper for iomap_zero_range().
>
> I only very vaguely remember that, you don't happen to have a pointer
> to that?
>

Yeah, it was buried in a separate review around potentially killing off
iomap_truncate_page():

https://lore.kernel.org/linux-fsdevel/ZlxUpYvb9dlOHFR3@bfoster/

The idea is pretty simple: use the same kind of check this patch does
for doing a flush, but instead open code and isolate it to
iomap_truncate_page() so we can just default to doing the buffered
write instead.

Note that I don't think this replaces the need for patch 1, but it
might arguably make further optimization of the flush kind of pointless
because I'm not sure zero range would ever be called from somewhere
that doesn't flush already.

The tradeoffs I can think of are that this might introduce some false
positives where an EOF folio might be dirty but a sub-folio size block
backing EOF might be clean, and again that callers like truncate and
write extension would need to both truncate the eof page and zero the
broader post-eof range. Neither of those seems all that significant to
me, but just my .02.

Brian
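[For illustration, a minimal sketch of what that open-coded
iomap_truncate_page() might look like, based only on the description
above; the posted prototype may differ. iomap_zero_range_dirty() is a
hypothetical variant that always writes zeroes through pagecache
instead of skipping hole/unwritten mappings.]

	/* Sketch only, not the posted prototype. */
	int
	iomap_truncate_page(struct inode *inode, loff_t pos, bool *did_zero,
			const struct iomap_ops *ops)
	{
		unsigned int blocksize = i_blocksize(inode);
		unsigned int off = pos & (blocksize - 1);

		/* block boundary, nothing to zero */
		if (!off)
			return 0;

		/*
		 * If the EOF folio may be dirty, a HOLE/UNWRITTEN mapping
		 * lookup says nothing about pagecache state, so don't take
		 * the "already zeroed" shortcut: default to the buffered
		 * write of zeroes. iomap_zero_range_dirty() is hypothetical.
		 */
		if (filemap_range_needs_writeback(inode->i_mapping, pos,
						  pos + blocksize - off - 1))
			return iomap_zero_range_dirty(inode, pos,
					blocksize - off, did_zero, ops);

		return iomap_zero_range(inode, pos, blocksize - off,
				did_zero, ops);
	}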
On Wed, Aug 28, 2024 at 08:35:47AM -0400, Brian Foster wrote:
> Yeah, it was buried in a separate review around potentially killing
> off iomap_truncate_page():
>
> https://lore.kernel.org/linux-fsdevel/ZlxUpYvb9dlOHFR3@bfoster/
>
> The idea is pretty simple: use the same kind of check this patch does
> for doing a flush, but instead open code and isolate it to
> iomap_truncate_page() so we can just default to doing the buffered
> write instead.
>
> Note that I don't think this replaces the need for patch 1, but it
> might arguably make further optimization of the flush kind of
> pointless because I'm not sure zero range would ever be called from
> somewhere that doesn't flush already.
>
> The tradeoffs I can think of are that this might introduce some false
> positives where an EOF folio might be dirty but a sub-folio size block
> backing EOF might be clean, and again that callers like truncate and
> write extension would need to both truncate the eof page and zero the
> broader post-eof range. Neither of those seems all that significant to
> me, but just my .02.

Looking at that patch and your current series I kinda like not having
to deal with the dirty caches in the loop, and in fact I'd also prefer
to not do any writeback from the low-level zero helpers if we can.
That is not doing your patch 1 but instead auditing the callers to see
if any of them need the flush and documenting the expectation.

But please let Dave and Darrick chime in first before investing any
work into this.
On Wed, Aug 28, 2024 at 10:41:56PM -0700, Christoph Hellwig wrote:
> On Wed, Aug 28, 2024 at 08:35:47AM -0400, Brian Foster wrote:
> > Yeah, it was buried in a separate review around potentially killing
> > off iomap_truncate_page():
> >
> > https://lore.kernel.org/linux-fsdevel/ZlxUpYvb9dlOHFR3@bfoster/
> >
> > The idea is pretty simple: use the same kind of check this patch does
> > for doing a flush, but instead open code and isolate it to
> > iomap_truncate_page() so we can just default to doing the buffered
> > write instead.
> >
> > Note that I don't think this replaces the need for patch 1, but it
> > might arguably make further optimization of the flush kind of
> > pointless because I'm not sure zero range would ever be called from
> > somewhere that doesn't flush already.
> >
> > The tradeoffs I can think of are that this might introduce some false
> > positives where an EOF folio might be dirty but a sub-folio size
> > block backing EOF might be clean, and again that callers like
> > truncate and write extension would need to both truncate the eof page
> > and zero the broader post-eof range. Neither of those seems all that
> > significant to me, but just my .02.
>
> Looking at that patch and your current series I kinda like not having
> to deal with the dirty caches in the loop, and in fact I'd also prefer
> to not do any writeback from the low-level zero helpers if we can.
> That is not doing your patch 1 but instead auditing the callers to see
> if any of them need the flush and documenting the expectation.
>

I agree this seems better in some ways, but I don't like complicating
or putting more responsibility on the callers. I think if we had a high
level iomap function that wrapped a combination of this proposed
variant of truncate_page() and zero_range() for general inode size
changes, that might alleviate that concern.

Otherwise IME even if we audited and fixed all callers today, over time
we'll just reintroduce the same sorts of errors if the low level
mechanisms aren't made to function correctly.

> But please let Dave and Darrick chime in first before investing any
> work into this.
>

Based on the feedback to v2, it sounds like there's general consensus
on the approach modulo some code factoring discussion. Unless there is
objection, I think I'll stick with that for now for the sake of
progress and keep this option in mind on the back burner. None of this
is really that hard to change if we come up with something better.

Brian
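[A sketch of the kind of high level helper being suggested here. The
name and the assumption that callers pass both sizes are hypothetical;
it simply routes a size change through the two existing primitives so
individual callers can't get the split wrong.]

	/*
	 * Hypothetical high level helper for inode size changes, combining
	 * iomap_truncate_page() and iomap_zero_range() as suggested above.
	 */
	int
	iomap_zero_for_size_change(struct inode *inode, loff_t oldsize,
			loff_t newsize, bool *did_zero,
			const struct iomap_ops *ops)
	{
		/*
		 * Shrinking: zero the now-partial block at the new EOF so
		 * stale data beyond it can't be exposed if the file grows
		 * again later.
		 */
		if (newsize <= oldsize)
			return iomap_truncate_page(inode, newsize, did_zero,
					ops);

		/*
		 * Growing: zero everything between the old and new EOF.
		 * This is where the dirty-cache-over-unwritten/hole cases
		 * in this series come into play.
		 */
		return iomap_zero_range(inode, oldsize, newsize - oldsize,
				did_zero, ops);
	}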
On Thu, Aug 29, 2024 at 11:05:59AM -0400, Brian Foster wrote:
> On Wed, Aug 28, 2024 at 10:41:56PM -0700, Christoph Hellwig wrote:
> > On Wed, Aug 28, 2024 at 08:35:47AM -0400, Brian Foster wrote:
> > > Yeah, it was buried in a separate review around potentially killing
> > > off iomap_truncate_page():
> > >
> > > https://lore.kernel.org/linux-fsdevel/ZlxUpYvb9dlOHFR3@bfoster/
> > >
> > > The idea is pretty simple: use the same kind of check this patch
> > > does for doing a flush, but instead open code and isolate it to
> > > iomap_truncate_page() so we can just default to doing the buffered
> > > write instead.
> > >
> > > Note that I don't think this replaces the need for patch 1, but it
> > > might arguably make further optimization of the flush kind of
> > > pointless because I'm not sure zero range would ever be called from
> > > somewhere that doesn't flush already.
> > >
> > > The tradeoffs I can think of are that this might introduce some
> > > false positives where an EOF folio might be dirty but a sub-folio
> > > size block backing EOF might be clean, and again that callers like
> > > truncate and write extension would need to both truncate the eof
> > > page and zero the broader post-eof range. Neither of those seems
> > > all that significant to me, but just my .02.
> >
> > Looking at that patch and your current series I kinda like not having
> > to deal with the dirty caches in the loop, and in fact I'd also
> > prefer to not do any writeback from the low-level zero helpers if we
> > can. That is not doing your patch 1 but instead auditing the callers
> > to see if any of them need the flush and documenting the expectation.

I looked, and was pretty sure that XFS is the only one that has that
expectation.

> I agree this seems better in some ways, but I don't like complicating
> or putting more responsibility on the callers. I think if we had a
> high level iomap function that wrapped a combination of this proposed
> variant of truncate_page() and zero_range() for general inode size
> changes, that might alleviate that concern.
>
> Otherwise IME even if we audited and fixed all callers today, over
> time we'll just reintroduce the same sorts of errors if the low level
> mechanisms aren't made to function correctly.

Yeah.  What /are/ the criteria for needing the flush and wait?  AFAICT,
a filesystem only needs the flush if it's possible to have dirty
pagecache backed either by a hole or an unwritten extent, right?

I suppose we could amend the iomap ops so that filesystems could signal
that they allow either of those things, and then we wouldn't have to
query the mapping for filesystems that don't, right?  IOWs, one can opt
out of safety features if there's no risk of garbage, right?

(Also: does xfs allow dirty page cache backed by a hole?  I didn't
think that was possible.)

> > But please let Dave and Darrick chime in first before investing any
> > work into this.
> >
>
> Based on the feedback to v2, it sounds like there's general consensus
> on the approach modulo some code factoring discussion. Unless there is
> objection, I think I'll stick with that for now for the sake of
> progress and keep this option in mind on the back burner. None of this
> is really that hard to change if we come up with something better.
>
> Brian
On Thu, Aug 29, 2024 at 02:48:00PM -0700, Darrick J. Wong wrote:
> On Thu, Aug 29, 2024 at 11:05:59AM -0400, Brian Foster wrote:
> > On Wed, Aug 28, 2024 at 10:41:56PM -0700, Christoph Hellwig wrote:
> > > On Wed, Aug 28, 2024 at 08:35:47AM -0400, Brian Foster wrote:
> > > > Yeah, it was buried in a separate review around potentially
> > > > killing off iomap_truncate_page():
> > > >
> > > > https://lore.kernel.org/linux-fsdevel/ZlxUpYvb9dlOHFR3@bfoster/
> > > >
> > > > The idea is pretty simple: use the same kind of check this patch
> > > > does for doing a flush, but instead open code and isolate it to
> > > > iomap_truncate_page() so we can just default to doing the
> > > > buffered write instead.
> > > >
> > > > Note that I don't think this replaces the need for patch 1, but
> > > > it might arguably make further optimization of the flush kind of
> > > > pointless because I'm not sure zero range would ever be called
> > > > from somewhere that doesn't flush already.
> > > >
> > > > The tradeoffs I can think of are that this might introduce some
> > > > false positives where an EOF folio might be dirty but a sub-folio
> > > > size block backing EOF might be clean, and again that callers
> > > > like truncate and write extension would need to both truncate the
> > > > eof page and zero the broader post-eof range. Neither of those
> > > > seems all that significant to me, but just my .02.
> > >
> > > Looking at that patch and your current series I kinda like not
> > > having to deal with the dirty caches in the loop, and in fact I'd
> > > also prefer to not do any writeback from the low-level zero helpers
> > > if we can. That is not doing your patch 1 but instead auditing the
> > > callers to see if any of them need the flush and documenting the
> > > expectation.
>
> I looked, and was pretty sure that XFS is the only one that has that
> expectation.
>
> > I agree this seems better in some ways, but I don't like complicating
> > or putting more responsibility on the callers. I think if we had a
> > high level iomap function that wrapped a combination of this proposed
> > variant of truncate_page() and zero_range() for general inode size
> > changes, that might alleviate that concern.
> >
> > Otherwise IME even if we audited and fixed all callers today, over
> > time we'll just reintroduce the same sorts of errors if the low level
> > mechanisms aren't made to function correctly.
>
> Yeah.  What /are/ the criteria for needing the flush and wait?
> AFAICT, a filesystem only needs the flush if it's possible to have
> dirty pagecache backed either by a hole or an unwritten extent, right?
>

Yeah, but this flush behavior shouldn't be a caller consideration at
all. It's just an implementation detail. All the caller should care
about is that zero range works As Expected (tm).

The pre-iomap way of doing this in XFS was xfs_zero_eof() ->
xfs_iozero(), which was an internally coded buffered write loop that
wrote zeroes into pagecache. That was ultimately replaced with
iomap_zero_range() with the same sort of usage expectations, but
iomap_zero_range() just didn't work quite correctly in all cases.

> I suppose we could amend the iomap ops so that filesystems could
> signal that they allow either of those things, and then we wouldn't
> have to query the mapping for filesystems that don't, right?  IOWs,
> one can opt out of safety features if there's no risk of garbage,
> right?
>

Not sure I parse.. In general I think we could let ops signal whether
they want certain checks. This is how I used the IOMAP_F_DIRTY_CACHE
flag mentioned in the other thread. If the operation handler is
interested in pagecache state, set an IOMAP_DIRTY_CACHE flag in ops to
trigger a pre iomap_begin() check and then set the corresponding
_F_DIRTY_CACHE flag on the mapping if dirty, but I'm not sure if that's
the same concept you're alluding to here.

> (Also: does xfs allow dirty page cache backed by a hole?  I didn't
> think that was possible.)
>

It's a corner case. A mapped write can write to any portion of a folio
so long as it starts within EOF. So if you have a mapped write that
writes past EOF, there's no guarantee that range of the folio is mapped
by blocks.

That post-eof part of the folio would be zeroed at writeback time, but
that assumes i_size doesn't change before writeback. If it does and the
size change operation doesn't do the zeroing itself (enter zero range
via write extension), then we end up with a dirty folio at least
partially backed by a hole with non-zero data within EOF. There's
nothing written back to disk in this hole backed example, but the
pagecache is still inconsistent with what's on disk and therefore I
suspect data corruption is possible if the folio is redirtied before
being reclaimed.

Brian

> > > But please let Dave and Darrick chime in first before investing
> > > any work into this.
> > >
> >
> > Based on the feedback to v2, it sounds like there's general consensus
> > on the approach modulo some code factoring discussion. Unless there
> > is objection, I think I'll stick with that for now for the sake of
> > progress and keep this option in mind on the back burner. None of
> > this is really that hard to change if we come up with something
> > better.
> >
> > Brian
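[A sketch of that opt-in mechanism as described in this mail.
IOMAP_F_DIRTY_CACHE is named in the thread, but the ops-side flag
field, the flag values, and the placement in the core iteration code
are all assumptions.]

	#define IOMAP_DIRTY_CACHE	(1U << 0)	/* hypothetical ops opt-in */
	#define IOMAP_F_DIRTY_CACHE	(1U << 7)	/* hypothetical mapping flag */

	static int iomap_iter_begin(struct iomap_iter *iter,
			const struct iomap_ops *ops)
	{
		bool dirty = false;
		int ret;

		/*
		 * Sample pagecache state before the mapping lookup; a check
		 * done after the lookup would race with writeback and
		 * reclaim.
		 */
		if (ops->flags & IOMAP_DIRTY_CACHE)	/* hypothetical field */
			dirty = filemap_range_needs_writeback(
					iter->inode->i_mapping, iter->pos,
					iter->pos + iter->len - 1);

		ret = ops->iomap_begin(iter->inode, iter->pos, iter->len,
				iter->flags, &iter->iomap, &iter->srcmap);
		if (ret)
			return ret;

		/* report the pre-lookup pagecache state on the mapping */
		if (dirty)
			iter->iomap.flags |= IOMAP_F_DIRTY_CACHE;
		return 0;
	}

[An operation like zero range could then test IOMAP_F_DIRTY_CACHE on
the returned mapping instead of tracking the dirty state by hand.]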
On Fri, Aug 30, 2024 at 07:57:41AM -0400, Brian Foster wrote:
> On Thu, Aug 29, 2024 at 02:48:00PM -0700, Darrick J. Wong wrote:
> > On Thu, Aug 29, 2024 at 11:05:59AM -0400, Brian Foster wrote:
> > > On Wed, Aug 28, 2024 at 10:41:56PM -0700, Christoph Hellwig wrote:
> > > > On Wed, Aug 28, 2024 at 08:35:47AM -0400, Brian Foster wrote:
> > > > > Yeah, it was buried in a separate review around potentially
> > > > > killing off iomap_truncate_page():
> > > > >
> > > > > https://lore.kernel.org/linux-fsdevel/ZlxUpYvb9dlOHFR3@bfoster/
> > > > >
> > > > > The idea is pretty simple: use the same kind of check this
> > > > > patch does for doing a flush, but instead open code and isolate
> > > > > it to iomap_truncate_page() so we can just default to doing the
> > > > > buffered write instead.
> > > > >
> > > > > Note that I don't think this replaces the need for patch 1, but
> > > > > it might arguably make further optimization of the flush kind
> > > > > of pointless because I'm not sure zero range would ever be
> > > > > called from somewhere that doesn't flush already.
> > > > >
> > > > > The tradeoffs I can think of are that this might introduce some
> > > > > false positives where an EOF folio might be dirty but a
> > > > > sub-folio size block backing EOF might be clean, and again that
> > > > > callers like truncate and write extension would need to both
> > > > > truncate the eof page and zero the broader post-eof range.
> > > > > Neither of those seems all that significant to me, but just my
> > > > > .02.
> > > >
> > > > Looking at that patch and your current series I kinda like not
> > > > having to deal with the dirty caches in the loop, and in fact I'd
> > > > also prefer to not do any writeback from the low-level zero
> > > > helpers if we can. That is not doing your patch 1 but instead
> > > > auditing the callers to see if any of them need the flush and
> > > > documenting the expectation.
> >
> > I looked, and was pretty sure that XFS is the only one that has that
> > expectation.
> >
> > > I agree this seems better in some ways, but I don't like
> > > complicating or putting more responsibility on the callers. I think
> > > if we had a high level iomap function that wrapped a combination of
> > > this proposed variant of truncate_page() and zero_range() for
> > > general inode size changes, that might alleviate that concern.
> > >
> > > Otherwise IME even if we audited and fixed all callers today, over
> > > time we'll just reintroduce the same sorts of errors if the low
> > > level mechanisms aren't made to function correctly.
> >
> > Yeah.  What /are/ the criteria for needing the flush and wait?
> > AFAICT, a filesystem only needs the flush if it's possible to have
> > dirty pagecache backed either by a hole or an unwritten extent,
> > right?
> >
>
> Yeah, but this flush behavior shouldn't be a caller consideration at
> all. It's just an implementation detail. All the caller should care
> about is that zero range works As Expected (tm).
>
> The pre-iomap way of doing this in XFS was xfs_zero_eof() ->
> xfs_iozero(), which was an internally coded buffered write loop that
> wrote zeroes into pagecache. That was ultimately replaced with
> iomap_zero_range() with the same sort of usage expectations, but
> iomap_zero_range() just didn't work quite correctly in all cases.
>
> > I suppose we could amend the iomap ops so that filesystems could
> > signal that they allow either of those things, and then we wouldn't
> > have to query the mapping for filesystems that don't, right?  IOWs,
> > one can opt out of safety features if there's no risk of garbage,
> > right?
>
> Not sure I parse.. In general I think we could let ops signal whether
> they want certain checks. This is how I used the IOMAP_F_DIRTY_CACHE
> flag mentioned in the other thread. If the operation handler is
> interested in pagecache state, set an IOMAP_DIRTY_CACHE flag in ops to
> trigger a pre iomap_begin() check and then set the corresponding
> _F_DIRTY_CACHE flag on the mapping if dirty, but I'm not sure if
> that's the same concept you're alluding to here.

Nope.  I was thinking about adding a field to iomap_ops so that
filesystems could declare which types of mappings they could return:

const struct iomap_ops xfs_buffered_write_iomap_ops = {
	.iomap_begin		= xfs_buffered_write_iomap_begin,
	.iomap_end		= xfs_buffered_write_iomap_end,

	/* xfs allows sparse holes and unwritten extents */
	.iomap_types		= (1U << IOMAP_UNWRITTEN) |
				  (1U << IOMAP_HOLE),
};

But given your statement below about dirtying a post-eof region for
which the filesystem does not allocate a block, I suspect we just have
to enable the flush thing for everyone and don't need the flags thing.

> > (Also: does xfs allow dirty page cache backed by a hole?  I didn't
> > think that was possible.)
> >
>
> It's a corner case. A mapped write can write to any portion of a folio
> so long as it starts within EOF. So if you have a mapped write that
> writes past EOF, there's no guarantee that range of the folio is
> mapped by blocks.
>
> That post-eof part of the folio would be zeroed at writeback time, but
> that assumes i_size doesn't change before writeback. If it does and
> the size change operation doesn't do the zeroing itself (enter zero
> range via write extension), then we end up with a dirty folio at least
> partially backed by a hole with non-zero data within EOF. There's
> nothing written back to disk in this hole backed example, but the
> pagecache is still inconsistent with what's on disk and therefore I
> suspect data corruption is possible if the folio is redirtied before
> being reclaimed.

Ah.  Yikes.

--D

> Brian
>
> > > > But please let Dave and Darrick chime in first before investing
> > > > any work into this.
> > > >
> > >
> > > Based on the feedback to v2, it sounds like there's general
> > > consensus on the approach modulo some code factoring discussion.
> > > Unless there is objection, I think I'll stick with that for now for
> > > the sake of progress and keep this option in mind on the back
> > > burner. None of this is really that hard to change if we come up
> > > with something better.
> > >
> > > Brian
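[For completeness, the core-side consumer of that proposal might have
looked something like the check below; ->iomap_types comes from the
mail above and is not an existing field. As noted, the post-eof hole
case means effectively every filesystem would qualify, which is why
the idea was dropped.]

	/*
	 * Sketch: skip the pre-lookup dirty check when the fs can never
	 * return a hole or unwritten mapping for a zeroing operation.
	 */
	static bool iomap_zero_needs_dirty_check(const struct iomap_ops *ops)
	{
		return ops->iomap_types & ((1U << IOMAP_UNWRITTEN) |
					   (1U << IOMAP_HOLE));
	}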
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 3e846f43ff48..841cd01d8194 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -1393,16 +1393,42 @@ iomap_file_unshare(struct inode *inode, loff_t pos, loff_t len,
 }
 EXPORT_SYMBOL_GPL(iomap_file_unshare);
 
-static loff_t iomap_zero_iter(struct iomap_iter *iter, bool *did_zero)
+/*
+ * Flush the remaining range of the iter and mark the current mapping stale.
+ * This is used when zero range sees an unwritten mapping that may have had
+ * dirty pagecache over it.
+ */
+static inline int iomap_zero_iter_flush_and_stale(struct iomap_iter *i)
+{
+	struct address_space *mapping = i->inode->i_mapping;
+	loff_t end = i->pos + i->len - 1;
+
+	i->iomap.flags |= IOMAP_F_STALE;
+	return filemap_write_and_wait_range(mapping, i->pos, end);
+}
+
+static loff_t iomap_zero_iter(struct iomap_iter *iter, bool *did_zero,
+		bool *range_dirty)
 {
 	const struct iomap *srcmap = iomap_iter_srcmap(iter);
 	loff_t pos = iter->pos;
 	loff_t length = iomap_length(iter);
 	loff_t written = 0;
 
-	/* already zeroed?  we're done. */
-	if (srcmap->type == IOMAP_HOLE || srcmap->type == IOMAP_UNWRITTEN)
+	/*
+	 * We can skip pre-zeroed mappings so long as either the mapping was
+	 * clean before we started or we've flushed at least once since.
+	 * Otherwise we don't know whether the current mapping had dirty
+	 * pagecache, so flush it now, stale the current mapping, and proceed
+	 * from there.
+	 */
+	if (srcmap->type == IOMAP_HOLE || srcmap->type == IOMAP_UNWRITTEN) {
+		if (*range_dirty) {
+			*range_dirty = false;
+			return iomap_zero_iter_flush_and_stale(iter);
+		}
 		return length;
+	}
 
 	do {
 		struct folio *folio;
@@ -1450,19 +1476,27 @@ iomap_zero_range(struct inode *inode, loff_t pos, loff_t len, bool *did_zero,
 		.flags		= IOMAP_ZERO,
 	};
 	int ret;
+	bool range_dirty;
 
 	/*
 	 * Zero range wants to skip pre-zeroed (i.e. unwritten) mappings, but
 	 * pagecache must be flushed to ensure stale data from previous
-	 * buffered writes is not exposed.
+	 * buffered writes is not exposed. A flush is only required for certain
+	 * types of mappings, but checking pagecache after mapping lookup is
+	 * racy with writeback and reclaim.
+	 *
+	 * Therefore, check the entire range first and pass along whether any
+	 * part of it is dirty. If so and an underlying mapping warrants it,
+	 * flush the cache at that point. This trades off the occasional false
+	 * positive (and spurious flush, if the dirty data and mapping don't
+	 * happen to overlap) for simplicity in handling a relatively uncommon
+	 * situation.
 	 */
-	ret = filemap_write_and_wait_range(inode->i_mapping,
-			pos, pos + len - 1);
-	if (ret)
-		return ret;
+	range_dirty = filemap_range_needs_writeback(inode->i_mapping,
+			pos, pos + len - 1);
 
 	while ((ret = iomap_iter(&iter, ops)) > 0)
-		iter.processed = iomap_zero_iter(&iter, did_zero);
+		iter.processed = iomap_zero_iter(&iter, did_zero, &range_dirty);
 	return ret;
 }
 EXPORT_SYMBOL_GPL(iomap_zero_range);
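[For context on the caller side, the flush/dirty tracking above stays
invisible to filesystems; XFS, for example, reaches this path through a
thin wrapper along these lines (abridged from fs/xfs/xfs_iomap.c; the
DAX branch is omitted here).]

	int
	xfs_zero_range(struct xfs_inode *ip, loff_t pos, loff_t len,
			bool *did_zero)
	{
		struct inode	*inode = VFS_I(ip);

		return iomap_zero_range(inode, pos, len, did_zero,
				&xfs_buffered_write_iomap_ops);
	}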
iomap_zero_range() flushes pagecache to mitigate consistency problems
with dirty pagecache and unwritten mappings. The flush is unconditional
over the entire range because checking pagecache state after mapping
lookup is racy with writeback and reclaim. There are ways around this
using iomap's mapping revalidation mechanism, but this is not supported
by all iomap based filesystems and so is not a generic solution.

There is another way around this limitation that is good enough to
filter the flush for most cases in practice. If we check for dirty
pagecache over the target range (instead of unconditionally flushing),
we can keep track of whether the range was dirty before lookup and
defer the flush until/unless we see a combination of dirty cache backed
by an unwritten mapping. We don't necessarily know whether the dirty
cache was backed by the unwritten mapping or some other (written) part
of the range, but the implication of a false positive here is a
spurious flush and thus relatively harmless.

Note that we also flush for hole mappings because iomap_zero_range() is
used for partial folio zeroing in some cases. For example, if a folio
straddles EOF on a sub-page FSB size fs, the post-eof portion is
hole-backed and dirtied/written via mapped write, and then i_size
increases before writeback can occur (which otherwise zeroes the
post-eof portion of the EOF folio), then the folio becomes inconsistent
with disk until reclaimed. A flush in this case executes partial
zeroing from writeback, and iomap knows that there is otherwise no I/O
to submit for hole backed mappings.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/iomap/buffered-io.c | 52 ++++++++++++++++++++++++++++++++++--------
 1 file changed, 43 insertions(+), 9 deletions(-)
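[To make the corner case above concrete, a userspace sketch of the
sequence it describes, assuming a filesystem with block size smaller
than the page size (e.g. 1k blocks on 4k pages). Error handling is
omitted and this only constructs the problematic state; it is not a
deterministic corruption reproducer.]

	#include <fcntl.h>
	#include <string.h>
	#include <sys/mman.h>
	#include <unistd.h>

	int main(void)
	{
		int fd = open("testfile", O_RDWR | O_CREAT | O_TRUNC, 0644);
		char *p;

		/* EOF lands mid-folio: blocks past 512 in the folio are a hole */
		ftruncate(fd, 512);

		/*
		 * A mapped write may dirty the entire EOF folio, including the
		 * hole-backed range past EOF, as long as it starts within EOF.
		 */
		p = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
		memset(p, 0xab, 4096);

		/*
		 * Extend i_size before writeback zeroes the post-EOF part of
		 * the folio. The extending truncate calls zero range over
		 * [512, 4096); if zero range trusts the hole mapping and skips
		 * the range, the dirty folio keeps the 0xab bytes within the
		 * new EOF and pagecache is inconsistent with disk until the
		 * folio is reclaimed.
		 */
		ftruncate(fd, 4096);

		munmap(p, 4096);
		close(fd);
		return 0;
	}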