[RFC] xfs: skip discard of unwritten extents

Signed-off-by: Brian Foster <bfoster@redhat.com>
---

Hi all,

What do folks think of something like this? The motivation here is that
the VDO (dedup) devs had reported seeing online discards during
write-only workloads. These turn out to be related to trimming post-eof
preallocation blocks after large file copies. To my knowledge, this
isn't really a prevalent or serious issue, but I think that technically
these discards are unnecessary and so I was looking into how we could
avoid them.

This behavior is of course not directly related to unwritten extents,
but the immediate/obvious solution of bubbling up a bmapi flag of some
kind to xfs_free_eofblocks() seemed rather crude. From there, I figured
that we technically don't need to discard any unwritten extents (within
or beyond EOF) because they haven't been written to since being
allocated. In fact, I'm not sure we have to even busy them, but it's
roughly equivalent logic either way and I'm trying to avoid getting too
clever.
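
To make the rule concrete, here is a minimal standalone sketch of the
decision, not the patch itself. All names (demo_extent, DEMO_EXT_NORM,
DEMO_EXT_UNWRITTEN, demo_should_discard) are hypothetical and exist only
for illustration; they are not XFS definitions, and the sketch assumes
the extent state is known at the point the extent is freed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical extent states for illustration; not the XFS definitions. */
enum demo_ext_state {
	DEMO_EXT_NORM,		/* blocks may contain written data */
	DEMO_EXT_UNWRITTEN,	/* allocated but never written to */
};

/* Hypothetical, simplified description of an extent being freed. */
struct demo_extent {
	unsigned long long	startblock;
	unsigned long long	blockcount;
	enum demo_ext_state	state;
};

/*
 * Decide whether freeing this extent should issue an online discard.
 * Assumption: an unwritten extent has not been written since it was
 * allocated, so there is nothing for the device to throw away.
 */
static bool
demo_should_discard(const struct demo_extent *ext, bool online_discard)
{
	assert(ext->blockcount > 0);

	/* Nothing to do unless online discard is enabled at all. */
	if (!online_discard)
		return false;

	/* Skip unwritten extents, whether within or beyond EOF. */
	return ext->state != DEMO_EXT_UNWRITTEN;
}
```

Note that the check keys off the extent state alone, which is why it
would cover both post-eof and in-file unwritten extents without a
dedicated eofblocks flag.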

I also recall that we've discussed using unwritten extents for delalloc
-> real conversion to avoid the small stale data exposure window that
exists in writeback. Without getting too deep into the reason we don't
currently do an initial unwritten allocation [1], I don't think there's
anything blocking us from converting any post-eof blocks that happen to
be part of the resulting normal allocation. As it is, the imap is
already trimmed to EOF by the writeback code for coherency reasons. If
we were to convert post-eof blocks (not part of this patch) along with
something like this patch, then we'd indirectly prevent discards for
eofblocks trims.

Beyond the whole discard thing, conversion of post-eof blocks may have a
couple of other advantages. First, we eliminate the aforementioned
writeback stale data exposure problem for writes over preallocated
blocks (which doesn't solve the fundamental problem, but closes the
gap). Second, the zeroing required for post-eof writes that jump over
eofblocks (see xfs_file_aio_write_checks()) becomes a much
lighter-weight operation. Normal blocks are zeroed using buffered writes,
whereas zeroing is essentially a no-op for unwritten extents.
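
As a rough illustration of the zeroing cost difference, the sketch below
counts how many blocks in a post-eof gap would need explicit zeroing.
It is a toy model under stated assumptions, not kernel code: the names
(demo_gap_extent, demo_blocks_to_zero) are hypothetical, and it simply
assumes that unwritten extents need no zeroing work while normal
extents must be zeroed block by block.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of one extent inside the skipped range. */
struct demo_gap_extent {
	unsigned long long	blockcount;
	bool			unwritten;
};

/*
 * Count the blocks that would need explicit zeroing when a write
 * jumps over the given extents. Assumption: unwritten extents cost
 * nothing here, normal extents cost one unit per block.
 */
static unsigned long long
demo_blocks_to_zero(const struct demo_gap_extent *exts, size_t nr)
{
	unsigned long long total = 0;

	assert(exts != NULL || nr == 0);

	for (size_t i = 0; i < nr; i++) {
		if (!exts[i].unwritten)
			total += exts[i].blockcount;
	}
	return total;
}
```

With the same post-eof preallocation described once as a normal extent
and once as an unwritten extent, the model returns the full block count
in the first case and zero in the second.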

Thoughts? Flames? Other ideas?

Brian

[1] I think I've actually attempted this change in the past, but I
haven't dug through my old git branches as of yet to completely jog my
memory. IIRC, this may have been held up by the remnants of buffer_heads
being used to track state for the writeback code.

 fs/xfs/libxfs/xfs_alloc.c          | 8 ++++++--
 fs/xfs/libxfs/xfs_alloc.h          | 3 ++-
 fs/xfs/libxfs/xfs_bmap.c           | 9 ++++++---
 fs/xfs/libxfs/xfs_bmap.h           | 3 ++-
 fs/xfs/libxfs/xfs_bmap_btree.c     | 2 +-
 fs/xfs/libxfs/xfs_ialloc.c         | 4 ++--
 fs/xfs/libxfs/xfs_ialloc_btree.c   | 2 +-
 fs/xfs/libxfs/xfs_refcount.c       | 7 ++++---
 fs/xfs/libxfs/xfs_refcount_btree.c | 2 +-
 fs/xfs/xfs_extfree_item.c          | 2 +-
 fs/xfs/xfs_fsops.c                 | 2 +-
 fs/xfs/xfs_reflink.c               | 2 +-
 fs/xfs/xfs_trans.h                 | 3 ++-
 fs/xfs/xfs_trans_extfree.c         | 7 ++++---
 14 files changed, 34 insertions(+), 22 deletions(-)
