mbox series

[v5,0/5] btrfs: fix corruption caused by partial dio writes

Message ID cover.1679512207.git.boris@bur.io (mailing list archive)
Headers show
Series btrfs: fix corruption caused by partial dio writes | expand

Message

Boris Burkov March 22, 2023, 7:11 p.m. UTC
The final patch in this series ensures that bios submitted by btrfs dio
match the corresponding ordered_extent and extent_map exactly. As a
result, there is no hole or deadlock as a result of partial writes, even
if the write buffer is a partly overlapping mapping of the range being
written to.

This required a bit of refactoring and setup. Specifically, the zoned
device code for "extracting" an ordered extent matching a bio could be
reused with some refactoring to return the new ordered extents after the
split.

Patch 1: Generic patch for returning an ordered extent while creating it
Patch 2: Cache the dio ordered extent so that we can efficiently detect
         partial writes during bio submission, without an extra lookup.
Patch 3: Use Patch 1 to track the new ordered_extent(s) that result from
         splitting an ordered_extent.
Patch 4: Fix a bug in ordered extent splitting
Patch 5: Use the new more general split logic of Patch 4 and the stashed
         ordered extent from Patch 2 to split partial dio bios to fix
         the corruption while avoiding the deadlock.

---
Changelog:
v5:
- skip splitting em on nocow writes, this removes the need to refactor
  split_em, so remove that patch, and just rename split_zoned_em to
  split_em.
v4:
- significant changes; redesign the fix to use bio splitting instead of
  extending the ordered_extent lifetime across calls into iomap.
- all the oe/em splitting refactoring and fixes
v3:
- handle BTRFS_IOERR set on the ordered_extent in btrfs_dio_iomap_end.
  If the bio fails before we loop in the submission loop and exit from
  the loop early, we never submit a second bio covering the rest of the
  extent range, resulting in leaking the ordered_extent, which hangs umount.
  We can distinguish this from a short write in btrfs_dio_iomap_end by
  checking the ordered_extent.
v2:
- rename new ordered extent function
- pull the new function into a prep patch
- reorganize how the ordered_extent is stored/passed around to avoid so
many annoying memsets and exposing it to fs/btrfs/file.c
- lots of small code style improvements
- remove unintentional whitespace changes
- commit message improvements
- various ASSERTs for clarity/debugging


Boris Burkov (5):
  btrfs: add function to create and return an ordered extent
  btrfs: stash ordered extent in dio_data during iomap dio
  btrfs: return ordered_extent splits from bio extraction
  btrfs: fix crash with non-zero pre in btrfs_split_ordered_extent
  btrfs: split partial dio bios before submit

 fs/btrfs/bio.c          |  2 +-
 fs/btrfs/btrfs_inode.h  |  5 ++-
 fs/btrfs/inode.c        | 94 +++++++++++++++++++++++++++++++----------
 fs/btrfs/ordered-data.c | 88 ++++++++++++++++++++++++++++----------
 fs/btrfs/ordered-data.h | 13 ++++--
 5 files changed, 152 insertions(+), 50 deletions(-)