mbox series

[v2,0/2] btrfs: error handling fixes for extent_writepage()

Message ID cover.1732695237.git.wqu@suse.com (mailing list archive)
Headers show
Series btrfs: error handling fixes for extent_writepage() | expand

Message

Qu Wenruo Nov. 27, 2024, 8:15 a.m. UTC
[CHANGELOG]
v2:
- Update the commit message for the first patch
  It turns out the root cause is the race between
  btrfs_finish_one_ordered() and btrfs_mark_ordered_io_finished()

  And that race is possible no matter if the sector size is smaller than
  page size.

  So update the commit message to reflect that.

- Remove comments update in the patch
  To make backport easier.

- Update the commit message for the second patch
  Mostly to down play the possible problem, as the extent map is already
  pinned, thus no way to failed to grab the extent map.

It's more and more common to hit crash in my aarch64 testing VM.

The main symptom is ordered extent double accounting, causing various
problems mostly kernel warning and crashes (for debug builds).

The direct cause the failure from writepage_delalloc() with -ENOSPC,
which is another rabbit hole, but here we need to focus on the error
handling.

All the call traces points to the btrfs_mark_ordered_io_finished()
inside extent_writepage() for error handling.

It turns out that btrfs_mark_ordered_io_finished() inside
extent_writepage() is racing with the same cleanup inside
btrfs_run_delalloc_range().

And if the one inside extent_writepage() is called before the ordered
extent removed from the ordered tree (the removal is queued in a
workqueue), then we hit the double accounting.


There is also a theoretical failure path from submit_one_sector(),  but
I have never hit a case caused by that failure, the fix is only for the
sake of consistency.

Both fixes are similar, by moving the btrfs_mark_ordered_io_finished()
calls for error handling into each function, so that we can avoid
touching ranges that is already covered.

Although these fixes are mostly for backports, the proper fix in the end
would be reworking how we handle dirty folio writeback.

The current way is map-map-map, then submit-submit-submit (run delalloc
for every dirty sector of the folio, then submit all dirty sectors).

The planned new fix would be more like iomap to go
map-submit-map-submit-map-submit (run one delalloc, then immeidately submit
it).


Qu Wenruo (2):
  btrfs: fix double accounting race in extent_writepage()
  btrfs: handle submit_one_sector() error inside extent_writepage_io()

 fs/btrfs/extent_io.c | 63 +++++++++++++++++++++++++++++++++-----------
 1 file changed, 48 insertions(+), 15 deletions(-)