[v2,0/2] btrfs: error handling fixes for extent_writepage()

Message ID	cover.1732695237.git.wqu@suse.com (mailing list archive)
Headers	show Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.223.130]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id AB074208A5 for <linux-btrfs@vger.kernel.org>; Wed, 27 Nov 2024 08:15:49 +0000 (UTC) From: Qu Wenruo <wqu@suse.com> To: linux-btrfs@vger.kernel.org Subject: [PATCH v2 0/2] btrfs: error handling fixes for extent_writepage() Date: Wed, 27 Nov 2024 18:45:27 +1030 Message-ID: <cover.1732695237.git.wqu@suse.com> Precedence: bulk MIME-Version: 1.0 Content-Transfer-Encoding: 8bit
Series	btrfs: error handling fixes for extent_writepage() \| expand [v2,0/2] btrfs: error handling fixes for extent_writepage() [v2,1/2] btrfs: fix double accounting race in extent_writepage() [v2,2/2] btrfs: handle submit_one_sector() error inside extent_writepage_io()

Message ID

cover.1732695237.git.wqu@suse.com (mailing list archive)

Headers

From: Qu Wenruo <wqu@suse.com>
To: linux-btrfs@vger.kernel.org
Subject: [PATCH v2 0/2] btrfs: error handling fixes for extent_writepage()
Date: Wed, 27 Nov 2024 18:45:27 +1030
Message-ID: <cover.1732695237.git.wqu@suse.com>
Precedence: bulk
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

Series

btrfs: error handling fixes for extent_writepage() | expand

Message

Qu Wenruo Nov. 27, 2024, 8:15 a.m. UTC

[CHANGELOG]
v2:
- Update the commit message for the first patch
  It turns out the root cause is the race between
  btrfs_finish_one_ordered() and btrfs_mark_ordered_io_finished()

  And that race is possible no matter if the sector size is smaller than
  page size.

  So update the commit message to reflect that.

- Remove comments update in the patch
  To make backport easier.

- Update the commit message for the second patch
  Mostly to down play the possible problem, as the extent map is already
  pinned, thus no way to failed to grab the extent map.

It's more and more common to hit crash in my aarch64 testing VM.

The main symptom is ordered extent double accounting, causing various
problems mostly kernel warning and crashes (for debug builds).

The direct cause the failure from writepage_delalloc() with -ENOSPC,
which is another rabbit hole, but here we need to focus on the error
handling.

All the call traces points to the btrfs_mark_ordered_io_finished()
inside extent_writepage() for error handling.

It turns out that btrfs_mark_ordered_io_finished() inside
extent_writepage() is racing with the same cleanup inside
btrfs_run_delalloc_range().

And if the one inside extent_writepage() is called before the ordered
extent removed from the ordered tree (the removal is queued in a
workqueue), then we hit the double accounting.


There is also a theoretical failure path from submit_one_sector(),  but
I have never hit a case caused by that failure, the fix is only for the
sake of consistency.

Both fixes are similar, by moving the btrfs_mark_ordered_io_finished()
calls for error handling into each function, so that we can avoid
touching ranges that is already covered.

Although these fixes are mostly for backports, the proper fix in the end
would be reworking how we handle dirty folio writeback.

The current way is map-map-map, then submit-submit-submit (run delalloc
for every dirty sector of the folio, then submit all dirty sectors).

The planned new fix would be more like iomap to go
map-submit-map-submit-map-submit (run one delalloc, then immeidately submit
it).


Qu Wenruo (2):
  btrfs: fix double accounting race in extent_writepage()
  btrfs: handle submit_one_sector() error inside extent_writepage_io()

 fs/btrfs/extent_io.c | 63 +++++++++++++++++++++++++++++++++-----------
 1 file changed, 48 insertions(+), 15 deletions(-)