
[v2,1/2] btrfs: fix the never finishing ordered extent when it gets cleaned up

Message ID 20210518023803.107768-2-wqu@suse.com (mailing list archive)
State New, archived
Series btrfs: fixes for the 13 subpage preparation patches

Commit Message

Qu Wenruo May 18, 2021, 2:38 a.m. UTC
[BUG]
When running the subpage preparation patches on x86, btrfs/125 will
hang forever with one ordered extent that never finishes.

[CAUSE]
The test case btrfs/125 itself always fails, as the fix for it has
never been merged.

When the test fails at balance, btrfs needs to clean up the ordered
extents in btrfs_cleanup_ordered_extents() for the data reloc inode.

The problem is in the sequence in which we clean up the page Ordered bit.

Currently it works like this:

  btrfs_cleanup_ordered_extents()
  |- find_get_page();
  |- btrfs_page_clear_ordered(page);
  |  Now the page doesn't have Ordered bit anymore.
  |  !!! This also includes the first (locked) page !!!
  |
  |- offset += PAGE_SIZE
  |  This is to skip the first page
  |- __endio_write_update_ordered()
     |- btrfs_mark_ordered_io_finished(NULL)
        Except for the first page, all ordered extents are finished.

Then the locked page is cleaned up in __extent_writepage():

  __extent_writepage()
  |- if (PageError(page))
  |- end_extent_writepage()
     |- btrfs_mark_ordered_io_finished(page)
        |- if (btrfs_test_page_ordered(page))
        |-  !!! The page gets skipped !!!
            The ordered extent accounting is not decreased, as the page
            no longer has the Ordered bit.

This leaves the ordered extent with bytes_left == sectorsize, thus it
never finishes.
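
To make the ordering problem concrete, here is a simplified C-like
sketch of the two paths (not the exact kernel code; the helper names
follow the diagrams above, and the accounting is condensed for
illustration):

  /* Path 1: btrfs_cleanup_ordered_extents(), runs first */
  btrfs_page_clear_ordered(page);     /* Ordered bit gone, locked page included */

  /* Path 2: end_extent_writepage() on the locked page, runs later,
   * inside btrfs_mark_ordered_io_finished():
   */
  if (!btrfs_test_page_ordered(page)) /* false, bit already cleared above */
          return;                     /* range skipped, bytes_left stays at sectorsize */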

[FIX]
The proper fix is to ensure that whoever cleans up the locked page also
clears the page Ordered bit and finishes the ordered io range at the
same time.

Thus it is no longer possible to clear the page Ordered bit first and
only then do the ordered extent io accounting.

This patch chooses to fix the call site in btrfs_cleanup_ordered_extents().

While at it, also make btrfs_cleanup_ordered_extents() properly clamp
the range for the incoming subpage support.
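
As an illustration of the clamping, with hypothetical numbers (assuming
a 64K page starting at file offset 0, and a failed range of offset=16K,
bytes=8K):

  range_start = max_t(u64, page_offset(page), offset)
              = max(0, 16K) = 16K
  range_len   = min(offset + bytes, page_offset(page) + PAGE_SIZE) - range_start
              = min(24K, 64K) - 16K = 8K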

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/inode.c | 21 ++++++++++++++++++++-
 1 file changed, 20 insertions(+), 1 deletion(-)

Patch

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 48bcb351aad6..6004f72d0474 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -144,6 +144,8 @@ void btrfs_inode_unlock(struct inode *inode, unsigned int ilock_flags)
 		inode_unlock(inode);
 }
 
+static void finish_ordered_fn(struct btrfs_work *work);
+
 /*
  * Cleanup all submitted ordered extents in specified range to handle errors
  * from the btrfs_run_delalloc_range() callback.
@@ -166,11 +168,28 @@ static inline void btrfs_cleanup_ordered_extents(struct btrfs_inode *inode,
 	struct page *page;
 
 	while (index <= end_index) {
+		u64 range_start;
+		u64 range_len;
+
 		page = find_get_page(inode->vfs_inode.i_mapping, index);
 		index++;
 		if (!page)
 			continue;
-		ClearPageOrdered(page);
+
+		range_start = max_t(u64, page_offset(page), offset);
+		range_len = min(offset + bytes, page_offset(page) + PAGE_SIZE) -
+			    range_start;
+		/*
+		 * Clear page Ordered and its ordered range manually.
+		 *
+		 * We used to clear page Ordered first, but since Ordered bit
+		 * indicates whether we have an ordered extent, if it gets cleared
+		 * without finishing the ordered io range, the ordered extent
+		 * will hang forever, as later btrfs_mark_ordered_io_finished()
+		 * will just skip the range.
+		 */
+		btrfs_mark_ordered_io_finished(inode, page, range_start,
+				range_len, finish_ordered_fn, false);
 		put_page(page);
 	}