
[v2,3/3] btrfs: remove processed_extent infrastructure

Message ID 5b3edb0ed26aa790fa92d0319739adfd71b3b2f5.1685411033.git.wqu@suse.com (mailing list archive)
State New, archived
Series btrfs: small cleanups mostly for subpage cases

Commit Message

Qu Wenruo May 30, 2023, 1:45 a.m. UTC
The structure processed_extent and the helper
endio_readpage_release_extent() are used to reduce the number of calls to
unlock_extent() during end_bio_extent_readpage().

This is done by merging the ranges, and only calling unlock_extent() when
either the status (uptodate or not) changes or the range is no longer
contiguous.

However the situation has changed:

- The range is always contiguous
  Since this is the endio function of a btrfs bio, the range is
  guaranteed to be contiguous inside the same file.

- The uptodate status is now per-bio (i.e., it will not change)
  Since commit 7609afac6775 ("btrfs: handle checksum validation and
  repair at the storage layer"), the function end_bio_extent_readpage()
  no longer handles the metadata/data verification.

  This means the @uptodate variable will not change during
  end_bio_extent_readpage().

Thus there is no longer the need for processed_extent and the helper
endio_readpage_release_extent().

Just call unlock_extent() at the end of end_bio_extent_readpage().

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 87 +++++---------------------------------------
 1 file changed, 9 insertions(+), 78 deletions(-)

Comments

Christoph Hellwig May 30, 2023, 5:43 a.m. UTC | #1
I recently posted an equivalent patch:

https://www.spinics.net/lists/linux-btrfs/msg134188.html

but it turns out this actually breaks with compression enabled, where
due to the compression we can get gaps in the logical addresses.  IIRC
you had to do an auto run with -o compress to catch it.
Qu Wenruo May 30, 2023, 6:19 a.m. UTC | #2
On 2023/5/30 13:43, Christoph Hellwig wrote:
> I recently posted an equivalent patch:
>
> https://www.spinics.net/lists/linux-btrfs/msg134188.html
>
> but it turns out this actually breaks with compression enabled, where
> due to the compression we can get gaps in the logical addresses.  IIRC
> you had to do an auto run with -o compress to catch it.
>
Thanks a lot for mentioning the existing patch.

However I'm still not sure why compression would break.

Yes, data read would also use the same endio hook for both compressed
and regular reads.

But that endio hook is always handling file pages, thus the range it's
handling should always be contiguous.
Or did I miss something?

Thanks,
Qu
Qu Wenruo May 30, 2023, 11:35 p.m. UTC | #3
On 2023/5/30 14:19, Qu Wenruo wrote:
>
>
> On 2023/5/30 13:43, Christoph Hellwig wrote:
>> I recently posted an equivalent patch:
>>
>> https://www.spinics.net/lists/linux-btrfs/msg134188.html
>>
>> but it turns out this actually breaks with compression enabled, where
>> due to the compression we can get gaps in the logical addresses.  IIRC
>> you had to do an auto run with -o compress to catch it.
>>
> Thanks a lot for mentioning the existing patch.
>
> However I'm still not sure why compression would break.
>
> Yes, data read would also use the same endio hook for both compressed
> and regular reads.
>
> But that endio hook is always handling file pages, thus the range it's
> handling should always be contiguous.
> Or did I miss something?

My bad, indeed we can submit a btrfs_bio that contains two or more
ranges which are not contiguous, for compressed reads.

For compressed reads, we always use the on-disk bytenr as the
@disk_bytenr for submit_extent_page().

Which means, if we have the following file layout, we will submit just
one btrfs_bio:

	item 6 key (257 EXTENT_DATA 0) itemoff 15816 itemsize 53
		generation 7 type 1 (regular)
		extent data disk byte 13631488 nr 4096
		extent data offset 0 nr 4096 ram 32768
		extent compression 1 (zlib)
	item 7 key (257 EXTENT_DATA 4096) itemoff 15763 itemsize 53
		generation 8 type 1 (regular)
		extent data disk byte 0 nr 0
		extent data offset 0 nr 4096 ram 4096
		extent compression 0 (none)
	item 8 key (257 EXTENT_DATA 8192) itemoff 15710 itemsize 53
		generation 7 type 1 (regular)
		extent data disk byte 13631488 nr 4096
		extent data offset 8192 nr 24576 ram 32768
		extent compression 1 (zlib)

When reading the whole file, at file range [0, 4K), we will use 13631488
as @disk_bytenr, but at this stage we don't submit the btrfs_bio yet.

Then we hit the hole, and just zero the pages.

Finally we hit the range [8K, 32K); we still use 13631488 as
@disk_bytenr, and since we have a btrfs_bio with the same disk_bytenr
and it's a compressed read, we merge into it.

This results in the btrfs_bio containing two ranges, [0, 4K) and [8K, 32K).

The objective looks like an optimization: if we submitted two btrfs_bios
for the two ranges, we would read the compressed extent twice, and do
decompression twice.

Although this is counter-intuitive, it at least has a valid reason.

But I would argue that this only works for holes: if the range [4K,
8K) were not a hole, then we would still need to submit two different
btrfs_bios for [0, 4K) and [8K, 32K).

Maybe it's time for us to determine whether this behavior is worthwhile.

Thanks,
Qu

>
> Thanks,
> Qu
Christoph Hellwig May 31, 2023, 4:33 a.m. UTC | #4
On Wed, May 31, 2023 at 07:35:44AM +0800, Qu Wenruo wrote:
> The objective looks like an optimization: if we submitted two btrfs_bios
> for the two ranges, we would read the compressed extent twice, and do
> decompression twice.
> 
> Although this is counter-intuitive, it at least has a valid reason.
> 
> But I would argue that this only works for holes: if the range [4K,
> 8K) were not a hole, then we would still need to submit two different
> btrfs_bios for [0, 4K) and [8K, 32K).
> 
> Maybe it's time for us to determine whether this behavior is worthwhile.

To me this behaviour looks reasonable and worthwhile, but a comment
in end_bio_extent_readpage would probably have saved both of us a
fair amount of time.

Patch

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 2d228cc8b401..4f5d26194768 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -581,75 +581,6 @@  static void end_bio_extent_writepage(struct btrfs_bio *bbio)
 	bio_put(bio);
 }
 
-/*
- * Record previously processed extent range
- *
- * For endio_readpage_release_extent() to handle a full extent range, reducing
- * the extent io operations.
- */
-struct processed_extent {
-	struct btrfs_inode *inode;
-	/* Start of the range in @inode */
-	u64 start;
-	/* End of the range in @inode */
-	u64 end;
-	bool uptodate;
-};
-
-/*
- * Try to release processed extent range
- *
- * May not release the extent range right now if the current range is
- * contiguous to processed extent.
- *
- * Will release processed extent when any of @inode, @uptodate, the range is
- * no longer contiguous to the processed range.
- *
- * Passing @inode == NULL will force processed extent to be released.
- */
-static void endio_readpage_release_extent(struct processed_extent *processed,
-			      struct btrfs_inode *inode, u64 start, u64 end,
-			      bool uptodate)
-{
-	struct extent_state *cached = NULL;
-	struct extent_io_tree *tree;
-
-	/* The first extent, initialize @processed */
-	if (!processed->inode)
-		goto update;
-
-	/*
-	 * Contiguous to processed extent, just uptodate the end.
-	 *
-	 * Several things to notice:
-	 *
-	 * - bio can be merged as long as on-disk bytenr is contiguous
-	 *   This means we can have page belonging to other inodes, thus need to
-	 *   check if the inode still matches.
-	 * - bvec can contain range beyond current page for multi-page bvec
-	 *   Thus we need to do processed->end + 1 >= start check
-	 */
-	if (processed->inode == inode && processed->uptodate == uptodate &&
-	    processed->end + 1 >= start && end >= processed->end) {
-		processed->end = end;
-		return;
-	}
-
-	tree = &processed->inode->io_tree;
-	/*
-	 * Now we don't have range contiguous to the processed range, release
-	 * the processed range now.
-	 */
-	unlock_extent(tree, processed->start, processed->end, &cached);
-
-update:
-	/* Update processed to current range */
-	processed->inode = inode;
-	processed->start = start;
-	processed->end = end;
-	processed->uptodate = uptodate;
-}
-
 static void begin_page_read(struct btrfs_fs_info *fs_info, struct page *page)
 {
 	ASSERT(PageLocked(page));
@@ -674,20 +605,21 @@  static void begin_page_read(struct btrfs_fs_info *fs_info, struct page *page)
 static void end_bio_extent_readpage(struct btrfs_bio *bbio)
 {
 	struct bio *bio = &bbio->bio;
+	struct inode *inode = bio_first_page_all(bio)->mapping->host;
 	struct bio_vec *bvec;
-	struct processed_extent processed = { 0 };
+	struct bvec_iter_all iter_all;
+	bool uptodate = !bio->bi_status;
+	u64 file_offset = page_offset(bio_first_page_all(bio)) +
+			  bio_first_bvec_all(bio)->bv_offset;
 	/*
 	 * The offset to the beginning of a bio, since one bio can never be
 	 * larger than UINT_MAX, u32 here is enough.
 	 */
 	u32 bio_offset = 0;
-	struct bvec_iter_all iter_all;
 
 	ASSERT(!bio_flagged(bio, BIO_CLONED));
 	bio_for_each_segment_all(bvec, bio, iter_all) {
-		bool uptodate = !bio->bi_status;
 		struct page *page = bvec->bv_page;
-		struct inode *inode = page->mapping->host;
 		struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 		const u32 sectorsize = fs_info->sectorsize;
 		u64 start;
@@ -742,17 +674,16 @@  static void end_bio_extent_readpage(struct btrfs_bio *bbio)
 			}
 		}
 
-		/* Update page status and unlock. */
+		/* Update page status. */
 		end_page_read(page, uptodate, start, len);
-		endio_readpage_release_extent(&processed, BTRFS_I(inode),
-					      start, end, uptodate);
 
 		ASSERT(bio_offset + len > bio_offset);
 		bio_offset += len;
 
 	}
-	/* Release the last extent */
-	endio_readpage_release_extent(&processed, NULL, 0, 0, false);
+	/* Unlock the extent io tree. */
+	unlock_extent(&BTRFS_I(inode)->io_tree, file_offset,
+		      file_offset + bio_offset - 1, NULL);
 	bio_put(bio);
 }