| Message ID | 5b3edb0ed26aa790fa92d0319739adfd71b3b2f5.1685411033.git.wqu@suse.com (mailing list archive) |
|---|---|
| State | New, archived |
| Series | btrfs: small cleanups mostly for subpage cases |
I recently posted an equivalent patch:

https://www.spinics.net/lists/linux-btrfs/msg134188.html

but it turns out this actually breaks with compression enabled, where due to the compression we can get gaps in the logical addresses. IIRC you had to do an auto run with -o compress to catch it.
On 2023/5/30 13:43, Christoph Hellwig wrote:
> I recently posted an equivalent patch:
>
> https://www.spinics.net/lists/linux-btrfs/msg134188.html
>
> but it turns out this actually breaks with compression enabled, where
> due to the compression we can get gaps in the logical addresses. IIRC
> you had to do an auto run with -o compress to catch it.

Thanks for mentioning the existing patch.

However I'm still not sure why compression can break here.

Yes, data reads use the same endio hook for both compressed and regular reads.

But that endio hook is always handling file pages, thus the range it's handling should always be contiguous.

Or did I miss something?

Thanks,
Qu
On 2023/5/30 14:19, Qu Wenruo wrote:
> On 2023/5/30 13:43, Christoph Hellwig wrote:
>> I recently posted an equivalent patch:
>>
>> https://www.spinics.net/lists/linux-btrfs/msg134188.html
>>
>> but it turns out this actually breaks with compression enabled, where
>> due to the compression we can get gaps in the logical addresses. IIRC
>> you had to do an auto run with -o compress to catch it.
>
> Thanks for mentioning the existing patch.
>
> However I'm still not sure why compression can break here.
>
> Yes, data reads use the same endio hook for both compressed and
> regular reads.
>
> But that endio hook is always handling file pages, thus the range it's
> handling should always be contiguous.
>
> Or did I miss something?

All my bad, indeed we can submit a btrfs_bio that contains two or more ranges which are not contiguous, for compressed reads.

For compressed reads, we always use the on-disk bytenr as the @disk_bytenr for submit_extent_page(). This means, if we have the following file layout, we will submit just one btrfs_bio:

	item 6 key (257 EXTENT_DATA 0) itemoff 15816 itemsize 53
		generation 7 type 1 (regular)
		extent data disk byte 13631488 nr 4096
		extent data offset 0 nr 4096 ram 32768
		extent compression 1 (zlib)
	item 7 key (257 EXTENT_DATA 4096) itemoff 15763 itemsize 53
		generation 8 type 1 (regular)
		extent data disk byte 0 nr 0
		extent data offset 0 nr 4096 ram 4096
		extent compression 0 (none)
	item 8 key (257 EXTENT_DATA 8192) itemoff 15710 itemsize 53
		generation 7 type 1 (regular)
		extent data disk byte 13631488 nr 4096
		extent data offset 8192 nr 24576 ram 32768
		extent compression 1 (zlib)

When reading the whole file, at file range [0, 4K) we will use 13631488 as @disk_bytenr, but at this stage we don't submit the btrfs_bio yet.

Then we hit the hole, and just zero the pages.
Finally we hit the range [8K, 32K); we still use 13631488 as @disk_bytenr, and since we already have a btrfs_bio with the same disk_bytenr and it's a compressed read, we merge into it. This results in a btrfs_bio containing two ranges, [0, 4K) and [8K, 32K).

The objective looks like an optimization: if we submitted two btrfs_bios for the two ranges, we would read the compressed extent twice and do the decompression twice.

Although this is counter-intuitive, it at least has a valid reason.

But I'd argue this only works for holes: if the range [4K, 8K) were not a hole, we would still need to submit two different btrfs_bios for [0, 4K) and [8K, 32K).

Maybe it's time for us to determine whether the behavior is worthwhile.

Thanks,
Qu
On Wed, May 31, 2023 at 07:35:44AM +0800, Qu Wenruo wrote:
> The objective looks like an optimization: if we submit two btrfs_bios
> for the two ranges, we will read the compressed extent twice and do
> the decompression twice.
>
> Although this is counter-intuitive, it at least has a valid reason.
>
> But I'd argue this only works for holes: if the range [4K, 8K) is not
> a hole, we still need to submit two different btrfs_bios for [0, 4K)
> and [8K, 32K).
>
> Maybe it's time for us to determine whether the behavior is worthwhile.

To me this behaviour looks reasonable and worthwhile, but a comment in end_bio_extent_readpage would probably have saved both of us a fair amount of time..
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 2d228cc8b401..4f5d26194768 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -581,75 +581,6 @@ static void end_bio_extent_writepage(struct btrfs_bio *bbio)
 	bio_put(bio);
 }
 
-/*
- * Record previously processed extent range
- *
- * For endio_readpage_release_extent() to handle a full extent range, reducing
- * the extent io operations.
- */
-struct processed_extent {
-	struct btrfs_inode *inode;
-	/* Start of the range in @inode */
-	u64 start;
-	/* End of the range in @inode */
-	u64 end;
-	bool uptodate;
-};
-
-/*
- * Try to release processed extent range
- *
- * May not release the extent range right now if the current range is
- * contiguous to processed extent.
- *
- * Will release processed extent when any of @inode, @uptodate, the range is
- * no longer contiguous to the processed range.
- *
- * Passing @inode == NULL will force processed extent to be released.
- */
-static void endio_readpage_release_extent(struct processed_extent *processed,
-					  struct btrfs_inode *inode, u64 start, u64 end,
-					  bool uptodate)
-{
-	struct extent_state *cached = NULL;
-	struct extent_io_tree *tree;
-
-	/* The first extent, initialize @processed */
-	if (!processed->inode)
-		goto update;
-
-	/*
-	 * Contiguous to processed extent, just uptodate the end.
-	 *
-	 * Several things to notice:
-	 *
-	 * - bio can be merged as long as on-disk bytenr is contiguous
-	 *   This means we can have page belonging to other inodes, thus need to
-	 *   check if the inode still matches.
-	 *
-	 * - bvec can contain range beyond current page for multi-page bvec
-	 *   Thus we need to do processed->end + 1 >= start check
-	 */
-	if (processed->inode == inode && processed->uptodate == uptodate &&
-	    processed->end + 1 >= start && end >= processed->end) {
-		processed->end = end;
-		return;
-	}
-
-	tree = &processed->inode->io_tree;
-	/*
-	 * Now we don't have range contiguous to the processed range, release
-	 * the processed range now.
-	 */
-	unlock_extent(tree, processed->start, processed->end, &cached);
-
-update:
-	/* Update processed to current range */
-	processed->inode = inode;
-	processed->start = start;
-	processed->end = end;
-	processed->uptodate = uptodate;
-}
-
 static void begin_page_read(struct btrfs_fs_info *fs_info, struct page *page)
 {
 	ASSERT(PageLocked(page));
@@ -674,20 +605,21 @@ static void begin_page_read(struct btrfs_fs_info *fs_info, struct page *page)
 static void end_bio_extent_readpage(struct btrfs_bio *bbio)
 {
 	struct bio *bio = &bbio->bio;
+	struct inode *inode = bio_first_page_all(bio)->mapping->host;
 	struct bio_vec *bvec;
-	struct processed_extent processed = { 0 };
+	struct bvec_iter_all iter_all;
+	bool uptodate = !bio->bi_status;
+	u64 file_offset = page_offset(bio_first_page_all(bio)) +
+			  bio_first_bvec_all(bio)->bv_offset;
 	/*
 	 * The offset to the beginning of a bio, since one bio can never be
 	 * larger than UINT_MAX, u32 here is enough.
 	 */
 	u32 bio_offset = 0;
-	struct bvec_iter_all iter_all;
 
 	ASSERT(!bio_flagged(bio, BIO_CLONED));
 	bio_for_each_segment_all(bvec, bio, iter_all) {
-		bool uptodate = !bio->bi_status;
 		struct page *page = bvec->bv_page;
-		struct inode *inode = page->mapping->host;
 		struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 		const u32 sectorsize = fs_info->sectorsize;
 		u64 start;
@@ -742,17 +674,16 @@ static void end_bio_extent_readpage(struct btrfs_bio *bbio)
 			}
 		}
 
-		/* Update page status and unlock. */
+		/* Update page status. */
 		end_page_read(page, uptodate, start, len);
-		endio_readpage_release_extent(&processed, BTRFS_I(inode),
-					      start, end, uptodate);
 
 		ASSERT(bio_offset + len > bio_offset);
 		bio_offset += len;
 	}
 
-	/* Release the last extent */
-	endio_readpage_release_extent(&processed, NULL, 0, 0, false);
+	/* Unlock the extent io tree. */
+	unlock_extent(&BTRFS_I(inode)->io_tree, file_offset,
+		      file_offset + bio_offset - 1, NULL);
 	bio_put(bio);
 }
The structure processed_extent and the helper endio_readpage_release_extent() are used to reduce the number of unlock_extent() calls during end_bio_extent_readpage().

This is done by merging ranges and only calling unlock_extent() when either the status (uptodate or not) changes or the range is no longer contiguous.

However the behavior has since changed:

- The range is always contiguous
  Since this is the endio function of a btrfs_bio, the range is ensured to be contiguous inside the same file.

- The uptodate status is now per-bio (aka, will not change)
  Since commit 7609afac6775 ("btrfs: handle checksum validation and repair at the storage layer"), end_bio_extent_readpage() no longer handles metadata/data verification. This means the @uptodate variable will not change during end_bio_extent_readpage().

Thus there is no longer any need for processed_extent and the helper endio_readpage_release_extent(). Just call unlock_extent() once at the end of end_bio_extent_readpage().

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 87 +++++--------------------------------------
 1 file changed, 9 insertions(+), 78 deletions(-)