
[04/20] btrfs: always read the entire extent_buffer

Message ID 20230309090526.332550-5-hch@lst.de (mailing list archive)
State New, archived
Series [01/20] btrfs: mark extent_buffer_under_io static

Commit Message

Christoph Hellwig March 9, 2023, 9:05 a.m. UTC
Currently read_extent_buffer_pages skips pages that are already uptodate
when reading in an extent_buffer.  While this reduces the amount of data
read, it increases the number of I/O operations, as we now need to do
multiple I/Os when reading an extent buffer with one or more uptodate
pages in the middle of it.  On any modern storage device, be it hard
drives or SSDs, this actually decreases I/O performance.  Fortunately
this case is pretty rare, as the pages are always initially read together
and then aged the same way.  Besides simplifying the code a bit as-is,
this will allow for major simplifications to the I/O completion handler
later on.

Note that the case where all pages are uptodate is still handled by an
optimized fast path that does not read any data from disk.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/extent_io.c | 17 +++++------------
 1 file changed, 5 insertions(+), 12 deletions(-)
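
For illustration, here is a minimal sketch (not taken from the kernel
sources; the page layout is hypothetical) of why a single uptodate page in
the middle of an extent_buffer used to turn one contiguous read into
multiple I/Os, and why it becomes a single read with this patch:

	/*
	 * Hypothetical 16K extent_buffer backed by four 4K pages, where
	 * page 2 happens to still be uptodate in the page cache:
	 *
	 *   pages:  [P0][P1][P2*][P3]       (* = PageUptodate)
	 *
	 * Old behaviour (skip uptodate pages):
	 *   bio 1: P0 + P1                  first read I/O
	 *   (P2 skipped and unlocked)
	 *   bio 2: P3                       second read I/O
	 *
	 * New behaviour (read everything unless all pages are uptodate):
	 *   bio 1: P0 + P1 + P2 + P3        one contiguous read I/O
	 */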

Comments

Johannes Thumshirn March 9, 2023, 11:29 a.m. UTC | #1
On 09.03.23 10:07, Christoph Hellwig wrote:
> Currently read_extent_buffer_pages skips pages that are already uptodate
> when reading in an extent_buffer.  While this reduces the amount of data
> read, it increases the number of I/O operations as we now need to do
> multiple I/Os when reading an extent buffer with one or more uptodate
> pages in the middle of it.  On any modern storage device, be that hard
> drives or SSDs this actually decreases I/O performance.  Fortunately
> this case is pretty rare as the pages are always initially read together
> and then aged the same way.  Besides simplifying the code a bit as-is
> this will allow for major simplifications to the I/O completion handler
> later on.
> 
> Note that the case where all pages are uptodate is still handled by an
> optimized fast path that does not read any data from disk.

Can someone smarter than me explain why we need to iterate eb->pages four
times in this function? This doesn't look super efficient to me.

> @@ -4359,10 +4358,8 @@ int read_extent_buffer_pages(struct extent_buffer *eb, int wait, int mirror_num,
>  	 */
>  	for (i = 0; i < num_pages; i++) {
>  		page = eb->pages[i];
> -		if (!PageUptodate(page)) {
> -			num_reads++;
> +		if (!PageUptodate(page))
>  			all_uptodate = 0;
> -		}

I think we could also break out of the loop here. As soon as
one page isn't uptodate we don't care about the fast path anymore.
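
For reference, a minimal sketch of the suggested early exit, written
against the loop quoted above (not part of the posted patch):

	for (i = 0; i < num_pages; i++) {
		page = eb->pages[i];
		if (!PageUptodate(page)) {
			all_uptodate = 0;
			/* one miss is enough to rule out the fast path */
			break;
		}
	}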
Christoph Hellwig March 9, 2023, 3:21 p.m. UTC | #2
On Thu, Mar 09, 2023 at 11:29:54AM +0000, Johannes Thumshirn wrote:
> Can someone smarter than me explain why we need to iterate eb->pages four
> times in this function? This doesn't look super efficient to me.

We don't, and only a single iteration is left by the end of the series :)
Qu Wenruo March 10, 2023, 7:35 a.m. UTC | #3
On 2023/3/9 17:05, Christoph Hellwig wrote:
> Currently read_extent_buffer_pages skips pages that are already uptodate
> when reading in an extent_buffer.  While this reduces the amount of data
> read, it increases the number of I/O operations as we now need to do
> multiple I/Os when reading an extent buffer with one or more uptodate
> pages in the middle of it.  On any modern storage device, be that hard
> drives or SSDs this actually decreases I/O performance.  Fortunately
> this case is pretty rare as the pages are always initially read together
> and then aged the same way.  Besides simplifying the code a bit as-is
> this will allow for major simplifications to the I/O completion handler
> later on.
> 
> Note that the case where all pages are uptodate is still handled by an
> optimized fast path that does not read any data from disk.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Reviewed-by: Qu Wenruo <wqu@suse.com>

Thanks,
Qu
> ---
>   fs/btrfs/extent_io.c | 17 +++++------------
>   1 file changed, 5 insertions(+), 12 deletions(-)
> 
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 302af9b01bda2a..26d8162bee000d 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -4313,7 +4313,6 @@ int read_extent_buffer_pages(struct extent_buffer *eb, int wait, int mirror_num,
>   	int locked_pages = 0;
>   	int all_uptodate = 1;
>   	int num_pages;
> -	unsigned long num_reads = 0;
>   	struct btrfs_bio_ctrl bio_ctrl = {
>   		.opf = REQ_OP_READ,
>   		.mirror_num = mirror_num,
> @@ -4359,10 +4358,8 @@ int read_extent_buffer_pages(struct extent_buffer *eb, int wait, int mirror_num,
>   	 */
>   	for (i = 0; i < num_pages; i++) {
>   		page = eb->pages[i];
> -		if (!PageUptodate(page)) {
> -			num_reads++;
> +		if (!PageUptodate(page))
>   			all_uptodate = 0;
> -		}
>   	}
>   
>   	if (all_uptodate) {
> @@ -4372,7 +4369,7 @@ int read_extent_buffer_pages(struct extent_buffer *eb, int wait, int mirror_num,
>   
>   	clear_bit(EXTENT_BUFFER_READ_ERR, &eb->bflags);
>   	eb->read_mirror = 0;
> -	atomic_set(&eb->io_pages, num_reads);
> +	atomic_set(&eb->io_pages, num_pages);
>   	/*
>   	 * It is possible for release_folio to clear the TREE_REF bit before we
>   	 * set io_pages. See check_buffer_tree_ref for a more detailed comment.
> @@ -4382,13 +4379,9 @@ int read_extent_buffer_pages(struct extent_buffer *eb, int wait, int mirror_num,
>   	for (i = 0; i < num_pages; i++) {
>   		page = eb->pages[i];
>   
> -		if (!PageUptodate(page)) {
> -			ClearPageError(page);
> -			submit_extent_page(&bio_ctrl, page_offset(page), page,
> -					   PAGE_SIZE, 0);
> -		} else {
> -			unlock_page(page);
> -		}
> +		ClearPageError(page);
> +		submit_extent_page(&bio_ctrl, page_offset(page), page,
> +				   PAGE_SIZE, 0);
>   	}
>   
>   	submit_one_bio(&bio_ctrl);
Christoph Hellwig March 14, 2023, 6:09 a.m. UTC | #4
On Thu, Mar 09, 2023 at 11:29:54AM +0000, Johannes Thumshirn wrote:
> > -		if (!PageUptodate(page)) {
> > -			num_reads++;
> > +		if (!PageUptodate(page))
> >  			all_uptodate = 0;
> > -		}
> 
> I think we could also break out of the loop here. As soon as
> one page isn't uptodate we don't care about the fast path anymore.

FYI, I've decided to keep this as-is for the next version as the
code is gone by the end of the series anyway.
David Sterba March 17, 2023, 11:16 p.m. UTC | #5
On Thu, Mar 09, 2023 at 11:29:54AM +0000, Johannes Thumshirn wrote:
> On 09.03.23 10:07, Christoph Hellwig wrote:
> > Currently read_extent_buffer_pages skips pages that are already uptodate
> > when reading in an extent_buffer.  While this reduces the amount of data
> > read, it increases the number of I/O operations as we now need to do
> > multiple I/Os when reading an extent buffer with one or more uptodate
> > pages in the middle of it.  On any modern storage device, be that hard
> > drives or SSDs this actually decreases I/O performance.  Fortunately
> > this case is pretty rare as the pages are always initially read together
> > and then aged the same way.  Besides simplifying the code a bit as-is
> > this will allow for major simplifications to the I/O completion handler
> > later on.
> > 
> > Note that the case where all pages are uptodate is still handled by an
> > optimized fast path that does not read any data from disk.
> 
> Can someone smarter than me explain why we need to iterate eb->pages four
> times in this function? This doesn't look super efficient to me.

I remember one bug that was solved by splitting the first loop into two,
one locking all the pages first and the other then checking the uptodate
status; the comment explains that.

2571e739677f ("Btrfs: fix memory leak in reading btree blocks")

As you can see in the changelog it's a bit of a convoluted race; the
number of loops should not matter if they're there for correctness
reasons. I haven't checked the final code, but I doubt it's equivalent,
and it may introduce subtle bugs.
Christoph Hellwig March 20, 2023, 5:46 a.m. UTC | #6
On Sat, Mar 18, 2023 at 12:16:09AM +0100, David Sterba wrote:
> I remember one bug that was solved by splitting the first loop into two,
> one locking all pages first and then checking the uptodate status, the
> comment is explaining that.
> 
> 2571e739677f ("Btrfs: fix memory leak in reading btree blocks")
> 
> As you can see in the changelog it's a bit convoluted race, the number
> of loops should not matter if they're there for correctness reasons. I
> haven't checked the final code but I doubt it's equivalent and may
> introduce subtle bugs.

The patch we're replying to would actually have fixed that bug on its
own, and thus also addressed the issue above.  The problem it solved was
that num_pages could be smaller than the actual number of pages in the
extent_buffer.  With this change, we always read all pages and thus can't
have a wrong ->io_pages.  In fact, a later patch removes ->io_pages
entirely.  Even better, at the end of the series the locking between
different reads moves to the extent_buffer level, so the above race is
fundamentally fixed at a higher level and there won't be an extra read
at all.
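
For reference, the invariant described above boils down to the following
(a simplified excerpt of the patched loop shown in the diff below, with
error handling, memory barriers and the all-uptodate fast path omitted):

	/*
	 * With this patch the completion counter is primed with num_pages
	 * and every page gets exactly one read submitted, so ->io_pages can
	 * no longer diverge from the number of outstanding page reads the
	 * way a skipped-uptodate-page count could.
	 */
	atomic_set(&eb->io_pages, num_pages);
	for (i = 0; i < num_pages; i++) {
		page = eb->pages[i];
		ClearPageError(page);
		submit_extent_page(&bio_ctrl, page_offset(page), page,
				   PAGE_SIZE, 0);
	}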

Patch

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 302af9b01bda2a..26d8162bee000d 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -4313,7 +4313,6 @@  int read_extent_buffer_pages(struct extent_buffer *eb, int wait, int mirror_num,
 	int locked_pages = 0;
 	int all_uptodate = 1;
 	int num_pages;
-	unsigned long num_reads = 0;
 	struct btrfs_bio_ctrl bio_ctrl = {
 		.opf = REQ_OP_READ,
 		.mirror_num = mirror_num,
@@ -4359,10 +4358,8 @@  int read_extent_buffer_pages(struct extent_buffer *eb, int wait, int mirror_num,
 	 */
 	for (i = 0; i < num_pages; i++) {
 		page = eb->pages[i];
-		if (!PageUptodate(page)) {
-			num_reads++;
+		if (!PageUptodate(page))
 			all_uptodate = 0;
-		}
 	}
 
 	if (all_uptodate) {
@@ -4372,7 +4369,7 @@  int read_extent_buffer_pages(struct extent_buffer *eb, int wait, int mirror_num,
 
 	clear_bit(EXTENT_BUFFER_READ_ERR, &eb->bflags);
 	eb->read_mirror = 0;
-	atomic_set(&eb->io_pages, num_reads);
+	atomic_set(&eb->io_pages, num_pages);
 	/*
 	 * It is possible for release_folio to clear the TREE_REF bit before we
 	 * set io_pages. See check_buffer_tree_ref for a more detailed comment.
@@ -4382,13 +4379,9 @@  int read_extent_buffer_pages(struct extent_buffer *eb, int wait, int mirror_num,
 	for (i = 0; i < num_pages; i++) {
 		page = eb->pages[i];
 
-		if (!PageUptodate(page)) {
-			ClearPageError(page);
-			submit_extent_page(&bio_ctrl, page_offset(page), page,
-					   PAGE_SIZE, 0);
-		} else {
-			unlock_page(page);
-		}
+		ClearPageError(page);
+		submit_extent_page(&bio_ctrl, page_offset(page), page,
+				   PAGE_SIZE, 0);
 	}
 
 	submit_one_bio(&bio_ctrl);