
[v6,11/19] btrfs: Convert from readpages to readahead

Message ID 20200217184613.19668-19-willy@infradead.org (mailing list archive)
State New, archived
Series: Change readahead API

Commit Message

Matthew Wilcox Feb. 17, 2020, 6:45 p.m. UTC
From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

Use the new readahead operation in btrfs.  Add a
readahead_for_each_batch() iterator to optimise the loop in the XArray.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 fs/btrfs/extent_io.c    | 46 +++++++++++++----------------------------
 fs/btrfs/extent_io.h    |  3 +--
 fs/btrfs/inode.c        | 16 +++++++-------
 include/linux/pagemap.h | 27 ++++++++++++++++++++++++
 4 files changed, 49 insertions(+), 43 deletions(-)

Comments

Dave Chinner Feb. 18, 2020, 6:57 a.m. UTC | #1
On Mon, Feb 17, 2020 at 10:45:59AM -0800, Matthew Wilcox wrote:
> From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
> 
> Use the new readahead operation in btrfs.  Add a
> readahead_for_each_batch() iterator to optimise the loop in the XArray.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>  fs/btrfs/extent_io.c    | 46 +++++++++++++----------------------------
>  fs/btrfs/extent_io.h    |  3 +--
>  fs/btrfs/inode.c        | 16 +++++++-------
>  include/linux/pagemap.h | 27 ++++++++++++++++++++++++
>  4 files changed, 49 insertions(+), 43 deletions(-)
> 
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index c0f202741e09..e97a6acd6f5d 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -4278,52 +4278,34 @@ int extent_writepages(struct address_space *mapping,
>  	return ret;
>  }
>  
> -int extent_readpages(struct address_space *mapping, struct list_head *pages,
> -		     unsigned nr_pages)
> +void extent_readahead(struct readahead_control *rac)
>  {
>  	struct bio *bio = NULL;
>  	unsigned long bio_flags = 0;
>  	struct page *pagepool[16];
>  	struct extent_map *em_cached = NULL;
> -	struct extent_io_tree *tree = &BTRFS_I(mapping->host)->io_tree;
> -	int nr = 0;
> +	struct extent_io_tree *tree = &BTRFS_I(rac->mapping->host)->io_tree;
>  	u64 prev_em_start = (u64)-1;
> +	int nr;
>  
> -	while (!list_empty(pages)) {
> -		u64 contig_end = 0;
> -
> -		for (nr = 0; nr < ARRAY_SIZE(pagepool) && !list_empty(pages);) {
> -			struct page *page = lru_to_page(pages);
> -
> -			prefetchw(&page->flags);
> -			list_del(&page->lru);
> -			if (add_to_page_cache_lru(page, mapping, page->index,
> -						readahead_gfp_mask(mapping))) {
> -				put_page(page);
> -				break;
> -			}
> -
> -			pagepool[nr++] = page;
> -			contig_end = page_offset(page) + PAGE_SIZE - 1;
> -		}
> +	readahead_for_each_batch(rac, pagepool, ARRAY_SIZE(pagepool), nr) {
> +		u64 contig_start = page_offset(pagepool[0]);
> +		u64 contig_end = page_offset(pagepool[nr - 1]) + PAGE_SIZE - 1;

So this assumes a contiguous page range is returned, right?

>  
> -		if (nr) {
> -			u64 contig_start = page_offset(pagepool[0]);
> +		ASSERT(contig_start + nr * PAGE_SIZE - 1 == contig_end);

Ok, yes it does. :)

I don't see how readahead_for_each_batch() guarantees that, though.

>  
> -			ASSERT(contig_start + nr * PAGE_SIZE - 1 == contig_end);
> -
> -			contiguous_readpages(tree, pagepool, nr, contig_start,
> -				     contig_end, &em_cached, &bio, &bio_flags,
> -				     &prev_em_start);
> -		}
> +		contiguous_readpages(tree, pagepool, nr, contig_start,
> +				contig_end, &em_cached, &bio, &bio_flags,
> +				&prev_em_start);
>  	}
>  
>  	if (em_cached)
>  		free_extent_map(em_cached);
>  
> -	if (bio)
> -		return submit_one_bio(bio, 0, bio_flags);
> -	return 0;
> +	if (bio) {
> +		if (submit_one_bio(bio, 0, bio_flags))
> +			return;
> +	}
>  }

Shouldn't that just be

	if (bio)
		submit_one_bio(bio, 0, bio_flags);

> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> index 4f36c06d064d..1bbb60a0bf16 100644
> --- a/include/linux/pagemap.h
> +++ b/include/linux/pagemap.h
> @@ -669,6 +669,33 @@ static inline void readahead_next(struct readahead_control *rac)
>  #define readahead_for_each(rac, page)					\
>  	for (; (page = readahead_page(rac)); readahead_next(rac))
>  
> +static inline unsigned int readahead_page_batch(struct readahead_control *rac,
> +		struct page **array, unsigned int size)
> +{
> +	unsigned int batch = 0;

Confusing when put alongside rac->_batch_count counting the number
of pages in the batch, and "batch" being the index into the page
array, and they aren't the same counts....

> +	XA_STATE(xas, &rac->mapping->i_pages, rac->_start);
> +	struct page *page;
> +
> +	rac->_batch_count = 0;
> +	xas_for_each(&xas, page, rac->_start + rac->_nr_pages - 1) {

That just iterates over the pages in the [start, end] range, doesn't it?
What guarantees that this fills the array with a contiguous page range?

> +		VM_BUG_ON_PAGE(!PageLocked(page), page);
> +		VM_BUG_ON_PAGE(PageTail(page), page);
> +		array[batch++] = page;
> +		rac->_batch_count += hpage_nr_pages(page);
> +		if (PageHead(page))
> +			xas_set(&xas, rac->_start + rac->_batch_count);

What on earth does this do? Comments please!

> +
> +		if (batch == size)
> +			break;
> +	}
> +
> +	return batch;
> +}

Seems a bit big for an inline function.

> +
> +#define readahead_for_each_batch(rac, array, size, nr)			\
> +	for (; (nr = readahead_page_batch(rac, array, size));		\
> +			readahead_next(rac))

I had to go look at the caller to work out what "size" referred to
here.

This is complex enough that it needs proper API documentation.

Cheers,

Dave.
Matthew Wilcox Feb. 18, 2020, 9:12 p.m. UTC | #2
On Tue, Feb 18, 2020 at 05:57:58PM +1100, Dave Chinner wrote:
> On Mon, Feb 17, 2020 at 10:45:59AM -0800, Matthew Wilcox wrote:
> > From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
> > 
> > Use the new readahead operation in btrfs.  Add a
> > readahead_for_each_batch() iterator to optimise the loop in the XArray.
> > 
> > Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> > ---
> >  fs/btrfs/extent_io.c    | 46 +++++++++++++----------------------------
> >  fs/btrfs/extent_io.h    |  3 +--
> >  fs/btrfs/inode.c        | 16 +++++++-------
> >  include/linux/pagemap.h | 27 ++++++++++++++++++++++++
> >  4 files changed, 49 insertions(+), 43 deletions(-)
> > 
> > diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> > index c0f202741e09..e97a6acd6f5d 100644
> > --- a/fs/btrfs/extent_io.c
> > +++ b/fs/btrfs/extent_io.c
> > @@ -4278,52 +4278,34 @@ int extent_writepages(struct address_space *mapping,
> >  	return ret;
> >  }
> >  
> > -int extent_readpages(struct address_space *mapping, struct list_head *pages,
> > -		     unsigned nr_pages)
> > +void extent_readahead(struct readahead_control *rac)
> >  {
> >  	struct bio *bio = NULL;
> >  	unsigned long bio_flags = 0;
> >  	struct page *pagepool[16];
> >  	struct extent_map *em_cached = NULL;
> > -	struct extent_io_tree *tree = &BTRFS_I(mapping->host)->io_tree;
> > -	int nr = 0;
> > +	struct extent_io_tree *tree = &BTRFS_I(rac->mapping->host)->io_tree;
> >  	u64 prev_em_start = (u64)-1;
> > +	int nr;
> >  
> > -	while (!list_empty(pages)) {
> > -		u64 contig_end = 0;
> > -
> > -		for (nr = 0; nr < ARRAY_SIZE(pagepool) && !list_empty(pages);) {
> > -			struct page *page = lru_to_page(pages);
> > -
> > -			prefetchw(&page->flags);
> > -			list_del(&page->lru);
> > -			if (add_to_page_cache_lru(page, mapping, page->index,
> > -						readahead_gfp_mask(mapping))) {
> > -				put_page(page);
> > -				break;
> > -			}
> > -
> > -			pagepool[nr++] = page;
> > -			contig_end = page_offset(page) + PAGE_SIZE - 1;
> > -		}
> > +	readahead_for_each_batch(rac, pagepool, ARRAY_SIZE(pagepool), nr) {
> > +		u64 contig_start = page_offset(pagepool[0]);
> > +		u64 contig_end = page_offset(pagepool[nr - 1]) + PAGE_SIZE - 1;
> 
> So this assumes a contiguous page range is returned, right?

Yes.  That's documented in the readahead API and is the behaviour of
the code.  I mean, btrfs asserts it's true while most of the rest of
the kernel is indifferent to it, but it's the documented and actual
behaviour.

> >  
> > -		if (nr) {
> > -			u64 contig_start = page_offset(pagepool[0]);
> > +		ASSERT(contig_start + nr * PAGE_SIZE - 1 == contig_end);
> 
> Ok, yes it does. :)
> 
> I don't see how readahead_for_each_batch() guarantees that, though.

I ... don't see how it doesn't?  We start at rac->_start and iterate
through the consecutive pages in the page cache.  readahead_for_each_batch()
does assume that __do_page_cache_readahead() has its current behaviour
of putting the pages in the page cache in order, and kicks off a new
call to ->readahead() every time it has to skip an index for whatever
reason (eg page already in page cache).
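
Schematically, that caller-side behaviour looks like this (shape only, not
a verbatim copy of mm/readahead.c; the batch-submit helper is a
placeholder):

	/* for each index in the readahead window, in ascending order */
	struct page *page = xa_load(&mapping->i_pages, index);

	if (page && !xa_is_value(page)) {
		/*
		 * Index already populated: submit everything gathered so
		 * far as one ->readahead() call and start a new batch on
		 * the far side of the hole, so a readahead_control never
		 * spans a gap in the page cache.
		 */
		submit_current_batch();	/* placeholder, not a real helper */
		continue;
	}

	/*
	 * Otherwise allocate a page, add_to_page_cache_lru() it at this
	 * index and extend the current batch by one page.
	 */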

> > -	if (bio)
> > -		return submit_one_bio(bio, 0, bio_flags);
> > -	return 0;
> > +	if (bio) {
> > +		if (submit_one_bio(bio, 0, bio_flags))
> > +			return;
> > +	}
> >  }
> 
> Shouldn't that just be
> 
> 	if (bio)
> 		submit_one_bio(bio, 0, bio_flags);

It should, but some overzealous person decided to mark submit_one_bio()
as __must_check, so I have to work around that.
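
For reference, it boils down to this (declaration paraphrased, not copied
from extent_io.c):

	static blk_status_t __must_check submit_one_bio(struct bio *bio,
				int mirror_num, unsigned long bio_flags);

A bare "if (bio) submit_one_bio(bio, 0, bio_flags);" trips gcc's
-Wunused-result because of the __must_check, hence the nested if that
tests the return value and then deliberately drops it.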

> > +static inline unsigned int readahead_page_batch(struct readahead_control *rac,
> > +		struct page **array, unsigned int size)
> > +{
> > +	unsigned int batch = 0;
> 
> Confusing when put alongside rac->_batch_count counting the number
> of pages in the batch, and "batch" being the index into the page
> array, and they aren't the same counts....

Yes.  Renamed to 'i'.

> > +	XA_STATE(xas, &rac->mapping->i_pages, rac->_start);
> > +	struct page *page;
> > +
> > +	rac->_batch_count = 0;
> > +	xas_for_each(&xas, page, rac->_start + rac->_nr_pages - 1) {
> 
> > That just iterates over the pages in the [start, end] range, doesn't it?
> > What guarantees that this fills the array with a contiguous page range?

The behaviour of __do_page_cache_readahead().  Dave Howells also has a
usecase for xas_for_each_contig(), so I'm going to add that soon.

> > +		VM_BUG_ON_PAGE(!PageLocked(page), page);
> > +		VM_BUG_ON_PAGE(PageTail(page), page);
> > +		array[batch++] = page;
> > +		rac->_batch_count += hpage_nr_pages(page);
> > +		if (PageHead(page))
> > +			xas_set(&xas, rac->_start + rac->_batch_count);
> 
> What on earth does this do? Comments please!

		/*
		 * The page cache isn't using multi-index entries yet,
		 * so xas_for_each() won't do the right thing for
		 * large pages.  This can be removed once the page cache
		 * is converted.
		 */

> > +
> > +		if (batch == size)
> > +			break;
> > +	}
> > +
> > +	return batch;
> > +}
> 
> Seems a bit big for an inline function.

It's only called by btrfs at the moment.  If it gets more than one caller,
then sure, let's move it out of line.

> > +
> > +#define readahead_for_each_batch(rac, array, size, nr)			\
> > +	for (; (nr = readahead_page_batch(rac, array, size));		\
> > +			readahead_next(rac))
> 
> > I had to go look at the caller to work out what "size" referred to
> > here.
> 
> This is complex enough that it needs proper API documentation.

How about just:

-#define readahead_for_each_batch(rac, array, size, nr)                 \
-       for (; (nr = readahead_page_batch(rac, array, size));           \
+#define readahead_for_each_batch(rac, array, array_sz, nr)             \
+       for (; (nr = readahead_page_batch(rac, array, array_sz));       \

(corresponding rename in readahead_page_batch).  I mean, we could also
do:

#define readahead_for_each_batch(rac, array, nr)			\
	for (; (nr = readahead_page_batch(rac, array, ARRAY_SIZE(array))); \
			readahead_next(rac))

making it less flexible, but easier to use.
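
With that variant the caller side would just be (illustrative):

	struct page *pagepool[16];
	int nr;

	readahead_for_each_batch(rac, pagepool, nr) {
		/* pagepool[0..nr-1] cover one contiguous file range */
	}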
Dave Chinner Feb. 19, 2020, 1:23 a.m. UTC | #3
On Tue, Feb 18, 2020 at 01:12:28PM -0800, Matthew Wilcox wrote:
> On Tue, Feb 18, 2020 at 05:57:58PM +1100, Dave Chinner wrote:
> > On Mon, Feb 17, 2020 at 10:45:59AM -0800, Matthew Wilcox wrote:
> > > From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
....
> > >  
> > > -		if (nr) {
> > > -			u64 contig_start = page_offset(pagepool[0]);
> > > +		ASSERT(contig_start + nr * PAGE_SIZE - 1 == contig_end);
> > 
> > Ok, yes it does. :)
> > 
> > I don't see how readahead_for_each_batch() guarantees that, though.
> 
> I ... don't see how it doesn't?  We start at rac->_start and iterate
> through the consecutive pages in the page cache.  readahead_for_each_batch()
> does assume that __do_page_cache_readahead() has its current behaviour
> of putting the pages in the page cache in order, and kicks off a new
> call to ->readahead() every time it has to skip an index for whatever
> reason (eg page already in page cache).

And there is the comment I was looking for while reading
readahead_for_each_batch() :)

> 
> > > -	if (bio)
> > > -		return submit_one_bio(bio, 0, bio_flags);
> > > -	return 0;
> > > +	if (bio) {
> > > +		if (submit_one_bio(bio, 0, bio_flags))
> > > +			return;
> > > +	}
> > >  }
> > 
> > Shouldn't that just be
> > 
> > 	if (bio)
> > 		submit_one_bio(bio, 0, bio_flags);
> 
> It should, but some overzealous person decided to mark submit_one_bio()
> as __must_check, so I have to work around that.

/me looks at code

Ngggh.

I rather dislike functions that are named in a way that makes them look
like they belong to core kernel APIs but are in reality local static
functions.

I'd ask for this to be fixed if it was generic code, but it's btrfs
specific code so they can deal with the ugliness of their own
creation. :/

> > Confusing when put alongside rac->_batch_count counting the number
> > of pages in the batch, and "batch" being the index into the page
> > array, and they aren't the same counts....
> 
> Yes.  Renamed to 'i'.
> 
> > > +	XA_STATE(xas, &rac->mapping->i_pages, rac->_start);
> > > +	struct page *page;
> > > +
> > > +	rac->_batch_count = 0;
> > > +	xas_for_each(&xas, page, rac->_start + rac->_nr_pages - 1) {
> > 
> > That just iterates over the pages in the [start, end] range, doesn't it?
> > What guarantees that this fills the array with a contiguous page range?
> 
> The behaviour of __do_page_cache_readahead().  Dave Howells also has a
> usecase for xas_for_each_contig(), so I'm going to add that soon.
> 
> > > +		VM_BUG_ON_PAGE(!PageLocked(page), page);
> > > +		VM_BUG_ON_PAGE(PageTail(page), page);
> > > +		array[batch++] = page;
> > > +		rac->_batch_count += hpage_nr_pages(page);
> > > +		if (PageHead(page))
> > > +			xas_set(&xas, rac->_start + rac->_batch_count);
> > 
> > What on earth does this do? Comments please!
> 
> 		/*
> 		 * The page cache isn't using multi-index entries yet,
> 		 * so xas_for_each() won't do the right thing for
> 		 * large pages.  This can be removed once the page cache
> 		 * is converted.
> 		 */

Oh, it's changing the internal xarray lookup cursor position to
point at the correct next page index? Perhaps it's better to say
that instead of "won't do the right thing"?
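
i.e. (numbers made up for illustration) with a 16-page compound page
cached at rac->_start == 32:

	array[i++] = page;		/* head page at index 32 */
	rac->_batch_count += 16;	/* hpage_nr_pages(page) */
	xas_set(&xas, 32 + 16);		/* next lookup resumes at index 48,
					 * past the range the large page
					 * already covers */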

> > > +#define readahead_for_each_batch(rac, array, size, nr)			\
> > > +	for (; (nr = readahead_page_batch(rac, array, size));		\
> > > +			readahead_next(rac))
> > 
> > I had to go look at the caller to work out what "size" referred to
> > here.
> > 
> > This is complex enough that it needs proper API documentation.
> 
> How about just:
> 
> -#define readahead_for_each_batch(rac, array, size, nr)                 \
> -       for (; (nr = readahead_page_batch(rac, array, size));           \
> +#define readahead_for_each_batch(rac, array, array_sz, nr)             \
> +       for (; (nr = readahead_page_batch(rac, array, array_sz));       \

Yup, that's fine - now the macro documents itself.

Cheers,

Dave.

Patch

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index c0f202741e09..e97a6acd6f5d 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -4278,52 +4278,34 @@  int extent_writepages(struct address_space *mapping,
 	return ret;
 }
 
-int extent_readpages(struct address_space *mapping, struct list_head *pages,
-		     unsigned nr_pages)
+void extent_readahead(struct readahead_control *rac)
 {
 	struct bio *bio = NULL;
 	unsigned long bio_flags = 0;
 	struct page *pagepool[16];
 	struct extent_map *em_cached = NULL;
-	struct extent_io_tree *tree = &BTRFS_I(mapping->host)->io_tree;
-	int nr = 0;
+	struct extent_io_tree *tree = &BTRFS_I(rac->mapping->host)->io_tree;
 	u64 prev_em_start = (u64)-1;
+	int nr;
 
-	while (!list_empty(pages)) {
-		u64 contig_end = 0;
-
-		for (nr = 0; nr < ARRAY_SIZE(pagepool) && !list_empty(pages);) {
-			struct page *page = lru_to_page(pages);
-
-			prefetchw(&page->flags);
-			list_del(&page->lru);
-			if (add_to_page_cache_lru(page, mapping, page->index,
-						readahead_gfp_mask(mapping))) {
-				put_page(page);
-				break;
-			}
-
-			pagepool[nr++] = page;
-			contig_end = page_offset(page) + PAGE_SIZE - 1;
-		}
+	readahead_for_each_batch(rac, pagepool, ARRAY_SIZE(pagepool), nr) {
+		u64 contig_start = page_offset(pagepool[0]);
+		u64 contig_end = page_offset(pagepool[nr - 1]) + PAGE_SIZE - 1;
 
-		if (nr) {
-			u64 contig_start = page_offset(pagepool[0]);
+		ASSERT(contig_start + nr * PAGE_SIZE - 1 == contig_end);
 
-			ASSERT(contig_start + nr * PAGE_SIZE - 1 == contig_end);
-
-			contiguous_readpages(tree, pagepool, nr, contig_start,
-				     contig_end, &em_cached, &bio, &bio_flags,
-				     &prev_em_start);
-		}
+		contiguous_readpages(tree, pagepool, nr, contig_start,
+				contig_end, &em_cached, &bio, &bio_flags,
+				&prev_em_start);
 	}
 
 	if (em_cached)
 		free_extent_map(em_cached);
 
-	if (bio)
-		return submit_one_bio(bio, 0, bio_flags);
-	return 0;
+	if (bio) {
+		if (submit_one_bio(bio, 0, bio_flags))
+			return;
+	}
 }
 
 /*
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index 5d205bbaafdc..bddac32948c7 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -198,8 +198,7 @@  int extent_writepages(struct address_space *mapping,
 		      struct writeback_control *wbc);
 int btree_write_cache_pages(struct address_space *mapping,
 			    struct writeback_control *wbc);
-int extent_readpages(struct address_space *mapping, struct list_head *pages,
-		     unsigned nr_pages);
+void extent_readahead(struct readahead_control *rac);
 int extent_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
 		__u64 start, __u64 len);
 void set_page_extent_mapped(struct page *page);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 7d26b4bfb2c6..61d5137ce4e9 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -4802,8 +4802,8 @@  static void evict_inode_truncate_pages(struct inode *inode)
 
 	/*
 	 * Keep looping until we have no more ranges in the io tree.
-	 * We can have ongoing bios started by readpages (called from readahead)
-	 * that have their endio callback (extent_io.c:end_bio_extent_readpage)
+	 * We can have ongoing bios started by readahead that have
+	 * their endio callback (extent_io.c:end_bio_extent_readpage)
 	 * still in progress (unlocked the pages in the bio but did not yet
 	 * unlocked the ranges in the io tree). Therefore this means some
 	 * ranges can still be locked and eviction started because before
@@ -7004,11 +7004,11 @@  static int lock_extent_direct(struct inode *inode, u64 lockstart, u64 lockend,
 			 * for it to complete) and then invalidate the pages for
 			 * this range (through invalidate_inode_pages2_range()),
 			 * but that can lead us to a deadlock with a concurrent
-			 * call to readpages() (a buffered read or a defrag call
+			 * call to readahead (a buffered read or a defrag call
 			 * triggered a readahead) on a page lock due to an
 			 * ordered dio extent we created before but did not have
 			 * yet a corresponding bio submitted (whence it can not
-			 * complete), which makes readpages() wait for that
+			 * complete), which makes readahead wait for that
 			 * ordered extent to complete while holding a lock on
 			 * that page.
 			 */
@@ -8247,11 +8247,9 @@  static int btrfs_writepages(struct address_space *mapping,
 	return extent_writepages(mapping, wbc);
 }
 
-static int
-btrfs_readpages(struct file *file, struct address_space *mapping,
-		struct list_head *pages, unsigned nr_pages)
+static void btrfs_readahead(struct readahead_control *rac)
 {
-	return extent_readpages(mapping, pages, nr_pages);
+	extent_readahead(rac);
 }
 
 static int __btrfs_releasepage(struct page *page, gfp_t gfp_flags)
@@ -10456,7 +10454,7 @@  static const struct address_space_operations btrfs_aops = {
 	.readpage	= btrfs_readpage,
 	.writepage	= btrfs_writepage,
 	.writepages	= btrfs_writepages,
-	.readpages	= btrfs_readpages,
+	.readahead	= btrfs_readahead,
 	.direct_IO	= btrfs_direct_IO,
 	.invalidatepage = btrfs_invalidatepage,
 	.releasepage	= btrfs_releasepage,
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 4f36c06d064d..1bbb60a0bf16 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -669,6 +669,33 @@  static inline void readahead_next(struct readahead_control *rac)
 #define readahead_for_each(rac, page)					\
 	for (; (page = readahead_page(rac)); readahead_next(rac))
 
+static inline unsigned int readahead_page_batch(struct readahead_control *rac,
+		struct page **array, unsigned int size)
+{
+	unsigned int batch = 0;
+	XA_STATE(xas, &rac->mapping->i_pages, rac->_start);
+	struct page *page;
+
+	rac->_batch_count = 0;
+	xas_for_each(&xas, page, rac->_start + rac->_nr_pages - 1) {
+		VM_BUG_ON_PAGE(!PageLocked(page), page);
+		VM_BUG_ON_PAGE(PageTail(page), page);
+		array[batch++] = page;
+		rac->_batch_count += hpage_nr_pages(page);
+		if (PageHead(page))
+			xas_set(&xas, rac->_start + rac->_batch_count);
+
+		if (batch == size)
+			break;
+	}
+
+	return batch;
+}
+
+#define readahead_for_each_batch(rac, array, size, nr)			\
+	for (; (nr = readahead_page_batch(rac, array, size));		\
+			readahead_next(rac))
+
 /* The byte offset into the file of this readahead block */
 static inline loff_t readahead_offset(struct readahead_control *rac)
 {