
[v7,14/24] btrfs: Convert from readpages to readahead

Message ID 20200219210103.32400-15-willy@infradead.org (mailing list archive)
State New, archived
Series Change readahead API

Commit Message

Matthew Wilcox (Oracle) Feb. 19, 2020, 9 p.m. UTC
From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

Use the new readahead operation in btrfs.  Add a
readahead_for_each_batch() iterator to optimise the loop in the XArray.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 fs/btrfs/extent_io.c    | 46 +++++++++++++----------------------------
 fs/btrfs/extent_io.h    |  3 +--
 fs/btrfs/inode.c        | 16 +++++++-------
 include/linux/pagemap.h | 37 +++++++++++++++++++++++++++++++++
 4 files changed, 59 insertions(+), 43 deletions(-)

Comments

Johannes Thumshirn Feb. 20, 2020, 9:42 a.m. UTC | #1
On 19/02/2020 22:03, Matthew Wilcox wrote:
> From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
> 
> Use the new readahead operation in btrfs.  Add a
> readahead_for_each_batch() iterator to optimise the loop in the XArray.


OK I must admit I haven't followed this series closely, but what 
happened to said readahead_for_each_batch()?

As far as I can see it's now:

[...]
> +	while ((nr = readahead_page_batch(rac, pagepool))) {

Matthew Wilcox (Oracle) Feb. 20, 2020, 1:48 p.m. UTC | #2
On Thu, Feb 20, 2020 at 09:42:19AM +0000, Johannes Thumshirn wrote:
> On 19/02/2020 22:03, Matthew Wilcox wrote:
> > From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
> > 
> > Use the new readahead operation in btrfs.  Add a
> > readahead_for_each_batch() iterator to optimise the loop in the XArray.
> 
> OK I must admit I haven't followed this series closely, but what 
> happened to said readahead_for_each_batch()?
> 
> As far as I can see it's now:
> 
> [...]
> > +	while ((nr = readahead_page_batch(rac, pagepool))) {

Oops, forgot to update the changelog there.  Yes, that's exactly what it
changed to.  That discussion was here:

https://lore.kernel.org/linux-fsdevel/20200219144117.GP24185@bombadil.infradead.org/

... and then Christoph pointed out the iterators weren't really adding
much value at that point, so they got deleted.  New changelog for
this patch:

btrfs: Convert from readpages to readahead
  
Implement the new readahead method in btrfs.  Add a readahead_page_batch()
to optimise fetching a batch of pages at once.
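
For reference, a filesystem's ->readahead() then consumes the pages in
batches of contiguous, already-locked page-cache pages; a simplified
sketch of the extent_readahead() loop from the patch below:

void extent_readahead(struct readahead_control *rac)
{
	struct page *pagepool[16];
	int nr;

	/* Each batch is a run of contiguous pages, already locked and
	 * added to the page cache by the readahead core. */
	while ((nr = readahead_page_batch(rac, pagepool))) {
		u64 contig_start = page_offset(pagepool[0]);
		u64 contig_end = page_offset(pagepool[nr - 1]) + PAGE_SIZE - 1;

		/* ... build and submit bios covering [contig_start, contig_end] ... */
	}
}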

Christoph Hellwig Feb. 20, 2020, 3:46 p.m. UTC | #3
On Thu, Feb 20, 2020 at 05:48:49AM -0800, Matthew Wilcox wrote:
> btrfs: Convert from readpages to readahead
>   
> Implement the new readahead method in btrfs.  Add a readahead_page_batch()
> to optimise fetching a batch of pages at once.

Shouldn't this readahead_page_batch helper go into a separate patch so
that it clearly stands out?

Matthew Wilcox (Oracle) Feb. 20, 2020, 3:54 p.m. UTC | #4
On Thu, Feb 20, 2020 at 07:46:58AM -0800, Christoph Hellwig wrote:
> On Thu, Feb 20, 2020 at 05:48:49AM -0800, Matthew Wilcox wrote:
> > btrfs: Convert from readpages to readahead
> >   
> > Implement the new readahead method in btrfs.  Add a readahead_page_batch()
> > to optimise fetching a batch of pages at once.
> 
> Shouldn't this readahead_page_batch helper go into a separate patch so
> that it clearly stands out?

I'll move it into 'Put readahead pages in cache earlier' for v8 (the
same patch where we add readahead_page())

Christoph Hellwig Feb. 20, 2020, 3:57 p.m. UTC | #5
On Thu, Feb 20, 2020 at 07:54:52AM -0800, Matthew Wilcox wrote:
> On Thu, Feb 20, 2020 at 07:46:58AM -0800, Christoph Hellwig wrote:
> > On Thu, Feb 20, 2020 at 05:48:49AM -0800, Matthew Wilcox wrote:
> > > btrfs: Convert from readpages to readahead
> > >   
> > > Implement the new readahead method in btrfs.  Add a readahead_page_batch()
> > > to optimise fetching a batch of pages at once.
> > 
> > Shouldn't this readahead_page_batch helper go into a separate patch so
> > that it clearly stands out?
> 
> I'll move it into 'Put readahead pages in cache earlier' for v8 (the
> same patch where we add readahead_page())

One argument for keeping it in a patch of its own is that btrfs appears
to be the only user, and Goldwyn has a WIP conversion of btrfs to iomap,
so it might go away pretty soon and we could just revert the commit.

But this starts to get into really minor details, so I'll shut up now :)

Christoph Hellwig Feb. 24, 2020, 9:43 p.m. UTC | #6
On Thu, Feb 20, 2020 at 07:57:27AM -0800, Christoph Hellwig wrote:
> On Thu, Feb 20, 2020 at 07:54:52AM -0800, Matthew Wilcox wrote:
> > On Thu, Feb 20, 2020 at 07:46:58AM -0800, Christoph Hellwig wrote:
> > > On Thu, Feb 20, 2020 at 05:48:49AM -0800, Matthew Wilcox wrote:
> > > > btrfs: Convert from readpages to readahead
> > > >   
> > > > Implement the new readahead method in btrfs.  Add a readahead_page_batch()
> > > > to optimise fetching a batch of pages at once.
> > > 
> > > Shouldn't this readahead_page_batch helper go into a separate patch so
> > > that it clearly stands out?
> > 
> > I'll move it into 'Put readahead pages in cache earlier' for v8 (the
> > same patch where we add readahead_page())
> 
> One argument for keeping it in a patch of its own is that btrfs appears
> to be the only user, and Goldwyn has a WIP conversion of btrfs to iomap,
> so it might go away pretty soon and we could just revert the commit.
> 
> But this starts to get into really minor details, so I'll shut up now :)

So looking at this again I have another comment and a question.

First I think the implicit ARRAY_SIZE in readahead_page_batch is highly
dangerous, as it will do the wrong thing when passing a pointer or
function argument.

Second I wonder if it would be worth also switching to a batched
operation in iomap if the XArray overhead is high enough.  That should
be pretty trivial, but we don't really need to do it in this series.

Matthew Wilcox (Oracle) Feb. 24, 2020, 9:54 p.m. UTC | #7
On Mon, Feb 24, 2020 at 01:43:47PM -0800, Christoph Hellwig wrote:
> On Thu, Feb 20, 2020 at 07:57:27AM -0800, Christoph Hellwig wrote:
> > On Thu, Feb 20, 2020 at 07:54:52AM -0800, Matthew Wilcox wrote:
> > > On Thu, Feb 20, 2020 at 07:46:58AM -0800, Christoph Hellwig wrote:
> > > > On Thu, Feb 20, 2020 at 05:48:49AM -0800, Matthew Wilcox wrote:
> > > > > btrfs: Convert from readpages to readahead
> > > > >   
> > > > > Implement the new readahead method in btrfs.  Add a readahead_page_batch()
> > > > > to optimise fetching a batch of pages at once.
> > > > 
> > > > Shouldn't this readahead_page_batch helper go into a separate patch so
> > > > that it clearly stands out?
> > > 
> > > I'll move it into 'Put readahead pages in cache earlier' for v8 (the
> > > same patch where we add readahead_page())
> > 
> > One argument for keeping it in a patch of its own is that btrfs appears
> > to be the only user, and Goldwyn has a WIP conversion of btrfs to iomap,
> > so it might go away pretty soon and we could just revert the commit.
> > 
> > But this starts to get into really minor details, so I'll shut up now :)
> 
> So looking at this again I have another comment and a question.
> 
> First I think the implicit ARRAY_SIZE in readahead_page_batch is highly
> dangerous, as it will do the wrong thing when passing a pointer or
> function argument.

somebody already thought of that ;-)

#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]) + __must_be_array(arr))
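
With __must_be_array() in there, anything that isn't an actual array
fails at compile time; an illustrative snippet (not from the patch):

	struct page *pagepool[16];
	struct page **p = pagepool;

	nr = readahead_page_batch(rac, pagepool);	/* fine: pagepool is an array */
	nr = readahead_page_batch(rac, p);		/* build error: p is a pointer */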

> Second I wonder if it would be worth also switching to a batched
> operation in iomap if the XArray overhead is high enough.  That should
> be pretty trivial, but we don't really need to do it in this series.

I've also considered keeping a small array of pointers inside the
readahead_control so nobody needs to have a readahead_page_batch()
operation.  Even keeping 10 pointers in there will reduce the XArray
overhead by 90%.  But this fits the current btrfs model well, and it
lets us play with different approaches by abstracting everything away.
I'm sure this won't be the last patch that touches the readahead code ;-)
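
Purely as an illustration of that idea (nothing below is in this series;
the _batch fields are hypothetical), readahead_page() itself could refill
a small embedded array so the XArray is only walked once per ten pages:

static inline struct page *readahead_page(struct readahead_control *rac)
{
	/* Hypothetical: rac->_batch[10], rac->_batch_nr and rac->_batch_i
	 * would live inside struct readahead_control. */
	if (rac->_batch_i == rac->_batch_nr) {
		rac->_batch_nr = __readahead_batch(rac, rac->_batch,
						   ARRAY_SIZE(rac->_batch));
		rac->_batch_i = 0;
		if (!rac->_batch_nr)
			return NULL;
	}
	return rac->_batch[rac->_batch_i++];
}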

Christoph Hellwig Feb. 24, 2020, 9:57 p.m. UTC | #8
On Mon, Feb 24, 2020 at 01:54:14PM -0800, Matthew Wilcox wrote:
> > First I think the implicit ARRAY_SIZE in readahead_page_batch is highly
> > dangerous, as it will do the wrong thing when passing a pointer or
> > function argument.
> 
> somebody already thought of that ;-)
> 
> #define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]) + __must_be_array(arr))

Ok.  Still find it pretty weird to design a primary interface that
just works with an array type.

Patch

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index c0f202741e09..e70f14c1de60 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -4278,52 +4278,34 @@  int extent_writepages(struct address_space *mapping,
 	return ret;
 }
 
-int extent_readpages(struct address_space *mapping, struct list_head *pages,
-		     unsigned nr_pages)
+void extent_readahead(struct readahead_control *rac)
 {
 	struct bio *bio = NULL;
 	unsigned long bio_flags = 0;
 	struct page *pagepool[16];
 	struct extent_map *em_cached = NULL;
-	struct extent_io_tree *tree = &BTRFS_I(mapping->host)->io_tree;
-	int nr = 0;
+	struct extent_io_tree *tree = &BTRFS_I(rac->mapping->host)->io_tree;
 	u64 prev_em_start = (u64)-1;
+	int nr;
 
-	while (!list_empty(pages)) {
-		u64 contig_end = 0;
-
-		for (nr = 0; nr < ARRAY_SIZE(pagepool) && !list_empty(pages);) {
-			struct page *page = lru_to_page(pages);
-
-			prefetchw(&page->flags);
-			list_del(&page->lru);
-			if (add_to_page_cache_lru(page, mapping, page->index,
-						readahead_gfp_mask(mapping))) {
-				put_page(page);
-				break;
-			}
-
-			pagepool[nr++] = page;
-			contig_end = page_offset(page) + PAGE_SIZE - 1;
-		}
+	while ((nr = readahead_page_batch(rac, pagepool))) {
+		u64 contig_start = page_offset(pagepool[0]);
+		u64 contig_end = page_offset(pagepool[nr - 1]) + PAGE_SIZE - 1;
 
-		if (nr) {
-			u64 contig_start = page_offset(pagepool[0]);
+		ASSERT(contig_start + nr * PAGE_SIZE - 1 == contig_end);
 
-			ASSERT(contig_start + nr * PAGE_SIZE - 1 == contig_end);
-
-			contiguous_readpages(tree, pagepool, nr, contig_start,
-				     contig_end, &em_cached, &bio, &bio_flags,
-				     &prev_em_start);
-		}
+		contiguous_readpages(tree, pagepool, nr, contig_start,
+				contig_end, &em_cached, &bio, &bio_flags,
+				&prev_em_start);
 	}
 
 	if (em_cached)
 		free_extent_map(em_cached);
 
-	if (bio)
-		return submit_one_bio(bio, 0, bio_flags);
-	return 0;
+	if (bio) {
+		if (submit_one_bio(bio, 0, bio_flags))
+			return;
+	}
 }
 
 /*
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index 5d205bbaafdc..bddac32948c7 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -198,8 +198,7 @@  int extent_writepages(struct address_space *mapping,
 		      struct writeback_control *wbc);
 int btree_write_cache_pages(struct address_space *mapping,
 			    struct writeback_control *wbc);
-int extent_readpages(struct address_space *mapping, struct list_head *pages,
-		     unsigned nr_pages);
+void extent_readahead(struct readahead_control *rac);
 int extent_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
 		__u64 start, __u64 len);
 void set_page_extent_mapped(struct page *page);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 7d26b4bfb2c6..61d5137ce4e9 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -4802,8 +4802,8 @@  static void evict_inode_truncate_pages(struct inode *inode)
 
 	/*
 	 * Keep looping until we have no more ranges in the io tree.
-	 * We can have ongoing bios started by readpages (called from readahead)
-	 * that have their endio callback (extent_io.c:end_bio_extent_readpage)
+	 * We can have ongoing bios started by readahead that have
+	 * their endio callback (extent_io.c:end_bio_extent_readpage)
 	 * still in progress (unlocked the pages in the bio but did not yet
 	 * unlocked the ranges in the io tree). Therefore this means some
 	 * ranges can still be locked and eviction started because before
@@ -7004,11 +7004,11 @@  static int lock_extent_direct(struct inode *inode, u64 lockstart, u64 lockend,
 			 * for it to complete) and then invalidate the pages for
 			 * this range (through invalidate_inode_pages2_range()),
 			 * but that can lead us to a deadlock with a concurrent
-			 * call to readpages() (a buffered read or a defrag call
+			 * call to readahead (a buffered read or a defrag call
 			 * triggered a readahead) on a page lock due to an
 			 * ordered dio extent we created before but did not have
 			 * yet a corresponding bio submitted (whence it can not
-			 * complete), which makes readpages() wait for that
+			 * complete), which makes readahead wait for that
 			 * ordered extent to complete while holding a lock on
 			 * that page.
 			 */
@@ -8247,11 +8247,9 @@  static int btrfs_writepages(struct address_space *mapping,
 	return extent_writepages(mapping, wbc);
 }
 
-static int
-btrfs_readpages(struct file *file, struct address_space *mapping,
-		struct list_head *pages, unsigned nr_pages)
+static void btrfs_readahead(struct readahead_control *rac)
 {
-	return extent_readpages(mapping, pages, nr_pages);
+	extent_readahead(rac);
 }
 
 static int __btrfs_releasepage(struct page *page, gfp_t gfp_flags)
@@ -10456,7 +10454,7 @@  static const struct address_space_operations btrfs_aops = {
 	.readpage	= btrfs_readpage,
 	.writepage	= btrfs_writepage,
 	.writepages	= btrfs_writepages,
-	.readpages	= btrfs_readpages,
+	.readahead	= btrfs_readahead,
 	.direct_IO	= btrfs_direct_IO,
 	.invalidatepage = btrfs_invalidatepage,
 	.releasepage	= btrfs_releasepage,
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 60f9b8d4da6c..dadf6ed11e71 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -672,6 +672,43 @@  static inline struct page *readahead_page(struct readahead_control *rac)
 	return page;
 }
 
+static inline unsigned int __readahead_batch(struct readahead_control *rac,
+		struct page **array, unsigned int array_sz)
+{
+	unsigned int i = 0;
+	XA_STATE(xas, &rac->mapping->i_pages, rac->_index);
+	struct page *page;
+
+	BUG_ON(rac->_batch_count > rac->_nr_pages);
+	rac->_nr_pages -= rac->_batch_count;
+	rac->_index += rac->_batch_count;
+	rac->_batch_count = 0;
+
+	xas_for_each(&xas, page, rac->_index + rac->_nr_pages - 1) {
+		VM_BUG_ON_PAGE(!PageLocked(page), page);
+		VM_BUG_ON_PAGE(PageTail(page), page);
+		array[i++] = page;
+		rac->_batch_count += hpage_nr_pages(page);
+
+		/*
+		 * The page cache isn't using multi-index entries yet,
+		 * so the xas cursor needs to be manually moved to the
+		 * next index.  This can be removed once the page cache
+		 * is converted.
+		 */
+		if (PageHead(page))
+			xas_set(&xas, rac->_index + rac->_batch_count);
+
+		if (i == array_sz)
+			break;
+	}
+
+	return i;
+}
+
+#define readahead_page_batch(rac, array)				\
+	__readahead_batch(rac, array, ARRAY_SIZE(array))
+
 /* The byte offset into the file of this readahead block */
 static inline loff_t readahead_pos(struct readahead_control *rac)
 {