diff mbox series

[14/14] mm, netfs, fscache: Stop read optimisation when folio removed from pagecache

Message ID 164928630577.457102.8519251179327601178.stgit@warthog.procyon.org.uk (mailing list archive)
State New, archived
Headers show
Series cifs: Iterators, netfslib and folios | expand

Commit Message

David Howells April 6, 2022, 11:05 p.m. UTC
Fscache has an optimisation by which reads from the cache are skipped until
we know that (a) there's data there to be read and (b) that data isn't
entirely covered by pages resident in the netfs pagecache.  This is done
with two flags manipulated by fscache_note_page_release():

	if (...
	    test_bit(FSCACHE_COOKIE_HAVE_DATA, &cookie->flags) &&
	    test_bit(FSCACHE_COOKIE_NO_DATA_TO_READ, &cookie->flags))
		clear_bit(FSCACHE_COOKIE_NO_DATA_TO_READ, &cookie->flags);

where the NO_DATA_TO_READ flag causes cachefiles_prepare_read() to indicate
that netfslib should download from the server or clear the page instead.

The fscache_note_page_release() function is intended to be called from
->releasepage() - but that only gets called if PG_private or PG_private_2
is set - and currently the former is at the discretion of the netfs and the
latter is only set whilst a page is being written to the cache, so
sometimes we miss clearing the optimisation.

Fix this by adding an extra address_space operation, ->removing_folio(),
and flag, AS_NOTIFY_REMOVING_FOLIO.  The operation is called if the flag is
set when a folio is removed from the pagecache.  The flag should be set if
a non-NULL cookie is obtained from fscache and cleared in ->evict_inode()
before truncate_inode_pages_final() is called.

This can be tested by reading some data and then dropping caches.  The
number of times the new op is called is counted in /proc/fs/fscache/stats:

	RdHelp : DR=0 RA=4100 RP=0 WB=0 WBZ=0 rm=131072 <----

Reported-by: Rohith Surabattula <rohiths.msft@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Matthew Wilcox <willy@infradead.org>
cc: Steve French <sfrench@samba.org>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: Rohith Surabattula <rohiths.msft@gmail.com>
cc: linux-cifs@vger.kernel.org
---

 fs/afs/file.c           |    1 +
 fs/afs/inode.c          |    1 +
 fs/afs/internal.h       |    2 ++
 fs/cifs/cifsfs.c        |    1 +
 fs/cifs/file.c          |    1 +
 fs/cifs/fscache.c       |    3 +++
 fs/netfs/internal.h     |    1 +
 fs/netfs/misc.c         |   13 ++++++++++++-
 fs/netfs/stats.c        |    9 ++++++---
 include/linux/fs.h      |    2 ++
 include/linux/netfs.h   |    1 +
 include/linux/pagemap.h |    1 +
 mm/filemap.c            |   15 +++++++++++++++
 13 files changed, 47 insertions(+), 4 deletions(-)

Comments

Matthew Wilcox April 7, 2022, 3:13 a.m. UTC | #1
On Thu, Apr 07, 2022 at 12:05:05AM +0100, David Howells wrote:
> Fix this by adding an extra address_space operation, ->removing_folio(),
> and flag, AS_NOTIFY_REMOVING_FOLIO.  The operation is called if the flag is
> set when a folio is removed from the pagecache.  The flag should be set if
> a non-NULL cookie is obtained from fscache and cleared in ->evict_inode()
> before truncate_inode_pages_final() is called.

What's wrong with ->freepage?
David Howells April 7, 2022, 6:41 a.m. UTC | #2
Matthew Wilcox <willy@infradead.org> wrote:

> On Thu, Apr 07, 2022 at 12:05:05AM +0100, David Howells wrote:
> > Fix this by adding an extra address_space operation, ->removing_folio(),
> > and flag, AS_NOTIFY_REMOVING_FOLIO.  The operation is called if the flag is
> > set when a folio is removed from the pagecache.  The flag should be set if
> > a non-NULL cookie is obtained from fscache and cleared in ->evict_inode()
> > before truncate_inode_pages_final() is called.
> 
> What's wrong with ->freepage?

It's too late.  The optimisation must be cancelled before there's a chance
that a new page can be allocated and attached to the pagecache - but
->freepage() is called after the folio has been removed.  Doing it in
->freepage() would allow ->readahead(), ->readpage() or ->write_begin() to
jump in and start a new read (which gets skipped because the optimisation is
still in play).

Another possibility could be that the FSCACHE_COOKIE_HAVE_DATA and
FSCACHE_COOKIE_NO_DATA_TO_READ flags could be moved from cookie->flags to
mapping->flags and the VM could do the twiddling itself (no aop required) -
except that fscache can't currently then find them (maybe use an aop for
that?).

David
Matthew Wilcox April 7, 2022, 9:22 p.m. UTC | #3
On Thu, Apr 07, 2022 at 07:41:47AM +0100, David Howells wrote:
> Matthew Wilcox <willy@infradead.org> wrote:
> 
> > On Thu, Apr 07, 2022 at 12:05:05AM +0100, David Howells wrote:
> > > Fix this by adding an extra address_space operation, ->removing_folio(),
> > > and flag, AS_NOTIFY_REMOVING_FOLIO.  The operation is called if the flag is
> > > set when a folio is removed from the pagecache.  The flag should be set if
> > > a non-NULL cookie is obtained from fscache and cleared in ->evict_inode()
> > > before truncate_inode_pages_final() is called.
> > 
> > What's wrong with ->freepage?
> 
> It's too late.  The optimisation must be cancelled before there's a chance
> that a new page can be allocated and attached to the pagecache - but
> ->freepage() is called after the folio has been removed.  Doing it in
> ->freepage() would allow ->readahead(), ->readpage() or ->write_begin() to
> jump in and start a new read (which gets skipped because the optimisation is
> still in play).

OK.  You suggested that releasepage was an acceptable place to call it.
How about we have AS_RELEASE_ALL (... or something ...) and then
page_has_private() becomes a bit more complicated ... to the point
where we should probably get rid of it (by embedding it into
filemap_release_folio():

+++ b/mm/filemap.c
@@ -3981,6 +3981,9 @@ bool filemap_release_folio(struct folio *folio, gfp_t gfp)
        struct address_space * const mapping = folio->mapping;

        BUG_ON(!folio_test_locked(folio));
+       if (!mapping_needs_release(mapping) && !folio_test_private(folio) &&
+           !folio_test_private_2(folio))
+               return false;
        if (folio_test_writeback(folio))
                return false;
David Howells April 25, 2022, 12:07 p.m. UTC | #4
Matthew Wilcox <willy@infradead.org> wrote:

> OK.  You suggested that releasepage was an acceptable place to call it.
> How about we have AS_RELEASE_ALL (... or something ...) and then
> page_has_private() becomes a bit more complicated ... to the point
> where we should probably get rid of it (by embedding it into
> filemap_release_folio():

I'm not sure page_has_private() is quite so easy to get rid of.
shrink_page_list() and collapse_file(), for example, use it to conditionalise
a call to try_to_release_page() plus some other bits.

I think that, for the moment, I would need to add a check for AS_RELEASE_ALL
to page_has_private().

David
Matthew Wilcox April 25, 2022, 12:30 p.m. UTC | #5
On Mon, Apr 25, 2022 at 01:07:41PM +0100, David Howells wrote:
> Matthew Wilcox <willy@infradead.org> wrote:
> 
> > OK.  You suggested that releasepage was an acceptable place to call it.
> > How about we have AS_RELEASE_ALL (... or something ...) and then
> > page_has_private() becomes a bit more complicated ... to the point
> > where we should probably get rid of it (by embedding it into
> > filemap_release_folio():
> 
> I'm not sure page_has_private() is quite so easy to get rid of.
> shrink_page_list() and collapse_file(), for example, use it to conditionalise
> a call to try_to_release_page() plus some other bits.

That's what I was saying.  Make the calls to try_to_release_page()
unconditional and delete page_has_private() because it only confuses
people who should actually be using PagePrivate().

> I think that, for the moment, I would need to add a check for AS_RELEASE_ALL
> to page_has_private().
> 
> David
David Wysochanski Sept. 2, 2022, 12:30 a.m. UTC | #6
On Mon, Apr 25, 2022 at 8:30 AM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Mon, Apr 25, 2022 at 01:07:41PM +0100, David Howells wrote:
> > Matthew Wilcox <willy@infradead.org> wrote:
> >
> > > OK.  You suggested that releasepage was an acceptable place to call it.
> > > How about we have AS_RELEASE_ALL (... or something ...) and then
> > > page_has_private() becomes a bit more complicated ... to the point
> > > where we should probably get rid of it (by embedding it into
> > > filemap_release_folio():
> >
> > I'm not sure page_has_private() is quite so easy to get rid of.
> > shrink_page_list() and collapse_file(), for example, use it to conditionalise
> > a call to try_to_release_page() plus some other bits.
>
> That's what I was saying.  Make the calls to try_to_release_page()
> unconditional and delete page_has_private() because it only confuses
> people who should actually be using PagePrivate().
>
> > I think that, for the moment, I would need to add a check for AS_RELEASE_ALL
> > to page_has_private().
> >
> > David

I am not sure what the next steps are here but I wanted to ping about
this patch.  NFS also needs this patch or something like it.  David, are
you planning to submit an updated series with an updated patch?

A partial backport of David's original patch here on top of my v3 NFS
netfs conversion patches [1] resolves one of my unit test failures
where there were extra reads from the network instead of the cache.
Daire Byrne reported seeing the same thing; he tested my patches
below and they resolved his issue as well.
Note that I needed another netfs patch in this series,
"netfs: Provide invalidatepage and releasepage calls"
on top of this one, to resolve the problem.

[1] https://github.com/DaveWysochanskiRH/kernel/commits/nfs-fscache-netfs-removing-folio
$ git log --oneline | head
ad90bddf6570 NFS: Add usage of new VFS API removing folio
9e2a7c301564 mm,netfs: Add removing_folio() to stop netfs read optimization (TEST ONLY)
776088910162 netfs: Provide invalidatepage and releasepage calls
8aa1379ceb49 NFS: Convert buffered read paths to use netfs when
fscache is enabled
807808d87040 NFS: Configure support for netfs when NFS fscache is configured
43a41cce491d NFS: Rename readpage_async_filler to nfs_pageio_add_page
b90cb1053190 Linux 6.0-rc3
diff mbox series

Patch

diff --git a/fs/afs/file.c b/fs/afs/file.c
index 1398b3fc6fef..b833f440903d 100644
--- a/fs/afs/file.c
+++ b/fs/afs/file.c
@@ -54,6 +54,7 @@  const struct address_space_operations afs_file_aops = {
 	.launder_folio	= afs_launder_folio,
 	.releasepage	= netfs_releasepage,
 	.invalidate_folio = netfs_invalidate_folio,
+	.removing_folio	= netfs_removing_folio,
 	.write_begin	= afs_write_begin,
 	.write_end	= afs_write_end,
 	.writepage	= afs_writepage,
diff --git a/fs/afs/inode.c b/fs/afs/inode.c
index 12d2438661fc..6b8891402df5 100644
--- a/fs/afs/inode.c
+++ b/fs/afs/inode.c
@@ -790,6 +790,7 @@  void afs_evict_inode(struct inode *inode)
 
 	ASSERTCMP(inode->i_ino, ==, vnode->fid.vnode);
 
+	clear_bit(AS_NOTIFY_REMOVING_FOLIO, &inode->i_mapping->flags);
 	truncate_inode_pages_final(&inode->i_data);
 
 	afs_set_cache_aux(vnode, &aux);
diff --git a/fs/afs/internal.h b/fs/afs/internal.h
index 7b7ef945dc78..7756217dab09 100644
--- a/fs/afs/internal.h
+++ b/fs/afs/internal.h
@@ -686,6 +686,8 @@  static inline void afs_vnode_set_cache(struct afs_vnode *vnode,
 {
 #ifdef CONFIG_AFS_FSCACHE
 	vnode->netfs_ctx.cache = cookie;
+	if (cookie)
+		set_bit(AS_NOTIFY_REMOVING_FOLIO, &vnode->vfs_inode.i_mapping->flags);
 #endif
 }
 
diff --git a/fs/cifs/cifsfs.c b/fs/cifs/cifsfs.c
index d07cb28652f2..125d303c5038 100644
--- a/fs/cifs/cifsfs.c
+++ b/fs/cifs/cifsfs.c
@@ -404,6 +404,7 @@  cifs_free_inode(struct inode *inode)
 static void
 cifs_evict_inode(struct inode *inode)
 {
+	clear_bit(AS_NOTIFY_REMOVING_FOLIO, &inode->i_mapping->flags);
 	truncate_inode_pages_final(&inode->i_data);
 	if (inode->i_state & I_PINNING_FSCACHE_WB)
 		cifs_fscache_unuse_inode_cookie(inode, true);
diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 12663d9d1e51..71f96701e602 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -3923,6 +3923,7 @@  const struct address_space_operations cifs_addr_ops = {
 	.direct_IO = cifs_direct_io,
 	.invalidate_folio = netfs_invalidate_folio,
 	.launder_folio = cifs_launder_folio,
+	.removing_folio	= netfs_removing_folio,
 	/*
 	 * TODO: investigate and if useful we could add an cifs_migratePage
 	 * helper (under an CONFIG_MIGRATION) in the future, and also
diff --git a/fs/cifs/fscache.c b/fs/cifs/fscache.c
index bb1c3a372de4..5d8464bed798 100644
--- a/fs/cifs/fscache.c
+++ b/fs/cifs/fscache.c
@@ -108,6 +108,9 @@  void cifs_fscache_get_inode_cookie(struct inode *inode)
 				       &cifsi->uniqueid, sizeof(cifsi->uniqueid),
 				       &cd, sizeof(cd),
 				       i_size_read(&cifsi->vfs_inode));
+	if (cifsi->netfs_ctx.cache)
+		set_bit(AS_NOTIFY_REMOVING_FOLIO, &inode->i_mapping->flags);
+
 }
 
 void cifs_fscache_unuse_inode_cookie(struct inode *inode, bool update)
diff --git a/fs/netfs/internal.h b/fs/netfs/internal.h
index bc4db520d5c7..f373b9da6258 100644
--- a/fs/netfs/internal.h
+++ b/fs/netfs/internal.h
@@ -114,6 +114,7 @@  extern atomic_t netfs_n_rh_write_begin;
 extern atomic_t netfs_n_rh_write_done;
 extern atomic_t netfs_n_rh_write_failed;
 extern atomic_t netfs_n_rh_write_zskip;
+extern atomic_t netfs_n_rh_remove_folio;
 
 
 static inline void netfs_stat(atomic_t *stat)
diff --git a/fs/netfs/misc.c b/fs/netfs/misc.c
index a9e8abfa89f8..59ac380c2b08 100644
--- a/fs/netfs/misc.c
+++ b/fs/netfs/misc.c
@@ -178,7 +178,18 @@  int netfs_releasepage(struct page *page, gfp_t gfp)
 		folio_wait_fscache(folio);
 	}
 
-	fscache_note_page_release(netfs_i_cookie(folio_inode(folio)));
 	return true;
 }
 EXPORT_SYMBOL(netfs_releasepage);
+
+/**
+ * netfs_removing_folio - Notification of a folio about to be removed
+ * @mapping: The pagecache about to be altered
+ * @folio: The folio about to be removed
+ */
+void netfs_removing_folio(struct address_space *mapping, struct folio *folio)
+{
+	netfs_stat(&netfs_n_rh_remove_folio);
+	fscache_note_page_release(netfs_i_cookie(mapping->host));
+}
+EXPORT_SYMBOL(netfs_removing_folio);
diff --git a/fs/netfs/stats.c b/fs/netfs/stats.c
index 85aef51cc6cf..9a0c49540979 100644
--- a/fs/netfs/stats.c
+++ b/fs/netfs/stats.c
@@ -28,17 +28,20 @@  atomic_t netfs_n_rh_write_begin;
 atomic_t netfs_n_rh_write_done;
 atomic_t netfs_n_rh_write_failed;
 atomic_t netfs_n_rh_write_zskip;
+atomic_t netfs_n_rh_remove_folio;
 
 void netfs_stats_show(struct seq_file *m)
 {
-	seq_printf(m, "RdHelp : DR=%u RA=%u RP=%u WB=%u WBZ=%u rr=%u sr=%u\n",
+	seq_printf(m, "RdHelp : rr=%u sr=%u\n",
+		   atomic_read(&netfs_n_rh_rreq),
+		   atomic_read(&netfs_n_rh_sreq));
+	seq_printf(m, "RdHelp : DR=%u RA=%u RP=%u WB=%u WBZ=%u rm=%u\n",
 		   atomic_read(&netfs_n_rh_dio_read),
 		   atomic_read(&netfs_n_rh_readahead),
 		   atomic_read(&netfs_n_rh_readpage),
 		   atomic_read(&netfs_n_rh_write_begin),
 		   atomic_read(&netfs_n_rh_write_zskip),
-		   atomic_read(&netfs_n_rh_rreq),
-		   atomic_read(&netfs_n_rh_sreq));
+		   atomic_read(&netfs_n_rh_remove_folio));
 	seq_printf(m, "RdHelp : ZR=%u sh=%u sk=%u\n",
 		   atomic_read(&netfs_n_rh_zero),
 		   atomic_read(&netfs_n_rh_short_read),
diff --git a/include/linux/fs.h b/include/linux/fs.h
index bbde95387a23..269327d36bd7 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -360,6 +360,8 @@  struct address_space_operations {
 	sector_t (*bmap)(struct address_space *, sector_t);
 	void (*invalidate_folio) (struct folio *, size_t offset, size_t len);
 	int (*releasepage) (struct page *, gfp_t);
+	void (*removing_folio)(struct address_space *mapping,
+			       struct folio *folio);
 	void (*freepage)(struct page *);
 	ssize_t (*direct_IO)(struct kiocb *, struct iov_iter *iter);
 	/*
diff --git a/include/linux/netfs.h b/include/linux/netfs.h
index b503d13061f4..717114531da9 100644
--- a/include/linux/netfs.h
+++ b/include/linux/netfs.h
@@ -316,6 +316,7 @@  extern int netfs_write_begin(struct file *, struct address_space *,
 			     void **);
 extern void netfs_invalidate_folio(struct folio *folio, size_t offset, size_t length);
 extern int netfs_releasepage(struct page *page, gfp_t gfp_flags);
+extern void netfs_removing_folio(struct address_space *mapping, struct folio *folio);
 
 extern void netfs_subreq_terminated(struct netfs_io_subrequest *, ssize_t, bool);
 extern void netfs_get_subrequest(struct netfs_io_subrequest *subreq,
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 993994cd943a..976b56180700 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -199,6 +199,7 @@  enum mapping_flags {
 	/* writeback related tags are not used */
 	AS_NO_WRITEBACK_TAGS = 5,
 	AS_LARGE_FOLIO_SUPPORT = 6,
+	AS_NOTIFY_REMOVING_FOLIO,	/* Call aops->removing_folio() */
 };
 
 /**
diff --git a/mm/filemap.c b/mm/filemap.c
index 3a5ffb5587cd..1cacf11369d9 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -209,6 +209,20 @@  static void filemap_unaccount_folio(struct address_space *mapping,
 		folio_account_cleaned(folio, inode_to_wb(mapping->host));
 }
 
+/*
+ * Note that a folio is about to be removed from the pagecache.  If the folio
+ * had been copied to the cache, fscache will no longer be able to skip
+ * checking the cache on future reads.  The notification must be issued
+ * before the removal lest we race with the folio being brought back
+ * again.
+ */
+static void fscache_notify_removing_page(struct address_space *mapping,
+					 struct folio *folio)
+{
+	if (unlikely(test_bit(AS_NOTIFY_REMOVING_FOLIO, &mapping->flags)))
+		mapping->a_ops->removing_folio(mapping, folio);
+}
+
 /*
  * Delete a page from the page cache and free it. Caller has to make
  * sure the page is locked and that nobody else uses it - or that usage
@@ -219,6 +233,7 @@  void __filemap_remove_folio(struct folio *folio, void *shadow)
 	struct address_space *mapping = folio->mapping;
 
 	trace_mm_filemap_delete_from_page_cache(folio);
+	fscache_notify_removing_page(mapping, folio);
 	filemap_unaccount_folio(mapping, folio);
 	page_cache_delete(mapping, folio, shadow);
 }