[RFC,54/61] afs: Wait on PG_fscache before modifying/releasing a page
diff mbox series

Message ID 158861253957.340223.7465334678444521655.stgit@warthog.procyon.org.uk
State New
Headers show
Series
  • fscache, cachefiles: Rewrite the I/O interface in terms of kiocb/iov_iter
Related show

Commit Message

David Howells May 4, 2020, 5:15 p.m. UTC
PG_fscache is going to be used to indicate that a page is being written to
the cache, and that the page should not be modified or released until it's
finished.

Make afs_invalidatepage() and afs_releasepage() wait for it.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/afs/file.c  |   13 +++++++++++++
 fs/afs/write.c |    9 +++++++++
 2 files changed, 22 insertions(+)

Comments

Matthew Wilcox May 5, 2020, 11:59 a.m. UTC | #1
On Mon, May 04, 2020 at 06:15:39PM +0100, David Howells wrote:
> PG_fscache is going to be used to indicate that a page is being written to
> the cache, and that the page should not be modified or released until it's
> finished.
> 
> Make afs_invalidatepage() and afs_releasepage() wait for it.

Well, why?  Keeping a refcount on the page will prevent it from going
away while it's being written to storage.  And the fact that it's
being written to this cache is no reason to delay the truncate of a file
(is it?)  Similarly, I don't see why we need to wait for the page to make
it to the cache before we start to modify it.  Certainly we'll need to
re-write it to the cache since the cache is now stale, but why should
we wait for the now-stale write to complete?
David Howells May 6, 2020, 7:57 a.m. UTC | #2
Matthew Wilcox <willy@infradead.org> wrote:

> > PG_fscache is going to be used to indicate that a page is being written to
> > the cache, and that the page should not be modified or released until it's
> > finished.
> > 
> > Make afs_invalidatepage() and afs_releasepage() wait for it.
> 
> Well, why?  Keeping a refcount on the page will prevent it from going
> away while it's being written to storage.  And the fact that it's
> being written to this cache is no reason to delay the truncate of a file
> (is it?)

Won't that screw up ITER_MAPPING?  Does that mean that ITER_MAPPING isn't
viable?

David
Matthew Wilcox May 6, 2020, 11:09 a.m. UTC | #3
On Wed, May 06, 2020 at 08:57:58AM +0100, David Howells wrote:
> Matthew Wilcox <willy@infradead.org> wrote:
> 
> > > PG_fscache is going to be used to indicate that a page is being written to
> > > the cache, and that the page should not be modified or released until it's
> > > finished.
> > > 
> > > Make afs_invalidatepage() and afs_releasepage() wait for it.
> > 
> > Well, why?  Keeping a refcount on the page will prevent it from going
> > away while it's being written to storage.  And the fact that it's
> > being written to this cache is no reason to delay the truncate of a file
> > (is it?)
> 
> Won't that screw up ITER_MAPPING?  Does that mean that ITER_MAPPING isn't
> viable?

Can you remind me why ITER_MAPPING needs:

"The caller must guarantee that the pages are all present and they must be
locked using PG_locked, PG_writeback or PG_fscache to prevent them from
going away or being migrated whilst they're being accessed."

An elevated refcount prevents migration, and it also prevents the pages
from being freed.  It doesn't prevent them from being truncated out of
the file, but it does ensure the pages aren't reallocated.
David Howells May 6, 2020, 2:24 p.m. UTC | #4
Matthew Wilcox <willy@infradead.org> wrote:

> > Won't that screw up ITER_MAPPING?  Does that mean that ITER_MAPPING isn't
> > viable?
> 
> Can you remind me why ITER_MAPPING needs:
> 
> "The caller must guarantee that the pages are all present and they must be
> locked using PG_locked, PG_writeback or PG_fscache to prevent them from
> going away or being migrated whilst they're being accessed."
> 
> An elevated refcount prevents migration, and it also prevents the pages
> from being freed.  It doesn't prevent them from being truncated out of
> the file, but it does ensure the pages aren't reallocated.

ITER_MAPPING relies on the mapping to maintain the pointers to the pages so
that it can find them rather than being like ITER_BVEC where there's a
separate list.

Truncate removes the pages from the mapping - at which point ITER_MAPPING can
no longer find them.

David
David Howells May 8, 2020, 2:39 p.m. UTC | #5
David Howells <dhowells@redhat.com> wrote:

> ITER_MAPPING relies on the mapping to maintain the pointers to the pages so
> that it can find them rather than being like ITER_BVEC where there's a
> separate list.
> 
> Truncate removes the pages from the mapping - at which point ITER_MAPPING can
> no longer find them.

It looks like ITER_MAPPING is fine with truncate, provided the invalidation
waits for the iterator to complete first:

	int truncate_inode_page(struct address_space *mapping, struct page *page)
	{
		VM_BUG_ON_PAGE(PageTail(page), page);

		if (page->mapping != mapping)
			return -EIO;

		truncate_cleanup_page(mapping, page);
		delete_from_page_cache(page);
		return 0;
	}

In which case, ->invalidatepage() needs to wait for PG_fscache.

Similarly, it looks like ->releasepage() is fine, provided it waits for
PG_fscache also.

If I have to use ITER_BVEC, what's the advisability of using vmalloc() to
allocate the bio_vec array for a transient op?  Such an array can reference up
to 1MiB on a 64-bit machine with 4KiB non-compound pages if it only allocates
up to a single page.  I'm wondering what the teardown cost is, though, if all
the corresponding PTEs have to be erased from all CPUs.

David

Patch
diff mbox series

diff --git a/fs/afs/file.c b/fs/afs/file.c
index ea9f6d45d9ff..b25c5ab1f4e1 100644
--- a/fs/afs/file.c
+++ b/fs/afs/file.c
@@ -548,6 +548,11 @@  static void afs_invalidatepage(struct page *page, unsigned int offset,
 
 	/* we clean up only if the entire page is being invalidated */
 	if (offset == 0 && length == PAGE_SIZE) {
+#ifdef CONFIG_AFS_FSCACHE
+		if (PageFsCache(page))
+			wait_on_page_fscache(page);
+#endif
+
 		if (PagePrivate(page)) {
 			priv = page_private(page);
 			trace_afs_page_dirty(vnode, tracepoint_string("inval"),
@@ -575,6 +580,14 @@  static int afs_releasepage(struct page *page, gfp_t gfp_flags)
 
 	/* deny if page is being written to the cache and the caller hasn't
 	 * elected to wait */
+#ifdef CONFIG_AFS_FSCACHE
+	if (PageFsCache(page)) {
+		if (!(gfp_flags & __GFP_DIRECT_RECLAIM) || !(gfp_flags & __GFP_FS))
+			return false;
+		wait_on_page_fscache(page);
+	}
+#endif
+
 	if (PagePrivate(page)) {
 		priv = page_private(page);
 		trace_afs_page_dirty(vnode, tracepoint_string("rel"),
diff --git a/fs/afs/write.c b/fs/afs/write.c
index 390fee44446c..3632909fcd91 100644
--- a/fs/afs/write.c
+++ b/fs/afs/write.c
@@ -111,6 +111,10 @@  int afs_write_begin(struct file *file, struct address_space *mapping,
 		SetPageUptodate(page);
 	}
 
+#ifdef CONFIG_AFS_FSCACHE
+	wait_on_page_fscache(page);
+#endif
+
 	/* page won't leak in error case: it eventually gets cleaned off LRU */
 	*pagep = page;
 
@@ -786,6 +790,11 @@  vm_fault_t afs_page_mkwrite(struct vm_fault *vmf)
 	/* Wait for the page to be written to the cache before we allow it to
 	 * be modified.  We then assume the entire page will need writing back.
 	 */
+#ifdef CONFIG_AFS_FSCACHE
+	if (PageFsCache(vmf->page) &&
+	    wait_on_page_bit_killable(vmf->page, PG_fscache) < 0)
+		return VM_FAULT_RETRY;
+#endif
 
 	if (PageWriteback(vmf->page) &&
 	    wait_on_page_bit_killable(vmf->page, PG_writeback) < 0)