diff mbox series

[1/3] ceph: Uninline the data on a file opened for writing

Message ID 164242347319.2763588.2514920080375140879.stgit@warthog.procyon.org.uk (mailing list archive)
State New, archived
Headers show
Series [1/3] ceph: Uninline the data on a file opened for writing | expand

Commit Message

David Howells Jan. 17, 2022, 12:44 p.m. UTC
If a ceph file is made up of inline data, uninline that in the ceph_open()
rather than in ceph_page_mkwrite(), ceph_write_iter(), ceph_fallocate() or
ceph_write_begin().

This makes it easier to convert to using the netfs library for VM write
hooks.

Changes
=======
ver #2:
 - Removed the uninline-handling code from ceph_write_begin() also.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: ceph-devel@vger.kernel.org
---

 fs/ceph/addr.c  |   97 +++++++++++++------------------------------------------
 fs/ceph/file.c  |   28 +++++++++-------
 fs/ceph/super.h |    2 +
 3 files changed, 40 insertions(+), 87 deletions(-)

Comments

Matthew Wilcox Jan. 17, 2022, 1:47 p.m. UTC | #1
On Mon, Jan 17, 2022 at 12:44:33PM +0000, David Howells wrote:
> +	if (ceph_caps_issued(ci) & (CEPH_CAP_FILE_CACHE|CEPH_CAP_FILE_LAZYIO)) {
> +		folio = filemap_get_folio(inode->i_mapping, 0);
> +		if (folio) {
> +			if (folio_test_uptodate(folio)) {
>  				from_pagecache = true;
> -				lock_page(page);
> +				folio_lock(folio);
>  			} else {
> -				put_page(page);
> -				page = NULL;
> +				folio_put(folio);
> +				folio = NULL;

This all falls very much under "doing it the hard way", and quite
possibly under the "actively buggy with races" category.

read_mapping_folio() does what you want, as long as you pass 'filp'
as your 'void *data'.  I should fix that type ...
Jeff Layton Jan. 17, 2022, 2:11 p.m. UTC | #2
On Mon, 2022-01-17 at 13:47 +0000, Matthew Wilcox wrote:
> On Mon, Jan 17, 2022 at 12:44:33PM +0000, David Howells wrote:
> > +	if (ceph_caps_issued(ci) & (CEPH_CAP_FILE_CACHE|CEPH_CAP_FILE_LAZYIO)) {
> > +		folio = filemap_get_folio(inode->i_mapping, 0);
> > +		if (folio) {
> > +			if (folio_test_uptodate(folio)) {
> >  				from_pagecache = true;
> > -				lock_page(page);
> > +				folio_lock(folio);
> >  			} else {
> > -				put_page(page);
> > -				page = NULL;
> > +				folio_put(folio);
> > +				folio = NULL;
> 
> This all falls very much under "doing it the hard way", and quite
> possibly under the "actively buggy with races" category.
> 
> read_mapping_folio() does what you want, as long as you pass 'filp'
> as your 'void *data'.  I should fix that type ...
> 

That would be nicer, I think. If you do that though, then patch #3
probably needs to come first in the series...
David Howells Jan. 17, 2022, 2:27 p.m. UTC | #3
Jeff Layton <jlayton@kernel.org> wrote:

> That would be nicer, I think. If you do that though, then patch #3
> probably needs to come first in the series...

Seems reasonable.  Could you do that, or do you want me to have a go at it?

David
Jeff Layton Jan. 17, 2022, 2:30 p.m. UTC | #4
On Mon, 2022-01-17 at 14:27 +0000, David Howells wrote:
> Jeff Layton <jlayton@kernel.org> wrote:
> 
> > That would be nicer, I think. If you do that though, then patch #3
> > probably needs to come first in the series...
> 
> Seems reasonable.  Could you do that, or do you want me to have a go at it?
> 

Do you mind doing it? I can take a stab at it, but it may be a while
before I can get to it.

Thanks,
David Howells Jan. 17, 2022, 3:24 p.m. UTC | #5
Jeff Layton <jlayton@kernel.org> wrote:

> On Mon, 2022-01-17 at 13:47 +0000, Matthew Wilcox wrote:
> > This all falls very much under "doing it the hard way", and quite
> > possibly under the "actively buggy with races" category.
> > 
> > read_mapping_folio() does what you want, as long as you pass 'filp'
> > as your 'void *data'.  I should fix that type ...


How much do we care about the case where we don't have either the
CEPH_CAP_FILE_CACHE or the CEPH_CAP_FILE_LAZYIO caps?  Is it possible just to
shove the page into the pagecache whatever we do?  At the moment there are two
threads, both of which get a page - one attached to the page cache, one not.
The rest is then common because from that point on, it doesn't matter where
the folio resides.

David
Jeff Layton Jan. 17, 2022, 3:45 p.m. UTC | #6
On Mon, 2022-01-17 at 15:24 +0000, David Howells wrote:
> Jeff Layton <jlayton@kernel.org> wrote:
> 
> > On Mon, 2022-01-17 at 13:47 +0000, Matthew Wilcox wrote:
> > > This all falls very much under "doing it the hard way", and quite
> > > possibly under the "actively buggy with races" category.
> > > 
> > > read_mapping_folio() does what you want, as long as you pass 'filp'
> > > as your 'void *data'.  I should fix that type ...
> 
> 
> How much do we care about the case where we don't have either the
> CEPH_CAP_FILE_CACHE or the CEPH_CAP_FILE_LAZYIO caps?  Is it possible just to
> shove the page into the pagecache whatever we do?  At the moment there are two
> threads, both of which get a page - one attached to the page cache, one not.
> The rest is then common because from that point on, it doesn't matter where
> the folio resides.
> 

Well, the protocol says that if you don't have CEPH_CAP_FILE_CACHE (Fc)
then you're not allowed to do reads from the cache. I wouldn't care if
we stuffed the page into the mapping anyway, but that could potentially
affect mmap readers at the same time.

OTOH, mmap reads when we don't have Fc is somewhat racy anyway. Maybe it
doesn't matter...
David Howells Jan. 17, 2022, 3:57 p.m. UTC | #7
Matthew Wilcox <willy@infradead.org> wrote:

> read_mapping_folio() does what you want, as long as you pass 'filp'
> as your 'void *data'.  I should fix that type ...

Ah, but *can* I pass file in at that point?  It's true that I have a file* -
but that's in the process of being set up.

David
diff mbox series

Patch

diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index c98e5238a1b6..ef1caa43b521 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -1274,45 +1274,11 @@  static int ceph_write_begin(struct file *file, struct address_space *mapping,
 			    struct page **pagep, void **fsdata)
 {
 	struct inode *inode = file_inode(file);
-	struct ceph_inode_info *ci = ceph_inode(inode);
 	struct folio *folio = NULL;
-	pgoff_t index = pos >> PAGE_SHIFT;
 	int r;
 
-	/*
-	 * Uninlining should have already been done and everything updated, EXCEPT
-	 * for inline_version sent to the MDS.
-	 */
-	if (ci->i_inline_version != CEPH_INLINE_NONE) {
-		unsigned int fgp_flags = FGP_LOCK | FGP_WRITE | FGP_CREAT | FGP_STABLE;
-		if (aop_flags & AOP_FLAG_NOFS)
-			fgp_flags |= FGP_NOFS;
-		folio = __filemap_get_folio(mapping, index, fgp_flags,
-					    mapping_gfp_mask(mapping));
-		if (!folio)
-			return -ENOMEM;
-
-		/*
-		 * The inline_version on a new inode is set to 1. If that's the
-		 * case, then the folio is brand new and isn't yet Uptodate.
-		 */
-		r = 0;
-		if (index == 0 && ci->i_inline_version != 1) {
-			if (!folio_test_uptodate(folio)) {
-				WARN_ONCE(1, "ceph: write_begin called on still-inlined inode (inline_version %llu)!\n",
-					  ci->i_inline_version);
-				r = -EINVAL;
-			}
-			goto out;
-		}
-		zero_user_segment(&folio->page, 0, folio_size(folio));
-		folio_mark_uptodate(folio);
-		goto out;
-	}
-
 	r = netfs_write_begin(file, inode->i_mapping, pos, len, 0, &folio, NULL,
 			      &ceph_netfs_read_ops, NULL);
-out:
 	if (r == 0)
 		folio_wait_fscache(folio);
 	if (r < 0) {
@@ -1508,19 +1474,6 @@  static vm_fault_t ceph_page_mkwrite(struct vm_fault *vmf)
 	sb_start_pagefault(inode->i_sb);
 	ceph_block_sigs(&oldset);
 
-	if (ci->i_inline_version != CEPH_INLINE_NONE) {
-		struct page *locked_page = NULL;
-		if (off == 0) {
-			lock_page(page);
-			locked_page = page;
-		}
-		err = ceph_uninline_data(vma->vm_file, locked_page);
-		if (locked_page)
-			unlock_page(locked_page);
-		if (err < 0)
-			goto out_free;
-	}
-
 	if (off + thp_size(page) <= size)
 		len = thp_size(page);
 	else
@@ -1645,13 +1598,14 @@  void ceph_fill_inline_data(struct inode *inode, struct page *locked_page,
 	}
 }
 
-int ceph_uninline_data(struct file *filp, struct page *locked_page)
+int ceph_uninline_data(struct file *file)
 {
-	struct inode *inode = file_inode(filp);
+	struct inode *inode = file_inode(file);
 	struct ceph_inode_info *ci = ceph_inode(inode);
 	struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
 	struct ceph_osd_request *req;
-	struct page *page = NULL;
+	struct folio *folio = NULL;
+	struct page *pages[1];
 	u64 len, inline_version;
 	int err = 0;
 	bool from_pagecache = false;
@@ -1667,34 +1621,30 @@  int ceph_uninline_data(struct file *filp, struct page *locked_page)
 	    inline_version == CEPH_INLINE_NONE)
 		goto out;
 
-	if (locked_page) {
-		page = locked_page;
-		WARN_ON(!PageUptodate(page));
-	} else if (ceph_caps_issued(ci) &
-		   (CEPH_CAP_FILE_CACHE|CEPH_CAP_FILE_LAZYIO)) {
-		page = find_get_page(inode->i_mapping, 0);
-		if (page) {
-			if (PageUptodate(page)) {
+	if (ceph_caps_issued(ci) & (CEPH_CAP_FILE_CACHE|CEPH_CAP_FILE_LAZYIO)) {
+		folio = filemap_get_folio(inode->i_mapping, 0);
+		if (folio) {
+			if (folio_test_uptodate(folio)) {
 				from_pagecache = true;
-				lock_page(page);
+				folio_lock(folio);
 			} else {
-				put_page(page);
-				page = NULL;
+				folio_put(folio);
+				folio = NULL;
 			}
 		}
 	}
 
-	if (page) {
+	if (folio) {
 		len = i_size_read(inode);
-		if (len > PAGE_SIZE)
-			len = PAGE_SIZE;
+		if (len >  folio_size(folio))
+			len = folio_size(folio);
 	} else {
-		page = __page_cache_alloc(GFP_NOFS);
-		if (!page) {
+		folio = filemap_alloc_folio(GFP_NOFS, 0);
+		if (!folio) {
 			err = -ENOMEM;
 			goto out;
 		}
-		err = __ceph_do_getattr(inode, page,
+		err = __ceph_do_getattr(inode, folio_page(folio, 0),
 					CEPH_STAT_CAP_INLINE_DATA, true);
 		if (err < 0) {
 			/* no inline data */
@@ -1732,7 +1682,8 @@  int ceph_uninline_data(struct file *filp, struct page *locked_page)
 		goto out;
 	}
 
-	osd_req_op_extent_osd_data_pages(req, 1, &page, len, 0, false, false);
+	pages[0] = folio_page(folio, 0);
+	osd_req_op_extent_osd_data_pages(req, 1, pages, len, 0, false, false);
 
 	{
 		__le64 xattr_buf = cpu_to_le64(inline_version);
@@ -1769,12 +1720,10 @@  int ceph_uninline_data(struct file *filp, struct page *locked_page)
 	if (err == -ECANCELED)
 		err = 0;
 out:
-	if (page && page != locked_page) {
-		if (from_pagecache) {
-			unlock_page(page);
-			put_page(page);
-		} else
-			__free_pages(page, 0);
+	if (folio) {
+		if (from_pagecache)
+			folio_unlock(folio);
+		folio_put(folio);
 	}
 
 	dout("uninline_data %p %llx.%llx inline_version %llu = %d\n",
diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 9d9304e712d9..d1d28220f691 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -205,6 +205,7 @@  static int ceph_init_file_info(struct inode *inode, struct file *file,
 {
 	struct ceph_inode_info *ci = ceph_inode(inode);
 	struct ceph_file_info *fi;
+	int ret;
 
 	dout("%s %p %p 0%o (%s)\n", __func__, inode, file,
 			inode->i_mode, isdir ? "dir" : "regular");
@@ -235,7 +236,22 @@  static int ceph_init_file_info(struct inode *inode, struct file *file,
 	INIT_LIST_HEAD(&fi->rw_contexts);
 	fi->filp_gen = READ_ONCE(ceph_inode_to_client(inode)->filp_gen);
 
+	if ((file->f_mode & FMODE_WRITE) &&
+	    ci->i_inline_version != CEPH_INLINE_NONE) {
+		ret = ceph_uninline_data(file);
+		if (ret < 0)
+			goto error;
+	}
+
 	return 0;
+
+error:
+	ceph_fscache_unuse_cookie(inode, file->f_mode & FMODE_WRITE);
+	ceph_put_fmode(ci, fi->fmode, 1);
+	kmem_cache_free(ceph_file_cachep, fi);
+	/* wake up anyone waiting for caps on this inode */
+	wake_up_all(&ci->i_cap_wq);
+	return ret;
 }
 
 /*
@@ -1763,12 +1779,6 @@  static ssize_t ceph_write_iter(struct kiocb *iocb, struct iov_iter *from)
 	if (err)
 		goto out;
 
-	if (ci->i_inline_version != CEPH_INLINE_NONE) {
-		err = ceph_uninline_data(file, NULL);
-		if (err < 0)
-			goto out;
-	}
-
 	dout("aio_write %p %llx.%llx %llu~%zd getting caps. i_size %llu\n",
 	     inode, ceph_vinop(inode), pos, count, i_size_read(inode));
 	if (fi->fmode & CEPH_FILE_MODE_LAZY)
@@ -2094,12 +2104,6 @@  static long ceph_fallocate(struct file *file, int mode,
 		goto unlock;
 	}
 
-	if (ci->i_inline_version != CEPH_INLINE_NONE) {
-		ret = ceph_uninline_data(file, NULL);
-		if (ret < 0)
-			goto unlock;
-	}
-
 	size = i_size_read(inode);
 
 	/* Are we punching a hole beyond EOF? */
diff --git a/fs/ceph/super.h b/fs/ceph/super.h
index d0142cc5c41b..f1cec05e4eb8 100644
--- a/fs/ceph/super.h
+++ b/fs/ceph/super.h
@@ -1207,7 +1207,7 @@  extern void __ceph_touch_fmode(struct ceph_inode_info *ci,
 /* addr.c */
 extern const struct address_space_operations ceph_aops;
 extern int ceph_mmap(struct file *file, struct vm_area_struct *vma);
-extern int ceph_uninline_data(struct file *filp, struct page *locked_page);
+extern int ceph_uninline_data(struct file *file);
 extern int ceph_pool_perm_check(struct inode *inode, int need);
 extern void ceph_pool_perm_destroy(struct ceph_mds_client* mdsc);
 int ceph_purge_inode_cap(struct inode *inode, struct ceph_cap *cap, bool *invalidate);