[02/31] fscrypt: Add some folio helper functions

Message ID 20230126202415.1682629-3-willy@infradead.org
State New, archived
Series Convert most of ext4 to folios

Commit Message

Matthew Wilcox Jan. 26, 2023, 8:23 p.m. UTC
fscrypt_is_bounce_folio() is the equivalent of fscrypt_is_bounce_page()
and fscrypt_pagecache_folio() is the equivalent of fscrypt_pagecache_page().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/fscrypt.h | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

Comments

Eric Biggers Jan. 27, 2023, 3:02 a.m. UTC | #1
On Thu, Jan 26, 2023 at 08:23:46PM +0000, Matthew Wilcox (Oracle) wrote:
> fscrypt_is_bounce_folio() is the equivalent of fscrypt_is_bounce_page()
> and fscrypt_pagecache_folio() is the equivalent of fscrypt_pagecache_page().
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>  include/linux/fscrypt.h | 21 +++++++++++++++++++++
>  1 file changed, 21 insertions(+)
> 
> diff --git a/include/linux/fscrypt.h b/include/linux/fscrypt.h
> index 4f5f8a651213..c2c07d36fb3a 100644
> --- a/include/linux/fscrypt.h
> +++ b/include/linux/fscrypt.h
> @@ -273,6 +273,16 @@ static inline struct page *fscrypt_pagecache_page(struct page *bounce_page)
>  	return (struct page *)page_private(bounce_page);
>  }
>  
> +static inline bool fscrypt_is_bounce_folio(struct folio *folio)
> +{
> +	return folio->mapping == NULL;
> +}
> +
> +static inline struct folio *fscrypt_pagecache_folio(struct folio *bounce_folio)
> +{
> +	return bounce_folio->private;
> +}

ext4_bio_write_folio() is still doing:

	bounce_page = fscrypt_encrypt_pagecache_blocks(&folio->page, ...);

Should it be creating a "bounce folio" instead, or is that not in the scope of
this patchset?

- Eric
Matthew Wilcox Jan. 27, 2023, 4:13 p.m. UTC | #2
On Thu, Jan 26, 2023 at 07:02:14PM -0800, Eric Biggers wrote:
> On Thu, Jan 26, 2023 at 08:23:46PM +0000, Matthew Wilcox (Oracle) wrote:
> > [...]
> 
> ext4_bio_write_folio() is still doing:
> 
> 	bounce_page = fscrypt_encrypt_pagecache_blocks(&folio->page, ...);
> 
> Should it be creating a "bounce folio" instead, or is that not in the scope of
> this patchset?

It's out of scope for _this_ patchset.  I think it's a patchset that
could come either before or after, and is needed to support large folios
with ext4.  The biggest problem with doing that conversion is that
bounce pages are allocated from a mempool which obviously only allocates
order-0 folios.  I don't know what to do about that.  Have a mempool
for each order of folio that the filesystem supports?  Try to allocate
folios without a mempool and then split the folio if allocation fails?
Have a mempool containing PMD-order pages and split them ourselves if
we need to allocate from the mempool?

Nothing's really standing out to me as the perfect answer.  There are
probably other alternatives.
Eric Biggers Jan. 27, 2023, 4:21 p.m. UTC | #3
On Fri, Jan 27, 2023 at 04:13:37PM +0000, Matthew Wilcox wrote:
> On Thu, Jan 26, 2023 at 07:02:14PM -0800, Eric Biggers wrote:
> > On Thu, Jan 26, 2023 at 08:23:46PM +0000, Matthew Wilcox (Oracle) wrote:
> > > [...]
> > 
> > ext4_bio_write_folio() is still doing:
> > 
> > 	bounce_page = fscrypt_encrypt_pagecache_blocks(&folio->page, ...);
> > 
> > Should it be creating a "bounce folio" instead, or is that not in the scope of
> > this patchset?
> 
> It's out of scope for _this_ patchset.  I think it's a patchset that
> could come either before or after, and is needed to support large folios
> with ext4.  The biggest problem with doing that conversion is that
> bounce pages are allocated from a mempool which obviously only allocates
> order-0 folios.  I don't know what to do about that.  Have a mempool
> for each order of folio that the filesystem supports?  Try to allocate
> folios without a mempool and then split the folio if allocation fails?
> Have a mempool containing PMD-order pages and split them ourselves if
> we need to allocate from the mempool?
> 
> Nothing's really standing out to me as the perfect answer.  There are
> probably other alternatives.

Would it be possible to keep using bounce *pages* all the time, even when the
pagecache contains large folios?

- Eric
Matthew Wilcox Jan. 27, 2023, 4:37 p.m. UTC | #4
On Fri, Jan 27, 2023 at 08:21:46AM -0800, Eric Biggers wrote:
> On Fri, Jan 27, 2023 at 04:13:37PM +0000, Matthew Wilcox wrote:
> > On Thu, Jan 26, 2023 at 07:02:14PM -0800, Eric Biggers wrote:
> > > On Thu, Jan 26, 2023 at 08:23:46PM +0000, Matthew Wilcox (Oracle) wrote:
> > > > [...]
> > > 
> > > ext4_bio_write_folio() is still doing:
> > > 
> > > 	bounce_page = fscrypt_encrypt_pagecache_blocks(&folio->page, ...);
> > > 
> > > Should it be creating a "bounce folio" instead, or is that not in the scope of
> > > this patchset?
> > 
> > It's out of scope for _this_ patchset.  I think it's a patchset that
> > could come either before or after, and is needed to support large folios
> > with ext4.  The biggest problem with doing that conversion is that
> > bounce pages are allocated from a mempool which obviously only allocates
> > order-0 folios.  I don't know what to do about that.  Have a mempool
> > for each order of folio that the filesystem supports?  Try to allocate
> > folios without a mempool and then split the folio if allocation fails?
> > Have a mempool containing PMD-order pages and split them ourselves if
> > we need to allocate from the mempool?
> > 
> > Nothing's really standing out to me as the perfect answer.  There are
> > probably other alternatives.
> 
> Would it be possible to keep using bounce *pages* all the time, even when the
> pagecache contains large folios?

I _think_ so.  Probably the best solution is to attempt to allocate an
order-N folio (with GFP_NOWAIT?) and then fall back to allocating 2^N
pages from the mempool.  It'll require some surgery to ext4_finish_bio()
as well as ext4_bio_write_folio(), fscrypt_encrypt_pagecache_blocks()
and fscrypt_free_bounce_page(), but I think it's doable.  I'll try
to whip something up.
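
Concretely, the first step might look like this (a sketch only:
fscrypt_alloc_bounce_folio() is a hypothetical helper, not existing
fscrypt API, and the mempool fallback is left to the caller):

	static struct folio *fscrypt_alloc_bounce_folio(unsigned int order)
	{
		/* Opportunistic: one large bounce folio, without blocking. */
		struct folio *folio = folio_alloc(GFP_NOWAIT | __GFP_NOWARN,
						  order);

		if (folio)
			return folio;
		/*
		 * Otherwise fall back to taking 2^order order-0 pages from
		 * the existing bounce-page mempool, which guarantees forward
		 * progress but leaves the caller tracking one bounce page
		 * per pagecache page.
		 */
		return NULL;
	}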
Ritesh Harjani (IBM) March 5, 2023, 9:06 a.m. UTC | #5
"Matthew Wilcox (Oracle)" <willy@infradead.org> writes:

> fscrypt_is_bounce_folio() is the equivalent of fscrypt_is_bounce_page()
> and fscrypt_pagecache_folio() is the equivalent of fscrypt_pagecache_page().
>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>  include/linux/fscrypt.h | 21 +++++++++++++++++++++
>  1 file changed, 21 insertions(+)

Straightforward conversion. IIUC, even after this patchset we haven't
killed fscrypt_is_bounce_page() and fscrypt_pagecache_page(), because
there are other users of these in f2fs and fscrypt.

Looks good to me. Please feel free to add -
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>


-ritesh

Theodore Ts'o March 14, 2023, 10:05 p.m. UTC | #6
On Fri, Jan 27, 2023 at 04:13:37PM +0000, Matthew Wilcox wrote:
> 
> It's out of scope for _this_ patchset.  I think it's a patchset that
> could come either before or after, and is needed to support large folios
> with ext4.  The biggest problem with doing that conversion is that
> bounce pages are allocated from a mempool which obviously only allocates
> order-0 folios.  I don't know what to do about that.  Have a mempool
> for each order of folio that the filesystem supports?  Try to allocate
> folios without a mempool and then split the folio if allocation fails?
> Have a mempool containing PMD-order pages and split them ourselves if
> we need to allocate from the mempool?
> 
> Nothing's really standing out to me as the perfect answer.  There are
> probably other alternatives.

Hmm.... should we have some kind of check in case a large folio is
passed to these fscrypt functions?  (e.g., some kind of BUG_ON, or
WARN_ON?)

Or do we just rely on people remembering that when we start trying to
support large folios for ext4, it will probably have to be the easy
cases first (e.g., no fscrypt, no fsverity, block size == page size)?

- Ted
Eric Biggers March 14, 2023, 11:12 p.m. UTC | #7
On Tue, Mar 14, 2023 at 06:05:51PM -0400, Theodore Ts'o wrote:
> On Fri, Jan 27, 2023 at 04:13:37PM +0000, Matthew Wilcox wrote:
> > 
> > [...]
> 
> Hmm.... should we have some kind of check in case a large folio is
> passed to these fscrypt functions?  (e.g., some kind of BUG_ON, or
> WARN_ON?)
> 
> Or do we just rely on people remembering that when we start trying to
> support large folios for ext4, it will probably have to be the easy
> cases first (e.g., no fscrypt, no fsverity, block size == page size)?
> 

I think large folio support for fscrypt and fsverity is not that far away.  I
already made the following changes in 6.3:

    51e4e3153ebc ("fscrypt: support decrypting data from large folios")
    5d0f0e57ed90 ("fsverity: support verifying data from large folios")

AFAICT, absent actual testing of course, the only major thing that's still
needed is that fscrypt_encrypt_pagecache_blocks() needs to support large folios.
I'm not sure how it should work, exactly.  Matthew gave a couple options.
Another option is to just continue to use bounce *pages*, and keep track of all
the bounce pages for each folio.

We could certainly make fscrypt_encrypt_pagecache_blocks() WARN when given a
large folio for now, if we aren't going to update it properly anytime soon.
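
Something like this at the top of that function would be enough (a
sketch; the exact error code is a guess):

	if (WARN_ON_ONCE(folio_test_large(page_folio(page))))
		return ERR_PTR(-EOPNOTSUPP);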

By the way, fscrypt_encrypt_pagecache_blocks() is only used by the fs-layer file
contents encryption, not inline encryption.  Even without changing it, we could
support large folios on encrypted files when inline encryption is being used.

(A smaller thing, which I think I missed in "fsverity: support verifying data
from large folios", is that fsverity_verify_bio() still uses
bio_first_page_all(bio)->mapping->host to get the bio's inode.  Perhaps there
needs to be a page_folio() in there for the ->mapping to be valid?)
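
That fix might be as simple as (sketch):

	struct inode *inode = page_folio(bio_first_page_all(bio))->mapping->host;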

- Eric
Theodore Ts'o March 15, 2023, 2:53 a.m. UTC | #8
On Tue, Mar 14, 2023 at 04:12:39PM -0700, Eric Biggers wrote:
> 
> I think large folio support for fscrypt and fsverity is not that far away.  I
> already made the following changes in 6.3:
> 
>     51e4e3153ebc ("fscrypt: support decrypting data from large folios")
>     5d0f0e57ed90 ("fsverity: support verifying data from large folios")

Cool!  I was thinking that fscrypt and fsverity might end up lagging
as far as the large folio support was concerned, but I'm glad that
this might not be the case.

> AFAICT, absent actual testing of course, the only major thing that's still
> needed is that fscrypt_encrypt_pagecache_blocks() needs to support large folios.
> I'm not sure how it should work, exactly.  Matthew gave a couple options.
> Another option is to just continue to use bounce *pages*, and keep track of all
> the bounce pages for each folio.

We don't have to solve that right away; it is possible to support
reads of large folios, but not writes.  If someone reads in a 128k
folio, and then modifies a 4k page in the middle of the folio, we could
just split up the 128k folio and write out only the single 4k page
that was modified.  (It might very well be that in that case, we
*want* to break up the folio anyway, to avoid the write amplification
problem.)

In any case, I suspect the way we would first support large folios for
ext4 is to support using iomap for buffered I/O --- but only for
file systems where block size == page size, with no fscrypt, no
fsverity, no data=journal, and only for buffered reads.  And for
buffered writes, we'll break apart the folio and then use the existing
ext4_writepages() code path.

We can then gradually start relying on iomap and using large folios
for additional scenarios, both on the read and, eventually, on the
write side.  I suspect we'll want a way of enabling and disabling
large folios in a fine-grained manner, as well as potentially
proactively breaking up large folios in page_mkwrite (so that a 4k
random page modification doesn't get amplified into the entire
contents of a large folio needing to be written back).
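
As a sketch of that last idea (hypothetical; assumes the fault path
already holds the folio lock, and split_folio() can fail if the folio
has extra references):

	/* Best effort: if the split fails we just write back the whole
	 * large folio, trading write amplification for forward progress. */
	if (folio_test_large(folio))
		(void)split_folio(folio);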

- Ted

Patch

diff --git a/include/linux/fscrypt.h b/include/linux/fscrypt.h
index 4f5f8a651213..c2c07d36fb3a 100644
--- a/include/linux/fscrypt.h
+++ b/include/linux/fscrypt.h
@@ -273,6 +273,16 @@  static inline struct page *fscrypt_pagecache_page(struct page *bounce_page)
 	return (struct page *)page_private(bounce_page);
 }
 
+static inline bool fscrypt_is_bounce_folio(struct folio *folio)
+{
+	return folio->mapping == NULL;
+}
+
+static inline struct folio *fscrypt_pagecache_folio(struct folio *bounce_folio)
+{
+	return bounce_folio->private;
+}
+
 void fscrypt_free_bounce_page(struct page *bounce_page);
 
 /* policy.c */
@@ -448,6 +458,17 @@  static inline struct page *fscrypt_pagecache_page(struct page *bounce_page)
 	return ERR_PTR(-EINVAL);
 }
 
+static inline bool fscrypt_is_bounce_folio(struct folio *folio)
+{
+	return false;
+}
+
+static inline struct folio *fscrypt_pagecache_folio(struct folio *bounce_folio)
+{
+	WARN_ON_ONCE(1);
+	return ERR_PTR(-EINVAL);
+}
+
 static inline void fscrypt_free_bounce_page(struct page *bounce_page)
 {
 }
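
For context, a caller in a writeback completion path would use the new
helpers roughly like this (modeled on the existing page-based pattern
in fs/ext4/page-io.c; the surrounding conversion is part of later
patches in this series, so treat this as illustrative):

	struct folio *io_folio = folio;	/* the folio attached to the bio */

	if (fscrypt_is_bounce_folio(io_folio))
		folio = fscrypt_pagecache_folio(io_folio);

	/* ... update writeback state on the pagecache folio ... */

	if (folio != io_folio)
		fscrypt_free_bounce_page(&io_folio->page);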