
[RFC] memcpy_from_folio()

Message ID Y8qr8c3+SJLGWhUo@casper.infradead.org (mailing list archive)
State New
Series [RFC] memcpy_from_folio()

Commit Message

Matthew Wilcox Jan. 20, 2023, 2:57 p.m. UTC
I think I have a good folio replacement for memcpy_from_page().  One of
the annoying things about dealing with multi-page folios is that you
can't kmap the entire folio, yet on systems without highmem, you don't
need to.  It's also somewhat annoying in the caller to keep track
of n/len/offset/pos/...

I think this is probably the best option.  We could have a loop that
kmaps each page in the folio, but that seems like excessive complexity.
I'm happy to have highmem systems be less efficient, since they are
anyway.  Another potential area of concern is that folios can be quite
large and maybe having preemption disabled while we copy 2MB of data
might be a bad thing.  If so, the API is fine with limiting the amount
of data we copy, we just need to find out that it is a problem and
decide what the correct limit is, if it's not folio_size().

 fs/ext4/verity.c           |   16 +++++++---------
 include/linux/highmem.h    |   29 +++++++++++++++++++++++++++++
 include/linux/page-flags.h |    1 +
 3 files changed, 37 insertions(+), 9 deletions(-)

Comments

Ira Weiny Jan. 21, 2023, 7:15 a.m. UTC | #1
Matthew Wilcox wrote:
> I think I have a good folio replacement for memcpy_from_page().  One of
> the annoying things about dealing with multi-page folios is that you
> can't kmap the entire folio, yet on systems without highmem, you don't
> need to.  It's also somewhat annoying in the caller to keep track
> of n/len/offset/pos/...
> 
> I think this is probably the best option.  We could have a loop that
> kmaps each page in the folio, but that seems like excessive complexity.

Why?  IMO better to contain the complexity of highmem systems in the
memcpy_[to,from]_folio() calls than to spread it around the kernel.

> I'm happy to have highmem systems be less efficient, since they are
> anyway.  Another potential area of concern is that folios can be quite
> large and maybe having preemption disabled while we copy 2MB of data
> might be a bad thing.  If so, the API is fine with limiting the amount
> of data we copy, we just need to find out that it is a problem and
> decide what the correct limit is, if it's not folio_size().

Why not map the pages only when needed?  I agree that keeping preemption
disabled for a long time is a bad thing.  But kmap_local_page does not
disable preemption, only migration.

Regardless, any looping on the maps is only going to happen on highmem
systems, and we can map the pages only if/when needed.  Synchronization of
the folio should be handled by the caller, so it is fine to allow migration
during memcpy_from_folio().

So why not loop through the pages only when needed?
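
Roughly what I'm picturing -- an untested sketch only, built on
kmap_local_folio()/folio_address() and the folio_test_highmem() helper this
patch adds, not a real implementation:

/* Copy up to @len bytes starting at byte @offset of @folio. */
static size_t memcpy_from_folio(char *to, struct folio *folio,
		size_t offset, size_t len)
{
	size_t copied = 0;

	len = min(len, folio_size(folio) - offset);

	if (!folio_test_highmem(folio)) {
		/* The whole folio is directly mapped: one copy, no loop. */
		memcpy(to, folio_address(folio) + offset, len);
		return len;
	}

	/* Highmem: map and copy one page at a time, only when needed. */
	while (copied < len) {
		char *from = kmap_local_folio(folio, offset);
		size_t chunk = min_t(size_t, len - copied,
				     PAGE_SIZE - offset_in_page(offset));

		memcpy(to + copied, from, chunk);
		kunmap_local(from);
		copied += chunk;
		offset += chunk;
	}
	return len;
}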

> 
>  fs/ext4/verity.c           |   16 +++++++---------
>  include/linux/highmem.h    |   29 +++++++++++++++++++++++++++++
>  include/linux/page-flags.h |    1 +
>  3 files changed, 37 insertions(+), 9 deletions(-)
> 
> diff --git a/fs/ext4/verity.c b/fs/ext4/verity.c
> index e4da1704438e..afe847c967a4 100644
> --- a/fs/ext4/verity.c
> +++ b/fs/ext4/verity.c
> @@ -42,18 +42,16 @@ static int pagecache_read(struct inode *inode, void *buf, size_t count,
>  			  loff_t pos)
>  {
>  	while (count) {
> -		size_t n = min_t(size_t, count,
> -				 PAGE_SIZE - offset_in_page(pos));
> -		struct page *page;
> +		struct folio *folio;
> +		size_t n;
>  
> -		page = read_mapping_page(inode->i_mapping, pos >> PAGE_SHIFT,
> +		folio = read_mapping_folio(inode->i_mapping, pos >> PAGE_SHIFT,
>  					 NULL);

Is this an issue with how many pages get read into the page
cache?  I went off on a tangent thinking this read the entire folio into
the cache.  But I see now I was wrong.  If this is operating page by page
why change this function at all?

> -		if (IS_ERR(page))
> -			return PTR_ERR(page);
> -
> -		memcpy_from_page(buf, page, offset_in_page(pos), n);
> +		if (IS_ERR(folio))
> +			return PTR_ERR(folio);
>  
> -		put_page(page);
> +		n = memcpy_from_file_folio(buf, folio, pos, count);
> +		folio_put(folio);
>  
>  		buf += n;
>  		pos += n;
> diff --git a/include/linux/highmem.h b/include/linux/highmem.h
> index 9fa462561e05..9917357b9e8f 100644
> --- a/include/linux/highmem.h
> +++ b/include/linux/highmem.h
> @@ -414,6 +414,35 @@ static inline void memzero_page(struct page *page, size_t offset, size_t len)
>  	kunmap_local(addr);
>  }
>  
> +/**
> + * memcpy_from_file_folio - Copy some bytes from a file folio.
> + * @to: The destination buffer.
> + * @folio: The folio to copy from.
> + * @pos: The position in the file.
> + * @len: The maximum number of bytes to copy.
> + *
> + * Copy up to @len bytes from this folio.  This may be limited by PAGE_SIZE

I have a problem with 'may be limited'.  How is the caller to know this?
Won't this propagate a lot of checks in the caller?  Effectively trading
one complexity in the callers for another?

> + * if the folio comes from HIGHMEM, and by the size of the folio.
> + *
> + * Return: The number of bytes copied from the folio.
> + */
> +static inline size_t memcpy_from_file_folio(char *to, struct folio *folio,
> +		loff_t pos, size_t len)
> +{
> +	size_t offset = offset_in_folio(folio, pos);
> +	char *from = kmap_local_folio(folio, offset);
> +
> +	if (folio_test_highmem(folio))
> +		len = min(len, PAGE_SIZE - offset);
> +	else
> +		len = min(len, folio_size(folio) - offset);
> +
> +	memcpy(to, from, len);

Do we need flush_dcache_page() for the pages?

I gave this an attempt today before I realized read_mapping_folio() only
reads a single page.  :-(

How does memcpy_from_file_folio() work beyond a single page?  And in that
case what is the point?  The more I think about this the more confused I
get.

Ira
Matthew Wilcox Jan. 22, 2023, 12:36 p.m. UTC | #2
On Fri, Jan 20, 2023 at 11:15:17PM -0800, Ira Weiny wrote:
> Matthew Wilcox wrote:
> > I think I have a good folio replacement for memcpy_from_page().  One of
> > the annoying things about dealing with multi-page folios is that you
> > can't kmap the entire folio, yet on systems without highmem, you don't
> > need to.  It's also somewhat annoying in the caller to keep track
> > of n/len/offset/pos/...
> > 
> > I think this is probably the best option.  We could have a loop that
> > kmaps each page in the folio, but that seems like excessive complexity.
> 
> Why?  IMO better to contain the complexity of highmem systems in the
> memcpy_[to,from]_folio() calls than to spread it around the kernel.

Sure, but look at the conversion that I posted.  It's actually simpler
than using the memcpy_from_page() API.

> > I'm happy to have highmem systems be less efficient, since they are
> > anyway.  Another potential area of concern is that folios can be quite
> > large and maybe having preemption disabled while we copy 2MB of data
> > might be a bad thing.  If so, the API is fine with limiting the amount
> > of data we copy, we just need to find out that it is a problem and
> > decide what the correct limit is, if it's not folio_size().
> 
> Why not map the pages only when needed?  I agree that keeping preemption
> disabled for a long time is a bad thing.  But kmap_local_page does not
> disable preemption, only migration.

Some of the scheduler people aren't terribly happy about even disabling
migration for a long time.  Is "copying 2MB of data" a long time?  If I've
done my sums correctly, my current laptop has 2x 16 bit LP-DDR4-4267
DIMMs installed.  That works out to 17GB/s and so copying 2MB of data
will take 118us.  Probably OK for even the most demanding workload.
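
(For the record: 2 channels x 16 bits x 4267 MT/s is ~17GB/s of theoretical
bandwidth, and 2MB / 17GB/s comes out to roughly 118us, assuming the copy
actually runs at full memory bandwidth.)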

> Regardless, any looping on the maps is only going to happen on highmem
> systems, and we can map the pages only if/when needed.  Synchronization of
> the folio should be handled by the caller, so it is fine to allow migration
> during memcpy_from_folio().
> 
> So why not loop through the pages only when needed?

So you're proposing re-enabling migration after calling
kmap_local_folio()?  I don't really understand.

> > 
> >  fs/ext4/verity.c           |   16 +++++++---------
> >  include/linux/highmem.h    |   29 +++++++++++++++++++++++++++++
> >  include/linux/page-flags.h |    1 +
> >  3 files changed, 37 insertions(+), 9 deletions(-)
> > 
> > diff --git a/fs/ext4/verity.c b/fs/ext4/verity.c
> > index e4da1704438e..afe847c967a4 100644
> > --- a/fs/ext4/verity.c
> > +++ b/fs/ext4/verity.c
> > @@ -42,18 +42,16 @@ static int pagecache_read(struct inode *inode, void *buf, size_t count,
> >  			  loff_t pos)
> >  {
> >  	while (count) {
> > -		size_t n = min_t(size_t, count,
> > -				 PAGE_SIZE - offset_in_page(pos));
> > -		struct page *page;
> > +		struct folio *folio;
> > +		size_t n;
> >  
> > -		page = read_mapping_page(inode->i_mapping, pos >> PAGE_SHIFT,
> > +		folio = read_mapping_folio(inode->i_mapping, pos >> PAGE_SHIFT,
> >  					 NULL);
> 
> Is this an issue with how many pages get read into the page
> cache?  I went off on a tangent thinking this read the entire folio into
> the cache.  But I see now I was wrong.  If this is operating page by page
> why change this function at all?

The folio may (indeed _should_) be already present in the cache, otherwise
the cache isn't doing a very good job.  If read_mapping_folio() ends up
having to allocate the folio, today it only allocates a single page folio.
But if somebody else allocated it through the readahead code, and the
filesystem supports multi-page folios, then it will be larger than a
single page.  All callers must be prepared to handle a multi-page folio.

> > -		if (IS_ERR(page))
> > -			return PTR_ERR(page);
> > -
> > -		memcpy_from_page(buf, page, offset_in_page(pos), n);
> > +		if (IS_ERR(folio))
> > +			return PTR_ERR(folio);
> >  
> > -		put_page(page);
> > +		n = memcpy_from_file_folio(buf, folio, pos, count);
> > +		folio_put(folio);
> >  
> >  		buf += n;
> >  		pos += n;
> > diff --git a/include/linux/highmem.h b/include/linux/highmem.h
> > index 9fa462561e05..9917357b9e8f 100644
> > --- a/include/linux/highmem.h
> > +++ b/include/linux/highmem.h
> > @@ -414,6 +414,35 @@ static inline void memzero_page(struct page *page, size_t offset, size_t len)
> >  	kunmap_local(addr);
> >  }
> >  
> > +/**
> > + * memcpy_from_file_folio - Copy some bytes from a file folio.
> > + * @to: The destination buffer.
> > + * @folio: The folio to copy from.
> > + * @pos: The position in the file.
> > + * @len: The maximum number of bytes to copy.
> > + *
> > + * Copy up to @len bytes from this folio.  This may be limited by PAGE_SIZE
> 
> I have a problem with 'may be limited'.  How is the caller to know this?

... from the return value?

> Won't this propagate a lot of checks in the caller?  Effectively trading
> one complexity in the callers for another?

Look at the caller I converted!  It _reduces_ the amount of checks in
the caller.

> > + * if the folio comes from HIGHMEM, and by the size of the folio.
> > + *
> > + * Return: The number of bytes copied from the folio.
> > + */
> > +static inline size_t memcpy_from_file_folio(char *to, struct folio *folio,
> > +		loff_t pos, size_t len)
> > +{
> > +	size_t offset = offset_in_folio(folio, pos);
> > +	char *from = kmap_local_folio(folio, offset);
> > +
> > +	if (folio_test_highmem(folio))
> > +		len = min(len, PAGE_SIZE - offset);
> > +	else
> > +		len = min(len, folio_size(folio) - offset);
> > +
> > +	memcpy(to, from, len);
> 
> Do we need flush_dcache_page() for the pages?

Why?  memcpy_from_page() doesn't have one.

> I gave this an attempt today before I realized read_mapping_folio() only
> reads a single page.  :-(
> 
> How does memcpy_from_file_folio() work beyond a single page?  And in that
> case what is the point?  The more I think about this the more confused I
> get.

In the highmem case, we map a single page and so cannot go beyond a
single page.  If the folio wasn't allocated from highmem, we're just
using its address directly, so we can access the whole folio.
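
A caller that needs more data than a single call returns simply loops on
the return value, exactly as the converted pagecache_read() does.  A
minimal sketch (illustrative only; it assumes @count does not run past the
end of this folio):

	while (count) {
		size_t n = memcpy_from_file_folio(buf, folio, pos, count);

		buf += n;
		pos += n;
		count -= n;
	}

On !HIGHMEM that is a single iteration; on HIGHMEM it advances one page
per iteration.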

Hope I've cleared up your confusions.
Fabio M. De Francesco Jan. 23, 2023, 5:50 p.m. UTC | #3
On Sunday, 22 January 2023 at 13:36:51 CET, Matthew Wilcox wrote:
> On Fri, Jan 20, 2023 at 11:15:17PM -0800, Ira Weiny wrote:
> > Matthew Wilcox wrote:
> > > I think I have a good folio replacement for memcpy_from_page().  One of
> > > the annoying things about dealing with multi-page folios is that you
> > > can't kmap the entire folio, yet on systems without highmem, you don't
> > > need to.  It's also somewhat annoying in the caller to keep track
> > > of n/len/offset/pos/...
> > > 
> > > I think this is probably the best option.  We could have a loop that
> > > kmaps each page in the folio, but that seems like excessive complexity.
> > 
> > Why?  IMO better to contain the complexity of highmem systems in the
> > memcpy_[to,from]_folio() calls than to spread it around the kernel.
> 
> Sure, but look at the conversion that I posted.  It's actually simpler
> than using the memcpy_from_page() API.
> 
> > > I'm happy to have highmem systems be less efficient, since they are
> > > anyway.  Another potential area of concern is that folios can be quite
> > > large and maybe having preemption disabled while we copy 2MB of data
> > > might be a bad thing.  If so, the API is fine with limiting the amount
> > > of data we copy, we just need to find out that it is a problem and
> > > decide what the correct limit is, if it's not folio_size().
> > 
> > Why not map the pages only when needed?  I agree that keeping preemption
> > disabled for a long time is a bad thing.  But kmap_local_page does not
> > disable preemption, only migration.
> 
> Some of the scheduler people aren't terribly happy about even disabling
> migration for a long time.  Is "copying 2MB of data" a long time?  If I've
> done my sums correctly, my current laptop has 2x 16 bit LP-DDR4-4267
> DIMMs installed.  That works out to 17GB/s and so copying 2MB of data
> will take 118us.  Probably OK for even the most demanding workload.
> 
> > Regardless, any looping on the maps is only going to happen on highmem
> > systems, and we can map the pages only if/when needed.  Synchronization of
> > the folio should be handled by the caller, so it is fine to allow migration
> > during memcpy_from_folio().
> > 
> > So why not loop through the pages only when needed?
> 
> So you're proposing re-enabling migration after calling
> kmap_local_folio()?  I don't really understand.
> 
> > >  fs/ext4/verity.c           |   16 +++++++---------
> > >  include/linux/highmem.h    |   29 +++++++++++++++++++++++++++++
> > >  include/linux/page-flags.h |    1 +
> > >  3 files changed, 37 insertions(+), 9 deletions(-)
> > > 
> > > diff --git a/fs/ext4/verity.c b/fs/ext4/verity.c
> > > index e4da1704438e..afe847c967a4 100644
> > > --- a/fs/ext4/verity.c
> > > +++ b/fs/ext4/verity.c
> > > @@ -42,18 +42,16 @@ static int pagecache_read(struct inode *inode, void *buf, size_t count,
> > >  			  loff_t pos)
> > >  {
> > >  	while (count) {
> > > -		size_t n = min_t(size_t, count,
> > > -				 PAGE_SIZE - offset_in_page(pos));
> > > -		struct page *page;
> > > +		struct folio *folio;
> > > +		size_t n;
> > >  
> > > -		page = read_mapping_page(inode->i_mapping, pos >> PAGE_SHIFT,
> > > +		folio = read_mapping_folio(inode->i_mapping, pos >> PAGE_SHIFT,
> > >  					 NULL);
> > 
> > Is this an issue with how many pages get read into the page
> > cache?  I went off on a tangent thinking this read the entire folio into
> > the cache.  But I see now I was wrong.  If this is operating page by page
> > why change this function at all?
> 
> The folio may (indeed _should_) be already present in the cache, otherwise
> the cache isn't doing a very good job.  If read_mapping_folio() ends up
> having to allocate the folio, today it only allocates a single page folio.
> But if somebody else allocated it through the readahead code, and the
> filesystem supports multi-page folios, then it will be larger than a
> single page.  All callers must be prepared to handle a multi-page folio.
> 
> > > -		if (IS_ERR(page))
> > > -			return PTR_ERR(page);
> > > -
> > > -		memcpy_from_page(buf, page, offset_in_page(pos), n);
> > > +		if (IS_ERR(folio))
> > > +			return PTR_ERR(folio);
> > > 
> > > -		put_page(page);
> > > +		n = memcpy_from_file_folio(buf, folio, pos, count);
> > > +		folio_put(folio);
> > > 
> > >  		buf += n;
> > >  		pos += n;
> > > 
> > > diff --git a/include/linux/highmem.h b/include/linux/highmem.h
> > > index 9fa462561e05..9917357b9e8f 100644
> > > --- a/include/linux/highmem.h
> > > +++ b/include/linux/highmem.h
> > > @@ -414,6 +414,35 @@ static inline void memzero_page(struct page *page, size_t offset, size_t len)
> > >  	kunmap_local(addr);
> > >  
> > >  }
> > > 
> > > +/**
> > > + * memcpy_from_file_folio - Copy some bytes from a file folio.
> > > + * @to: The destination buffer.
> > > + * @folio: The folio to copy from.
> > > + * @pos: The position in the file.
> > > + * @len: The maximum number of bytes to copy.
> > > + *
> > > + * Copy up to @len bytes from this folio.  This may be limited by PAGE_SIZE
> > 
> > I have a problem with 'may be limited'.  How is the caller to know this?
> 
> ... from the return value?
> 
> > Won't this propagate a lot of checks in the caller?  Effectively trading
> > one complexity in the callers for another?
> 
> Look at the caller I converted!  It _reduces_ the amount of checks in
> the caller.
> 
> > > + * if the folio comes from HIGHMEM, and by the size of the folio.
> > > + *
> > > + * Return: The number of bytes copied from the folio.
> > > + */
> > > +static inline size_t memcpy_from_file_folio(char *to, struct folio *folio,
> > > +		loff_t pos, size_t len)
> > > +{
> > > +	size_t offset = offset_in_folio(folio, pos);
> > > +	char *from = kmap_local_folio(folio, offset);
> > > +
> > > +	if (folio_test_highmem(folio))
> > > +		len = min(len, PAGE_SIZE - offset);
> > > +	else
> > > +		len = min(len, folio_size(folio) - offset);
> > > +
> > > +	memcpy(to, from, len);
> > 
> > Do we need flush_dcache_page() for the pages?
> 
> Why?  memcpy_from_page() doesn't have one.
> 
> > I gave this an attempt today before I realized read_mapping_folio() only
> > reads a single page.  :-(
> > 
> > How does memcpy_from_file_folio() work beyond a single page?  And in that
> > case what is the point?  The more I think about this the more confused I
> > get.
> 
> In the highmem case, we map a single page and so cannot go beyond a
> single page.  If the folio wasn't allocated from highmem, we're just
> using its address directly, so we can access the whole folio.
> 
> Hope I've cleared up your confusions.

As you know, I'm not a memory management expert; I'm just a user of these
APIs. However, since I have been Cc'ed, I have read Matthew's RFC and the
dialogue that follows.

FWIW, I think I understand and like this proposal, so I also hope to see it
become a "real" patch.

Fabio

Patch

diff --git a/fs/ext4/verity.c b/fs/ext4/verity.c
index e4da1704438e..afe847c967a4 100644
--- a/fs/ext4/verity.c
+++ b/fs/ext4/verity.c
@@ -42,18 +42,16 @@  static int pagecache_read(struct inode *inode, void *buf, size_t count,
 			  loff_t pos)
 {
 	while (count) {
-		size_t n = min_t(size_t, count,
-				 PAGE_SIZE - offset_in_page(pos));
-		struct page *page;
+		struct folio *folio;
+		size_t n;
 
-		page = read_mapping_page(inode->i_mapping, pos >> PAGE_SHIFT,
+		folio = read_mapping_folio(inode->i_mapping, pos >> PAGE_SHIFT,
 					 NULL);
-		if (IS_ERR(page))
-			return PTR_ERR(page);
-
-		memcpy_from_page(buf, page, offset_in_page(pos), n);
+		if (IS_ERR(folio))
+			return PTR_ERR(folio);
 
-		put_page(page);
+		n = memcpy_from_file_folio(buf, folio, pos, count);
+		folio_put(folio);
 
 		buf += n;
 		pos += n;
diff --git a/include/linux/highmem.h b/include/linux/highmem.h
index 9fa462561e05..9917357b9e8f 100644
--- a/include/linux/highmem.h
+++ b/include/linux/highmem.h
@@ -414,6 +414,35 @@  static inline void memzero_page(struct page *page, size_t offset, size_t len)
 	kunmap_local(addr);
 }
 
+/**
+ * memcpy_from_file_folio - Copy some bytes from a file folio.
+ * @to: The destination buffer.
+ * @folio: The folio to copy from.
+ * @pos: The position in the file.
+ * @len: The maximum number of bytes to copy.
+ *
+ * Copy up to @len bytes from this folio.  This may be limited by PAGE_SIZE
+ * if the folio comes from HIGHMEM, and by the size of the folio.
+ *
+ * Return: The number of bytes copied from the folio.
+ */
+static inline size_t memcpy_from_file_folio(char *to, struct folio *folio,
+		loff_t pos, size_t len)
+{
+	size_t offset = offset_in_folio(folio, pos);
+	char *from = kmap_local_folio(folio, offset);
+
+	if (folio_test_highmem(folio))
+		len = min(len, PAGE_SIZE - offset);
+	else
+		len = min(len, folio_size(folio) - offset);
+
+	memcpy(to, from, len);
+	kunmap_local(from);
+
+	return len;
+}
+
 /**
  * folio_zero_segments() - Zero two byte ranges in a folio.
  * @folio: The folio to write to.
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index bc09194d372f..bba2a32031a2 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -531,6 +531,7 @@  PAGEFLAG(Readahead, readahead, PF_NO_COMPOUND)
  * available at this point.
  */
 #define PageHighMem(__p) is_highmem_idx(page_zonenum(__p))
+#define folio_test_highmem(__f)	is_highmem_idx(folio_zonenum(__f))
 #else
 PAGEFLAG_FALSE(HighMem, highmem)
 #endif