
[v2,02/28] mm: Add functions to zero portions of a folio

Message ID 20211108040551.1942823-3-willy@infradead.org (mailing list archive)
State New, archived
Series iomap/xfs folio patches

Commit Message

Matthew Wilcox Nov. 8, 2021, 4:05 a.m. UTC
These functions are wrappers around zero_user_segments(), which means
that zero_user_segments() can now be called for compound pages even when
CONFIG_TRANSPARENT_HUGEPAGE is disabled.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/highmem.h | 44 ++++++++++++++++++++++++++++++++++++++---
 mm/highmem.c            |  2 --
 2 files changed, 41 insertions(+), 5 deletions(-)
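
A minimal usage sketch of the new helpers (illustration only, not part
of the patch; zero_outside() is a hypothetical caller):

	/*
	 * Zero everything in a folio outside the byte range [from, to).
	 * Each "end" argument is exclusive: one past the last byte to
	 * be zeroed.
	 */
	static void zero_outside(struct folio *folio, size_t from, size_t to)
	{
		folio_zero_segments(folio, 0, from, to, folio_size(folio));
	}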

Comments

Christoph Hellwig Nov. 9, 2021, 8:40 a.m. UTC | #1
On Mon, Nov 08, 2021 at 04:05:25AM +0000, Matthew Wilcox (Oracle) wrote:
> These functions are wrappers around zero_user_segments(), which means
> that zero_user_segments() can now be called for compound pages even when
> CONFIG_TRANSPARENT_HUGEPAGE is disabled.

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

Related note: the inline !HIGHMEM version should switch to page_address
instead of kmap_local_page to make the code more obvious.
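
A sketch of what that could look like (illustrative, not hch's actual
code): with !CONFIG_HIGHMEM every page is permanently mapped, so the
inline version could do

	/* lowmem pages are always mapped; no kmap_local_page() needed */
	memset(page_address(page) + start1, 0, end1 - start1);

directly, saving the kmap_local_page()/kunmap_local() pair.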
Darrick J. Wong Nov. 17, 2021, 4:45 a.m. UTC | #2
On Mon, Nov 08, 2021 at 04:05:25AM +0000, Matthew Wilcox (Oracle) wrote:
> These functions are wrappers around zero_user_segments(), which means
> that zero_user_segments() can now be called for compound pages even when
> CONFIG_TRANSPARENT_HUGEPAGE is disabled.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>  include/linux/highmem.h | 44 ++++++++++++++++++++++++++++++++++++++---
>  mm/highmem.c            |  2 --
>  2 files changed, 41 insertions(+), 5 deletions(-)
> 
> diff --git a/include/linux/highmem.h b/include/linux/highmem.h
> index 25aff0f2ed0b..c343c69bb5b4 100644
> --- a/include/linux/highmem.h
> +++ b/include/linux/highmem.h
> @@ -231,10 +231,10 @@ static inline void tag_clear_highpage(struct page *page)
>   * If we pass in a base or tail page, we can zero up to PAGE_SIZE.
>   * If we pass in a head page, we can zero up to the size of the compound page.
>   */
> -#if defined(CONFIG_HIGHMEM) && defined(CONFIG_TRANSPARENT_HUGEPAGE)
> +#ifdef CONFIG_HIGHMEM
>  void zero_user_segments(struct page *page, unsigned start1, unsigned end1,
>  		unsigned start2, unsigned end2);
> -#else /* !HIGHMEM || !TRANSPARENT_HUGEPAGE */
> +#else
>  static inline void zero_user_segments(struct page *page,
>  		unsigned start1, unsigned end1,
>  		unsigned start2, unsigned end2)
> @@ -254,7 +254,7 @@ static inline void zero_user_segments(struct page *page,
>  	for (i = 0; i < compound_nr(page); i++)
>  		flush_dcache_page(page + i);
>  }
> -#endif /* !HIGHMEM || !TRANSPARENT_HUGEPAGE */
> +#endif
>  
>  static inline void zero_user_segment(struct page *page,
>  	unsigned start, unsigned end)
> @@ -364,4 +364,42 @@ static inline void memzero_page(struct page *page, size_t offset, size_t len)
>  	kunmap_local(addr);
>  }
>  
> +/**
> + * folio_zero_segments() - Zero two byte ranges in a folio.
> + * @folio: The folio to write to.
> + * @start1: The first byte to zero.
> + * @end1: One more than the last byte in the first range.
> + * @start2: The first byte to zero in the second range.
> + * @end2: One more than the last byte in the second range.
> + */
> +static inline void folio_zero_segments(struct folio *folio,
> +		size_t start1, size_t end1, size_t start2, size_t end2)
> +{
> +	zero_user_segments(&folio->page, start1, end1, start2, end2);
> +}
> +
> +/**
> + * folio_zero_segment() - Zero a byte range in a folio.
> + * @folio: The folio to write to.
> + * @start: The first byte to zero.
> + * @end: One more than the last byte in the first range.
> + */
> +static inline void folio_zero_segment(struct folio *folio,
> +		size_t start, size_t end)
> +{
> +	zero_user_segments(&folio->page, start, end, 0, 0);
> +}
> +
> +/**
> + * folio_zero_range() - Zero a byte range in a folio.
> + * @folio: The folio to write to.
> + * @start: The first byte to zero.
> + * @length: The number of bytes to zero.
> + */
> +static inline void folio_zero_range(struct folio *folio,
> +		size_t start, size_t length)
> +{
> +	zero_user_segments(&folio->page, start, start + length, 0, 0);

At first I thought "Gee, this is wrong, end should be start+length-1!"

Then I looked at zero_user_segments and realized that despite the
parameter name "end1", it really wants you to tell it the next byte,
not the end byte of the range you want to zero.

Then I looked at the other two new functions and saw that you documented
this, and now I get why Linus ranted about this some time ago.

The code looks right, but the "end" names rankle me.  Can we please
change them all?  Or at least in the new functions, if you all already
fought a flamewar over this that I'm not aware of?

Almost-Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> +}
> +
>  #endif /* _LINUX_HIGHMEM_H */
> diff --git a/mm/highmem.c b/mm/highmem.c
> index 88f65f155845..819d41140e5b 100644
> --- a/mm/highmem.c
> +++ b/mm/highmem.c
> @@ -359,7 +359,6 @@ void kunmap_high(struct page *page)
>  }
>  EXPORT_SYMBOL(kunmap_high);
>  
> -#ifdef CONFIG_TRANSPARENT_HUGEPAGE
>  void zero_user_segments(struct page *page, unsigned start1, unsigned end1,
>  		unsigned start2, unsigned end2)
>  {
> @@ -416,7 +415,6 @@ void zero_user_segments(struct page *page, unsigned start1, unsigned end1,
>  	BUG_ON((start1 | start2 | end1 | end2) != 0);
>  }
>  EXPORT_SYMBOL(zero_user_segments);
> -#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
>  #endif /* CONFIG_HIGHMEM */
>  
>  #ifdef CONFIG_KMAP_LOCAL
> -- 
> 2.33.0
>
Matthew Wilcox Nov. 17, 2021, 2:07 p.m. UTC | #3
On Tue, Nov 16, 2021 at 08:45:27PM -0800, Darrick J. Wong wrote:
> > +/**
> > + * folio_zero_segment() - Zero a byte range in a folio.
> > + * @folio: The folio to write to.
> > + * @start: The first byte to zero.
> > + * @end: One more than the last byte in the first range.
> > + */
> > +static inline void folio_zero_segment(struct folio *folio,
> > +		size_t start, size_t end)
> > +{
> > +	zero_user_segments(&folio->page, start, end, 0, 0);
> > +}
> > +
> > +/**
> > + * folio_zero_range() - Zero a byte range in a folio.
> > + * @folio: The folio to write to.
> > + * @start: The first byte to zero.
> > + * @length: The number of bytes to zero.
> > + */
> > +static inline void folio_zero_range(struct folio *folio,
> > +		size_t start, size_t length)
> > +{
> > +	zero_user_segments(&folio->page, start, start + length, 0, 0);
> 
> At first I thought "Gee, this is wrong, end should be start+length-1!"
> 
> Then I looked at zero_user_segments and realized that despite the
> parameter name "end1", it really wants you to tell it the next byte,
> not the end byte of the range you want to zero.
> 
> Then I looked at the other two new functions and saw that you documented
> this, and now I get why Linus ranted about this some time ago.
> 
> The code looks right, but the "end" names rankle me.  Can we please
> change them all?  Or at least in the new functions, if you all already
> fought a flamewar over this that I'm not aware of?

Change them to what?  I tend to use 'end' to mean 'excluded end' and
'max' to mean 'included end'.  What would you call the excluded end?
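
For concreteness (illustration only, not from the thread): zeroing
bytes 5..7 of a folio with an excluded end is

	folio_zero_segment(folio, 5, 8);	/* 8 = one past byte 7 */

whereas a hypothetical included-end ("max") form would take (5, 7).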
Darrick J. Wong Nov. 17, 2021, 5:07 p.m. UTC | #4
On Wed, Nov 17, 2021 at 02:07:00PM +0000, Matthew Wilcox wrote:
> On Tue, Nov 16, 2021 at 08:45:27PM -0800, Darrick J. Wong wrote:
> > > +/**
> > > + * folio_zero_segment() - Zero a byte range in a folio.
> > > + * @folio: The folio to write to.
> > > + * @start: The first byte to zero.
> > > + * @end: One more than the last byte in the first range.
> > > + */
> > > +static inline void folio_zero_segment(struct folio *folio,
> > > +		size_t start, size_t end)
> > > +{
> > > +	zero_user_segments(&folio->page, start, end, 0, 0);
> > > +}
> > > +
> > > +/**
> > > + * folio_zero_range() - Zero a byte range in a folio.
> > > + * @folio: The folio to write to.
> > > + * @start: The first byte to zero.
> > > + * @length: The number of bytes to zero.
> > > + */
> > > +static inline void folio_zero_range(struct folio *folio,
> > > +		size_t start, size_t length)
> > > +{
> > > +	zero_user_segments(&folio->page, start, start + length, 0, 0);
> > 
> > At first I thought "Gee, this is wrong, end should be start+length-1!"
> > 
> > Then I looked at zero_user_segments and realized that despite the
> > parameter name "end1", it really wants you to tell it the next byte,
> > not the end byte of the range you want to zero.
> > 
> > Then I looked at the other two new functions and saw that you documented
> > this, and now I get why Linus ranted about this some time ago.
> > 
> > The code looks right, but the "end" names rankle me.  Can we please
> > change them all?  Or at least in the new functions, if you all already
> > fought a flamewar over this that I'm not aware of?
> 
> Change them to what?  I tend to use 'end' to mean 'excluded end' and
> 'max' to mean 'included end'.  What would you call the excluded end?

I've started using 'next', or changing the code to make 'end' be the
last element in the range the caller wants to act upon.  The thing is,
those are all iterators, so 'next' fits, whereas it doesn't fit so well
for range zeroing where that might have been all the zeroing we wanted
to do.

Though.  'xend' (shorthand for 'excluded end') is different enough to
signal that the reader should pay attention.  Ok, how about xend then?

--D
Matthew Wilcox Nov. 18, 2021, 3:55 p.m. UTC | #5
On Wed, Nov 17, 2021 at 09:07:07AM -0800, Darrick J. Wong wrote:
> I've started using 'next', or changing the code to make 'end' be the
> last element in the range the caller wants to act upon.  The thing is,
> those are all iterators, so 'next' fits, whereas it doesn't fit so well
> for range zeroing where that might have been all the zeroing we wanted
> to do.

Yeah, it doesn't really work so well for one of the patches in this
series:

                        if (buffer_new(bh)) {
...
                                        folio_zero_segments(folio,
                                                to, block_end,
                                                block_start, from);

("zero between block_start and block_end, except for the region
specified by 'from' and 'to'").  Except that for some reason the
ranges are specified backwards, so it's not obvious what's going on.
Converting that to folio_zero_ranges() would be a possibility, at the
expense of complexity in the caller, or using 'max' instead of 'end'
would also add complexity to the callers.
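
Pictured as a sketch (illustration only):

	/*
	 *  block_start       from           to        block_end
	 *       |--- zeroed ---|-- user data --|--- zeroed ---|
	 *
	 * hence folio_zero_segments(folio, to, block_end,
	 *                           block_start, from);
	 */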

> Though.  'xend' (shorthand for 'excluded end') is different enough to
> signal that the reader should pay attention.  Ok, how about xend then?

Done!

@@ -367,26 +367,26 @@ static inline void memzero_page(struct page *page, size_t offset, size_t len)
  * folio_zero_segments() - Zero two byte ranges in a folio.
  * @folio: The folio to write to.
  * @start1: The first byte to zero.
- * @end1: One more than the last byte in the first range.
+ * @xend1: One more than the last byte in the first range.
  * @start2: The first byte to zero in the second range.
- * @end2: One more than the last byte in the second range.
+ * @xend2: One more than the last byte in the second range.
  */
 static inline void folio_zero_segments(struct folio *folio,
-               size_t start1, size_t end1, size_t start2, size_t end2)
+               size_t start1, size_t xend1, size_t start2, size_t xend2)
 {
-       zero_user_segments(&folio->page, start1, end1, start2, end2);
+       zero_user_segments(&folio->page, start1, xend1, start2, xend2);
 }

 /**
  * folio_zero_segment() - Zero a byte range in a folio.
  * @folio: The folio to write to.
  * @start: The first byte to zero.
- * @end: One more than the last byte in the first range.
+ * @xend: One more than the last byte to zero.
  */
 static inline void folio_zero_segment(struct folio *folio,
-               size_t start, size_t end)
+               size_t start, size_t xend)
 {
-       zero_user_segments(&folio->page, start, end, 0, 0);
+       zero_user_segments(&folio->page, start, xend, 0, 0);
 }

 /**
Darrick J. Wong Nov. 18, 2021, 5:26 p.m. UTC | #6
On Thu, Nov 18, 2021 at 03:55:12PM +0000, Matthew Wilcox wrote:
> On Wed, Nov 17, 2021 at 09:07:07AM -0800, Darrick J. Wong wrote:
> > I've started using 'next', or changing the code to make 'end' be the
> > last element in the range the caller wants to act upon.  The thing is,
> > those are all iterators, so 'next' fits, whereas it doesn't fit so well
> > for range zeroing where that might have been all the zeroing we wanted
> > to do.
> 
> Yeah, it doesn't really work so well for one of the patches in this
> series:
> 
>                         if (buffer_new(bh)) {
> ...
>                                         folio_zero_segments(folio,
>                                                 to, block_end,
>                                                 block_start, from);
> 
> ("zero between block_start and block_end, except for the region
> specified by 'from' and 'to'").  Except that for some reason the
> ranges are specified backwards, so it's not obvious what's going on.
> Converting that to folio_zero_ranges() would be a possibility, at the
> expense of complexity in the caller, or using 'max' instead of 'end'
> would also add complexity to the callers.

The call above looks like it is preparing to copy some data into the
middle of a buffer by zero-initializing the bytes before and the bytes
after that middle region.

Admittedly my fs-addled brain actually finds this hot mess easier to
understand:

folio_zero_segments(folio, to, block_end - 1, block_start, from - 1);

but I suppose the xend method involves less subtraction everywhere.

> 
> > Though.  'xend' (shorthand for 'excluded end') is different enough to
> > signal that the reader should pay attention.  Ok, how about xend then?
> 
> Done!
> 
> @@ -367,26 +367,26 @@ static inline void memzero_page(struct page *page, size_t offset, size_t len)
>   * folio_zero_segments() - Zero two byte ranges in a folio.
>   * @folio: The folio to write to.
>   * @start1: The first byte to zero.
> - * @end1: One more than the last byte in the first range.
> + * @xend1: One more than the last byte in the first range.
>   * @start2: The first byte to zero in the second range.
> - * @end2: One more than the last byte in the second range.
> + * @xend2: One more than the last byte in the second range.
>   */
>  static inline void folio_zero_segments(struct folio *folio,
> -               size_t start1, size_t end1, size_t start2, size_t end2)
> +               size_t start1, size_t xend1, size_t start2, size_t xend2)
>  {
> -       zero_user_segments(&folio->page, start1, end1, start2, end2);
> +       zero_user_segments(&folio->page, start1, xend1, start2, xend2);
>  }
> 
>  /**
>   * folio_zero_segment() - Zero a byte range in a folio.
>   * @folio: The folio to write to.
>   * @start: The first byte to zero.
> - * @end: One more than the last byte in the first range.
> + * @xend: One more than the last byte to zero.
>   */
>  static inline void folio_zero_segment(struct folio *folio,
> -               size_t start, size_t end)
> +               size_t start, size_t xend)
>  {
> -       zero_user_segments(&folio->page, start, end, 0, 0);
> +       zero_user_segments(&folio->page, start, xend, 0, 0);

Works for me,
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

>  }
> 
>  /**
>
Matthew Wilcox Nov. 18, 2021, 8:08 p.m. UTC | #7
On Thu, Nov 18, 2021 at 09:26:15AM -0800, Darrick J. Wong wrote:
> On Thu, Nov 18, 2021 at 03:55:12PM +0000, Matthew Wilcox wrote:
> >                         if (buffer_new(bh)) {
> > ...
> >                                         folio_zero_segments(folio,
> >                                                 to, block_end,
> >                                                 block_start, from);
> > 
> > ("zero between block_start and block_end, except for the region
> > specified by 'from' and 'to'").  Except that for some reason the
> > ranges are specified backwards, so it's not obvious what's going on.
> > Converting that to folio_zero_ranges() would be a possibility, at the
> > expense of complexity in the caller, or using 'max' instead of 'end'
> > would also add complexity to the callers.
> 
> The call above looks like it is preparing to copy some data into the
> middle of a buffer by zero-initializing the bytes before and the bytes
> after that middle region.
> 
> Admittedly my fs-addled brain actually finds this hot mess easier to
> understand:
> 
> folio_zero_segments(folio, to, block_end - 1, block_start, from - 1);
> 
> but I suppose the xend method involves less subtraction everywhere.

That's exactly what it's doing.  It's kind of funny because it's an
abstraction that permits a micro-optimisation (removing potentially one
kmap() call), but removes the opportunity for a larger optimisation
(removing several, and also removing calls to flush_dcache_folio).
That is, we could rewrite __block_write_begin_int() as:

static void *kremap_folio(void *kaddr, struct folio *folio)
{
	if (kaddr)
		return kaddr;
	/* buffer heads only support single page folios */
	return kmap_local_folio(folio, 0);
}

+       void *kaddr = NULL;
...
-                               if (block_end > to || block_start < from)
-                                       folio_zero_segments(folio,
-                                               to, block_end,
-                                               block_start, from);
+                               if (from > block_start) {
+                                       kaddr = kremap_folio(kaddr, folio);
+                                       memset(kaddr + block_start, 0,
+                                               from - block_start);
+                               }
+                               if (block_end > to) {
+                                       kaddr = kremap_folio(kaddr, folio);
+                                       memset(kaddr + to, 0, block_end - to);
+                               }
...
        }
+       if (kaddr) {
+               kunmap_local(kaddr);
+               flush_dcache_folio(folio);
+       }

That way if there are multiple unmapped+new buffers, we only kmap/kunmap
once per page.  I don't care to submit this as a patch though ... buffer
heads just need to go away.  iomap can't use an optimisation like this;
it already reports all the contiguous unmapped blocks as a single extent,
and if you have multiple unmapped extents per page, well ... I'm sorry
for you, but the overhead of kmap/kunmap is the least of your problems.
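
The same pattern, condensed into a standalone sketch (illustration
only; zero_outside_block() is a hypothetical name and it leans on the
kremap_folio() helper above):

	static void zero_outside_block(struct folio *folio,
			size_t block_start, size_t from,
			size_t to, size_t block_end)
	{
		void *kaddr = NULL;

		if (from > block_start) {
			kaddr = kremap_folio(kaddr, folio);	/* maps at most once */
			memset(kaddr + block_start, 0, from - block_start);
		}
		if (block_end > to) {
			kaddr = kremap_folio(kaddr, folio);
			memset(kaddr + to, 0, block_end - to);
		}
		if (kaddr) {
			kunmap_local(kaddr);
			flush_dcache_folio(folio);
		}
	}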

> Reviewed-by: Darrick J. Wong <djwong@kernel.org>

Thanks.  Pushed to
https://git.infradead.org/users/willy/pagecache.git/shortlog/refs/heads/for-next

I'll give that until Monday to soak and send a pull request.

Patch

diff --git a/include/linux/highmem.h b/include/linux/highmem.h
index 25aff0f2ed0b..c343c69bb5b4 100644
--- a/include/linux/highmem.h
+++ b/include/linux/highmem.h
@@ -231,10 +231,10 @@  static inline void tag_clear_highpage(struct page *page)
  * If we pass in a base or tail page, we can zero up to PAGE_SIZE.
  * If we pass in a head page, we can zero up to the size of the compound page.
  */
-#if defined(CONFIG_HIGHMEM) && defined(CONFIG_TRANSPARENT_HUGEPAGE)
+#ifdef CONFIG_HIGHMEM
 void zero_user_segments(struct page *page, unsigned start1, unsigned end1,
 		unsigned start2, unsigned end2);
-#else /* !HIGHMEM || !TRANSPARENT_HUGEPAGE */
+#else
 static inline void zero_user_segments(struct page *page,
 		unsigned start1, unsigned end1,
 		unsigned start2, unsigned end2)
@@ -254,7 +254,7 @@  static inline void zero_user_segments(struct page *page,
 	for (i = 0; i < compound_nr(page); i++)
 		flush_dcache_page(page + i);
 }
-#endif /* !HIGHMEM || !TRANSPARENT_HUGEPAGE */
+#endif
 
 static inline void zero_user_segment(struct page *page,
 	unsigned start, unsigned end)
@@ -364,4 +364,42 @@  static inline void memzero_page(struct page *page, size_t offset, size_t len)
 	kunmap_local(addr);
 }
 
+/**
+ * folio_zero_segments() - Zero two byte ranges in a folio.
+ * @folio: The folio to write to.
+ * @start1: The first byte to zero.
+ * @end1: One more than the last byte in the first range.
+ * @start2: The first byte to zero in the second range.
+ * @end2: One more than the last byte in the second range.
+ */
+static inline void folio_zero_segments(struct folio *folio,
+		size_t start1, size_t end1, size_t start2, size_t end2)
+{
+	zero_user_segments(&folio->page, start1, end1, start2, end2);
+}
+
+/**
+ * folio_zero_segment() - Zero a byte range in a folio.
+ * @folio: The folio to write to.
+ * @start: The first byte to zero.
+ * @end: One more than the last byte in the first range.
+ */
+static inline void folio_zero_segment(struct folio *folio,
+		size_t start, size_t end)
+{
+	zero_user_segments(&folio->page, start, end, 0, 0);
+}
+
+/**
+ * folio_zero_range() - Zero a byte range in a folio.
+ * @folio: The folio to write to.
+ * @start: The first byte to zero.
+ * @length: The number of bytes to zero.
+ */
+static inline void folio_zero_range(struct folio *folio,
+		size_t start, size_t length)
+{
+	zero_user_segments(&folio->page, start, start + length, 0, 0);
+}
+
 #endif /* _LINUX_HIGHMEM_H */
diff --git a/mm/highmem.c b/mm/highmem.c
index 88f65f155845..819d41140e5b 100644
--- a/mm/highmem.c
+++ b/mm/highmem.c
@@ -359,7 +359,6 @@  void kunmap_high(struct page *page)
 }
 EXPORT_SYMBOL(kunmap_high);
 
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
 void zero_user_segments(struct page *page, unsigned start1, unsigned end1,
 		unsigned start2, unsigned end2)
 {
@@ -416,7 +415,6 @@  void zero_user_segments(struct page *page, unsigned start1, unsigned end1,
 	BUG_ON((start1 | start2 | end1 | end2) != 0);
 }
 EXPORT_SYMBOL(zero_user_segments);
-#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 #endif /* CONFIG_HIGHMEM */
 
 #ifdef CONFIG_KMAP_LOCAL