[v2,19/28] iomap: Convert __iomap_zero_iter to use a folio

Message ID 20211108040551.1942823-20-willy@infradead.org (mailing list archive)
State Superseded, archived
Series iomap/xfs folio patches

Commit Message

Matthew Wilcox Nov. 8, 2021, 4:05 a.m. UTC
The zero iterator can work in folio-sized chunks instead of page-sized
chunks.  This will save a lot of page cache lookups if the file is cached
in multi-page folios.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 fs/iomap/buffered-io.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)
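
As a rough illustration of the savings the commit message claims (the
sizes below are assumptions for illustration, not measurements): each
chunk costs one write_begin/write_end round trip, and hence one page
cache lookup, so larger chunks mean proportionally fewer lookups.

#include <stdio.h>

int main(void)
{
	/* Illustrative only: zeroing a 2MB range that is fully cached
	 * does one page cache lookup per chunk. */
	unsigned long range = 2ul << 20;	/* 2MB */

	printf("4kB pages:   %lu lookups\n", range / (4ul << 10));	/* 512 */
	printf("64kB folios: %lu lookups\n", range / (64ul << 10));	/* 32 */
	return 0;
}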

Comments

Christoph Hellwig Nov. 9, 2021, 8:47 a.m. UTC | #1
On Mon, Nov 08, 2021 at 04:05:42AM +0000, Matthew Wilcox (Oracle) wrote:
> The zero iterator can work in folio-sized chunks instead of page-sized
> chunks.  This will save a lot of page cache lookups if the file is cached
> in multi-page folios.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

but it will clash with my just-sent series that decouples DAX
zeroing from buffered I/O zeroing and folds __iomap_zero_iter into
the caller.
Darrick J. Wong Nov. 17, 2021, 2:24 a.m. UTC | #2
On Mon, Nov 08, 2021 at 04:05:42AM +0000, Matthew Wilcox (Oracle) wrote:
> The zero iterator can work in folio-sized chunks instead of page-sized
> chunks.  This will save a lot of page cache lookups if the file is cached
> in multi-page folios.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

hch's dax decoupling series notwithstanding,

Though TBH I am kinda wondering how the two of you plan to resolve those
kinds of differences -- I haven't looked at that series, though I think
this one's been waiting in the wings for longer?

Heck, I wonder how Matthew plans to merge all this given that it touches
mm, fs, block, and iomap...?

Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  fs/iomap/buffered-io.c | 13 ++++++++-----
>  1 file changed, 8 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 64e54981b651..9c61d12028ca 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -881,17 +881,20 @@ EXPORT_SYMBOL_GPL(iomap_file_unshare);
>  
>  static s64 __iomap_zero_iter(struct iomap_iter *iter, loff_t pos, u64 length)
>  {
> +	struct folio *folio;
>  	struct page *page;
>  	int status;
> -	unsigned offset = offset_in_page(pos);
> -	unsigned bytes = min_t(u64, PAGE_SIZE - offset, length);
> +	size_t offset, bytes;
>  
> -	status = iomap_write_begin(iter, pos, bytes, &page);
> +	status = iomap_write_begin(iter, pos, length, &page);
>  	if (status)
>  		return status;
> +	folio = page_folio(page);
>  
> -	zero_user(page, offset, bytes);
> -	mark_page_accessed(page);
> +	offset = offset_in_folio(folio, pos);
> +	bytes = min_t(u64, folio_size(folio) - offset, length);
> +	folio_zero_range(folio, offset, bytes);
> +	folio_mark_accessed(folio);
>  
>  	return iomap_write_end(iter, pos, bytes, bytes, page);
>  }
> -- 
> 2.33.0
>
Matthew Wilcox Nov. 17, 2021, 2:20 p.m. UTC | #3
On Tue, Nov 16, 2021 at 06:24:24PM -0800, Darrick J. Wong wrote:
> On Mon, Nov 08, 2021 at 04:05:42AM +0000, Matthew Wilcox (Oracle) wrote:
> > The zero iterator can work in folio-sized chunks instead of page-sized
> > chunks.  This will save a lot of page cache lookups if the file is cached
> > in multi-page folios.
> > 
> > Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> 
> hch's dax decoupling series notwithstanding,
> 
> Though TBH I am kinda wondering how the two of you plan to resolve those
> kinds of differences -- I haven't looked at that series, though I think
> this one's been waiting in the wings for longer?

I haven't looked at that series either.

> Heck, I wonder how Matthew plans to merge all this given that it touches
> mm, fs, block, and iomap...?

I'm planning on sending a pull request to Linus on Monday for the first
few patches in this series:
https://git.infradead.org/users/willy/pagecache.git/shortlog/refs/heads/for-next

Then I was hoping you'd take the block + fs/buffer + iomap pieces for
the next merge window.

> Reviewed-by: Darrick J. Wong <djwong@kernel.org>

Thanks!  Going through and collecting all these now ...
Matthew Wilcox Dec. 9, 2021, 9:38 p.m. UTC | #4
On Mon, Nov 08, 2021 at 04:05:42AM +0000, Matthew Wilcox (Oracle) wrote:
> +++ b/fs/iomap/buffered-io.c
> @@ -881,17 +881,20 @@ EXPORT_SYMBOL_GPL(iomap_file_unshare);
>  
>  static s64 __iomap_zero_iter(struct iomap_iter *iter, loff_t pos, u64 length)
>  {
> +	struct folio *folio;
>  	struct page *page;
>  	int status;
> -	unsigned offset = offset_in_page(pos);
> -	unsigned bytes = min_t(u64, PAGE_SIZE - offset, length);
> +	size_t offset, bytes;
>  
> -	status = iomap_write_begin(iter, pos, bytes, &page);
> +	status = iomap_write_begin(iter, pos, length, &page);

This turned out to be buggy.  Darrick and I figured out why his tests
were failing and mine weren't; this only shows up with a 4kB block
size filesystem, and I was only testing with 1kB block size filesystems.
(At least on x86; I haven't figured out why it passes with 1kB block size
filesystems, so I'm not sure what would be true on other filesystems.)
iomap_write_begin() is not prepared to deal with a length that spans a
page boundary.  So I'm replacing this patch with the following patches
(whitespace damaged; pick them up from
https://git.infradead.org/users/willy/linux.git/tag/refs/tags/iomap-folio-5.17c
if you want to compile them):

commit 412212960b72
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Thu Dec 9 15:47:44 2021 -0500

    iomap: Allow iomap_write_begin() to be called with the full length

    In the future, we want write_begin to know the entire length of the
    write so that it can choose to allocate large folios.  Pass the full
    length in from __iomap_zero_iter() and limit it where necessary.

    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
index d67108489148..9270db17c435 100644
--- a/fs/gfs2/bmap.c
+++ b/fs/gfs2/bmap.c
@@ -968,6 +968,9 @@ static int gfs2_iomap_page_prepare(struct inode *inode, loff_t pos,
        struct gfs2_sbd *sdp = GFS2_SB(inode);
        unsigned int blocks;

+       /* gfs2 does not support large folios yet */
+       if (len > PAGE_SIZE)
+               len = PAGE_SIZE;
        blocks = ((pos & blockmask) + len + blockmask) >> inode->i_blkbits;
        return gfs2_trans_begin(sdp, RES_DINODE + blocks, 0);
 }
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 8d7a67655b60..67fcd3b9928d 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -632,6 +632,8 @@ static int iomap_write_begin(const struct iomap_iter *iter, loff_t pos,
                goto out_no_page;
        }
        folio = page_folio(page);
+       if (pos + len > folio_pos(folio) + folio_size(folio))
+               len = folio_pos(folio) + folio_size(folio) - pos;

        if (srcmap->type == IOMAP_INLINE)
                status = iomap_write_begin_inline(iter, page);
@@ -891,16 +893,19 @@ static s64 __iomap_zero_iter(struct iomap_iter *iter, loff_t pos, u64 length)
        struct page *page;
        int status;
        unsigned offset = offset_in_page(pos);
-       unsigned bytes = min_t(u64, PAGE_SIZE - offset, length);

-       status = iomap_write_begin(iter, pos, bytes, &page);
+       if (length > UINT_MAX)
+               length = UINT_MAX;
+       status = iomap_write_begin(iter, pos, length, &page);
        if (status)
                return status;
+       if (length > PAGE_SIZE - offset)
+               length = PAGE_SIZE - offset;

-       zero_user(page, offset, bytes);
+       zero_user(page, offset, length);
        mark_page_accessed(page);

-       return iomap_write_end(iter, pos, bytes, bytes, page);
+       return iomap_write_end(iter, pos, length, length, page);
 }

 static loff_t iomap_zero_iter(struct iomap_iter *iter, bool *did_zero)


commit 78c747a1b3a1
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Fri Nov 5 14:24:09 2021 -0400

    iomap: Convert __iomap_zero_iter to use a folio
    
    The zero iterator can work in folio-sized chunks instead of page-sized
    chunks.  This will save a lot of page cache lookups if the file is cached
    in large folios.
    
    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 67fcd3b9928d..bbde6d4f27cd 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -890,20 +890,23 @@ EXPORT_SYMBOL_GPL(iomap_file_unshare);
 
 static s64 __iomap_zero_iter(struct iomap_iter *iter, loff_t pos, u64 length)
 {
+       struct folio *folio;
        struct page *page;
        int status;
-       unsigned offset = offset_in_page(pos);
+       size_t offset;
 
        if (length > UINT_MAX)
                length = UINT_MAX;
        status = iomap_write_begin(iter, pos, length, &page);
        if (status)
                return status;
-       if (length > PAGE_SIZE - offset)
-               length = PAGE_SIZE - offset;
+       folio = page_folio(page);
 
-       zero_user(page, offset, length);
-       mark_page_accessed(page);
+       offset = offset_in_folio(folio, pos);
+       if (length > folio_size(folio) - offset)
+               length = folio_size(folio) - offset;
+       folio_zero_range(folio, offset, length);
+       folio_mark_accessed(folio);
 
        return iomap_write_end(iter, pos, length, length, page);
 }


The xfstests that Darrick identified as failing all passed.  Running a
full sweep now; then I'll re-run with a 1kB filesystem to be sure that
still passes.  Then I'll send another pull request.
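
To make the failure mode above concrete, here is a minimal userspace
sketch of the clamping the two replacement patches add around
iomap_write_begin(); PAGE_SIZE, the function name, and the example
values are illustrative, not kernel code:

#include <stdint.h>
#include <stdio.h>

#define PAGE_SIZE 4096u		/* illustrative x86 page size */

/* Stand-in for the clamps in the patched __iomap_zero_iter(): length
 * is capped to UINT_MAX before write_begin, then trimmed to the one
 * page that write_begin actually returned. */
static uint64_t zero_chunk_bytes(uint64_t pos, uint64_t length)
{
	unsigned int offset = pos & (PAGE_SIZE - 1);	/* offset_in_page(pos) */

	if (length > UINT32_MAX)
		length = UINT32_MAX;
	if (length > PAGE_SIZE - offset)
		length = PAGE_SIZE - offset;
	return length;
}

int main(void)
{
	/* Zeroing 8kB starting 1kB into a page: only 3072 bytes belong
	 * to the first page.  The unclamped v2 patch asked write_begin
	 * to prepare the full 8kB against that single page. */
	printf("%llu\n", (unsigned long long)zero_chunk_bytes(1024, 8192));
	return 0;
}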
Matthew Wilcox Dec. 10, 2021, 4:19 p.m. UTC | #5
On Thu, Dec 09, 2021 at 09:38:03PM +0000, Matthew Wilcox wrote:
> @@ -891,16 +893,19 @@ static s64 __iomap_zero_iter(struct iomap_iter *iter, loff_t pos, u64 length)
>         struct page *page;
>         int status;
>         unsigned offset = offset_in_page(pos);
> -       unsigned bytes = min_t(u64, PAGE_SIZE - offset, length);
> 
> -       status = iomap_write_begin(iter, pos, bytes, &page);
> +       if (length > UINT_MAX)
> +               length = UINT_MAX;
> +       status = iomap_write_begin(iter, pos, length, &page);
>         if (status)
>                 return status;
> +       if (length > PAGE_SIZE - offset)
> +               length = PAGE_SIZE - offset;
> 
> -       zero_user(page, offset, bytes);
> +       zero_user(page, offset, length);
>         mark_page_accessed(page);
> 
> -       return iomap_write_end(iter, pos, bytes, bytes, page);
> +       return iomap_write_end(iter, pos, length, length, page);
>  }

After attempting the merge with Christoph's ill-timed refactoring,
I decided that eliding the use of 'bytes' here was the wrong approach,
because it very much needs to be put back in for the merge.

Here's the merge as I have it:

diff --cc fs/iomap/buffered-io.c
index f3176cf90351,d1aa0f0e7fd5..40356db3e856
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@@ -888,19 -926,12 +904,23 @@@ static loff_t iomap_zero_iter(struct iomap_iter *iter, bool *did_zero)
                return length;

        do {
-               unsigned offset = offset_in_page(pos);
-               size_t bytes = min_t(u64, PAGE_SIZE - offset, length);
-               struct page *page;
 -              s64 bytes;
++              struct folio *folio;
 +              int status;
++              size_t offset;
++              size_t bytes = min_t(u64, SIZE_MAX, length);
 +
-               status = iomap_write_begin(iter, pos, bytes, &page);
++              status = iomap_write_begin(iter, pos, bytes, &folio);
 +              if (status)
 +                      return status;
 +
-               zero_user(page, offset, bytes);
-               mark_page_accessed(page);
++              offset = offset_in_folio(folio, pos);
++              if (bytes > folio_size(folio) - offset)
++                      bytes = folio_size(folio) - offset;
++
++              folio_zero_range(folio, offset, bytes);
++              folio_mark_accessed(folio);

-               bytes = iomap_write_end(iter, pos, bytes, bytes, page);
 -              if (IS_DAX(iter->inode))
 -                      bytes = dax_iomap_zero(pos, length, iomap);
 -              else
 -                      bytes = __iomap_zero_iter(iter, pos, length);
++              bytes = iomap_write_end(iter, pos, bytes, bytes, folio);
                if (bytes < 0)
                        return bytes;

I've pushed out a new tag:

https://git.infradead.org/users/willy/linux.git/shortlog/refs/tags/iomap-folio-5.17d
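
Two details of the merged loop are worth spelling out.  The
min_t(u64, SIZE_MAX, length) cap keeps the u64 length representable in
a size_t on 32-bit builds (it is a no-op on 64-bit), and the
folio_size() test trims each iteration to the folio that write_begin
returned.  A worked example of that trim, with an assumed 16kB folio
at file offset 0 (illustrative userspace code, not the kernel
implementation):

#include <stdint.h>
#include <stdio.h>

/* Assumed folio for the example: 16kB, starting at file offset 0. */
static const uint64_t folio_pos_ = 0, folio_size_ = 16384;

/* Stand-in for offset_in_folio() plus the folio_size() clamp. */
static uint64_t folio_clamp(uint64_t pos, uint64_t bytes)
{
	uint64_t offset = pos - folio_pos_;	/* offset_in_folio() */

	if (bytes > folio_size_ - offset)
		bytes = folio_size_ - offset;
	return bytes;
}

int main(void)
{
	/* Zeroing 64kB starting 12kB into the folio: this iteration
	 * covers the folio's last 4kB, then the loop advances pos. */
	printf("%llu\n", (unsigned long long)folio_clamp(12288, 65536));
	return 0;
}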
Christoph Hellwig Dec. 13, 2021, 7:34 a.m. UTC | #6
On Fri, Dec 10, 2021 at 04:19:54PM +0000, Matthew Wilcox wrote:
> After attempting the merge with Christoph's ill-timed refactoring,

I did give you a heads-up before...

> I decided that eliding the use of 'bytes' here was the wrong approach,
> because it very much needs to be put back in for the merge.

Is there any good reason to not just delay the iomap_zero_iter folio
conversion for now?
Matthew Wilcox Dec. 13, 2021, 6:08 p.m. UTC | #7
On Sun, Dec 12, 2021 at 11:34:54PM -0800, Christoph Hellwig wrote:
> On Fri, Dec 10, 2021 at 04:19:54PM +0000, Matthew Wilcox wrote:
> > After attempting the merge with Christoph's ill-timed refactoring,
> 
> I did give you a heads-up before...

I thought that was going in via Darrick's tree.  I had no idea Dan was
going to take it.

> > I decided that eliding the use of 'bytes' here was the wrong approach,
> > because it very much needs to be put back in for the merge.
> 
> Is there any good reason to not just delay the iomap_zero_iter folio
> conversion for now?

It would hold up about half of the iomap folio conversion (~10 patches).
I don't understand what the benefit is of your patch series.  Moving
filesystems away from being bdev-based just doesn't seem interesting
to me.  Having DAX as an optional feature that some bdevs have seems
like a far superior option.
Darrick J. Wong Dec. 16, 2021, 7:36 p.m. UTC | #8
On Thu, Dec 09, 2021 at 09:38:03PM +0000, Matthew Wilcox wrote:
> On Mon, Nov 08, 2021 at 04:05:42AM +0000, Matthew Wilcox (Oracle) wrote:
> > +++ b/fs/iomap/buffered-io.c
> > @@ -881,17 +881,20 @@ EXPORT_SYMBOL_GPL(iomap_file_unshare);
> >  
> >  static s64 __iomap_zero_iter(struct iomap_iter *iter, loff_t pos, u64 length)
> >  {
> > +	struct folio *folio;
> >  	struct page *page;
> >  	int status;
> > -	unsigned offset = offset_in_page(pos);
> > -	unsigned bytes = min_t(u64, PAGE_SIZE - offset, length);
> > +	size_t offset, bytes;
> >  
> > -	status = iomap_write_begin(iter, pos, bytes, &page);
> > +	status = iomap_write_begin(iter, pos, length, &page);
> 
> This turned out to be buggy.  Darrick and I figured out why his tests
> were failing and mine weren't; this only shows up with a 4kB block
> size filesystem, and I was only testing with 1kB block size filesystems.
> (At least on x86; I haven't figured out why it passes with 1kB block size
> filesystems, so I'm not sure what would be true on other filesystems.)
> iomap_write_begin() is not prepared to deal with a length that spans a
> page boundary.  So I'm replacing this patch with the following patches
> (whitespace damaged; pick them up from
> https://git.infradead.org/users/willy/linux.git/tag/refs/tags/iomap-folio-5.17c
> if you want to compile them):
> 
> commit 412212960b72
> Author: Matthew Wilcox (Oracle) <willy@infradead.org>
> Date:   Thu Dec 9 15:47:44 2021 -0500
> 
>     iomap: Allow iomap_write_begin() to be called with the full length
> 
>     In the future, we want write_begin to know the entire length of the
>     write so that it can choose to allocate large folios.  Pass the full
>     length in from __iomap_zero_iter() and limit it where necessary.
> 
>     Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> 
> diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
> index d67108489148..9270db17c435 100644
> --- a/fs/gfs2/bmap.c
> +++ b/fs/gfs2/bmap.c
> @@ -968,6 +968,9 @@ static int gfs2_iomap_page_prepare(struct inode *inode, loff_t pos,
>         struct gfs2_sbd *sdp = GFS2_SB(inode);
>         unsigned int blocks;
> 
> +       /* gfs2 does not support large folios yet */
> +       if (len > PAGE_SIZE)
> +               len = PAGE_SIZE;

This is awkward -- gfs2 doesn't set the mapping flag to indicate that it
supports large folios, so it should never be asked to deal with more
than a page at a time.  Shouldn't iomap_write_begin clamp its len
argument to PAGE_SIZE at the start if the mapping doesn't have the large
folios flag set?

--D

>         blocks = ((pos & blockmask) + len + blockmask) >> inode->i_blkbits;
>         return gfs2_trans_begin(sdp, RES_DINODE + blocks, 0);
>  }
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 8d7a67655b60..67fcd3b9928d 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -632,6 +632,8 @@ static int iomap_write_begin(const struct iomap_iter *iter, loff_t pos,
>                 goto out_no_page;
>         }
>         folio = page_folio(page);
> +       if (pos + len > folio_pos(folio) + folio_size(folio))
> +               len = folio_pos(folio) + folio_size(folio) - pos;
> 
>         if (srcmap->type == IOMAP_INLINE)
>                 status = iomap_write_begin_inline(iter, page);
> @@ -891,16 +893,19 @@ static s64 __iomap_zero_iter(struct iomap_iter *iter, loff
> _t pos, u64 length)
>         struct page *page;
>         int status;
>         unsigned offset = offset_in_page(pos);
> -       unsigned bytes = min_t(u64, PAGE_SIZE - offset, length);
> 
> -       status = iomap_write_begin(iter, pos, bytes, &page);
> +       if (length > UINT_MAX)
> +               length = UINT_MAX;
> +       status = iomap_write_begin(iter, pos, length, &page);
>         if (status)
>                 return status;
> +       if (length > PAGE_SIZE - offset)
> +               length = PAGE_SIZE - offset;
> 
> -       zero_user(page, offset, bytes);
> +       zero_user(page, offset, length);
>         mark_page_accessed(page);
> 
> -       return iomap_write_end(iter, pos, bytes, bytes, page);
> +       return iomap_write_end(iter, pos, length, length, page);
>  }
> 
>  static loff_t iomap_zero_iter(struct iomap_iter *iter, bool *did_zero)
> 
> 
> commit 78c747a1b3a1
> Author: Matthew Wilcox (Oracle) <willy@infradead.org>
> Date:   Fri Nov 5 14:24:09 2021 -0400
> 
>     iomap: Convert __iomap_zero_iter to use a folio
>     
>     The zero iterator can work in folio-sized chunks instead of page-sized
>     chunks.  This will save a lot of page cache lookups if the file is cached
>     in large folios.
>     
>     Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
>     Reviewed-by: Christoph Hellwig <hch@lst.de>
>     Reviewed-by: Darrick J. Wong <djwong@kernel.org>
> 
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 67fcd3b9928d..bbde6d4f27cd 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -890,20 +890,23 @@ EXPORT_SYMBOL_GPL(iomap_file_unshare);
>  
>  static s64 __iomap_zero_iter(struct iomap_iter *iter, loff_t pos, u64 length)
>  {
> +       struct folio *folio;
>         struct page *page;
>         int status;
> -       unsigned offset = offset_in_page(pos);
> +       size_t offset;
>  
>         if (length > UINT_MAX)
>                 length = UINT_MAX;
>         status = iomap_write_begin(iter, pos, length, &page);
>         if (status)
>                 return status;
> -       if (length > PAGE_SIZE - offset)
> -               length = PAGE_SIZE - offset;
> +       folio = page_folio(page);
>  
> -       zero_user(page, offset, length);
> -       mark_page_accessed(page);
> +       offset = offset_in_folio(folio, pos);
> +       if (length > folio_size(folio) - offset)
> +               length = folio_size(folio) - offset;
> +       folio_zero_range(folio, offset, length);
> +       folio_mark_accessed(folio);
>  
>         return iomap_write_end(iter, pos, length, length, page);
>  }
> 
> 
> The xfstests that Darrick identified as failing all passed.  Running a
> full sweep now; then I'll re-run with a 1kB filesystem to be sure that
> still passes.  Then I'll send another pull request.
Matthew Wilcox Dec. 16, 2021, 8:43 p.m. UTC | #9
On Thu, Dec 16, 2021 at 11:36:14AM -0800, Darrick J. Wong wrote:
> > 
> > +       /* gfs2 does not support large folios yet */
> > +       if (len > PAGE_SIZE)
> > +               len = PAGE_SIZE;
> 
> This is awkward -- gfs2 doesn't set the mapping flag to indicate that it
> supports large folios, so it should never be asked to deal with more
> than a page at a time.  Shouldn't iomap_write_begin clamp its len
> argument to PAGE_SIZE at the start if the mapping doesn't have the large
> folios flag set?

You're right, this is awkward.  And it's a bit of a beartrap for
another filesystem that wants to implement ->page_prepare in the
future.

diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
index 9270db17c435..d67108489148 100644
--- a/fs/gfs2/bmap.c
+++ b/fs/gfs2/bmap.c
@@ -968,9 +968,6 @@ static int gfs2_iomap_page_prepare(struct inode *inode, loff_t pos,
        struct gfs2_sbd *sdp = GFS2_SB(inode);
        unsigned int blocks;

-       /* gfs2 does not support large folios yet */
-       if (len > PAGE_SIZE)
-               len = PAGE_SIZE;
        blocks = ((pos & blockmask) + len + blockmask) >> inode->i_blkbits;
        return gfs2_trans_begin(sdp, RES_DINODE + blocks, 0);
 }
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 1a9e897ee25a..b1ded5204d1c 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -619,6 +619,9 @@ static int iomap_write_begin(const struct iomap_iter *iter, loff_t pos,
        if (fatal_signal_pending(current))
                return -EINTR;
 
+       if (!mapping_large_folio_support(iter->inode->i_mapping))
+               len = min_t(size_t, len, PAGE_SIZE - offset_in_page(pos));
+
        if (page_ops && page_ops->page_prepare) {
                status = page_ops->page_prepare(iter->inode, pos, len);
                if (status)
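
For context, the mapping flag Darrick refers to is a per-inode opt-in;
a filesystem that can handle multi-page folios sets it when it
initialises the inode.  A hedged sketch (mapping_set_large_folios()
is the helper from the same folio series; the surrounding function is
hypothetical):

#include <linux/pagemap.h>

/*
 * Hypothetical inode-setup hook.  Without this call,
 * mapping_large_folio_support() returns false and the
 * iomap_write_begin() clamp above trims len to a single page.
 */
static void example_fs_setup_inode(struct inode *inode)
{
	mapping_set_large_folios(inode->i_mapping);
}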

Patch

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 64e54981b651..9c61d12028ca 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -881,17 +881,20 @@  EXPORT_SYMBOL_GPL(iomap_file_unshare);
 
 static s64 __iomap_zero_iter(struct iomap_iter *iter, loff_t pos, u64 length)
 {
+	struct folio *folio;
 	struct page *page;
 	int status;
-	unsigned offset = offset_in_page(pos);
-	unsigned bytes = min_t(u64, PAGE_SIZE - offset, length);
+	size_t offset, bytes;
 
-	status = iomap_write_begin(iter, pos, bytes, &page);
+	status = iomap_write_begin(iter, pos, length, &page);
 	if (status)
 		return status;
+	folio = page_folio(page);
 
-	zero_user(page, offset, bytes);
-	mark_page_accessed(page);
+	offset = offset_in_folio(folio, pos);
+	bytes = min_t(u64, folio_size(folio) - offset, length);
+	folio_zero_range(folio, offset, bytes);
+	folio_mark_accessed(folio);
 
 	return iomap_write_end(iter, pos, bytes, bytes, page);
 }