mbox series

[v6,00/15] btrfs: add data write support for subpage

Message ID 20210705020110.89358-1-wqu@suse.com (mailing list archive)
Headers show
Series btrfs: add data write support for subpage | expand

Message

Qu Wenruo July 5, 2021, 2 a.m. UTC
This much smaller patchset can be fetched from github:
https://github.com/adam900710/linux/tree/subpage

And thanks him again for fixing up the small typos and style problem in
my old patches. Almost no patch is no untouched, really appreciate the
effort.

These patchset is targeted at v5.15 merge window.
There are 11 patches pending for a while, and not touched, thus they
should be pretty stable and safe.

While there are 4 new patches, two of them are straightforward small
fixes, the remaining 2 are a little scary as they reworked the core code
of compression.

But the rework should improve the readabilty thus make reviewing a
little easier (as least I hope so).

=== Current stage ===
The tests on x86 pass without new failure, and generic test group on
arm64 with 64K page size passes except known failure and defrag group.

For btrfs test group, all pass except compression/raid56/defrag.

For anyone who is interested in testing, please use btrfs-progs v5.12 to
avoid false alert at mkfs time.

=== Limitation ===
There are several limitations introduced just for subpage:
- No compressed write support
  Read is no problem, but compression write path has more things left to
  be modified.

  I'm already working on that, the status is:
  * Split compressed bio submission
    Patchset submitted, since it's also cleaning up several BUG_ON()s, it
    has a better chance to get merged.
    But I'm not in a hurry to push this part into v5.14, especially
    not before the initial subpage enablement.

  * Fix up extent_clear_unlock_delalloc() to avoid use subpage unlock
    for pages not locked by subpage helpers
    WIP

  * Testing

- No inline extent will be created
  This is mostly due to the fact that filemap_fdatawrite_range() will
  trigger more write than the range specified.
  In fallocate calls, this behavior can make us to writeback which can
  be inlined, before we enlarge the isize, causing inline extent being
  created along with regular extents.

  In fact, even on x86_64, we can still have fsstress to create inodes
  with mixed inline and regular extents.
  Thus there is a much bigger problem to address.

- No support for RAID56
  There are still too many hardcoded PAGE_SIZE in raid56 code.
  Considering it's already considered unsafe due to its write-hole
  problem, disabling RAID56 for subpage looks sane to me.

- No defrag support for subpage
  The support for subpage defrag has already an initial version
  submitted to the mail list.
  Thus the correct support won't be included in this patchset.

  Currently I'm not pushing defrag patchset, as it's really not
  the priority, and still has rare bugs related to EXTENT_DELALLOC_NEW
  extent bit.

=== Patchset structure ===
Patch 01~04:	Subpage fixes for compression read path
Patch 05~06:	Support for subpage relocation
Patch 07~14:	Subpage specific fixes and extra limitations
Patch 15:	Enable subpage support

=== Changelog ===
v2:
- Rebased to latest misc-next
  Now metadata write patches are removed from the series, as they are
  already merged into misc-next.

- Added new Reviewed-by/Tested-by/Reported-by tags

- Use separate endio functions to subpage metadata write path

- Re-order the patches, to make refactors at the top of the series
  One refactor, the submit_extent_page() one, should benefit 4K page
  size more than 64K page size, thus it's worthy to be merged early

- New bug fixes exposed by Ritesh Harjani on Power

- Reject RAID56 completely
  Exposed by btrfs test group, which caused BUG_ON() for various sites.
  Considering RAID56 is already not considered safe, it's better to
  reject them completely for now.

- Fix subpage scrub repair failure
  Caused by hardcoded PAGE_SIZE

- Fix free space cache inode size
  Same cause as scrub repair failure

v3:
- Rebased to remove write path prepration patches

- Properly enable btrfs defrag
  Previsouly, btrfs defrag is in fact just disabled.
  This makes tons of tests in btrfs/defrag to fail.

- More bug fixes for rare race/crashes
  * Fix relocation false alert on csum mismatch
  * Fix relocation data corruption
  * Fix a rare case of false ASSERT()
    The fix already get merged into the prepration patches, thus no
    longer in this patchset though.
  
  Mostly reported by Ritesh from IBM.

v4:
- Disable subpage defrag completely
  As full page defrag can race with fsstress in btrfs/062, causing
  strange ordered extent bugs.
  The full subpage defrag will be submitted as an indepdent patchset.

v5:
- Rebased to latest misc-next branch

v6:
- Rebased to latest misc-next branch
  The 11 existing patches have no conflicts at all.

- Added four patches related to compression read path
  This involves:
  * One small fix for extent map grabbing
  * One preparation to remove GFP_HIGHMEM
    kmap()/kunmap() is not removed yet, as it's only for later
    subpage related decompression path rework.
  * Rework btrfs_decompress_buf2page() and lzo_decompress_bio()
    btrfs_decompress_buf2page() handles the copying of decompressed data
    to inode pages, without proper subpage handling, we can copy
    decompressed data to wrong location
    lzo_decompress_bio() needs a sectorsize related fix to handle
    padding zeros.
    Since we're here, I just reworked the code to make more rooms for
    proper comments.

    These two rework looks scary, and touches the core functions of
    compression, thus Josef gave me extra tests runs on them and no
    regression found.

    But still they definitely deserve more review.

Qu Wenruo (15):
  btrfs: grab correct extent map for subpage compressed extent read
  btrfs: remove the GFP_HIGHMEM flag for compression code
  btrfs: rework btrfs_decompress_buf2page()
  btrfs: rework lzo_decompress_bio() to make it subpage compatible
  btrfs: extract relocation page read and dirty part into its own
    function
  btrfs: make relocate_one_page() to handle subpage case
  btrfs: fix wild subpage writeback which does not have ordered extent.
  btrfs: disable inline extent creation for subpage
  btrfs: allow submit_extent_page() to do bio split for subpage
  btrfs: reject raid5/6 fs for subpage
  btrfs: fix a crash caused by race between prepare_pages() and
    btrfs_releasepage()
  btrfs: fix a use-after-free bug in writeback subpage helper
  btrfs: fix a subpage false alert for relocating partial preallocated
    data extents
  btrfs: fix a subpage relocation data corruption
  btrfs: allow read-write for 4K sectorsize on 64K page size systems

 fs/btrfs/compression.c | 152 ++++++++++----------
 fs/btrfs/compression.h |   5 +-
 fs/btrfs/disk-io.c     |  13 +-
 fs/btrfs/extent_io.c   | 209 ++++++++++++++++++++--------
 fs/btrfs/file.c        |  13 +-
 fs/btrfs/inode.c       |  78 +++++++++--
 fs/btrfs/ioctl.c       |   6 +
 fs/btrfs/lzo.c         | 210 ++++++++++++----------------
 fs/btrfs/relocation.c  | 308 +++++++++++++++++++++++++++--------------
 fs/btrfs/subpage.c     |  20 ++-
 fs/btrfs/subpage.h     |   7 +
 fs/btrfs/super.c       |   7 -
 fs/btrfs/sysfs.c       |   5 +
 fs/btrfs/volumes.c     |   8 ++
 fs/btrfs/zlib.c        |  18 +--
 fs/btrfs/zstd.c        |  12 +-
 16 files changed, 661 insertions(+), 410 deletions(-)

Comments

Qu Wenruo July 7, 2021, 8:28 a.m. UTC | #1
On 2021/7/5 上午10:00, Qu Wenruo wrote:
> This much smaller patchset can be fetched from github:
> https://github.com/adam900710/linux/tree/subpage
> 
> And thanks him again for fixing up the small typos and style problem in
> my old patches. Almost no patch is no untouched, really appreciate the
> effort.
> 
> These patchset is targeted at v5.15 merge window.
> There are 11 patches pending for a while, and not touched, thus they
> should be pretty stable and safe.
> 
> While there are 4 new patches, two of them are straightforward small
> fixes, the remaining 2 are a little scary as they reworked the core code
> of compression.
> 
> But the rework should improve the readabilty thus make reviewing a
> little easier (as least I hope so).
> 
> === Current stage ===
> The tests on x86 pass without new failure, and generic test group on
> arm64 with 64K page size passes except known failure and defrag group.
> 
> For btrfs test group, all pass except compression/raid56/defrag.
> 
> For anyone who is interested in testing, please use btrfs-progs v5.12 to
> avoid false alert at mkfs time.
> 
> === Limitation ===
> There are several limitations introduced just for subpage:
> - No compressed write support
>    Read is no problem, but compression write path has more things left to
>    be modified.

Well, without proper testing just for compression read, it turns out 
compression read path has more bugs than I thought.

The latest bug exposed during subpage compression write support is, the 
add_rd_bio_pages().

Obviously it has tons of hard coded PAGE_SIZE, and can easily cause 
ASSERT() for readers accounting.
As it added locked page without proper subpage::readers account updated.

For this case, I'm already crafting a proper fix for that.

But on the other hand, I'm not sure what's the proper way to introduce a 
fix for v5.15 window.

Should I just disable readahead for compression read (which just needs 
two lines to return 0 for subpage case in add_ra_bio_pages())?

Or should I add the proper fix into the patchset?

Thanks,
Qu

> 
>    I'm already working on that, the status is:
>    * Split compressed bio submission
>      Patchset submitted, since it's also cleaning up several BUG_ON()s, it
>      has a better chance to get merged.
>      But I'm not in a hurry to push this part into v5.14, especially
>      not before the initial subpage enablement.
> 
>    * Fix up extent_clear_unlock_delalloc() to avoid use subpage unlock
>      for pages not locked by subpage helpers
>      WIP
> 
>    * Testing
> 
> - No inline extent will be created
>    This is mostly due to the fact that filemap_fdatawrite_range() will
>    trigger more write than the range specified.
>    In fallocate calls, this behavior can make us to writeback which can
>    be inlined, before we enlarge the isize, causing inline extent being
>    created along with regular extents.
> 
>    In fact, even on x86_64, we can still have fsstress to create inodes
>    with mixed inline and regular extents.
>    Thus there is a much bigger problem to address.
> 
> - No support for RAID56
>    There are still too many hardcoded PAGE_SIZE in raid56 code.
>    Considering it's already considered unsafe due to its write-hole
>    problem, disabling RAID56 for subpage looks sane to me.
> 
> - No defrag support for subpage
>    The support for subpage defrag has already an initial version
>    submitted to the mail list.
>    Thus the correct support won't be included in this patchset.
> 
>    Currently I'm not pushing defrag patchset, as it's really not
>    the priority, and still has rare bugs related to EXTENT_DELALLOC_NEW
>    extent bit.
> 
> === Patchset structure ===
> Patch 01~04:	Subpage fixes for compression read path
> Patch 05~06:	Support for subpage relocation
> Patch 07~14:	Subpage specific fixes and extra limitations
> Patch 15:	Enable subpage support
> 
> === Changelog ===
> v2:
> - Rebased to latest misc-next
>    Now metadata write patches are removed from the series, as they are
>    already merged into misc-next.
> 
> - Added new Reviewed-by/Tested-by/Reported-by tags
> 
> - Use separate endio functions to subpage metadata write path
> 
> - Re-order the patches, to make refactors at the top of the series
>    One refactor, the submit_extent_page() one, should benefit 4K page
>    size more than 64K page size, thus it's worthy to be merged early
> 
> - New bug fixes exposed by Ritesh Harjani on Power
> 
> - Reject RAID56 completely
>    Exposed by btrfs test group, which caused BUG_ON() for various sites.
>    Considering RAID56 is already not considered safe, it's better to
>    reject them completely for now.
> 
> - Fix subpage scrub repair failure
>    Caused by hardcoded PAGE_SIZE
> 
> - Fix free space cache inode size
>    Same cause as scrub repair failure
> 
> v3:
> - Rebased to remove write path prepration patches
> 
> - Properly enable btrfs defrag
>    Previsouly, btrfs defrag is in fact just disabled.
>    This makes tons of tests in btrfs/defrag to fail.
> 
> - More bug fixes for rare race/crashes
>    * Fix relocation false alert on csum mismatch
>    * Fix relocation data corruption
>    * Fix a rare case of false ASSERT()
>      The fix already get merged into the prepration patches, thus no
>      longer in this patchset though.
>    
>    Mostly reported by Ritesh from IBM.
> 
> v4:
> - Disable subpage defrag completely
>    As full page defrag can race with fsstress in btrfs/062, causing
>    strange ordered extent bugs.
>    The full subpage defrag will be submitted as an indepdent patchset.
> 
> v5:
> - Rebased to latest misc-next branch
> 
> v6:
> - Rebased to latest misc-next branch
>    The 11 existing patches have no conflicts at all.
> 
> - Added four patches related to compression read path
>    This involves:
>    * One small fix for extent map grabbing
>    * One preparation to remove GFP_HIGHMEM
>      kmap()/kunmap() is not removed yet, as it's only for later
>      subpage related decompression path rework.
>    * Rework btrfs_decompress_buf2page() and lzo_decompress_bio()
>      btrfs_decompress_buf2page() handles the copying of decompressed data
>      to inode pages, without proper subpage handling, we can copy
>      decompressed data to wrong location
>      lzo_decompress_bio() needs a sectorsize related fix to handle
>      padding zeros.
>      Since we're here, I just reworked the code to make more rooms for
>      proper comments.
> 
>      These two rework looks scary, and touches the core functions of
>      compression, thus Josef gave me extra tests runs on them and no
>      regression found.
> 
>      But still they definitely deserve more review.
> 
> Qu Wenruo (15):
>    btrfs: grab correct extent map for subpage compressed extent read
>    btrfs: remove the GFP_HIGHMEM flag for compression code
>    btrfs: rework btrfs_decompress_buf2page()
>    btrfs: rework lzo_decompress_bio() to make it subpage compatible
>    btrfs: extract relocation page read and dirty part into its own
>      function
>    btrfs: make relocate_one_page() to handle subpage case
>    btrfs: fix wild subpage writeback which does not have ordered extent.
>    btrfs: disable inline extent creation for subpage
>    btrfs: allow submit_extent_page() to do bio split for subpage
>    btrfs: reject raid5/6 fs for subpage
>    btrfs: fix a crash caused by race between prepare_pages() and
>      btrfs_releasepage()
>    btrfs: fix a use-after-free bug in writeback subpage helper
>    btrfs: fix a subpage false alert for relocating partial preallocated
>      data extents
>    btrfs: fix a subpage relocation data corruption
>    btrfs: allow read-write for 4K sectorsize on 64K page size systems
> 
>   fs/btrfs/compression.c | 152 ++++++++++----------
>   fs/btrfs/compression.h |   5 +-
>   fs/btrfs/disk-io.c     |  13 +-
>   fs/btrfs/extent_io.c   | 209 ++++++++++++++++++++--------
>   fs/btrfs/file.c        |  13 +-
>   fs/btrfs/inode.c       |  78 +++++++++--
>   fs/btrfs/ioctl.c       |   6 +
>   fs/btrfs/lzo.c         | 210 ++++++++++++----------------
>   fs/btrfs/relocation.c  | 308 +++++++++++++++++++++++++++--------------
>   fs/btrfs/subpage.c     |  20 ++-
>   fs/btrfs/subpage.h     |   7 +
>   fs/btrfs/super.c       |   7 -
>   fs/btrfs/sysfs.c       |   5 +
>   fs/btrfs/volumes.c     |   8 ++
>   fs/btrfs/zlib.c        |  18 +--
>   fs/btrfs/zstd.c        |  12 +-
>   16 files changed, 661 insertions(+), 410 deletions(-)
>
Neal Gompa July 7, 2021, 5:41 p.m. UTC | #2
On Wed, Jul 7, 2021 at 4:30 AM Qu Wenruo <wqu@suse.com> wrote:
>
>
>
> On 2021/7/5 上午10:00, Qu Wenruo wrote:
> > This much smaller patchset can be fetched from github:
> > https://github.com/adam900710/linux/tree/subpage
> >
> > And thanks him again for fixing up the small typos and style problem in
> > my old patches. Almost no patch is no untouched, really appreciate the
> > effort.
> >
> > These patchset is targeted at v5.15 merge window.
> > There are 11 patches pending for a while, and not touched, thus they
> > should be pretty stable and safe.
> >
> > While there are 4 new patches, two of them are straightforward small
> > fixes, the remaining 2 are a little scary as they reworked the core code
> > of compression.
> >
> > But the rework should improve the readabilty thus make reviewing a
> > little easier (as least I hope so).
> >
> > === Current stage ===
> > The tests on x86 pass without new failure, and generic test group on
> > arm64 with 64K page size passes except known failure and defrag group.
> >
> > For btrfs test group, all pass except compression/raid56/defrag.
> >
> > For anyone who is interested in testing, please use btrfs-progs v5.12 to
> > avoid false alert at mkfs time.
> >
> > === Limitation ===
> > There are several limitations introduced just for subpage:
> > - No compressed write support
> >    Read is no problem, but compression write path has more things left to
> >    be modified.
>
> Well, without proper testing just for compression read, it turns out
> compression read path has more bugs than I thought.
>
> The latest bug exposed during subpage compression write support is, the
> add_rd_bio_pages().
>
> Obviously it has tons of hard coded PAGE_SIZE, and can easily cause
> ASSERT() for readers accounting.
> As it added locked page without proper subpage::readers account updated.
>
> For this case, I'm already crafting a proper fix for that.
>
> But on the other hand, I'm not sure what's the proper way to introduce a
> fix for v5.15 window.
>
> Should I just disable readahead for compression read (which just needs
> two lines to return 0 for subpage case in add_ra_bio_pages())?
>
> Or should I add the proper fix into the patchset?
>

If this is going into 5.15 instead of 5.14, just add the proper fix
into this patch set. But if we want to land this in 5.14, I would
suggest disabling it for now and then having a separate patch set for
that.

You're already targeting 5.15 (though I kind of want this in 5.14...),
so I suggest going with adding the fix to the patch set.
David Sterba July 7, 2021, 6:14 p.m. UTC | #3
On Wed, Jul 07, 2021 at 01:41:46PM -0400, Neal Gompa wrote:
> > But on the other hand, I'm not sure what's the proper way to introduce a
> > fix for v5.15 window.
> >
> > Should I just disable readahead for compression read (which just needs
> > two lines to return 0 for subpage case in add_ra_bio_pages())?
> >
> > Or should I add the proper fix into the patchset?
> 
> If this is going into 5.15 instead of 5.14, just add the proper fix
> into this patch set. But if we want to land this in 5.14, I would
> suggest disabling it for now and then having a separate patch set for
> that.

5.14 is not possible, there are other subpage changes already merged so
fixes to existing level of support (still limited) can go to 5.14-rcs
but this whole patchses it is targeting 5.15.

> You're already targeting 5.15 (though I kind of want this in 5.14...),
> so I suggest going with adding the fix to the patch set.

For 5.15 it's free to do any change, eg. fold the fix or do a separate
patch explaining all the details.

In general the development changes must be ready at the rc5 time at the
latest (with a week or two for some testing and catching last bugs).
Once it's due there's like two months until the next code freeze.
Qu Wenruo July 7, 2021, 11:19 p.m. UTC | #4
On 2021/7/8 上午2:14, David Sterba wrote:
> On Wed, Jul 07, 2021 at 01:41:46PM -0400, Neal Gompa wrote:
>>> But on the other hand, I'm not sure what's the proper way to introduce a
>>> fix for v5.15 window.
>>>
>>> Should I just disable readahead for compression read (which just needs
>>> two lines to return 0 for subpage case in add_ra_bio_pages())?
>>>
>>> Or should I add the proper fix into the patchset?
>>
>> If this is going into 5.15 instead of 5.14, just add the proper fix
>> into this patch set. But if we want to land this in 5.14, I would
>> suggest disabling it for now and then having a separate patch set for
>> that.
>
> 5.14 is not possible, there are other subpage changes already merged so
> fixes to existing level of support (still limited) can go to 5.14-rcs
> but this whole patchses it is targeting 5.15.
>
>> You're already targeting 5.15 (though I kind of want this in 5.14...),
>> so I suggest going with adding the fix to the patch set.
>
> For 5.15 it's free to do any change, eg. fold the fix or do a separate
> patch explaining all the details.

Another reason why I'm considering to disable that readahead ability is,
it really makes little sense for 64K page size.

Our maximum compressed extent size is just 128K, at most 3 pages for 64K
page size (that's when the extent is not page aligned).

Just saving two reads (which may only be one or 2 4K sectors) doesn't
really make it worthy, especially considering the change to ra itself is
going to affecting common code path.

Thus I prefer to disable for the initial support, and only add the
proper fix for the compression write support.

>
> In general the development changes must be ready at the rc5 time at the
> latest (with a week or two for some testing and catching last bugs).
> Once it's due there's like two months until the next code freeze.
>

Great, I'll just add a simple disabling patch with comments on why the
current code doesn't work and why it's fine to disable it for 64K page size.

Thanks,
Qu
David Sterba July 8, 2021, 11:27 a.m. UTC | #5
On Thu, Jul 08, 2021 at 07:19:28AM +0800, Qu Wenruo wrote:
> 
> 
> On 2021/7/8 上午2:14, David Sterba wrote:
> > On Wed, Jul 07, 2021 at 01:41:46PM -0400, Neal Gompa wrote:
> >>> But on the other hand, I'm not sure what's the proper way to introduce a
> >>> fix for v5.15 window.
> >>>
> >>> Should I just disable readahead for compression read (which just needs
> >>> two lines to return 0 for subpage case in add_ra_bio_pages())?
> >>>
> >>> Or should I add the proper fix into the patchset?
> >>
> >> If this is going into 5.15 instead of 5.14, just add the proper fix
> >> into this patch set. But if we want to land this in 5.14, I would
> >> suggest disabling it for now and then having a separate patch set for
> >> that.
> >
> > 5.14 is not possible, there are other subpage changes already merged so
> > fixes to existing level of support (still limited) can go to 5.14-rcs
> > but this whole patchses it is targeting 5.15.
> >
> >> You're already targeting 5.15 (though I kind of want this in 5.14...),
> >> so I suggest going with adding the fix to the patch set.
> >
> > For 5.15 it's free to do any change, eg. fold the fix or do a separate
> > patch explaining all the details.
> 
> Another reason why I'm considering to disable that readahead ability is,
> it really makes little sense for 64K page size.
> 
> Our maximum compressed extent size is just 128K, at most 3 pages for 64K
> page size (that's when the extent is not page aligned).
> 
> Just saving two reads (which may only be one or 2 4K sectors) doesn't
> really make it worthy, especially considering the change to ra itself is
> going to affecting common code path.
> 
> Thus I prefer to disable for the initial support, and only add the
> proper fix for the compression write support.

Ok.

> > In general the development changes must be ready at the rc5 time at the
> > latest (with a week or two for some testing and catching last bugs).
> > Once it's due there's like two months until the next code freeze.
> >
> 
> Great, I'll just add a simple disabling patch with comments on why the
> current code doesn't work and why it's fine to disable it for 64K page size.

Yes please, there's a number of things that won't be initially
implemented so we need to keep track (changelogs and/or comments).