[v3,00/22] btrfs: add read-only support for subpage sector size

Message ID 20210106010201.37864-1-wqu@suse.com (mailing list archive)

Message

Qu Wenruo Jan. 6, 2021, 1:01 a.m. UTC
Patches can be fetched from github:
https://github.com/adam900710/linux/tree/subpage
Currently the branch also contains partial RW data support (still some
out-of-sync subpage data page status).

Great thanks to David for his effort reviewing and merging the
preparation patches into misc-next.
Now all previously submitted preparation patches are already in
misc-next.

=== What works ===

Just from the patchset:
- Data read
  Both regular and compressed data, with csum check.

- Metadata read

This means that, with this patchset, 64K page systems can at least mount
btrfs with 4K sector size read-only.

In the subpage branch:
- Metadata read write and balance
  Not yet fully tested, because data write still has bugs that need to
  be solved.
  But considering that metadata operations from the previous iteration
  are mostly untouched, metadata read write should be pretty stable.

- Basic data balance
  This is new.

- Data read write
  Only uncompressed data writes. Fsstress can survive.
  But there is still a very rare data csum error, which only happens in
  bookend data extents (the part of a data extent which is not referred
  to by any file).
  This looks related to reflink/invalidate page and cow fixup.
  Still investigating.

=== Needs feedback ===
The following design points need extra comments:

- u16 bitmap
  As David mentioned, using u16 as a bitmap is not the fastest way.
  That's also why the current bitmap code requires unsigned long (u32) as
  the minimal unit.
  But using such a bitmap directly would double the memory usage.
  Thus the best way is to pack two u16 bitmaps into one u32 bitmap, but
  that still needs extra investigation to find a better practice.

  Anyway the skeleton should be pretty simple to expand.
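
To make the packing idea above concrete, here is a minimal userspace
sketch, not code from the patchset. It assumes a 64K page with 4K
sectors (16 bits per status) and packs two hypothetical statuses,
uptodate and error, into one 32-bit word. All names (subpage_slot,
packed_set(), packed_clear(), packed_get()) are made up for
illustration.

```c
#include <assert.h>
#include <stdint.h>

#define SECTORS_PER_PAGE 16u /* assumption: 64K page / 4K sector */

/* Hypothetical slots: each status owns one 16-bit half of the word. */
enum subpage_slot { SLOT_UPTODATE = 0, SLOT_ERROR = 1 };

/* Bit position where the 16-bit bitmap of @slot starts in the word. */
static inline uint32_t slot_shift(enum subpage_slot slot)
{
	assert(slot == SLOT_UPTODATE || slot == SLOT_ERROR);
	return (uint32_t)slot * SECTORS_PER_PAGE;
}

/* Set the sectors in @bits for @slot without touching the other slot. */
static inline void packed_set(uint32_t *word, enum subpage_slot slot,
			      uint16_t bits)
{
	*word |= (uint32_t)bits << slot_shift(slot);
}

/* Clear the sectors in @bits for @slot without touching the other slot. */
static inline void packed_clear(uint32_t *word, enum subpage_slot slot,
				uint16_t bits)
{
	*word &= ~((uint32_t)bits << slot_shift(slot));
}

/* Extract the 16-bit bitmap of @slot from the packed word. */
static inline uint16_t packed_get(uint32_t word, enum subpage_slot slot)
{
	return (uint16_t)(word >> slot_shift(slot));
}
```

With this layout, marking sectors 0-3 uptodate is
packed_set(&word, SLOT_UPTODATE, 0x000f), and the error bits in the
upper half stay untouched. Whether the real code should pack this way
is exactly the open question above.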

- Separate handling for subpage metadata
  Currently the metadata read path (and later the write path) handles
  subpage metadata differently, mostly because page locking must be
  skipped for subpage metadata.
  I tried several times to use as much common code as possible, but
  every time I ended up reverting back to the current code.

  Thankfully, for data handling we will use the same common code.

- Incompatible subpage structure against iomap_page
  In btrfs we need more bits than iomap_page provides.
  This is because we need sector-perfect writes for data balance.
  E.g. if only one 4K sector is dirty in a 64K page, we should only
  write that dirty 4K back to disk, not the full 64K page.

  Data balance requires the new data extents to have exactly the
  same size as the original ones.
  This means, unless iomap_page gets extra bits like what we're doing in
  btrfs for dirty, we can't merge btrfs_subpage with iomap_page.
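
As an illustration of the sector-perfect write requirement, the
following userspace sketch walks a per-page dirty bitmap and returns
only the dirty byte ranges. It is not taken from the patchset: the 16
bit dirty bitmap, struct byte_range and next_dirty_range() are all
hypothetical, and a 64K page with 4K sectors is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SECTORSIZE       4096u /* assumption: 4K sector size */
#define SECTORS_PER_PAGE 16u   /* assumption: 64K page */

/* Hypothetical result type: a byte range inside one page. */
struct byte_range {
	uint32_t offset;
	uint32_t len;
};

/*
 * Find the next run of contiguous dirty sectors at or after *sector.
 * Returns 1 and fills @out when a run is found, advancing *sector past
 * the run. Returns 0 when no dirty sector remains.
 */
static inline int next_dirty_range(uint16_t dirty, unsigned int *sector,
				   struct byte_range *out)
{
	unsigned int start = *sector;
	unsigned int end;

	assert(sector != NULL && out != NULL);

	/* Skip clean sectors. */
	while (start < SECTORS_PER_PAGE && !(dirty & (1u << start)))
		start++;
	if (start >= SECTORS_PER_PAGE) {
		*sector = SECTORS_PER_PAGE;
		return 0;
	}

	/* Extend the run over all adjacent dirty sectors. */
	end = start;
	while (end < SECTORS_PER_PAGE && (dirty & (1u << end)))
		end++;

	out->offset = start * SECTORSIZE;
	out->len = (end - start) * SECTORSIZE;
	*sector = end;
	return 1;
}
```

For a page where only sector 2 is dirty (bitmap 0x0004), the only
range returned is offset 8192, length 4096, so a writeback loop built
on this would submit 4K rather than the full 64K page.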

=== Patchset structure ===
Patch 01~03:	Existing preparation patches.
		Mostly readability related patches found during RW
		development
Patch 04~05:	New preparation patches.
		Mostly related to __process_pages_contig().
Patch 06~10:	Subpage handling for extent buffer allocation and
		freeing
Patch 11~22:	Subpage handling for extent buffer read path

=== Changelog ===
v1:
- Separate the main implementation from the previous huge patchset
  A huge patchset doesn't make much sense.

- Use bitmap implementation
  Now page::private will be a pointer to btrfs_subpage structure, which
  contains bitmaps for various page status.
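
A rough userspace sketch of that idea, for readers who have not looked
at the patches: the private pointer of a page leads to a small
structure holding one bitmap per status, and the page-level status is
derived from the bitmap. The layout and names below (subpage_sketch,
page_sketch, range_to_bits(), set_range_uptodate(),
page_fully_uptodate()) are assumptions for illustration and do not
reproduce the real btrfs_subpage definition.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE_64K 65536u /* assumption: 64K page */
#define SECTORSIZE    4096u  /* assumption: 4K sector size */

/* Hypothetical per-page structure, one bit per sector and status. */
struct subpage_sketch {
	uint16_t uptodate_bitmap;
	uint16_t error_bitmap;
};

/* Stand-in for struct page, reduced to its private pointer. */
struct page_sketch {
	void *private;
};

/* Convert a sector-aligned byte range inside the page into a bit mask. */
static inline uint16_t range_to_bits(uint32_t offset, uint32_t len)
{
	uint32_t start = offset / SECTORSIZE;
	uint32_t nbits = len / SECTORSIZE;

	assert(offset % SECTORSIZE == 0 && len % SECTORSIZE == 0);
	assert(nbits >= 1 && start + nbits <= PAGE_SIZE_64K / SECTORSIZE);
	return (uint16_t)(((1u << nbits) - 1) << start);
}

/* Mark [offset, offset + len) of the page as uptodate. */
static inline void set_range_uptodate(struct page_sketch *page,
				      uint32_t offset, uint32_t len)
{
	struct subpage_sketch *sp = page->private;

	sp->uptodate_bitmap |= range_to_bits(offset, len);
}

/* The whole page counts as uptodate only when every sector bit is set. */
static inline int page_fully_uptodate(const struct page_sketch *page)
{
	const struct subpage_sketch *sp = page->private;

	return sp->uptodate_bitmap == UINT16_MAX;
}
```

Reading one 4K sector sets a single bit, and the page as a whole only
becomes uptodate once all 16 sectors have been read.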

v2:
- Use page::private as btrfs_subpage for extra info
  This replaces the old extent io tree based solution, which reduces
  latency and doesn't require memory allocation for its operations.

- Cherry-pick new preparation patches from RW development
  Those new preparation patches improve readability on their own.

v3:
- Make dummy extent buffers follow the same subpage accessors
  Fsstress exposed several ASSERT()s for dummy extent buffers.
  It turns out we need dummy extent buffers to own the same
  btrfs_subpage structure for the eb accessors to work properly.

- Two new small __process_pages_contig() related preparation patches
  One enhances the error handling path for locked_page in
  __process_pages_contig(), the other merges two macros into one.

- Extent buffer refs count update
  Except for try_release_extent_buffer(), all other eb uses will try to
  increase the ref count of the eb.
  For try_release_extent_buffer(), the eb refs check will happen inside
  the rcu critical section to avoid eb being freed.

- Comment updates
  Addressing the comments from the mailing list.

Qu Wenruo (22):
  btrfs: extent_io: rename @offset parameter to @disk_bytenr for
    submit_extent_page()
  btrfs: extent_io: refactor __extent_writepage_io() to improve
    readability
  btrfs: file: update comment for btrfs_dirty_pages()
  btrfs: extent_io: update locked page dirty/writeback/error bits in
    __process_pages_contig()
  btrfs: extent_io: merge PAGE_CLEAR_DIRTY and PAGE_SET_WRITEBACK into
    PAGE_START_WRITEBACK
  btrfs: extent_io: introduce a helper to grab an existing extent buffer
    from a page
  btrfs: extent_io: introduce the skeleton of btrfs_subpage structure
  btrfs: extent_io: make attach_extent_buffer_page() to handle subpage
    case
  btrfs: extent_io: make grab_extent_buffer_from_page() to handle
    subpage case
  btrfs: extent_io: support subpage for extent buffer page release
  btrfs: extent_io: attach private to dummy extent buffer pages
  btrfs: subpage: introduce helper for subpage uptodate status
  btrfs: subpage: introduce helper for subpage error status
  btrfs: extent_io: make set/clear_extent_buffer_uptodate() to support
    subpage size
  btrfs: extent_io: make btrfs_clone_extent_buffer() to be subpage
    compatible
  btrfs: extent_io: implement try_release_extent_buffer() for subpage
    metadata support
  btrfs: extent_io: introduce read_extent_buffer_subpage()
  btrfs: extent_io: make endio_readpage_update_page_status() to handle
    subpage case
  btrfs: disk-io: introduce subpage metadata validation check
  btrfs: introduce btrfs_subpage for data inodes
  btrfs: integrate page status update for read path into
    begin/end_page_read()
  btrfs: allow RO mount of 4K sector size fs on 64K page system

 fs/btrfs/Makefile           |   3 +-
 fs/btrfs/compression.c      |  10 +-
 fs/btrfs/disk-io.c          |  82 +++++-
 fs/btrfs/extent_io.c        | 534 ++++++++++++++++++++++++++++--------
 fs/btrfs/extent_io.h        |  15 +-
 fs/btrfs/file.c             |  35 +--
 fs/btrfs/free-space-cache.c |  15 +-
 fs/btrfs/inode.c            |  40 ++-
 fs/btrfs/ioctl.c            |   5 +-
 fs/btrfs/reflink.c          |   5 +-
 fs/btrfs/relocation.c       |  12 +-
 fs/btrfs/subpage.c          |  39 +++
 fs/btrfs/subpage.h          | 256 +++++++++++++++++
 fs/btrfs/super.c            |   7 +
 14 files changed, 876 insertions(+), 182 deletions(-)
 create mode 100644 fs/btrfs/subpage.c
 create mode 100644 fs/btrfs/subpage.h

Comments

David Sterba Jan. 12, 2021, 3:14 p.m. UTC | #1
On Wed, Jan 06, 2021 at 09:01:39AM +0800, Qu Wenruo wrote:
> Patches can be fetched from github:
> https://github.com/adam900710/linux/tree/subpage
> Currently the branch also contains partial RW data support (still some
> out-of-sync subpage data page status).
> 
> Great thanks to David for his effort reviewing and merging the
> preparation patches into misc-next.
> Now all previously submitted preparation patches are already in
> misc-next.

I merged 1, 2, 3, 6 to misc-next as they're obvious and independent.
The rest is up for review, I haven't looked closely at the open
questions.

> Qu Wenruo (22):
>   btrfs: extent_io: rename @offset parameter to @disk_bytenr for
>     submit_extent_page()

Please drop "extent_io:" from any future patch subjects, they get too
long already and we haven't been using this prefix consistently anyway.

Qu Wenruo Jan. 13, 2021, 5:06 a.m. UTC | #2
On 2021/1/12 11:14 PM, David Sterba wrote:
> On Wed, Jan 06, 2021 at 09:01:39AM +0800, Qu Wenruo wrote:
>> Patches can be fetched from github:
>> https://github.com/adam900710/linux/tree/subpage
>> Currently the branch also contains partial RW data support (still some
>> out-of-sync subpage data page status).
>>
>> Great thanks to David for his effort reviewing and merging the
>> preparation patches into misc-next.
>> Now all previously submitted preparation patches are already in
>> misc-next.
>
> I merged 1, 2, 3, 6 to misc-next as they're obvious and independent.
> The rest is up for review, I haven't looked closely at the open
> questions.

I just fetched misc-next from github/kernel, and none of the branches
have the patches.

Maybe I should be more patient?

>
>> Qu Wenruo (22):
>>    btrfs: extent_io: rename @offset parameter to @disk_bytenr for
>>      submit_extent_page()
>
> Please drop "extent_io:" from any future patch subjects, they get too
> long already and we haven't been using this prefix consistently anyway.
>
Got it.

Thanks,
Qu