[v5,00/18] btrfs: add read-only support for subpage sector size

Message ID 20210126083402.142577-1-wqu@suse.com (mailing list archive)

Message

Qu Wenruo Jan. 26, 2021, 8:33 a.m. UTC
Patches can be fetched from github:
https://github.com/adam900710/linux/tree/subpage
Currently the branch also contains partial RW data support (there are
still some ordered extent and data csum mismatch problems).

Many thanks to David/Nikolay/Josef for their effort in reviewing and
merging the preparation patches into misc-next.

=== What works ===
Just from the patchset:
- Data read
  Both regular and compressed data, with csum check.

- Metadata read

This means that, with this patchset, 64K page systems can at least mount
btrfs with 4K sector size read-only.
This should at least provide the ability to migrate data.

While the github branch already has experimental RW support, there are
still ordered extent related bugs for me to fix.
Thus only the RO part is sent for review and testing.

=== Patchset structure ===
Patch 01~02:	Preparation patches with no functional change
Patch 03~12:	Subpage metadata allocation and freeing
Patch 13~15:	Subpage metadata read path
Patch 16~17:	Subpage data read path
Patch 18:	Enable subpage RO support

=== Changelog ===
v1:
- Separate the main implementation from the previous huge patchset
  A huge patchset doesn't make much sense.

- Use bitmap implementation
  Now page::private will be a pointer to btrfs_subpage structure, which
  contains bitmaps for various page status.

v2:
- Use page::private as btrfs_subpage for extra info
  This replaces the old extent io tree based solution, which reduces
  latency and doesn't require memory allocation for its operations.

- Cherry-pick new preparation patches from RW development
  Those new preparation patches improve readability on their own.

v3:
- Make dummy extent buffers follow the same subpage accessors
  Fsstress exposed several ASSERT()s for dummy extent buffers.
  It turns out dummy extent buffers need to own the same
  btrfs_subpage structure for the eb accessors to work properly.

- Two new small __process_pages_contig() related preparation patches
  One enhances the error handling path for locked_page in
  __process_pages_contig(), the other merges one macro.

- Extent buffer refs count update
  Except try_release_extent_buffer(), all other eb users will try to
  increase the ref count of the eb.
  For try_release_extent_buffer(), the eb refs check will happen inside
  the rcu critical section to avoid the eb being freed.

- Comment updates
  Addressing the comments from the mailing list.

v4:
- Get rid of btrfs_subpage::tree_block_bitmap
  This is to reduce lock complexity (no need to bother with an extra
  subpage lock for metadata, all locks are existing locks).
  Now eb lookup mostly depends on the radix tree, with a little help
  from btrfs_subpage::under_alloc.
  I haven't experienced metadata related problems any more during
  my local fsstress tests.

- Fix a race on the metadata page dirty bit
  Fixed in the metadata RW patchset though.

- Rebased to the latest misc-next branch
  With 4 patches removed, as they are already in misc-next.

v5:
- Use the updated version from David as base
  Most comment/commit message updates should be kept as is.

- A new separate patch to move UNMAPPED bit set timing

- New comment on why we need to prealloc subpage inside a loop
  Mostly for future 16K page size support, where we can have
  an eb across multiple pages.

- Remove one patch which is too RW specific
  Since it introduces a functional change which only makes sense for RW
  support, it's not a good idea to include it in RO support.

- Error handling fixes
  Great thanks to Josef.

- Refactor btrfs_subpage allocation/freeing
  Now we have btrfs_alloc_subpage() and btrfs_free_subpage() helpers to
  do all the allocation/freeing.
  It's pretty easy to convert to kmem_cache using the above helpers.
  (Already internally tested using kmem_cache without problems; in fact,
   the problems found in the kmem_cache tests are what led to the new
   interface.)

- Use btrfs_subpage::eb_refs to replace old under_alloc
  This makes checking whether the page has any eb left much easier.
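
To illustrate the per-page tracking described in the changelog above, here
is a minimal userspace sketch, not the code from the patches. It assumes a
64K page and 4K sectors, so one page holds 16 sectors and a 16-bit bitmap
is enough. The names subpage_sketch, range_to_bits() and the subpage_*()
helpers are hypothetical and exist only for this example; only the idea of
a bitmap per status plus an eb_refs counter comes from the cover letter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE   65536u  /* assumed 64K page */
#define SKETCH_SECTORSIZE  4096u   /* assumed 4K sector */
#define SKETCH_SECTORS_PER_PAGE (SKETCH_PAGE_SIZE / SKETCH_SECTORSIZE)

/* Hypothetical stand-in for the structure that page::private points to. */
struct subpage_sketch {
	uint16_t uptodate_bitmap; /* one bit per 4K sector in the page */
	int eb_refs;              /* number of extent buffers attached */
};

/* Convert a sector-aligned byte range inside the page into a bit mask. */
static uint16_t range_to_bits(uint32_t offset, uint32_t len)
{
	uint32_t first = offset / SKETCH_SECTORSIZE;
	uint32_t nbits = len / SKETCH_SECTORSIZE;

	assert(offset % SKETCH_SECTORSIZE == 0);
	assert(len % SKETCH_SECTORSIZE == 0);
	assert(nbits > 0 && first + nbits <= SKETCH_SECTORS_PER_PAGE);
	return (uint16_t)(((1u << nbits) - 1) << first);
}

/* Mark every sector in [offset, offset + len) as uptodate. */
static void subpage_set_uptodate(struct subpage_sketch *sp,
				 uint32_t offset, uint32_t len)
{
	sp->uptodate_bitmap |= range_to_bits(offset, len);
}

/* True only if every sector in the range is uptodate. */
static bool subpage_test_uptodate(const struct subpage_sketch *sp,
				  uint32_t offset, uint32_t len)
{
	uint16_t bits = range_to_bits(offset, len);

	return (sp->uptodate_bitmap & bits) == bits;
}

/* The whole page counts as uptodate only when all 16 bits are set. */
static bool subpage_page_uptodate(const struct subpage_sketch *sp)
{
	return sp->uptodate_bitmap == 0xffff;
}

/* With a plain counter, "any eb left on this page?" is a single compare. */
static bool subpage_has_eb(const struct subpage_sketch *sp)
{
	return sp->eb_refs > 0;
}
```

For example, after reading one 16K tree block at page offset 0, the bitmap
becomes 0x000f and the page as a whole is still not uptodate. The sketch
does no locking; the real code has to deal with concurrency.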

Qu Wenruo (18):
  btrfs: merge PAGE_CLEAR_DIRTY and PAGE_SET_WRITEBACK to
    PAGE_START_WRITEBACK
  btrfs: set UNMAPPED bit early in btrfs_clone_extent_buffer() for
    subpage support
  btrfs: introduce the skeleton of btrfs_subpage structure
  btrfs: make attach_extent_buffer_page() handle subpage case
  btrfs: make grab_extent_buffer_from_page() handle subpage case
  btrfs: support subpage for extent buffer page release
  btrfs: attach private to dummy extent buffer pages
  btrfs: introduce helpers for subpage uptodate status
  btrfs: introduce helpers for subpage error status
  btrfs: support subpage in set/clear_extent_buffer_uptodate()
  btrfs: support subpage in btrfs_clone_extent_buffer
  btrfs: support subpage in try_release_extent_buffer()
  btrfs: introduce read_extent_buffer_subpage()
  btrfs: support subpage in endio_readpage_update_page_status()
  btrfs: introduce subpage metadata validation check
  btrfs: introduce btrfs_subpage for data inodes
  btrfs: integrate page status update for data read path into
    begin/end_page_read()
  btrfs: allow RO mount of 4K sector size fs on 64K page system

 fs/btrfs/Makefile           |   3 +-
 fs/btrfs/compression.c      |  10 +-
 fs/btrfs/disk-io.c          |  81 +++++-
 fs/btrfs/extent_io.c        | 485 ++++++++++++++++++++++++++++++++----
 fs/btrfs/extent_io.h        |  15 +-
 fs/btrfs/file.c             |  24 +-
 fs/btrfs/free-space-cache.c |  15 +-
 fs/btrfs/inode.c            |  42 ++--
 fs/btrfs/ioctl.c            |   8 +-
 fs/btrfs/reflink.c          |   5 +-
 fs/btrfs/relocation.c       |  11 +-
 fs/btrfs/subpage.c          | 278 +++++++++++++++++++++
 fs/btrfs/subpage.h          |  92 +++++++
 fs/btrfs/super.c            |   8 +-
 14 files changed, 964 insertions(+), 113 deletions(-)
 create mode 100644 fs/btrfs/subpage.c
 create mode 100644 fs/btrfs/subpage.h

Comments

Josef Bacik Jan. 27, 2021, 4:17 p.m. UTC | #1
On 1/26/21 3:33 AM, Qu Wenruo wrote:
>    btrfs: support subpage for extent buffer page release

I don't have this patch in my inbox so I can't reply to it directly, but you
include refcount.h and then use normal atomics.  Please use the actual
refcount_t, as it gets us all the debugging stuff that makes finding problems
much easier.  Thanks,

Josef
Qu Wenruo Jan. 28, 2021, 12:30 a.m. UTC | #2
On 2021/1/28 12:17 AM, Josef Bacik wrote:
> On 1/26/21 3:33 AM, Qu Wenruo wrote:
>>    btrfs: support subpage for extent buffer page release
>
> I don't have this patch in my inbox so I can't reply to it directly, but
> you include refcount.h, but then use normal atomics.  Please used the
> actual refcount_t, as it gets us all the debugging stuff that makes
> finding problems much easier.  Thanks,

My bad. My initial plan was to use refcount, but this use case has a valid
zero refcount state, so refcount is not a good fit here.

I'll remove the leftover include line.

Thanks,
Qu
>
> Josef
David Sterba Jan. 28, 2021, 10:34 a.m. UTC | #3
On Thu, Jan 28, 2021 at 08:30:21AM +0800, Qu Wenruo wrote:
> >>    btrfs: support subpage for extent buffer page release
> >
> > I don't have this patch in my inbox so I can't reply to it directly, but
> > you include refcount.h, but then use normal atomics.  Please used the
> > actual refcount_t, as it gets us all the debugging stuff that makes
> > finding problems much easier.  Thanks,
> 
> My bad, my initial plan is to use refcount, but the use case has valid 0
> refcount usage, thus refcount is not good here.

In case you need to shift the "0" you can use refcount_dec_not_one or
refcount_inc/dec_not_zero, but I haven't seen the code so don't know if
this applies in your case.
Qu Wenruo Jan. 28, 2021, 10:51 a.m. UTC | #4
On 2021/1/28 6:34 PM, David Sterba wrote:
> On Thu, Jan 28, 2021 at 08:30:21AM +0800, Qu Wenruo wrote:
>>>>     btrfs: support subpage for extent buffer page release
>>>
>>> I don't have this patch in my inbox so I can't reply to it directly, but
>>> you include refcount.h, but then use normal atomics.  Please used the
>>> actual refcount_t, as it gets us all the debugging stuff that makes
>>> finding problems much easier.  Thanks,
>>
>> My bad, my initial plan is to use refcount, but the use case has valid 0
>> refcount usage, thus refcount is not good here.
> 
> In case you need to shift the "0" you can use refcount_dec_not_one or
> refcount_inc/dec_not_zero, but I haven't seen the code so don't know if
> this applies in your case.
> 

In the code, what we want is to inc from zero, which triggers a warning
with refcount. (The initial subpage allocation has zero refs, which is
then increased to one when an eb is attached to the page.)

But maybe I can change the timing so that we can use refcount.
The current code uses ASSERT()s to prevent underflow, so that should be
sufficient for the current code base though.

I'll spend more time investigating this topic for the next update.

Thanks,
Qu
David Sterba Feb. 1, 2021, 2:50 p.m. UTC | #5
On Thu, Jan 28, 2021 at 06:51:46PM +0800, Qu Wenruo wrote:
> On 2021/1/28 6:34 PM, David Sterba wrote:
> > On Thu, Jan 28, 2021 at 08:30:21AM +0800, Qu Wenruo wrote:
> >>>>     btrfs: support subpage for extent buffer page release
> >>>
> >>> I don't have this patch in my inbox so I can't reply to it directly, but
> >>> you include refcount.h, but then use normal atomics.  Please used the
> >>> actual refcount_t, as it gets us all the debugging stuff that makes
> >>> finding problems much easier.  Thanks,
> >>
> >> My bad, my initial plan is to use refcount, but the use case has valid 0
> >> refcount usage, thus refcount is not good here.
> > 
> > In case you need to shift the "0" you can use refcount_dec_not_one or
> > refcount_inc/dec_not_zero, but I haven't seen the code so don't know if
> > this applies in your case.
> 
> In the code, what we want is inc on zero, which will cause warning on 
> refcount. (initial subpage allocation has zero ref, then increased to 
> one when one eb is attached to the page)
> 
> But maybe I can change the timing so that we can use refcount.
> Current code uses ASSERT()s to prevent underflow, so it would be 
> sufficient for current code base though.

An assert for underflow is ok, but the refcount catches inc from zero, i.e.
a potential use after free.

With a lifted refcount it should be possible to distinguish the state where
it's really freed (0, to be deallocated) from 1, which is some middle
state like initialized, valid but not yet attached. Usage will increase
the ref; once there are no users, compare to 1, and then the final put is
back to 0. A similar pattern is used for extent buffers, and the subpage
data probably has a similar lifetime.
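
The lifted refcount pattern described above can be sketched in plain
userspace C as follows. This is only an illustration under stated
assumptions, not the btrfs code and not the kernel refcount_t API: the
struct and the biased_ref_*() helpers are hypothetical names, and a plain
int is used instead of atomics, so the sketch is not thread safe.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical counter with a bias of one:
 *   0  -> really freed, about to be deallocated
 *   1  -> initialized and valid, but no user attached yet
 *   >1 -> refs - 1 users are attached
 */
struct biased_ref {
	int refs;
};

/* A new object starts at 1, so attaching never increments from zero. */
static void biased_ref_init(struct biased_ref *r)
{
	r->refs = 1;
}

/*
 * Attach one user. Incrementing from zero would mean touching an object
 * that is already freed, so it is refused instead of silently allowed.
 */
static bool biased_ref_get(struct biased_ref *r)
{
	if (r->refs == 0)
		return false;
	r->refs++;
	return true;
}

/* Detach one user; returns true when no users remain (count is back to 1). */
static bool biased_ref_put(struct biased_ref *r)
{
	assert(r->refs > 1); /* catches underflow past the bias */
	r->refs--;
	return r->refs == 1;
}

/* Final put: only allowed from the "no users" state, 1 -> 0. */
static bool biased_ref_release(struct biased_ref *r)
{
	if (r->refs != 1)
		return false;
	r->refs = 0;
	return true;
}
```

With this layout the "attach the first eb" step is a 1 -> 2 transition
rather than 0 -> 1, which is the case that a zero-checking refcount would
warn about.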
David Sterba Feb. 1, 2021, 3:55 p.m. UTC | #6
On Tue, Jan 26, 2021 at 04:33:44PM +0800, Qu Wenruo wrote:
> Patches can be fetched from github:
> https://github.com/adam900710/linux/tree/subpage
> Currently the branch also contains partial RW data support (still some
> ordered extent and data csum mismatch problems)
> 
> Great thanks to David/Nikolay/Josef for their effort reviewing and
> merging the preparation patches into misc-next.
> 
> === What works ===
> Just from the patchset:
> - Data read
>   Both regular and compressed data, with csum check.
> 
> - Metadata read
> 
> This means, with these patchset, 64K page systems can at least mount
> btrfs with 4K sector size read-only.
> This should provide the ability to migrate data at least.
> 
> While on the github branch, there are already experimental RW supports,
> there are still ordered extent related bugs for me to fix.
> Thus only the RO part is sent for review and testing.
> 
> === Patchset structure ===
> Patch 01~02:	Preparation patches which don't have functional change
> Patch 03~12:	Subpage metadata allocation and freeing
> Patch 13~15:	Subpage metadata read path
> Patch 16~17:	Subpage data read path
> Patch 18:	Enable subpage RO support

> v5:
> - Use the updated version from David as base
>   Most comment/commit message update should be kept as is.
> 
> - A new separate patch to move UNMAPPED bit set timing
> 
> - New comment on why we need to prealloc subpage inside a loop
>   Mostly for further 16K page size support, where we can have
>   eb across multiple pages.
> 
> - Remove one patch which is too RW specific
>   Since it introduces functional change which only makes sense for RW
>   support, it's not a good idea to include it in RO support.
> 
> - Error handling fixes
>   Great thanks to Josef.
> 
> - Refactor btrfs_subpage allocation/freeing
>   Now we have btrfs_alloc_subpage() and btrfs_free_subpage() helpers to
>   do all the allocation/freeing.
>   It's pretty easy to convert to kmem_cache using above helpers.
>   (already internally tested using kmem_cache without problem, in fact
>    it's all the problems found in kmem_cache test leads to the new
>    interface)
> 
> - Use btrfs_subpage::eb_refs to replace old under_alloc
>   This makes checking whether the page has any eb left much easier.

All look reasonable for merge; patch 17 still needs an update, which I'll
replace once you send it.

I'll move it to misc-next after fstests finish. Minor updates are still
possible during this week, as the merge window freeze is approaching.
Anand Jain Feb. 2, 2021, 9:21 a.m. UTC | #7
Qu,

  fstests ran fine on an aarch64 kvm with this patch set.

  Further, I was running a few manual tests as below, and they fail
  with - Unable to handle kernel paging request.

  The test case looks something like this:

  On x86_64 create btrfs on an 11g file
  copy /usr into /test-mnt, stops at ENOSPC
  set compression property on the root subvol
  run defrag with -czstd
  truncate a large 4gb file
  punch holes in it
  truncate a couple of smaller files
  unmount
  send the file to an aarch64 (64k pagesize) kvm
  mount -o ro
  run sha256sum on all the files

---------------------
[37012.027764] BTRFS warning (device loop0): csum failed root 5 ino 611 off 228659200 csum 0x1dcefc2d expected csum 0x69412d2a mirror 1
[37012.030971] BTRFS error (device loop0): bdev /dev/loop0 errs: wr 0, rd 0, flush 0, corrupt 9, gen 0
[37012.036223] BTRFS warning (device loop0): csum failed root 5 ino 616 off 228724736 csum 0x73f63661 expected csum 0xaf922a6f mirror 1
[37012.036250] BTRFS error (device loop0): bdev /dev/loop0 errs: wr 0, rd 0, flush 0, corrupt 10, gen 0
[37012.123917] Unable to handle kernel paging request at virtual address 0061d1f66c080000
[37012.126104] Mem abort info:
[37012.126951]   ESR = 0x96000004
[37012.127791]   EC = 0x25: DABT (current EL), IL = 32 bits
[37012.129207]   SET = 0, FnV = 0
[37012.130043]   EA = 0, S1PTW = 0
[37012.131269] Data abort info:
[37012.132165]   ISV = 0, ISS = 0x00000004
[37012.133211]   CM = 0, WnR = 0
[37012.134014] [0061d1f66c080000] address between user and kernel address ranges
[37012.136050] Internal error: Oops: 96000004 [#1] PREEMPT SMP
[37012.137567] Modules linked in: btrfs blake2b_generic xor xor_neon zstd_compress raid6_pq crct10dif_ce ip_tables x_tables ipv6
[37012.140742] CPU: 0 PID: 289001 Comm: kworker/u64:3 Not tainted 5.11.0-rc5+ #10
[37012.142839] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
[37012.144787] Workqueue: btrfs-endio btrfs_work_helper [btrfs]
[37012.146474] pstate: 20000005 (nzCv daif -PAN -UAO -TCO BTYPE=--)
[37012.148175] pc : __crc32c_le+0x84/0xe8
[37012.149266] lr : chksum_digest+0x24/0x40
[37012.150420] sp : ffff80001638f8f0
[37012.151491] x29: ffff80001638f8f0 x28: ffff0000c7bb0000
[37012.152982] x27: ffff0000d1a27000 x26: ffff0002f21b56e0
[37012.154565] x25: ffff800011df3948 x24: 0000004000000000
[37012.156063] x23: ffff000000000000 x22: ffff80001638fa00
[37012.157570] x21: 0000000000000004 x20: ffff0000c7bb0050
[37012.159145] x19: ffff80001638fc88 x18: 0000000000000000
[37012.160684] x17: 0000000000000000 x16: 0000000000000000
[37012.162190] x15: 0000051d5454c764 x14: 000000000000017a
[37012.163774] x13: 0000000000000145 x12: 0000000000000001
[37012.165282] x11: 0000000000000000 x10: 00000000000009d0
[37012.166849] x9 : ffff0000ca305564 x8 : 0000000000000000
[37012.168395] x7 : 0000000000000000 x6 : ffff800011f23980
[37012.169883] x5 : 00000000006f6964 x4 : ffff8000105dd7a8
[37012.171476] x3 : ffff80001638fc88 x2 : 0000000000010000
[37012.172997] x1 : bc61d1f66c080000 x0 : 00000000ffffffff
[37012.174642] Call trace:
[37012.175427]  __crc32c_le+0x84/0xe8
[37012.176419]  crypto_shash_digest+0x34/0x58
[37012.177616]  check_compressed_csum+0xd0/0x2b0 [btrfs]
[37012.179160]  end_compressed_bio_read+0xb8/0x308 [btrfs]
[37012.180731]  bio_endio+0x12c/0x1d8
[37012.181712]  end_workqueue_fn+0x3c/0x60 [btrfs]
[37012.183161]  btrfs_work_helper+0xf4/0x5a8 [btrfs]
[37012.184570]  process_one_work+0x1ec/0x4c0
[37012.185727]  worker_thread+0x48/0x478
[37012.186823]  kthread+0x158/0x160
[37012.187768]  ret_from_fork+0x10/0x34
[37012.188791] Code: 9ac55c08 9ac65d08 1a880000 b4000122 (a8c21023)
[37012.190486] ---[ end trace 4f73e813d058b84c ]---
[37019.180684] note: kworker/u64:3[289001] exited with preempt_count 1
---------------

  Could you please take a look?

Thanks, Anand
Qu Wenruo Feb. 2, 2021, 10:23 a.m. UTC | #8
On 2021/2/2 5:21 PM, Anand Jain wrote:
>
> Qu,
>
>   fstests ran fine on an aarch64 kvm with this patch set.

Do you mean the subpage patchset?

With 4K sector size?

No way it can run fine...
A long enough fsstress run can crash the kernel, with btrfs_csum_one_bio()
unable to locate the corresponding ordered extent.


>
>   Further, I was running few hand tests as below, and it fails
>   with - Unable to handle kernel paging.
>
>   Test case looks something like..
>
>   On x86_64 create btrfs on a file 11g
>   copy /usr into /test-mnt stops at enospc
>   set compression property on the root sunvol
>   run defrag with -czstd

I don't even consider compression a supported feature for subpage.

Are you really talking about the subpage patchset with 4K sector size,
on 64K page size AArch64?

If so, I appreciate your testing effort very much; it would mean the
patchset is doing way better than I expected.
But I don't really believe it can even pass fstests....

Thanks,
Qu

>   truncate a large file 4gb
>   punch holes on it
>   truncate couple of smaller files
>   unmount
>   send file to an aarch64 (64k pagesize) kvm
>   mount -o ro
>   run sha256sum on all the files
>
> ---------------------
> [37012.027764] BTRFS warning (device loop0): csum failed root 5 ino 611
> off 228659200 csum 0x1dcefc2d expected csum 0x69412d2a mirror 1
> [37012.030971] BTRFS error (device loop0): bdev /dev/loop0 errs: wr 0,
> rd 0, flush 0, corrupt 9, gen 0
> [37012.036223] BTRFS warning (device loop0): csum failed root 5 ino 616
> off 228724736 csum 0x73f63661 expected csum 0xaf922a6f mirror 1
> [37012.036250] BTRFS error (device loop0): bdev /dev/loop0 errs: wr 0,
> rd 0, flush 0, corrupt 10, gen 0
> [37012.123917] Unable to handle kernel paging request at virtual address
> 0061d1f66c080000
> [37012.126104] Mem abort info:
> [37012.126951]   ESR = 0x96000004
> [37012.127791]   EC = 0x25: DABT (current EL), IL = 32 bits
> [37012.129207]   SET = 0, FnV = 0
> [37012.130043]   EA = 0, S1PTW = 0
> [37012.131269] Data abort info:
> [37012.132165]   ISV = 0, ISS = 0x00000004
> [37012.133211]   CM = 0, WnR = 0
> [37012.134014] [0061d1f66c080000] address between user and kernel
> address ranges
> [37012.136050] Internal error: Oops: 96000004 [#1] PREEMPT SMP
> [37012.137567] Modules linked in: btrfs blake2b_generic xor xor_neon
> zstd_compress raid6_pq crct10dif_ce ip_tables x_tables ipv6
> [37012.140742] CPU: 0 PID: 289001 Comm: kworker/u64:3 Not tainted
> 5.11.0-rc5+ #10
> [37012.142839] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0
> 02/06/2015
> [37012.144787] Workqueue: btrfs-endio btrfs_work_helper [btrfs]
> [37012.146474] pstate: 20000005 (nzCv daif -PAN -UAO -TCO BTYPE=--)
> [37012.148175] pc : __crc32c_le+0x84/0xe8
> [37012.149266] lr : chksum_digest+0x24/0x40
> [37012.150420] sp : ffff80001638f8f0
> [37012.151491] x29: ffff80001638f8f0 x28: ffff0000c7bb0000
> [37012.152982] x27: ffff0000d1a27000 x26: ffff0002f21b56e0
> [37012.154565] x25: ffff800011df3948 x24: 0000004000000000
> [37012.156063] x23: ffff000000000000 x22: ffff80001638fa00
> [37012.157570] x21: 0000000000000004 x20: ffff0000c7bb0050
> [37012.159145] x19: ffff80001638fc88 x18: 0000000000000000
> [37012.160684] x17: 0000000000000000 x16: 0000000000000000
> [37012.162190] x15: 0000051d5454c764 x14: 000000000000017a
> [37012.163774] x13: 0000000000000145 x12: 0000000000000001
> [37012.165282] x11: 0000000000000000 x10: 00000000000009d0
> [37012.166849] x9 : ffff0000ca305564 x8 : 0000000000000000
> [37012.168395] x7 : 0000000000000000 x6 : ffff800011f23980
> [37012.169883] x5 : 00000000006f6964 x4 : ffff8000105dd7a8
> [37012.171476] x3 : ffff80001638fc88 x2 : 0000000000010000
> [37012.172997] x1 : bc61d1f66c080000 x0 : 00000000ffffffff
> [37012.174642] Call trace:
> [37012.175427]  __crc32c_le+0x84/0xe8
> [37012.176419]  crypto_shash_digest+0x34/0x58
> [37012.177616]  check_compressed_csum+0xd0/0x2b0 [btrfs]
> [37012.179160]  end_compressed_bio_read+0xb8/0x308 [btrfs]
> [37012.180731]  bio_endio+0x12c/0x1d8
> [37012.181712]  end_workqueue_fn+0x3c/0x60 [btrfs]
> [37012.183161]  btrfs_work_helper+0xf4/0x5a8 [btrfs]
> [37012.184570]  process_one_work+0x1ec/0x4c0
> [37012.185727]  worker_thread+0x48/0x478
> [37012.186823]  kthread+0x158/0x160
> [37012.187768]  ret_from_fork+0x10/0x34
> [37012.188791] Code: 9ac55c08 9ac65d08 1a880000 b4000122 (a8c21023)
> [37012.190486] ---[ end trace 4f73e813d058b84c ]---
> [37019.180684] note: kworker/u64:3[289001] exited with preempt_count 1
> ---------------
>
>   Could you please take a look?
>
> Thanks, Anand
Anand Jain Feb. 2, 2021, 11:28 a.m. UTC | #9
On 2/2/2021 6:23 PM, Qu Wenruo wrote:
> 
> 
> On 2021/2/2 5:21 PM, Anand Jain wrote:
>>
>> Qu,
>>
>>   fstests ran fine on an aarch64 kvm with this patch set.
> 
> Do you mean subpage patchset?
> 
> With 4K sector size?
> No way it can run fine...

  No. fstests ran with sectorsize == pagesize == 64k, so that isn't
  subpage. I meant just regression checks.

> Long enough fsstress can crash the kernel with btrfs_csum_one_bio()
> unable to locate the corresponding ordered extent.
>
>>   Further, I was running few hand tests as below, and it fails
>>   with - Unable to handle kernel paging.
>>
>>   Test case looks something like..
>>
>>   On x86_64 create btrfs on a file 11g
>>   copy /usr into /test-mnt stops at enospc
>>   set compression property on the root subvol
>>   run defrag with -czstd
> 
> I don't even consider compression a supported feature for subpage.

  In that case the ro mount should have failed, which it didn't. A
  similar test case without compression is fine.

> Are you really talking about the subpage patchset with 4K sector size,
> on 64K page size AArch64?

  Yes, the read-only mount test case as above.

Thanks, Anand


> If that is really so, I appreciate your testing effort very much; it
> would mean the patchset is doing way better than I thought.
> But I don't really believe it can even pass fstests....



> Thanks,
> Qu
> 
>>   truncate a large file 4gb
>>   punch holes on it
>>   truncate couple of smaller files
>>   unmount
>>   send file to an aarch64 (64k pagesize) kvm
>>   mount -o ro
>>   run sha256sum on all the files
>>
>> ---------------------
>> [37012.027764] BTRFS warning (device loop0): csum failed root 5 ino 611
>> off 228659200 csum 0x1dcefc2d expected csum 0x69412d2a mirror 1
>> [37012.030971] BTRFS error (device loop0): bdev /dev/loop0 errs: wr 0,
>> rd 0, flush 0, corrupt 9, gen 0
>> [37012.036223] BTRFS warning (device loop0): csum failed root 5 ino 616
>> off 228724736 csum 0x73f63661 expected csum 0xaf922a6f mirror 1
>> [37012.036250] BTRFS error (device loop0): bdev /dev/loop0 errs: wr 0,
>> rd 0, flush 0, corrupt 10, gen 0
>> [37012.123917] Unable to handle kernel paging request at virtual address
>> 0061d1f66c080000
>> [37012.126104] Mem abort info:
>> [37012.126951]   ESR = 0x96000004
>> [37012.127791]   EC = 0x25: DABT (current EL), IL = 32 bits
>> [37012.129207]   SET = 0, FnV = 0
>> [37012.130043]   EA = 0, S1PTW = 0
>> [37012.131269] Data abort info:
>> [37012.132165]   ISV = 0, ISS = 0x00000004
>> [37012.133211]   CM = 0, WnR = 0
>> [37012.134014] [0061d1f66c080000] address between user and kernel
>> address ranges
>> [37012.136050] Internal error: Oops: 96000004 [#1] PREEMPT SMP
>> [37012.137567] Modules linked in: btrfs blake2b_generic xor xor_neon
>> zstd_compress raid6_pq crct10dif_ce ip_tables x_tables ipv6
>> [37012.140742] CPU: 0 PID: 289001 Comm: kworker/u64:3 Not tainted
>> 5.11.0-rc5+ #10
>> [37012.142839] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0
>> 02/06/2015
>> [37012.144787] Workqueue: btrfs-endio btrfs_work_helper [btrfs]
>> [37012.146474] pstate: 20000005 (nzCv daif -PAN -UAO -TCO BTYPE=--)
>> [37012.148175] pc : __crc32c_le+0x84/0xe8
>> [37012.149266] lr : chksum_digest+0x24/0x40
>> [37012.150420] sp : ffff80001638f8f0
>> [37012.151491] x29: ffff80001638f8f0 x28: ffff0000c7bb0000
>> [37012.152982] x27: ffff0000d1a27000 x26: ffff0002f21b56e0
>> [37012.154565] x25: ffff800011df3948 x24: 0000004000000000
>> [37012.156063] x23: ffff000000000000 x22: ffff80001638fa00
>> [37012.157570] x21: 0000000000000004 x20: ffff0000c7bb0050
>> [37012.159145] x19: ffff80001638fc88 x18: 0000000000000000
>> [37012.160684] x17: 0000000000000000 x16: 0000000000000000
>> [37012.162190] x15: 0000051d5454c764 x14: 000000000000017a
>> [37012.163774] x13: 0000000000000145 x12: 0000000000000001
>> [37012.165282] x11: 0000000000000000 x10: 00000000000009d0
>> [37012.166849] x9 : ffff0000ca305564 x8 : 0000000000000000
>> [37012.168395] x7 : 0000000000000000 x6 : ffff800011f23980
>> [37012.169883] x5 : 00000000006f6964 x4 : ffff8000105dd7a8
>> [37012.171476] x3 : ffff80001638fc88 x2 : 0000000000010000
>> [37012.172997] x1 : bc61d1f66c080000 x0 : 00000000ffffffff
>> [37012.174642] Call trace:
>> [37012.175427]  __crc32c_le+0x84/0xe8
>> [37012.176419]  crypto_shash_digest+0x34/0x58
>> [37012.177616]  check_compressed_csum+0xd0/0x2b0 [btrfs]
>> [37012.179160]  end_compressed_bio_read+0xb8/0x308 [btrfs]
>> [37012.180731]  bio_endio+0x12c/0x1d8
>> [37012.181712]  end_workqueue_fn+0x3c/0x60 [btrfs]
>> [37012.183161]  btrfs_work_helper+0xf4/0x5a8 [btrfs]
>> [37012.184570]  process_one_work+0x1ec/0x4c0
>> [37012.185727]  worker_thread+0x48/0x478
>> [37012.186823]  kthread+0x158/0x160
>> [37012.187768]  ret_from_fork+0x10/0x34
>> [37012.188791] Code: 9ac55c08 9ac65d08 1a880000 b4000122 (a8c21023)
>> [37012.190486] ---[ end trace 4f73e813d058b84c ]---
>> [37019.180684] note: kworker/u64:3[289001] exited with preempt_count 1
>> ---------------
>>
>>   Could you please take a look?
>>
>> Thanks, Anand
Anand Jain Feb. 2, 2021, 1:37 p.m. UTC | #10
It is much simpler to reproduce. I am using two systems with different
pagesizes to test the subpage readonly support.
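
To confirm which combination a given host and image fall into, one can compare the kernel page size with the sector size recorded in the superblock. The sketch below is only an illustration and was not part of the test: `classify_geometry` is a hypothetical helper name, and the commented usage assumes btrfs-progs is installed and that `btrfs inspect-internal dump-super` prints a `sectorsize` line.

```shell
#!/bin/sh
# Hypothetical helper: classify a page size / sector size pair (both in bytes).
classify_geometry() {
    page_size=$1
    sector_size=$2
    if [ "$sector_size" -lt "$page_size" ]; then
        echo "subpage"            # e.g. 4K sectors on a 64K-page kernel
    elif [ "$sector_size" -eq "$page_size" ]; then
        echo "regular"            # sectorsize == pagesize
    else
        echo "larger-than-page"   # not covered by this thread
    fi
}

# Usage on a real host (assumes btrfs-progs is installed):
#   page=$(getconf PAGESIZE)
#   sector=$(btrfs inspect-internal dump-super ./3g.img | awk '$1 == "sectorsize" {print $2}')
#   classify_geometry "$page" "$sector"
```

For the setup described here, the image is created with 4K sectors on the 4k host and then read on the 64k host, so the second host would be in the "subpage" case.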

On a host with pagesize = 4k.
   truncate -s 3g 3g.img
   mkfs.btrfs ./3g.img
   mount -o loop,compress=zstd ./3g.img /btrfs
   xfs_io -f -c "pwrite -S 0xab 0 128k" /btrfs/foo
   umount /btrfs

Copy the file 3g.img to another host with pagesize = 64k.
   mount -o ro,loop ./3g.img /btrfs
   sha256sum /btrfs/foo

   This leads to "Unable to handle kernel NULL pointer dereference":
----------------
[  +0.001387] BTRFS warning (device loop0): csum hole found for disk bytenr range [13672448, 13676544)
[  +0.001514] BTRFS warning (device loop0): csum failed root 5 ino 257 off 13697024 csum 0xbcd798f5 expected csum 0xf11c5ebf mirror 1
[  +0.002301] BTRFS error (device loop0): bdev /dev/loop0 errs: wr 0, rd 0, flush 0, corrupt 1, gen 0
[  +0.001647] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
[  +0.001670] Mem abort info:
[  +0.000506]   ESR = 0x96000005
[  +0.000471]   EC = 0x25: DABT (current EL), IL = 32 bits
[  +0.000783]   SET = 0, FnV = 0
[  +0.000450]   EA = 0, S1PTW = 0
[  +0.000462] Data abort info:
[  +0.000530]   ISV = 0, ISS = 0x00000005
[  +0.000755]   CM = 0, WnR = 0
[  +0.000466] user pgtable: 64k pages, 48-bit VAs, pgdp=000000010717ce00
[  +0.001027] [0000000000000000] pgd=0000000000000000, p4d=0000000000000000, pud=0000000000000000
[  +0.001402] Internal error: Oops: 96000005 [#1] PREEMPT SMP

Message from syslogd@aa3 at Feb  2 08:18:05 ...
  kernel:Internal error: Oops: 96000005 [#1] PREEMPT SMP
[  +0.000958] Modules linked in: btrfs blake2b_generic xor xor_neon zstd_compress raid6_pq crct10dif_ce ip_tables x_tables ipv6
[  +0.001779] CPU: 25 PID: 5754 Comm: kworker/u64:1 Not tainted 5.11.0-rc5+ #10
[  +0.001122] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
[  +0.001286] Workqueue: btrfs-endio btrfs_work_helper [btrfs]
[  +0.001139] pstate: 20000005 (nzCv daif -PAN -UAO -TCO BTYPE=--)
[  +0.001110] pc : __crc32c_le+0x84/0xe8
[  +0.000726] lr : chksum_digest+0x24/0x40
[  +0.000731] sp : ffff800017def8f0
[  +0.000624] x29: ffff800017def8f0 x28: ffff0000c84dca00
[  +0.000994] x27: ffff0000c44f5400 x26: ffff0000e3a008b0
[  +0.000985] x25: ffff800011df3948 x24: 0000004000000000
[  +0.001006] x23: ffff000000000000 x22: ffff800017defa00
[  +0.000993] x21: 0000000000000004 x20: ffff0000c84dca50
[  +0.000983] x19: ffff800017defc88 x18: 0000000000000010
[  +0.000995] x17: 0000000000000000 x16: ffff800009352a98
[  +0.001008] x15: 000009a9d48628c0 x14: 0000000000000209
[  +0.000999] x13: 00000000000003d1 x12: 0000000000000001
[  +0.000986] x11: 0000000000000001 x10: 00000000000009d0
[  +0.000982] x9 : ffff0000c5418064 x8 : 0000000000000000
[  +0.001008] x7 : 0000000000000000 x6 : ffff800011f23980
[  +0.001025] x5 : 00000000006f6964 x4 : ffff8000105dd7a8
[  +0.000997] x3 : ffff800017defc88 x2 : 0000000000010000
[  +0.000986] x1 : 0000000000000000 x0 : 00000000ffffffff
[  +0.001011] Call trace:
[  +0.000459]  __crc32c_le+0x84/0xe8
[  +0.000649]  crypto_shash_digest+0x34/0x58
[  +0.000766]  check_compressed_csum+0xd0/0x2b0 [btrfs]
[  +0.001011]  end_compressed_bio_read+0xb8/0x308 [btrfs]
[  +0.001060]  bio_endio+0x12c/0x1d8
[  +0.000651]  end_workqueue_fn+0x3c/0x60 [btrfs]
[  +0.000916]  btrfs_work_helper+0xf4/0x5a8 [btrfs]
[  +0.000934]  process_one_work+0x1ec/0x4c0
[  +0.000751]  worker_thread+0x48/0x478
[  +0.000701]  kthread+0x158/0x160
[  +0.000618]  ret_from_fork+0x10/0x34
[  +0.000697] Code: 9ac55c08 9ac65d08 1a880000 b4000122 (a8c21023)
[  +0.001075] ---[ end trace d4f31b4f11a947b7 ]---
[ +14.775765] note: kworker/u64:1[5754] exited with preempt_count 1
------------------------
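
A few values in the log above can be checked with plain shell arithmetic. This is only a reading aid under stated assumptions: it assumes the AArch64 calling convention, where x0 to x2 carry the first three arguments, and that `__crc32c_le` takes (crc, buffer, length) in that order. The function names `range_len` and `sectors_in` are made up for illustration.

```shell
#!/bin/sh
# Length in bytes of a half-open byte range [start, end).
range_len() {
    echo $(( $2 - $1 ))
}

# How many sectors of a given size fit in a buffer of a given length.
sectors_in() {
    echo $(( $1 / $2 ))
}

hole=$(range_len 13672448 13676544)   # "csum hole found" range from the log
crc_len=$(( 0x10000 ))                # x2 at the time of the fault

echo "csum hole length: $hole bytes"
echo "crc32c length:    $crc_len bytes"
echo "4K sectors covered by that length: $(sectors_in "$crc_len" "$hole")"
```

The csum hole is 4096 bytes, i.e. one 4K sector, while x2 holds 65536, i.e. one 64K page, and x1 (the buffer pointer under the assumption above) is 0. If that reading is right, the checksum is being computed over a whole page rather than a single sector, which would be consistent with the csum mismatch. This is an inference from the log, not a confirmed diagnosis.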


Thanks, Anand



David Sterba Feb. 3, 2021, 1:20 p.m. UTC | #11
On Tue, Jan 26, 2021 at 04:33:44PM +0800, Qu Wenruo wrote:
> Qu Wenruo (18):
>   btrfs: merge PAGE_CLEAR_DIRTY and PAGE_SET_WRITEBACK to
>     PAGE_START_WRITEBACK
>   btrfs: set UNMAPPED bit early in btrfs_clone_extent_buffer() for
>     subpage support
>   btrfs: introduce the skeleton of btrfs_subpage structure
>   btrfs: make attach_extent_buffer_page() handle subpage case
>   btrfs: make grab_extent_buffer_from_page() handle subpage case
>   btrfs: support subpage for extent buffer page release
>   btrfs: attach private to dummy extent buffer pages
>   btrfs: introduce helpers for subpage uptodate status
>   btrfs: introduce helpers for subpage error status
>   btrfs: support subpage in set/clear_extent_buffer_uptodate()
>   btrfs: support subpage in btrfs_clone_extent_buffer
>   btrfs: support subpage in try_release_extent_buffer()
>   btrfs: introduce read_extent_buffer_subpage()
>   btrfs: support subpage in endio_readpage_update_page_status()
>   btrfs: introduce subpage metadata validation check
>   btrfs: introduce btrfs_subpage for data inodes
>   btrfs: integrate page status update for data read path into
>     begin/end_page_read()
>   btrfs: allow RO mount of 4K sector size fs on 64K page system

This is now in misc-next, with the replaced patch 17 sent recently,
thanks.
Qu Wenruo Feb. 4, 2021, 5:13 a.m. UTC | #12
On 2021/2/2 9:37 PM, Anand Jain wrote:
>
>
> It is much simpler to reproduce. I am using two systems with different
> pagesizes to test the subpage readonly support.
>
> On a host with pagesize = 4k.
>    truncate -s 3g 3g.img
>    mkfs.btrfs ./3g.img
>    mount -o loop,compress=zstd ./3g.img /btrfs
>    xfs_io -f -c "pwrite -S 0xab 0 128k" /btrfs/foo
>    umount /btrfs
>
> Copy the file 3g.img to another host with pagesize = 64k.
>    mount -o ro,loop ./3g.img /btrfs
>    sha256sum /btrfs/foo
>
>    This leads to "Unable to handle kernel NULL pointer dereference":

Thanks for the report.

In my case I can't reproduce the crash; I only get a data csum mismatch
with the "csum hole found for disk bytenr range" error message.

Either way, the compressed read path needs to be fixed.

I'll investigate the case.

Thanks,
Qu