[v3,0/8] btrfs: preparation patches for the incoming metadata folio conversion

Message ID: cover.1689418958.git.wqu@suse.com
From: Qu Wenruo <wqu@suse.com>
To: linux-btrfs@vger.kernel.org
Subject: [PATCH v3 0/8] btrfs: preparation patches for the incoming metadata folio conversion
Date: Sat, 15 Jul 2023 19:08:26 +0800
Series: btrfs: preparation patches for the incoming metadata folio conversion
  [v3,1/8] btrfs: tests: enhance extent buffer bitmap tests
  [v3,2/8] btrfs: tests: add self tests for extent buffer memory operations
  [v3,3/8] btrfs: refactor extent buffer bitmaps operations
  [v3,4/8] btrfs: use write_extent_buffer() to implement write_extent_buffer_*id()
  [v3,5/8] btrfs: refactor main loop in copy_extent_buffer_full()
  [v3,6/8] btrfs: copy all pages at once at the end of btrfs_clone_extent_buffer()
  [v3,7/8] btrfs: refactor main loop in memcpy_extent_buffer()
  [v3,8/8] btrfs: refactor main loop in memmove_extent_buffer()

Qu Wenruo July 15, 2023, 11:08 a.m. UTC

[CHANGELOG]
v2:
- Define write_extent_buffer_fsid/chunk_tree_uuid() as inline helpers

v3:
- Fix an undefined behavior bug in memcpy_extent_buffer()
  Despite its name, memcpy_extent_buffer() needs to handle overlapping
  ranges, thus it calls copy_pages(), which does overlap checks and
  switches to memmove() when needed.

  Here we introduce __write_extent_buffer(), which can switch to
  memmove() when needed.

- Also refactor memmove_extent_buffer()
  Since we now have __write_extent_buffer(), which can use memmove(),
  it's not hard to refactor memmove_extent_buffer().

  But one pitfall remains: we still have to handle double page
  boundaries the way the old code did, as explained in the last patch.

- Add selftests on extent buffer memory operations
  I failed too many times refactoring memmove_extent_buffer(); the
  wasted time should serve as a memorial to my stupidity.
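A simplified userspace sketch of the overlap rule behind the memcpy_extent_buffer() fix: memcpy() on overlapping ranges is undefined behavior, so the copy switches to memmove() when the source and destination ranges intersect. `struct model_eb` and `model_copy_within()` are hypothetical; the model uses one flat buffer instead of the extent buffer's pages, and `areas_overlap()` here only mirrors the idea of the in-kernel check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flat stand-in for an extent buffer (no page splitting). */
struct model_eb {
	unsigned char data[64];
	size_t len;
};

/* True if [src, src + len) and [dst, dst + len) intersect. */
static bool areas_overlap(size_t src, size_t dst, size_t len)
{
	size_t distance = src > dst ? src - dst : dst - src;

	return distance < len;
}

/*
 * Copy @len bytes inside @eb from @src_off to @dst_off.
 * memcpy() on overlapping ranges is undefined behavior, so fall back
 * to memmove() whenever the two ranges intersect.
 */
static void model_copy_within(struct model_eb *eb, size_t dst_off,
			      size_t src_off, size_t len)
{
	assert(dst_off + len <= eb->len && src_off + len <= eb->len);

	if (areas_overlap(src_off, dst_off, len))
		memmove(eb->data + dst_off, eb->data + src_off, len);
	else
		memcpy(eb->data + dst_off, eb->data + src_off, len);
}
```

Picking memmove() only on overlap keeps the common, non-overlapping case on plain memcpy().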

[BACKGROUND]

Recently I have been checking the feasibility of converting metadata
handling to a folio-based solution.

The best part of using a single folio for metadata is that we can get
rid of the complexity of cross-page handling; everything would be just
a single memory operation on a contiguous memory range.
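A rough userspace model of that difference: with an array of pages, every accessor has to split a range at page boundaries, while a single high-order folio lets one memcpy() cover the whole range. `MODEL_PAGE_SIZE`, `struct paged_eb`, `paged_read()` and `contig_read()` are hypothetical names chosen for illustration; the real read_extent_buffer() works on the extent buffer's page pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MODEL_PAGE_SIZE 16	/* tiny page size so small ranges cross pages */
#define MODEL_NR_PAGES  4

/* Page-based layout: the buffer is spread over separate page allocations. */
struct paged_eb {
	unsigned char *pages[MODEL_NR_PAGES];
};

/* Read @len bytes at @start, splitting the copy at every page boundary. */
static void paged_read(const struct paged_eb *eb, void *dst,
		       size_t start, size_t len)
{
	unsigned char *out = dst;

	assert(start + len <= MODEL_NR_PAGES * MODEL_PAGE_SIZE);
	while (len > 0) {
		size_t idx = start / MODEL_PAGE_SIZE;
		size_t off = start % MODEL_PAGE_SIZE;
		size_t cur = MODEL_PAGE_SIZE - off;	/* bytes left in this page */

		if (cur > len)
			cur = len;
		memcpy(out, eb->pages[idx] + off, cur);
		out += cur;
		start += cur;
		len -= cur;
	}
}

/* Folio-based layout: one contiguous range, so one memcpy() suffices. */
static void contig_read(const unsigned char *folio_addr, void *dst,
			size_t start, size_t len)
{
	memcpy(dst, folio_addr + start, len);
}
```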

[PITFALLS]

One of the biggest problems for the metadata folio conversion is that
we still need the current page-based solution (or folios of order 0) as
a fallback when we cannot get a high-order folio.

In that case, it would be hell to handle the four different
combinations (folio/folio, folio/page, page/folio, page/page) for
extent buffer helpers involving two extent buffers.

Although there are some new ideas on how to handle metadata memory
(e.g. going fully with vmalloc()ed memory), reducing the open-coded
memory handling for metadata should always be a good starting point.

[OBJECTIVE]

So this patchset is preparation work to reduce direct page operations
for metadata.

The patchset does this mostly by routing the operations through the
common helpers, write_extent_buffer() and read_extent_buffer().
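As an illustration of routing field writers through one common helper (the direction of "use write_extent_buffer() to implement write_extent_buffer_*id()" and the v2 inline helpers), here is a userspace sketch. `struct model_header` is a simplified, hypothetical stand-in for the on-disk tree block header, and `model_write()` and its wrappers are hypothetical names; only the pattern is meant to carry over.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODEL_PAGE_SIZE 16
#define MODEL_NR_PAGES  8
#define MODEL_FSID_SIZE 16

/* Simplified, hypothetical header layout; only field offsets matter. */
struct model_header {
	uint8_t csum[32];
	uint8_t fsid[MODEL_FSID_SIZE];
	uint64_t bytenr;
	uint64_t flags;
	uint8_t chunk_tree_uuid[MODEL_FSID_SIZE];
};

struct model_eb {
	unsigned char *pages[MODEL_NR_PAGES];
};

/* The single place that knows how to split a write across pages. */
static void model_write(const struct model_eb *eb, const void *src,
			size_t start, size_t len)
{
	const unsigned char *in = src;

	assert(start + len <= MODEL_NR_PAGES * MODEL_PAGE_SIZE);
	while (len > 0) {
		size_t off = start % MODEL_PAGE_SIZE;
		size_t cur = MODEL_PAGE_SIZE - off;

		if (cur > len)
			cur = len;
		memcpy(eb->pages[start / MODEL_PAGE_SIZE] + off, in, cur);
		in += cur;
		start += cur;
		len -= cur;
	}
}

/* Field writers become thin inline wrappers, with no page walking of their own. */
static inline void model_write_fsid(const struct model_eb *eb, const void *fsid)
{
	model_write(eb, fsid, offsetof(struct model_header, fsid), MODEL_FSID_SIZE);
}

static inline void model_write_chunk_tree_uuid(const struct model_eb *eb,
					       const void *uuid)
{
	model_write(eb, uuid, offsetof(struct model_header, chunk_tree_uuid),
		    MODEL_FSID_SIZE);
}
```

With this shape, only model_write() would need to change if the backing memory became a single folio.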

Bitmap operations are more complex, so this patchset refactors them
completely into a three-part solution:

- Handle the first byte
- Handle the byte aligned ranges
- Handle the last byte

This needs more thorough testing (I got it wrong several times during
development) to prevent regressions, thus the extent buffer bitmap
selftests have been enhanced to catch all the new possible corner cases.
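The three-part split described above (first byte, byte-aligned middle, last byte) can be sketched on a flat byte array. This does not reproduce the btrfs bitmap helpers, which operate on an extent buffer and its pages; `bitmap_set_range()` and the mask helpers are hypothetical names, and the little-endian bit order (bit 0 is the LSB of byte 0) is an assumption of the model.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BITS_PER_BYTE 8

/* Mask of bits [pos % 8, 7] in the first byte touched. */
static unsigned char first_byte_mask(size_t pos)
{
	return (unsigned char)(0xffu << (pos % BITS_PER_BYTE));
}

/* Mask of bits [0, end % 8) in the last byte, or 0xff if @end is aligned. */
static unsigned char last_byte_mask(size_t end)
{
	size_t rem = end % BITS_PER_BYTE;

	return rem ? (unsigned char)(0xffu >> (BITS_PER_BYTE - rem)) : 0xff;
}

/* Set bits [pos, pos + len) in @map in three parts. */
static void bitmap_set_range(unsigned char *map, size_t pos, size_t len)
{
	size_t first, last, end;

	if (len == 0)
		return;
	first = pos / BITS_PER_BYTE;
	end = pos + len;			/* exclusive bit index */
	last = (end - 1) / BITS_PER_BYTE;

	if (first == last) {
		/* Whole range inside one byte: both masks apply to it. */
		map[first] |= first_byte_mask(pos) & last_byte_mask(end);
		return;
	}

	/* Part 1: the (possibly partial) first byte. */
	map[first] |= first_byte_mask(pos);

	/* Part 2: whole bytes strictly between the first and last byte. */
	if (last > first + 1)
		memset(map + first + 1, 0xff, last - first - 1);

	/* Part 3: the (possibly partial) last byte. */
	map[last] |= last_byte_mask(end);
}
```

The single-byte case is exactly the kind of corner case the enhanced selftests are meant to catch, since both masks must be combined there.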

The same applies to memcpy_extent_buffer() and memmove_extent_buffer().
There are several pitfalls:

- The name memcpy_extent_buffer() is not accurate
  Unlike plain memcpy(), memcpy_extent_buffer() needs to handle
  overlapping ranges.

- memmove_extent_buffer() must handle double page boundaries
  This is explained in the last patch; it means its refactor cannot
  take the same direction as memcpy_extent_buffer().
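To make "double page boundaries" concrete: in a backward, overlapping move, each chunk must stop at whichever page boundary comes first, in the source or in the destination. The sketch below models an extent buffer as an array of non-contiguous pages; `struct model_eb`, `model_addr()`, `model_memmove_backward()` and the tiny page size are hypothetical, and only the dst > src case is covered.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MODEL_PAGE_SIZE 16
#define MODEL_NR_PAGES  4

struct model_eb {
	unsigned char *pages[MODEL_NR_PAGES];	/* not necessarily contiguous */
};

static unsigned char *model_addr(const struct model_eb *eb, size_t off)
{
	return eb->pages[off / MODEL_PAGE_SIZE] + off % MODEL_PAGE_SIZE;
}

/*
 * Move @len bytes from @src_off to @dst_off inside @eb, where the ranges
 * may overlap and @dst_off >= @src_off. Copy from the end toward the start
 * so no source byte is overwritten before it is read. Each chunk is capped
 * by both the source and the destination page boundary.
 */
static void model_memmove_backward(const struct model_eb *eb, size_t dst_off,
				   size_t src_off, size_t len)
{
	size_t src_end = src_off + len;	/* exclusive */
	size_t dst_end = dst_off + len;	/* exclusive */

	assert(dst_off >= src_off);
	assert(dst_end <= MODEL_NR_PAGES * MODEL_PAGE_SIZE);
	while (len > 0) {
		/* Bytes available before hitting a page start, going backward. */
		size_t src_avail = (src_end - 1) % MODEL_PAGE_SIZE + 1;
		size_t dst_avail = (dst_end - 1) % MODEL_PAGE_SIZE + 1;
		size_t cur = len;

		if (cur > src_avail)
			cur = src_avail;
		if (cur > dst_avail)
			cur = dst_avail;
		/* The chunk itself may still overlap in memory: use memmove(). */
		memmove(model_addr(eb, dst_end - cur),
			model_addr(eb, src_end - cur), cur);
		src_end -= cur;
		dst_end -= cur;
		len -= cur;
	}
}
```

Capping by only one side's boundary would let a chunk run past the end of a page on the other side, which is why this loop cannot simply reuse a single-boundary forward copy.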

After spending too much time debugging memmove_extent_buffer(), I added
a new selftest to prevent regressions.

Qu Wenruo (8):
  btrfs: tests: enhance extent buffer bitmap tests
  btrfs: tests: add self tests for extent buffer memory operations
  btrfs: refactor extent buffer bitmaps operations
  btrfs: use write_extent_buffer() to implement
    write_extent_buffer_*id()
  btrfs: refactor main loop in copy_extent_buffer_full()
  btrfs: copy all pages at once at the end of
    btrfs_clone_extent_buffer()
  btrfs: refactor main loop in memcpy_extent_buffer()
  btrfs: refactor main loop in memmove_extent_buffer()

 fs/btrfs/extent_io.c             | 292 +++++++++++++----------------
 fs/btrfs/extent_io.h             |  19 +-
 fs/btrfs/tests/extent-io-tests.c | 309 +++++++++++++++++++++++++------
 3 files changed, 396 insertions(+), 224 deletions(-)

David Sterba July 18, 2023, 4:01 p.m. UTC | #1

On Sat, Jul 15, 2023 at 07:08:26PM +0800, Qu Wenruo wrote:

It seems that v3 has proceeded up to btrfs/143, which prints a lot of
test output errors, and the following tests fail too. It's on top of
misc-next, so it could be caused by some other recent patch. I'll do
another round; if this patchset turns out to be OK, I'll add it to
misc-next.

Qu Wenruo July 18, 2023, 10:51 p.m. UTC | #2

On 2023/7/19 00:01, David Sterba wrote:
> It seems that v3 has proceeded up to btrfs/143, which prints a lot of
> test output errors, and the following tests fail too. It's on top of
> misc-next, so it could be caused by some other recent patch. I'll do
> another round; if this patchset turns out to be OK, I'll add it to
> misc-next.

btrfs/143 has a known (?) regression where dm devices are not properly
cleaned up, causing all later tests to fail (as the scratch device is
held by the dm device, every later mkfs fails).

I noticed that this was fixed recently in the upstream for-next branch;
you may want to update/rebase your fstests.

Thanks,
Qu

David Sterba July 19, 2023, 9:49 p.m. UTC | #3

On Wed, Jul 19, 2023 at 06:51:18AM +0800, Qu Wenruo wrote:
> On 2023/7/19 00:01, David Sterba wrote:
> > It seems that v3 has proceeded up to btrfs/143, which prints a lot of
> > test output errors, and the following tests fail too. It's on top of
> > misc-next, so it could be caused by some other recent patch. I'll do
> > another round; if this patchset turns out to be OK, I'll add it to
> > misc-next.
> 
> btrfs/143 has a known (?) regression where dm devices are not properly
> cleaned up, causing all later tests to fail (as the scratch device is
> held by the dm device, every later mkfs fails).
> 
> I noticed that this was fixed recently in the upstream for-next branch;
> you may want to update/rebase your fstests.

That's quite possible, thanks. I've updated the VMs and restarted the
tests; we'll see.

David Sterba July 20, 2023, 3:06 p.m. UTC | #4

On Sat, Jul 15, 2023 at 07:08:26PM +0800, Qu Wenruo wrote:

KASAN complains in btrfs/125:

btrfs/125        [01:09:17][12387.340788] run fstests btrfs/125 at 2023-07-20 01:09:18
[12389.539422] BTRFS: device fsid b349d2bf-44dc-4990-8e64-c4933de9e42e devid 1 transid 297 /dev/vda scanned by mount (1360)
[12389.543907] BTRFS info (device vda): using sha256 (sha256-generic) checksum algorithm
[12389.545345] BTRFS info (device vda): using free space tree
[12389.568662] BTRFS info (device vda): auto enabling async discard
[12393.628549] BTRFS: device fsid 472a6171-cb8b-4916-8353-172e05aa255c devid 1 transid 6 /dev/vdb scanned by mkfs.btrfs (1544)
[12393.630846] BTRFS: device fsid 472a6171-cb8b-4916-8353-172e05aa255c devid 2 transid 6 /dev/vdc scanned by mkfs.btrfs (1544)
[12393.633042] BTRFS: device fsid 472a6171-cb8b-4916-8353-172e05aa255c devid 3 transid 6 /dev/vdd scanned by mkfs.btrfs (1544)
[12393.675240] BTRFS info (device vdb): using sha256 (sha256-generic) checksum algorithm
[12393.676651] BTRFS info (device vdb): using free space tree
[12393.705607] BTRFS info (device vdb): auto enabling async discard
[12393.708477] BTRFS info (device vdb): checking UUID tree
[12394.479228] BTRFS: device fsid 472a6171-cb8b-4916-8353-172e05aa255c devid 2 transid 8 /dev/vdc scanned by mount (1573)
[12394.481329] BTRFS: device fsid 472a6171-cb8b-4916-8353-172e05aa255c devid 1 transid 8 /dev/vdb scanned by mount (1573)
[12394.484821] BTRFS info (device vdb): using sha256 (sha256-generic) checksum algorithm
[12394.486018] BTRFS info (device vdb): allowing degraded mounts
[12394.486801] BTRFS info (device vdb): using free space tree
[12394.495639] BTRFS warning (device vdb): devid 3 uuid 8c6b8e23-2053-4b0a-9d30-0facd2dad945 is missing
[12394.499898] BTRFS warning (device vdb): devid 3 uuid 8c6b8e23-2053-4b0a-9d30-0facd2dad945 is missing
[12394.523726] BTRFS info (device vdb): auto enabling async discard
[12398.021206] BTRFS: device fsid b349d2bf-44dc-4990-8e64-c4933de9e42e devid 1 transid 298 /dev/vda scanned by btrfs (1597)
[12398.066913] BTRFS info (device vdb): using sha256 (sha256-generic) checksum algorithm
[12398.068414] BTRFS info (device vdb): using free space tree
[12398.080629] BTRFS error (device vdb): bad tree block start, mirror 1 want 40239104 have 31129600
[12398.085719] BTRFS info (device vdb): read error corrected: ino 0 off 40239104 (dev /dev/vdd sector 19840)
[12398.087705] BTRFS info (device vdb): read error corrected: ino 0 off 40243200 (dev /dev/vdd sector 19848)
[12398.089689] BTRFS info (device vdb): read error corrected: ino 0 off 40247296 (dev /dev/vdd sector 19856)
[12398.091575] BTRFS info (device vdb): read error corrected: ino 0 off 40251392 (dev /dev/vdd sector 19864)
[12398.093929] BTRFS error (device vdb): bad tree block start, mirror 1 want 40255488 have 31145984
[12398.097548] BTRFS info (device vdb): read error corrected: ino 0 off 40255488 (dev /dev/vdd sector 19872)
[12398.099311] BTRFS info (device vdb): read error corrected: ino 0 off 40259584 (dev /dev/vdd sector 19880)
[12398.101038] BTRFS info (device vdb): read error corrected: ino 0 off 40263680 (dev /dev/vdd sector 19888)
[12398.102663] BTRFS info (device vdb): read error corrected: ino 0 off 40267776 (dev /dev/vdd sector 19896)
[12398.105020] BTRFS error (device vdb): bad tree block start, mirror 1 want 40271872 have 31162368
[12398.107479] BTRFS info (device vdb): read error corrected: ino 0 off 40271872 (dev /dev/vdd sector 19904)
[12398.109094] BTRFS info (device vdb): read error corrected: ino 0 off 40275968 (dev /dev/vdd sector 19912)
[12398.111111] BTRFS error (device vdb): bad tree block start, mirror 1 want 40222720 have 31113216
[12398.121818] BTRFS info (device vdb): auto enabling async discard
[12398.219247] BTRFS error (device vdb): bad tree block start, mirror 1 want 40288256 have 31178752
[12398.233989] BTRFS info (device vdb): balance: start -d -m -s
[12398.235327] BTRFS info (device vdb): relocating block group 2365194240 flags data|raid5
[12398.310482] BTRFS error (device vdb): bad tree block start, mirror 1 want 40189952 have 31080448
[12398.482607] BTRFS error (device vdb): parent transid verify failed on logical 38993920 mirror 1 wanted 9 found 7
[12398.489325] BTRFS error (device vdb): parent transid verify failed on logical 38993920 mirror 2 wanted 9 found 7
[12398.493394] BTRFS error (device vdb): parent transid verify failed on logical 38993920 mirror 1 wanted 9 found 7
[12398.496146] BTRFS error (device vdb): parent transid verify failed on logical 38993920 mirror 2 wanted 9 found 7
[12398.499510] BTRFS error (device vdb): parent transid verify failed on logical 39108608 mirror 1 wanted 9 found 7
[12398.736591] BTRFS error (device vdb): parent transid verify failed on logical 38993920 mirror 1 wanted 9 found 7
[12398.740199] BTRFS error (device vdb): parent transid verify failed on logical 38993920 mirror 2 wanted 9 found 7
[12398.907346] BTRFS info (device vdb): balance: ended with status: -5
[12399.168513] BTRFS error (device vdb): parent transid verify failed on logical 38993920 mirror 1 wanted 9 found 7
[12399.174882] BTRFS error (device vdb): parent transid verify failed on logical 38993920 mirror 2 wanted 9 found 7
[12399.180441] ==================================================================
[12399.183100] BUG: KASAN: slab-use-after-free in btrfs_check_read_bio+0x19c/0x8d0 [btrfs]
[12399.186056] Read of size 8 at addr ffff888029c96c80 by task kworker/u8:4/21890
[12399.188440] 
[12399.188965] CPU: 1 PID: 21890 Comm: kworker/u8:4 Not tainted 6.5.0-rc2-default+ #2130
[12399.191616] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552-rebuilt.opensuse.org 04/01/2014
[12399.193366] Workqueue: btrfs-endio btrfs_end_bio_work [btrfs]
[12399.194534] Call Trace:
[12399.195039]  <TASK>
[12399.195484]  dump_stack_lvl+0x46/0x70
[12399.196182]  print_address_description.constprop.0+0x30/0x420
[12399.197136]  ? preempt_count_sub+0x18/0xc0
[12399.197858]  print_report+0xb0/0x260
[12399.198497]  ? __virt_addr_valid+0xbb/0xf0
[12399.199204]  ? kasan_addr_to_slab+0x94/0xc0
[12399.199936]  kasan_report+0xbe/0xf0
[12399.200562]  ? btrfs_check_read_bio+0x19c/0x8d0 [btrfs]
[12399.201618]  ? btrfs_check_read_bio+0x19c/0x8d0 [btrfs]
[12399.202667]  btrfs_check_read_bio+0x19c/0x8d0 [btrfs]
[12399.203703]  ? lock_sync+0x100/0x100
[12399.204344]  ? try_to_wake_up+0x50/0x880
[12399.205025]  ? btrfs_repair_io_failure+0x490/0x490 [btrfs]
[12399.206116]  ? mark_held_locks+0x1a/0x80
[12399.206802]  process_one_work+0x504/0xa00
[12399.207530]  ? pwq_dec_nr_in_flight+0x100/0x100
[12399.208305]  ? worker_thread+0x160/0x630
[12399.208996]  worker_thread+0x8e/0x630
[12399.209638]  ? __kthread_parkme+0xd8/0xf0
[12399.210331]  ? process_one_work+0xa00/0xa00
[12399.211032]  kthread+0x198/0x1e0
[12399.211634]  ? kthread_complete_and_exit+0x20/0x20
[12399.212432]  ret_from_fork+0x2d/0x50
[12399.213087]  ? kthread_complete_and_exit+0x20/0x20
[12399.213895]  ret_from_fork_asm+0x11/0x20
[12399.214585] RIP: 0000:0x0
[12399.215098] Code: Unable to access opcode bytes at 0xffffffffffffffd6.
[12399.216131] RSP: 0000:0000000000000000 EFLAGS: 00000000 ORIG_RAX: 0000000000000000
[12399.217361] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[12399.218442] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[12399.219546] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[12399.220629] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[12399.221710] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[12399.222811]  </TASK>
[12399.223263] 
[12399.223628] Allocated by task 1621:
[12399.224238]  kasan_save_stack+0x1c/0x40
[12399.224900]  kasan_set_track+0x21/0x30
[12399.225558]  __kasan_slab_alloc+0x62/0x70
[12399.226240]  kmem_cache_alloc+0x194/0x370
[12399.226920]  mempool_alloc+0xe1/0x260
[12399.227573]  bio_alloc_bioset+0x2c7/0x450
[12399.228266]  btrfs_bio_alloc+0x2e/0x50 [btrfs]
[12399.229208]  submit_extent_page+0x2e0/0x5c0 [btrfs]
[12399.230206]  btrfs_do_readpage+0x52a/0xb50 [btrfs]
[12399.231188]  extent_readahead+0x1c3/0x2b0 [btrfs]
[12399.232141]  read_pages+0x10e/0x5f0
[12399.232748]  page_cache_ra_unbounded+0x1ed/0x2c0
[12399.233508]  filemap_get_pages+0x218/0x620
[12399.234196]  filemap_read+0x1ef/0x660
[12399.234825]  vfs_read+0x3b7/0x4f0
[12399.235433]  ksys_read+0xc7/0x160
[12399.236035]  do_syscall_64+0x3d/0x90
[12399.236675]  entry_SYSCALL_64_after_hwframe+0x46/0xb0
[12399.237505] 
[12399.237856] Freed by task 1621:
[12399.238423]  kasan_save_stack+0x1c/0x40
[12399.239084]  kasan_set_track+0x21/0x30
[12399.239728]  kasan_save_free_info+0x27/0x40
[12399.240435]  ____kasan_slab_free+0x1c2/0x230
[12399.241141]  kmem_cache_free+0x13a/0x410
[12399.241813]  bio_free+0x76/0xa0
[12399.242386]  end_bio_extent_readpage+0x139/0x400 [btrfs]
[12399.243434]  btrfs_submit_chunk+0x6e9/0x9b0 [btrfs]
[12399.244421]  btrfs_submit_bio+0x21/0x60 [btrfs]
[12399.245356]  submit_one_bio+0x6a/0xb0 [btrfs]
[12399.246273]  submit_extent_page+0x232/0x5c0 [btrfs]
[12399.247268]  btrfs_do_readpage+0x52a/0xb50 [btrfs]
[12399.248648]  extent_readahead+0x1c3/0x2b0 [btrfs]
[12399.249608]  read_pages+0x10e/0x5f0
[12399.250236]  page_cache_ra_unbounded+0x1ed/0x2c0
[12399.251006]  filemap_get_pages+0x218/0x620
[12399.251688]  filemap_read+0x1ef/0x660
[12399.252304]  vfs_read+0x3b7/0x4f0
[12399.252880]  ksys_read+0xc7/0x160
[12399.253463]  do_syscall_64+0x3d/0x90
[12399.254086]  entry_SYSCALL_64_after_hwframe+0x46/0xb0
[12399.254878] 
[12399.255239] The buggy address belongs to the object at ffff888029c96c80
[12399.255239]  which belongs to the cache biovec-max of size 4096
[12399.257027] The buggy address is located 0 bytes inside of
[12399.257027]  freed 4096-byte region [ffff888029c96c80, ffff888029c97c80)
[12399.258790] 
[12399.259143] The buggy address belongs to the physical page:
[12399.259994] page:ffff88807e872400 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x29c90
[12399.261410] head:ffff88807e872400 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
[12399.262656] flags: 0xa80000010200(slab|head|section=5|zone=1)
[12399.263548] page_type: 0xffffffff()
[12399.264144] raw: 0000a80000010200 ffff888001310ac0 ffff88807e099a10 ffff888001312b70
[12399.265851] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[12399.267051] page dumped because: kasan: bad access detected
[12399.267900] 
[12399.268239] Memory state around the buggy address:
[12399.268986]  ffff888029c96b80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[12399.270092]  ffff888029c96c00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[12399.271203] >ffff888029c96c80: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[12399.272590]                    ^
[12399.273152]  ffff888029c96d00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[12399.274267]  ffff888029c96d80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[12399.275385] ==================================================================
[12399.276551] Disabling lock debugging due to kernel taint
[12399.277362] assertion failed: bv->bv_len == fs_info->sectorsize, in fs/btrfs/inode.c:3441
[12399.278654] ------------[ cut here ]------------
[12399.279387] kernel BUG at fs/btrfs/inode.c:3441!
[12399.280165] invalid opcode: 0000 [#1] PREEMPT SMP KASAN
[12399.280979] CPU: 1 PID: 21890 Comm: kworker/u8:4 Tainted: G    B              6.5.0-rc2-default+ #2130
[12399.282353] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552-rebuilt.opensuse.org 04/01/2014
[12399.283964] Workqueue: btrfs-endio btrfs_end_bio_work [btrfs]
[12399.285309] RIP: 0010:btrfs_data_csum_ok+0x40f/0x530 [btrfs]
[12399.288907] RSP: 0018:ffff888049277b30 EFLAGS: 00010246
[12399.290022] RAX: 000000000000004d RBX: ffff888015166d80 RCX: 0000000000000000
[12399.291074] RDX: 0000000000000000 RSI: ffffffff961007f8 RDI: ffffffff99c9e0e0
[12399.292385] RBP: ffff888049277cc0 R08: 0000000000000001 R09: ffffed100924ef0f
[12399.293411] R10: ffff88804927787f R11: fffffffffffe37c0 R12: ffff888014bc8000
[12399.294446] R13: ffff88804abdc000 R14: 0000000000000655 R15: ffff8880168b3b78
[12399.295468] FS:  0000000000000000(0000) GS:ffff888068c00000(0000) knlGS:0000000000000000
[12399.296693] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[12399.297549] CR2: ffffffffffffffd6 CR3: 000000007288b000 CR4: 00000000000006a0
[12399.298577] Call Trace:
[12399.299049]  <TASK>
[12399.299472]  ? die+0x32/0x80
[12399.302753]  ? do_trap+0x12d/0x160
[12399.303356]  ? btrfs_data_csum_ok+0x40f/0x530 [btrfs]
[12399.304487]  ? btrfs_data_csum_ok+0x40f/0x530 [btrfs]
[12399.305593]  ? do_error_trap+0x90/0x130
[12399.306241]  ? btrfs_data_csum_ok+0x40f/0x530 [btrfs]
[12399.307314]  ? handle_invalid_op+0x2c/0x30
[12399.307999]  ? btrfs_data_csum_ok+0x40f/0x530 [btrfs]
[12399.308972]  ? exc_invalid_op+0x29/0x40
[12399.309616]  ? asm_exc_invalid_op+0x16/0x20
[12399.310302]  ? preempt_count_sub+0x18/0xc0
[12399.310988]  ? btrfs_data_csum_ok+0x40f/0x530 [btrfs]
[12399.311976]  ? end_report+0x7a/0x130
[12399.312594]  ? btrfs_check_sector_csum+0x210/0x210 [btrfs]
[12399.313625]  ? btrfs_check_read_bio+0x19c/0x8d0 [btrfs]
[12399.314637]  btrfs_check_read_bio+0x238/0x8d0 [btrfs]
[12399.315641]  ? lock_sync+0x100/0x100
[12399.316247]  ? try_to_wake_up+0x50/0x880
[12399.316906]  ? btrfs_repair_io_failure+0x490/0x490 [btrfs]
[12399.317946]  process_one_work+0x504/0xa00
[12399.318625]  ? pwq_dec_nr_in_flight+0x100/0x100
[12399.319363]  ? worker_thread+0x160/0x630
[12399.320023]  worker_thread+0x8e/0x630
[12399.320640]  ? __kthread_parkme+0xd8/0xf0
[12399.321299]  ? process_one_work+0xa00/0xa00
[12399.321990]  kthread+0x198/0x1e0
[12399.322564]  ? kthread_complete_and_exit+0x20/0x20
[12399.323334]  ret_from_fork+0x2d/0x50
[12399.323945]  ? kthread_complete_and_exit+0x20/0x20
[12399.324713]  ret_from_fork_asm+0x11/0x20
[12399.325364] RIP: 0000:0x0
[12399.325855] Code: Unable to access opcode bytes at 0xffffffffffffffd6.
[12399.326833] RSP: 0000:0000000000000000 EFLAGS: 00000000 ORIG_RAX: 0000000000000000
[12399.328007] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[12399.329049] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[12399.330095] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[12399.331146] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[12399.332197] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[12399.333246]  </TASK>
[12399.333679] Modules linked in: dm_flakey dm_mod btrfs blake2b_generic libcrc32c xor lzo_compress lzo_decompress raid6_pq zstd_decompress zstd_compress xxhash zstd_common loop
[12399.335968] ---[ end trace 0000000000000000 ]---
[12399.336714] RIP: 0010:btrfs_data_csum_ok+0x40f/0x530 [btrfs]
[12399.340450] RSP: 0018:ffff888049277b30 EFLAGS: 00010246
[12399.341276] RAX: 000000000000004d RBX: ffff888015166d80 RCX: 0000000000000000
[12399.342337] RDX: 0000000000000000 RSI: ffffffff961007f8 RDI: ffffffff99c9e0e0
[12399.343389] RBP: ffff888049277cc0 R08: 0000000000000001 R09: ffffed100924ef0f
[12399.349246] R10: ffff88804927787f R11: fffffffffffe37c0 R12: ffff888014bc8000
[12399.350585] R13: ffff88804abdc000 R14: 0000000000000655 R15: ffff8880168b3b78
[12399.351846] FS:  0000000000000000(0000) GS:ffff888069000000(0000) knlGS:0000000000000000
[12399.353471] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[12399.354460] CR2: 000055a729c8d000 CR3: 000000003b421000 CR4: 00000000000006a0
[12399.357113] BTRFS error (device vdb): parent transid verify failed on logical 38993920 mirror 1 wanted 9 found 7
Connection closed by foreign host.

Qu Wenruo July 20, 2023, 10:15 p.m. UTC | #5

On 2023/7/20 23:06, David Sterba wrote:
> KASAN complains in btrfs/125:
> [12398.482607] BTRFS error (device vdb): parent transid verify failed on logical 38993920 mirror 1 wanted 9 found 7
> [12398.489325] BTRFS error (device vdb): parent transid verify failed on logical 38993920 mirror 2 wanted 9 found 7
> [12398.493394] BTRFS error (device vdb): parent transid verify failed on logical 38993920 mirror 1 wanted 9 found 7
> [12398.496146] BTRFS error (device vdb): parent transid verify failed on logical 38993920 mirror 2 wanted 9 found 7
> [12398.499510] BTRFS error (device vdb): parent transid verify failed on logical 39108608 mirror 1 wanted 9 found 7
> [12398.736591] BTRFS error (device vdb): parent transid verify failed on logical 38993920 mirror 1 wanted 9 found 7
> [12398.740199] BTRFS error (device vdb): parent transid verify failed on logical 38993920 mirror 2 wanted 9 found 7
> [12398.907346] BTRFS info (device vdb): balance: ended with status: -5
> [12399.168513] BTRFS error (device vdb): parent transid verify failed on logical 38993920 mirror 1 wanted 9 found 7
> [12399.174882] BTRFS error (device vdb): parent transid verify failed on logical 38993920 mirror 2 wanted 9 found 7

Compared to my passing runs, the rebuild is not working for metadata.

My passing runs show no errors on mirror 2 (rebuilt from P).

> [12399.180441] ==================================================================
> [12399.183100] BUG: KASAN: slab-use-after-free in btrfs_check_read_bio+0x19c/0x8d0 [btrfs]
> [12399.186056] Read of size 8 at addr ffff888029c96c80 by task kworker/u8:4/21890
> [12399.188440]
> [12399.188965] CPU: 1 PID: 21890 Comm: kworker/u8:4 Not tainted 6.5.0-rc2-default+ #2130
> [12399.191616] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552-rebuilt.opensuse.org 04/01/2014
> [12399.193366] Workqueue: btrfs-endio btrfs_end_bio_work [btrfs]
> [12399.194534] Call Trace:
> [12399.195039]  <TASK>
> [12399.195484]  dump_stack_lvl+0x46/0x70
> [12399.196182]  print_address_description.constprop.0+0x30/0x420
> [12399.197136]  ? preempt_count_sub+0x18/0xc0
> [12399.197858]  print_report+0xb0/0x260
> [12399.198497]  ? __virt_addr_valid+0xbb/0xf0
> [12399.199204]  ? kasan_addr_to_slab+0x94/0xc0
> [12399.199936]  kasan_report+0xbe/0xf0
> [12399.200562]  ? btrfs_check_read_bio+0x19c/0x8d0 [btrfs]
> [12399.201618]  ? btrfs_check_read_bio+0x19c/0x8d0 [btrfs]
> [12399.202667]  btrfs_check_read_bio+0x19c/0x8d0 [btrfs]

This is weird, as btrfs_check_read_bio() can only happen for data bios.

Let me double check what's going wrong.

Thanks,
Qu
> [12399.203703]  ? lock_sync+0x100/0x100
> [12399.204344]  ? try_to_wake_up+0x50/0x880
> [12399.205025]  ? btrfs_repair_io_failure+0x490/0x490 [btrfs]
> [12399.206116]  ? mark_held_locks+0x1a/0x80
> [12399.206802]  process_one_work+0x504/0xa00
> [12399.207530]  ? pwq_dec_nr_in_flight+0x100/0x100
> [12399.208305]  ? worker_thread+0x160/0x630
> [12399.208996]  worker_thread+0x8e/0x630
> [12399.209638]  ? __kthread_parkme+0xd8/0xf0
> [12399.210331]  ? process_one_work+0xa00/0xa00
> [12399.211032]  kthread+0x198/0x1e0
> [12399.211634]  ? kthread_complete_and_exit+0x20/0x20
> [12399.212432]  ret_from_fork+0x2d/0x50
> [12399.213087]  ? kthread_complete_and_exit+0x20/0x20
> [12399.213895]  ret_from_fork_asm+0x11/0x20
> [12399.214585] RIP: 0000:0x0
> [12399.215098] Code: Unable to access opcode bytes at 0xffffffffffffffd6.
> [12399.216131] RSP: 0000:0000000000000000 EFLAGS: 00000000 ORIG_RAX: 0000000000000000
> [12399.217361] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
> [12399.218442] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
> [12399.219546] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
> [12399.220629] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> [12399.221710] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> [12399.222811]  </TASK>
> [12399.223263]
> [12399.223628] Allocated by task 1621:
> [12399.224238]  kasan_save_stack+0x1c/0x40
> [12399.224900]  kasan_set_track+0x21/0x30
> [12399.225558]  __kasan_slab_alloc+0x62/0x70
> [12399.226240]  kmem_cache_alloc+0x194/0x370
> [12399.226920]  mempool_alloc+0xe1/0x260
> [12399.227573]  bio_alloc_bioset+0x2c7/0x450
> [12399.228266]  btrfs_bio_alloc+0x2e/0x50 [btrfs]
> [12399.229208]  submit_extent_page+0x2e0/0x5c0 [btrfs]
> [12399.230206]  btrfs_do_readpage+0x52a/0xb50 [btrfs]
> [12399.231188]  extent_readahead+0x1c3/0x2b0 [btrfs]
> [12399.232141]  read_pages+0x10e/0x5f0
> [12399.232748]  page_cache_ra_unbounded+0x1ed/0x2c0
> [12399.233508]  filemap_get_pages+0x218/0x620
> [12399.234196]  filemap_read+0x1ef/0x660
> [12399.234825]  vfs_read+0x3b7/0x4f0
> [12399.235433]  ksys_read+0xc7/0x160
> [12399.236035]  do_syscall_64+0x3d/0x90
> [12399.236675]  entry_SYSCALL_64_after_hwframe+0x46/0xb0
> [12399.237505]
> [12399.237856] Freed by task 1621:
> [12399.238423]  kasan_save_stack+0x1c/0x40
> [12399.239084]  kasan_set_track+0x21/0x30
> [12399.239728]  kasan_save_free_info+0x27/0x40
> [12399.240435]  ____kasan_slab_free+0x1c2/0x230
> [12399.241141]  kmem_cache_free+0x13a/0x410
> [12399.241813]  bio_free+0x76/0xa0
> [12399.242386]  end_bio_extent_readpage+0x139/0x400 [btrfs]
> [12399.243434]  btrfs_submit_chunk+0x6e9/0x9b0 [btrfs]
> [12399.244421]  btrfs_submit_bio+0x21/0x60 [btrfs]
> [12399.245356]  submit_one_bio+0x6a/0xb0 [btrfs]
> [12399.246273]  submit_extent_page+0x232/0x5c0 [btrfs]
> [12399.247268]  btrfs_do_readpage+0x52a/0xb50 [btrfs]
> [12399.248648]  extent_readahead+0x1c3/0x2b0 [btrfs]
> [12399.249608]  read_pages+0x10e/0x5f0
> [12399.250236]  page_cache_ra_unbounded+0x1ed/0x2c0
> [12399.251006]  filemap_get_pages+0x218/0x620
> [12399.251688]  filemap_read+0x1ef/0x660
> [12399.252304]  vfs_read+0x3b7/0x4f0
> [12399.252880]  ksys_read+0xc7/0x160
> [12399.253463]  do_syscall_64+0x3d/0x90
> [12399.254086]  entry_SYSCALL_64_after_hwframe+0x46/0xb0
> [12399.254878]
> [12399.255239] The buggy address belongs to the object at ffff888029c96c80
> [12399.255239]  which belongs to the cache biovec-max of size 4096
> [12399.257027] The buggy address is located 0 bytes inside of
> [12399.257027]  freed 4096-byte region [ffff888029c96c80, ffff888029c97c80)
> [12399.258790]
> [12399.259143] The buggy address belongs to the physical page:
> [12399.259994] page:ffff88807e872400 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x29c90
> [12399.261410] head:ffff88807e872400 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
> [12399.262656] flags: 0xa80000010200(slab|head|section=5|zone=1)
> [12399.263548] page_type: 0xffffffff()
> [12399.264144] raw: 0000a80000010200 ffff888001310ac0 ffff88807e099a10 ffff888001312b70
> [12399.265851] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
> [12399.267051] page dumped because: kasan: bad access detected
> [12399.267900]
> [12399.268239] Memory state around the buggy address:
> [12399.268986]  ffff888029c96b80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> [12399.270092]  ffff888029c96c00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> [12399.271203] >ffff888029c96c80: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> [12399.272590]                    ^
> [12399.273152]  ffff888029c96d00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> [12399.274267]  ffff888029c96d80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> [12399.275385] ==================================================================
> [12399.276551] Disabling lock debugging due to kernel taint
> [12399.277362] assertion failed: bv->bv_len == fs_info->sectorsize, in fs/btrfs/inode.c:3441
> [12399.278654] ------------[ cut here ]------------
> [12399.279387] kernel BUG at fs/btrfs/inode.c:3441!
> [12399.280165] invalid opcode: 0000 [#1] PREEMPT SMP KASAN
> [12399.280979] CPU: 1 PID: 21890 Comm: kworker/u8:4 Tainted: G    B              6.5.0-rc2-default+ #2130
> [12399.282353] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552-rebuilt.opensuse.org 04/01/2014
> [12399.283964] Workqueue: btrfs-endio btrfs_end_bio_work [btrfs]
> [12399.285309] RIP: 0010:btrfs_data_csum_ok+0x40f/0x530 [btrfs]
> [12399.288907] RSP: 0018:ffff888049277b30 EFLAGS: 00010246
> [12399.290022] RAX: 000000000000004d RBX: ffff888015166d80 RCX: 0000000000000000
> [12399.291074] RDX: 0000000000000000 RSI: ffffffff961007f8 RDI: ffffffff99c9e0e0
> [12399.292385] RBP: ffff888049277cc0 R08: 0000000000000001 R09: ffffed100924ef0f
> [12399.293411] R10: ffff88804927787f R11: fffffffffffe37c0 R12: ffff888014bc8000
> [12399.294446] R13: ffff88804abdc000 R14: 0000000000000655 R15: ffff8880168b3b78
> [12399.295468] FS:  0000000000000000(0000) GS:ffff888068c00000(0000) knlGS:0000000000000000
> [12399.296693] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [12399.297549] CR2: ffffffffffffffd6 CR3: 000000007288b000 CR4: 00000000000006a0
> [12399.298577] Call Trace:
> [12399.299049]  <TASK>
> [12399.299472]  ? die+0x32/0x80
> [12399.302753]  ? do_trap+0x12d/0x160
> [12399.303356]  ? btrfs_data_csum_ok+0x40f/0x530 [btrfs]
> [12399.304487]  ? btrfs_data_csum_ok+0x40f/0x530 [btrfs]
> [12399.305593]  ? do_error_trap+0x90/0x130
> [12399.306241]  ? btrfs_data_csum_ok+0x40f/0x530 [btrfs]
> [12399.307314]  ? handle_invalid_op+0x2c/0x30
> [12399.307999]  ? btrfs_data_csum_ok+0x40f/0x530 [btrfs]
> [12399.308972]  ? exc_invalid_op+0x29/0x40
> [12399.309616]  ? asm_exc_invalid_op+0x16/0x20
> [12399.310302]  ? preempt_count_sub+0x18/0xc0
> [12399.310988]  ? btrfs_data_csum_ok+0x40f/0x530 [btrfs]
> [12399.311976]  ? end_report+0x7a/0x130
> [12399.312594]  ? btrfs_check_sector_csum+0x210/0x210 [btrfs]
> [12399.313625]  ? btrfs_check_read_bio+0x19c/0x8d0 [btrfs]
> [12399.314637]  btrfs_check_read_bio+0x238/0x8d0 [btrfs]
> [12399.315641]  ? lock_sync+0x100/0x100
> [12399.316247]  ? try_to_wake_up+0x50/0x880
> [12399.316906]  ? btrfs_repair_io_failure+0x490/0x490 [btrfs]
> [12399.317946]  process_one_work+0x504/0xa00
> [12399.318625]  ? pwq_dec_nr_in_flight+0x100/0x100
> [12399.319363]  ? worker_thread+0x160/0x630
> [12399.320023]  worker_thread+0x8e/0x630
> [12399.320640]  ? __kthread_parkme+0xd8/0xf0
> [12399.321299]  ? process_one_work+0xa00/0xa00
> [12399.321990]  kthread+0x198/0x1e0
> [12399.322564]  ? kthread_complete_and_exit+0x20/0x20
> [12399.323334]  ret_from_fork+0x2d/0x50
> [12399.323945]  ? kthread_complete_and_exit+0x20/0x20
> [12399.324713]  ret_from_fork_asm+0x11/0x20
> [12399.325364] RIP: 0000:0x0
> [12399.325855] Code: Unable to access opcode bytes at 0xffffffffffffffd6.
> [12399.326833] RSP: 0000:0000000000000000 EFLAGS: 00000000 ORIG_RAX: 0000000000000000
> [12399.328007] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
> [12399.329049] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
> [12399.330095] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
> [12399.331146] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> [12399.332197] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> [12399.333246]  </TASK>
> [12399.333679] Modules linked in: dm_flakey dm_mod btrfs blake2b_generic libcrc32c xor lzo_compress lzo_decompress raid6_pq zstd_decompress zstd_compress xxhash zstd_common loop
> [12399.335968] ---[ end trace 0000000000000000 ]---
> [12399.336714] RIP: 0010:btrfs_data_csum_ok+0x40f/0x530 [btrfs]
> [12399.340450] RSP: 0018:ffff888049277b30 EFLAGS: 00010246
> [12399.341276] RAX: 000000000000004d RBX: ffff888015166d80 RCX: 0000000000000000
> [12399.342337] RDX: 0000000000000000 RSI: ffffffff961007f8 RDI: ffffffff99c9e0e0
> [12399.343389] RBP: ffff888049277cc0 R08: 0000000000000001 R09: ffffed100924ef0f
> [12399.349246] R10: ffff88804927787f R11: fffffffffffe37c0 R12: ffff888014bc8000
> [12399.350585] R13: ffff88804abdc000 R14: 0000000000000655 R15: ffff8880168b3b78
> [12399.351846] FS:  0000000000000000(0000) GS:ffff888069000000(0000) knlGS:0000000000000000
> [12399.353471] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [12399.354460] CR2: 000055a729c8d000 CR3: 000000003b421000 CR4: 00000000000006a0
> [12399.357113] BTRFS error (device vdb): parent transid verify failed on logical 38993920 mirror 1 wanted 9 found 7
> Connection closed by foreign host.

Qu Wenruo July 20, 2023, 10:55 p.m. UTC | #6

On 2023/7/21 06:15, Qu Wenruo wrote:
>
>
> On 2023/7/20 23:06, David Sterba wrote:
>> On Sat, Jul 15, 2023 at 07:08:26PM +0800, Qu Wenruo wrote:
>>> [CHANGELOG]
>>> v2:
>>> - Define write_extent_buffer_fsid/chunk_tree_uuid() as inline helpers
>>>
>>> v3:
>>> - Fix an undefined behavior bug in memcpy_extent_buffer()
>>>    Despite its name, memcpy_extent_buffer() needs to handle overlapping
>>>    ranges, thus it calls copy_pages(), which does overlap checks and
>>>    switches to memmove() when needed.
>>>
>>>    Here we introduce __write_extent_buffer(), which allows us to switch
>>>    to memmove() if needed.
>>>
>>> - Also refactor memmove_extent_buffer()
>>>    Since we have __write_extent_buffer(), which can use memmove(), it's
>>>    not hard to refactor memmove_extent_buffer().
>>>
>>>    But there is still a pitfall: we have to handle double page
>>>    boundaries as the old code did, explained in the last patch.
>>>
>>> - Add selftests on extent buffer memory operations
>>>    I have failed too many times refactoring memmove_extent_buffer(); the
>>>    wasted time should serve as a memorial to my stupidity.
>>
>> btrfs/125 kasan complains:
>>
>> btrfs/125        [01:09:17][12387.340788] run fstests btrfs/125 at 2023-07-20 01:09:18
>> [12389.539422] BTRFS: device fsid b349d2bf-44dc-4990-8e64-c4933de9e42e devid 1 transid 297 /dev/vda scanned by mount (1360)
>> [12389.543907] BTRFS info (device vda): using sha256 (sha256-generic) checksum algorithm
>> [12389.545345] BTRFS info (device vda): using free space tree
>> [12389.568662] BTRFS info (device vda): auto enabling async discard
>> [12393.628549] BTRFS: device fsid 472a6171-cb8b-4916-8353-172e05aa255c devid 1 transid 6 /dev/vdb scanned by mkfs.btrfs (1544)
>> [12393.630846] BTRFS: device fsid 472a6171-cb8b-4916-8353-172e05aa255c devid 2 transid 6 /dev/vdc scanned by mkfs.btrfs (1544)
>> [12393.633042] BTRFS: device fsid 472a6171-cb8b-4916-8353-172e05aa255c devid 3 transid 6 /dev/vdd scanned by mkfs.btrfs (1544)
>> [12393.675240] BTRFS info (device vdb): using sha256 (sha256-generic) checksum algorithm
>> [12393.676651] BTRFS info (device vdb): using free space tree
>> [12393.705607] BTRFS info (device vdb): auto enabling async discard
>> [12393.708477] BTRFS info (device vdb): checking UUID tree
>> [12394.479228] BTRFS: device fsid 472a6171-cb8b-4916-8353-172e05aa255c devid 2 transid 8 /dev/vdc scanned by mount (1573)
>> [12394.481329] BTRFS: device fsid 472a6171-cb8b-4916-8353-172e05aa255c devid 1 transid 8 /dev/vdb scanned by mount (1573)
>> [12394.484821] BTRFS info (device vdb): using sha256 (sha256-generic) checksum algorithm
>> [12394.486018] BTRFS info (device vdb): allowing degraded mounts
>> [12394.486801] BTRFS info (device vdb): using free space tree
>> [12394.495639] BTRFS warning (device vdb): devid 3 uuid 8c6b8e23-2053-4b0a-9d30-0facd2dad945 is missing
>> [12394.499898] BTRFS warning (device vdb): devid 3 uuid 8c6b8e23-2053-4b0a-9d30-0facd2dad945 is missing
>> [12394.523726] BTRFS info (device vdb): auto enabling async discard
>> [12398.021206] BTRFS: device fsid b349d2bf-44dc-4990-8e64-c4933de9e42e devid 1 transid 298 /dev/vda scanned by btrfs (1597)
>> [12398.066913] BTRFS info (device vdb): using sha256 (sha256-generic) checksum algorithm
>> [12398.068414] BTRFS info (device vdb): using free space tree
>> [12398.080629] BTRFS error (device vdb): bad tree block start, mirror 1 want 40239104 have 31129600
>> [12398.085719] BTRFS info (device vdb): read error corrected: ino 0 off 40239104 (dev /dev/vdd sector 19840)
>> [12398.087705] BTRFS info (device vdb): read error corrected: ino 0 off 40243200 (dev /dev/vdd sector 19848)
>> [12398.089689] BTRFS info (device vdb): read error corrected: ino 0 off 40247296 (dev /dev/vdd sector 19856)
>> [12398.091575] BTRFS info (device vdb): read error corrected: ino 0 off 40251392 (dev /dev/vdd sector 19864)
>> [12398.093929] BTRFS error (device vdb): bad tree block start, mirror 1 want 40255488 have 31145984
>> [12398.097548] BTRFS info (device vdb): read error corrected: ino 0 off 40255488 (dev /dev/vdd sector 19872)
>> [12398.099311] BTRFS info (device vdb): read error corrected: ino 0 off 40259584 (dev /dev/vdd sector 19880)
>> [12398.101038] BTRFS info (device vdb): read error corrected: ino 0 off 40263680 (dev /dev/vdd sector 19888)
>> [12398.102663] BTRFS info (device vdb): read error corrected: ino 0 off 40267776 (dev /dev/vdd sector 19896)
>> [12398.105020] BTRFS error (device vdb): bad tree block start, mirror 1 want 40271872 have 31162368
>> [12398.107479] BTRFS info (device vdb): read error corrected: ino 0 off 40271872 (dev /dev/vdd sector 19904)
>> [12398.109094] BTRFS info (device vdb): read error corrected: ino 0 off 40275968 (dev /dev/vdd sector 19912)
>> [12398.111111] BTRFS error (device vdb): bad tree block start, mirror 1 want 40222720 have 31113216
>> [12398.121818] BTRFS info (device vdb): auto enabling async discard
>> [12398.219247] BTRFS error (device vdb): bad tree block start, mirror 1 want 40288256 have 31178752
>> [12398.233989] BTRFS info (device vdb): balance: start -d -m -s
>> [12398.235327] BTRFS info (device vdb): relocating block group 2365194240 flags data|raid5
>> [12398.310482] BTRFS error (device vdb): bad tree block start, mirror 1 want 40189952 have 31080448
>> [12398.482607] BTRFS error (device vdb): parent transid verify failed on logical 38993920 mirror 1 wanted 9 found 7
>> [12398.489325] BTRFS error (device vdb): parent transid verify failed on logical 38993920 mirror 2 wanted 9 found 7
>> [12398.493394] BTRFS error (device vdb): parent transid verify failed on logical 38993920 mirror 1 wanted 9 found 7
>> [12398.496146] BTRFS error (device vdb): parent transid verify failed on logical 38993920 mirror 2 wanted 9 found 7
>> [12398.499510] BTRFS error (device vdb): parent transid verify failed on logical 39108608 mirror 1 wanted 9 found 7
>> [12398.736591] BTRFS error (device vdb): parent transid verify failed on logical 38993920 mirror 1 wanted 9 found 7
>> [12398.740199] BTRFS error (device vdb): parent transid verify failed on logical 38993920 mirror 2 wanted 9 found 7
>> [12398.907346] BTRFS info (device vdb): balance: ended with status: -5
>> [12399.168513] BTRFS error (device vdb): parent transid verify failed on logical 38993920 mirror 1 wanted 9 found 7
>> [12399.174882] BTRFS error (device vdb): parent transid verify failed on logical 38993920 mirror 2 wanted 9 found 7
>
> Comparing it to my pass runs, the rebuild is not working for metadata.
>
> As my passing bios shows no error on mirror 2 (rebuilt from P).
>
>> [12399.180441] ==================================================================
>> [12399.183100] BUG: KASAN: slab-use-after-free in btrfs_check_read_bio+0x19c/0x8d0 [btrfs]
>> [12399.186056] Read of size 8 at addr ffff888029c96c80 by task kworker/u8:4/21890
>> [12399.188440]
>> [12399.188965] CPU: 1 PID: 21890 Comm: kworker/u8:4 Not tainted 6.5.0-rc2-default+ #2130
>> [12399.191616] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552-rebuilt.opensuse.org 04/01/2014
>> [12399.193366] Workqueue: btrfs-endio btrfs_end_bio_work [btrfs]
>> [12399.194534] Call Trace:
>> [12399.195039]  <TASK>
>> [12399.195484]  dump_stack_lvl+0x46/0x70
>> [12399.196182]  print_address_description.constprop.0+0x30/0x420
>> [12399.197136]  ? preempt_count_sub+0x18/0xc0
>> [12399.197858]  print_report+0xb0/0x260
>> [12399.198497]  ? __virt_addr_valid+0xbb/0xf0
>> [12399.199204]  ? kasan_addr_to_slab+0x94/0xc0
>> [12399.199936]  kasan_report+0xbe/0xf0
>> [12399.200562]  ? btrfs_check_read_bio+0x19c/0x8d0 [btrfs]
>> [12399.201618]  ? btrfs_check_read_bio+0x19c/0x8d0 [btrfs]
>> [12399.202667]  btrfs_check_read_bio+0x19c/0x8d0 [btrfs]
>
> This is weird, as btrfs_check_read_bio() can only happen for data bios.
>
> Let me double check what's going wrong.

How reproducible is this? I failed to reproduce it here, and the git log
doesn't show any obvious changes to the RAID56 code either (all of them
are already in my code base).

Thanks,
Qu
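The v3 changelog quoted above says memcpy_extent_buffer() must cope with overlapping ranges by switching to memmove(). The following is a minimal user-space sketch of that overlap check only, assuming a single flat byte buffer instead of a page-backed extent buffer. copy_within_buffer() is a hypothetical name, not the kernel's copy_pages() or __write_extent_buffer(), and the real helpers additionally have to deal with page boundaries.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not a btrfs function): copy len bytes inside buf
 * from src_off to dst_off. memcpy() on overlapping ranges is undefined
 * behavior, so fall back to memmove(), which handles overlap correctly.
 */
void copy_within_buffer(char *buf, size_t dst_off, size_t src_off,
			size_t len)
{
	/*
	 * [src_off, src_off + len) and [dst_off, dst_off + len) overlap
	 * iff each range starts before the other one ends.
	 */
	int overlap = src_off < dst_off + len && dst_off < src_off + len;

	if (overlap)
		memmove(buf + dst_off, buf + src_off, len);
	else
		memcpy(buf + dst_off, buf + src_off, len);
}
```

For example, with buf holding "abcdefgh", copy_within_buffer(buf, 2, 0, 4) overlaps, takes the memmove() path and leaves "ababcdgh"; a plain memcpy() there would be undefined behavior.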

David Sterba July 21, 2023, 3:13 p.m. UTC | #7

On Fri, Jul 21, 2023 at 06:55:49AM +0800, Qu Wenruo wrote:
> On 2023/7/21 06:15, Qu Wenruo wrote:
> > On 2023/7/20 23:06, David Sterba wrote:
> >> On Sat, Jul 15, 2023 at 07:08:26PM +0800, Qu Wenruo wrote:
> >> [12399.180441] ==================================================================
> >> [12399.183100] BUG: KASAN: slab-use-after-free in btrfs_check_read_bio+0x19c/0x8d0 [btrfs]
> >> [12399.186056] Read of size 8 at addr ffff888029c96c80 by task kworker/u8:4/21890
> >> [12399.188440]
> >> [12399.188965] CPU: 1 PID: 21890 Comm: kworker/u8:4 Not tainted 6.5.0-rc2-default+ #2130
> >> [12399.191616] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552-rebuilt.opensuse.org 04/01/2014
> >> [12399.193366] Workqueue: btrfs-endio btrfs_end_bio_work [btrfs]
> >> [12399.194534] Call Trace:
> >> [12399.195039]  <TASK>
> >> [12399.195484]  dump_stack_lvl+0x46/0x70
> >> [12399.196182]  print_address_description.constprop.0+0x30/0x420
> >> [12399.197136]  ? preempt_count_sub+0x18/0xc0
> >> [12399.197858]  print_report+0xb0/0x260
> >> [12399.198497]  ? __virt_addr_valid+0xbb/0xf0
> >> [12399.199204]  ? kasan_addr_to_slab+0x94/0xc0
> >> [12399.199936]  kasan_report+0xbe/0xf0
> >> [12399.200562]  ? btrfs_check_read_bio+0x19c/0x8d0 [btrfs]
> >> [12399.201618]  ? btrfs_check_read_bio+0x19c/0x8d0 [btrfs]
> >> [12399.202667]  btrfs_check_read_bio+0x19c/0x8d0 [btrfs]
> >
> > This is weird, as btrfs_check_read_bio() can only happen for data bios.
> >
> > Let me double check what's going wrong.
> 
> How reproducible is this? I failed to reproduce it here, and the git log
> doesn't show any obvious changes to the RAID56 code either (all of them
> are already in my code base).

This was the first run; I'll do another one.

David Sterba July 27, 2023, 6:27 p.m. UTC | #8

On Fri, Jul 21, 2023 at 05:13:47PM +0200, David Sterba wrote:
> This was the first run; I'll do another one.

With the reworked misc-next, this patchset no longer reproduces the
errors, so I'll add it back.
