
btrfs: tree-checker: dump the page status if hit something wrong

Message ID f51a6d5d7432455a6a858d51b49ecac183e0bbc9.1706312914.git.wqu@suse.com
State New, archived
Series btrfs: tree-checker: dump the page status if hit something wrong

Commit Message

Qu Wenruo Jan. 26, 2024, 11:48 p.m. UTC
[BUG]
There is a bug report about a very suspicious tree-checker error being triggered:

  BTRFS critical (device dm-0): corrupted node, root=256 block=8550954455682405139 owner mismatch, have 11858205567642294356 expect [256, 18446744073709551360]
  BTRFS critical (device dm-0): corrupted node, root=256 block=8550954455682405139 owner mismatch, have 11858205567642294356 expect [256, 18446744073709551360]
  BTRFS critical (device dm-0): corrupted node, root=256 block=8550954455682405139 owner mismatch, have 11858205567642294356 expect [256, 18446744073709551360]
  SELinux: inode_doinit_use_xattr:  getxattr returned 117 for dev=dm-0 ino=5737268

[ANALYZE]
The root cause is still unclear, but there are some clues already:

- Unaligned eb bytenr
  The block bytenr is 8550954455682405139, which is odd, so it is not
  even 2-byte aligned.
  This bytenr is fetched from the extent buffer header, not from
  eb->start.

  This means that at the initial read, the eb header bytenr was still
  correct (that is the most basic check before the read can continue),
  but later something went wrong and corrupted at least the first page,
  which is how we got such an obviously incorrect value.

- Invalid extent buffer header owner
  The read itself was triggered for subvolume 256, but the eb header
  owner is 11858205567642294356, which is not possible: subvolume IDs
  are limited to (1ULL << 48) - 1, and this value is far beyond that
  limit.

  So this value is garbage too.

We already have two garbage values from an extent buffer that passed the
initial bytenr and csum checks, but whose contents became garbage at some
later point.

This looks like a page lifespan problem (e.g. we didn't properly hold the
page).

[ENHANCEMENT]
The current tree-checker only outputs information from the extent buffer,
nothing about the page status.

So this patch enhances the tree-checker output by also dumping the first
page, which looks like this:

 page:00000000aa9f3ce8 refcount:4 mapcount:0 mapping:00000000169aa6b6 index:0x1d0c pfn:0x1022e5
 memcg:ffff888103456000
 aops:btree_aops [btrfs] ino:1
 flags: 0x2ffff0000008000(private|node=0|zone=2|lastcpupid=0xffff)
 page_type: 0xffffffff()
 raw: 02ffff0000008000 0000000000000000 dead000000000122 ffff88811e06e220
 raw: 0000000000001d0c ffff888102fdb1d8 00000004ffffffff ffff888103456000
 page dumped because: eb page dump
 BTRFS critical (device dm-3): corrupt leaf: root=5 block=30457856 slot=6 ino=257 file_offset=0, invalid disk_bytenr for file extent, have 10617606235235216665, should be aligned to 4096
 BTRFS error (device dm-3): read time tree block corruption detected on logical 30457856 mirror 1

From the dump we can get some extra information that helps with
cross-checks:

- Page refcount
  If it's too low, something is definitely wrong.

- Page aops
  Any mapped eb page should have btree_aops with inode number 1.

- Page index
  Since a mapped eb page has its bytenr matching its position in the
  btree inode, (index << PAGE_SHIFT) should match the block bytenr from
  the critical line.

- Page Private flag
  A mapped eb page should have the Private flag set, indicating that it
  is managed by btrfs.

Link: https://marc.info/?l=linux-btrfs&m=170629708724284&w=2
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/tree-checker.c | 6 ++++++
 1 file changed, 6 insertions(+)

Comments

Tavian Barnes Feb. 6, 2024, 3:38 a.m. UTC | #1
On Sat, 27 Jan 2024 10:18:36 +1030, Qu Wenruo wrote:
> [BUG]
> There is a bug report about very suspicious tree-checker got triggered:
>
>   BTRFS critical (device dm-0): corrupted node, root=256
> block=8550954455682405139 owner mismatch, have 11858205567642294356
> expect [256, 18446744073709551360]
>   BTRFS critical (device dm-0): corrupted node, root=256
> block=8550954455682405139 owner mismatch, have 11858205567642294356
> expect [256, 18446744073709551360]
>   BTRFS critical (device dm-0): corrupted node, root=256
> block=8550954455682405139 owner mismatch, have 11858205567642294356
> expect [256, 18446744073709551360]
>   SELinux: inode_doinit_use_xattr:  getxattr returned 117 for dev=dm-0
> ino=5737268

I can reproduce this error.  I applied a modified version of your patch,
against v6.7.2 because that's what I triggered it on.

diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c
index 50fdc69fdddf..3f1fc49cd4dc 100644
--- a/fs/btrfs/tree-checker.c
+++ b/fs/btrfs/tree-checker.c
@@ -2038,6 +2044,7 @@ int btrfs_check_eb_owner(const struct extent_buffer *eb, u64 root_owner)
        if (!is_subvol) {
                /* For non-subvolume trees, the eb owner should match root owner */
                if (unlikely(root_owner != eb_owner)) {
+                       dump_page(eb->pages[0], "eb page dump");
                        btrfs_crit(eb->fs_info,
 "corrupted %s, root=%llu block=%llu owner mismatch, have %llu expect %llu",
                                btrfs_header_level(eb) == 0 ? "leaf" : "node",
@@ -2053,6 +2060,7 @@ int btrfs_check_eb_owner(const struct extent_buffer *eb, u64 root_owner)
         * to subvolume trees.
         */
        if (unlikely(is_subvol != is_fstree(eb_owner))) {
+               dump_page(eb->pages[0], "eb page dump");
                btrfs_crit(eb->fs_info,
 "corrupted %s, root=%llu block=%llu owner mismatch, have %llu expect [%llu, %llu]",
                        btrfs_header_level(eb) == 0 ? "leaf" : "node",

Here's the corresponding dmesg output:

    page:00000000789c68b4 refcount:4 mapcount:0 mapping:00000000ce99bfc3 index:0x7df93c74 pfn:0x1269558
    memcg:ffff9f20d10df000
    aops:btree_aops [btrfs] ino:1
    flags: 0x12ffff180000820c(referenced|uptodate|workingset|private|node=2|zone=2|lastcpupid=0xffff)
    page_type: 0xffffffff()
    raw: 12ffff180000820c 0000000000000000 dead000000000122 ffff9f118586feb8
    raw: 000000007df93c74 ffff9f2232376e80 00000004ffffffff ffff9f20d10df000
    page dumped because: eb page dump
    BTRFS critical (device dm-1): corrupted leaf, root=709 block=8656838410240 owner mismatch, have 2694891690930195334 expect [256, 18446744073709551360]
    page:000000006b7dfcdc refcount:4 mapcount:0 mapping:00000000ce99bfc3 index:0x8dae804c pfn:0x408347
    memcg:ffff9f20d10df000
    aops:btree_aops [btrfs] ino:1
    flags: 0xaffff180000820c(referenced|uptodate|workingset|private|node=1|zone=2|lastcpupid=0xffff)
    page_type: 0xffffffff()
    raw: 0affff180000820c 0000000000000000 dead000000000122 ffff9f118586feb8
    raw: 000000008dae804c ffff9f1497257d00 00000004ffffffff ffff9f20d10df000
    page dumped because: eb page dump
    BTRFS critical (device dm-1): corrupted leaf, root=518 block=9736288518144 owner mismatch, have 1691386650333431481 expect [256, 18446744073709551360]
    page:00000000fb0df6cd refcount:4 mapcount:0 mapping:00000000ce99bfc3 index:0x7609cbdc pfn:0x129e719
    memcg:ffff9f20d10df000
    aops:btree_aops [btrfs] ino:1
    flags: 0x12ffff180000820c(referenced|uptodate|workingset|private|node=2|zone=2|lastcpupid=0xffff)
    page_type: 0xffffffff()
    raw: 12ffff180000820c 0000000000000000 dead000000000122 ffff9f118586feb8
    raw: 000000007609cbdc ffff9f231de92658 00000004ffffffff ffff9f20d10df000
    page dumped because: eb page dump
    BTRFS critical (device dm-1): corrupted leaf, root=518 block=8111527936000 owner mismatch, have 10652220539197264134 expect [256, 18446744073709551360]

Hope this helps!  Let me know if you have other debug patches to try.

Here's my reproducer if you want to try it yourself.  It uses bfs, a
find(1) clone I wrote with multi-threading and io_uring support.  I'm
in the process of adding multi-threaded stat(), which is what I assume
triggers the bug.

    $ git clone "https://github.com/tavianator/bfs"
    $ cd bfs
    $ git checkout euclean
    $ make release

Then repeat these steps until it triggers:

    # sysctl vm.drop_caches=3
    $ ./bin/bfs /mnt -links 100
    bfs: error: /mnt/slash/@/var/lib/docker/btrfs/subvolumes/f07d37d1c148e9fcdbae166a3a4de36eec49009ce651174d0921fab18d55cee6/dev/ram0: Structure needs cleaning.
    ...
Qu Wenruo Feb. 6, 2024, 5:54 a.m. UTC | #2
On 2024/2/6 14:08, tavianator@tavianator.com wrote:
> On Sat, 27 Jan 2024 10:18:36 +1030, Qu Wenruo wrote:
> [...]
>
> Here's the corresponding dmesg output:
>
>      page:00000000789c68b4 refcount:4 mapcount:0 mapping:00000000ce99bfc3 index:0x7df93c74 pfn:0x1269558
>      memcg:ffff9f20d10df000
>      aops:btree_aops [btrfs] ino:1
>      flags: 0x12ffff180000820c(referenced|uptodate|workingset|private|node=2|zone=2|lastcpupid=0xffff)
>      page_type: 0xffffffff()
>      raw: 12ffff180000820c 0000000000000000 dead000000000122 ffff9f118586feb8
>      raw: 000000007df93c74 ffff9f2232376e80 00000004ffffffff ffff9f20d10df000
>      page dumped because: eb page dump
>      BTRFS critical (device dm-1): corrupted leaf, root=709 block=8656838410240 owner mismatch, have 2694891690930195334 expect [256, 18446744073709551360]

The page index matches eb->start, so the page attaching part is correct.

The refcount is also 4, which matches the common case.

Although I still need to check the extra workingset flag.

>      page:000000006b7dfcdc refcount:4 mapcount:0 mapping:00000000ce99bfc3 index:0x8dae804c pfn:0x408347
>      memcg:ffff9f20d10df000
>      aops:btree_aops [btrfs] ino:1
>      flags: 0xaffff180000820c(referenced|uptodate|workingset|private|node=1|zone=2|lastcpupid=0xffff)
>      page_type: 0xffffffff()
>      raw: 0affff180000820c 0000000000000000 dead000000000122 ffff9f118586feb8
>      raw: 000000008dae804c ffff9f1497257d00 00000004ffffffff ffff9f20d10df000
>      page dumped because: eb page dump
>      BTRFS critical (device dm-1): corrupted leaf, root=518 block=9736288518144 owner mismatch, have 1691386650333431481 expect [256, 18446744073709551360]
>      page:00000000fb0df6cd refcount:4 mapcount:0 mapping:00000000ce99bfc3 index:0x7609cbdc pfn:0x129e719
>      memcg:ffff9f20d10df000
>      aops:btree_aops [btrfs] ino:1
>      flags: 0x12ffff180000820c(referenced|uptodate|workingset|private|node=2|zone=2|lastcpupid=0xffff)
>      page_type: 0xffffffff()
>      raw: 12ffff180000820c 0000000000000000 dead000000000122 ffff9f118586feb8
>      raw: 000000007609cbdc ffff9f231de92658 00000004ffffffff ffff9f20d10df000
>      page dumped because: eb page dump
>      BTRFS critical (device dm-1): corrupted leaf, root=518 block=8111527936000 owner mismatch, have 10652220539197264134 expect [256, 18446744073709551360]
>
> Hope this helps!  Let me know if you have other debug patches to try.
>
> Here's my reproducer if you want to try it yourself.  It uses bfs, a
> find(1) clone I wrote with multi-threading and io_uring support.  I'm
> in the process of adding multi-threaded stat(), which is what I assume
> triggers the bug.
>
>      $ git clone "https://github.com/tavianator/bfs"
>      $ cd bfs
>      $ git checkout euclean
>      $ make release
>
> Then repeat these steps until it triggers:
>
>      # sysctl vm.drop_caches=3
>      $ ./bin/bfs /mnt -links 100
>      bfs: error: /mnt/slash/@/var/lib/docker/btrfs/subvolumes/f07d37d1c148e9fcdbae166a3a4de36eec49009ce651174d0921fab18d55cee6/dev/ram0: Structure needs cleaning.

It looks like the filesystem under /mnt/ is pretty large, with a lot of
things pre-populated?

I tried to populate the btrfs with my linux git repo (which is around
6.5G with some GC needed), but even 256 runs didn't hit the problem.

The main part of the script looks like this:

for (( i = 0; i < 256; i++ )); do
	mount $dev1 $mnt
	sysctl vm.drop_caches=3
	/home/adam/bfs/bin/bfs $mnt -links 100
	umount $mnt
done

And the device looks like this:

/dev/mapper/test-scratch1  10485760  6472292   3679260  64% /mnt/btrfs

One difference, though, is that I'm using the btrfs/for-next branch
(https://github.com/btrfs/linux/tree/for-next).

Maybe it's missing some fixes that are not yet upstream?
My current guess is that it's related to my commit 09e6cef19c9f ("btrfs:
refactor alloc_extent_buffer() to allocate-then-attach method"), but since
I cannot reproduce it, it's only a guess...

Thanks,
Qu

David Sterba Feb. 6, 2024, 12:46 p.m. UTC | #3
On Sat, Jan 27, 2024 at 10:18:36AM +1030, Qu Wenruo wrote:
> [...]
> 
> Link: https://marc.info/?l=linux-btrfs&m=170629708724284&w=2

Please use a link to lore.kernel.org: it keeps the threading, and the
message ID is in the URL, so it's possible to look it up elsewhere.

> Signed-off-by: Qu Wenruo <wqu@suse.com>

For debugging, the patch is useful; I'd say go ahead and add it.

Reviewed-by: David Sterba <dsterba@suse.com>
David Sterba Feb. 6, 2024, 12:51 p.m. UTC | #4
On Mon, Feb 05, 2024 at 10:38:07PM -0500, tavianator@tavianator.com wrote:
> On Sat, 27 Jan 2024 10:18:36 +1030, Qu Wenruo wrote:

> Here's my reproducer if you want to try it yourself.  It uses bfs, a
> find(1) clone I wrote with multi-threading and io_uring support.  I'm

Do you use other fancy tech like io_uring? That alone can be a
significant factor, besides config, host, etc.
Tavian Barnes Feb. 6, 2024, 8:12 p.m. UTC | #5
On Tue, 6 Feb 2024 16:24:32 +1030, Qu Wenruo wrote:
> On 2024/2/6 14:08, tavianator@tavianator.com wrote:
> > Here's the corresponding dmesg output:
> >
> >      page:00000000789c68b4 refcount:4 mapcount:0 mapping:00000000ce99bfc3 index:0x7df93c74 pfn:0x1269558
> >      memcg:ffff9f20d10df000
> >      aops:btree_aops [btrfs] ino:1
> >      flags: 0x12ffff180000820c(referenced|uptodate|workingset|private|node=2|zone=2|lastcpupid=0xffff)
> >      page_type: 0xffffffff()
> >      raw: 12ffff180000820c 0000000000000000 dead000000000122 ffff9f118586feb8
> >      raw: 000000007df93c74 ffff9f2232376e80 00000004ffffffff ffff9f20d10df000
> >      page dumped because: eb page dump
> >      BTRFS critical (device dm-1): corrupted leaf, root=709 block=8656838410240 owner mismatch, have 2694891690930195334 expect [256, 18446744073709551360]
>
> The page index and eb->start matches page index, so that page attaching
> part is correct.
>
> And the refcount is also 4, which matches the common case.
>
> Although I still need to check the extra flags for workingset.

I did get some other splats with refcount:3, e.g.

    page:000000005ca43abb refcount:3 mapcount:0 mapping:00000000ce99bfc3 index:0x8eb49f38 pfn:0x17e8520
    page:000000005ca43abb refcount:3 mapcount:0 mapping:00000000ce99bfc3 index:0x8eb49f38 pfn:0x17e8520
    memcg:ffff9f211ab95000
    page:000000005ca43abb refcount:3 mapcount:0 mapping:00000000ce99bfc3 index:0x8eb49f38 pfn:0x17e8520
    memcg:ffff9f211ab95000
    page:000000005ca43abb refcount:3 mapcount:0 mapping:00000000ce99bfc3 index:0x8eb49f38 pfn:0x17e8520
    memcg:ffff9f211ab95000
    memcg:ffff9f211ab95000
    page:000000005ca43abb refcount:3 mapcount:0 mapping:00000000ce99bfc3 index:0x8eb49f38 pfn:0x17e8520
    memcg:ffff9f211ab95000
    BTRFS critical (device dm-1): inode mode mismatch with dir: inode mode=042255 btrfs type=2 dir type=1
    aops:btree_aops [btrfs] ino:1
    aops:btree_aops [btrfs] ino:1
    aops:btree_aops [btrfs] ino:1
    aops:btree_aops [btrfs] ino:1
    aops:btree_aops [btrfs] ino:1
    flags: 0x12ffff580000822c(referenced|uptodate|lru|workingset|private|node=2|zone=2|lastcpupid=0xffff)
    flags: 0x12ffff580000822c(referenced|uptodate|lru|workingset|private|node=2|zone=2|lastcpupid=0xffff)
    page_type: 0xffffffff()
    page_type: 0xffffffff()
    raw: 12ffff580000822c fffffabb9f5f8288 fffffabb9fa14848 ffff9f118586feb8
    raw: 12ffff580000822c fffffabb9f5f8288 fffffabb9fa14848 ffff9f118586feb8
    raw: 000000008eb49f38 ffff9f16ae564cb0 00000003ffffffff ffff9f211ab95000
    raw: 000000008eb49f38 ffff9f16ae564cb0 00000003ffffffff ffff9f211ab95000
    flags: 0x12ffff580000822c(referenced|uptodate|lru|workingset|private|node=2|zone=2|lastcpupid=0xffff)
    page dumped because: eb page dump
    page dumped because: eb page dump
    page_type: 0xffffffff()
    BTRFS critical (device dm-1): corrupted leaf, root=136202 block=9806651031552 owner mismatch, have 174692946400338119 expect [256, 18446744073709551360]
    BTRFS critical (device dm-1): corrupted leaf, root=136202 block=9806651031552 owner mismatch, have 174692946400338119 expect [256, 18446744073709551360]
    flags: 0x12ffff580000822c(referenced|uptodate|lru|workingset|private|node=2|zone=2|lastcpupid=0xffff)
    raw: 12ffff580000822c fffffabb9f5f8288 fffffabb9fa14848 ffff9f118586feb8
    raw: 000000008eb49f38 ffff9f16ae564cb0 00000003ffffffff ffff9f211ab95000
    page_type: 0xffffffff()
    page dumped because: eb page dump
    raw: 12ffff580000822c fffffabb9f5f8288 fffffabb9fa14848 ffff9f118586feb8
    BTRFS critical (device dm-1): corrupted leaf, root=136202 block=9806651031552 owner mismatch, have 174692946400338119 expect [256, 18446744073709551360]
    raw: 000000008eb49f38 ffff9f16ae564cb0 00000003ffffffff ffff9f211ab95000
    page dumped because: eb page dump
    BTRFS critical (device dm-1): corrupted leaf, root=136202 block=9806651031552 owner mismatch, have 174692946400338119 expect [256, 18446744073709551360]
    flags: 0x12ffff580000822c(referenced|uptodate|lru|workingset|private|node=2|zone=2|lastcpupid=0xffff)
    page_type: 0xffffffff()
    raw: 12ffff580000822c fffffabb9f5f8288 fffffabb9fa14848 ffff9f118586feb8
    raw: 000000008eb49f38 ffff9f16ae564cb0 00000003ffffffff ffff9f211ab95000
    page dumped because: eb page dump
    BTRFS critical (device dm-1): corrupted leaf, root=136202 block=9806651031552 owner mismatch, have 174692946400338119 expect [256, 18446744073709551360]

> > Here's my reproducer if you want to try it yourself.  It uses bfs, a
> > find(1) clone I wrote with multi-threading and io_uring support.  I'm
> > in the process of adding multi-threaded stat(), which is what I assume
> > triggers the bug.
> >
> >      $ git clone "https://github.com/tavianator/bfs"
> >      $ cd bfs
> >      $ git checkout euclean
> >      $ make release
> >
> > Then repeat these steps until it triggers:
> >
> >      # sysctl vm.drop_caches=3
> >      $ ./bin/bfs /mnt -links 100
> >      bfs: error: /mnt/slash/@/var/lib/docker/btrfs/subvolumes/f07d37d1c148e9fcdbae166a3a4de36eec49009ce651174d0921fab18d55cee6/dev/ram0: Structure needs cleaning.
>
> It looks like the mount point /mnt/ is pretty large with a lot of things
> pre-populated?

Right, /mnt contains a few filesystems.  /mnt/slash is my root fs (the
subvolume @ is mounted as /).  It's quite large, with over 41 million
files and 640 subvolumes.  It's a BTRFS RAID0 array on four 1TB NVMe
drives with LUKS encryption.

> I tried to populate the btrfs with my linux git repo (which is around
> 6.5G with some GC needed), but even 256 runs didn't hit the problem.
>
> The main part of the script looks like this:
>
> for (( i = 0; i < 256; i++ )); do
> 	mount $dev1 $mnt
> 	sysctl vm.drop_caches=3
> 	/home/adam/bfs/bin/bfs $mnt -links 100
> 	umount $mnt
> done
>
> And the device looks like this:
>
> /dev/mapper/test-scratch1  10485760  6472292   3679260  64% /mnt/btrfs

I also noticed that it seems easier to reproduce right after a reboot.
I failed to reproduce it this morning, but after a reboot it triggered
immediately.

> Although the difference is, I'm using btrfs/for-next branch
> (https://github.com/btrfs/linux/tree/for-next).
>
> Maybe it's missing some fixes not yet in upstream?
> My current guess is related to my commit 09e6cef19c9f ("btrfs: refactor
> alloc_extent_buffer() to allocate-then-attach method"), but since I can
> not reproduce it, it's only a guess...

That's possible!  I tried to follow the existing code in
alloc_extent_buffer() but didn't see any obvious races.  I will try again
with the for-next tree and report back.

> Thanks,
> Qu

--

Tavian Barnes
Tavian Barnes Feb. 6, 2024, 8:19 p.m. UTC | #6
On Tue, Feb 6, 2024 at 7:51 AM David Sterba <dsterba@suse.cz> wrote:
> On Mon, Feb 05, 2024 at 10:38:07PM -0500, tavianator@tavianator.com wrote:
> > On Sat, 27 Jan 2024 10:18:36 +1030, Qu Wenruo wrote:
>
> > Here's my reproducer if you want to try it yourself.  It uses bfs, a
> > find(1) clone I wrote with multi-threading and io_uring support.  I'm
>
> Do you use other fancy tech like io_uring? This itself can be a
> significant factor, other than config, host etc.

Nothing too fancy.  Actually it's reproducible without io_uring:

    $ make release USE_LIBURING=
    $ ./bin/bfs -j24 /mnt -links 100
    bfs: error: /mnt/slash/@home/tavianator/code/android/external/honggfuzz/examples/bind/corpus/9fb9c1f611ea4f2a92273d317a872a69.0000021c.honggfuzz.cov: Structure needs cleaning.
    bfs: error: /mnt/slash/@home/tavianator/code/android/external/honggfuzz/examples/bind/corpus/9f40657d9a60bac7a329c42339c31397.00004436.honggfuzz.cov: Structure needs cleaning.
    ...

The fs itself is four 1TB NVMe drives in BTRFS RAID0, stacked on top of
LUKS.  The LUKS devices have --perf-no_{read,write}_workqueue enabled.
Qu Wenruo Feb. 6, 2024, 8:34 p.m. UTC | #7
On 2024/2/6 23:16, David Sterba wrote:
> On Sat, Jan 27, 2024 at 10:18:36AM +1030, Qu Wenruo wrote:
>> [...]
>>
>> Link: https://marc.info/?l=linux-btrfs&m=170629708724284&w=2
>
> Please use a link to lore.kernel.org, this keeps the threading and the
> message id is in the url so it's possible to look it up elsewhere.

The problem is that lore.kernel.org was down at the time of writing,
which is why I had to use marc.info instead.

I just hope the infrastructure will be a little more stable than vger.

Thanks,
Qu
>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>
> For debugging the patch is useful, I'd say go on and add it.
>
> Reviewed-by: David Sterba <dsterba@suse.com>
>
Qu Wenruo Feb. 6, 2024, 8:39 p.m. UTC | #8
On 2024/2/7 06:42, Tavian Barnes wrote:
> On Tue, 6 Feb 2024 16:24:32 +1030, Qu Wenruo wrote:
>> On 2024/2/6 14:08, tavianator@tavianator.com wrote:
>>> Here's the corresponding dmesg output:
>>>
>>>       page:00000000789c68b4 refcount:4 mapcount:0 mapping:00000000ce99bfc3 index:0x7df93c74 pfn:0x1269558
>>>       memcg:ffff9f20d10df000
>>>       aops:btree_aops [btrfs] ino:1
>>>       flags: 0x12ffff180000820c(referenced|uptodate|workingset|private|node=2|zone=2|lastcpupid=0xffff)
>>>       page_type: 0xffffffff()
>>>       raw: 12ffff180000820c 0000000000000000 dead000000000122 ffff9f118586feb8
>>>       raw: 000000007df93c74 ffff9f2232376e80 00000004ffffffff ffff9f20d10df000
>>>       page dumped because: eb page dump
>>>       BTRFS critical (device dm-1): corrupted leaf, root=709 block=8656838410240 owner mismatch, have 2694891690930195334 expect [256, 18446744073709551360]
>>
>> The page index and eb->start matches page index, so that page attaching
>> part is correct.
>>
>> And the refcount is also 4, which matches the common case.
>>
>> Although I still need to check the extra flags for workingset.
>
> I did get some other splats with refcount:3, e.g.
>
>      page:000000005ca43abb refcount:3 mapcount:0 mapping:00000000ce99bfc3 index:0x8eb49f38 pfn:0x17e8520
>      page:000000005ca43abb refcount:3 mapcount:0 mapping:00000000ce99bfc3 index:0x8eb49f38 pfn:0x17e8520
>      memcg:ffff9f211ab95000
>      page:000000005ca43abb refcount:3 mapcount:0 mapping:00000000ce99bfc3 index:0x8eb49f38 pfn:0x17e8520
>      memcg:ffff9f211ab95000
>      page:000000005ca43abb refcount:3 mapcount:0 mapping:00000000ce99bfc3 index:0x8eb49f38 pfn:0x17e8520
>      memcg:ffff9f211ab95000
>      memcg:ffff9f211ab95000
>      page:000000005ca43abb refcount:3 mapcount:0 mapping:00000000ce99bfc3 index:0x8eb49f38 pfn:0x17e8520
>      memcg:ffff9f211ab95000
>      BTRFS critical (device dm-1): inode mode mismatch with dir: inode mode=042255 btrfs type=2 dir type=1
>      aops:btree_aops [btrfs] ino:1
>      aops:btree_aops [btrfs] ino:1
>      aops:btree_aops [btrfs] ino:1
>      aops:btree_aops [btrfs] ino:1
>      aops:btree_aops [btrfs] ino:1
>      flags: 0x12ffff580000822c(referenced|uptodate|lru|workingset|private|node=2|zone=2|lastcpupid=0xffff)
>      flags: 0x12ffff580000822c(referenced|uptodate|lru|workingset|private|node=2|zone=2|lastcpupid=0xffff)
>      page_type: 0xffffffff()
>      page_type: 0xffffffff()
>      raw: 12ffff580000822c fffffabb9f5f8288 fffffabb9fa14848 ffff9f118586feb8
>      raw: 12ffff580000822c fffffabb9f5f8288 fffffabb9fa14848 ffff9f118586feb8
>      raw: 000000008eb49f38 ffff9f16ae564cb0 00000003ffffffff ffff9f211ab95000
>      raw: 000000008eb49f38 ffff9f16ae564cb0 00000003ffffffff ffff9f211ab95000
>      flags: 0x12ffff580000822c(referenced|uptodate|lru|workingset|private|node=2|zone=2|lastcpupid=0xffff)
>      page dumped because: eb page dump
>      page dumped because: eb page dump
>      page_type: 0xffffffff()
>      BTRFS critical (device dm-1): corrupted leaf, root=136202 block=9806651031552 owner mismatch, have 174692946400338119 expect [256, 18446744073709551360]
>      BTRFS critical (device dm-1): corrupted leaf, root=136202 block=9806651031552 owner mismatch, have 174692946400338119 expect [256, 18446744073709551360]
>      flags: 0x12ffff580000822c(referenced|uptodate|lru|workingset|private|node=2|zone=2|lastcpupid=0xffff)
>      raw: 12ffff580000822c fffffabb9f5f8288 fffffabb9fa14848 ffff9f118586feb8
>      raw: 000000008eb49f38 ffff9f16ae564cb0 00000003ffffffff ffff9f211ab95000
>      page_type: 0xffffffff()
>      page dumped because: eb page dump
>      raw: 12ffff580000822c fffffabb9f5f8288 fffffabb9fa14848 ffff9f118586feb8
>      BTRFS critical (device dm-1): corrupted leaf, root=136202 block=9806651031552 owner mismatch, have 174692946400338119 expect [256, 18446744073709551360]
>      raw: 000000008eb49f38 ffff9f16ae564cb0 00000003ffffffff ffff9f211ab95000
>      page dumped because: eb page dump
>      BTRFS critical (device dm-1): corrupted leaf, root=136202 block=9806651031552 owner mismatch, have 174692946400338119 expect [256, 18446744073709551360]
>      flags: 0x12ffff580000822c(referenced|uptodate|lru|workingset|private|node=2|zone=2|lastcpupid=0xffff)
>      page_type: 0xffffffff()
>      raw: 12ffff580000822c fffffabb9f5f8288 fffffabb9fa14848 ffff9f118586feb8
>      raw: 000000008eb49f38 ffff9f16ae564cb0 00000003ffffffff ffff9f211ab95000
>      page dumped because: eb page dump
>      BTRFS critical (device dm-1): corrupted leaf, root=136202 block=9806651031552 owner mismatch, have 174692946400338119 expect [256, 18446744073709551360]
>
>>> Here's my reproducer if you want to try it yourself.  It uses bfs, a
>>> find(1) clone I wrote with multi-threading and io_uring support.  I'm
>>> in the process of adding multi-threaded stat(), which is what I assume
>>> triggers the bug.
>>>
>>>       $ git clone "https://github.com/tavianator/bfs"
>>>       $ cd bfs
>>>       $ git checkout euclean
>>>       $ make release
>>>
>>> Then repeat these steps until it triggers:
>>>
>>>       # sysctl vm.drop_caches=3
>>>       $ ./bin/bfs /mnt -links 100
>>>       bfs: error: /mnt/slash/@/var/lib/docker/btrfs/subvolumes/f07d37d1c148e9fcdbae166a3a4de36eec49009ce651174d0921fab18d55cee6/dev/ram0: Structure needs cleaning.
>>
>> It looks like the mount point /mnt/ is pretty large with a lot of things
>> pre-populated?
>
> Right, /mnt contains a few filesystems.  /mnt/slash is my root fs (the
> subvolume @ is mounted as /).  It's quite large, with over 41 million
> files and 640 subvolumes.  It's a BTRFS RAID0 array on 4 1TB NVMEs with
> LUKS encryption.
>
>> I tried to populate a btrfs with my Linux git repo (which is around
>> 6.5G, with some GC needed), but even 256 runs didn't hit the problem.
>>
>> The main part of the script looks like this:
>>
>> for (( i = 0; i < 256; i++ )); do
>> 	mount $dev1 $mnt
>> 	sysctl vm.drop_caches=3
>> 	/home/adam/bfs/bin/bfs $mnt -links 100
>> 	umount $mnt
>> done
>>
>> And the device looks like this:
>>
>> /dev/mapper/test-scratch1  10485760  6472292   3679260  64% /mnt/btrfs
>
> I also noticed that it seems easier to reproduce right after a reboot.
> I failed to reproduce it this morning, but after a reboot it triggered
> immediately.
>
>> One difference, though, is that I'm using the btrfs/for-next branch
>> (https://github.com/btrfs/linux/tree/for-next).
>>
>> Maybe it's missing some fixes not yet in upstream?
>> My current guess is that it's related to my commit 09e6cef19c9f ("btrfs: refactor
>> alloc_extent_buffer() to allocate-then-attach method"), but since I cannot
>> reproduce it, it's only a guess...
>
> That's possible!  I tried to follow the existing code in
> alloc_extent_buffer() but didn't see any obvious races.  I will try again
> with the for-next tree and report back.

The most obvious way to prove it is, if you can reproduce it really
reliably, to go back to that commit and verify that it still causes
the problem.
Then go to the commit just before it, and verify that it no longer
causes the problem.

Without a way to reproduce it locally, though, it's really hard to say
or debug anything from my end.

Thanks,
Qu
>
>> Thanks,
>> Qu
>
> --
>
> Tavian Barnes
Tavian Barnes Feb. 6, 2024, 9:48 p.m. UTC | #9
On Tue, Feb 6, 2024 at 3:40 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
> On 2024/2/7 06:42, Tavian Barnes wrote:
> > On Tue, 6 Feb 2024 16:24:32 +1030, Qu Wenruo wrote:
> >> Maybe it's missing some fixes not yet in upstream?
> >> My current guess is that it's related to my commit 09e6cef19c9f ("btrfs: refactor
> >> alloc_extent_buffer() to allocate-then-attach method"), but since I cannot
> >> reproduce it, it's only a guess...
> >
> > That's possible!  I tried to follow the existing code in
> > alloc_extent_buffer() but didn't see any obvious races.  I will try again
> > with the for-next tree and report back.
>
> The most obvious way to prove it is, if you can reproduce it really
> reliably, to go back to that commit and verify that it still causes
> the problem.
> Then go to the commit just before it, and verify that it no longer
> causes the problem.

I just tried the tip of btrfs/for-next (6a1dc34172e0, "btrfs: move
transaction abort to the error site btrfs_rebuild_free_space_tree()"),
plus the dump_page() patch, and it still reproduces:

    BTRFS critical (device dm-2): inode mode mismatch with dir: inode mode=0142721 btrfs type=6 dir type=1
    page:000000004209c922 refcount:4 mapcount:0 mapping:000000007cadbbf5 index:0x8379d17c pfn:0x13ca315
    memcg:ffff8f2cba7d0000
    aops:btree_aops [btrfs] ino:1
    flags: 0x12ffff180000820c(referenced|uptodate|workingset|private|node=2|zone=2|lastcpupid=0xffff)
    page_type: 0xffffffff()
    raw: 12ffff180000820c 0000000000000000 dead000000000122 ffff8f1d446218a0
    raw: 000000008379d17c ffff8f2faa26ea50 00000004ffffffff ffff8f2cba7d0000
    page dumped because: eb page dump
    BTRFS critical (device dm-2): corrupted leaf, root=518 block=9034951802880 owner mismatch, have 15999665770497355816 expect [256, 18446744073709551360]
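
The page dump can be cross-checked against the tree-checker line the same way Qu did for the first dump. As far as I understand the btree inode layout, its page cache is indexed by logical bytenr, so the first page of an extent buffer should sit at index eb->start >> PAGE_SHIFT. A minimal userspace sketch, assuming 4 KiB pages (the shift is hard-coded, not read from kernel headers; `SKETCH_PAGE_SHIFT` and `eb_page_index_matches()` are hypothetical names):

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, as on a default x86-64 kernel. */
#define SKETCH_PAGE_SHIFT 12

/*
 * Return true if a btree inode page at page-cache index @page_index is
 * where the first page of an extent buffer starting at @eb_start should
 * be, i.e. page_index << PAGE_SHIFT == eb_start.
 */
static bool eb_page_index_matches(uint64_t page_index, uint64_t eb_start)
{
	return (page_index << SKETCH_PAGE_SHIFT) == eb_start;
}
```

For the dump above, `eb_page_index_matches(0x8379d17c, 9034951802880)` is true, and so are the two earlier dumps in this thread (index 0x7df93c74 with block 8656838410240, index 0x8eb49f38 with block 9806651031552). That is consistent with the reading that the page is attached to the right eb and that the garbage comes from the page contents.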

Is it still worth trying that specific commit?  I'm guessing not.

> Without a way to reproduce it locally, though, it's really hard to say
> or debug anything from my end.

I can try to make a VM image reproducer if that would help.

> Thanks,
> Qu
Qu Wenruo Feb. 6, 2024, 9:53 p.m. UTC | #10
On 2024/2/7 08:18, Tavian Barnes wrote:
> On Tue, Feb 6, 2024 at 3:40 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>> On 2024/2/7 06:42, Tavian Barnes wrote:
>>> On Tue, 6 Feb 2024 16:24:32 +1030, Qu Wenruo wrote:
>>>> Maybe it's missing some fixes not yet in upstream?
>>>> My current guess is that it's related to my commit 09e6cef19c9f ("btrfs: refactor
>>>> alloc_extent_buffer() to allocate-then-attach method"), but since I cannot
>>>> reproduce it, it's only a guess...
>>>
>>> That's possible!  I tried to follow the existing code in
>>> alloc_extent_buffer() but didn't see any obvious races.  I will try again
>>> with the for-next tree and report back.
>>
>> The most obvious way to prove it is, if you can reproduce it really
>> reliably, to go back to that commit and verify that it still causes
>> the problem.
>> Then go to the commit just before it, and verify that it no longer
>> causes the problem.
>
> I just tried the tip of btrfs/for-next (6a1dc34172e0, "btrfs: move
> transaction abort to the error site btrfs_rebuild_free_space_tree()"),
> plus the dump_page() patch, and it still reproduces:
>
>      BTRFS critical (device dm-2): inode mode mismatch with dir: inode mode=0142721 btrfs type=6 dir type=1
>      page:000000004209c922 refcount:4 mapcount:0 mapping:000000007cadbbf5 index:0x8379d17c pfn:0x13ca315
>      memcg:ffff8f2cba7d0000
>      aops:btree_aops [btrfs] ino:1
>      flags: 0x12ffff180000820c(referenced|uptodate|workingset|private|node=2|zone=2|lastcpupid=0xffff)
>      page_type: 0xffffffff()
>      raw: 12ffff180000820c 0000000000000000 dead000000000122 ffff8f1d446218a0
>      raw: 000000008379d17c ffff8f2faa26ea50 00000004ffffffff ffff8f2cba7d0000
>      page dumped because: eb page dump
>      BTRFS critical (device dm-2): corrupted leaf, root=518 block=9034951802880 owner mismatch, have 15999665770497355816 expect [256, 18446744073709551360]
>
> Is it still worth trying that specific commit?  I'm guessing not.

Yes, it's still worth it.

btrfs/for-next contains that commit (which is already upstream).
That patch had some bugs fixed early (before it hit upstream),
but since it touches the whole memory management of tree blocks, it
is still the most likely culprit.

>
>> Without a way to reproduce it locally, though, it's really hard to say
>> or debug anything from my end.
>
> I can try to make a VM image reproducer if that would help.

That would help a lot.
That said, I guess the main factor is just the content/size of the target
fs, and transferring tens of gigabytes over the internet is never a
good experience.

Thanks,
Qu

>
>> Thanks,
>> Qu
>
Tavian Barnes Feb. 6, 2024, 9:53 p.m. UTC | #11
On Tue, Feb 6, 2024 at 4:48 PM Tavian Barnes <tavianator@tavianator.com> wrote:
> I just tried the tip of btrfs/for-next (6a1dc34172e0, "btrfs: move
> transaction abort to the error site btrfs_rebuild_free_space_tree()"),
> plus the dump_page() patch

By the way Qu, you should fold

@@ -2036,6 +2042,7 @@ int btrfs_check_eb_owner(const struct extent_buffer *eb, u64 root_owner)
        if (!is_subvol) {
                /* For non-subvolume trees, the eb owner should match root owner */
                if (unlikely(root_owner != eb_owner)) {
+                       dump_page(folio_page(eb->folios[0], 0), "eb page dump");
                        btrfs_crit(eb->fs_info,
 "corrupted %s, root=%llu block=%llu owner mismatch, have %llu expect %llu",
                                btrfs_header_level(eb) == 0 ? "leaf" : "node",
@@ -2051,6 +2058,7 @@ int btrfs_check_eb_owner(const struct extent_buffer *eb, u64 root_owner)
         * to subvolume trees.
         */
        if (unlikely(is_subvol != is_fstree(eb_owner))) {
+               dump_page(folio_page(eb->folios[0], 0), "eb page dump");
                btrfs_crit(eb->fs_info,
 "corrupted %s, root=%llu block=%llu owner mismatch, have %llu expect [%llu, %llu]",
                        btrfs_header_level(eb) == 0 ? "leaf" : "node",

into the patch.  Right now it's missing the dump_page() for the
relevant error message.
Qu Wenruo Feb. 6, 2024, 10:01 p.m. UTC | #12
On 2024/2/7 08:23, Tavian Barnes wrote:
> On Tue, Feb 6, 2024 at 4:48 PM Tavian Barnes <tavianator@tavianator.com> wrote:
>> I just tried the tip of btrfs/for-next (6a1dc34172e0, "btrfs: move
>> transaction abort to the error site btrfs_rebuild_free_space_tree()"),
>> plus the dump_page() patch
>
> By the way Qu, you should fold

Oh thanks, indeed I missed some call sites and relied only on the
*_err() helpers, so the eb owner check didn't trigger the page dump.

Thanks for the fix!
Qu

>
> @@ -2036,6 +2042,7 @@ int btrfs_check_eb_owner(const struct extent_buffer *eb, u64 root_owner)
>          if (!is_subvol) {
>                  /* For non-subvolume trees, the eb owner should match root owner */
>                  if (unlikely(root_owner != eb_owner)) {
> +                       dump_page(folio_page(eb->folios[0], 0), "eb page dump");
>                          btrfs_crit(eb->fs_info,
>   "corrupted %s, root=%llu block=%llu owner mismatch, have %llu expect %llu",
>                                  btrfs_header_level(eb) == 0 ? "leaf" : "node",
> @@ -2051,6 +2058,7 @@ int btrfs_check_eb_owner(const struct extent_buffer *eb, u64 root_owner)
>           * to subvolume trees.
>           */
>          if (unlikely(is_subvol != is_fstree(eb_owner))) {
> +               dump_page(folio_page(eb->folios[0], 0), "eb page dump");
>                  btrfs_crit(eb->fs_info,
>   "corrupted %s, root=%llu block=%llu owner mismatch, have %llu expect [%llu, %llu]",
>                          btrfs_header_level(eb) == 0 ? "leaf" : "node",
>
> into the patch.  Right now it's missing the dump_page() for the
> relevant error message.
>
Tavian Barnes Feb. 13, 2024, 6:07 p.m. UTC | #13
On Tue, Feb 6, 2024 at 4:53 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
> On 2024/2/7 08:18, Tavian Barnes wrote:
> > On Tue, Feb 6, 2024 at 3:40 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
> >> On 2024/2/7 06:42, Tavian Barnes wrote:
> >>> On Tue, 6 Feb 2024 16:24:32 +1030, Qu Wenruo wrote:
> >>>> Maybe it's missing some fixes not yet in upstream?
> >>>> My current guess is that it's related to my commit 09e6cef19c9f ("btrfs: refactor
> >>>> alloc_extent_buffer() to allocate-then-attach method"), but since I cannot
> >>>> reproduce it, it's only a guess...
> >>>
> >>> That's possible!  I tried to follow the existing code in
> >>> alloc_extent_buffer() but didn't see any obvious races.  I will try again
> >>> with the for-next tree and report back.
> >>
> >> The most obvious way to prove it is, if you can reproduce it really
> >> reliably, to go back to that commit and verify that it still causes
> >> the problem.
> >> Then go to the commit just before it, and verify that it no longer
> >> causes the problem.
> >
> > I just tried the tip of btrfs/for-next (6a1dc34172e0, "btrfs: move
> > transaction abort to the error site btrfs_rebuild_free_space_tree()"),
> > plus the dump_page() patch, and it still reproduces:
> >
> >      BTRFS critical (device dm-2): inode mode mismatch with dir: inode mode=0142721 btrfs type=6 dir type=1
> >      page:000000004209c922 refcount:4 mapcount:0 mapping:000000007cadbbf5 index:0x8379d17c pfn:0x13ca315
> >      memcg:ffff8f2cba7d0000
> >      aops:btree_aops [btrfs] ino:1
> >      flags: 0x12ffff180000820c(referenced|uptodate|workingset|private|node=2|zone=2|lastcpupid=0xffff)
> >      page_type: 0xffffffff()
> >      raw: 12ffff180000820c 0000000000000000 dead000000000122 ffff8f1d446218a0
> >      raw: 000000008379d17c ffff8f2faa26ea50 00000004ffffffff ffff8f2cba7d0000
> >      page dumped because: eb page dump
> >      BTRFS critical (device dm-2): corrupted leaf, root=518 block=9034951802880 owner mismatch, have 15999665770497355816 expect [256, 18446744073709551360]
> >
> > Is it still worth trying that specific commit?  I'm guessing not.
>
> Yes, it's still worth it.
>
> btrfs/for-next contains that commit (which is already upstream).
> That patch had some bugs fixed early (before it hit upstream),
> but since it touches the whole memory management of tree blocks, it
> is still the most likely culprit.

Ah okay, I see what you mean.  Unfortunately it still reproduces on
both that commit 09e6cef19c9f ("btrfs: refactor alloc_extent_buffer()
to allocate-then-attach method"), and the commit before it
2b0122aaa800 ("btrfs: sysfs: validate scrub_speed_max value").

I tried to bisect but I don't know where to start from.  It still
reproduces all the way back to v6.5, although with a different splat:

    RIP: 0010:btrfs_bin_search+0xd7/0x1d0 [btrfs]
    Code: c2 65 48 89 d0 25 ff 0f 00 00 48 83 c0 11 48 3d 00 10 00 00 0f 87 ae 00 00 00 48 89 d0 48 03 13 48 c1 e8 0c 81 e2 ff 0f 00 00 <48> 8b 44 c3 70 48 2b 05 35 3c c3 fa 48 c1 f8 06 48 c1 e0 0c 48 03
    RSP: 0018:ffffbd8d5d7537c0 EFLAGS: 00010206
    RAX: 000ffffffff9a33e RBX: ffff99872a9e6690 RCX: ffffbd8d5d753860
    RDX: 00000000000000d1 RSI: 0000000000000000 RDI: ffff99872a9e6690
    RBP: 0000000061c382ec R08: ffff99872a9e6690 R09: 0000083e36760000
    R10: ffffbd8d5d753758 R11: 0000000000001000 R12: 00000000c38705d9
    R13: 0000000000000000 R14: ffffbd8d5d75392f R15: 0000000000000021
    FS:  00007f3d627f96c0(0000) GS:ffff99a57ecc0000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00007f3d65db3010 CR3: 000000010fcd6000 CR4: 0000000000350ee0
    Call Trace:
     <TASK>
     ? die_addr+0x36/0x90
     ? exc_general_protection+0x1c5/0x430
     ? asm_exc_general_protection+0x26/0x30
     ? btrfs_bin_search+0xd7/0x1d0 [btrfs 698563e3c4412867d9f65411f4b3f353931d836b]
     btrfs_search_slot+0x458/0xd00 [btrfs 698563e3c4412867d9f65411f4b3f353931d836b]
     btrfs_lookup_inode+0x55/0xe0 [btrfs 698563e3c4412867d9f65411f4b3f353931d836b]
     btrfs_read_locked_inode+0x52a/0x610 [btrfs 698563e3c4412867d9f65411f4b3f353931d836b]
     btrfs_iget_path+0x93/0xe0 [btrfs 698563e3c4412867d9f65411f4b3f353931d836b]
     btrfs_lookup_dentry+0x394/0x630 [btrfs 698563e3c4412867d9f65411f4b3f353931d836b]
     ? d_alloc_parallel+0x230/0x3f0
     btrfs_lookup+0x12/0x30 [btrfs 698563e3c4412867d9f65411f4b3f353931d836b]
     __lookup_slow+0x86/0x130
     walk_component+0xdb/0x150
     path_lookupat+0x6a/0x1a0
     filename_lookup+0xe8/0x1f0
     vfs_statx+0x9e/0x180
     do_statx+0x66/0xb0
     io_statx+0x27/0x40
     io_issue_sqe+0x63/0x3c0
     io_wq_submit_work+0x89/0x2c0
     io_worker_handle_work+0x189/0x560
     io_wq_worker+0x10a/0x360
     ? srso_return_thunk+0x5/0x10
     ? __pfx_io_wq_worker+0x10/0x10
     ret_from_fork+0x34/0x50
     ? __pfx_io_wq_worker+0x10/0x10
     ret_from_fork_asm+0x1b/0x30
     </TASK>
    Modules linked in: xt_conntrack xt_comment veth rpcrdma rdma_cm
iw_cm ib_cm ib_core cmac algif_hash algif_skcipher nct6775 af_alg
nct6775_core hwmon_vid bnep lm92 nls_iso8859_1 vfat fat intel_rapl_msr
intel_rapl_common amd64_edac edac_mce_amd snd_hda_codec_hdmi kvm_amd
uvcvideo snd_hda_intel snd_intel_dspcfg uvc snd_intel_sdw_acpi
snd_usb_audio gspca_vc032x snd_hda_codec gspca_main btusb
snd_usbmidi_lib snd_hda_core kvm btrtl videobuf2_vmalloc snd_ump btbcm
videobuf2_memops snd_rawmidi snd_hwdep btintel videobuf2_v4l2
snd_seq_device btmtk videodev snd_pcm bluetooth mxm_wmi wmi_bmof
videobuf2_common snd_timer irqbypass rapl snd mc ecdh_generic pcspkr
acpi_cpufreq sp5100_tco crc16 soundcore i2c_piix4 k10temp mousedev
joydev wmi mac_hid nfsd auth_rpcgss nfs_acl lockd usbip_host
usbip_core pkcs8_key_parser grace i2c_dev sg sunrpc crypto_user fuse
loop btrfs blake2b_generic xor raid6_pq dm_crypt cbc encrypted_keys
trusted asn1_encoder tee xt_MASQUERADE xt_tcpudp xt_mark uas
usb_storage hid_logitech_hidpp dm_mod
     tun hid_logitech_dj usbhid crct10dif_pclmul crc32_pclmul
polyval_clmulni polyval_generic gf128mul ghash_clmulni_intel
sha512_ssse3 iwlwifi igb aesni_intel nvme crypto_simd ccp cryptd
sr_mod i2c_algo_bit nvme_core xhci_pci cdrom dca xhci_pci_renesas
bridge nf_tables stp llc ip6table_nat ip6table_filter ip6_tables
cfg80211 iptable_nat nf_nat nf_conntrack rfkill nf_defrag_ipv6
nf_defrag_ipv4 libcrc32c crc32c_generic crc32c_intel iptable_filter
nfnetlink ip_tables x_tables
    ---[ end trace 0000000000000000 ]---
    RIP: 0010:btrfs_bin_search+0xd7/0x1d0 [btrfs]
    Code: c2 65 48 89 d0 25 ff 0f 00 00 48 83 c0 11 48 3d 00 10 00 00 0f 87 ae 00 00 00 48 89 d0 48 03 13 48 c1 e8 0c 81 e2 ff 0f 00 00 <48> 8b 44 c3 70 48 2b 05 35 3c c3 fa 48 c1 f8 06 48 c1 e0 0c 48 03
    RSP: 0018:ffffbd8d5d7537c0 EFLAGS: 00010206
    RAX: 000ffffffff9a33e RBX: ffff99872a9e6690 RCX: ffffbd8d5d753860
    RDX: 00000000000000d1 RSI: 0000000000000000 RDI: ffff99872a9e6690
    RBP: 0000000061c382ec R08: ffff99872a9e6690 R09: 0000083e36760000
    R10: ffffbd8d5d753758 R11: 0000000000001000 R12: 00000000c38705d9
    R13: 0000000000000000 R14: ffffbd8d5d75392f R15: 0000000000000021
    FS:  00007f3d627f96c0(0000) GS:ffff99a57ecc0000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00007feb3489f960 CR3: 000000010fcd6000 CR4: 0000000000350ee0
    BTRFS critical (device dm-3): corrupted node, root=518 block=17613216952440067356 owner mismatch, have 16303017448389165215 expect [256, 18446744073709551360]
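
Every owner mismatch in this thread prints the same expected range, `[256, 18446744073709551360]`, whose upper bound is just 2^64 - 256 printed as an unsigned 64-bit number, so the printed range says little about how implausible the reported owners are. The commit message for this patch gives two tighter sanity checks: subvolume ids are limited to (1 << 48) - 1, and a real tree block bytenr is aligned. The sketch below applies both to the reported header values. It assumes a 4096-byte sectorsize, and it models the idea rather than reproducing the kernel's `is_fstree()` or tree-checker logic; the `SKETCH_*` macros and both helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Upper bound printed in the "expect [256, ...]" messages: 2^64 - 256. */
#define SKETCH_LAST_FREE_OBJECTID ((uint64_t)-256)
#define SKETCH_FIRST_FREE_OBJECTID 256ULL
/* Top-level subvolume (FS tree) id. */
#define SKETCH_FS_TREE_OBJECTID 5ULL
/* Per the commit message: subvolume ids are limited to (1 << 48) - 1. */
#define SKETCH_SUBVOL_ID_LIMIT ((1ULL << 48) - 1)
/* Assumption: 4 KiB sectorsize; tree block bytenrs are multiples of it. */
#define SKETCH_SECTORSIZE 4096ULL

/* Could @owner plausibly be a subvolume tree id? */
static bool plausible_subvol_owner(uint64_t owner)
{
	if (owner == SKETCH_FS_TREE_OBJECTID)
		return true;
	return owner >= SKETCH_FIRST_FREE_OBJECTID &&
	       owner <= SKETCH_SUBVOL_ID_LIMIT;
}

/* Could @bytenr plausibly be the start of a tree block? */
static bool plausible_tree_block_bytenr(uint64_t bytenr)
{
	return bytenr % SKETCH_SECTORSIZE == 0;
}
```

For the report above, block=17613216952440067356 is not even a multiple of 8, so it fails the alignment check, and the claimed owner 16303017448389165215 is far beyond the 48-bit limit. Both header fields look like garbage rather than a plausible but misdirected tree block.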
Tavian Barnes Feb. 13, 2024, 6:26 p.m. UTC | #14
On Tue, Feb 13, 2024 at 1:07 PM Tavian Barnes <tavianator@tavianator.com> wrote:
> I tried to bisect but I don't know where to start from.  It still
> reproduces all the way back to v6.5, although with a different splat:

Oh I missed part of the splat:

      general protection fault, probably for non-canonical address 0x7f99872a6b80f0: 0000 [#1] PREEMPT SMP NOPTI
      BTRFS critical (device dm-3): corrupted node, root=518 block=16000637395156534217 owner mismatch, have 12049901028372027545 expect [256, 18446744073709551360]
      CPU: 47 PID: 3729 Comm: iou-wrk-3310 Not tainted 6.5.0-euclean #10 4197dfd21e86f976fbd69cbd6a56016cf20d42e1
      Hardware name: Micro-Star International Co., Ltd. MS-7C60/TRX40 PRO WIFI (MS-7C60), BIOS 2.80 05/17/2022
>     RIP: 0010:btrfs_bin_search+0xd7/0x1d0 [btrfs]
>     ...

Normally I would suspect bad RAM for something like this, but I have
ECC memory and no reports of corrected errors.  0x7f99872a6b80f0
doesn't look like a single bit flip either.  I'm leaning towards it
being a race condition somewhere.
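
Both observations can be made mechanical. The sketch below assumes x86-64 with 4-level paging (48-bit virtual addresses; 5-level paging widens this to 57 bits), where a pointer is canonical only when bits 63..47 are all copies of bit 47, and treats a "single bit flip" as the faulting value differing from the intended one in exactly one bit. The helper names are illustrative, not kernel APIs.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumption: x86-64, 4-level paging. An address is canonical only if
 * bits 63..47 (17 bits) are all zero or all one.
 */
static bool is_canonical_48(uint64_t addr)
{
	uint64_t top = addr >> 47;	/* bits 63..47 */

	return top == 0 || top == 0x1ffff;
}

/* Number of bit positions in which @a and @b differ. */
static unsigned int bits_differing(uint64_t a, uint64_t b)
{
	uint64_t x = a ^ b;
	unsigned int n = 0;

	while (x) {
		x &= x - 1;	/* clear the lowest set bit */
		n++;
	}
	return n;
}

/* True if @observed equals @expected with exactly one bit flipped. */
static bool is_single_bit_flip(uint64_t observed, uint64_t expected)
{
	return bits_differing(observed, expected) == 1;
}
```

`is_canonical_48(0x7f99872a6b80f0)` is false, matching the kernel's own "probably for non-canonical address" note, while the kernel pointer in RBX (`0xffff99872a9e6690`) passes. The splat does not say which pointer was intended, so any `expected` value passed to `is_single_bit_flip()` is itself a guess; the helper only makes the comparison exact once a candidate is chosen.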

I'll run memtest anyway just to be sure.  I already ran btrfs check
with no errors.
Qu Wenruo Feb. 13, 2024, 9:26 p.m. UTC | #15
On 2024/2/14 04:37, Tavian Barnes wrote:
> On Tue, Feb 6, 2024 at 4:53 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
[...]
>>
>> Yes, it's still worth it.
>>
>> btrfs/for-next contains that commit (which is already upstream).
>> That patch had some bugs fixed early (before it hit upstream),
>> but since it touches the whole memory management of tree blocks, it
>> is still the most likely culprit.
> 
> Ah okay, I see what you mean.  Unfortunately it still reproduces on
> both that commit 09e6cef19c9f ("btrfs: refactor alloc_extent_buffer()
> to allocate-then-attach method"), and the commit before it
> 2b0122aaa800 ("btrfs: sysfs: validate scrub_speed_max value").

At least we have one less thing to worry about.

> 
> I tried to bisect but I don't know where to start from.  It still
> reproduces all the way back to v6.5, although with a different splat:
>       general protection fault, probably for non-canonical address 0x7f99872a6b80f0: 0000 [#1] PREEMPT SMP NOPTI
>       BTRFS critical (device dm-3): corrupted node, root=518 block=16000637395156534217 owner mismatch, have 12049901028372027545 expect [256, 18446744073709551360]
>       CPU: 47 PID: 3729 Comm: iou-wrk-3310 Not tainted 6.5.0-euclean #10 4197dfd21e86f976fbd69cbd6a56016cf20d42e1
>       Hardware name: Micro-Star International Co., Ltd. MS-7C60/TRX40 PRO WIFI (MS-7C60), BIOS 2.80 05/17/2022

Since it's a Threadripper that supports ECC memory, and you're already using
ECC memory, I don't believe it's a hardware problem.

Furthermore, Linus himself also hit it once, so it must be something
related to our extent buffer memory management.

The call trace is different, but I believe the problem is the same.
For now, unfortunately, I don't have many clues.

The only recommendation I have is to try version by version; if the
problem persists even at v6.0, I believe we have a bigger problem.

I can add some extra trace_printk() calls for you to test.
Before that, please give me some time to craft a debug patch; meanwhile,
feel free to try older kernels back to v6.0.

I really appreciate not only your report but all the effort.

Thanks,
Qu


Patch

diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c
index 4fa95eca285e..c8fbcae4e88e 100644
--- a/fs/btrfs/tree-checker.c
+++ b/fs/btrfs/tree-checker.c
@@ -65,6 +65,7 @@  static void generic_err(const struct extent_buffer *eb, int slot,
 	vaf.fmt = fmt;
 	vaf.va = &args;
 
+	dump_page(folio_page(eb->folios[0], 0), "eb page dump");
 	btrfs_crit(fs_info,
 		"corrupt %s: root=%llu block=%llu slot=%d, %pV",
 		btrfs_header_level(eb) == 0 ? "leaf" : "node",
@@ -92,6 +93,7 @@  static void file_extent_err(const struct extent_buffer *eb, int slot,
 	vaf.fmt = fmt;
 	vaf.va = &args;
 
+	dump_page(folio_page(eb->folios[0], 0), "eb page dump");
 	btrfs_crit(fs_info,
 	"corrupt %s: root=%llu block=%llu slot=%d ino=%llu file_offset=%llu, %pV",
 		btrfs_header_level(eb) == 0 ? "leaf" : "node",
@@ -152,6 +154,7 @@  static void dir_item_err(const struct extent_buffer *eb, int slot,
 	vaf.fmt = fmt;
 	vaf.va = &args;
 
+	dump_page(folio_page(eb->folios[0], 0), "eb page dump");
 	btrfs_crit(fs_info,
 		"corrupt %s: root=%llu block=%llu slot=%d ino=%llu, %pV",
 		btrfs_header_level(eb) == 0 ? "leaf" : "node",
@@ -647,6 +650,7 @@  static void block_group_err(const struct extent_buffer *eb, int slot,
 	vaf.fmt = fmt;
 	vaf.va = &args;
 
+	dump_page(folio_page(eb->folios[0], 0), "eb page dump");
 	btrfs_crit(fs_info,
 	"corrupt %s: root=%llu block=%llu slot=%d bg_start=%llu bg_len=%llu, %pV",
 		btrfs_header_level(eb) == 0 ? "leaf" : "node",
@@ -1003,6 +1007,7 @@  static void dev_item_err(const struct extent_buffer *eb, int slot,
 	vaf.fmt = fmt;
 	vaf.va = &args;
 
+	dump_page(folio_page(eb->folios[0], 0), "eb page dump");
 	btrfs_crit(eb->fs_info,
 	"corrupt %s: root=%llu block=%llu slot=%d devid=%llu %pV",
 		btrfs_header_level(eb) == 0 ? "leaf" : "node",
@@ -1258,6 +1263,7 @@  static void extent_err(const struct extent_buffer *eb, int slot,
 	vaf.fmt = fmt;
 	vaf.va = &args;
 
+	dump_page(folio_page(eb->folios[0], 0), "eb page dump");
 	btrfs_crit(eb->fs_info,
 	"corrupt %s: block=%llu slot=%d extent bytenr=%llu len=%llu %pV",
 		btrfs_header_level(eb) == 0 ? "leaf" : "node",