diff mbox series

btrfs: zoned: prevent extent buffer header to be cleared

Message ID 3f4f2a0ff1a6c818050434288925bdcf3cd719e5.1709124777.git.naohiro.aota@wdc.com (mailing list archive)
State New, archived
Headers show
Series btrfs: zoned: prevent extent buffer header to be cleared | expand

Commit Message

Naohiro Aota Feb. 28, 2024, 12:53 p.m. UTC
Btrfs clears the content of an extent buffer marked as
EXTENT_BUFFER_ZONED_ZEROOUT before the bio submission. This mechanism is
introduced to prevent a write hole of an extent buffer, which is once
allocated, marked dirty, but turns out unnecessary and cleaned up within
one transaction operation.

However, btrfs_free_tree_block() can be called on an extent buffer after
its content is cleared. Then, it inserts a faulty delayed reference entry,
which makes the FS corrupted.

This bug can be triggered running generic/013 several (~200) times. It
failed as following:

    ------------[ cut here ]------------
    WARNING: CPU: 9 PID: 29834 at fs/btrfs/extent-tree.c:3248 __btrfs_free_extent.isra.0+0x604/0x1330 [btrfs]
    Modules linked in: dm_flakey algif_hash af_alg xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype iptable_filter ip_tables br_netfilter bridge stp llc overlay sunrpc kvm_amd kvm irqbypass rapl acpi_cpufreq ipmi_ssif i2c_piix4 k10temp btrfs ipmi_si blake2b_generic ipmi_devintf xor ipmi_msghandler raid6_pq bfq loop dm_mod zram bnxt_en ccp pkcs8_key_parser asn1_decoder public_key oid_registry fuse msr ipv6
    CPU: 9 PID: 29834 Comm: fsstress Not tainted 6.8.0-rc4-BTRFS-ZNS+ #403
    Hardware name: Supermicro Super Server/H12SSL-NT, BIOS 2.0 02/22/2021
    RIP: 0010:__btrfs_free_extent.isra.0+0x604/0x1330 [btrfs]
    Code: 8b 3f e8 bf 69 00 00 48 8b 7d 60 45 8b 4f 40 49 89 d8 8b 54 24 40 4c 89 e9 48 c7 c6 30 64 65 a0 e8 61 fb 0d 00 e9 8f fd ff ff <0f> 0b f0 48 0f ba 28 02 41 b8 00 00 00 00 0f 83 86 04 00 00 b9 8b
    RSP: 0018:ffffc900090cfb80 EFLAGS: 00010246
    RAX: ffff888365c719d8 RBX: 0000000f9677c000 RCX: 0000000000000001
    RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000000000
    RBP: ffff8889a044b220 R08: 0000000000000000 R09: 0000000000000004
    R10: 0000000000000000 R11: 00000000ffffffff R12: 0000000000000001
    R13: ffff888ad87a4c98 R14: 0000000000000005 R15: ffff888a0c7d2a80
    FS:  00007f823f5f7740(0000) GS:ffff889fcea40000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000560ce0610b38 CR3: 0000000a907ec000 CR4: 0000000000350ef0
    Call Trace:
     <TASK>
     ? __btrfs_free_extent.isra.0+0x604/0x1330 [btrfs]
     ? __warn+0x81/0x170
     ? __btrfs_free_extent.isra.0+0x604/0x1330 [btrfs]
     ? report_bug+0x18d/0x1c0
     ? handle_bug+0x3c/0x70
     ? exc_invalid_op+0x13/0x60
     ? asm_exc_invalid_op+0x16/0x20
     ? __btrfs_free_extent.isra.0+0x604/0x1330 [btrfs]
     ? srso_return_thunk+0x5/0x5f
     ? rcu_is_watching+0xd/0x40
     ? srso_return_thunk+0x5/0x5f
     ? lock_release+0x1e5/0x280
     __btrfs_run_delayed_refs+0x64c/0x1380 [btrfs]
     ? btrfs_commit_transaction+0x3e/0x12d0 [btrfs]
     btrfs_run_delayed_refs+0x92/0x130 [btrfs]
     btrfs_commit_transaction+0xa2/0x12d0 [btrfs]
     ? srso_return_thunk+0x5/0x5f
     ? srso_return_thunk+0x5/0x5f
     ? rcu_is_watching+0xd/0x40
     ? srso_return_thunk+0x5/0x5f
     ? lock_release+0x1e5/0x280
     btrfs_sync_file+0x532/0x660 [btrfs]
     __x64_sys_fsync+0x37/0x60
     do_syscall_64+0x79/0x1a0
     entry_SYSCALL_64_after_hwframe+0x6e/0x76
    RIP: 0033:0x7f823f6f8400
    Code: 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 80 3d e1 d1 0d 00 00 74 17 b8 4a 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 48 c3 0f 1f 80 00 00 00 00 48 83 ec 18 89 7c
    RSP: 002b:00007ffe3c26e9f8 EFLAGS: 00000202 ORIG_RAX: 000000000000004a
    RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f823f6f8400
    RDX: 0000000000000193 RSI: 0000560cdfcdb048 RDI: 0000000000000003
    RBP: 00000000000002e6 R08: 0000000000000007 R09: 00007ffe3c26ea0c
    R10: 0000000000000000 R11: 0000000000000202 R12: 00007ffe3c26ea10
    R13: 028f5c28f5c28f5c R14: 8f5c28f5c28f5c29 R15: 0000560cdfcd7180
     </TASK>
    irq event stamp: 0
    hardirqs last  enabled at (0): [<0000000000000000>] 0x0
    hardirqs last disabled at (0): [<ffffffff810e5e0e>] copy_process+0xb0e/0x1e00
    softirqs last  enabled at (0): [<ffffffff810e5e0e>] copy_process+0xb0e/0x1e00
    softirqs last disabled at (0): [<0000000000000000>] 0x0
    ---[ end trace 0000000000000000 ]---
    ------------[ cut here ]------------
    BTRFS: Transaction aborted (error -117)
    WARNING: CPU: 9 PID: 29834 at fs/btrfs/extent-tree.c:3249 __btrfs_free_extent.isra.0+0xf8e/0x1330 [btrfs]
    Modules linked in: dm_flakey algif_hash af_alg xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype iptable_filter ip_tables br_netfilter bridge stp llc overlay sunrpc kvm_amd kvm irqbypass rapl acpi_cpufreq ipmi_ssif i2c_piix4 k10temp btrfs ipmi_si blake2b_generic ipmi_devintf xor ipmi_msghandler raid6_pq bfq loop dm_mod zram bnxt_en ccp pkcs8_key_parser asn1_decoder public_key oid_registry fuse msr ipv6
    CPU: 9 PID: 29834 Comm: fsstress Tainted: G        W          6.8.0-rc4-BTRFS-ZNS+ #403
    Hardware name: Supermicro Super Server/H12SSL-NT, BIOS 2.0 02/22/2021
    RIP: 0010:__btrfs_free_extent.isra.0+0xf8e/0x1330 [btrfs]
    Code: 48 c7 c6 40 5d 65 a0 e8 f0 f1 0d 00 c7 44 24 18 01 00 00 00 e9 ed f7 ff ff be 8b ff ff ff 48 c7 c7 68 5d 65 a0 e8 52 69 c1 e0 <0f> 0b e9 30 fb ff ff 48 8b 45 60 48 05 d8 19 00 00 f0 48 0f ba 28
    RSP: 0018:ffffc900090cfb80 EFLAGS: 00010282
    RAX: 0000000000000000 RBX: 0000000f9677c000 RCX: 0000000000000000
    RDX: 0000000000000002 RSI: ffffffff82464302 RDI: 00000000ffffffff
    RBP: ffff8889a044b220 R08: 0000000000009ffb R09: 00000000ffffdfff
    R10: 00000000ffffdfff R11: ffffffff8264dd80 R12: 0000000000000001
    R13: ffff888ad87a4c98 R14: 0000000000000005 R15: ffff888a0c7d2a80
    FS:  00007f823f5f7740(0000) GS:ffff889fcea40000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000560ce0610b38 CR3: 0000000a907ec000 CR4: 0000000000350ef0
    Call Trace:
     <TASK>
     ? __btrfs_free_extent.isra.0+0xf8e/0x1330 [btrfs]
     ? __warn+0x81/0x170
     ? __btrfs_free_extent.isra.0+0xf8e/0x1330 [btrfs]
     ? report_bug+0x18d/0x1c0
     ? tick_nohz_tick_stopped+0x12/0x30
     ? srso_return_thunk+0x5/0x5f
     ? handle_bug+0x3c/0x70
     ? exc_invalid_op+0x13/0x60
     ? asm_exc_invalid_op+0x16/0x20
     ? __btrfs_free_extent.isra.0+0xf8e/0x1330 [btrfs]
     ? srso_return_thunk+0x5/0x5f
     ? rcu_is_watching+0xd/0x40
     ? srso_return_thunk+0x5/0x5f
     ? lock_release+0x1e5/0x280
     __btrfs_run_delayed_refs+0x64c/0x1380 [btrfs]
     ? btrfs_commit_transaction+0x3e/0x12d0 [btrfs]
     btrfs_run_delayed_refs+0x92/0x130 [btrfs]
     btrfs_commit_transaction+0xa2/0x12d0 [btrfs]
     ? srso_return_thunk+0x5/0x5f
     ? srso_return_thunk+0x5/0x5f
     ? rcu_is_watching+0xd/0x40
     ? srso_return_thunk+0x5/0x5f
     ? lock_release+0x1e5/0x280
     btrfs_sync_file+0x532/0x660 [btrfs]
     __x64_sys_fsync+0x37/0x60
     do_syscall_64+0x79/0x1a0
     entry_SYSCALL_64_after_hwframe+0x6e/0x76
    RIP: 0033:0x7f823f6f8400
    Code: 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 80 3d e1 d1 0d 00 00 74 17 b8 4a 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 48 c3 0f 1f 80 00 00 00 00 48 83 ec 18 89 7c
    RSP: 002b:00007ffe3c26e9f8 EFLAGS: 00000202 ORIG_RAX: 000000000000004a
    RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f823f6f8400
    RDX: 0000000000000193 RSI: 0000560cdfcdb048 RDI: 0000000000000003
    RBP: 00000000000002e6 R08: 0000000000000007 R09: 00007ffe3c26ea0c
    R10: 0000000000000000 R11: 0000000000000202 R12: 00007ffe3c26ea10
    R13: 028f5c28f5c28f5c R14: 8f5c28f5c28f5c29 R15: 0000560cdfcd7180
     </TASK>
    irq event stamp: 0
    hardirqs last  enabled at (0): [<0000000000000000>] 0x0
    hardirqs last disabled at (0): [<ffffffff810e5e0e>] copy_process+0xb0e/0x1e00
    softirqs last  enabled at (0): [<ffffffff810e5e0e>] copy_process+0xb0e/0x1e00
    softirqs last disabled at (0): [<0000000000000000>] 0x0
    ---[ end trace 0000000000000000 ]---
    BTRFS: error (device nvme1n2: state A) in __btrfs_free_extent:3249: errno=-117 Filesystem corrupted
    BTRFS info (device nvme1n2: state EA): forced readonly
    BTRFS info (device nvme1n2: state EA): leaf 66957836288 gen 3873 total ptrs 203 free space 1102 owner 2
    BTRFS info (device nvme1n2: state EA): refs 2 lock_owner 29834 current 29834
            item 0 key (63394947072 168 40960) itemoff 16230 itemsize 53
                    extent refs 1 gen 3835 flags 1
                    ref#0: extent data backref root 5 objectid 552 offset 1802240 count 1
(snip)...
            item 164 key (66948923392 169 0) itemoff 8229 itemsize 33
                    extent refs 1 gen 3872 flags 2
                    ref#0: tree block backref root 2
            item 165 key (66948939776 169 1) itemoff 8196 itemsize 33
                    extent refs 1 gen 3873 flags 2
                    ref#0: tree block backref root 5
            item 166 key (68719476736 168 110592) itemoff 8143 itemsize 53
                    extent refs 1 gen 3841 flags 1
                    ref#0: extent data backref root 5 objectid 440 offset 3100672 count 1
(snip)...
            item 202 key (68722249728 168 110592) itemoff 6177 itemsize 53
                    extent refs 1 gen 3842 flags 1
                    ref#0: extent data backref root 5 objectid 953 offset 5431296 count 1
    BTRFS critical (device nvme1n2: state EA): unable to find ref byte nr 66948939776 parent 66948939776 root 5 owner 0 offset 0 slot 166
    BTRFS error (device nvme1n2: state EA): failed to run delayed ref for logical 66948939776 num_bytes 16384 type 182 action 2 ref_mod 1: -2
    BTRFS: error (device nvme1n2: state EA) in btrfs_run_delayed_refs:2246: errno=-2 No such entry
    BTRFS warning (device nvme1n2: state EA): Skipping commit of aborted transaction.
    BTRFS: error (device nvme1n2: state EA) in cleanup_transaction:2006: errno=-2 No such entry

This happens maybe because clearing the contents is too early. It should
clear the content after all the reference to the node is dropped.

Addressing that root cause needs more time. Until that, leave the extent
buffer header intact, so that we can add a proper delayed reference entry.

Fixes: aa6313e6ff2b ("btrfs: zoned: don't clear dirty flag of extent buffer")
Link: https://lore.kernel.org/linux-btrfs/oadvdekkturysgfgi4qzuemd57zudeasynswurjxw3ocdfsef6@sjyufeugh63f/
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 fs/btrfs/disk-io.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Comments

David Sterba Feb. 28, 2024, 1:22 p.m. UTC | #1
On Wed, Feb 28, 2024 at 09:53:03PM +0900, Naohiro Aota wrote:
> Btrfs clears the content of an extent buffer marked as
> EXTENT_BUFFER_ZONED_ZEROOUT before the bio submission. This mechanism is
> introduced to prevent a write hole of an extent buffer, which is once
> allocated, marked dirty, but turns out unnecessary and cleaned up within
> one transaction operation.
> 
> However, btrfs_free_tree_block() can be called on an extent buffer after
> its content is cleared. Then, it inserts a faulty delayed reference entry,
> which makes the FS corrupted.
> 
> This bug can be triggered running generic/013 several (~200) times. It
> failed as following:
> 
>     ------------[ cut here ]------------
>     WARNING: CPU: 9 PID: 29834 at fs/btrfs/extent-tree.c:3248 __btrfs_free_extent.isra.0+0x604/0x1330 [btrfs]
>     Modules linked in: dm_flakey algif_hash af_alg xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype iptable_filter ip_tables br_netfilter bridge stp llc overlay sunrpc kvm_amd kvm irqbypass rapl acpi_cpufreq ipmi_ssif i2c_piix4 k10temp btrfs ipmi_si blake2b_generic ipmi_devintf xor ipmi_msghandler raid6_pq bfq loop dm_mod zram bnxt_en ccp pkcs8_key_parser asn1_decoder public_key oid_registry fuse msr ipv6
>     CPU: 9 PID: 29834 Comm: fsstress Not tainted 6.8.0-rc4-BTRFS-ZNS+ #403
>     Hardware name: Supermicro Super Server/H12SSL-NT, BIOS 2.0 02/22/2021
>     RIP: 0010:__btrfs_free_extent.isra.0+0x604/0x1330 [btrfs]
>     Code: 8b 3f e8 bf 69 00 00 48 8b 7d 60 45 8b 4f 40 49 89 d8 8b 54 24 40 4c 89 e9 48 c7 c6 30 64 65 a0 e8 61 fb 0d 00 e9 8f fd ff ff <0f> 0b f0 48 0f ba 28 02 41 b8 00 00 00 00 0f 83 86 04 00 00 b9 8b
>     RSP: 0018:ffffc900090cfb80 EFLAGS: 00010246
>     RAX: ffff888365c719d8 RBX: 0000000f9677c000 RCX: 0000000000000001
>     RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000000000
>     RBP: ffff8889a044b220 R08: 0000000000000000 R09: 0000000000000004
>     R10: 0000000000000000 R11: 00000000ffffffff R12: 0000000000000001
>     R13: ffff888ad87a4c98 R14: 0000000000000005 R15: ffff888a0c7d2a80
>     FS:  00007f823f5f7740(0000) GS:ffff889fcea40000(0000) knlGS:0000000000000000
>     CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>     CR2: 0000560ce0610b38 CR3: 0000000a907ec000 CR4: 0000000000350ef0
>     Call Trace:
>      <TASK>
>      ? __btrfs_free_extent.isra.0+0x604/0x1330 [btrfs]
>      ? __warn+0x81/0x170
>      ? __btrfs_free_extent.isra.0+0x604/0x1330 [btrfs]
>      ? report_bug+0x18d/0x1c0
>      ? handle_bug+0x3c/0x70
>      ? exc_invalid_op+0x13/0x60
>      ? asm_exc_invalid_op+0x16/0x20
>      ? __btrfs_free_extent.isra.0+0x604/0x1330 [btrfs]
>      ? srso_return_thunk+0x5/0x5f
>      ? rcu_is_watching+0xd/0x40
>      ? srso_return_thunk+0x5/0x5f
>      ? lock_release+0x1e5/0x280
>      __btrfs_run_delayed_refs+0x64c/0x1380 [btrfs]
>      ? btrfs_commit_transaction+0x3e/0x12d0 [btrfs]
>      btrfs_run_delayed_refs+0x92/0x130 [btrfs]
>      btrfs_commit_transaction+0xa2/0x12d0 [btrfs]
>      ? srso_return_thunk+0x5/0x5f
>      ? srso_return_thunk+0x5/0x5f
>      ? rcu_is_watching+0xd/0x40
>      ? srso_return_thunk+0x5/0x5f
>      ? lock_release+0x1e5/0x280
>      btrfs_sync_file+0x532/0x660 [btrfs]
>      __x64_sys_fsync+0x37/0x60
>      do_syscall_64+0x79/0x1a0
>      entry_SYSCALL_64_after_hwframe+0x6e/0x76
>     RIP: 0033:0x7f823f6f8400
>     Code: 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 80 3d e1 d1 0d 00 00 74 17 b8 4a 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 48 c3 0f 1f 80 00 00 00 00 48 83 ec 18 89 7c
>     RSP: 002b:00007ffe3c26e9f8 EFLAGS: 00000202 ORIG_RAX: 000000000000004a
>     RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f823f6f8400
>     RDX: 0000000000000193 RSI: 0000560cdfcdb048 RDI: 0000000000000003
>     RBP: 00000000000002e6 R08: 0000000000000007 R09: 00007ffe3c26ea0c
>     R10: 0000000000000000 R11: 0000000000000202 R12: 00007ffe3c26ea10
>     R13: 028f5c28f5c28f5c R14: 8f5c28f5c28f5c29 R15: 0000560cdfcd7180
>      </TASK>
>     irq event stamp: 0
>     hardirqs last  enabled at (0): [<0000000000000000>] 0x0
>     hardirqs last disabled at (0): [<ffffffff810e5e0e>] copy_process+0xb0e/0x1e00
>     softirqs last  enabled at (0): [<ffffffff810e5e0e>] copy_process+0xb0e/0x1e00
>     softirqs last disabled at (0): [<0000000000000000>] 0x0
>     ---[ end trace 0000000000000000 ]---
>     ------------[ cut here ]------------
>     BTRFS: Transaction aborted (error -117)
>     WARNING: CPU: 9 PID: 29834 at fs/btrfs/extent-tree.c:3249 __btrfs_free_extent.isra.0+0xf8e/0x1330 [btrfs]
>     Modules linked in: dm_flakey algif_hash af_alg xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype iptable_filter ip_tables br_netfilter bridge stp llc overlay sunrpc kvm_amd kvm irqbypass rapl acpi_cpufreq ipmi_ssif i2c_piix4 k10temp btrfs ipmi_si blake2b_generic ipmi_devintf xor ipmi_msghandler raid6_pq bfq loop dm_mod zram bnxt_en ccp pkcs8_key_parser asn1_decoder public_key oid_registry fuse msr ipv6
>     CPU: 9 PID: 29834 Comm: fsstress Tainted: G        W          6.8.0-rc4-BTRFS-ZNS+ #403
>     Hardware name: Supermicro Super Server/H12SSL-NT, BIOS 2.0 02/22/2021
>     RIP: 0010:__btrfs_free_extent.isra.0+0xf8e/0x1330 [btrfs]
>     Code: 48 c7 c6 40 5d 65 a0 e8 f0 f1 0d 00 c7 44 24 18 01 00 00 00 e9 ed f7 ff ff be 8b ff ff ff 48 c7 c7 68 5d 65 a0 e8 52 69 c1 e0 <0f> 0b e9 30 fb ff ff 48 8b 45 60 48 05 d8 19 00 00 f0 48 0f ba 28
>     RSP: 0018:ffffc900090cfb80 EFLAGS: 00010282
>     RAX: 0000000000000000 RBX: 0000000f9677c000 RCX: 0000000000000000
>     RDX: 0000000000000002 RSI: ffffffff82464302 RDI: 00000000ffffffff
>     RBP: ffff8889a044b220 R08: 0000000000009ffb R09: 00000000ffffdfff
>     R10: 00000000ffffdfff R11: ffffffff8264dd80 R12: 0000000000000001
>     R13: ffff888ad87a4c98 R14: 0000000000000005 R15: ffff888a0c7d2a80
>     FS:  00007f823f5f7740(0000) GS:ffff889fcea40000(0000) knlGS:0000000000000000
>     CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>     CR2: 0000560ce0610b38 CR3: 0000000a907ec000 CR4: 0000000000350ef0
>     Call Trace:
>      <TASK>
>      ? __btrfs_free_extent.isra.0+0xf8e/0x1330 [btrfs]
>      ? __warn+0x81/0x170
>      ? __btrfs_free_extent.isra.0+0xf8e/0x1330 [btrfs]
>      ? report_bug+0x18d/0x1c0
>      ? tick_nohz_tick_stopped+0x12/0x30
>      ? srso_return_thunk+0x5/0x5f
>      ? handle_bug+0x3c/0x70
>      ? exc_invalid_op+0x13/0x60
>      ? asm_exc_invalid_op+0x16/0x20
>      ? __btrfs_free_extent.isra.0+0xf8e/0x1330 [btrfs]
>      ? srso_return_thunk+0x5/0x5f
>      ? rcu_is_watching+0xd/0x40
>      ? srso_return_thunk+0x5/0x5f
>      ? lock_release+0x1e5/0x280
>      __btrfs_run_delayed_refs+0x64c/0x1380 [btrfs]
>      ? btrfs_commit_transaction+0x3e/0x12d0 [btrfs]
>      btrfs_run_delayed_refs+0x92/0x130 [btrfs]
>      btrfs_commit_transaction+0xa2/0x12d0 [btrfs]
>      ? srso_return_thunk+0x5/0x5f
>      ? srso_return_thunk+0x5/0x5f
>      ? rcu_is_watching+0xd/0x40
>      ? srso_return_thunk+0x5/0x5f
>      ? lock_release+0x1e5/0x280
>      btrfs_sync_file+0x532/0x660 [btrfs]
>      __x64_sys_fsync+0x37/0x60
>      do_syscall_64+0x79/0x1a0
>      entry_SYSCALL_64_after_hwframe+0x6e/0x76
>     RIP: 0033:0x7f823f6f8400
>     Code: 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 80 3d e1 d1 0d 00 00 74 17 b8 4a 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 48 c3 0f 1f 80 00 00 00 00 48 83 ec 18 89 7c
>     RSP: 002b:00007ffe3c26e9f8 EFLAGS: 00000202 ORIG_RAX: 000000000000004a
>     RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f823f6f8400
>     RDX: 0000000000000193 RSI: 0000560cdfcdb048 RDI: 0000000000000003
>     RBP: 00000000000002e6 R08: 0000000000000007 R09: 00007ffe3c26ea0c
>     R10: 0000000000000000 R11: 0000000000000202 R12: 00007ffe3c26ea10
>     R13: 028f5c28f5c28f5c R14: 8f5c28f5c28f5c29 R15: 0000560cdfcd7180
>      </TASK>
>     irq event stamp: 0
>     hardirqs last  enabled at (0): [<0000000000000000>] 0x0
>     hardirqs last disabled at (0): [<ffffffff810e5e0e>] copy_process+0xb0e/0x1e00
>     softirqs last  enabled at (0): [<ffffffff810e5e0e>] copy_process+0xb0e/0x1e00
>     softirqs last disabled at (0): [<0000000000000000>] 0x0
>     ---[ end trace 0000000000000000 ]---
>     BTRFS: error (device nvme1n2: state A) in __btrfs_free_extent:3249: errno=-117 Filesystem corrupted
>     BTRFS info (device nvme1n2: state EA): forced readonly
>     BTRFS info (device nvme1n2: state EA): leaf 66957836288 gen 3873 total ptrs 203 free space 1102 owner 2
>     BTRFS info (device nvme1n2: state EA): refs 2 lock_owner 29834 current 29834
>             item 0 key (63394947072 168 40960) itemoff 16230 itemsize 53
>                     extent refs 1 gen 3835 flags 1
>                     ref#0: extent data backref root 5 objectid 552 offset 1802240 count 1
> (snip)...
>             item 164 key (66948923392 169 0) itemoff 8229 itemsize 33
>                     extent refs 1 gen 3872 flags 2
>                     ref#0: tree block backref root 2
>             item 165 key (66948939776 169 1) itemoff 8196 itemsize 33
>                     extent refs 1 gen 3873 flags 2
>                     ref#0: tree block backref root 5
>             item 166 key (68719476736 168 110592) itemoff 8143 itemsize 53
>                     extent refs 1 gen 3841 flags 1
>                     ref#0: extent data backref root 5 objectid 440 offset 3100672 count 1
> (snip)...
>             item 202 key (68722249728 168 110592) itemoff 6177 itemsize 53
>                     extent refs 1 gen 3842 flags 1
>                     ref#0: extent data backref root 5 objectid 953 offset 5431296 count 1
>     BTRFS critical (device nvme1n2: state EA): unable to find ref byte nr 66948939776 parent 66948939776 root 5 owner 0 offset 0 slot 166
>     BTRFS error (device nvme1n2: state EA): failed to run delayed ref for logical 66948939776 num_bytes 16384 type 182 action 2 ref_mod 1: -2
>     BTRFS: error (device nvme1n2: state EA) in btrfs_run_delayed_refs:2246: errno=-2 No such entry
>     BTRFS warning (device nvme1n2: state EA): Skipping commit of aborted transaction.
>     BTRFS: error (device nvme1n2: state EA) in cleanup_transaction:2006: errno=-2 No such entry
> 
> This happens maybe because clearing the contents is too early. It should
> clear the content after all the reference to the node is dropped.

If you have such suspicion you can add assertions to validate the state,
bits and other constraints.

> Addressing that root cause needs more time. Until that, leave the extent
> buffer header intact, so that we can add a proper delayed reference entry.
> 
> Fixes: aa6313e6ff2b ("btrfs: zoned: don't clear dirty flag of extent buffer")
> Link: https://lore.kernel.org/linux-btrfs/oadvdekkturysgfgi4qzuemd57zudeasynswurjxw3ocdfsef6@sjyufeugh63f/
> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
> ---
>  fs/btrfs/disk-io.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index a2e45ed6ef14..8aaed8719394 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -278,7 +278,9 @@ blk_status_t btree_csum_one_bio(struct btrfs_bio *bbio)
>  	 * ordering of I/O without unnecessarily writing out data.
>  	 */
>  	if (test_bit(EXTENT_BUFFER_ZONED_ZEROOUT, &eb->bflags)) {
> -		memzero_extent_buffer(eb, 0, eb->len);
> +		const unsigned long header_size = sizeof(struct btrfs_header);
> +
> +		memzero_extent_buffer(eb, header_size, eb->len - header_size);

So this means anything that finds the buffer will have to rely on the
state in the header and if it's with ZONED_ZEROOUT then stop processing
it. btree_csum_one_bio() is the only function that checks the bit
AFAICS.

Otherwise, if this is a sufficient fix we'd need it for 6.8.
Naohiro Aota March 5, 2024, 6:57 a.m. UTC | #2
On Wed, Feb 28, 2024 at 02:22:49PM +0100, David Sterba wrote:
> On Wed, Feb 28, 2024 at 09:53:03PM +0900, Naohiro Aota wrote:
> > Btrfs clears the content of an extent buffer marked as
> > EXTENT_BUFFER_ZONED_ZEROOUT before the bio submission. This mechanism is
> > introduced to prevent a write hole of an extent buffer, which is once
> > allocated, marked dirty, but turns out unnecessary and cleaned up within
> > one transaction operation.
> > 
> > However, btrfs_free_tree_block() can be called on an extent buffer after
> > its content is cleared. Then, it inserts a faulty delayed reference entry,
> > which makes the FS corrupted.
> > 
> > This bug can be triggered running generic/013 several (~200) times. It
> > failed as following:
> > 
> >     ------------[ cut here ]------------
> >     WARNING: CPU: 9 PID: 29834 at fs/btrfs/extent-tree.c:3248 __btrfs_free_extent.isra.0+0x604/0x1330 [btrfs]
> >     Modules linked in: dm_flakey algif_hash af_alg xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype iptable_filter ip_tables br_netfilter bridge stp llc overlay sunrpc kvm_amd kvm irqbypass rapl acpi_cpufreq ipmi_ssif i2c_piix4 k10temp btrfs ipmi_si blake2b_generic ipmi_devintf xor ipmi_msghandler raid6_pq bfq loop dm_mod zram bnxt_en ccp pkcs8_key_parser asn1_decoder public_key oid_registry fuse msr ipv6
> >     CPU: 9 PID: 29834 Comm: fsstress Not tainted 6.8.0-rc4-BTRFS-ZNS+ #403
> >     Hardware name: Supermicro Super Server/H12SSL-NT, BIOS 2.0 02/22/2021
> >     RIP: 0010:__btrfs_free_extent.isra.0+0x604/0x1330 [btrfs]
> >     Code: 8b 3f e8 bf 69 00 00 48 8b 7d 60 45 8b 4f 40 49 89 d8 8b 54 24 40 4c 89 e9 48 c7 c6 30 64 65 a0 e8 61 fb 0d 00 e9 8f fd ff ff <0f> 0b f0 48 0f ba 28 02 41 b8 00 00 00 00 0f 83 86 04 00 00 b9 8b
> >     RSP: 0018:ffffc900090cfb80 EFLAGS: 00010246
> >     RAX: ffff888365c719d8 RBX: 0000000f9677c000 RCX: 0000000000000001
> >     RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000000000
> >     RBP: ffff8889a044b220 R08: 0000000000000000 R09: 0000000000000004
> >     R10: 0000000000000000 R11: 00000000ffffffff R12: 0000000000000001
> >     R13: ffff888ad87a4c98 R14: 0000000000000005 R15: ffff888a0c7d2a80
> >     FS:  00007f823f5f7740(0000) GS:ffff889fcea40000(0000) knlGS:0000000000000000
> >     CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >     CR2: 0000560ce0610b38 CR3: 0000000a907ec000 CR4: 0000000000350ef0
> >     Call Trace:
> >      <TASK>
> >      ? __btrfs_free_extent.isra.0+0x604/0x1330 [btrfs]
> >      ? __warn+0x81/0x170
> >      ? __btrfs_free_extent.isra.0+0x604/0x1330 [btrfs]
> >      ? report_bug+0x18d/0x1c0
> >      ? handle_bug+0x3c/0x70
> >      ? exc_invalid_op+0x13/0x60
> >      ? asm_exc_invalid_op+0x16/0x20
> >      ? __btrfs_free_extent.isra.0+0x604/0x1330 [btrfs]
> >      ? srso_return_thunk+0x5/0x5f
> >      ? rcu_is_watching+0xd/0x40
> >      ? srso_return_thunk+0x5/0x5f
> >      ? lock_release+0x1e5/0x280
> >      __btrfs_run_delayed_refs+0x64c/0x1380 [btrfs]
> >      ? btrfs_commit_transaction+0x3e/0x12d0 [btrfs]
> >      btrfs_run_delayed_refs+0x92/0x130 [btrfs]
> >      btrfs_commit_transaction+0xa2/0x12d0 [btrfs]
> >      ? srso_return_thunk+0x5/0x5f
> >      ? srso_return_thunk+0x5/0x5f
> >      ? rcu_is_watching+0xd/0x40
> >      ? srso_return_thunk+0x5/0x5f
> >      ? lock_release+0x1e5/0x280
> >      btrfs_sync_file+0x532/0x660 [btrfs]
> >      __x64_sys_fsync+0x37/0x60
> >      do_syscall_64+0x79/0x1a0
> >      entry_SYSCALL_64_after_hwframe+0x6e/0x76
> >     RIP: 0033:0x7f823f6f8400
> >     Code: 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 80 3d e1 d1 0d 00 00 74 17 b8 4a 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 48 c3 0f 1f 80 00 00 00 00 48 83 ec 18 89 7c
> >     RSP: 002b:00007ffe3c26e9f8 EFLAGS: 00000202 ORIG_RAX: 000000000000004a
> >     RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f823f6f8400
> >     RDX: 0000000000000193 RSI: 0000560cdfcdb048 RDI: 0000000000000003
> >     RBP: 00000000000002e6 R08: 0000000000000007 R09: 00007ffe3c26ea0c
> >     R10: 0000000000000000 R11: 0000000000000202 R12: 00007ffe3c26ea10
> >     R13: 028f5c28f5c28f5c R14: 8f5c28f5c28f5c29 R15: 0000560cdfcd7180
> >      </TASK>
> >     irq event stamp: 0
> >     hardirqs last  enabled at (0): [<0000000000000000>] 0x0
> >     hardirqs last disabled at (0): [<ffffffff810e5e0e>] copy_process+0xb0e/0x1e00
> >     softirqs last  enabled at (0): [<ffffffff810e5e0e>] copy_process+0xb0e/0x1e00
> >     softirqs last disabled at (0): [<0000000000000000>] 0x0
> >     ---[ end trace 0000000000000000 ]---
> >     ------------[ cut here ]------------
> >     BTRFS: Transaction aborted (error -117)
> >     WARNING: CPU: 9 PID: 29834 at fs/btrfs/extent-tree.c:3249 __btrfs_free_extent.isra.0+0xf8e/0x1330 [btrfs]
> >     Modules linked in: dm_flakey algif_hash af_alg xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype iptable_filter ip_tables br_netfilter bridge stp llc overlay sunrpc kvm_amd kvm irqbypass rapl acpi_cpufreq ipmi_ssif i2c_piix4 k10temp btrfs ipmi_si blake2b_generic ipmi_devintf xor ipmi_msghandler raid6_pq bfq loop dm_mod zram bnxt_en ccp pkcs8_key_parser asn1_decoder public_key oid_registry fuse msr ipv6
> >     CPU: 9 PID: 29834 Comm: fsstress Tainted: G        W          6.8.0-rc4-BTRFS-ZNS+ #403
> >     Hardware name: Supermicro Super Server/H12SSL-NT, BIOS 2.0 02/22/2021
> >     RIP: 0010:__btrfs_free_extent.isra.0+0xf8e/0x1330 [btrfs]
> >     Code: 48 c7 c6 40 5d 65 a0 e8 f0 f1 0d 00 c7 44 24 18 01 00 00 00 e9 ed f7 ff ff be 8b ff ff ff 48 c7 c7 68 5d 65 a0 e8 52 69 c1 e0 <0f> 0b e9 30 fb ff ff 48 8b 45 60 48 05 d8 19 00 00 f0 48 0f ba 28
> >     RSP: 0018:ffffc900090cfb80 EFLAGS: 00010282
> >     RAX: 0000000000000000 RBX: 0000000f9677c000 RCX: 0000000000000000
> >     RDX: 0000000000000002 RSI: ffffffff82464302 RDI: 00000000ffffffff
> >     RBP: ffff8889a044b220 R08: 0000000000009ffb R09: 00000000ffffdfff
> >     R10: 00000000ffffdfff R11: ffffffff8264dd80 R12: 0000000000000001
> >     R13: ffff888ad87a4c98 R14: 0000000000000005 R15: ffff888a0c7d2a80
> >     FS:  00007f823f5f7740(0000) GS:ffff889fcea40000(0000) knlGS:0000000000000000
> >     CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >     CR2: 0000560ce0610b38 CR3: 0000000a907ec000 CR4: 0000000000350ef0
> >     Call Trace:
> >      <TASK>
> >      ? __btrfs_free_extent.isra.0+0xf8e/0x1330 [btrfs]
> >      ? __warn+0x81/0x170
> >      ? __btrfs_free_extent.isra.0+0xf8e/0x1330 [btrfs]
> >      ? report_bug+0x18d/0x1c0
> >      ? tick_nohz_tick_stopped+0x12/0x30
> >      ? srso_return_thunk+0x5/0x5f
> >      ? handle_bug+0x3c/0x70
> >      ? exc_invalid_op+0x13/0x60
> >      ? asm_exc_invalid_op+0x16/0x20
> >      ? __btrfs_free_extent.isra.0+0xf8e/0x1330 [btrfs]
> >      ? srso_return_thunk+0x5/0x5f
> >      ? rcu_is_watching+0xd/0x40
> >      ? srso_return_thunk+0x5/0x5f
> >      ? lock_release+0x1e5/0x280
> >      __btrfs_run_delayed_refs+0x64c/0x1380 [btrfs]
> >      ? btrfs_commit_transaction+0x3e/0x12d0 [btrfs]
> >      btrfs_run_delayed_refs+0x92/0x130 [btrfs]
> >      btrfs_commit_transaction+0xa2/0x12d0 [btrfs]
> >      ? srso_return_thunk+0x5/0x5f
> >      ? srso_return_thunk+0x5/0x5f
> >      ? rcu_is_watching+0xd/0x40
> >      ? srso_return_thunk+0x5/0x5f
> >      ? lock_release+0x1e5/0x280
> >      btrfs_sync_file+0x532/0x660 [btrfs]
> >      __x64_sys_fsync+0x37/0x60
> >      do_syscall_64+0x79/0x1a0
> >      entry_SYSCALL_64_after_hwframe+0x6e/0x76
> >     RIP: 0033:0x7f823f6f8400
> >     Code: 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 80 3d e1 d1 0d 00 00 74 17 b8 4a 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 48 c3 0f 1f 80 00 00 00 00 48 83 ec 18 89 7c
> >     RSP: 002b:00007ffe3c26e9f8 EFLAGS: 00000202 ORIG_RAX: 000000000000004a
> >     RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f823f6f8400
> >     RDX: 0000000000000193 RSI: 0000560cdfcdb048 RDI: 0000000000000003
> >     RBP: 00000000000002e6 R08: 0000000000000007 R09: 00007ffe3c26ea0c
> >     R10: 0000000000000000 R11: 0000000000000202 R12: 00007ffe3c26ea10
> >     R13: 028f5c28f5c28f5c R14: 8f5c28f5c28f5c29 R15: 0000560cdfcd7180
> >      </TASK>
> >     irq event stamp: 0
> >     hardirqs last  enabled at (0): [<0000000000000000>] 0x0
> >     hardirqs last disabled at (0): [<ffffffff810e5e0e>] copy_process+0xb0e/0x1e00
> >     softirqs last  enabled at (0): [<ffffffff810e5e0e>] copy_process+0xb0e/0x1e00
> >     softirqs last disabled at (0): [<0000000000000000>] 0x0
> >     ---[ end trace 0000000000000000 ]---
> >     BTRFS: error (device nvme1n2: state A) in __btrfs_free_extent:3249: errno=-117 Filesystem corrupted
> >     BTRFS info (device nvme1n2: state EA): forced readonly
> >     BTRFS info (device nvme1n2: state EA): leaf 66957836288 gen 3873 total ptrs 203 free space 1102 owner 2
> >     BTRFS info (device nvme1n2: state EA): refs 2 lock_owner 29834 current 29834
> >             item 0 key (63394947072 168 40960) itemoff 16230 itemsize 53
> >                     extent refs 1 gen 3835 flags 1
> >                     ref#0: extent data backref root 5 objectid 552 offset 1802240 count 1
> > (snip)...
> >             item 164 key (66948923392 169 0) itemoff 8229 itemsize 33
> >                     extent refs 1 gen 3872 flags 2
> >                     ref#0: tree block backref root 2
> >             item 165 key (66948939776 169 1) itemoff 8196 itemsize 33
> >                     extent refs 1 gen 3873 flags 2
> >                     ref#0: tree block backref root 5
> >             item 166 key (68719476736 168 110592) itemoff 8143 itemsize 53
> >                     extent refs 1 gen 3841 flags 1
> >                     ref#0: extent data backref root 5 objectid 440 offset 3100672 count 1
> > (snip)...
> >             item 202 key (68722249728 168 110592) itemoff 6177 itemsize 53
> >                     extent refs 1 gen 3842 flags 1
> >                     ref#0: extent data backref root 5 objectid 953 offset 5431296 count 1
> >     BTRFS critical (device nvme1n2: state EA): unable to find ref byte nr 66948939776 parent 66948939776 root 5 owner 0 offset 0 slot 166
> >     BTRFS error (device nvme1n2: state EA): failed to run delayed ref for logical 66948939776 num_bytes 16384 type 182 action 2 ref_mod 1: -2
> >     BTRFS: error (device nvme1n2: state EA) in btrfs_run_delayed_refs:2246: errno=-2 No such entry
> >     BTRFS warning (device nvme1n2: state EA): Skipping commit of aborted transaction.
> >     BTRFS: error (device nvme1n2: state EA) in cleanup_transaction:2006: errno=-2 No such entry
> > 
> > This happens maybe because clearing the contents is too early. It should
> > clear the content after all the reference to the node is dropped.
> 
> If you have such suspicion you can add assertions to validate the state,
> bits and other constraints.

It is difficult to assert the node-to-node reference because we need to
query the tree. But, I'm going to add other assertion to check if the
delayed ref is not faulty.

> 
> > Addressing that root cause needs more time. Until that, leave the extent
> > buffer header intact, so that we can add a proper delayed reference entry.
> > 
> > Fixes: aa6313e6ff2b ("btrfs: zoned: don't clear dirty flag of extent buffer")
> > Link: https://lore.kernel.org/linux-btrfs/oadvdekkturysgfgi4qzuemd57zudeasynswurjxw3ocdfsef6@sjyufeugh63f/
> > Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
> > ---
> >  fs/btrfs/disk-io.c | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> > 
> > diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> > index a2e45ed6ef14..8aaed8719394 100644
> > --- a/fs/btrfs/disk-io.c
> > +++ b/fs/btrfs/disk-io.c
> > @@ -278,7 +278,9 @@ blk_status_t btree_csum_one_bio(struct btrfs_bio *bbio)
> >  	 * ordering of I/O without unnecessarily writing out data.
> >  	 */
> >  	if (test_bit(EXTENT_BUFFER_ZONED_ZEROOUT, &eb->bflags)) {
> > -		memzero_extent_buffer(eb, 0, eb->len);
> > +		const unsigned long header_size = sizeof(struct btrfs_header);
> > +
> > +		memzero_extent_buffer(eb, header_size, eb->len - header_size);
> 
> So this means anything that finds the buffer will have to rely on the
> state in the header and if it's with ZONED_ZEROOUT then stop processing
> it. btree_csum_one_bio() is the only function that checks the bit
> AFAICS.

Yes, indeed it's cumbersome to check the flag for every btrfs_header_*
usage. So, I found another solution and I'm testing it. It looks like
adding EXTENT_BUFFER_ZONED_ZEROOUT even when an extent buffer is not DIRTY
is causing the bug.

> Otherwise, if this is a sufficient fix we'd need it for 6.8.
Naohiro Aota March 6, 2024, 4:10 p.m. UTC | #3
On Tue, Mar 05, 2024 at 06:57:48AM +0000, Naohiro Aota wrote:
> On Wed, Feb 28, 2024 at 02:22:49PM +0100, David Sterba wrote:
> > On Wed, Feb 28, 2024 at 09:53:03PM +0900, Naohiro Aota wrote:
> > > Btrfs clears the content of an extent buffer marked as
> > > EXTENT_BUFFER_ZONED_ZEROOUT before the bio submission. This mechanism is
> > > introduced to prevent a write hole of an extent buffer, which is once
> > > allocated, marked dirty, but turns out unnecessary and cleaned up within
> > > one transaction operation.
> > > 
> > > However, btrfs_free_tree_block() can be called on an extent buffer after
> > > its content is cleared. Then, it inserts a faulty delayed reference entry,
> > > which makes the FS corrupted.
> > > 
> > > This bug can be triggered running generic/013 several (~200) times. It
> > > failed as following:
> > > 
> > >     ------------[ cut here ]------------
> > >     WARNING: CPU: 9 PID: 29834 at fs/btrfs/extent-tree.c:3248 __btrfs_free_extent.isra.0+0x604/0x1330 [btrfs]
> > >     Modules linked in: dm_flakey algif_hash af_alg xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype iptable_filter ip_tables br_netfilter bridge stp llc overlay sunrpc kvm_amd kvm irqbypass rapl acpi_cpufreq ipmi_ssif i2c_piix4 k10temp btrfs ipmi_si blake2b_generic ipmi_devintf xor ipmi_msghandler raid6_pq bfq loop dm_mod zram bnxt_en ccp pkcs8_key_parser asn1_decoder public_key oid_registry fuse msr ipv6
> > >     CPU: 9 PID: 29834 Comm: fsstress Not tainted 6.8.0-rc4-BTRFS-ZNS+ #403
> > >     Hardware name: Supermicro Super Server/H12SSL-NT, BIOS 2.0 02/22/2021
> > >     RIP: 0010:__btrfs_free_extent.isra.0+0x604/0x1330 [btrfs]
> > >     Code: 8b 3f e8 bf 69 00 00 48 8b 7d 60 45 8b 4f 40 49 89 d8 8b 54 24 40 4c 89 e9 48 c7 c6 30 64 65 a0 e8 61 fb 0d 00 e9 8f fd ff ff <0f> 0b f0 48 0f ba 28 02 41 b8 00 00 00 00 0f 83 86 04 00 00 b9 8b
> > >     RSP: 0018:ffffc900090cfb80 EFLAGS: 00010246
> > >     RAX: ffff888365c719d8 RBX: 0000000f9677c000 RCX: 0000000000000001
> > >     RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000000000
> > >     RBP: ffff8889a044b220 R08: 0000000000000000 R09: 0000000000000004
> > >     R10: 0000000000000000 R11: 00000000ffffffff R12: 0000000000000001
> > >     R13: ffff888ad87a4c98 R14: 0000000000000005 R15: ffff888a0c7d2a80
> > >     FS:  00007f823f5f7740(0000) GS:ffff889fcea40000(0000) knlGS:0000000000000000
> > >     CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > >     CR2: 0000560ce0610b38 CR3: 0000000a907ec000 CR4: 0000000000350ef0
> > >     Call Trace:
> > >      <TASK>
> > >      ? __btrfs_free_extent.isra.0+0x604/0x1330 [btrfs]
> > >      ? __warn+0x81/0x170
> > >      ? __btrfs_free_extent.isra.0+0x604/0x1330 [btrfs]
> > >      ? report_bug+0x18d/0x1c0
> > >      ? handle_bug+0x3c/0x70
> > >      ? exc_invalid_op+0x13/0x60
> > >      ? asm_exc_invalid_op+0x16/0x20
> > >      ? __btrfs_free_extent.isra.0+0x604/0x1330 [btrfs]
> > >      ? srso_return_thunk+0x5/0x5f
> > >      ? rcu_is_watching+0xd/0x40
> > >      ? srso_return_thunk+0x5/0x5f
> > >      ? lock_release+0x1e5/0x280
> > >      __btrfs_run_delayed_refs+0x64c/0x1380 [btrfs]
> > >      ? btrfs_commit_transaction+0x3e/0x12d0 [btrfs]
> > >      btrfs_run_delayed_refs+0x92/0x130 [btrfs]
> > >      btrfs_commit_transaction+0xa2/0x12d0 [btrfs]
> > >      ? srso_return_thunk+0x5/0x5f
> > >      ? srso_return_thunk+0x5/0x5f
> > >      ? rcu_is_watching+0xd/0x40
> > >      ? srso_return_thunk+0x5/0x5f
> > >      ? lock_release+0x1e5/0x280
> > >      btrfs_sync_file+0x532/0x660 [btrfs]
> > >      __x64_sys_fsync+0x37/0x60
> > >      do_syscall_64+0x79/0x1a0
> > >      entry_SYSCALL_64_after_hwframe+0x6e/0x76
> > >     RIP: 0033:0x7f823f6f8400
> > >     Code: 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 80 3d e1 d1 0d 00 00 74 17 b8 4a 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 48 c3 0f 1f 80 00 00 00 00 48 83 ec 18 89 7c
> > >     RSP: 002b:00007ffe3c26e9f8 EFLAGS: 00000202 ORIG_RAX: 000000000000004a
> > >     RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f823f6f8400
> > >     RDX: 0000000000000193 RSI: 0000560cdfcdb048 RDI: 0000000000000003
> > >     RBP: 00000000000002e6 R08: 0000000000000007 R09: 00007ffe3c26ea0c
> > >     R10: 0000000000000000 R11: 0000000000000202 R12: 00007ffe3c26ea10
> > >     R13: 028f5c28f5c28f5c R14: 8f5c28f5c28f5c29 R15: 0000560cdfcd7180
> > >      </TASK>
> > >     irq event stamp: 0
> > >     hardirqs last  enabled at (0): [<0000000000000000>] 0x0
> > >     hardirqs last disabled at (0): [<ffffffff810e5e0e>] copy_process+0xb0e/0x1e00
> > >     softirqs last  enabled at (0): [<ffffffff810e5e0e>] copy_process+0xb0e/0x1e00
> > >     softirqs last disabled at (0): [<0000000000000000>] 0x0
> > >     ---[ end trace 0000000000000000 ]---
> > >     ------------[ cut here ]------------
> > >     BTRFS: Transaction aborted (error -117)
> > >     WARNING: CPU: 9 PID: 29834 at fs/btrfs/extent-tree.c:3249 __btrfs_free_extent.isra.0+0xf8e/0x1330 [btrfs]
> > >     Modules linked in: dm_flakey algif_hash af_alg xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype iptable_filter ip_tables br_netfilter bridge stp llc overlay sunrpc kvm_amd kvm irqbypass rapl acpi_cpufreq ipmi_ssif i2c_piix4 k10temp btrfs ipmi_si blake2b_generic ipmi_devintf xor ipmi_msghandler raid6_pq bfq loop dm_mod zram bnxt_en ccp pkcs8_key_parser asn1_decoder public_key oid_registry fuse msr ipv6
> > >     CPU: 9 PID: 29834 Comm: fsstress Tainted: G        W          6.8.0-rc4-BTRFS-ZNS+ #403
> > >     Hardware name: Supermicro Super Server/H12SSL-NT, BIOS 2.0 02/22/2021
> > >     RIP: 0010:__btrfs_free_extent.isra.0+0xf8e/0x1330 [btrfs]
> > >     Code: 48 c7 c6 40 5d 65 a0 e8 f0 f1 0d 00 c7 44 24 18 01 00 00 00 e9 ed f7 ff ff be 8b ff ff ff 48 c7 c7 68 5d 65 a0 e8 52 69 c1 e0 <0f> 0b e9 30 fb ff ff 48 8b 45 60 48 05 d8 19 00 00 f0 48 0f ba 28
> > >     RSP: 0018:ffffc900090cfb80 EFLAGS: 00010282
> > >     RAX: 0000000000000000 RBX: 0000000f9677c000 RCX: 0000000000000000
> > >     RDX: 0000000000000002 RSI: ffffffff82464302 RDI: 00000000ffffffff
> > >     RBP: ffff8889a044b220 R08: 0000000000009ffb R09: 00000000ffffdfff
> > >     R10: 00000000ffffdfff R11: ffffffff8264dd80 R12: 0000000000000001
> > >     R13: ffff888ad87a4c98 R14: 0000000000000005 R15: ffff888a0c7d2a80
> > >     FS:  00007f823f5f7740(0000) GS:ffff889fcea40000(0000) knlGS:0000000000000000
> > >     CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > >     CR2: 0000560ce0610b38 CR3: 0000000a907ec000 CR4: 0000000000350ef0
> > >     Call Trace:
> > >      <TASK>
> > >      ? __btrfs_free_extent.isra.0+0xf8e/0x1330 [btrfs]
> > >      ? __warn+0x81/0x170
> > >      ? __btrfs_free_extent.isra.0+0xf8e/0x1330 [btrfs]
> > >      ? report_bug+0x18d/0x1c0
> > >      ? tick_nohz_tick_stopped+0x12/0x30
> > >      ? srso_return_thunk+0x5/0x5f
> > >      ? handle_bug+0x3c/0x70
> > >      ? exc_invalid_op+0x13/0x60
> > >      ? asm_exc_invalid_op+0x16/0x20
> > >      ? __btrfs_free_extent.isra.0+0xf8e/0x1330 [btrfs]
> > >      ? srso_return_thunk+0x5/0x5f
> > >      ? rcu_is_watching+0xd/0x40
> > >      ? srso_return_thunk+0x5/0x5f
> > >      ? lock_release+0x1e5/0x280
> > >      __btrfs_run_delayed_refs+0x64c/0x1380 [btrfs]
> > >      ? btrfs_commit_transaction+0x3e/0x12d0 [btrfs]
> > >      btrfs_run_delayed_refs+0x92/0x130 [btrfs]
> > >      btrfs_commit_transaction+0xa2/0x12d0 [btrfs]
> > >      ? srso_return_thunk+0x5/0x5f
> > >      ? srso_return_thunk+0x5/0x5f
> > >      ? rcu_is_watching+0xd/0x40
> > >      ? srso_return_thunk+0x5/0x5f
> > >      ? lock_release+0x1e5/0x280
> > >      btrfs_sync_file+0x532/0x660 [btrfs]
> > >      __x64_sys_fsync+0x37/0x60
> > >      do_syscall_64+0x79/0x1a0
> > >      entry_SYSCALL_64_after_hwframe+0x6e/0x76
> > >     RIP: 0033:0x7f823f6f8400
> > >     Code: 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 80 3d e1 d1 0d 00 00 74 17 b8 4a 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 48 c3 0f 1f 80 00 00 00 00 48 83 ec 18 89 7c
> > >     RSP: 002b:00007ffe3c26e9f8 EFLAGS: 00000202 ORIG_RAX: 000000000000004a
> > >     RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f823f6f8400
> > >     RDX: 0000000000000193 RSI: 0000560cdfcdb048 RDI: 0000000000000003
> > >     RBP: 00000000000002e6 R08: 0000000000000007 R09: 00007ffe3c26ea0c
> > >     R10: 0000000000000000 R11: 0000000000000202 R12: 00007ffe3c26ea10
> > >     R13: 028f5c28f5c28f5c R14: 8f5c28f5c28f5c29 R15: 0000560cdfcd7180
> > >      </TASK>
> > >     irq event stamp: 0
> > >     hardirqs last  enabled at (0): [<0000000000000000>] 0x0
> > >     hardirqs last disabled at (0): [<ffffffff810e5e0e>] copy_process+0xb0e/0x1e00
> > >     softirqs last  enabled at (0): [<ffffffff810e5e0e>] copy_process+0xb0e/0x1e00
> > >     softirqs last disabled at (0): [<0000000000000000>] 0x0
> > >     ---[ end trace 0000000000000000 ]---
> > >     BTRFS: error (device nvme1n2: state A) in __btrfs_free_extent:3249: errno=-117 Filesystem corrupted
> > >     BTRFS info (device nvme1n2: state EA): forced readonly
> > >     BTRFS info (device nvme1n2: state EA): leaf 66957836288 gen 3873 total ptrs 203 free space 1102 owner 2
> > >     BTRFS info (device nvme1n2: state EA): refs 2 lock_owner 29834 current 29834
> > >             item 0 key (63394947072 168 40960) itemoff 16230 itemsize 53
> > >                     extent refs 1 gen 3835 flags 1
> > >                     ref#0: extent data backref root 5 objectid 552 offset 1802240 count 1
> > > (snip)...
> > >             item 164 key (66948923392 169 0) itemoff 8229 itemsize 33
> > >                     extent refs 1 gen 3872 flags 2
> > >                     ref#0: tree block backref root 2
> > >             item 165 key (66948939776 169 1) itemoff 8196 itemsize 33
> > >                     extent refs 1 gen 3873 flags 2
> > >                     ref#0: tree block backref root 5
> > >             item 166 key (68719476736 168 110592) itemoff 8143 itemsize 53
> > >                     extent refs 1 gen 3841 flags 1
> > >                     ref#0: extent data backref root 5 objectid 440 offset 3100672 count 1
> > > (snip)...
> > >             item 202 key (68722249728 168 110592) itemoff 6177 itemsize 53
> > >                     extent refs 1 gen 3842 flags 1
> > >                     ref#0: extent data backref root 5 objectid 953 offset 5431296 count 1
> > >     BTRFS critical (device nvme1n2: state EA): unable to find ref byte nr 66948939776 parent 66948939776 root 5 owner 0 offset 0 slot 166
> > >     BTRFS error (device nvme1n2: state EA): failed to run delayed ref for logical 66948939776 num_bytes 16384 type 182 action 2 ref_mod 1: -2
> > >     BTRFS: error (device nvme1n2: state EA) in btrfs_run_delayed_refs:2246: errno=-2 No such entry
> > >     BTRFS warning (device nvme1n2: state EA): Skipping commit of aborted transaction.
> > >     BTRFS: error (device nvme1n2: state EA) in cleanup_transaction:2006: errno=-2 No such entry
> > > 
> > > This happens maybe because clearing the contents is too early. It should
> > > clear the content after all the reference to the node is dropped.
> > 
> > If you have such suspicion you can add assertions to validate the state,
> > bits and other constraints.
> 
> It is difficult to assert the node-to-node reference because we need to
> query the tree. But, I'm going to add other assertion to check if the
> delayed ref is not faulty.
>

I added the following assert (currently, it's WARN to make my debug session
easy)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index beedd6ed64d3..eb0918df047e 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3464,6 +3464,14 @@ void btrfs_free_tree_block(struct btrfs_trans_handle *trans,
 	if (root_id != BTRFS_TREE_LOG_OBJECTID) {
 		struct btrfs_ref generic_ref = { 0 };
 
+		/*
+		 * Assert that the extent buffer is not cleared due to
+		 * EXTENT_BUFFER_ZONED_ZEROOUT. Please refer
+		 * btrfs_clear_buffer_dirty() and btree_csum_one_bio() for
+		 * detail.
+		 */
+		WARN_ON_ONCE(btrfs_header_bytenr(buf) == 0);
+
 		btrfs_init_generic_ref(&generic_ref, BTRFS_DROP_DELAYED_REF,
 				       buf->start, buf->len, parent,
 				       btrfs_header_owner(buf));

> > 
> > > Addressing that root cause needs more time. Until that, leave the extent
> > > buffer header intact, so that we can add a proper delayed reference entry.
> > > 
> > > Fixes: aa6313e6ff2b ("btrfs: zoned: don't clear dirty flag of extent buffer")
> > > Link: https://lore.kernel.org/linux-btrfs/oadvdekkturysgfgi4qzuemd57zudeasynswurjxw3ocdfsef6@sjyufeugh63f/
> > > Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
> > > ---
> > >  fs/btrfs/disk-io.c | 4 +++-
> > >  1 file changed, 3 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> > > index a2e45ed6ef14..8aaed8719394 100644
> > > --- a/fs/btrfs/disk-io.c
> > > +++ b/fs/btrfs/disk-io.c
> > > @@ -278,7 +278,9 @@ blk_status_t btree_csum_one_bio(struct btrfs_bio *bbio)
> > >  	 * ordering of I/O without unnecessarily writing out data.
> > >  	 */
> > >  	if (test_bit(EXTENT_BUFFER_ZONED_ZEROOUT, &eb->bflags)) {
> > > -		memzero_extent_buffer(eb, 0, eb->len);
> > > +		const unsigned long header_size = sizeof(struct btrfs_header);
> > > +
> > > +		memzero_extent_buffer(eb, header_size, eb->len - header_size);
> > 
> > So this means anything that finds the buffer will have to rely on the
> > state in the header and if it's with ZONED_ZEROOUT then stop processing
> > it. btree_csum_one_bio() is the only function that checks the bit
> > AFAICS.
> 
> Yes, indeed it's cumbersome to check the flag for every btrfs_header_*
> usage. So, I found another solution and I'm testing it. It looks like
> adding EXTENT_BUFFER_ZONED_ZEROOUT even when an extent buffer is not DIRTY
> is causing the bug.

And, tested this solution like diff below. It survives generic/013 100
times without hitting the WARN above. However, generic/561 hit the WARN as
below. So, there is some other reason (than clearing non-dirty extent
buffer) causing the issue.

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index c7ecfcac34fc..70f23f51fda7 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -4152,7 +4152,9 @@ void btrfs_clear_buffer_dirty(struct btrfs_trans_handle *trans,
 	 * The actual zeroout of the buffer will happen later in
 	 * btree_csum_one_bio.
 	 */
 	if (btrfs_is_zoned(fs_info)) {
+		if (!test_bit(EXTENT_BUFFER_DIRTY, &eb->bflags))
+			return;
 		set_bit(EXTENT_BUFFER_ZONED_ZEROOUT, &eb->bflags);
 		return;
 	}


[ 9014.582049] ------------[ cut here ]------------
[ 9014.598804] WARNING: CPU: 10 PID: 1716009 at fs/btrfs/extent-tree.c:3473 btrfs_free_tree_block+0x26b/0x2e0 [btrfs]
[ 9014.621665] Modules linked in: dm_thin_pool dm_persistent_data dm_bio_prison dm_snapshot dm_bufio dm_flakey algif_hash af_alg xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype iptable_filter ip_tables br_netfilter bridge stp llc overlay sunrpc kvm_amd kvm irqbypass ipmi_ssif acpi_cpufreq rapl ipmi_si i2c_piix4 btrfs k10temp ipmi_devintf blake2b_generic ipmi_msghandler xor raid6_pq bfq loop dm_mod zram bnxt_en ccp pkcs8_key_parser asn1_decoder public_key oid_registry fuse msr ipv6
[ 9014.730793] CPU: 10 PID: 1716009 Comm: fsstress Not tainted 6.8.0-rc7-BTRFS-ZNS+ #408
[ 9014.751296] Hardware name: Supermicro Super Server/H12SSL-NT, BIOS 2.0 02/22/2021
[ 9014.771359] RIP: 0010:btrfs_free_tree_block+0x26b/0x2e0 [btrfs]
[ 9014.789863] Code: 0b e9 07 ff ff ff 8b 4b 08 48 89 f2 48 89 ef 41 b8 01 00 00 00 4c 89 e6 e8 d2 80 ff ff 4c 89 e7 e8 7a bb 0c 00 e9 e3 fe ff ff <0f> 0b e9 1c fe ff ff 0f 0b 65 8b 05 71 63 b8 5f 89 c0 48 0f a3 05
[ 9014.833425] RSP: 0018:ffffc90023ea7a70 EFLAGS: 00010246
[ 9014.851277] RAX: ffff888c4a4fb000 RBX: ffff88925d774c08 RCX: 0000000000000000
[ 9014.871007] RDX: 0000000000000000 RSI: 00000003324b0000 RDI: ffffc90023ea7ab8
[ 9014.890724] RBP: ffff889cf09cad98 R08: 0000000000000001 R09: 0000000000000000
[ 9014.910381] R10: 0000000000000008 R11: 0000000000000000 R12: 0000000000000002
[ 9014.929861] R13: ffffc90023ea7a70 R14: ffff889d6443c000 R15: 0000000000000001
[ 9014.949121] FS:  00007f3cb82e1740(0000) GS:ffff88a04e880000(0000) knlGS:0000000000000000
[ 9014.969242] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 9014.986837] CR2: 00007f8d3da345f0 CR3: 0000000111b0c000 CR4: 0000000000350ef0
[ 9015.005852] Call Trace:
[ 9015.020105]  <TASK>
[ 9015.033830]  ? btrfs_free_tree_block+0x26b/0x2e0 [btrfs]
[ 9015.050743]  ? __warn+0x81/0x170
[ 9015.065306]  ? btrfs_free_tree_block+0x26b/0x2e0 [btrfs]
[ 9015.082059]  ? report_bug+0x18d/0x1c0
[ 9015.096896]  ? handle_bug+0x3a/0x70
[ 9015.111278]  ? exc_invalid_op+0x13/0x60
[ 9015.125813]  ? asm_exc_invalid_op+0x16/0x20
[ 9015.140616]  ? btrfs_free_tree_block+0x26b/0x2e0 [btrfs]
[ 9015.156645]  btrfs_del_leaf+0xc4/0xe0 [btrfs]
[ 9015.171612]  btrfs_del_items+0x467/0x520 [btrfs]
[ 9015.186613]  __btrfs_free_extent.isra.0+0x927/0x1370 [btrfs]
[ 9015.202561]  __btrfs_run_delayed_refs+0x40c/0x1280 [btrfs]
[ 9015.218280]  ? srso_untrain_ret+0x2/0x2
[ 9015.232261]  ? srso_return_thunk+0x5/0x5f
[ 9015.246289]  ? srso_return_thunk+0x5/0x5f
[ 9015.260043]  ? rcu_is_watching+0xd/0x40
[ 9015.273399]  ? srso_return_thunk+0x5/0x5f
[ 9015.286705]  ? lock_release+0x1e5/0x280
[ 9015.299739]  ? btrfs_start_dirty_block_groups+0x105/0x5a0 [btrfs]
[ 9015.315198]  btrfs_run_delayed_refs+0x33/0x120 [btrfs]
[ 9015.329620]  btrfs_start_dirty_block_groups+0x3aa/0x5a0 [btrfs]
[ 9015.344660]  ? srso_return_thunk+0x5/0x5f
[ 9015.357546]  ? btrfs_commit_transaction+0x3e/0x12c0 [btrfs]
[ 9015.372148]  btrfs_commit_transaction+0xec/0x12c0 [btrfs]
[ 9015.386520]  ? srso_return_thunk+0x5/0x5f
[ 9015.399364]  ? btrfs_attach_transaction_barrier+0x1e/0x60 [btrfs]
[ 9015.414306]  ? __pfx_sync_fs_one_sb+0x10/0x10
[ 9015.427232]  iterate_supers+0x7a/0xe0
[ 9015.439191]  ksys_sync+0x60/0xa0
[ 9015.450453]  __do_sys_sync+0xa/0x20
[ 9015.461714]  do_syscall_64+0x77/0x1a0
[ 9015.472886]  entry_SYSCALL_64_after_hwframe+0x6e/0x76
[ 9015.485376] RIP: 0033:0x7f3cb83e2487


> 
> > Otherwise, if this is a sufficient fix we'd need it for 6.8.
diff mbox series

Patch

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index a2e45ed6ef14..8aaed8719394 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -278,7 +278,9 @@  blk_status_t btree_csum_one_bio(struct btrfs_bio *bbio)
 	 * ordering of I/O without unnecessarily writing out data.
 	 */
 	if (test_bit(EXTENT_BUFFER_ZONED_ZEROOUT, &eb->bflags)) {
-		memzero_extent_buffer(eb, 0, eb->len);
+		const unsigned long header_size = sizeof(struct btrfs_header);
+
+		memzero_extent_buffer(eb, header_size, eb->len - header_size);
 		return BLK_STS_OK;
 	}