diff mbox series

btrfs: fix deadlock when aborting transaction during relocation with scrub

Message ID 2f4ff5c22d01e1d92d9458c4069531d911fa7418.1679478272.git.fdmanana@suse.com (mailing list archive)
State New, archived
Headers show
Series btrfs: fix deadlock when aborting transaction during relocation with scrub | expand

Commit Message

Filipe Manana March 22, 2023, 9:46 a.m. UTC
From: Filipe Manana <fdmanana@suse.com>

Before relocating a block group we pause scrub, then do the relocation and
then unpause scrub. The relocation process requires starting and committing
a transaction, and if we have a failure in the critical section of the
transaction commit path (transaction state >= TRANS_STATE_COMMIT_START),
we will deadlock if there is a paused scrub.

That results in stack traces like the following:

   [ 8242.479915] BTRFS info (device sdc): relocating block group 53876686848 flags metadata|raid6
   [ 8242.936437] BTRFS warning (device sdc): Skipping commit of aborted transaction.
   [ 8242.936441] ------------[ cut here ]------------
   [ 8242.936441] BTRFS: Transaction aborted (error -28)
   [ 8242.936490] WARNING: CPU: 11 PID: 346822 at fs/btrfs/transaction.c:1977 btrfs_commit_transaction+0xcc8/0xeb0 [btrfs]
   [ 8242.936544] Modules linked in: dm_flakey dm_mod loop btrfs (...)
   [ 8242.936582] CPU: 11 PID: 346822 Comm: btrfs Tainted: G        W          6.3.0-rc2-btrfs-next-127+ #1
   [ 8242.936584] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
   [ 8242.936585] RIP: 0010:btrfs_commit_transaction+0xcc8/0xeb0 [btrfs]
   [ 8242.936625] Code: ff ff 45 8b (...)
   [ 8242.936627] RSP: 0018:ffffb58649633b48 EFLAGS: 00010282
   [ 8242.936628] RAX: 0000000000000000 RBX: ffff8be6ef4d5bd8 RCX: 0000000000000000
   [ 8242.936629] RDX: 0000000000000002 RSI: ffffffffb35e7782 RDI: 00000000ffffffff
   [ 8242.936630] RBP: ffff8be6ef4d5c98 R08: 0000000000000000 R09: ffffb586496339e8
   [ 8242.936632] R10: 0000000000000001 R11: 0000000000000001 R12: ffff8be6d38c7c00
   [ 8242.936632] R13: 00000000ffffffe4 R14: ffff8be6c268c000 R15: ffff8be6ef4d5cf0
   [ 8242.936633] FS:  00007f381a82b340(0000) GS:ffff8beddfcc0000(0000) knlGS:0000000000000000
   [ 8242.936635] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
   [ 8242.936636] CR2: 00007f1e35fb7638 CR3: 0000000117680006 CR4: 0000000000370ee0
   [ 8242.936638] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
   [ 8242.936639] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
   [ 8242.936640] Call Trace:
   [ 8242.936642]  <TASK>
   [ 8242.936644]  ? start_transaction+0xcb/0x610 [btrfs]
   [ 8242.936684]  prepare_to_relocate+0x111/0x1a0 [btrfs]
   [ 8242.936734]  relocate_block_group+0x57/0x5d0 [btrfs]
   [ 8242.936781]  ? btrfs_wait_nocow_writers+0x25/0xb0 [btrfs]
   [ 8242.936827]  btrfs_relocate_block_group+0x248/0x3c0 [btrfs]
   [ 8242.936873]  ? __pfx_autoremove_wake_function+0x10/0x10
   [ 8242.936877]  btrfs_relocate_chunk+0x3b/0x150 [btrfs]
   [ 8242.936927]  btrfs_balance+0x8ff/0x11d0 [btrfs]
   [ 8242.936973]  ? __kmem_cache_alloc_node+0x14a/0x410
   [ 8242.936976]  btrfs_ioctl+0x2334/0x32c0 [btrfs]
   [ 8242.937024]  ? mod_objcg_state+0xd2/0x360
   [ 8242.937028]  ? refill_obj_stock+0xb0/0x160
   [ 8242.937030]  ? seq_release+0x25/0x30
   [ 8242.937041]  ? __rseq_handle_notify_resume+0x3b5/0x4b0
   [ 8242.937043]  ? percpu_counter_add_batch+0x2e/0xa0
   [ 8242.937048]  ? __x64_sys_ioctl+0x88/0xc0
   [ 8242.937050]  __x64_sys_ioctl+0x88/0xc0
   [ 8242.937052]  do_syscall_64+0x38/0x90
   [ 8242.937056]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
   [ 8242.937058] RIP: 0033:0x7f381a6ffe9b
   [ 8242.937060] Code: 00 48 89 44 24 (...)
   [ 8242.937062] RSP: 002b:00007ffd45ecf060 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
   [ 8242.937064] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f381a6ffe9b
   [ 8242.937065] RDX: 00007ffd45ecf150 RSI: 00000000c4009420 RDI: 0000000000000003
   [ 8242.937066] RBP: 0000000000000003 R08: 0000000000000013 R09: 0000000000000000
   [ 8242.937066] R10: 00007f381a60c878 R11: 0000000000000246 R12: 00007ffd45ed0423
   [ 8242.937067] R13: 00007ffd45ecf150 R14: 0000000000000000 R15: 00007ffd45ecf148
   [ 8242.937070]  </TASK>
   [ 8242.937071] ---[ end trace 0000000000000000 ]---
   [ 8242.937075] BTRFS: error (device sdc: state A) in cleanup_transaction:1977: errno=-28 No space left
   [ 8459.196386] INFO: task btrfs:346772 blocked for more than 120 seconds.
   [ 8459.196564]       Tainted: G        W          6.3.0-rc2-btrfs-next-127+ #1
   [ 8459.196706] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
   [ 8459.196845] task:btrfs           state:D stack:0     pid:346772 ppid:1      flags:0x00004002
   [ 8459.196861] Call Trace:
   [ 8459.196866]  <TASK>
   [ 8459.196876]  __schedule+0x392/0xa70
   [ 8459.196906]  ? __pv_queued_spin_lock_slowpath+0x165/0x370
   [ 8459.196923]  schedule+0x5d/0xd0
   [ 8459.196937]  __scrub_blocked_if_needed+0x74/0xc0 [btrfs]
   [ 8459.197226]  ? __pfx_autoremove_wake_function+0x10/0x10
   [ 8459.197242]  scrub_pause_off+0x21/0x50 [btrfs]
   [ 8459.197507]  scrub_simple_mirror+0x1c7/0x950 [btrfs]
   [ 8459.197769]  ? scrub_parity_put+0x1a5/0x1d0 [btrfs]
   [ 8459.198065]  ? __pfx_autoremove_wake_function+0x10/0x10
   [ 8459.198090]  scrub_stripe+0x20d/0x740 [btrfs]
   [ 8459.198360]  scrub_chunk+0xc4/0x130 [btrfs]
   [ 8459.198621]  scrub_enumerate_chunks+0x3e4/0x7a0 [btrfs]
   [ 8459.198885]  ? __pfx_autoremove_wake_function+0x10/0x10
   [ 8459.198901]  btrfs_scrub_dev+0x236/0x6a0 [btrfs]
   [ 8459.199162]  ? btrfs_ioctl+0xd97/0x32c0 [btrfs]
   [ 8459.199420]  ? _copy_from_user+0x7b/0x80
   [ 8459.199435]  btrfs_ioctl+0xde1/0x32c0 [btrfs]
   [ 8459.199693]  ? refill_stock+0x33/0x50
   [ 8459.199707]  ? should_failslab+0xa/0x20
   [ 8459.199721]  ? kmem_cache_alloc_node+0x151/0x460
   [ 8459.199731]  ? alloc_io_context+0x1b/0x80
   [ 8459.199744]  ? preempt_count_add+0x70/0xa0
   [ 8459.199759]  ? __x64_sys_ioctl+0x88/0xc0
   [ 8459.199770]  __x64_sys_ioctl+0x88/0xc0
   [ 8459.199783]  do_syscall_64+0x38/0x90
   [ 8459.199796]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
   [ 8459.199808] RIP: 0033:0x7f82ffaffe9b
   [ 8459.199817] RSP: 002b:00007f82ff9fcc50 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
   [ 8459.199827] RAX: ffffffffffffffda RBX: 000055b191e36310 RCX: 00007f82ffaffe9b
   [ 8459.199833] RDX: 000055b191e36310 RSI: 00000000c400941b RDI: 0000000000000003
   [ 8459.199838] RBP: 0000000000000000 R08: 00007fff1575016f R09: 0000000000000000
   [ 8459.199844] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f82ff9fd640
   [ 8459.199849] R13: 000000000000006b R14: 00007f82ffa87580 R15: 0000000000000000
   [ 8459.199863]  </TASK>
   [ 8459.199868] INFO: task btrfs:346773 blocked for more than 120 seconds.
   [ 8459.200013]       Tainted: G        W          6.3.0-rc2-btrfs-next-127+ #1
   [ 8459.200126] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
   [ 8459.201363] task:btrfs           state:D stack:0     pid:346773 ppid:1      flags:0x00004002
   [ 8459.201376] Call Trace:
   [ 8459.201380]  <TASK>
   [ 8459.201387]  __schedule+0x392/0xa70
   [ 8459.201399]  ? __pv_queued_spin_lock_slowpath+0x165/0x370
   [ 8459.201415]  schedule+0x5d/0xd0
   [ 8459.201425]  __scrub_blocked_if_needed+0x74/0xc0 [btrfs]
   [ 8459.201697]  ? __pfx_autoremove_wake_function+0x10/0x10
   [ 8459.201712]  scrub_pause_off+0x21/0x50 [btrfs]
   [ 8459.202001]  scrub_simple_mirror+0x1c7/0x950 [btrfs]
   [ 8459.202261]  ? scrub_parity_put+0x1a5/0x1d0 [btrfs]
   [ 8459.202526]  ? __pfx_autoremove_wake_function+0x10/0x10
   [ 8459.202550]  scrub_stripe+0x20d/0x740 [btrfs]
   [ 8459.202820]  scrub_chunk+0xc4/0x130 [btrfs]
   [ 8459.203081]  scrub_enumerate_chunks+0x3e4/0x7a0 [btrfs]
   [ 8459.203345]  ? __pfx_autoremove_wake_function+0x10/0x10
   [ 8459.203361]  btrfs_scrub_dev+0x236/0x6a0 [btrfs]
   [ 8459.203621]  ? btrfs_ioctl+0xd97/0x32c0 [btrfs]
   [ 8459.203877]  ? _copy_from_user+0x7b/0x80
   [ 8459.203891]  btrfs_ioctl+0xde1/0x32c0 [btrfs]
   [ 8459.204152]  ? should_failslab+0xa/0x20
   [ 8459.204166]  ? kmem_cache_alloc_node+0x151/0x460
   [ 8459.204231]  ? alloc_io_context+0x1b/0x80
   [ 8459.204245]  ? preempt_count_add+0x70/0xa0
   [ 8459.204259]  ? __x64_sys_ioctl+0x88/0xc0
   [ 8459.204270]  __x64_sys_ioctl+0x88/0xc0
   [ 8459.204283]  do_syscall_64+0x38/0x90
   [ 8459.204295]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
   [ 8459.204305] RIP: 0033:0x7f82ffaffe9b
   [ 8459.204312] RSP: 002b:00007f82ff1fbc50 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
   [ 8459.204321] RAX: ffffffffffffffda RBX: 000055b191e36790 RCX: 00007f82ffaffe9b
   [ 8459.204326] RDX: 000055b191e36790 RSI: 00000000c400941b RDI: 0000000000000003
   [ 8459.204331] RBP: 0000000000000000 R08: 00007fff1575016f R09: 0000000000000000
   [ 8459.204336] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f82ff1fc640
   [ 8459.204341] R13: 000000000000006b R14: 00007f82ffa87580 R15: 0000000000000000
   [ 8459.204355]  </TASK>
   [ 8459.204359] INFO: task btrfs:346774 blocked for more than 120 seconds.
   [ 8459.205156]       Tainted: G        W          6.3.0-rc2-btrfs-next-127+ #1
   [ 8459.205882] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
   [ 8459.206615] task:btrfs           state:D stack:0     pid:346774 ppid:1      flags:0x00004002
   [ 8459.206627] Call Trace:
   [ 8459.206632]  <TASK>
   [ 8459.206639]  __schedule+0x392/0xa70
   [ 8459.206656]  schedule+0x5d/0xd0
   [ 8459.206666]  __scrub_blocked_if_needed+0x74/0xc0 [btrfs]
   [ 8459.206922]  ? __pfx_autoremove_wake_function+0x10/0x10
   [ 8459.206938]  scrub_pause_off+0x21/0x50 [btrfs]
   [ 8459.207207]  scrub_simple_mirror+0x1c7/0x950 [btrfs]
   [ 8459.207475]  ? scrub_parity_put+0x1a5/0x1d0 [btrfs]
   [ 8459.207727]  ? __pfx_autoremove_wake_function+0x10/0x10
   [ 8459.207751]  scrub_stripe+0x20d/0x740 [btrfs]
   [ 8459.208008]  scrub_chunk+0xc4/0x130 [btrfs]
   [ 8459.208323]  scrub_enumerate_chunks+0x3e4/0x7a0 [btrfs]
   [ 8459.208579]  ? __mutex_unlock_slowpath.isra.0+0x9a/0x120
   [ 8459.208594]  btrfs_scrub_dev+0x236/0x6a0 [btrfs]
   [ 8459.208845]  ? btrfs_ioctl+0xd97/0x32c0 [btrfs]
   [ 8459.209087]  ? _copy_from_user+0x7b/0x80
   [ 8459.209100]  btrfs_ioctl+0xde1/0x32c0 [btrfs]
   [ 8459.209350]  ? should_failslab+0xa/0x20
   [ 8459.209363]  ? kmem_cache_alloc_node+0x151/0x460
   [ 8459.209373]  ? alloc_io_context+0x1b/0x80
   [ 8459.209385]  ? preempt_count_add+0x70/0xa0
   [ 8459.209399]  ? __x64_sys_ioctl+0x88/0xc0
   [ 8459.209410]  __x64_sys_ioctl+0x88/0xc0
   [ 8459.209423]  do_syscall_64+0x38/0x90
   [ 8459.209434]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
   [ 8459.209444] RIP: 0033:0x7f82ffaffe9b
   [ 8459.209450] RSP: 002b:00007f82fe9fac50 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
   [ 8459.209459] RAX: ffffffffffffffda RBX: 000055b191e36c10 RCX: 00007f82ffaffe9b
   [ 8459.209465] RDX: 000055b191e36c10 RSI: 00000000c400941b RDI: 0000000000000003
   [ 8459.209470] RBP: 0000000000000000 R08: 00007fff1575016f R09: 0000000000000000
   [ 8459.209474] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f82fe9fb640
   [ 8459.209479] R13: 000000000000006b R14: 00007f82ffa87580 R15: 0000000000000000
   [ 8459.209493]  </TASK>
   [ 8459.209497] INFO: task btrfs:346775 blocked for more than 120 seconds.
   [ 8459.210219]       Tainted: G        W          6.3.0-rc2-btrfs-next-127+ #1
   [ 8459.210933] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
   [ 8459.211670] task:btrfs           state:D stack:0     pid:346775 ppid:1      flags:0x00004002
   [ 8459.211682] Call Trace:
   [ 8459.211686]  <TASK>
   [ 8459.211692]  __schedule+0x392/0xa70
   [ 8459.211708]  schedule+0x5d/0xd0
   [ 8459.211717]  __scrub_blocked_if_needed+0x74/0xc0 [btrfs]
   [ 8459.211973]  ? __pfx_autoremove_wake_function+0x10/0x10
   [ 8459.211988]  scrub_pause_off+0x21/0x50 [btrfs]
   [ 8459.212303]  scrub_simple_mirror+0x1c7/0x950 [btrfs]
   [ 8459.212555]  ? scrub_parity_put+0x1a5/0x1d0 [btrfs]
   [ 8459.212813]  ? __pfx_autoremove_wake_function+0x10/0x10
   [ 8459.212836]  scrub_stripe+0x20d/0x740 [btrfs]
   [ 8459.213097]  scrub_chunk+0xc4/0x130 [btrfs]
   [ 8459.213347]  scrub_enumerate_chunks+0x3e4/0x7a0 [btrfs]
   [ 8459.213603]  ? __mutex_unlock_slowpath.isra.0+0x9a/0x120
   [ 8459.213617]  btrfs_scrub_dev+0x236/0x6a0 [btrfs]
   [ 8459.213869]  ? btrfs_ioctl+0xd97/0x32c0 [btrfs]
   [ 8459.214241]  ? _copy_from_user+0x7b/0x80
   [ 8459.214264]  btrfs_ioctl+0xde1/0x32c0 [btrfs]
   [ 8459.214527]  ? should_failslab+0xa/0x20
   [ 8459.214540]  ? kmem_cache_alloc_node+0x151/0x460
   [ 8459.214550]  ? alloc_io_context+0x1b/0x80
   [ 8459.214562]  ? preempt_count_add+0x70/0xa0
   [ 8459.214577]  ? __x64_sys_ioctl+0x88/0xc0
   [ 8459.214588]  __x64_sys_ioctl+0x88/0xc0
   [ 8459.214600]  do_syscall_64+0x38/0x90
   [ 8459.214612]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
   [ 8459.214622] RIP: 0033:0x7f82ffaffe9b
   [ 8459.214628] RSP: 002b:00007f82fe1f9c50 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
   [ 8459.214637] RAX: ffffffffffffffda RBX: 000055b191e37090 RCX: 00007f82ffaffe9b
   [ 8459.214642] RDX: 000055b191e37090 RSI: 00000000c400941b RDI: 0000000000000003
   [ 8459.214647] RBP: 0000000000000000 R08: 00007fff1575016f R09: 0000000000000000
   [ 8459.214652] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f82fe1fa640
   [ 8459.214657] R13: 000000000000006b R14: 00007f82ffa87580 R15: 0000000000000000
   [ 8459.214671]  </TASK>
   [ 8459.214675] INFO: task btrfs:346776 blocked for more than 120 seconds.
   [ 8459.215457]       Tainted: G        W          6.3.0-rc2-btrfs-next-127+ #1
   [ 8459.216311] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
   [ 8459.217124] task:btrfs           state:D stack:0     pid:346776 ppid:1      flags:0x00004002
   [ 8459.217136] Call Trace:
   [ 8459.217140]  <TASK>
   [ 8459.217147]  __schedule+0x392/0xa70
   [ 8459.217158]  ? __pv_queued_spin_lock_slowpath+0x165/0x370
   [ 8459.217174]  schedule+0x5d/0xd0
   [ 8459.217183]  __scrub_blocked_if_needed+0x74/0xc0 [btrfs]
   [ 8459.217438]  ? __pfx_autoremove_wake_function+0x10/0x10
   [ 8459.217452]  scrub_pause_off+0x21/0x50 [btrfs]
   [ 8459.217699]  scrub_simple_mirror+0x1c7/0x950 [btrfs]
   [ 8459.217946]  ? scrub_parity_put+0x1a5/0x1d0 [btrfs]
   [ 8459.218231]  ? __pfx_autoremove_wake_function+0x10/0x10
   [ 8459.218255]  scrub_stripe+0x20d/0x740 [btrfs]
   [ 8459.218514]  scrub_chunk+0xc4/0x130 [btrfs]
   [ 8459.218764]  scrub_enumerate_chunks+0x3e4/0x7a0 [btrfs]
   [ 8459.219018]  ? __pfx_autoremove_wake_function+0x10/0x10
   [ 8459.219033]  btrfs_scrub_dev+0x236/0x6a0 [btrfs]
   [ 8459.219284]  ? btrfs_ioctl+0xd97/0x32c0 [btrfs]
   [ 8459.219527]  ? _copy_from_user+0x7b/0x80
   [ 8459.219540]  btrfs_ioctl+0xde1/0x32c0 [btrfs]
   [ 8459.219790]  ? should_failslab+0xa/0x20
   [ 8459.219802]  ? kmem_cache_alloc_node+0x151/0x460
   [ 8459.219812]  ? alloc_io_context+0x1b/0x80
   [ 8459.219824]  ? preempt_count_add+0x70/0xa0
   [ 8459.219838]  ? __x64_sys_ioctl+0x88/0xc0
   [ 8459.219849]  __x64_sys_ioctl+0x88/0xc0
   [ 8459.219862]  do_syscall_64+0x38/0x90
   [ 8459.219873]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
   [ 8459.219883] RIP: 0033:0x7f82ffaffe9b
   [ 8459.219889] RSP: 002b:00007f82fd9f8c50 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
   [ 8459.219897] RAX: ffffffffffffffda RBX: 000055b191e37510 RCX: 00007f82ffaffe9b
   [ 8459.219902] RDX: 000055b191e37510 RSI: 00000000c400941b RDI: 0000000000000003
   [ 8459.219907] RBP: 0000000000000000 R08: 00007fff1575016f R09: 0000000000000000
   [ 8459.219912] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f82fd9f9640
   [ 8459.219917] R13: 000000000000006b R14: 00007f82ffa87580 R15: 0000000000000000
   [ 8459.219931]  </TASK>
   [ 8459.219937] INFO: task btrfs:346822 blocked for more than 120 seconds.
   [ 8459.220815]       Tainted: G        W          6.3.0-rc2-btrfs-next-127+ #1
   [ 8459.221633] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
   [ 8459.222504] task:btrfs           state:D stack:0     pid:346822 ppid:1      flags:0x00004002
   [ 8459.222516] Call Trace:
   [ 8459.222519]  <TASK>
   [ 8459.222526]  __schedule+0x392/0xa70
   [ 8459.222542]  schedule+0x5d/0xd0
   [ 8459.222552]  btrfs_scrub_cancel+0x91/0x100 [btrfs]
   [ 8459.222804]  ? __pfx_autoremove_wake_function+0x10/0x10
   [ 8459.222819]  btrfs_commit_transaction+0x572/0xeb0 [btrfs]
   [ 8459.223026]  ? start_transaction+0xcb/0x610 [btrfs]
   [ 8459.223234]  prepare_to_relocate+0x111/0x1a0 [btrfs]
   [ 8459.223482]  relocate_block_group+0x57/0x5d0 [btrfs]
   [ 8459.223730]  ? btrfs_wait_nocow_writers+0x25/0xb0 [btrfs]
   [ 8459.223965]  btrfs_relocate_block_group+0x248/0x3c0 [btrfs]
   [ 8459.224277]  ? __pfx_autoremove_wake_function+0x10/0x10
   [ 8459.224292]  btrfs_relocate_chunk+0x3b/0x150 [btrfs]
   [ 8459.224539]  btrfs_balance+0x8ff/0x11d0 [btrfs]
   [ 8459.224787]  ? __kmem_cache_alloc_node+0x14a/0x410
   [ 8459.224804]  btrfs_ioctl+0x2334/0x32c0 [btrfs]
   [ 8459.225049]  ? mod_objcg_state+0xd2/0x360
   [ 8459.225062]  ? refill_obj_stock+0xb0/0x160
   [ 8459.225072]  ? seq_release+0x25/0x30
   [ 8459.225083]  ? __rseq_handle_notify_resume+0x3b5/0x4b0
   [ 8459.225093]  ? percpu_counter_add_batch+0x2e/0xa0
   [ 8459.225111]  ? __x64_sys_ioctl+0x88/0xc0
   [ 8459.225122]  __x64_sys_ioctl+0x88/0xc0
   [ 8459.225134]  do_syscall_64+0x38/0x90
   [ 8459.225146]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
   [ 8459.225155] RIP: 0033:0x7f381a6ffe9b
   [ 8459.225161] RSP: 002b:00007ffd45ecf060 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
   [ 8459.225169] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f381a6ffe9b
   [ 8459.225175] RDX: 00007ffd45ecf150 RSI: 00000000c4009420 RDI: 0000000000000003
   [ 8459.225179] RBP: 0000000000000003 R08: 0000000000000013 R09: 0000000000000000
   [ 8459.225184] R10: 00007f381a60c878 R11: 0000000000000246 R12: 00007ffd45ed0423
   [ 8459.225189] R13: 00007ffd45ecf150 R14: 0000000000000000 R15: 00007ffd45ecf148
   [ 8459.225203]  </TASK>

What happens is the following:

1) A scrub is running, so fs_info->scrubs_running is 1;

2) Task A starts block group relocation, and at btrfs_relocate_chunk() it
   pauses scrub by calling btrfs_scrub_pause(). That increments
   fs_info->scrub_pause_req from 0 to 1 and waits for the scrub task to
   pause (for fs_info->scrubs_paused to be == to fs_info->scrubs_running);

3) The scrub task pauses at scrub_pause_off(), waiting for
   fs_info->scrub_pause_req to decrease to 0;

4) Task A then enters btrfs_relocate_block_group(), and down that call
   chain we start a transaction and then attempt to commit it;

5) When task A calls btrfs_commit_transaction(), it either will do the
   commit itself or wait for some other task that already started the
   commit of the transaction - it doesn't matter which case;

6) The transaction commit enters state TRANS_STATE_COMMIT_START;

7) An error happens during the transaction commit, like -ENOSPC when
   running delayed refs or delayed items for example;

8) This results in calling transaction.c:cleanup_transaction(), where
   we call btrfs_scrub_cancel(), incrementing fs_info->scrub_cancel_req
   from 0 to 1, and blocking this task waiting for fs_info->scrubs_running
   to decrease to 0;

9) From this point on, both the transaction commit and the scrub task
   hang forever:

   1) The transaction commit is waiting for fs_info->scrubs_running to
      be decreased to 0;

   2) The scrub task is at scrub_pause_off() waiting for
      fs_info->scrub_pause_req to decrease to 0 - so it can not proceed
      to stop the scrub and decrement fs_info->scrubs_running from 0 to 1.

   Therefore resulting in a deadlock.

Fix this by having cleanup_transaction(), called if a transaction commit
fails, not call btrfs_scrub_cancel() if relocation is in progress, and
having btrfs_relocate_block_group() call btrfs_scrub_cancel() instead if
the relocation failed and a transaction abort happened.

This was triggered with btrfs/061 from fstests.

Fixes: 55e3a601c81c ("btrfs: Fix data checksum error cause by replace with io-load.")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/transaction.c | 15 ++++++++++++++-
 fs/btrfs/volumes.c     |  9 ++++++++-
 2 files changed, 22 insertions(+), 2 deletions(-)

Comments

Josef Bacik March 22, 2023, 2:32 p.m. UTC | #1
On Wed, Mar 22, 2023 at 09:46:34AM +0000, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
> 
> Before relocating a block group we pause scrub, then do the relocation and
> then unpause scrub. The relocation process requires starting and committing
> a transaction, and if we have a failure in the critical section of the
> transaction commit path (transaction state >= TRANS_STATE_COMMIT_START),
> we will deadlock if there is a paused scrub.
> 
> That results in stack traces like the following:
> 
>    [ 8242.479915] BTRFS info (device sdc): relocating block group 53876686848 flags metadata|raid6
>    [ 8242.936437] BTRFS warning (device sdc): Skipping commit of aborted transaction.
>    [ 8242.936441] ------------[ cut here ]------------
>    [ 8242.936441] BTRFS: Transaction aborted (error -28)
>    [ 8242.936490] WARNING: CPU: 11 PID: 346822 at fs/btrfs/transaction.c:1977 btrfs_commit_transaction+0xcc8/0xeb0 [btrfs]
>    [ 8242.936544] Modules linked in: dm_flakey dm_mod loop btrfs (...)
>    [ 8242.936582] CPU: 11 PID: 346822 Comm: btrfs Tainted: G        W          6.3.0-rc2-btrfs-next-127+ #1
>    [ 8242.936584] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
>    [ 8242.936585] RIP: 0010:btrfs_commit_transaction+0xcc8/0xeb0 [btrfs]
>    [ 8242.936625] Code: ff ff 45 8b (...)
>    [ 8242.936627] RSP: 0018:ffffb58649633b48 EFLAGS: 00010282
>    [ 8242.936628] RAX: 0000000000000000 RBX: ffff8be6ef4d5bd8 RCX: 0000000000000000
>    [ 8242.936629] RDX: 0000000000000002 RSI: ffffffffb35e7782 RDI: 00000000ffffffff
>    [ 8242.936630] RBP: ffff8be6ef4d5c98 R08: 0000000000000000 R09: ffffb586496339e8
>    [ 8242.936632] R10: 0000000000000001 R11: 0000000000000001 R12: ffff8be6d38c7c00
>    [ 8242.936632] R13: 00000000ffffffe4 R14: ffff8be6c268c000 R15: ffff8be6ef4d5cf0
>    [ 8242.936633] FS:  00007f381a82b340(0000) GS:ffff8beddfcc0000(0000) knlGS:0000000000000000
>    [ 8242.936635] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>    [ 8242.936636] CR2: 00007f1e35fb7638 CR3: 0000000117680006 CR4: 0000000000370ee0
>    [ 8242.936638] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>    [ 8242.936639] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>    [ 8242.936640] Call Trace:
>    [ 8242.936642]  <TASK>
>    [ 8242.936644]  ? start_transaction+0xcb/0x610 [btrfs]
>    [ 8242.936684]  prepare_to_relocate+0x111/0x1a0 [btrfs]
>    [ 8242.936734]  relocate_block_group+0x57/0x5d0 [btrfs]
>    [ 8242.936781]  ? btrfs_wait_nocow_writers+0x25/0xb0 [btrfs]
>    [ 8242.936827]  btrfs_relocate_block_group+0x248/0x3c0 [btrfs]
>    [ 8242.936873]  ? __pfx_autoremove_wake_function+0x10/0x10
>    [ 8242.936877]  btrfs_relocate_chunk+0x3b/0x150 [btrfs]
>    [ 8242.936927]  btrfs_balance+0x8ff/0x11d0 [btrfs]
>    [ 8242.936973]  ? __kmem_cache_alloc_node+0x14a/0x410
>    [ 8242.936976]  btrfs_ioctl+0x2334/0x32c0 [btrfs]
>    [ 8242.937024]  ? mod_objcg_state+0xd2/0x360
>    [ 8242.937028]  ? refill_obj_stock+0xb0/0x160
>    [ 8242.937030]  ? seq_release+0x25/0x30
>    [ 8242.937041]  ? __rseq_handle_notify_resume+0x3b5/0x4b0
>    [ 8242.937043]  ? percpu_counter_add_batch+0x2e/0xa0
>    [ 8242.937048]  ? __x64_sys_ioctl+0x88/0xc0
>    [ 8242.937050]  __x64_sys_ioctl+0x88/0xc0
>    [ 8242.937052]  do_syscall_64+0x38/0x90
>    [ 8242.937056]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
>    [ 8242.937058] RIP: 0033:0x7f381a6ffe9b
>    [ 8242.937060] Code: 00 48 89 44 24 (...)
>    [ 8242.937062] RSP: 002b:00007ffd45ecf060 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
>    [ 8242.937064] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f381a6ffe9b
>    [ 8242.937065] RDX: 00007ffd45ecf150 RSI: 00000000c4009420 RDI: 0000000000000003
>    [ 8242.937066] RBP: 0000000000000003 R08: 0000000000000013 R09: 0000000000000000
>    [ 8242.937066] R10: 00007f381a60c878 R11: 0000000000000246 R12: 00007ffd45ed0423
>    [ 8242.937067] R13: 00007ffd45ecf150 R14: 0000000000000000 R15: 00007ffd45ecf148
>    [ 8242.937070]  </TASK>
>    [ 8242.937071] ---[ end trace 0000000000000000 ]---
>    [ 8242.937075] BTRFS: error (device sdc: state A) in cleanup_transaction:1977: errno=-28 No space left
>    [ 8459.196386] INFO: task btrfs:346772 blocked for more than 120 seconds.
>    [ 8459.196564]       Tainted: G        W          6.3.0-rc2-btrfs-next-127+ #1
>    [ 8459.196706] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>    [ 8459.196845] task:btrfs           state:D stack:0     pid:346772 ppid:1      flags:0x00004002
>    [ 8459.196861] Call Trace:
>    [ 8459.196866]  <TASK>
>    [ 8459.196876]  __schedule+0x392/0xa70
>    [ 8459.196906]  ? __pv_queued_spin_lock_slowpath+0x165/0x370
>    [ 8459.196923]  schedule+0x5d/0xd0
>    [ 8459.196937]  __scrub_blocked_if_needed+0x74/0xc0 [btrfs]
>    [ 8459.197226]  ? __pfx_autoremove_wake_function+0x10/0x10
>    [ 8459.197242]  scrub_pause_off+0x21/0x50 [btrfs]
>    [ 8459.197507]  scrub_simple_mirror+0x1c7/0x950 [btrfs]
>    [ 8459.197769]  ? scrub_parity_put+0x1a5/0x1d0 [btrfs]
>    [ 8459.198065]  ? __pfx_autoremove_wake_function+0x10/0x10
>    [ 8459.198090]  scrub_stripe+0x20d/0x740 [btrfs]
>    [ 8459.198360]  scrub_chunk+0xc4/0x130 [btrfs]
>    [ 8459.198621]  scrub_enumerate_chunks+0x3e4/0x7a0 [btrfs]
>    [ 8459.198885]  ? __pfx_autoremove_wake_function+0x10/0x10
>    [ 8459.198901]  btrfs_scrub_dev+0x236/0x6a0 [btrfs]
>    [ 8459.199162]  ? btrfs_ioctl+0xd97/0x32c0 [btrfs]
>    [ 8459.199420]  ? _copy_from_user+0x7b/0x80
>    [ 8459.199435]  btrfs_ioctl+0xde1/0x32c0 [btrfs]
>    [ 8459.199693]  ? refill_stock+0x33/0x50
>    [ 8459.199707]  ? should_failslab+0xa/0x20
>    [ 8459.199721]  ? kmem_cache_alloc_node+0x151/0x460
>    [ 8459.199731]  ? alloc_io_context+0x1b/0x80
>    [ 8459.199744]  ? preempt_count_add+0x70/0xa0
>    [ 8459.199759]  ? __x64_sys_ioctl+0x88/0xc0
>    [ 8459.199770]  __x64_sys_ioctl+0x88/0xc0
>    [ 8459.199783]  do_syscall_64+0x38/0x90
>    [ 8459.199796]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
>    [ 8459.199808] RIP: 0033:0x7f82ffaffe9b
>    [ 8459.199817] RSP: 002b:00007f82ff9fcc50 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
>    [ 8459.199827] RAX: ffffffffffffffda RBX: 000055b191e36310 RCX: 00007f82ffaffe9b
>    [ 8459.199833] RDX: 000055b191e36310 RSI: 00000000c400941b RDI: 0000000000000003
>    [ 8459.199838] RBP: 0000000000000000 R08: 00007fff1575016f R09: 0000000000000000
>    [ 8459.199844] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f82ff9fd640
>    [ 8459.199849] R13: 000000000000006b R14: 00007f82ffa87580 R15: 0000000000000000
>    [ 8459.199863]  </TASK>
>    [ 8459.199868] INFO: task btrfs:346773 blocked for more than 120 seconds.
>    [ 8459.200013]       Tainted: G        W          6.3.0-rc2-btrfs-next-127+ #1
>    [ 8459.200126] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>    [ 8459.201363] task:btrfs           state:D stack:0     pid:346773 ppid:1      flags:0x00004002
>    [ 8459.201376] Call Trace:
>    [ 8459.201380]  <TASK>
>    [ 8459.201387]  __schedule+0x392/0xa70
>    [ 8459.201399]  ? __pv_queued_spin_lock_slowpath+0x165/0x370
>    [ 8459.201415]  schedule+0x5d/0xd0
>    [ 8459.201425]  __scrub_blocked_if_needed+0x74/0xc0 [btrfs]
>    [ 8459.201697]  ? __pfx_autoremove_wake_function+0x10/0x10
>    [ 8459.201712]  scrub_pause_off+0x21/0x50 [btrfs]
>    [ 8459.202001]  scrub_simple_mirror+0x1c7/0x950 [btrfs]
>    [ 8459.202261]  ? scrub_parity_put+0x1a5/0x1d0 [btrfs]
>    [ 8459.202526]  ? __pfx_autoremove_wake_function+0x10/0x10
>    [ 8459.202550]  scrub_stripe+0x20d/0x740 [btrfs]
>    [ 8459.202820]  scrub_chunk+0xc4/0x130 [btrfs]
>    [ 8459.203081]  scrub_enumerate_chunks+0x3e4/0x7a0 [btrfs]
>    [ 8459.203345]  ? __pfx_autoremove_wake_function+0x10/0x10
>    [ 8459.203361]  btrfs_scrub_dev+0x236/0x6a0 [btrfs]
>    [ 8459.203621]  ? btrfs_ioctl+0xd97/0x32c0 [btrfs]
>    [ 8459.203877]  ? _copy_from_user+0x7b/0x80
>    [ 8459.203891]  btrfs_ioctl+0xde1/0x32c0 [btrfs]
>    [ 8459.204152]  ? should_failslab+0xa/0x20
>    [ 8459.204166]  ? kmem_cache_alloc_node+0x151/0x460
>    [ 8459.204231]  ? alloc_io_context+0x1b/0x80
>    [ 8459.204245]  ? preempt_count_add+0x70/0xa0
>    [ 8459.204259]  ? __x64_sys_ioctl+0x88/0xc0
>    [ 8459.204270]  __x64_sys_ioctl+0x88/0xc0
>    [ 8459.204283]  do_syscall_64+0x38/0x90
>    [ 8459.204295]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
>    [ 8459.204305] RIP: 0033:0x7f82ffaffe9b
>    [ 8459.204312] RSP: 002b:00007f82ff1fbc50 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
>    [ 8459.204321] RAX: ffffffffffffffda RBX: 000055b191e36790 RCX: 00007f82ffaffe9b
>    [ 8459.204326] RDX: 000055b191e36790 RSI: 00000000c400941b RDI: 0000000000000003
>    [ 8459.204331] RBP: 0000000000000000 R08: 00007fff1575016f R09: 0000000000000000
>    [ 8459.204336] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f82ff1fc640
>    [ 8459.204341] R13: 000000000000006b R14: 00007f82ffa87580 R15: 0000000000000000
>    [ 8459.204355]  </TASK>
>    [ 8459.204359] INFO: task btrfs:346774 blocked for more than 120 seconds.
>    [ 8459.205156]       Tainted: G        W          6.3.0-rc2-btrfs-next-127+ #1
>    [ 8459.205882] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>    [ 8459.206615] task:btrfs           state:D stack:0     pid:346774 ppid:1      flags:0x00004002
>    [ 8459.206627] Call Trace:
>    [ 8459.206632]  <TASK>
>    [ 8459.206639]  __schedule+0x392/0xa70
>    [ 8459.206656]  schedule+0x5d/0xd0
>    [ 8459.206666]  __scrub_blocked_if_needed+0x74/0xc0 [btrfs]
>    [ 8459.206922]  ? __pfx_autoremove_wake_function+0x10/0x10
>    [ 8459.206938]  scrub_pause_off+0x21/0x50 [btrfs]
>    [ 8459.207207]  scrub_simple_mirror+0x1c7/0x950 [btrfs]
>    [ 8459.207475]  ? scrub_parity_put+0x1a5/0x1d0 [btrfs]
>    [ 8459.207727]  ? __pfx_autoremove_wake_function+0x10/0x10
>    [ 8459.207751]  scrub_stripe+0x20d/0x740 [btrfs]
>    [ 8459.208008]  scrub_chunk+0xc4/0x130 [btrfs]
>    [ 8459.208323]  scrub_enumerate_chunks+0x3e4/0x7a0 [btrfs]
>    [ 8459.208579]  ? __mutex_unlock_slowpath.isra.0+0x9a/0x120
>    [ 8459.208594]  btrfs_scrub_dev+0x236/0x6a0 [btrfs]
>    [ 8459.208845]  ? btrfs_ioctl+0xd97/0x32c0 [btrfs]
>    [ 8459.209087]  ? _copy_from_user+0x7b/0x80
>    [ 8459.209100]  btrfs_ioctl+0xde1/0x32c0 [btrfs]
>    [ 8459.209350]  ? should_failslab+0xa/0x20
>    [ 8459.209363]  ? kmem_cache_alloc_node+0x151/0x460
>    [ 8459.209373]  ? alloc_io_context+0x1b/0x80
>    [ 8459.209385]  ? preempt_count_add+0x70/0xa0
>    [ 8459.209399]  ? __x64_sys_ioctl+0x88/0xc0
>    [ 8459.209410]  __x64_sys_ioctl+0x88/0xc0
>    [ 8459.209423]  do_syscall_64+0x38/0x90
>    [ 8459.209434]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
>    [ 8459.209444] RIP: 0033:0x7f82ffaffe9b
>    [ 8459.209450] RSP: 002b:00007f82fe9fac50 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
>    [ 8459.209459] RAX: ffffffffffffffda RBX: 000055b191e36c10 RCX: 00007f82ffaffe9b
>    [ 8459.209465] RDX: 000055b191e36c10 RSI: 00000000c400941b RDI: 0000000000000003
>    [ 8459.209470] RBP: 0000000000000000 R08: 00007fff1575016f R09: 0000000000000000
>    [ 8459.209474] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f82fe9fb640
>    [ 8459.209479] R13: 000000000000006b R14: 00007f82ffa87580 R15: 0000000000000000
>    [ 8459.209493]  </TASK>
>    [ 8459.209497] INFO: task btrfs:346775 blocked for more than 120 seconds.
>    [ 8459.210219]       Tainted: G        W          6.3.0-rc2-btrfs-next-127+ #1
>    [ 8459.210933] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>    [ 8459.211670] task:btrfs           state:D stack:0     pid:346775 ppid:1      flags:0x00004002
>    [ 8459.211682] Call Trace:
>    [ 8459.211686]  <TASK>
>    [ 8459.211692]  __schedule+0x392/0xa70
>    [ 8459.211708]  schedule+0x5d/0xd0
>    [ 8459.211717]  __scrub_blocked_if_needed+0x74/0xc0 [btrfs]
>    [ 8459.211973]  ? __pfx_autoremove_wake_function+0x10/0x10
>    [ 8459.211988]  scrub_pause_off+0x21/0x50 [btrfs]
>    [ 8459.212303]  scrub_simple_mirror+0x1c7/0x950 [btrfs]
>    [ 8459.212555]  ? scrub_parity_put+0x1a5/0x1d0 [btrfs]
>    [ 8459.212813]  ? __pfx_autoremove_wake_function+0x10/0x10
>    [ 8459.212836]  scrub_stripe+0x20d/0x740 [btrfs]
>    [ 8459.213097]  scrub_chunk+0xc4/0x130 [btrfs]
>    [ 8459.213347]  scrub_enumerate_chunks+0x3e4/0x7a0 [btrfs]
>    [ 8459.213603]  ? __mutex_unlock_slowpath.isra.0+0x9a/0x120
>    [ 8459.213617]  btrfs_scrub_dev+0x236/0x6a0 [btrfs]
>    [ 8459.213869]  ? btrfs_ioctl+0xd97/0x32c0 [btrfs]
>    [ 8459.214241]  ? _copy_from_user+0x7b/0x80
>    [ 8459.214264]  btrfs_ioctl+0xde1/0x32c0 [btrfs]
>    [ 8459.214527]  ? should_failslab+0xa/0x20
>    [ 8459.214540]  ? kmem_cache_alloc_node+0x151/0x460
>    [ 8459.214550]  ? alloc_io_context+0x1b/0x80
>    [ 8459.214562]  ? preempt_count_add+0x70/0xa0
>    [ 8459.214577]  ? __x64_sys_ioctl+0x88/0xc0
>    [ 8459.214588]  __x64_sys_ioctl+0x88/0xc0
>    [ 8459.214600]  do_syscall_64+0x38/0x90
>    [ 8459.214612]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
>    [ 8459.214622] RIP: 0033:0x7f82ffaffe9b
>    [ 8459.214628] RSP: 002b:00007f82fe1f9c50 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
>    [ 8459.214637] RAX: ffffffffffffffda RBX: 000055b191e37090 RCX: 00007f82ffaffe9b
>    [ 8459.214642] RDX: 000055b191e37090 RSI: 00000000c400941b RDI: 0000000000000003
>    [ 8459.214647] RBP: 0000000000000000 R08: 00007fff1575016f R09: 0000000000000000
>    [ 8459.214652] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f82fe1fa640
>    [ 8459.214657] R13: 000000000000006b R14: 00007f82ffa87580 R15: 0000000000000000
>    [ 8459.214671]  </TASK>
>    [ 8459.214675] INFO: task btrfs:346776 blocked for more than 120 seconds.
>    [ 8459.215457]       Tainted: G        W          6.3.0-rc2-btrfs-next-127+ #1
>    [ 8459.216311] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>    [ 8459.217124] task:btrfs           state:D stack:0     pid:346776 ppid:1      flags:0x00004002
>    [ 8459.217136] Call Trace:
>    [ 8459.217140]  <TASK>
>    [ 8459.217147]  __schedule+0x392/0xa70
>    [ 8459.217158]  ? __pv_queued_spin_lock_slowpath+0x165/0x370
>    [ 8459.217174]  schedule+0x5d/0xd0
>    [ 8459.217183]  __scrub_blocked_if_needed+0x74/0xc0 [btrfs]
>    [ 8459.217438]  ? __pfx_autoremove_wake_function+0x10/0x10
>    [ 8459.217452]  scrub_pause_off+0x21/0x50 [btrfs]
>    [ 8459.217699]  scrub_simple_mirror+0x1c7/0x950 [btrfs]
>    [ 8459.217946]  ? scrub_parity_put+0x1a5/0x1d0 [btrfs]
>    [ 8459.218231]  ? __pfx_autoremove_wake_function+0x10/0x10
>    [ 8459.218255]  scrub_stripe+0x20d/0x740 [btrfs]
>    [ 8459.218514]  scrub_chunk+0xc4/0x130 [btrfs]
>    [ 8459.218764]  scrub_enumerate_chunks+0x3e4/0x7a0 [btrfs]
>    [ 8459.219018]  ? __pfx_autoremove_wake_function+0x10/0x10
>    [ 8459.219033]  btrfs_scrub_dev+0x236/0x6a0 [btrfs]
>    [ 8459.219284]  ? btrfs_ioctl+0xd97/0x32c0 [btrfs]
>    [ 8459.219527]  ? _copy_from_user+0x7b/0x80
>    [ 8459.219540]  btrfs_ioctl+0xde1/0x32c0 [btrfs]
>    [ 8459.219790]  ? should_failslab+0xa/0x20
>    [ 8459.219802]  ? kmem_cache_alloc_node+0x151/0x460
>    [ 8459.219812]  ? alloc_io_context+0x1b/0x80
>    [ 8459.219824]  ? preempt_count_add+0x70/0xa0
>    [ 8459.219838]  ? __x64_sys_ioctl+0x88/0xc0
>    [ 8459.219849]  __x64_sys_ioctl+0x88/0xc0
>    [ 8459.219862]  do_syscall_64+0x38/0x90
>    [ 8459.219873]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
>    [ 8459.219883] RIP: 0033:0x7f82ffaffe9b
>    [ 8459.219889] RSP: 002b:00007f82fd9f8c50 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
>    [ 8459.219897] RAX: ffffffffffffffda RBX: 000055b191e37510 RCX: 00007f82ffaffe9b
>    [ 8459.219902] RDX: 000055b191e37510 RSI: 00000000c400941b RDI: 0000000000000003
>    [ 8459.219907] RBP: 0000000000000000 R08: 00007fff1575016f R09: 0000000000000000
>    [ 8459.219912] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f82fd9f9640
>    [ 8459.219917] R13: 000000000000006b R14: 00007f82ffa87580 R15: 0000000000000000
>    [ 8459.219931]  </TASK>
>    [ 8459.219937] INFO: task btrfs:346822 blocked for more than 120 seconds.
>    [ 8459.220815]       Tainted: G        W          6.3.0-rc2-btrfs-next-127+ #1
>    [ 8459.221633] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>    [ 8459.222504] task:btrfs           state:D stack:0     pid:346822 ppid:1      flags:0x00004002
>    [ 8459.222516] Call Trace:
>    [ 8459.222519]  <TASK>
>    [ 8459.222526]  __schedule+0x392/0xa70
>    [ 8459.222542]  schedule+0x5d/0xd0
>    [ 8459.222552]  btrfs_scrub_cancel+0x91/0x100 [btrfs]
>    [ 8459.222804]  ? __pfx_autoremove_wake_function+0x10/0x10
>    [ 8459.222819]  btrfs_commit_transaction+0x572/0xeb0 [btrfs]
>    [ 8459.223026]  ? start_transaction+0xcb/0x610 [btrfs]
>    [ 8459.223234]  prepare_to_relocate+0x111/0x1a0 [btrfs]
>    [ 8459.223482]  relocate_block_group+0x57/0x5d0 [btrfs]
>    [ 8459.223730]  ? btrfs_wait_nocow_writers+0x25/0xb0 [btrfs]
>    [ 8459.223965]  btrfs_relocate_block_group+0x248/0x3c0 [btrfs]
>    [ 8459.224277]  ? __pfx_autoremove_wake_function+0x10/0x10
>    [ 8459.224292]  btrfs_relocate_chunk+0x3b/0x150 [btrfs]
>    [ 8459.224539]  btrfs_balance+0x8ff/0x11d0 [btrfs]
>    [ 8459.224787]  ? __kmem_cache_alloc_node+0x14a/0x410
>    [ 8459.224804]  btrfs_ioctl+0x2334/0x32c0 [btrfs]
>    [ 8459.225049]  ? mod_objcg_state+0xd2/0x360
>    [ 8459.225062]  ? refill_obj_stock+0xb0/0x160
>    [ 8459.225072]  ? seq_release+0x25/0x30
>    [ 8459.225083]  ? __rseq_handle_notify_resume+0x3b5/0x4b0
>    [ 8459.225093]  ? percpu_counter_add_batch+0x2e/0xa0
>    [ 8459.225111]  ? __x64_sys_ioctl+0x88/0xc0
>    [ 8459.225122]  __x64_sys_ioctl+0x88/0xc0
>    [ 8459.225134]  do_syscall_64+0x38/0x90
>    [ 8459.225146]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
>    [ 8459.225155] RIP: 0033:0x7f381a6ffe9b
>    [ 8459.225161] RSP: 002b:00007ffd45ecf060 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
>    [ 8459.225169] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f381a6ffe9b
>    [ 8459.225175] RDX: 00007ffd45ecf150 RSI: 00000000c4009420 RDI: 0000000000000003
>    [ 8459.225179] RBP: 0000000000000003 R08: 0000000000000013 R09: 0000000000000000
>    [ 8459.225184] R10: 00007f381a60c878 R11: 0000000000000246 R12: 00007ffd45ed0423
>    [ 8459.225189] R13: 00007ffd45ecf150 R14: 0000000000000000 R15: 00007ffd45ecf148
>    [ 8459.225203]  </TASK>
> 
> What happens is the following:
> 
> 1) A scrub is running, so fs_info->scrubs_running is 1;
> 
> 2) Task A starts block group relocation, and at btrfs_relocate_chunk() it
>    pauses scrub by calling btrfs_scrub_pause(). That increments
>    fs_info->scrub_pause_req from 0 to 1 and waits for the scrub task to
>    pause (for fs_info->scrubs_paused to be == to fs_info->scrubs_running);
> 
> 3) The scrub task pauses at scrub_pause_off(), waiting for
>    fs_info->scrub_pause_req to decrease to 0;
> 
> 4) Task A then enters btrfs_relocate_block_group(), and down that call
>    chain we start a transaction and then attempt to commit it;
> 
> 5) When task A calls btrfs_commit_transaction(), it either will do the
>    commit itself or wait for some other task that already started the
>    commit of the transaction - it doesn't matter which case;
> 
> 6) The transaction commit enters state TRANS_STATE_COMMIT_START;
> 
> 7) An error happens during the transaction commit, like -ENOSPC when
>    running delayed refs or delayed items for example;
> 
> 8) This results in calling transaction.c:cleanup_transaction(), where
>    we call btrfs_scrub_cancel(), incrementing fs_info->scrub_cancel_req
>    from 0 to 1, and blocking this task waiting for fs_info->scrubs_running
>    to decrease to 0;
> 
> 9) From this point on, both the transaction commit and the scrub task
>    hang forever:
> 
>    1) The transaction commit is waiting for fs_info->scrubs_running to
>       be decreased to 0;
> 
>    2) The scrub task is at scrub_pause_off() waiting for
>       fs_info->scrub_pause_req to decrease to 0 - so it can not proceed
>       to stop the scrub and decrement fs_info->scrubs_running from 0 to 1.
> 
>    Therefore resulting in a deadlock.
> 
> Fix this by having cleanup_transaction(), called if a transaction commit
> fails, not call btrfs_scrub_cancel() if relocation is in progress, and
> having btrfs_relocate_block_group() call btrfs_scrub_cancel() instead if
> the relocation failed and a transaction abort happened.
> 
> This was triggered with btrfs/061 from fstests.
> 
> Fixes: 55e3a601c81c ("btrfs: Fix data checksum error cause by replace with io-load.")
> Signed-off-by: Filipe Manana <fdmanana@suse.com>

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef
David Sterba March 27, 2023, 4:56 p.m. UTC | #2
On Wed, Mar 22, 2023 at 09:46:34AM +0000, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
> 
> Before relocating a block group we pause scrub, then do the relocation and
> then unpause scrub. The relocation process requires starting and committing
> a transaction, and if we have a failure in the critical section of the
> transaction commit path (transaction state >= TRANS_STATE_COMMIT_START),
> we will deadlock if there is a paused scrub.
> 
> That results in stack traces like the following:
> 
>    [ 8242.479915] BTRFS info (device sdc): relocating block group 53876686848 flags metadata|raid6
>    [ 8242.936437] BTRFS warning (device sdc): Skipping commit of aborted transaction.
>    [ 8242.936441] ------------[ cut here ]------------
>    [ 8242.936441] BTRFS: Transaction aborted (error -28)
>    [ 8242.936490] WARNING: CPU: 11 PID: 346822 at fs/btrfs/transaction.c:1977 btrfs_commit_transaction+0xcc8/0xeb0 [btrfs]
>    [ 8242.936544] Modules linked in: dm_flakey dm_mod loop btrfs (...)
>    [ 8242.936582] CPU: 11 PID: 346822 Comm: btrfs Tainted: G        W          6.3.0-rc2-btrfs-next-127+ #1
>    [ 8242.936584] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
>    [ 8242.936585] RIP: 0010:btrfs_commit_transaction+0xcc8/0xeb0 [btrfs]
>    [ 8242.936625] Code: ff ff 45 8b (...)
>    [ 8242.936627] RSP: 0018:ffffb58649633b48 EFLAGS: 00010282
>    [ 8242.936628] RAX: 0000000000000000 RBX: ffff8be6ef4d5bd8 RCX: 0000000000000000
>    [ 8242.936629] RDX: 0000000000000002 RSI: ffffffffb35e7782 RDI: 00000000ffffffff
>    [ 8242.936630] RBP: ffff8be6ef4d5c98 R08: 0000000000000000 R09: ffffb586496339e8
>    [ 8242.936632] R10: 0000000000000001 R11: 0000000000000001 R12: ffff8be6d38c7c00
>    [ 8242.936632] R13: 00000000ffffffe4 R14: ffff8be6c268c000 R15: ffff8be6ef4d5cf0
>    [ 8242.936633] FS:  00007f381a82b340(0000) GS:ffff8beddfcc0000(0000) knlGS:0000000000000000
>    [ 8242.936635] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>    [ 8242.936636] CR2: 00007f1e35fb7638 CR3: 0000000117680006 CR4: 0000000000370ee0
>    [ 8242.936638] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>    [ 8242.936639] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>    [ 8242.936640] Call Trace:
>    [ 8242.936642]  <TASK>
>    [ 8242.936644]  ? start_transaction+0xcb/0x610 [btrfs]
>    [ 8242.936684]  prepare_to_relocate+0x111/0x1a0 [btrfs]
>    [ 8242.936734]  relocate_block_group+0x57/0x5d0 [btrfs]
>    [ 8242.936781]  ? btrfs_wait_nocow_writers+0x25/0xb0 [btrfs]
>    [ 8242.936827]  btrfs_relocate_block_group+0x248/0x3c0 [btrfs]
>    [ 8242.936873]  ? __pfx_autoremove_wake_function+0x10/0x10
>    [ 8242.936877]  btrfs_relocate_chunk+0x3b/0x150 [btrfs]
>    [ 8242.936927]  btrfs_balance+0x8ff/0x11d0 [btrfs]
>    [ 8242.936973]  ? __kmem_cache_alloc_node+0x14a/0x410
>    [ 8242.936976]  btrfs_ioctl+0x2334/0x32c0 [btrfs]
>    [ 8242.937024]  ? mod_objcg_state+0xd2/0x360
>    [ 8242.937028]  ? refill_obj_stock+0xb0/0x160
>    [ 8242.937030]  ? seq_release+0x25/0x30
>    [ 8242.937041]  ? __rseq_handle_notify_resume+0x3b5/0x4b0
>    [ 8242.937043]  ? percpu_counter_add_batch+0x2e/0xa0
>    [ 8242.937048]  ? __x64_sys_ioctl+0x88/0xc0
>    [ 8242.937050]  __x64_sys_ioctl+0x88/0xc0
>    [ 8242.937052]  do_syscall_64+0x38/0x90
>    [ 8242.937056]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
>    [ 8242.937058] RIP: 0033:0x7f381a6ffe9b
>    [ 8242.937060] Code: 00 48 89 44 24 (...)
>    [ 8242.937062] RSP: 002b:00007ffd45ecf060 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
>    [ 8242.937064] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f381a6ffe9b
>    [ 8242.937065] RDX: 00007ffd45ecf150 RSI: 00000000c4009420 RDI: 0000000000000003
>    [ 8242.937066] RBP: 0000000000000003 R08: 0000000000000013 R09: 0000000000000000
>    [ 8242.937066] R10: 00007f381a60c878 R11: 0000000000000246 R12: 00007ffd45ed0423
>    [ 8242.937067] R13: 00007ffd45ecf150 R14: 0000000000000000 R15: 00007ffd45ecf148
>    [ 8242.937070]  </TASK>
>    [ 8242.937071] ---[ end trace 0000000000000000 ]---
>    [ 8242.937075] BTRFS: error (device sdc: state A) in cleanup_transaction:1977: errno=-28 No space left
>    [ 8459.196386] INFO: task btrfs:346772 blocked for more than 120 seconds.
>    [ 8459.196564]       Tainted: G        W          6.3.0-rc2-btrfs-next-127+ #1
>    [ 8459.196706] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>    [ 8459.196845] task:btrfs           state:D stack:0     pid:346772 ppid:1      flags:0x00004002
>    [ 8459.196861] Call Trace:
>    [ 8459.196866]  <TASK>
>    [ 8459.196876]  __schedule+0x392/0xa70
>    [ 8459.196906]  ? __pv_queued_spin_lock_slowpath+0x165/0x370
>    [ 8459.196923]  schedule+0x5d/0xd0
>    [ 8459.196937]  __scrub_blocked_if_needed+0x74/0xc0 [btrfs]
>    [ 8459.197226]  ? __pfx_autoremove_wake_function+0x10/0x10
>    [ 8459.197242]  scrub_pause_off+0x21/0x50 [btrfs]
>    [ 8459.197507]  scrub_simple_mirror+0x1c7/0x950 [btrfs]
>    [ 8459.197769]  ? scrub_parity_put+0x1a5/0x1d0 [btrfs]
>    [ 8459.198065]  ? __pfx_autoremove_wake_function+0x10/0x10
>    [ 8459.198090]  scrub_stripe+0x20d/0x740 [btrfs]
>    [ 8459.198360]  scrub_chunk+0xc4/0x130 [btrfs]
>    [ 8459.198621]  scrub_enumerate_chunks+0x3e4/0x7a0 [btrfs]
>    [ 8459.198885]  ? __pfx_autoremove_wake_function+0x10/0x10
>    [ 8459.198901]  btrfs_scrub_dev+0x236/0x6a0 [btrfs]
>    [ 8459.199162]  ? btrfs_ioctl+0xd97/0x32c0 [btrfs]
>    [ 8459.199420]  ? _copy_from_user+0x7b/0x80
>    [ 8459.199435]  btrfs_ioctl+0xde1/0x32c0 [btrfs]
>    [ 8459.199693]  ? refill_stock+0x33/0x50
>    [ 8459.199707]  ? should_failslab+0xa/0x20
>    [ 8459.199721]  ? kmem_cache_alloc_node+0x151/0x460
>    [ 8459.199731]  ? alloc_io_context+0x1b/0x80
>    [ 8459.199744]  ? preempt_count_add+0x70/0xa0
>    [ 8459.199759]  ? __x64_sys_ioctl+0x88/0xc0
>    [ 8459.199770]  __x64_sys_ioctl+0x88/0xc0
>    [ 8459.199783]  do_syscall_64+0x38/0x90
>    [ 8459.199796]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
>    [ 8459.199808] RIP: 0033:0x7f82ffaffe9b
>    [ 8459.199817] RSP: 002b:00007f82ff9fcc50 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
>    [ 8459.199827] RAX: ffffffffffffffda RBX: 000055b191e36310 RCX: 00007f82ffaffe9b
>    [ 8459.199833] RDX: 000055b191e36310 RSI: 00000000c400941b RDI: 0000000000000003
>    [ 8459.199838] RBP: 0000000000000000 R08: 00007fff1575016f R09: 0000000000000000
>    [ 8459.199844] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f82ff9fd640
>    [ 8459.199849] R13: 000000000000006b R14: 00007f82ffa87580 R15: 0000000000000000
>    [ 8459.199863]  </TASK>
>    [ 8459.199868] INFO: task btrfs:346773 blocked for more than 120 seconds.
>    [ 8459.200013]       Tainted: G        W          6.3.0-rc2-btrfs-next-127+ #1
>    [ 8459.200126] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>    [ 8459.201363] task:btrfs           state:D stack:0     pid:346773 ppid:1      flags:0x00004002
>    [ 8459.201376] Call Trace:
>    [ 8459.201380]  <TASK>
>    [ 8459.201387]  __schedule+0x392/0xa70
>    [ 8459.201399]  ? __pv_queued_spin_lock_slowpath+0x165/0x370
>    [ 8459.201415]  schedule+0x5d/0xd0
>    [ 8459.201425]  __scrub_blocked_if_needed+0x74/0xc0 [btrfs]
>    [ 8459.201697]  ? __pfx_autoremove_wake_function+0x10/0x10
>    [ 8459.201712]  scrub_pause_off+0x21/0x50 [btrfs]
>    [ 8459.202001]  scrub_simple_mirror+0x1c7/0x950 [btrfs]
>    [ 8459.202261]  ? scrub_parity_put+0x1a5/0x1d0 [btrfs]
>    [ 8459.202526]  ? __pfx_autoremove_wake_function+0x10/0x10
>    [ 8459.202550]  scrub_stripe+0x20d/0x740 [btrfs]
>    [ 8459.202820]  scrub_chunk+0xc4/0x130 [btrfs]
>    [ 8459.203081]  scrub_enumerate_chunks+0x3e4/0x7a0 [btrfs]
>    [ 8459.203345]  ? __pfx_autoremove_wake_function+0x10/0x10
>    [ 8459.203361]  btrfs_scrub_dev+0x236/0x6a0 [btrfs]
>    [ 8459.203621]  ? btrfs_ioctl+0xd97/0x32c0 [btrfs]
>    [ 8459.203877]  ? _copy_from_user+0x7b/0x80
>    [ 8459.203891]  btrfs_ioctl+0xde1/0x32c0 [btrfs]
>    [ 8459.204152]  ? should_failslab+0xa/0x20
>    [ 8459.204166]  ? kmem_cache_alloc_node+0x151/0x460
>    [ 8459.204231]  ? alloc_io_context+0x1b/0x80
>    [ 8459.204245]  ? preempt_count_add+0x70/0xa0
>    [ 8459.204259]  ? __x64_sys_ioctl+0x88/0xc0
>    [ 8459.204270]  __x64_sys_ioctl+0x88/0xc0
>    [ 8459.204283]  do_syscall_64+0x38/0x90
>    [ 8459.204295]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
>    [ 8459.204305] RIP: 0033:0x7f82ffaffe9b
>    [ 8459.204312] RSP: 002b:00007f82ff1fbc50 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
>    [ 8459.204321] RAX: ffffffffffffffda RBX: 000055b191e36790 RCX: 00007f82ffaffe9b
>    [ 8459.204326] RDX: 000055b191e36790 RSI: 00000000c400941b RDI: 0000000000000003
>    [ 8459.204331] RBP: 0000000000000000 R08: 00007fff1575016f R09: 0000000000000000
>    [ 8459.204336] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f82ff1fc640
>    [ 8459.204341] R13: 000000000000006b R14: 00007f82ffa87580 R15: 0000000000000000
>    [ 8459.204355]  </TASK>
>    [ 8459.204359] INFO: task btrfs:346774 blocked for more than 120 seconds.
>    [ 8459.205156]       Tainted: G        W          6.3.0-rc2-btrfs-next-127+ #1
>    [ 8459.205882] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>    [ 8459.206615] task:btrfs           state:D stack:0     pid:346774 ppid:1      flags:0x00004002
>    [ 8459.206627] Call Trace:
>    [ 8459.206632]  <TASK>
>    [ 8459.206639]  __schedule+0x392/0xa70
>    [ 8459.206656]  schedule+0x5d/0xd0
>    [ 8459.206666]  __scrub_blocked_if_needed+0x74/0xc0 [btrfs]
>    [ 8459.206922]  ? __pfx_autoremove_wake_function+0x10/0x10
>    [ 8459.206938]  scrub_pause_off+0x21/0x50 [btrfs]
>    [ 8459.207207]  scrub_simple_mirror+0x1c7/0x950 [btrfs]
>    [ 8459.207475]  ? scrub_parity_put+0x1a5/0x1d0 [btrfs]
>    [ 8459.207727]  ? __pfx_autoremove_wake_function+0x10/0x10
>    [ 8459.207751]  scrub_stripe+0x20d/0x740 [btrfs]
>    [ 8459.208008]  scrub_chunk+0xc4/0x130 [btrfs]
>    [ 8459.208323]  scrub_enumerate_chunks+0x3e4/0x7a0 [btrfs]
>    [ 8459.208579]  ? __mutex_unlock_slowpath.isra.0+0x9a/0x120
>    [ 8459.208594]  btrfs_scrub_dev+0x236/0x6a0 [btrfs]
>    [ 8459.208845]  ? btrfs_ioctl+0xd97/0x32c0 [btrfs]
>    [ 8459.209087]  ? _copy_from_user+0x7b/0x80
>    [ 8459.209100]  btrfs_ioctl+0xde1/0x32c0 [btrfs]
>    [ 8459.209350]  ? should_failslab+0xa/0x20
>    [ 8459.209363]  ? kmem_cache_alloc_node+0x151/0x460
>    [ 8459.209373]  ? alloc_io_context+0x1b/0x80
>    [ 8459.209385]  ? preempt_count_add+0x70/0xa0
>    [ 8459.209399]  ? __x64_sys_ioctl+0x88/0xc0
>    [ 8459.209410]  __x64_sys_ioctl+0x88/0xc0
>    [ 8459.209423]  do_syscall_64+0x38/0x90
>    [ 8459.209434]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
>    [ 8459.209444] RIP: 0033:0x7f82ffaffe9b
>    [ 8459.209450] RSP: 002b:00007f82fe9fac50 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
>    [ 8459.209459] RAX: ffffffffffffffda RBX: 000055b191e36c10 RCX: 00007f82ffaffe9b
>    [ 8459.209465] RDX: 000055b191e36c10 RSI: 00000000c400941b RDI: 0000000000000003
>    [ 8459.209470] RBP: 0000000000000000 R08: 00007fff1575016f R09: 0000000000000000
>    [ 8459.209474] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f82fe9fb640
>    [ 8459.209479] R13: 000000000000006b R14: 00007f82ffa87580 R15: 0000000000000000
>    [ 8459.209493]  </TASK>
>    [ 8459.209497] INFO: task btrfs:346775 blocked for more than 120 seconds.
>    [ 8459.210219]       Tainted: G        W          6.3.0-rc2-btrfs-next-127+ #1
>    [ 8459.210933] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>    [ 8459.211670] task:btrfs           state:D stack:0     pid:346775 ppid:1      flags:0x00004002
>    [ 8459.211682] Call Trace:
>    [ 8459.211686]  <TASK>
>    [ 8459.211692]  __schedule+0x392/0xa70
>    [ 8459.211708]  schedule+0x5d/0xd0
>    [ 8459.211717]  __scrub_blocked_if_needed+0x74/0xc0 [btrfs]
>    [ 8459.211973]  ? __pfx_autoremove_wake_function+0x10/0x10
>    [ 8459.211988]  scrub_pause_off+0x21/0x50 [btrfs]
>    [ 8459.212303]  scrub_simple_mirror+0x1c7/0x950 [btrfs]
>    [ 8459.212555]  ? scrub_parity_put+0x1a5/0x1d0 [btrfs]
>    [ 8459.212813]  ? __pfx_autoremove_wake_function+0x10/0x10
>    [ 8459.212836]  scrub_stripe+0x20d/0x740 [btrfs]
>    [ 8459.213097]  scrub_chunk+0xc4/0x130 [btrfs]
>    [ 8459.213347]  scrub_enumerate_chunks+0x3e4/0x7a0 [btrfs]
>    [ 8459.213603]  ? __mutex_unlock_slowpath.isra.0+0x9a/0x120
>    [ 8459.213617]  btrfs_scrub_dev+0x236/0x6a0 [btrfs]
>    [ 8459.213869]  ? btrfs_ioctl+0xd97/0x32c0 [btrfs]
>    [ 8459.214241]  ? _copy_from_user+0x7b/0x80
>    [ 8459.214264]  btrfs_ioctl+0xde1/0x32c0 [btrfs]
>    [ 8459.214527]  ? should_failslab+0xa/0x20
>    [ 8459.214540]  ? kmem_cache_alloc_node+0x151/0x460
>    [ 8459.214550]  ? alloc_io_context+0x1b/0x80
>    [ 8459.214562]  ? preempt_count_add+0x70/0xa0
>    [ 8459.214577]  ? __x64_sys_ioctl+0x88/0xc0
>    [ 8459.214588]  __x64_sys_ioctl+0x88/0xc0
>    [ 8459.214600]  do_syscall_64+0x38/0x90
>    [ 8459.214612]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
>    [ 8459.214622] RIP: 0033:0x7f82ffaffe9b
>    [ 8459.214628] RSP: 002b:00007f82fe1f9c50 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
>    [ 8459.214637] RAX: ffffffffffffffda RBX: 000055b191e37090 RCX: 00007f82ffaffe9b
>    [ 8459.214642] RDX: 000055b191e37090 RSI: 00000000c400941b RDI: 0000000000000003
>    [ 8459.214647] RBP: 0000000000000000 R08: 00007fff1575016f R09: 0000000000000000
>    [ 8459.214652] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f82fe1fa640
>    [ 8459.214657] R13: 000000000000006b R14: 00007f82ffa87580 R15: 0000000000000000
>    [ 8459.214671]  </TASK>
>    [ 8459.214675] INFO: task btrfs:346776 blocked for more than 120 seconds.
>    [ 8459.215457]       Tainted: G        W          6.3.0-rc2-btrfs-next-127+ #1
>    [ 8459.216311] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>    [ 8459.217124] task:btrfs           state:D stack:0     pid:346776 ppid:1      flags:0x00004002
>    [ 8459.217136] Call Trace:
>    [ 8459.217140]  <TASK>
>    [ 8459.217147]  __schedule+0x392/0xa70
>    [ 8459.217158]  ? __pv_queued_spin_lock_slowpath+0x165/0x370
>    [ 8459.217174]  schedule+0x5d/0xd0
>    [ 8459.217183]  __scrub_blocked_if_needed+0x74/0xc0 [btrfs]
>    [ 8459.217438]  ? __pfx_autoremove_wake_function+0x10/0x10
>    [ 8459.217452]  scrub_pause_off+0x21/0x50 [btrfs]
>    [ 8459.217699]  scrub_simple_mirror+0x1c7/0x950 [btrfs]
>    [ 8459.217946]  ? scrub_parity_put+0x1a5/0x1d0 [btrfs]
>    [ 8459.218231]  ? __pfx_autoremove_wake_function+0x10/0x10
>    [ 8459.218255]  scrub_stripe+0x20d/0x740 [btrfs]
>    [ 8459.218514]  scrub_chunk+0xc4/0x130 [btrfs]
>    [ 8459.218764]  scrub_enumerate_chunks+0x3e4/0x7a0 [btrfs]
>    [ 8459.219018]  ? __pfx_autoremove_wake_function+0x10/0x10
>    [ 8459.219033]  btrfs_scrub_dev+0x236/0x6a0 [btrfs]
>    [ 8459.219284]  ? btrfs_ioctl+0xd97/0x32c0 [btrfs]
>    [ 8459.219527]  ? _copy_from_user+0x7b/0x80
>    [ 8459.219540]  btrfs_ioctl+0xde1/0x32c0 [btrfs]
>    [ 8459.219790]  ? should_failslab+0xa/0x20
>    [ 8459.219802]  ? kmem_cache_alloc_node+0x151/0x460
>    [ 8459.219812]  ? alloc_io_context+0x1b/0x80
>    [ 8459.219824]  ? preempt_count_add+0x70/0xa0
>    [ 8459.219838]  ? __x64_sys_ioctl+0x88/0xc0
>    [ 8459.219849]  __x64_sys_ioctl+0x88/0xc0
>    [ 8459.219862]  do_syscall_64+0x38/0x90
>    [ 8459.219873]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
>    [ 8459.219883] RIP: 0033:0x7f82ffaffe9b
>    [ 8459.219889] RSP: 002b:00007f82fd9f8c50 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
>    [ 8459.219897] RAX: ffffffffffffffda RBX: 000055b191e37510 RCX: 00007f82ffaffe9b
>    [ 8459.219902] RDX: 000055b191e37510 RSI: 00000000c400941b RDI: 0000000000000003
>    [ 8459.219907] RBP: 0000000000000000 R08: 00007fff1575016f R09: 0000000000000000
>    [ 8459.219912] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f82fd9f9640
>    [ 8459.219917] R13: 000000000000006b R14: 00007f82ffa87580 R15: 0000000000000000
>    [ 8459.219931]  </TASK>
>    [ 8459.219937] INFO: task btrfs:346822 blocked for more than 120 seconds.
>    [ 8459.220815]       Tainted: G        W          6.3.0-rc2-btrfs-next-127+ #1
>    [ 8459.221633] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>    [ 8459.222504] task:btrfs           state:D stack:0     pid:346822 ppid:1      flags:0x00004002
>    [ 8459.222516] Call Trace:
>    [ 8459.222519]  <TASK>
>    [ 8459.222526]  __schedule+0x392/0xa70
>    [ 8459.222542]  schedule+0x5d/0xd0
>    [ 8459.222552]  btrfs_scrub_cancel+0x91/0x100 [btrfs]
>    [ 8459.222804]  ? __pfx_autoremove_wake_function+0x10/0x10
>    [ 8459.222819]  btrfs_commit_transaction+0x572/0xeb0 [btrfs]
>    [ 8459.223026]  ? start_transaction+0xcb/0x610 [btrfs]
>    [ 8459.223234]  prepare_to_relocate+0x111/0x1a0 [btrfs]
>    [ 8459.223482]  relocate_block_group+0x57/0x5d0 [btrfs]
>    [ 8459.223730]  ? btrfs_wait_nocow_writers+0x25/0xb0 [btrfs]
>    [ 8459.223965]  btrfs_relocate_block_group+0x248/0x3c0 [btrfs]
>    [ 8459.224277]  ? __pfx_autoremove_wake_function+0x10/0x10
>    [ 8459.224292]  btrfs_relocate_chunk+0x3b/0x150 [btrfs]
>    [ 8459.224539]  btrfs_balance+0x8ff/0x11d0 [btrfs]
>    [ 8459.224787]  ? __kmem_cache_alloc_node+0x14a/0x410
>    [ 8459.224804]  btrfs_ioctl+0x2334/0x32c0 [btrfs]
>    [ 8459.225049]  ? mod_objcg_state+0xd2/0x360
>    [ 8459.225062]  ? refill_obj_stock+0xb0/0x160
>    [ 8459.225072]  ? seq_release+0x25/0x30
>    [ 8459.225083]  ? __rseq_handle_notify_resume+0x3b5/0x4b0
>    [ 8459.225093]  ? percpu_counter_add_batch+0x2e/0xa0
>    [ 8459.225111]  ? __x64_sys_ioctl+0x88/0xc0
>    [ 8459.225122]  __x64_sys_ioctl+0x88/0xc0
>    [ 8459.225134]  do_syscall_64+0x38/0x90
>    [ 8459.225146]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
>    [ 8459.225155] RIP: 0033:0x7f381a6ffe9b
>    [ 8459.225161] RSP: 002b:00007ffd45ecf060 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
>    [ 8459.225169] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f381a6ffe9b
>    [ 8459.225175] RDX: 00007ffd45ecf150 RSI: 00000000c4009420 RDI: 0000000000000003
>    [ 8459.225179] RBP: 0000000000000003 R08: 0000000000000013 R09: 0000000000000000
>    [ 8459.225184] R10: 00007f381a60c878 R11: 0000000000000246 R12: 00007ffd45ed0423
>    [ 8459.225189] R13: 00007ffd45ecf150 R14: 0000000000000000 R15: 00007ffd45ecf148
>    [ 8459.225203]  </TASK>
> 
> What happens is the following:
> 
> 1) A scrub is running, so fs_info->scrubs_running is 1;
> 
> 2) Task A starts block group relocation, and at btrfs_relocate_chunk() it
>    pauses scrub by calling btrfs_scrub_pause(). That increments
>    fs_info->scrub_pause_req from 0 to 1 and waits for the scrub task to
>    pause (for fs_info->scrubs_paused to be == to fs_info->scrubs_running);
> 
> 3) The scrub task pauses at scrub_pause_off(), waiting for
>    fs_info->scrub_pause_req to decrease to 0;
> 
> 4) Task A then enters btrfs_relocate_block_group(), and down that call
>    chain we start a transaction and then attempt to commit it;
> 
> 5) When task A calls btrfs_commit_transaction(), it either will do the
>    commit itself or wait for some other task that already started the
>    commit of the transaction - it doesn't matter which case;
> 
> 6) The transaction commit enters state TRANS_STATE_COMMIT_START;
> 
> 7) An error happens during the transaction commit, like -ENOSPC when
>    running delayed refs or delayed items for example;
> 
> 8) This results in calling transaction.c:cleanup_transaction(), where
>    we call btrfs_scrub_cancel(), incrementing fs_info->scrub_cancel_req
>    from 0 to 1, and blocking this task waiting for fs_info->scrubs_running
>    to decrease to 0;
> 
> 9) From this point on, both the transaction commit and the scrub task
>    hang forever:
> 
>    1) The transaction commit is waiting for fs_info->scrubs_running to
>       be decreased to 0;
> 
>    2) The scrub task is at scrub_pause_off() waiting for
>       fs_info->scrub_pause_req to decrease to 0 - so it can not proceed
>       to stop the scrub and decrement fs_info->scrubs_running from 0 to 1.
> 
>    Therefore resulting in a deadlock.
> 
> Fix this by having cleanup_transaction(), called if a transaction commit
> fails, not call btrfs_scrub_cancel() if relocation is in progress, and
> having btrfs_relocate_block_group() call btrfs_scrub_cancel() instead if
> the relocation failed and a transaction abort happened.
> 
> This was triggered with btrfs/061 from fstests.
> 
> Fixes: 55e3a601c81c ("btrfs: Fix data checksum error cause by replace with io-load.")
> Signed-off-by: Filipe Manana <fdmanana@suse.com>

Added to misc-next, thanks.
diff mbox series

Patch

diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index c47b6838754e..c8e503e5db4c 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -2028,7 +2028,20 @@  static void cleanup_transaction(struct btrfs_trans_handle *trans, int err)
 
 	if (current->journal_info == trans)
 		current->journal_info = NULL;
-	btrfs_scrub_cancel(fs_info);
+
+	/*
+	 * If relocation is running, we can't cancel scrub because that will
+	 * result in a deadlock. Before relocating a block group, relocation
+	 * pauses scrub, then starts and commits a transaction before unpausing
+	 * scrub. If the transaction commit is being done by the relocation
+	 * task or triggered by another task and the relocation task is waiting
+	 * for the commit, and we end up here due to an error in the commit
+	 * path, then calling btrfs_scrub_cancel() will deadlock, as we are
+	 * asking for scrub to stop while having it asked to be paused higher
+	 * above in relocation code.
+	 */
+	if (!test_bit(BTRFS_FS_RELOC_RUNNING, &fs_info->flags))
+		btrfs_scrub_cancel(fs_info);
 
 	kmem_cache_free(btrfs_trans_handle_cachep, trans);
 }
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 5da6f5167046..ab33d19feb27 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -3266,8 +3266,15 @@  int btrfs_relocate_chunk(struct btrfs_fs_info *fs_info, u64 chunk_offset)
 	btrfs_scrub_pause(fs_info);
 	ret = btrfs_relocate_block_group(fs_info, chunk_offset);
 	btrfs_scrub_continue(fs_info);
-	if (ret)
+	if (ret) {
+		/*
+		 * If we had a transaction abort, stop all running scrubs.
+		 * See transaction.c:cleanup_transaction() why we do it here.
+		 */
+		if (BTRFS_FS_ERROR(fs_info))
+			btrfs_scrub_cancel(fs_info);
 		return ret;
+	}
 
 	block_group = btrfs_lookup_block_group(fs_info, chunk_offset);
 	if (!block_group)