Message ID: 1645095472-26530-1-git-send-email-byungchul.park@lge.com
Series: DEPT(Dependency Tracker)
[ 7.009608] ===================================================
[ 7.009613] DEPT: Circular dependency has been detected.
[ 7.009614] 5.17.0-rc1-00014-g8a599299c0cb-dirty #30 Tainted: G W
[ 7.009616] ---------------------------------------------------
[ 7.009617] summary
[ 7.009618] ---------------------------------------------------
[ 7.009618] *** DEADLOCK ***
[ 7.009618]
[ 7.009619] context A
[ 7.009619] [S] (unknown)(&(bit_wait_table + i)->dmap:0)
[ 7.009621] [W] down_write(&ei->i_data_sem:0)
[ 7.009623] [E] event(&(bit_wait_table + i)->dmap:0)
[ 7.009624]
[ 7.009625] context B
[ 7.009625] [S] down_read(&ei->i_data_sem:0)
[ 7.009626] [W] wait(&(bit_wait_table + i)->dmap:0)
[ 7.009627] [E] up_read(&ei->i_data_sem:0)
[ 7.009628]
[ 7.009629] [S]: start of the event context
[ 7.009629] [W]: the wait blocked
[ 7.009630] [E]: the event not reachable
[ 7.009631] ---------------------------------------------------
[ 7.009631] context A's detail
[ 7.009632] ---------------------------------------------------
[ 7.009632] context A
[ 7.009633] [S] (unknown)(&(bit_wait_table + i)->dmap:0)
[ 7.009634] [W] down_write(&ei->i_data_sem:0)
[ 7.009635] [E] event(&(bit_wait_table + i)->dmap:0)
[ 7.009636]
[ 7.009636] [S] (unknown)(&(bit_wait_table + i)->dmap:0):
[ 7.009638] (N/A)
[ 7.009638]
[ 7.009639] [W] down_write(&ei->i_data_sem:0):
[ 7.009639] ext4_truncate (fs/ext4/inode.c:4187)
[ 7.009645] stacktrace:
[ 7.009646] down_write (kernel/locking/rwsem.c:1514)
[ 7.009648] ext4_truncate (fs/ext4/inode.c:4187)
[ 7.009650] ext4_da_write_begin (./include/linux/fs.h:827 fs/ext4/truncate.h:23 fs/ext4/inode.c:2963)
[ 7.009652] generic_perform_write (mm/filemap.c:3784)
[ 7.009654] ext4_buffered_write_iter (fs/ext4/file.c:269)
[ 7.009657] ext4_file_write_iter (fs/ext4/file.c:677)
[ 7.009659] new_sync_write (fs/read_write.c:504 (discriminator 1))
[ 7.009662] vfs_write (fs/read_write.c:590)
[ 7.009663] ksys_write (fs/read_write.c:644)
[ 7.009664] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
[ 7.009667] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:113)
[ 7.009669]
[ 7.009670] [E] event(&(bit_wait_table + i)->dmap:0):
[ 7.009671] __wake_up_common (kernel/sched/wait.c:108)
[ 7.009673] stacktrace:
[ 7.009674] dept_event (kernel/dependency/dept.c:2337)
[ 7.009677] __wake_up_common (kernel/sched/wait.c:109)
[ 7.009678] __wake_up_common_lock (./include/linux/spinlock.h:428 (discriminator 1) kernel/sched/wait.c:141 (discriminator 1))
[ 7.009679] __wake_up_bit (kernel/sched/wait_bit.c:127)
[ 7.009681] ext4_orphan_del (fs/ext4/orphan.c:282)
[ 7.009683] ext4_truncate (fs/ext4/inode.c:4212)
[ 7.009685] ext4_da_write_begin (./include/linux/fs.h:827 fs/ext4/truncate.h:23 fs/ext4/inode.c:2963)
[ 7.009687] generic_perform_write (mm/filemap.c:3784)
[ 7.009688] ext4_buffered_write_iter (fs/ext4/file.c:269)
[ 7.009690] ext4_file_write_iter (fs/ext4/file.c:677)
[ 7.009692] new_sync_write (fs/read_write.c:504 (discriminator 1))
[ 7.009694] vfs_write (fs/read_write.c:590)
[ 7.009695] ksys_write (fs/read_write.c:644)
[ 7.009696] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
[ 7.009698] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:113)
[ 7.009700] ---------------------------------------------------
[ 7.009700] context B's detail
[ 7.009701] ---------------------------------------------------
[ 7.009702] context B
[ 7.009702] [S] down_read(&ei->i_data_sem:0)
[ 7.009703] [W] wait(&(bit_wait_table + i)->dmap:0)
[ 7.009704] [E] up_read(&ei->i_data_sem:0)
[ 7.009705]
[ 7.009706] [S] down_read(&ei->i_data_sem:0):
[ 7.009707] ext4_map_blocks (./arch/x86/include/asm/bitops.h:207 ./include/asm-generic/bitops/instrumented-non-atomic.h:135 fs/ext4/ext4.h:1918 fs/ext4/inode.c:562)
[ 7.009709] stacktrace:
[ 7.009709] down_read (kernel/locking/rwsem.c:1461)
[ 7.009711] ext4_map_blocks (./arch/x86/include/asm/bitops.h:207 ./include/asm-generic/bitops/instrumented-non-atomic.h:135 fs/ext4/ext4.h:1918 fs/ext4/inode.c:562)
[ 7.009712] ext4_getblk (fs/ext4/inode.c:851)
[ 7.009714] ext4_bread (fs/ext4/inode.c:903)
[ 7.009715] __ext4_read_dirblock (fs/ext4/namei.c:117)
[ 7.009718] dx_probe (fs/ext4/namei.c:789)
[ 7.009720] ext4_dx_find_entry (fs/ext4/namei.c:1721)
[ 7.009722] __ext4_find_entry (fs/ext4/namei.c:1571)
[ 7.009723] ext4_lookup (fs/ext4/namei.c:1770)
[ 7.009725] lookup_open (./include/linux/dcache.h:361 fs/namei.c:3310)
[ 7.009727] path_openat (fs/namei.c:3401 fs/namei.c:3605)
[ 7.009729] do_filp_open (fs/namei.c:3637)
[ 7.009731] do_sys_openat2 (fs/open.c:1215)
[ 7.009732] do_sys_open (fs/open.c:1231)
[ 7.009734] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
[ 7.009736] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:113)
[ 7.009738]
[ 7.009738] [W] wait(&(bit_wait_table + i)->dmap:0):
[ 7.009739] prepare_to_wait (kernel/sched/wait.c:275)
[ 7.009741] stacktrace:
[ 7.009741] __schedule (kernel/sched/sched.h:1318 kernel/sched/sched.h:1616 kernel/sched/core.c:6213)
[ 7.009743] schedule (kernel/sched/core.c:6373 (discriminator 1))
[ 7.009744] io_schedule (./arch/x86/include/asm/current.h:15 kernel/sched/core.c:8392 kernel/sched/core.c:8418)
[ 7.009745] bit_wait_io (./arch/x86/include/asm/current.h:15 kernel/sched/wait_bit.c:210)
[ 7.009746] __wait_on_bit (kernel/sched/wait_bit.c:49)
[ 7.009748] out_of_line_wait_on_bit (kernel/sched/wait_bit.c:65)
[ 7.009749] ext4_read_bh (./arch/x86/include/asm/bitops.h:207 ./include/asm-generic/bitops/instrumented-non-atomic.h:135 ./include/linux/buffer_head.h:120 fs/ext4/super.c:201)
[ 7.009752] __read_extent_tree_block (fs/ext4/extents.c:545)
[ 7.009754] ext4_find_extent (fs/ext4/extents.c:928)
[ 7.009756] ext4_ext_map_blocks (fs/ext4/extents.c:4099)
[ 7.009757] ext4_map_blocks (fs/ext4/inode.c:563)
[ 7.009759] ext4_getblk (fs/ext4/inode.c:851)
[ 7.009760] ext4_bread (fs/ext4/inode.c:903)
[ 7.009762] __ext4_read_dirblock (fs/ext4/namei.c:117)
[ 7.009764] dx_probe (fs/ext4/namei.c:789)
[ 7.009765] ext4_dx_find_entry (fs/ext4/namei.c:1721)
[ 7.009767]
[ 7.009768] [E] up_read(&ei->i_data_sem:0):
[ 7.009769] ext4_map_blocks (fs/ext4/inode.c:593)
[ 7.009771] stacktrace:
[ 7.009771] up_read (kernel/locking/rwsem.c:1556)
[ 7.009774] ext4_map_blocks (fs/ext4/inode.c:593)
[ 7.009775] ext4_getblk (fs/ext4/inode.c:851)
[ 7.009777] ext4_bread (fs/ext4/inode.c:903)
[ 7.009778] __ext4_read_dirblock (fs/ext4/namei.c:117)
[ 7.009780] dx_probe (fs/ext4/namei.c:789)
[ 7.009782] ext4_dx_find_entry (fs/ext4/namei.c:1721)
[ 7.009784] __ext4_find_entry (fs/ext4/namei.c:1571)
[ 7.009786] ext4_lookup (fs/ext4/namei.c:1770)
[ 7.009788] lookup_open (./include/linux/dcache.h:361 fs/namei.c:3310)
[ 7.009789] path_openat (fs/namei.c:3401 fs/namei.c:3605)
[ 7.009791] do_filp_open (fs/namei.c:3637)
[ 7.009792] do_sys_openat2 (fs/open.c:1215)
[ 7.009794] do_sys_open (fs/open.c:1231)
[ 7.009795] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
[ 7.009797] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:113)
[ 7.009799] ---------------------------------------------------
[ 7.009800] information that might be helpful
[ 7.009800] ---------------------------------------------------
[ 7.009801] CPU: 0 PID: 611 Comm: rs:main Q:Reg Tainted: G W 5.17.0-rc1-00014-g8a599299c0cb-dirty #30
[ 7.009804] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
[ 7.009805] Call Trace:
[ 7.009806] <TASK>
[ 7.009807] dump_stack_lvl (lib/dump_stack.c:107)
[ 7.009809] print_circle (./arch/x86/include/asm/atomic.h:108 ./include/linux/atomic/atomic-instrumented.h:258 kernel/dependency/dept.c:157 kernel/dependency/dept.c:762)
[ 7.009812] ? print_circle (kernel/dependency/dept.c:1086)
[ 7.009814] cb_check_dl (kernel/dependency/dept.c:1104)
[ 7.009815] bfs (kernel/dependency/dept.c:860)
[ 7.009818] add_dep (kernel/dependency/dept.c:1423)
[ 7.009820] do_event.isra.25 (kernel/dependency/dept.c:1650)
[ 7.009822] ? __wake_up_common (kernel/sched/wait.c:108)
[ 7.009824] dept_event (kernel/dependency/dept.c:2337)
[ 7.009826] __wake_up_common (kernel/sched/wait.c:109)
[ 7.009828] __wake_up_common_lock (./include/linux/spinlock.h:428 (discriminator 1) kernel/sched/wait.c:141 (discriminator 1))
[ 7.009830] __wake_up_bit (kernel/sched/wait_bit.c:127)
[ 7.009832] ext4_orphan_del (fs/ext4/orphan.c:282)
[ 7.009835] ? dept_ecxt_exit (./arch/x86/include/asm/current.h:15 kernel/dependency/dept.c:241 kernel/dependency/dept.c:999 kernel/dependency/dept.c:1043 kernel/dependency/dept.c:2478)
[ 7.009837] ext4_truncate (fs/ext4/inode.c:4212)
[ 7.009839] ext4_da_write_begin (./include/linux/fs.h:827 fs/ext4/truncate.h:23 fs/ext4/inode.c:2963)
[ 7.009842] generic_perform_write (mm/filemap.c:3784)
[ 7.009845] ext4_buffered_write_iter (fs/ext4/file.c:269)
[ 7.009848] ext4_file_write_iter (fs/ext4/file.c:677)
[ 7.009851] new_sync_write (fs/read_write.c:504 (discriminator 1))
[ 7.009854] vfs_write (fs/read_write.c:590)
[ 7.009856] ksys_write (fs/read_write.c:644)
[ 7.009857] ? trace_hardirqs_on (kernel/trace/trace_preemptirq.c:65)
[ 7.009860] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
[ 7.009862] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:113)
[ 7.009865] RIP: 0033:0x7f3b160b335d
[ 7.009867] Code: e1 20 00 00 75 10 b8 01 00 00 00 0f 05 48 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 ce fa ff ff 48 89 04 24 b8 01 00 00 00 0f 05 <48> 8b 3c 24 48 89 c2 e8 17 fb ff ff 48 89 d0 48 83 c4 08 48 3d 01
All code
========
   0:  e1 20                   loope  0x22
   2:  00 00                   add    %al,(%rax)
   4:  75 10                   jne    0x16
   6:  b8 01 00 00 00          mov    $0x1,%eax
   b:  0f 05                   syscall
   d:  48 3d 01 f0 ff ff       cmp    $0xfffffffffffff001,%rax
  13:  73 31                   jae    0x46
  15:  c3                      retq
  16:  48 83 ec 08             sub    $0x8,%rsp
  1a:  e8 ce fa ff ff          callq  0xfffffffffffffaed
  1f:  48 89 04 24             mov    %rax,(%rsp)
  23:  b8 01 00 00 00          mov    $0x1,%eax
  28:  0f 05                   syscall
  2a:* 48 8b 3c 24             mov    (%rsp),%rdi    <-- trapping instruction
  2e:  48 89 c2                mov    %rax,%rdx
  31:  e8 17 fb ff ff          callq  0xfffffffffffffb4d
  36:  48 89 d0                mov    %rdx,%rax
  39:  48 83 c4 08             add    $0x8,%rsp
  3d:  48                      rex.W
  3e:  3d                      .byte 0x3d
  3f:  01                      .byte 0x1

Code starting with the faulting instruction
===========================================
   0:  48 8b 3c 24             mov    (%rsp),%rdi
   4:  48 89 c2                mov    %rax,%rdx
   7:  e8 17 fb ff ff          callq  0xfffffffffffffb23
   c:  48 89 d0                mov    %rdx,%rax
   f:  48 83 c4 08             add    $0x8,%rsp
  13:  48                      rex.W
  14:  3d                      .byte 0x3d
  15:  01                      .byte 0x1
[ 7.009869] RSP: 002b:00007f3b1340f180 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
[ 7.009871] RAX: ffffffffffffffda RBX: 00007f3b040010a0 RCX: 00007f3b160b335d
[ 7.009873] RDX: 0000000000000300 RSI: 00007f3b040010a0 RDI: 0000000000000001
[ 7.009874] RBP: 0000000000000000 R08: fffffffffffffa15 R09: fffffffffffffa05
[ 7.009875] R10: 0000000000000000 R11: 0000000000000293 R12: 00007f3b04000df0
[ 7.009876] R13: 00007f3b1340f1a0 R14: 0000000000000220 R15: 0000000000000300
[ 7.009879] </TASK>
[ 9.008161] ===================================================
[ 9.008163] DEPT: Circular dependency has been detected.
[ 9.008164] 5.17.0-rc1-00015-gb94f67143867-dirty #2 Tainted: G W
[ 9.008166] ---------------------------------------------------
[ 9.008167] summary
[ 9.008167] ---------------------------------------------------
[ 9.008168] *** DEADLOCK ***
[ 9.008168]
[ 9.008168] context A
[ 9.008169] [S] (unknown)(&(&journal->j_wait_transaction_locked)->dmap:0)
[ 9.008171] [W] wait(&(&journal->j_wait_commit)->dmap:0)
[ 9.008172] [E] event(&(&journal->j_wait_transaction_locked)->dmap:0)
[ 9.008173]
[ 9.008173] context B
[ 9.008174] [S] down_write(mapping.invalidate_lock:0)
[ 9.008175] [W] wait(&(&journal->j_wait_transaction_locked)->dmap:0)
[ 9.008176] [E] up_write(mapping.invalidate_lock:0)
[ 9.008177]
[ 9.008178] context C
[ 9.008179] [S] (unknown)(&(&journal->j_wait_commit)->dmap:0)
[ 9.008180] [W] down_write(mapping.invalidate_lock:0)
[ 9.008181] [E] event(&(&journal->j_wait_commit)->dmap:0)
[ 9.008181]
[ 9.008182] [S]: start of the event context
[ 9.008183] [W]: the wait blocked
[ 9.008183] [E]: the event not reachable
[ 9.008184] ---------------------------------------------------
[ 9.008184] context A's detail
[ 9.008185] ---------------------------------------------------
[ 9.008186] context A
[ 9.008186] [S] (unknown)(&(&journal->j_wait_transaction_locked)->dmap:0)
[ 9.008187] [W] wait(&(&journal->j_wait_commit)->dmap:0)
[ 9.008188] [E] event(&(&journal->j_wait_transaction_locked)->dmap:0)
[ 9.008189]
[ 9.008190] [S] (unknown)(&(&journal->j_wait_transaction_locked)->dmap:0):
[ 9.008191] (N/A)
[ 9.008191]
[ 9.008192] [W] wait(&(&journal->j_wait_commit)->dmap:0):
[ 9.008193] prepare_to_wait (kernel/sched/wait.c:275)
[ 9.008197] stacktrace:
[ 9.008198] __schedule (kernel/sched/sched.h:1318 kernel/sched/sched.h:1616 kernel/sched/core.c:6213)
[ 9.008200] schedule (kernel/sched/core.c:6373 (discriminator 1))
[ 9.008201] kjournald2 (fs/jbd2/journal.c:250)
[ 9.008203] kthread (kernel/kthread.c:377)
[ 9.008206] ret_from_fork (arch/x86/entry/entry_64.S:301)
[ 9.008209]
[ 9.008209] [E] event(&(&journal->j_wait_transaction_locked)->dmap:0):
[ 9.008210] __wake_up_common (kernel/sched/wait.c:108)
[ 9.008212] stacktrace:
[ 9.008213] dept_event (kernel/dependency/dept.c:2337)
[ 9.008215] __wake_up_common (kernel/sched/wait.c:109)
[ 9.008217] __wake_up_common_lock (./include/linux/spinlock.h:428 (discriminator 1) kernel/sched/wait.c:141 (discriminator 1))
[ 9.008218] jbd2_journal_commit_transaction (fs/jbd2/commit.c:583)
[ 9.008221] kjournald2 (fs/jbd2/journal.c:214 (discriminator 3))
[ 9.008223] kthread (kernel/kthread.c:377)
[ 9.008224] ret_from_fork (arch/x86/entry/entry_64.S:301)
[ 9.008226] ---------------------------------------------------
[ 9.008226] context B's detail
[ 9.008227] ---------------------------------------------------
[ 9.008228] context B
[ 9.008228] [S] down_write(mapping.invalidate_lock:0)
[ 9.008229] [W] wait(&(&journal->j_wait_transaction_locked)->dmap:0)
[ 9.008230] [E] up_write(mapping.invalidate_lock:0)
[ 9.008231]
[ 9.008232] [S] down_write(mapping.invalidate_lock:0):
[ 9.008233] ext4_da_write_begin (fs/ext4/truncate.h:21 fs/ext4/inode.c:2963)
[ 9.008237] stacktrace:
[ 9.008237] down_write (kernel/locking/rwsem.c:1514)
[ 9.008239] ext4_da_write_begin (fs/ext4/truncate.h:21 fs/ext4/inode.c:2963)
[ 9.008241] generic_perform_write (mm/filemap.c:3784)
[ 9.008243] ext4_buffered_write_iter (fs/ext4/file.c:269)
[ 9.008245] ext4_file_write_iter (fs/ext4/file.c:677)
[ 9.008247] new_sync_write (fs/read_write.c:504 (discriminator 1))
[ 9.008250] vfs_write (fs/read_write.c:590)
[ 9.008251] ksys_write (fs/read_write.c:644)
[ 9.008253] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
[ 9.008255] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:113)
[ 9.008258]
[ 9.008258] [W] wait(&(&journal->j_wait_transaction_locked)->dmap:0):
[ 9.008259] prepare_to_wait (kernel/sched/wait.c:275)
[ 9.008261] stacktrace:
[ 9.008261] __schedule (kernel/sched/sched.h:1318 kernel/sched/sched.h:1616 kernel/sched/core.c:6213)
[ 9.008263] schedule (kernel/sched/core.c:6373 (discriminator 1))
[ 9.008264] wait_transaction_locked (fs/jbd2/transaction.c:184)
[ 9.008266] add_transaction_credits (fs/jbd2/transaction.c:248 (discriminator 3))
[ 9.008267] start_this_handle (fs/jbd2/transaction.c:427)
[ 9.008269] jbd2__journal_start (fs/jbd2/transaction.c:526)
[ 9.008271] __ext4_journal_start_sb (fs/ext4/ext4_jbd2.c:105)
[ 9.008273] ext4_truncate (fs/ext4/inode.c:4164)
[ 9.008274] ext4_da_write_begin (./include/linux/fs.h:827 fs/ext4/truncate.h:23 fs/ext4/inode.c:2963)
[ 9.008276] generic_perform_write (mm/filemap.c:3784)
[ 9.008277] ext4_buffered_write_iter (fs/ext4/file.c:269)
[ 9.008279] ext4_file_write_iter (fs/ext4/file.c:677)
[ 9.008281] new_sync_write (fs/read_write.c:504 (discriminator 1))
[ 9.008283] vfs_write (fs/read_write.c:590)
[ 9.008284] ksys_write (fs/read_write.c:644)
[ 9.008285] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
[ 9.008287]
[ 9.008288] [E] up_write(mapping.invalidate_lock:0):
[ 9.008288] ext4_da_get_block_prep (fs/ext4/inode.c:1795 fs/ext4/inode.c:1829)
[ 9.008291] ---------------------------------------------------
[ 9.008291] context C's detail
[ 9.008292] ---------------------------------------------------
[ 9.008292] context C
[ 9.008293] [S] (unknown)(&(&journal->j_wait_commit)->dmap:0)
[ 9.008294] [W] down_write(mapping.invalidate_lock:0)
[ 9.008295] [E] event(&(&journal->j_wait_commit)->dmap:0)
[ 9.008296]
[ 9.008297] [S] (unknown)(&(&journal->j_wait_commit)->dmap:0):
[ 9.008298] (N/A)
[ 9.008298]
[ 9.008299] [W] down_write(mapping.invalidate_lock:0):
[ 9.008299] ext4_da_write_begin (fs/ext4/truncate.h:21 fs/ext4/inode.c:2963)
[ 9.008302] stacktrace:
[ 9.008302] down_write (kernel/locking/rwsem.c:1514)
[ 9.008304] ext4_da_write_begin (fs/ext4/truncate.h:21 fs/ext4/inode.c:2963)
[ 9.008305] generic_perform_write (mm/filemap.c:3784)
[ 9.008307] ext4_buffered_write_iter (fs/ext4/file.c:269)
[ 9.008309] ext4_file_write_iter (fs/ext4/file.c:677)
[ 9.008311] new_sync_write (fs/read_write.c:504 (discriminator 1))
[ 9.008312] vfs_write (fs/read_write.c:590)
[ 9.008314] ksys_write (fs/read_write.c:644)
[ 9.008315] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
[ 9.008316] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:113)
[ 9.008318]
[ 9.008319] [E] event(&(&journal->j_wait_commit)->dmap:0):
[ 9.008320] __wake_up_common (kernel/sched/wait.c:108)
[ 9.008321] stacktrace:
[ 9.008322] __wake_up_common (kernel/sched/wait.c:109)
[ 9.008323] __wake_up_common_lock (./include/linux/spinlock.h:428 (discriminator 1) kernel/sched/wait.c:141 (discriminator 1))
[ 9.008324] __jbd2_log_start_commit (fs/jbd2/journal.c:508)
[ 9.008326] jbd2_log_start_commit (fs/jbd2/journal.c:527)
[ 9.008327] __jbd2_journal_force_commit (fs/jbd2/journal.c:560)
[ 9.008329] jbd2_journal_force_commit_nested (fs/jbd2/journal.c:583)
[ 9.008331] ext4_should_retry_alloc (fs/ext4/balloc.c:670 (discriminator 3))
[ 9.008332] ext4_da_write_begin (fs/ext4/inode.c:2965 (discriminator 1))
[ 9.008334] generic_perform_write (mm/filemap.c:3784)
[ 9.008335] ext4_buffered_write_iter (fs/ext4/file.c:269)
[ 9.008337] ext4_file_write_iter (fs/ext4/file.c:677)
[ 9.008339] new_sync_write (fs/read_write.c:504 (discriminator 1))
[ 9.008341] vfs_write (fs/read_write.c:590)
[ 9.008342] ksys_write (fs/read_write.c:644)
[ 9.008343] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
[ 9.008345] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:113)
[ 9.008347] ---------------------------------------------------
[ 9.008348] information that might be helpful
[ 9.008348] ---------------------------------------------------
[ 9.008349] CPU: 0 PID: 89 Comm: jbd2/sda1-8 Tainted: G W 5.17.0-rc1-00015-gb94f67143867-dirty #2
[ 9.008352] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
[ 9.008353] Call Trace:
[ 9.008354] <TASK>
[ 9.008355] dump_stack_lvl (lib/dump_stack.c:107)
[ 9.008358] print_circle (./arch/x86/include/asm/atomic.h:108 ./include/linux/atomic/atomic-instrumented.h:258 kernel/dependency/dept.c:157 kernel/dependency/dept.c:762)
[ 9.008360] ? print_circle (kernel/dependency/dept.c:1086)
[ 9.008362] cb_check_dl (kernel/dependency/dept.c:1104)
[ 9.008364] bfs (kernel/dependency/dept.c:860)
[ 9.008366] add_dep (kernel/dependency/dept.c:1423)
[ 9.008368] do_event.isra.25 (kernel/dependency/dept.c:1651)
[ 9.008370] ? __wake_up_common (kernel/sched/wait.c:108)
[ 9.008372] dept_event (kernel/dependency/dept.c:2337)
[ 9.008374] __wake_up_common (kernel/sched/wait.c:109)
[ 9.008376] __wake_up_common_lock (./include/linux/spinlock.h:428 (discriminator 1) kernel/sched/wait.c:141 (discriminator 1))
[ 9.008379] jbd2_journal_commit_transaction (fs/jbd2/commit.c:583)
[ 9.008381] ? arch_stack_walk (arch/x86/kernel/stacktrace.c:24)
[ 9.008385] ? ret_from_fork (arch/x86/entry/entry_64.S:301)
[ 9.008387] ? dept_enable_hardirq (./arch/x86/include/asm/current.h:15 kernel/dependency/dept.c:241 kernel/dependency/dept.c:999 kernel/dependency/dept.c:1043 kernel/dependency/dept.c:1843)
[ 9.008389] ? _raw_spin_unlock_irqrestore (./arch/x86/include/asm/irqflags.h:45 ./arch/x86/include/asm/irqflags.h:80 ./arch/x86/include/asm/irqflags.h:138 ./include/linux/spinlock_api_smp.h:151 kernel/locking/spinlock.c:194)
[ 9.008392] ? _raw_spin_unlock_irqrestore (./arch/x86/include/asm/preempt.h:103 ./include/linux/spinlock_api_smp.h:152 kernel/locking/spinlock.c:194)
[ 9.008394] ? try_to_del_timer_sync (kernel/time/timer.c:1239)
[ 9.008396] kjournald2 (fs/jbd2/journal.c:214 (discriminator 3))
[ 9.008398] ? prepare_to_wait_exclusive (kernel/sched/wait.c:431)
[ 9.008400] ? commit_timeout (fs/jbd2/journal.c:173)
[ 9.008402] kthread (kernel/kthread.c:377)
[ 9.008404] ? kthread_complete_and_exit (kernel/kthread.c:332)
[ 9.008407] ret_from_fork (arch/x86/entry/entry_64.S:301)
[ 9.008410] </TASK>
On Thu, Feb 17, 2022 at 08:10:03PM +0900, Byungchul Park wrote:
> [ 7.009608] ===================================================
> [ 7.009613] DEPT: Circular dependency has been detected.
> [ 7.009614] 5.17.0-rc1-00014-g8a599299c0cb-dirty #30 Tainted: G W
> [ 7.009616] ---------------------------------------------------
> [ 7.009617] summary
> [ 7.009618] ---------------------------------------------------
> [ 7.009618] *** DEADLOCK ***
> [ 7.009618]
> [ 7.009619] context A
> [ 7.009619] [S] (unknown)(&(bit_wait_table + i)->dmap:0)

Why is the context unknown here?  I don't see a way to debug this
without knowing where we acquired the bit wait lock.

> [ 7.009621] [W] down_write(&ei->i_data_sem:0)
> [ 7.009623] [E] event(&(bit_wait_table + i)->dmap:0)
> [ 7.009624]
> [ 7.009625] context B
> [ 7.009625] [S] down_read(&ei->i_data_sem:0)
> [ 7.009626] [W] wait(&(bit_wait_table + i)->dmap:0)
> [ 7.009627] [E] up_read(&ei->i_data_sem:0)
>
> [ ... snip the rest of the report, quoted unchanged from the full dump above ... ]
On Thu, Feb 17, 2022 at 07:57:36PM +0900, Byungchul Park wrote:
>
> I've got several reports from the tool. Some of them look like false
> alarms and some others look like real deadlock possibility. Because of
> my unfamiliarity of the domain, it's hard to confirm if it's a real one.
> Let me add the reports on this email thread.

The problem is we have so many reports that are potentially invalid, or
so-rare-as-to-be-not-worth-the-time-to-investigate-in-the-grand-scheme-
of-all-of-the-fires-burning-on-maintainers'-laps, that it's really not
reasonable to ask maintainers to determine whether something is a false
alarm or not. If I want more of these unreliable potential bug reports
to investigate, there is a huge backlog in Syzkaller. :-)

Looking at the second ext4 report, it doesn't make any sense. Context
A is the kjournald thread. We don't do a commit until (a) the timeout
expires, or (b) someone explicitly requests that a commit happen,
waking up j_wait_commit. I'm guessing the complaint here is that DEPT
thinks nothing is explicitly requesting a wake up. But note that after
5 seconds (or whatever journal->j_commit_interval is configured to be)
we *will* always start a commit. So ergo, there can't be a deadlock.

At a higher level of discussion, it's an unfair tax on maintainers'
time to ask them to help you debug DEPT for you. Tools like Syzkaller
and DEPT are useful insofar as they save us time in making our
subsystems better. But until you can prove that it's not going to be a
massive denial-of-service attack on maintainers' time, at the *very*
least keep an RFC on the patch, or add massive warnings that more
often than not DEPT is going to be sending maintainers on a wild goose
chase. If you know that there "appear to be false positives", you need
to make sure you've tracked them all down before asking for this to be
merged.

You may also want to add some documentation about why we should trust
this; in particular for wait channels, when a process calls schedule()
there may be multiple reasons why the thread will wake up --- in the
worst case, such as in the select(2) or epoll(2) system call, there
may be literally thousands of reasons (one for every file descriptor
the select is waiting on) why the process will wake up and thus
resolve the potential "deadlock" that DEPT is worrying about. How is
DEPT going to handle those cases? If the answer is that things need to
be tagged, then at least disclose the potential reasons why DEPT might
be untrustworthy, to save your reviewers' time.

I know that you're trying to help us, but this tool needs to be far
better than Lockdep before we should think about merging it. Even if
it finds 5% more potential deadlocks, if it creates 95% more false
positive reports --- and the ones it finds are crazy things that
rarely actually happen in practice --- are the costs worth the
benefits? And who is bearing the costs, and who is receiving the
benefits?

Regards,

- Ted
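For reference, the pattern Ted describes looks roughly like the sketch
below. This is a simplified, from-memory illustration rather than the
actual kjournald2() code in fs/jbd2/journal.c: the journal_t fields
(j_commit_timer, j_commit_interval, j_commit_sequence, j_commit_request,
j_wait_commit) are real jbd2 ones, but commit_thread() is invented here.
The point it illustrates is why the sleep on j_wait_commit is always
bounded by a timer, so the [W] in DEPT's context A cannot stall forever.

#include <linux/jbd2.h>
#include <linux/jiffies.h>
#include <linux/kthread.h>
#include <linux/sched.h>
#include <linux/timer.h>
#include <linux/wait.h>

/* Invented for illustration; the real loop is kjournald2(). */
static int commit_thread(void *arg)
{
	journal_t *journal = arg;

	while (!kthread_should_stop()) {
		DEFINE_WAIT(wait);

		/*
		 * Re-arm the periodic timer; its handler (see
		 * commit_timeout() in fs/jbd2/journal.c) wakes this
		 * thread when the commit interval expires.
		 */
		mod_timer(&journal->j_commit_timer,
			  jiffies + journal->j_commit_interval);

		if (journal->j_commit_sequence != journal->j_commit_request) {
			/*
			 * Someone requested a commit: run it.  This is
			 * where waiters on j_wait_transaction_locked
			 * eventually get woken.
			 */
			jbd2_journal_commit_transaction(journal);
			continue;
		}

		/*
		 * The wait DEPT flags as [W] in context A.  Even if no
		 * explicit waker ever shows up, the timer armed above
		 * fires within j_commit_interval jiffies and wakes the
		 * thread, so the wait is always bounded.
		 */
		prepare_to_wait(&journal->j_wait_commit, &wait,
				TASK_INTERRUPTIBLE);
		if (journal->j_commit_sequence == journal->j_commit_request)
			schedule();
		finish_wait(&journal->j_wait_commit, &wait);
	}
	return 0;
}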
On Thu, 17 Feb 2022 10:51:09 -0500
"Theodore Ts'o" <tytso@mit.edu> wrote:

> I know that you're trying to help us, but this tool needs to be far
> better than Lockdep before we should think about merging it. Even if
> it finds 5% more potential deadlocks, if it creates 95% more false
> positive reports --- and the ones it finds are crazy things that
> rarely actually happen in practice --- are the costs worth the
> benefits? And who is bearing the costs, and who is receiving the
> benefits?

I personally believe that there's potential for this to be helpful and
that we will want to merge it.

But what I believe Ted is trying to say is: if you do not know whether
a report is a bug or not, please do not ask the maintainers to
determine it for you. This is a good opportunity for you to look at
why your tool reported an issue, and to learn that subsystem. Look at
whether it is really a bug or not, and investigate why.

The likely/unlikely tracing I do finds issues all over the kernel. But
before I report anything, I look at the subsystem and determine *why*
it's reporting what it does. In some cases it's just a config issue,
where I may submit a patch saying "this is 100% wrong in X config, and
we should just remove the unlikely". But I did the due diligence to
find out exactly what the issue is, and why the tooling reported what
it reported.

I want to stress that your Dept tooling looks to have the potential of
being something worth including. But the false positives need to be
down to the rate of lockdep's false positives. As Ted said, if it's
reporting 95% false positives, nobody is going to look at the 5% of
real bugs that it finds.

-- Steve
On Thu, Feb 17, 2022 at 12:00:05PM -0500, Steven Rostedt wrote:
> On Thu, 17 Feb 2022 10:51:09 -0500
> "Theodore Ts'o" <tytso@mit.edu> wrote:
>
> > I know that you're trying to help us, but this tool needs to be far
> > better than Lockdep before we should think about merging it. Even if
> > it finds 5% more potential deadlocks, if it creates 95% more false
> > positive reports --- and the ones it finds are crazy things that
> > rarely actually happen in practice --- are the costs worth the
> > benefits? And who is bearing the costs, and who is receiving the
> > benefits?
>
> I personally believe that there's potential for this to be helpful and
> that we will want to merge it.
>
> But what I believe Ted is trying to say is: if you do not know whether
> a report is a bug or not, please do not ask the maintainers to
> determine it for you. This is a good opportunity for you to look at
> why your tool reported an issue, and to learn that subsystem. Look at
> whether it is really a bug or not, and investigate why.

I agree with Steven here, to the point where I'm willing to invest some
time being a beta-tester for this, so if you focus your efforts on
filesystem/mm kinds of problems, I can continue looking at them and
tell you what's helpful and what's unhelpful in the reports.
On Thu, Feb 17, 2022 at 01:27:42PM +0000, Matthew Wilcox wrote:
> On Thu, Feb 17, 2022 at 08:10:03PM +0900, Byungchul Park wrote:
> > [ ... ]
> > [ 7.009619] context A
> > [ 7.009619] [S] (unknown)(&(bit_wait_table + i)->dmap:0)
>
> Why is the context unknown here?  I don't see a way to debug this
> without knowing where we acquired the bit wait lock.

   ideal view
   ------------------
   context X                          context A

   request event E to context A
   ...
   write REQUESTEVENT
                                      if (can see REQUESTEVENT written)
                                         notice the request from X [S]
   wait for the event [W]
   ...
   write barrier
   write WAITSTART
   actual wait
                                      if (can see REQUESTEVENT written)
                                         consider it's on the way to the event
                                      ...
                                      finally the event [E]

Dept works with the above view wrt. wait and event. The [S] point varies
depending on the way context A is made to notice the request, so that
the event context A can start. Of course, by making more effort to
identify each such way in the kernel, we could figure out the exact
point. Here, we can instead conservatively check WAITSTART - with a read
barrier - rather than REQUESTEVENT in context A. That's how Dept works:

   Dept's view
   ------------------
   context X                          context A

   request event E to context A
   ...
   write REQUESTEVENT
                                      if (can see REQUESTEVENT written)
                                         notice the request from X [S]
   wait for the event [W]
   ...
   write barrier
   write WAITSTART
                                      read barrier
   actual wait
                                      if (can see WAITSTART written)
                                         consider it's on the way to the event
                                      ...
                                      finally the event [E]
                                         consider all waits in context A so
                                         far that could see WAITSTART written

Thanks,
Byungchul

> > [ ... snip the rest of the quoted report ... ]
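The handshake in the diagrams above can be rendered roughly as the
following C sketch. The flag names and the two functions are invented
for illustration and are not DEPT's API; only READ_ONCE()/WRITE_ONCE()
and the barrier primitives are real kernel interfaces.

#include <linux/compiler.h>
#include <asm/barrier.h>

/* Shared flags, invented for this sketch only. */
static int requestevent;	/* X has requested event E from A */
static int waitstart;		/* X has started to wait for E    */

/* Context X: request the event, then wait for it. */
static void context_x(void)
{
	WRITE_ONCE(requestevent, 1);	/* request event E to context A */
	/* ... */
	smp_wmb();			/* the write barrier in the diagram */
	WRITE_ONCE(waitstart, 1);
	/* ... the actual wait for event E, the [W] above ... */
}

/* Context A: the event side. */
static void context_a(void)
{
	if (READ_ONCE(requestevent)) {
		/* The ideal [S]: A notices X's request here. */
	}

	smp_rmb();			/* the read barrier in the diagram */
	if (READ_ONCE(waitstart)) {
		/*
		 * Dept's conservative [S]: from here until the event
		 * [E] fires, every wait this context performs is
		 * treated as part of the event context ("all waits in
		 * context A so far that could see WAITSTART written").
		 */
	}

	/* ... finally deliver the event [E], e.g. via wake_up() ... */
}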
On Thu, Feb 17, 2022 at 12:00:05PM -0500, Steven Rostedt wrote:
>
> I personally believe that there's potential for this to be helpful and
> that we will want to merge it.
>
> But what I believe Ted is trying to say is: if you do not know whether
> a report is a bug or not, please do not ask the maintainers to
> determine it for you. This is a good opportunity for you to look at
> why your tool reported an issue, and to learn that subsystem. Look at
> whether it is really a bug or not, and investigate why.

I agree there's potential here, or I would have ignored the ext4 "bug
report". When we can get rid of the false positives, I think it should
be merged; I'd just rather it not be merged until after the false
positives are fixed, since otherwise someone well-meaning will start
using it with Syzkaller, and the noise that maintainers need to deal
with (people requesting reverts of two-year-old commits, etc.) will
increase by a factor of ten or more. (With Syzbot reproducers that set
up random cgroups, IP tunnels with wireguard enabled, FUSE stress
testers, etc., that file system maintainers will be asked to try to
disentangle.)

So from a maintainer's perspective, false positives are highly
negative. It may be that from some people's POV, one bug found and 20
false positives might still be "useful". But if your tool gains a
reputation for not valuing maintainers' time, it's just going to make
us (or at least me :-) cranky, and it's going to be very hard to
recover from that perception. So it's probably better to be very
conservative and careful in polishing it before asking for it to be
merged.

Cheers,

- Ted
On Thu, Feb 17, 2022 at 10:51:09AM -0500, Theodore Ts'o wrote:
> On Thu, Feb 17, 2022 at 07:57:36PM +0900, Byungchul Park wrote:
> >
> > I've got several reports from the tool. Some of them look like false
> > alarms and some others look like real deadlock possibility. Because of
> > my unfamiliarity of the domain, it's hard to confirm if it's a real one.
> > Let me add the reports on this email thread.
>
> The problem is we have so many reports that are potentially invalid, or
> so-rare-as-to-be-not-worth-the-time-to-investigate-in-the-grand-scheme-
> of-all-of-the-fires-burning-on-maintainers'-laps, that it's really not
> reasonable to ask maintainers to determine whether

Even though I might have been wrong then, and might be wrong again, you
sound so arrogant. You were hasty to judge and tried to walk over me. I
reported it because I thought it was a real problem but couldn't
confirm it myself. The other reports, the ones I thought were not real,
I didn't even mention. If you are talking about the previous report,
then I already said I was sorry about it; I had only skimmed through
the part describing the waits...

Basically, I respect you and appreciate your feedback. I hope you don't
get me wrong.

> Looking at the second ext4 report, it doesn't make any sense. Context
> A is the kjournald thread. We don't do a commit until (a) the timeout
> expires, or (b) someone explicitly requests that a commit happen,
> waking up j_wait_commit. I'm guessing the complaint here is that DEPT
> thinks nothing is explicitly requesting a wake up. But note that after
> 5 seconds (or whatever journal->j_commit_interval is configured to be)
> we *will* always start a commit. So ergo, there can't be a deadlock.

Yeah, it might not be a *deadlock deadlock*, because the wait will
anyway be woken up by one of the wake-up points you mentioned. However,
the dependency still looks problematic, because the three contexts
participating in the dependency chain would be stuck for a while until
one of them eventually gets woken up. I bet that is not the behavior
you intended. Again: it's not critical, but it is problematic. Or am I
missing something?

> At a higher level of discussion, it's an unfair tax on maintainers'
> time to ask them to help you debug DEPT for you. Tools like Syzkaller
> and DEPT are useful insofar as they save us time in making our
> subsystems better. But until you can prove that it's not going to be a
> massive denial-of-service attack on maintainers' time, at the

Partially, I agree. I would understand even if you don't support Dept
until you think it's valuable enough. However, let me keep asking
things of the fs folks, not you, even though I would cc you.

> If you know that there "appear to be false positives", you need
> to make sure you've tracked them all down before asking for this to be
> merged.

To track them all down, I need to ask LKML, because Dept runs cleanly
on my own system. I don't want it to be merged with a lot of false
positives still in there, either.

> You may also want to add some documentation about why we should trust
> this; in particular for wait channels, when a process calls schedule()
> there may be multiple reasons why the thread will wake up --- in the
> worst case, such as in the select(2) or epoll(2) system call, there
> may be literally thousands of reasons (one for every file descriptor
> the select is waiting on) why the process will wake up and thus
> resolve the potential "deadlock" that DEPT is worrying about. How is
> DEPT going to handle those cases? If the answer is that things need to
> be tagged, then at least disclose the potential reasons why DEPT might
> be untrustworthy, to save your reviewers' time.

Thank you for the information, but I don't get which case you are
concerned about. I'd like to ask you for a specific scenario so that we
can discuss it further - maybe I could answer it - but I won't. Just
give me an instance, only if you think it's worth it. You look like a
guy who unconditionally blames new things before understanding them,
rather than asking and discussing. Again, I also think no one has to
spend his or her time on what he or she thinks is not worthy enough.

> I know that you're trying to help us, but this tool needs to be far
> better than Lockdep before we should think about merging it. Even if
> it finds 5% more potential deadlocks, if it creates 95% more false

If so, it surely should not get merged, but that sounds too sarcastic.
Let's see if it really creates 95% false positives. If that's true and
I can't fix it, I will give up. That's what I should do.

There are a lot of factors in judging how valuable Dept is. Dept would
be useful especially in the middle of development, rather than once
code reaches its final state in the tree. I'd appreciate it if you
considered that side more, too.

Thanks,
Byungchul
On Thu, Feb 17, 2022 at 05:06:18PM +0000, Matthew Wilcox wrote:
> On Thu, Feb 17, 2022 at 12:00:05PM -0500, Steven Rostedt wrote:
> > On Thu, 17 Feb 2022 10:51:09 -0500
> > "Theodore Ts'o" <tytso@mit.edu> wrote:
> >
> > > I know that you're trying to help us, but this tool needs to be far
> > > better than Lockdep before we should think about merging it. Even if
> > > it finds 5% more potential deadlocks, if it creates 95% more false
> > > positive reports --- and the ones it finds are crazy things that
> > > rarely actually happen in practice, are the costs worth the benefits?
> > > And who is bearing the costs, and who is receiving the benefits?
> >
> > I personally believe that there's potential that this can be helpful and we
> > will want to merge it.
> >
> > But, what I believe Ted is trying to say is, if you do not know if the
> > report is a bug or not, please do not ask the maintainers to determine it
> > for you. This is a good opportunity for you to look to see why your tool
> > reported an issue, and learn that subsystem. Look at if this is really a
> > bug or not, and investigate why.
>
> I agree with Steven here, to the point where I'm willing to invest some
> time being a beta-tester for this, so if you focus your efforts on
> filesystem/mm kinds of problems, I can continue looking at them and
> tell you what's helpful and what's unhelpful in the reports.

I appreciate your support. I'll do my best to make it *THE* perfect tool
for that purpose. I'd be glad if it ends up helping a lot and saving you
guys' time, even if it might not do so just yet.

Thanks,
Byungchul
On Thu, Feb 17, 2022 at 12:00:05PM -0500, Steven Rostedt wrote:
> On Thu, 17 Feb 2022 10:51:09 -0500
> "Theodore Ts'o" <tytso@mit.edu> wrote:
>
> > I know that you're trying to help us, but this tool needs to be far
> > better than Lockdep before we should think about merging it. Even if
> > it finds 5% more potential deadlocks, if it creates 95% more false
> > positive reports --- and the ones it finds are crazy things that
> > rarely actually happen in practice, are the costs worth the benefits?
> > And who is bearing the costs, and who is receiving the benefits?
>
> I personally believe that there's potential that this can be helpful and we
> will want to merge it.
>
> But, what I believe Ted is trying to say is, if you do not know if the
> report is a bug or not, please do not ask the maintainers to determine it
> for you. This is a good opportunity for you to look to see why your tool
> reported an issue, and learn that subsystem. Look at if this is really a
> bug or not, and investigate why.

I appreciate your feedback. I'll be more careful in reporting things,
and I think I need to make the tool more conservative...

> The likely/unlikely tracing I do finds issues all over the kernel. But
> before I report anything, I look at the subsystem and determine *why* it's
> reporting what it does. In some cases, it's just a config issue. There, I
> may submit a patch saying "this is 100% wrong in X config, and we should
> just remove the "unlikely". But I did the due diligence to find out exactly
> what the issue is, and why the tooling reported what it reported.

I'll try my best to do things that way. However, the thing is that there
are few reports on my own system... That's why I shared Dept with the
LKML community.

> I want to stress that your Dept tooling looks to have the potential of
> being something that will be worthwhile including. But the false positives
> need to be down to the rate of lockdep false positives. As Ted said, if
> it's reporting 95% false positives, nobody is going to look at the 5% of
> real bugs that it finds.

Agreed. Dept should not be merged if that's the case. I'm not pushing
ahead, but I'm convinced that Dept does what a dependency tracker should
do. Let's see how valuable it is, especially in the middle of developing
something in the kernel.

Thanks,
Byungchul
On Thu, Feb 17, 2022 at 11:19:15PM -0500, Theodore Ts'o wrote:
> On Thu, Feb 17, 2022 at 12:00:05PM -0500, Steven Rostedt wrote:
> >
> > I personally believe that there's potential that this can be helpful and we
> > will want to merge it.
> >
> > But, what I believe Ted is trying to say is, if you do not know if the
> > report is a bug or not, please do not ask the maintainers to determine it
> > for you. This is a good opportunity for you to look to see why your tool
> > reported an issue, and learn that subsystem. Look at if this is really a
> > bug or not, and investigate why.
>
> I agree there's potential here, or I would have ignored the ext4 "bug
> report".

I just checked this one. I appreciate it...

> When we can get rid of the false positives, I think it should be

Of course, false positives should be removed once they're found. I will
try my best to remove all of them on my own as much as possible.
However, the thing is that I can't see any reports beyond those that
show up on my own system.

> merged; I'd just rather it not be merged until after the false
> positives are fixed, since otherwise, someone well-meaning will start
> using it with Syzkaller, and noise that maintainers need to deal with
> (with people requesting reverts of two year old commits, etc) will
> increase by a factor of ten or more. (With Syzbot reproducers that

Agreed.

> set up random cgroups, IP tunnels with wireguard enabled, FUSE stress
> testers, etc., that file system maintainers will be asked to try to
> disentangle.)
>
> So from a maintainer's perspective, false positives are highly
> negative. It may be that from some people's POV, one bug found and 20
> false positives might still be "useful". But if your tool gains a
> reputation of not valuing maintainers' time, it's just going to make
> us (or at least me :-) cranky, and it's going to be very hard to

Agreed.

> recover from that perception. So it's probably better to be very
> conservative and careful in polishing it before asking for it to be
> merged.

If it's true that there are as many false positives as 95%, then I'll
fix those first, for sure, before asking for it to be merged. Let's see
if that is so.

To kernel developers: I'd appreciate it if you'd let us know whether you
see more real ones than false positives while developing something in
the kernel, so we know it's useful. Otherwise, it's hard to measure how
many false positives it reports, how valuable it is, and so on...

Thanks,
Byungchul
So I was trying to understand what this report is about for some time but honestly I have failed... On Thu 17-02-22 20:10:04, Byungchul Park wrote: > [ 9.008161] =================================================== > [ 9.008163] DEPT: Circular dependency has been detected. > [ 9.008164] 5.17.0-rc1-00015-gb94f67143867-dirty #2 Tainted: G W > [ 9.008166] --------------------------------------------------- > [ 9.008167] summary > [ 9.008167] --------------------------------------------------- > [ 9.008168] *** DEADLOCK *** > [ 9.008168] > [ 9.008168] context A > [ 9.008169] [S] (unknown)(&(&journal->j_wait_transaction_locked)->dmap:0) > [ 9.008171] [W] wait(&(&journal->j_wait_commit)->dmap:0) > [ 9.008172] [E] event(&(&journal->j_wait_transaction_locked)->dmap:0) > [ 9.008173] > [ 9.008173] context B > [ 9.008174] [S] down_write(mapping.invalidate_lock:0) > [ 9.008175] [W] wait(&(&journal->j_wait_transaction_locked)->dmap:0) > [ 9.008176] [E] up_write(mapping.invalidate_lock:0) > [ 9.008177] > [ 9.008178] context C > [ 9.008179] [S] (unknown)(&(&journal->j_wait_commit)->dmap:0) > [ 9.008180] [W] down_write(mapping.invalidate_lock:0) > [ 9.008181] [E] event(&(&journal->j_wait_commit)->dmap:0) > [ 9.008181] > [ 9.008182] [S]: start of the event context > [ 9.008183] [W]: the wait blocked > [ 9.008183] [E]: the event not reachable So what situation is your tool complaining about here? Can you perhaps show it here in more common visualization like: TASK1 TASK2 does foo, grabs Z does X, grabs lock Y blocks on Z blocks on Y or something like that? Because I was not able to decipher this from the report even after trying for some time... Honza > [ 9.008184] --------------------------------------------------- > [ 9.008184] context A's detail > [ 9.008185] --------------------------------------------------- > [ 9.008186] context A > [ 9.008186] [S] (unknown)(&(&journal->j_wait_transaction_locked)->dmap:0) > [ 9.008187] [W] wait(&(&journal->j_wait_commit)->dmap:0) > [ 9.008188] [E] event(&(&journal->j_wait_transaction_locked)->dmap:0) > [ 9.008189] > [ 9.008190] [S] (unknown)(&(&journal->j_wait_transaction_locked)->dmap:0): > [ 9.008191] (N/A) > [ 9.008191] > [ 9.008192] [W] wait(&(&journal->j_wait_commit)->dmap:0): > [ 9.008193] prepare_to_wait (kernel/sched/wait.c:275) > [ 9.008197] stacktrace: > [ 9.008198] __schedule (kernel/sched/sched.h:1318 kernel/sched/sched.h:1616 kernel/sched/core.c:6213) > [ 9.008200] schedule (kernel/sched/core.c:6373 (discriminator 1)) > [ 9.008201] kjournald2 (fs/jbd2/journal.c:250) > [ 9.008203] kthread (kernel/kthread.c:377) > [ 9.008206] ret_from_fork (arch/x86/entry/entry_64.S:301) > [ 9.008209] > [ 9.008209] [E] event(&(&journal->j_wait_transaction_locked)->dmap:0): > [ 9.008210] __wake_up_common (kernel/sched/wait.c:108) > [ 9.008212] stacktrace: > [ 9.008213] dept_event (kernel/dependency/dept.c:2337) > [ 9.008215] __wake_up_common (kernel/sched/wait.c:109) > [ 9.008217] __wake_up_common_lock (./include/linux/spinlock.h:428 (discriminator 1) kernel/sched/wait.c:141 (discriminator 1)) > [ 9.008218] jbd2_journal_commit_transaction (fs/jbd2/commit.c:583) > [ 9.008221] kjournald2 (fs/jbd2/journal.c:214 (discriminator 3)) > [ 9.008223] kthread (kernel/kthread.c:377) > [ 9.008224] ret_from_fork (arch/x86/entry/entry_64.S:301) > [ 9.008226] --------------------------------------------------- > [ 9.008226] context B's detail > [ 9.008227] --------------------------------------------------- > [ 9.008228] context B > [ 9.008228] [S] down_write(mapping.invalidate_lock:0) > [ 
9.008229] [W] wait(&(&journal->j_wait_transaction_locked)->dmap:0) > [ 9.008230] [E] up_write(mapping.invalidate_lock:0) > [ 9.008231] > [ 9.008232] [S] down_write(mapping.invalidate_lock:0): > [ 9.008233] ext4_da_write_begin (fs/ext4/truncate.h:21 fs/ext4/inode.c:2963) > [ 9.008237] stacktrace: > [ 9.008237] down_write (kernel/locking/rwsem.c:1514) > [ 9.008239] ext4_da_write_begin (fs/ext4/truncate.h:21 fs/ext4/inode.c:2963) > [ 9.008241] generic_perform_write (mm/filemap.c:3784) > [ 9.008243] ext4_buffered_write_iter (fs/ext4/file.c:269) > [ 9.008245] ext4_file_write_iter (fs/ext4/file.c:677) > [ 9.008247] new_sync_write (fs/read_write.c:504 (discriminator 1)) > [ 9.008250] vfs_write (fs/read_write.c:590) > [ 9.008251] ksys_write (fs/read_write.c:644) > [ 9.008253] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80) > [ 9.008255] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:113) > [ 9.008258] > [ 9.008258] [W] wait(&(&journal->j_wait_transaction_locked)->dmap:0): > [ 9.008259] prepare_to_wait (kernel/sched/wait.c:275) > [ 9.008261] stacktrace: > [ 9.008261] __schedule (kernel/sched/sched.h:1318 kernel/sched/sched.h:1616 kernel/sched/core.c:6213) > [ 9.008263] schedule (kernel/sched/core.c:6373 (discriminator 1)) > [ 9.008264] wait_transaction_locked (fs/jbd2/transaction.c:184) > [ 9.008266] add_transaction_credits (fs/jbd2/transaction.c:248 (discriminator 3)) > [ 9.008267] start_this_handle (fs/jbd2/transaction.c:427) > [ 9.008269] jbd2__journal_start (fs/jbd2/transaction.c:526) > [ 9.008271] __ext4_journal_start_sb (fs/ext4/ext4_jbd2.c:105) > [ 9.008273] ext4_truncate (fs/ext4/inode.c:4164) > [ 9.008274] ext4_da_write_begin (./include/linux/fs.h:827 fs/ext4/truncate.h:23 fs/ext4/inode.c:2963) > [ 9.008276] generic_perform_write (mm/filemap.c:3784) > [ 9.008277] ext4_buffered_write_iter (fs/ext4/file.c:269) > [ 9.008279] ext4_file_write_iter (fs/ext4/file.c:677) > [ 9.008281] new_sync_write (fs/read_write.c:504 (discriminator 1)) > [ 9.008283] vfs_write (fs/read_write.c:590) > [ 9.008284] ksys_write (fs/read_write.c:644) > [ 9.008285] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80) > [ 9.008287] > [ 9.008288] [E] up_write(mapping.invalidate_lock:0): > [ 9.008288] ext4_da_get_block_prep (fs/ext4/inode.c:1795 fs/ext4/inode.c:1829) > [ 9.008291] --------------------------------------------------- > [ 9.008291] context C's detail > [ 9.008292] --------------------------------------------------- > [ 9.008292] context C > [ 9.008293] [S] (unknown)(&(&journal->j_wait_commit)->dmap:0) > [ 9.008294] [W] down_write(mapping.invalidate_lock:0) > [ 9.008295] [E] event(&(&journal->j_wait_commit)->dmap:0) > [ 9.008296] > [ 9.008297] [S] (unknown)(&(&journal->j_wait_commit)->dmap:0): > [ 9.008298] (N/A) > [ 9.008298] > [ 9.008299] [W] down_write(mapping.invalidate_lock:0): > [ 9.008299] ext4_da_write_begin (fs/ext4/truncate.h:21 fs/ext4/inode.c:2963) > [ 9.008302] stacktrace: > [ 9.008302] down_write (kernel/locking/rwsem.c:1514) > [ 9.008304] ext4_da_write_begin (fs/ext4/truncate.h:21 fs/ext4/inode.c:2963) > [ 9.008305] generic_perform_write (mm/filemap.c:3784) > [ 9.008307] ext4_buffered_write_iter (fs/ext4/file.c:269) > [ 9.008309] ext4_file_write_iter (fs/ext4/file.c:677) > [ 9.008311] new_sync_write (fs/read_write.c:504 (discriminator 1)) > [ 9.008312] vfs_write (fs/read_write.c:590) > [ 9.008314] ksys_write (fs/read_write.c:644) > [ 9.008315] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80) > [ 9.008316] 
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:113) > [ 9.008318] > [ 9.008319] [E] event(&(&journal->j_wait_commit)->dmap:0): > [ 9.008320] __wake_up_common (kernel/sched/wait.c:108) > [ 9.008321] stacktrace: > [ 9.008322] __wake_up_common (kernel/sched/wait.c:109) > [ 9.008323] __wake_up_common_lock (./include/linux/spinlock.h:428 (discriminator 1) kernel/sched/wait.c:141 (discriminator 1)) > [ 9.008324] __jbd2_log_start_commit (fs/jbd2/journal.c:508) > [ 9.008326] jbd2_log_start_commit (fs/jbd2/journal.c:527) > [ 9.008327] __jbd2_journal_force_commit (fs/jbd2/journal.c:560) > [ 9.008329] jbd2_journal_force_commit_nested (fs/jbd2/journal.c:583) > [ 9.008331] ext4_should_retry_alloc (fs/ext4/balloc.c:670 (discriminator 3)) > [ 9.008332] ext4_da_write_begin (fs/ext4/inode.c:2965 (discriminator 1)) > [ 9.008334] generic_perform_write (mm/filemap.c:3784) > [ 9.008335] ext4_buffered_write_iter (fs/ext4/file.c:269) > [ 9.008337] ext4_file_write_iter (fs/ext4/file.c:677) > [ 9.008339] new_sync_write (fs/read_write.c:504 (discriminator 1)) > [ 9.008341] vfs_write (fs/read_write.c:590) > [ 9.008342] ksys_write (fs/read_write.c:644) > [ 9.008343] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80) > [ 9.008345] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:113) > [ 9.008347] --------------------------------------------------- > [ 9.008348] information that might be helpful > [ 9.008348] --------------------------------------------------- > [ 9.008349] CPU: 0 PID: 89 Comm: jbd2/sda1-8 Tainted: G W 5.17.0-rc1-00015-gb94f67143867-dirty #2 > [ 9.008352] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 > [ 9.008353] Call Trace: > [ 9.008354] <TASK> > [ 9.008355] dump_stack_lvl (lib/dump_stack.c:107) > [ 9.008358] print_circle (./arch/x86/include/asm/atomic.h:108 ./include/linux/atomic/atomic-instrumented.h:258 kernel/dependency/dept.c:157 kernel/dependency/dept.c:762) > [ 9.008360] ? print_circle (kernel/dependency/dept.c:1086) > [ 9.008362] cb_check_dl (kernel/dependency/dept.c:1104) > [ 9.008364] bfs (kernel/dependency/dept.c:860) > [ 9.008366] add_dep (kernel/dependency/dept.c:1423) > [ 9.008368] do_event.isra.25 (kernel/dependency/dept.c:1651) > [ 9.008370] ? __wake_up_common (kernel/sched/wait.c:108) > [ 9.008372] dept_event (kernel/dependency/dept.c:2337) > [ 9.008374] __wake_up_common (kernel/sched/wait.c:109) > [ 9.008376] __wake_up_common_lock (./include/linux/spinlock.h:428 (discriminator 1) kernel/sched/wait.c:141 (discriminator 1)) > [ 9.008379] jbd2_journal_commit_transaction (fs/jbd2/commit.c:583) > [ 9.008381] ? arch_stack_walk (arch/x86/kernel/stacktrace.c:24) > [ 9.008385] ? ret_from_fork (arch/x86/entry/entry_64.S:301) > [ 9.008387] ? dept_enable_hardirq (./arch/x86/include/asm/current.h:15 kernel/dependency/dept.c:241 kernel/dependency/dept.c:999 kernel/dependency/dept.c:1043 kernel/dependency/dept.c:1843) > [ 9.008389] ? _raw_spin_unlock_irqrestore (./arch/x86/include/asm/irqflags.h:45 ./arch/x86/include/asm/irqflags.h:80 ./arch/x86/include/asm/irqflags.h:138 ./include/linux/spinlock_api_smp.h:151 kernel/locking/spinlock.c:194) > [ 9.008392] ? _raw_spin_unlock_irqrestore (./arch/x86/include/asm/preempt.h:103 ./include/linux/spinlock_api_smp.h:152 kernel/locking/spinlock.c:194) > [ 9.008394] ? try_to_del_timer_sync (kernel/time/timer.c:1239) > [ 9.008396] kjournald2 (fs/jbd2/journal.c:214 (discriminator 3)) > [ 9.008398] ? prepare_to_wait_exclusive (kernel/sched/wait.c:431) > [ 9.008400] ? 
commit_timeout (fs/jbd2/journal.c:173) > [ 9.008402] kthread (kernel/kthread.c:377) > [ 9.008404] ? kthread_complete_and_exit (kernel/kthread.c:332) > [ 9.008407] ret_from_fork (arch/x86/entry/entry_64.S:301) > [ 9.008410] </TASK>
On Thu 17-02-22 20:10:03, Byungchul Park wrote:
> [ 7.009608] ===================================================
> [ 7.009613] DEPT: Circular dependency has been detected.
> [ 7.009614] 5.17.0-rc1-00014-g8a599299c0cb-dirty #30 Tainted: G W
> [ 7.009616] ---------------------------------------------------
> [ 7.009617] summary
> [ 7.009618] ---------------------------------------------------
> [ 7.009618] *** DEADLOCK ***
> [ 7.009618]
> [ 7.009619] context A
> [ 7.009619] [S] (unknown)(&(bit_wait_table + i)->dmap:0)
> [ 7.009621] [W] down_write(&ei->i_data_sem:0)
> [ 7.009623] [E] event(&(bit_wait_table + i)->dmap:0)
> [ 7.009624]
> [ 7.009625] context B
> [ 7.009625] [S] down_read(&ei->i_data_sem:0)
> [ 7.009626] [W] wait(&(bit_wait_table + i)->dmap:0)
> [ 7.009627] [E] up_read(&ei->i_data_sem:0)
> [ 7.009628]

Looking into this I have noticed that Dept here tracks bitlocks (buffer
locks in particular) but it apparently treats locks on all buffers as one
locking class, so it conflates a lock on a superblock buffer with a lock
on an extent tree block buffer. These are vastly different locks with
different locking constraints. So to avoid false positives in filesystems
we will need to add annotations to differentiate locks on different
buffers (based on what the block is used for), similarly to how we e.g.
annotate i_rwsem for different inodes.

Honza
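For reference, the i_rwsem annotation Jan alludes to follows the usual
lockdep pattern of giving each distinct use its own lock class. A rough
sketch of that pattern (the key name here is made up for illustration;
lockdep_set_class() and struct lock_class_key are the real lockdep
primitives):

	#include <linux/lockdep.h>
	#include <linux/fs.h>

	/* Illustrative key, not a real ext4 symbol. */
	static struct lock_class_key illustrative_dir_i_rwsem_key;

	static void example_classify_inode(struct inode *inode)
	{
		/* Directory inodes get their own i_rwsem class so they
		 * are not conflated with regular-file i_rwsem. */
		if (S_ISDIR(inode->i_mode))
			lockdep_set_class(&inode->i_rwsem,
					  &illustrative_dir_i_rwsem_key);
	}

Something analogous would be needed for buffer locks, so that e.g. a
superblock buffer and an extent tree block buffer end up in different
classes; as Jan notes, no such annotation exists yet for the bitlocks
Dept tracks.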
On Mon, Feb 21, 2022 at 08:02:04PM +0100, Jan Kara wrote:
> On Thu 17-02-22 20:10:04, Byungchul Park wrote:
> > [ 9.008161] ===================================================
> > [ 9.008163] DEPT: Circular dependency has been detected.
> > [ 9.008164] 5.17.0-rc1-00015-gb94f67143867-dirty #2 Tainted: G W
> > [ 9.008166] ---------------------------------------------------
> > [ 9.008167] summary
> > [ 9.008167] ---------------------------------------------------
> > [ 9.008168] *** DEADLOCK ***
> > [ 9.008168]
> > [ 9.008168] context A
> > [ 9.008169] [S] (unknown)(&(&journal->j_wait_transaction_locked)->dmap:0)
> > [ 9.008171] [W] wait(&(&journal->j_wait_commit)->dmap:0)
> > [ 9.008172] [E] event(&(&journal->j_wait_transaction_locked)->dmap:0)
> > [ 9.008173]
> > [ 9.008173] context B
> > [ 9.008174] [S] down_write(mapping.invalidate_lock:0)
> > [ 9.008175] [W] wait(&(&journal->j_wait_transaction_locked)->dmap:0)
> > [ 9.008176] [E] up_write(mapping.invalidate_lock:0)
> > [ 9.008177]
> > [ 9.008178] context C
> > [ 9.008179] [S] (unknown)(&(&journal->j_wait_commit)->dmap:0)
> > [ 9.008180] [W] down_write(mapping.invalidate_lock:0)
> > [ 9.008181] [E] event(&(&journal->j_wait_commit)->dmap:0)
> > [ 9.008181]
> > [ 9.008182] [S]: start of the event context
> > [ 9.008183] [W]: the wait blocked
> > [ 9.008183] [E]: the event not reachable
>
> So what situation is your tool complaining about here? Can you perhaps show
> it here in more common visualization like:

Sure.

> TASK1                   TASK2
> does foo, grabs Z
>                         does X, grabs lock Y
>                         blocks on Z
> blocks on Y
>
> or something like that? Because I was not able to decipher this from the
> report even after trying for some time...

KJOURNALD2(kthread)     TASK1(ksys_write)       TASK2(ksys_write)

wait A
--- stuck
                        wait B
                        --- stuck
                                                wait C
                                                --- stuck

wake up B               wake up C               wake up A

where:
A is a wait_queue, j_wait_commit
B is a wait_queue, j_wait_transaction_locked
C is a rwsem, mapping.invalidate_lock

The above is the simplest form. And it's worth noting that Dept focuses
on waits and events themselves rather than on grabbing and releasing
things like locks. The following is the more descriptive form of it.

KJOURNALD2(kthread)     TASK1(ksys_write)       TASK2(ksys_write)

wait @j_wait_commit
                        ext4_truncate_failed_write()
                        down_write(mapping.invalidate_lock)

                        ext4_truncate()
                           ...
                           wait @j_wait_transaction_locked

                                                ext4_truncate_failed_write()
                                                down_write(mapping.invalidate_lock)

                                                ext4_should_retry_alloc()
                                                   ...
                                                   __jbd2_log_start_commit()
                                                      wake_up(j_wait_commit)
jbd2_journal_commit_transaction()
   wake_up(j_wait_transaction_locked)
                        up_write(mapping.invalidate_lock)

I hope this helps you understand the report.

Yeah... this is what Dept complained about. And as Ted said, the kthread
would be woken up by another wakeup, so it's not a *deadlock deadlock*.
However, these three threads, and any other tasks waiting for any of the
events A, B, C, would be stuck for a while until that wakeup eventually
wakes up the kthread, kjournald2. I guess that's not what ext4 is meant
to do. Honestly, ext4 and the journal system look so complicated that
I'm not fully convinced, though...
Thanks,
Byungchul
On Tue, Feb 22, 2022 at 09:27:23AM +0100, Jan Kara wrote:
> On Thu 17-02-22 20:10:03, Byungchul Park wrote:
> > [ 7.009608] ===================================================
> > [ 7.009613] DEPT: Circular dependency has been detected.
> > [ 7.009614] 5.17.0-rc1-00014-g8a599299c0cb-dirty #30 Tainted: G W
> > [ 7.009616] ---------------------------------------------------
> > [ 7.009617] summary
> > [ 7.009618] ---------------------------------------------------
> > [ 7.009618] *** DEADLOCK ***
> > [ 7.009618]
> > [ 7.009619] context A
> > [ 7.009619] [S] (unknown)(&(bit_wait_table + i)->dmap:0)
> > [ 7.009621] [W] down_write(&ei->i_data_sem:0)
> > [ 7.009623] [E] event(&(bit_wait_table + i)->dmap:0)
> > [ 7.009624]
> > [ 7.009625] context B
> > [ 7.009625] [S] down_read(&ei->i_data_sem:0)
> > [ 7.009626] [W] wait(&(bit_wait_table + i)->dmap:0)
> > [ 7.009627] [E] up_read(&ei->i_data_sem:0)
> > [ 7.009628]
>
> Looking into this I have noticed that Dept here tracks bitlocks (buffer
> locks in particular) but it apparently treats locks on all buffers as one
> locking class, so it conflates a lock on a superblock buffer with a lock
> on an extent tree block buffer. These are vastly different locks with
> different locking constraints. So to avoid false positives in filesystems
> we will need to add annotations to differentiate locks on different
> buffers (based on what the block is used for), similarly to how we e.g.
> annotate i_rwsem for different inodes.

Exactly, yes. All synchronization objects should be classified by what
they are used for. Each object is already classified by the location of
the code initializing it - roughly speaking, we can normally expect
objects initialized at the same place to share the same constraints -
but we actually assign different constraints according to subtle design
decisions, especially in file systems.

It would also help the code have better documentation ;-)

I'm willing to add annotations for that to fs.

> Honza
On Tue, Feb 22, 2022 at 09:27:23AM +0100, Jan Kara wrote:
> Looking into this I have noticed that Dept here tracks bitlocks (buffer
> locks in particular) but it apparently treats locks on all buffers as one
> locking class, so it conflates a lock on a superblock buffer with a lock
> on an extent tree block buffer. These are vastly different locks with
> different locking constraints. So to avoid false positives in filesystems
> we will need to add annotations to differentiate locks on different
> buffers (based on what the block is used for), similarly to how we e.g.
> annotate i_rwsem for different inodes.

Hi Jan Kara,

I just understood why some folks in this space got mad at the Dept
reports. I barely got reports involving the lock you mentioned on my
system - precisely speaking, only one, even though I've rebooted my
system many times. But another report that someone gave me showed a lot
of reports involving that lock.

Your comment and that report are very helpful. I need to assign each
buffer lock its own class first. Thank you very much.

Thanks,
Byungchul
On Wed 23-02-22 09:35:34, Byungchul Park wrote: > On Mon, Feb 21, 2022 at 08:02:04PM +0100, Jan Kara wrote: > > On Thu 17-02-22 20:10:04, Byungchul Park wrote: > > > [ 9.008161] =================================================== > > > [ 9.008163] DEPT: Circular dependency has been detected. > > > [ 9.008164] 5.17.0-rc1-00015-gb94f67143867-dirty #2 Tainted: G W > > > [ 9.008166] --------------------------------------------------- > > > [ 9.008167] summary > > > [ 9.008167] --------------------------------------------------- > > > [ 9.008168] *** DEADLOCK *** > > > [ 9.008168] > > > [ 9.008168] context A > > > [ 9.008169] [S] (unknown)(&(&journal->j_wait_transaction_locked)->dmap:0) > > > [ 9.008171] [W] wait(&(&journal->j_wait_commit)->dmap:0) > > > [ 9.008172] [E] event(&(&journal->j_wait_transaction_locked)->dmap:0) > > > [ 9.008173] > > > [ 9.008173] context B > > > [ 9.008174] [S] down_write(mapping.invalidate_lock:0) > > > [ 9.008175] [W] wait(&(&journal->j_wait_transaction_locked)->dmap:0) > > > [ 9.008176] [E] up_write(mapping.invalidate_lock:0) > > > [ 9.008177] > > > [ 9.008178] context C > > > [ 9.008179] [S] (unknown)(&(&journal->j_wait_commit)->dmap:0) > > > [ 9.008180] [W] down_write(mapping.invalidate_lock:0) > > > [ 9.008181] [E] event(&(&journal->j_wait_commit)->dmap:0) > > > [ 9.008181] > > > [ 9.008182] [S]: start of the event context > > > [ 9.008183] [W]: the wait blocked > > > [ 9.008183] [E]: the event not reachable > > > > So what situation is your tool complaining about here? Can you perhaps show > > it here in more common visualization like: > > Sure. > > > TASK1 TASK2 > > does foo, grabs Z > > does X, grabs lock Y > > blocks on Z > > blocks on Y > > > > or something like that? Because I was not able to decipher this from the > > report even after trying for some time... > > KJOURNALD2(kthread) TASK1(ksys_write) TASK2(ksys_write) > > wait A > --- stuck > wait B > --- stuck > wait C > --- stuck > > wake up B wake up C wake up A > > where: > A is a wait_queue, j_wait_commit > B is a wait_queue, j_wait_transaction_locked > C is a rwsem, mapping.invalidate_lock I see. But a situation like this is not necessarily a guarantee of a deadlock, is it? I mean there can be task D that will eventually call say 'wake up B' and unblock everything and this is how things were designed to work? Multiple sources of wakeups are quite common I'd say... What does Dept do to prevent false reports in cases like this? > The above is the simplest form. And it's worth noting that Dept focuses > on wait and event itself rather than grabing and releasing things like > lock. The following is the more descriptive form of it. > > KJOURNALD2(kthread) TASK1(ksys_write) TASK2(ksys_write) > > wait @j_wait_commit > ext4_truncate_failed_write() > down_write(mapping.invalidate_lock) > > ext4_truncate() > ... > wait @j_wait_transaction_locked > > ext_truncate_failed_write() > down_write(mapping.invalidate_lock) > > ext4_should_retry_alloc() > ... > __jbd2_log_start_commit() > wake_up(j_wait_commit) > jbd2_journal_commit_transaction() > wake_up(j_wait_transaction_locked) > up_write(mapping.invalidate_lock) > > I hope this would help you understand the report. I see, thanks for explanation! So the above scenario is impossible because for anyone to block on @j_wait_transaction_locked the transaction must be committing, which is done only by kjournald2 kthread and so that thread cannot be waiting at @j_wait_commit. 
Essentially blocking on @j_wait_transaction_locked means @j_wait_commit wakeup was already done. I guess this shows there can be non-trivial dependencies between wait queues which are difficult to track in an automated way and without such tracking we are going to see false positives... Honza
On Wed, Feb 23, 2022 at 03:48:59PM +0100, Jan Kara wrote: > On Wed 23-02-22 09:35:34, Byungchul Park wrote: > > On Mon, Feb 21, 2022 at 08:02:04PM +0100, Jan Kara wrote: > > > On Thu 17-02-22 20:10:04, Byungchul Park wrote: > > > > [ 9.008161] =================================================== > > > > [ 9.008163] DEPT: Circular dependency has been detected. > > > > [ 9.008164] 5.17.0-rc1-00015-gb94f67143867-dirty #2 Tainted: G W > > > > [ 9.008166] --------------------------------------------------- > > > > [ 9.008167] summary > > > > [ 9.008167] --------------------------------------------------- > > > > [ 9.008168] *** DEADLOCK *** > > > > [ 9.008168] > > > > [ 9.008168] context A > > > > [ 9.008169] [S] (unknown)(&(&journal->j_wait_transaction_locked)->dmap:0) > > > > [ 9.008171] [W] wait(&(&journal->j_wait_commit)->dmap:0) > > > > [ 9.008172] [E] event(&(&journal->j_wait_transaction_locked)->dmap:0) > > > > [ 9.008173] > > > > [ 9.008173] context B > > > > [ 9.008174] [S] down_write(mapping.invalidate_lock:0) > > > > [ 9.008175] [W] wait(&(&journal->j_wait_transaction_locked)->dmap:0) > > > > [ 9.008176] [E] up_write(mapping.invalidate_lock:0) > > > > [ 9.008177] > > > > [ 9.008178] context C > > > > [ 9.008179] [S] (unknown)(&(&journal->j_wait_commit)->dmap:0) > > > > [ 9.008180] [W] down_write(mapping.invalidate_lock:0) > > > > [ 9.008181] [E] event(&(&journal->j_wait_commit)->dmap:0) > > > > [ 9.008181] > > > > [ 9.008182] [S]: start of the event context > > > > [ 9.008183] [W]: the wait blocked > > > > [ 9.008183] [E]: the event not reachable > > > > > > So what situation is your tool complaining about here? Can you perhaps show > > > it here in more common visualization like: > > > > Sure. > > > > > TASK1 TASK2 > > > does foo, grabs Z > > > does X, grabs lock Y > > > blocks on Z > > > blocks on Y > > > > > > or something like that? Because I was not able to decipher this from the > > > report even after trying for some time... > > > > KJOURNALD2(kthread) TASK1(ksys_write) TASK2(ksys_write) > > > > wait A > > --- stuck > > wait B > > --- stuck > > wait C > > --- stuck > > > > wake up B wake up C wake up A > > > > where: > > A is a wait_queue, j_wait_commit > > B is a wait_queue, j_wait_transaction_locked > > C is a rwsem, mapping.invalidate_lock > > I see. But a situation like this is not necessarily a guarantee of a > deadlock, is it? I mean there can be task D that will eventually call say > 'wake up B' and unblock everything and this is how things were designed to > work? Multiple sources of wakeups are quite common I'd say... What does Yes. At the very beginning when I desgined Dept, I was thinking whether to support multiple wakeup sources or not for a quite long time. Supporting it would be a better option to aovid non-critical reports. However, I thought anyway we'd better fix it - not urgent tho - if there's any single circle dependency. That's why I decided not to support it for now and wanted to gather the kernel guys' opinions. Thing is which policy we should go with. > Dept do to prevent false reports in cases like this? > > > The above is the simplest form. And it's worth noting that Dept focuses > > on wait and event itself rather than grabing and releasing things like > > lock. The following is the more descriptive form of it. > > > > KJOURNALD2(kthread) TASK1(ksys_write) TASK2(ksys_write) > > > > wait @j_wait_commit > > ext4_truncate_failed_write() > > down_write(mapping.invalidate_lock) > > > > ext4_truncate() > > ... 
> > wait @j_wait_transaction_locked > > > > ext_truncate_failed_write() > > down_write(mapping.invalidate_lock) > > > > ext4_should_retry_alloc() > > ... > > __jbd2_log_start_commit() > > wake_up(j_wait_commit) > > jbd2_journal_commit_transaction() > > wake_up(j_wait_transaction_locked) > > up_write(mapping.invalidate_lock) > > > > I hope this would help you understand the report. > > I see, thanks for explanation! So the above scenario is impossible because My pleasure. > for anyone to block on @j_wait_transaction_locked the transaction must be > committing, which is done only by kjournald2 kthread and so that thread > cannot be waiting at @j_wait_commit. Essentially blocking on > @j_wait_transaction_locked means @j_wait_commit wakeup was already done. kjournal2 repeatedly does the wait and the wake_up so the above scenario looks possible to me even based on what you explained. Maybe I should understand how the journal things work more for furhter discussion. Your explanation is so helpful. Thank you really. Thanks, Byungchul > I guess this shows there can be non-trivial dependencies between wait > queues which are difficult to track in an automated way and without such > tracking we are going to see false positives... > > Honza > > -- > Jan Kara <jack@suse.com> > SUSE Labs, CR
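For readers following the kjournald2 discussion, the loop in question
has roughly this shape (a heavily simplified paraphrase of the kthread
in fs/jbd2/journal.c; the real code also arms a commit timer, handles
checkpointing, and uses an open-coded prepare_to_wait() loop):

	/* Simplified sketch of the kjournald2 main loop; not verbatim. */
	static int kjournald2_sketch(void *arg)
	{
		journal_t *journal = arg;

		while (!kthread_should_stop()) {
			if (journal->j_commit_sequence !=
			    journal->j_commit_request) {
				/* A commit was requested; performing it is
				 * what eventually delivers the
				 * j_wait_transaction_locked wakeup. */
				jbd2_journal_commit_transaction(journal);
				continue;
			}
			/* Nothing to do: sleep on j_wait_commit until a
			 * commit is requested or the commit timer fires. */
			wait_event_interruptible(journal->j_wait_commit,
				journal->j_commit_sequence !=
				journal->j_commit_request);
		}
		return 0;
	}

The same thread alternates between waiting on j_wait_commit and
delivering the j_wait_transaction_locked wakeup, which is exactly the
relationship the two sides are debating.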
On Thu 24-02-22 10:11:02, Byungchul Park wrote: > On Wed, Feb 23, 2022 at 03:48:59PM +0100, Jan Kara wrote: > > > KJOURNALD2(kthread) TASK1(ksys_write) TASK2(ksys_write) > > > > > > wait A > > > --- stuck > > > wait B > > > --- stuck > > > wait C > > > --- stuck > > > > > > wake up B wake up C wake up A > > > > > > where: > > > A is a wait_queue, j_wait_commit > > > B is a wait_queue, j_wait_transaction_locked > > > C is a rwsem, mapping.invalidate_lock > > > > I see. But a situation like this is not necessarily a guarantee of a > > deadlock, is it? I mean there can be task D that will eventually call say > > 'wake up B' and unblock everything and this is how things were designed to > > work? Multiple sources of wakeups are quite common I'd say... What does > > Yes. At the very beginning when I desgined Dept, I was thinking whether > to support multiple wakeup sources or not for a quite long time. > Supporting it would be a better option to aovid non-critical reports. > However, I thought anyway we'd better fix it - not urgent tho - if > there's any single circle dependency. That's why I decided not to > support it for now and wanted to gather the kernel guys' opinions. Thing > is which policy we should go with. I see. So supporting only a single wakeup source is fine for locks I guess. But for general wait queues or other synchronization mechanisms, I'm afraid it will lead to quite some false positive reports. Just my 2c. > > Dept do to prevent false reports in cases like this? > > > > > The above is the simplest form. And it's worth noting that Dept focuses > > > on wait and event itself rather than grabing and releasing things like > > > lock. The following is the more descriptive form of it. > > > > > > KJOURNALD2(kthread) TASK1(ksys_write) TASK2(ksys_write) > > > > > > wait @j_wait_commit > > > ext4_truncate_failed_write() > > > down_write(mapping.invalidate_lock) > > > > > > ext4_truncate() > > > ... > > > wait @j_wait_transaction_locked > > > > > > ext_truncate_failed_write() > > > down_write(mapping.invalidate_lock) > > > > > > ext4_should_retry_alloc() > > > ... > > > __jbd2_log_start_commit() > > > wake_up(j_wait_commit) > > > jbd2_journal_commit_transaction() > > > wake_up(j_wait_transaction_locked) > > > up_write(mapping.invalidate_lock) > > > > > > I hope this would help you understand the report. > > > > I see, thanks for explanation! So the above scenario is impossible because > > My pleasure. > > > for anyone to block on @j_wait_transaction_locked the transaction must be > > committing, which is done only by kjournald2 kthread and so that thread > > cannot be waiting at @j_wait_commit. Essentially blocking on > > @j_wait_transaction_locked means @j_wait_commit wakeup was already done. > > kjournal2 repeatedly does the wait and the wake_up so the above scenario > looks possible to me even based on what you explained. Maybe I should > understand how the journal things work more for furhter discussion. Your > explanation is so helpful. Thank you really. OK, let me provide you with more details for better understanding :) In jbd2 we have an object called 'transaction'. This object can go through many states but for our case is important that transaction is moved to T_LOCKED state and out of it only while jbd2_journal_commit_transaction() function is executing and waiting on j_wait_transaction_locked waitqueue is exactly waiting for a transaction to get out of T_LOCKED state. Function jbd2_journal_commit_transaction() is executed only by kjournald. 
Hence anyone can see transaction in T_LOCKED state only if kjournald is running inside jbd2_journal_commit_transaction() and thus kjournald cannot be sleeping on j_wait_commit at the same time. Does this explain things? Honza
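Restated as pseudo-code, the invariant Jan describes looks roughly like
this (a sketch of the logic only, not the actual jbd2 source):

	/* Executed only by kjournald2. */
	jbd2_journal_commit_transaction(journal)
	{
		transaction->t_state = T_LOCKED;   /* entered only here...  */
		...
		transaction->t_state = T_FLUSH;    /* ...and left only here */
		wake_up(&journal->j_wait_transaction_locked);
		...
	}

	/* Any other task: blocking here means it observed T_LOCKED, so
	 * kjournald2 is necessarily inside the function above right now,
	 * and therefore is not sleeping on j_wait_commit. */
	wait_transaction_locked(journal);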
On Thu, Feb 24, 2022 at 11:22:39AM +0100, Jan Kara wrote:
> On Thu 24-02-22 10:11:02, Byungchul Park wrote:
> > On Wed, Feb 23, 2022 at 03:48:59PM +0100, Jan Kara wrote:
> > > > KJOURNALD2(kthread)     TASK1(ksys_write)       TASK2(ksys_write)
> > > >
> > > > wait A
> > > > --- stuck
> > > >                         wait B
> > > >                         --- stuck
> > > >                                                 wait C
> > > >                                                 --- stuck
> > > >
> > > > wake up B               wake up C               wake up A
> > > >
> > > > where:
> > > > A is a wait_queue, j_wait_commit
> > > > B is a wait_queue, j_wait_transaction_locked
> > > > C is a rwsem, mapping.invalidate_lock
> > >
> > > I see. But a situation like this is not necessarily a guarantee of a
> > > deadlock, is it? I mean there can be task D that will eventually call say
> > > 'wake up B' and unblock everything and this is how things were designed to
> > > work? Multiple sources of wakeups are quite common I'd say... What does
> > > Dept do to prevent false reports in cases like this?
> >
> > Yes. At the very beginning, when I designed Dept, I was thinking for
> > quite a long time about whether to support multiple wakeup sources or
> > not. Supporting them would be a better option to avoid non-critical
> > reports. However, I thought we'd better fix any single circular
> > dependency anyway - even if it's not urgent. That's why I decided not
> > to support them for now and wanted to gather the kernel folks'
> > opinions. The thing is which policy we should go with.
>
> I see. So supporting only a single wakeup source is fine for locks I guess.
> But for general wait queues or other synchronization mechanisms, I'm afraid
> it will lead to quite some false positive reports. Just my 2c.

Thank you for your feedback.

I realized we've been using "false positive" differently. There exist
three types of code in terms of dependency and deadlock. It's worth
noting that in Dept, dependencies are built between waits and events.

---

case 1. Code with an actual circular dependency, but no deadlock.

   The circular dependency can be broken by a rescue wakeup source, e.g.
   a timeout. It's not a deadlock, provided it's acceptable that the
   contexts participating in the circular dependency, and others waiting
   for the events in the circle, are stuck until it gets broken.
   Otherwise, say, if that behavior is not intended, then it's
   problematic anyway.

   1-1. What if we judge this code is problematic?
   1-2. What if we judge this code is good?

case 2. Code with an actual circular dependency, and a deadlock.

   There's no other wakeup source than those within the circular
   dependency. Literally a deadlock. It's problematic and critical.

   2-1. What if we judge this code is problematic?
   2-2. What if we judge this code is good?

case 3. Code with no actual circular dependency, and no deadlock.

   Must be good.

   3-1. What if we judge this code is problematic?
   3-2. What if we judge this code is good?

---

I call only 3-1 a "false positive" circular dependency, while you call
both 1-1 and 3-1 "false positive" deadlocks.

I've been wondering whether the kernel folks, esp. Linus, consider code
with any circular dependency problematic or not, even if it won't lead
to a deadlock - say, case 1. Even though I designed Dept based on what I
believe is right, of course, I'm willing to change the design according
to the majority opinion.

However, I would never allow case 1 if I were the owner of the kernel,
for better stability, even though the code works okay for now anyway.

Thanks,
Byungchul
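As a concrete illustration of the "rescue wakeup source" in case 1, a
wait with a timeout fallback might look like the sketch below (assumed
example code, not from the thread; wait_event_timeout() is the real
kernel API and returns 0 if it timed out with the condition still
false):

	static DECLARE_WAIT_QUEUE_HEAD(circle_wq);
	static bool event_happened;

	static void wait_with_rescue(void)
	{
		/* Even if every explicit waker is itself blocked (an actual
		 * circular dependency, case 1), this wait ends after five
		 * seconds: the timer is the rescue wakeup source. */
		if (!wait_event_timeout(circle_wq, event_happened, 5 * HZ))
			pr_warn("rescued by the timeout, not by a waker\n");
	}

Whether a report should still be emitted for such code is precisely the
policy question being debated in this thread.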
On Mon 28-02-22 18:28:26, Byungchul Park wrote: > On Thu, Feb 24, 2022 at 11:22:39AM +0100, Jan Kara wrote: > > On Thu 24-02-22 10:11:02, Byungchul Park wrote: > > > On Wed, Feb 23, 2022 at 03:48:59PM +0100, Jan Kara wrote: > > > > > KJOURNALD2(kthread) TASK1(ksys_write) TASK2(ksys_write) > > > > > > > > > > wait A > > > > > --- stuck > > > > > wait B > > > > > --- stuck > > > > > wait C > > > > > --- stuck > > > > > > > > > > wake up B wake up C wake up A > > > > > > > > > > where: > > > > > A is a wait_queue, j_wait_commit > > > > > B is a wait_queue, j_wait_transaction_locked > > > > > C is a rwsem, mapping.invalidate_lock > > > > > > > > I see. But a situation like this is not necessarily a guarantee of a > > > > deadlock, is it? I mean there can be task D that will eventually call say > > > > 'wake up B' and unblock everything and this is how things were designed to > > > > work? Multiple sources of wakeups are quite common I'd say... What does > > > > > > Yes. At the very beginning when I desgined Dept, I was thinking whether > > > to support multiple wakeup sources or not for a quite long time. > > > Supporting it would be a better option to aovid non-critical reports. > > > However, I thought anyway we'd better fix it - not urgent tho - if > > > there's any single circle dependency. That's why I decided not to > > > support it for now and wanted to gather the kernel guys' opinions. Thing > > > is which policy we should go with. > > > > I see. So supporting only a single wakeup source is fine for locks I guess. > > But for general wait queues or other synchronization mechanisms, I'm afraid > > it will lead to quite some false positive reports. Just my 2c. > > Thank you for your feedback. > > I realized we've been using "false positive" differently. There exist > the three types of code in terms of dependency and deadlock. It's worth > noting that dependencies are built from between waits and events in Dept. > > --- > > case 1. Code with an actual circular dependency, but not deadlock. > > A circular dependency can be broken by a rescue wakeup source e.g. > timeout. It's not a deadlock. If it's okay that the contexts > participating in the circular dependency and others waiting for the > events in the circle are stuck until it gets broken. Otherwise, say, > if it's not meant, then it's anyway problematic. > > 1-1. What if we judge this code is problematic? > 1-2. What if we judge this code is good? > > case 2. Code with an actual circular dependency, and deadlock. > > There's no other wakeup source than those within the circular > dependency. Literally deadlock. It's problematic and critical. > > 2-1. What if we judge this code is problematic? > 2-2. What if we judge this code is good? > > case 3. Code with no actual circular dependency, and not deadlock. > > Must be good. > > 3-1. What if we judge this code is problematic? > 3-2. What if we judge this code is good? > > --- > > I call only 3-1 "false positive" circular dependency. And you call 1-1 > and 3-1 "false positive" deadlock. > > I've been wondering if the kernel guys esp. Linus considers code with > any circular dependency is problematic or not, even if it won't lead to > a deadlock, say, case 1. Even though I designed Dept based on what I > believe is right, of course, I'm willing to change the design according > to the majority opinion. > > However, I would never allow case 1 if I were the owner of the kernel > for better stability, even though the code works anyway okay for now. 
So yes, I call a report for the situation "there is a circular
dependency but deadlock is not possible" a false positive. And that is
because, in my opinion, your definition of circular dependency includes
schemes that are useful and used in the kernel.

Your example in case 1 is kind of borderline (I personally would
consider that a bug as well) but there are other, more valid schemes
with multiple wakeup sources like:

We have a queue of work to do Q protected by lock L. Consumer process
has code like:

	while (1) {
		lock L
		prepare_to_wait(work_queued);
		if (no work) {
			unlock L
			sleep
		} else {
			unlock L
			do work
			wake_up(work_done)
		}
	}

AFAIU Dept will create a dependency here that 'wakeup work_done' is
after 'wait for work_queued'. Producer has code like:

	while (1) {
		lock L
		prepare_to_wait(work_done)
		if (too much work queued) {
			unlock L
			sleep
		} else {
			queue work
			unlock L
			wake_up(work_queued)
		}
	}

And Dept will create a dependency here that 'wakeup work_queued' is
after 'wait for work_done'. And thus we have a trivial cycle in the
dependencies despite the code being perfectly valid and safe.

Honza
On Mon, Feb 28, 2022 at 11:14:44AM +0100, Jan Kara wrote:
> > case 1. Code with an actual circular dependency, but not deadlock.
> >
> >    A circular dependency can be broken by a rescue wakeup source, e.g. a timeout. It's not a deadlock if it's okay that the contexts participating in the circular dependency, and others waiting for the events in the circle, are stuck until it gets broken. Otherwise, say, if that stall is not intended, then it's problematic anyway.
> >
> >    1-1. What if we judge this code is problematic?
> >    1-2. What if we judge this code is good?
> >
> > I've been wondering whether the kernel guys, esp. Linus, consider code with any circular dependency problematic or not, even if it won't lead to a deadlock, say, case 1. Even though I designed Dept based on what I believe is right, of course, I'm willing to change the design according to the majority opinion.
> >
> > However, I would never allow case 1 if I were the owner of the kernel, for better stability, even though the code works okay for now.

Note, I used the example of the timeout as the most obvious way of explaining that a deadlock is not possible. There is also the much more complex explanation which Jan was trying to give, which is what leads to the circular dependency. When trying to start a handle, it can happen that either (a) there is not enough space in the journal for new handles, or (b) the current transaction is so large that if we don't close it and start a new one, we will end up running out of space in the future. In either case, start_this_handle() will block starting any more handles, and then wake up the commit thread. The commit thread then waits for the currently running threads to complete, before it allows new handles to start, and then it will complete the commit. In the case of (a) we then need to do a journal checkpoint, which is more work to release space in the journal, and only then can we allow new handles to start.

The bottom line is (a) it works, (b) there aren't significant delays, and for DEPT to complain that this is somehow wrong and we need to completely rearchitect perfectly working code because it doesn't conform to DEPT's idea of what is "correct" is not acceptable.

> We have a queue of work to do Q protected by lock L. Consumer process has code like:
>
>	while (1) {
>		lock L
>		prepare_to_wait(work_queued);
>		if (no work) {
>			unlock L
>			sleep
>		} else {
>			unlock L
>			do work
>			wake_up(work_done)
>		}
>	}
>
> AFAIU Dept will create dependency here that 'wakeup work_done' is after 'wait for work_queued'. Producer has code like:
>
>	while (1) {
>		lock L
>		prepare_to_wait(work_done)
>		if (too much work queued) {
>			unlock L
>			sleep
>		} else {
>			queue work
>			unlock L
>			wake_up(work_queued)
>		}
>	}
>
> And Dept will create dependency here that 'wakeup work_queued' is after 'wait for work_done'. And thus we have a trivial cycle in the dependencies despite the code being perfectly valid and safe.

Cheers,

					- Ted
On Mon, Feb 28, 2022 at 11:14:44AM +0100, Jan Kara wrote:
> On Mon 28-02-22 18:28:26, Byungchul Park wrote:
> > case 1. Code with an actual circular dependency, but not deadlock.
> >
> >    A circular dependency can be broken by a rescue wakeup source, e.g. a timeout. It's not a deadlock if it's okay that the contexts participating in the circular dependency, and others waiting for the events in the circle, are stuck until it gets broken. Otherwise, say, if that stall is not intended, then it's problematic anyway.
> >
> >    1-1. What if we judge this code is problematic?
> >    1-2. What if we judge this code is good?
> >
> > case 2. Code with an actual circular dependency, and deadlock.
> >
> >    There's no other wakeup source than those within the circular dependency. Literally a deadlock. It's problematic and critical.
> >
> >    2-1. What if we judge this code is problematic?
> >    2-2. What if we judge this code is good?
> >
> > case 3. Code with no actual circular dependency, and not deadlock.
> >
> >    Must be good.
> >
> >    3-1. What if we judge this code is problematic?
> >    3-2. What if we judge this code is good?
> >
> > ---
> >
> > I call only 3-1 a "false positive" circular dependency. And you call 1-1 and 3-1 "false positive" deadlocks.
> >
> > I've been wondering whether the kernel guys, esp. Linus, consider code with any circular dependency problematic or not, even if it won't lead to a deadlock, say, case 1. Even though I designed Dept based on what I believe is right, of course, I'm willing to change the design according to the majority opinion.
> >
> > However, I would never allow case 1 if I were the owner of the kernel, for better stability, even though the code works okay for now.
>
> So yes, I call a report for the situation "There is circular dependency but deadlock is not possible." a false positive. And that is because in my opinion your definition of circular dependency includes schemes that are useful and used in the kernel.
>
> Your example in case 1 is kind of borderline (I personally would consider that a bug as well) but there are other more valid schemes with multiple wakeup sources like:
>
> We have a queue of work to do Q protected by lock L. Consumer process has code like:
>
>	while (1) {
>		lock L
>		prepare_to_wait(work_queued);
>		if (no work) {
>			unlock L
>			sleep
>		} else {
>			unlock L
>			do work
>			wake_up(work_done)
>		}
>	}
>
> AFAIU Dept will create dependency here that 'wakeup work_done' is after 'wait for work_queued'. Producer has code like:

First of all, thank you for this good example.

>	while (1) {
>		lock L
>		prepare_to_wait(work_done)
>		if (too much work queued) {
>			unlock L
>			sleep
>		} else {
>			queue work
>			unlock L
>			wake_up(work_queued)
>		}
>	}
>
> And Dept will create dependency here that 'wakeup work_queued' is after 'wait for work_done'. And thus we have a trivial cycle in the dependencies despite the code being perfectly valid and safe.

Unfortunately, it's neither perfect nor safe without another wakeup source - a rescue wakeup source:

	consumer			producer

					lock L
					(too much work queued == true)
					unlock L
					--- preempted
	lock L
	unlock L
	do work
	lock L
	unlock L
	do work
	...
	(no work == true)
	sleep
					--- scheduled in
					sleep

This code leads to a deadlock without another wakeup source, say, it is not safe.

But yes, I also think this code should be allowed if it anyway runs alongside another wakeup source. In that case, Dept should instead track the rescue wakeup source, since that is what an actual deadlock would involve.

I will correct the code so that Dept tracks the rescue wakeup source whenever it finds such a case.

Lastly, just for your information, I need to explain how Dept works a little more so that you don't misunderstand it.

Assuming the consumer and producer guarantee not to lead to a deadlock, like the following, Dept won't report it as a problem:

	consumer			producer

					sleep
	wakeup work_done
					queue work
	sleep
					wakeup work_queued
	do work
					sleep
	wakeup work_done
					queue work
	sleep
					wakeup work_queued
	do work
					sleep
	...				...

Dept does not consider all waits preceding an event, but only waits that might lead to a deadlock. In this case, Dept works with each region independently:

	consumer			producer

					sleep <- initiates region 1
	--- region 1 starts
	...				...
	--- region 1 ends
	wakeup work_done
	...				...
					queue work
	...				...
	sleep <- initiates region 2
	--- region 2 starts
	...				...
	--- region 2 ends
					wakeup work_queued
	...				...
	do work
	...				...
					sleep <- initiates region 3
	--- region 3 starts
	...				...
	--- region 3 ends
	wakeup work_done
	...				...
					queue work
	...				...
	sleep <- initiates region 4
	--- region 4 starts
	...				...
	--- region 4 ends
					wakeup work_queued
	...				...
	do work
	...				...

That is, Dept does not build dependencies across different regions. So you don't have to worry about unreasonable false positives that much.

Thoughts?

Thanks,
Byungchul

> 								Honza
> --
> Jan Kara <jack@suse.com>
> SUSE Labs, CR
On Mon, Feb 28, 2022 at 04:25:04PM -0500, Theodore Ts'o wrote:
> On Mon, Feb 28, 2022 at 11:14:44AM +0100, Jan Kara wrote:
> > > case 1. Code with an actual circular dependency, but not deadlock.
> > >
> > >    A circular dependency can be broken by a rescue wakeup source, e.g. a timeout. It's not a deadlock if it's okay that the contexts participating in the circular dependency, and others waiting for the events in the circle, are stuck until it gets broken. Otherwise, say, if that stall is not intended, then it's problematic anyway.
> > >
> > >    1-1. What if we judge this code is problematic?
> > >    1-2. What if we judge this code is good?
> > >
> > > I've been wondering whether the kernel guys, esp. Linus, consider code with any circular dependency problematic or not, even if it won't lead to a deadlock, say, case 1. Even though I designed Dept based on what I believe is right, of course, I'm willing to change the design according to the majority opinion.
> > >
> > > However, I would never allow case 1 if I were the owner of the kernel, for better stability, even though the code works okay for now.
>
> Note, I used the example of the timeout as the most obvious way of explaining that a deadlock is not possible. There is also the much more complex explanation which Jan was trying to give, which is what leads to the circular dependency. When trying to start a handle, it can happen that either (a) there is not enough space in the journal for new handles, or (b) the current transaction is so large that if we don't close it and start a new one, we will end up running out of space in the future. In either case, start_this_handle() will block starting any more handles, and then wake up the commit thread. The commit thread then waits for the currently running threads to complete, before it allows new handles to start, and then it will complete the commit. In the case of (a) we then need to do a journal checkpoint, which is more work to release space in the journal, and only then can we allow new handles to start.

Thank you for the full explanation of how the journal works.

> The bottom line is (a) it works, (b) there aren't significant delays, and for DEPT to complain that this is somehow wrong and we need to completely rearchitect perfectly working code because it doesn't conform to DEPT's idea of what is "correct" is not acceptable.

Thanks to you and Jan Kara, I realized it's not a real dependency in the consumer and producer scenario - but again, *ONLY IF* there is a rescue wakeup source. Dept should instead track the rescue wakeup source in that case.

I won't ask you to rearchitect the working code. The code looks sane.

Thanks a lot.

Thanks,
Byungchul

> > We have a queue of work to do Q protected by lock L. Consumer process has code like:
> >
> >	while (1) {
> >		lock L
> >		prepare_to_wait(work_queued);
> >		if (no work) {
> >			unlock L
> >			sleep
> >		} else {
> >			unlock L
> >			do work
> >			wake_up(work_done)
> >		}
> >	}
> >
> > AFAIU Dept will create dependency here that 'wakeup work_done' is after 'wait for work_queued'. Producer has code like:
> >
> >	while (1) {
> >		lock L
> >		prepare_to_wait(work_done)
> >		if (too much work queued) {
> >			unlock L
> >			sleep
> >		} else {
> >			queue work
> >			unlock L
> >			wake_up(work_queued)
> >		}
> >	}
> >
> > And Dept will create dependency here that 'wakeup work_queued' is after 'wait for work_done'. And thus we have a trivial cycle in the dependencies despite the code being perfectly valid and safe.
>
> Cheers,
>
> 					- Ted
On Thu, Mar 03, 2022 at 10:00:33AM +0900, Byungchul Park wrote:
>
> Unfortunately, it's neither perfect nor safe without another wakeup source - a rescue wakeup source:
>
>	consumer			producer
>
>					lock L
>					(too much work queued == true)
>					unlock L
>					--- preempted
>	lock L
>	unlock L
>	do work
>	lock L
>	unlock L
>	do work
>	...
>	(no work == true)
>	sleep
>					--- scheduled in
>					sleep

That's not how things work in ext4. It's **way** more complicated than that. We have multiple wait channels: one wakes up the consumer (read: the commit thread), and one wakes up any processes waiting for the commit thread to have made forward progress. We also have two spin-lock-protected sequence numbers, one which indicates the currently committed transaction #, and one indicating the transaction # that needs to be committed.

The commit thread will sleep on j_wait_commit, and when it is woken up, it will check to see if there is work to be done (j_commit_sequence != j_commit_request), and if so, do the work, and then wake up processes waiting on the wait queue j_wait_done_commit. (Again, all of this uses the pattern "prepare to wait", then check to see if we should sleep; if we do need to sleep, unlock j_state_lock, then sleep. So this prevents any races leading to lost wakeups.)

On the start_this_handle() thread, if the current transaction is too full, we set j_commit_request to its transaction id to indicate that we want the current transaction to be committed, and then we wake up the j_wait_commit wait queue. Then we enter a loop where we do a prepare_to_wait on j_wait_done_commit, check to see if j_commit_sequence == the transaction id that we want to be completed, and if it's not done yet, we unlock the j_state_lock spinlock and go to sleep. Again, because of the prepare_to_wait, there is no chance of a lost wakeup.

So there really is no "consumer" and "producer" here. If you really insist on using this model, which really doesn't apply, then for one thread, it's the consumer with respect to one wait queue, and the producer with respect to the *other* wait queue. For the other thread, the consumer and producer roles are reversed.

And of course, this is a highly simplified model, since we also have a wait queue used by the commit thread to wait for the number of active handles on a particular transaction to go to zero, and stop_this_handle() will wake up the commit thread via this wait queue when the last active handle on a particular transaction is retired. (And yes, that parameter is also protected by a different spin lock which is per-transaction.)

So it seems to me that a fundamental flaw in DEPT's model is assuming that the only waiting paradigm that can be used is consumer/producer, and that's simply not true. The fact that you use the term "lock" is also going to lead to a misleading line of reasoning, because properly speaking, they aren't really locks. We are simply using wait channels to wake up processes as necessary, and then they will check other variables to decide whether or not they need to sleep, and we have an invariant that when these variables change indicating forward progress, the associated wait channel will be woken up.

Cheers,

					- Ted

P.S. This model is also highly simplified since there are other reasons why the commit thread can be woken up, some of which might be via a timeout, and some of which are via the j_wait_commit wait channel but not because j_commit_request has been changed - because the file system is being unmounted, or the file system is being frozen in preparation for a snapshot, etc. These are *not* necessary to prevent a deadlock, because under normal circumstances the two wake channels are sufficient by themselves. So please don't think of them as "rescue wakeup sources"; again, that's highly misleading and the wrong way to think of them.

And to make things even more complicated, we have more than 2 wait channels --- we have *six*:

	/**
	 * @j_wait_transaction_locked:
	 *
	 * Wait queue for waiting for a locked transaction to start
	 * committing, or for a barrier lock to be released.
	 */
	wait_queue_head_t	j_wait_transaction_locked;

	/**
	 * @j_wait_done_commit: Wait queue for waiting for commit to complete.
	 */
	wait_queue_head_t	j_wait_done_commit;

	/**
	 * @j_wait_commit: Wait queue to trigger commit.
	 */
	wait_queue_head_t	j_wait_commit;

	/**
	 * @j_wait_updates: Wait queue to wait for updates to complete.
	 */
	wait_queue_head_t	j_wait_updates;

	/**
	 * @j_wait_reserved:
	 *
	 * Wait queue to wait for reserved buffer credits to drop.
	 */
	wait_queue_head_t	j_wait_reserved;

	/**
	 * @j_fc_wait:
	 *
	 * Wait queue to wait for completion of async fast commits.
	 */
	wait_queue_head_t	j_fc_wait;

"There are more things in heaven and Earth, Horatio,
Than are dreamt of in your philosophy."
	- William Shakespeare, Hamlet
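The race-free sleep pattern described above can be sketched in a few lines of kernel-style C. This is a simplified illustration loosely modeled on jbd2's wait-for-commit logic, using the field names Ted mentions (j_state_lock, j_commit_sequence, j_wait_done_commit); the real code in fs/jbd2/ carries more state and is not identical:

	#include <linux/jbd2.h>
	#include <linux/wait.h>
	#include <linux/sched.h>

	/*
	 * Simplified sketch, not the real jbd2 function: wait until
	 * transaction 'tid' has committed. prepare_to_wait() runs before
	 * the condition is re-checked under j_state_lock, so a wake_up()
	 * issued after we drop the lock but before schedule() cannot be
	 * lost.
	 */
	static void wait_for_commit_sketch(journal_t *journal, tid_t tid)
	{
		DEFINE_WAIT(wait);

		read_lock(&journal->j_state_lock);
		while (tid_gt(tid, journal->j_commit_sequence)) {
			prepare_to_wait(&journal->j_wait_done_commit, &wait,
					TASK_UNINTERRUPTIBLE);
			read_unlock(&journal->j_state_lock);
			schedule();	/* commit thread's wake_up() ends this */
			finish_wait(&journal->j_wait_done_commit, &wait);
			read_lock(&journal->j_state_lock);
		}
		read_unlock(&journal->j_state_lock);
	}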
Ted wrote:
> On Thu, Mar 03, 2022 at 10:00:33AM +0900, Byungchul Park wrote:
> >
> > Unfortunately, it's neither perfect nor safe without another wakeup source - a rescue wakeup source:
> >
> >	consumer			producer
> >
> >					lock L
> >					(too much work queued == true)
> >					unlock L
> >					--- preempted
> >	lock L
> >	unlock L
> >	do work
> >	lock L
> >	unlock L
> >	do work
> >	...
> >	(no work == true)
> >	sleep
> >					--- scheduled in
> >					sleep
>
> That's not how things work in ext4. It's **way** more complicated

You seem to have gotten it wrong. This example is what Jan Kara gave me. I just tried to explain things based on Jan Kara's example, so I left all the statements that Jan Kara wrote. Plus, the example was very helpful. Thanks, Jan Kara.

> than that. We have multiple wait channels: one wakes up the consumer (read: the commit thread), and one wakes up any processes waiting for the commit thread to have made forward progress. We also have two spin-lock-protected sequence numbers, one which indicates the currently committed transaction #, and one indicating the transaction # that needs to be committed.
>
> The commit thread will sleep on j_wait_commit, and when it is woken up, it will check to see if there is work to be done (j_commit_sequence != j_commit_request), and if so, do the work, and then wake up processes waiting on the wait queue j_wait_done_commit. (Again, all of this uses the pattern "prepare to wait", then check to see if we should sleep; if we do need to sleep, unlock j_state_lock, then sleep. So this prevents any races leading to lost wakeups.)
>
> On the start_this_handle() thread, if the current transaction is too full, we set j_commit_request to its transaction id to indicate that we want the current transaction to be committed, and then we wake up the j_wait_commit wait queue. Then we enter a loop where we do a prepare_to_wait on j_wait_done_commit, check to see if j_commit_sequence == the transaction id that we want to be completed, and if it's not done yet, we unlock the j_state_lock spinlock and go to sleep. Again, because of the prepare_to_wait, there is no chance of a lost wakeup.

The above explanation gives me a clear view of how the journal's synchronization works. I appreciate it.

> So there really is no "consumer" and "producer" here. If you really insist on using this model, which really doesn't apply, then for one

Dept does not assume a "consumer" and "producer" model at all; Dept works with general waits and events. *That model is just one of them.*

> thread, it's the consumer with respect to one wait queue, and the producer with respect to the *other* wait queue. For the other thread, the consumer and producer roles are reversed.
>
> And of course, this is a highly simplified model, since we also have a wait queue used by the commit thread to wait for the number of active handles on a particular transaction to go to zero, and stop_this_handle() will wake up the commit thread via this wait queue when the last active handle on a particular transaction is retired. (And yes, that parameter is also protected by a different spin lock which is per-transaction.)

This one also gives me a clear view. Thanks a lot.

> So it seems to me that a fundamental flaw in DEPT's model is assuming that the only waiting paradigm that can be used is consumer/producer,

No, Dept does not.

> and that's simply not true. The fact that you use the term "lock" is also going to lead to a misleading line of reasoning, because properly

"lock/unlock L" comes from Jan Kara's example. It has almost nothing to do with the explanation; I just left it in as a statement from Jan Kara's example.

> speaking, they aren't really locks. We are simply using wait channels

I totally agree with you. *They aren't really locks; it's just waits and wakeups.* That's exactly why I decided to develop Dept. Unlike Lockdep, Dept is not interested in locks, but focuses on waits and wakeup sources themselves. I think you've gotten Dept quite wrong. Please ask me more if there are things you doubt about Dept.

Thanks,
Byungchul
On Thu 03-03-22 10:00:33, Byungchul Park wrote:
> On Mon, Feb 28, 2022 at 11:14:44AM +0100, Jan Kara wrote:
> > On Mon 28-02-22 18:28:26, Byungchul Park wrote:
> > > case 1. Code with an actual circular dependency, but not deadlock.
> > >
> > >    A circular dependency can be broken by a rescue wakeup source, e.g. a timeout. It's not a deadlock if it's okay that the contexts participating in the circular dependency, and others waiting for the events in the circle, are stuck until it gets broken. Otherwise, say, if that stall is not intended, then it's problematic anyway.
> > >
> > >    1-1. What if we judge this code is problematic?
> > >    1-2. What if we judge this code is good?
> > >
> > > case 2. Code with an actual circular dependency, and deadlock.
> > >
> > >    There's no other wakeup source than those within the circular dependency. Literally a deadlock. It's problematic and critical.
> > >
> > >    2-1. What if we judge this code is problematic?
> > >    2-2. What if we judge this code is good?
> > >
> > > case 3. Code with no actual circular dependency, and not deadlock.
> > >
> > >    Must be good.
> > >
> > >    3-1. What if we judge this code is problematic?
> > >    3-2. What if we judge this code is good?
> > >
> > > ---
> > >
> > > I call only 3-1 a "false positive" circular dependency. And you call 1-1 and 3-1 "false positive" deadlocks.
> > >
> > > I've been wondering whether the kernel guys, esp. Linus, consider code with any circular dependency problematic or not, even if it won't lead to a deadlock, say, case 1. Even though I designed Dept based on what I believe is right, of course, I'm willing to change the design according to the majority opinion.
> > >
> > > However, I would never allow case 1 if I were the owner of the kernel, for better stability, even though the code works okay for now.
> >
> > So yes, I call a report for the situation "There is circular dependency but deadlock is not possible." a false positive. And that is because in my opinion your definition of circular dependency includes schemes that are useful and used in the kernel.
> >
> > Your example in case 1 is kind of borderline (I personally would consider that a bug as well) but there are other more valid schemes with multiple wakeup sources like:
> >
> > We have a queue of work to do Q protected by lock L. Consumer process has code like:
> >
> >	while (1) {
> >		lock L
> >		prepare_to_wait(work_queued);
> >		if (no work) {
> >			unlock L
> >			sleep
> >		} else {
> >			unlock L
> >			do work
> >			wake_up(work_done)
> >		}
> >	}
> >
> > AFAIU Dept will create dependency here that 'wakeup work_done' is after 'wait for work_queued'. Producer has code like:
>
> First of all, thank you for this good example.
>
> >	while (1) {
> >		lock L
> >		prepare_to_wait(work_done)
> >		if (too much work queued) {
> >			unlock L
> >			sleep
> >		} else {
> >			queue work
> >			unlock L
> >			wake_up(work_queued)
> >		}
> >	}
> >
> > And Dept will create dependency here that 'wakeup work_queued' is after 'wait for work_done'. And thus we have a trivial cycle in the dependencies despite the code being perfectly valid and safe.
>
> Unfortunately, it's neither perfect nor safe without another wakeup source - a rescue wakeup source:
>
>	consumer			producer
>
>					lock L
>					(too much work queued == true)
>					unlock L
>					--- preempted
>	lock L
>	unlock L
>	do work
>	lock L
>	unlock L
>	do work
>	...
>	(no work == true)
>	sleep
>					--- scheduled in
>					sleep
>
> This code leads to a deadlock without another wakeup source, say, it is not safe.

So the scenario you describe above is indeed possible. But the trick is that the wakeup from 'consumer', as it is doing work, will remove 'producer' from the wait queue and change the 'producer' process state to 'TASK_RUNNING'. So when 'producer' calls sleep (in fact schedule()), the scheduler will just treat this as another preemption point and the 'producer' will immediately or soon continue to run. So indeed we can think of this as "another wakeup source", but the source is in the CPU scheduler itself. This is the standard way waitqueues are used in the kernel...

> Lastly, just for your information, I need to explain how Dept works a little more so that you don't misunderstand it.
>
> Assuming the consumer and producer guarantee not to lead to a deadlock, like the following, Dept won't report it as a problem:
>
>	consumer			producer
>
>					sleep
>	wakeup work_done
>					queue work
>	sleep
>					wakeup work_queued
>	do work
>					sleep
>	wakeup work_done
>					queue work
>	sleep
>					wakeup work_queued
>	do work
>					sleep
>	...				...
>
> Dept does not consider all waits preceding an event, but only waits that might lead to a deadlock. In this case, Dept works with each region independently:
>
>	consumer			producer
>
>					sleep <- initiates region 1
>	--- region 1 starts
>	...				...
>	--- region 1 ends
>	wakeup work_done
>	...				...
>					queue work
>	...				...
>	sleep <- initiates region 2
>	--- region 2 starts
>	...				...
>	--- region 2 ends
>					wakeup work_queued
>	...				...
>	do work
>	...				...
>					sleep <- initiates region 3
>	--- region 3 starts
>	...				...
>	--- region 3 ends
>	wakeup work_done
>	...				...
>					queue work
>	...				...
>	sleep <- initiates region 4
>	--- region 4 starts
>	...				...
>	--- region 4 ends
>					wakeup work_queued
>	...				...
>	do work
>	...				...
>
> That is, Dept does not build dependencies across different regions. So you don't have to worry about unreasonable false positives that much.
>
> Thoughts?

Thanks for the explanation! And what exactly defines the 'regions'? When some process goes to sleep on some waitqueue, does this define the start of a region at the place where all the other processes are at that moment, and is a wakeup of the waitqueue the end of the region?

								Honza
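Jan's point - that prepare_to_wait() makes the scheduler itself close the race - can be shown in a short annotated sketch. The names (L, work_done, queue_is_full) are the illustrative ones from the example, not real kernel fields:

	#include <linux/wait.h>
	#include <linux/sched.h>
	#include <linux/spinlock.h>

	static DEFINE_SPINLOCK(L);			/* illustrative */
	static DECLARE_WAIT_QUEUE_HEAD(work_done);	/* illustrative */
	static bool queue_is_full(void);		/* stand-in condition */

	static void producer_sleep(void)
	{
		DEFINE_WAIT(wait);
		bool too_much;

		spin_lock(&L);
		prepare_to_wait(&work_done, &wait, TASK_INTERRUPTIBLE);
		/* now on the wait queue, state is TASK_INTERRUPTIBLE */
		too_much = queue_is_full();
		spin_unlock(&L);

		/*
		 * If the consumer runs right here, drains the queue and
		 * calls wake_up(&work_done), that removes us from the
		 * queue and sets our state back to TASK_RUNNING.
		 */
		if (too_much)
			schedule();	/* TASK_RUNNING: returns at once */

		finish_wait(&work_done, &wait);
	}

In other words, the "preempted before sleep" interleaving in the quoted scenario cannot sleep forever: the wakeup is latched in the task state, which is Jan's "wakeup source in the CPU scheduler itself".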
On Thu, Mar 03, 2022 at 02:23:33PM +0900, Byungchul Park wrote:
> I totally agree with you. *They aren't really locks; it's just waits and wakeups.* That's exactly why I decided to develop Dept. Unlike Lockdep, Dept is not interested in locks, but focuses on waits and wakeup sources themselves. I think you've gotten Dept quite wrong. Please ask me more if there are things you doubt about Dept.

So the question is this --- do you now understand why, even though there is a circular dependency, nothing gets stalled in the interactions between the two wait channels?

					- Ted
On Thu, Mar 03, 2022 at 09:36:25AM -0500, Theodore Ts'o wrote:
> On Thu, Mar 03, 2022 at 02:23:33PM +0900, Byungchul Park wrote:
> > I totally agree with you. *They aren't really locks; it's just waits and wakeups.* That's exactly why I decided to develop Dept. Unlike Lockdep, Dept is not interested in locks, but focuses on waits and wakeup sources themselves. I think you've gotten Dept quite wrong. Please ask me more if there are things you doubt about Dept.
>
> So the question is this --- do you now understand why, even though there is a circular dependency, nothing gets stalled in the interactions between the two wait channels?

???

I'm afraid I don't get you.

All contexts waiting for any of the events in the circular dependency chain will definitely be stuck if there is a circular dependency, as I explained. So we need another wakeup source to break the circle. In the ext4 code, you might have such a wakeup source for breaking the circle.

What I agreed with is:

   The case where 1) the circular dependency is unavoidable, 2) there is another wakeup source for breaking the circle, and 3) the duration of the sleep is short enough, should be acceptable.

Sounds good?

Thanks,
Byungchul
On Thu, Mar 03, 2022 at 10:54:56AM +0100, Jan Kara wrote:
> On Thu 03-03-22 10:00:33, Byungchul Park wrote:
> > Unfortunately, it's neither perfect nor safe without another wakeup source - a rescue wakeup source:
> >
> >	consumer			producer
> >
> >					lock L
> >					(too much work queued == true)
> >					unlock L
> >					--- preempted
> >	lock L
> >	unlock L
> >	do work
> >	lock L
> >	unlock L
> >	do work
> >	...
> >	(no work == true)
> >	sleep
> >					--- scheduled in
> >					sleep
> >
> > This code leads to a deadlock without another wakeup source, say, it is not safe.
>
> So the scenario you describe above is indeed possible. But the trick is that the wakeup from 'consumer', as it is doing work, will remove 'producer' from the wait queue and change the 'producer' process state to 'TASK_RUNNING'. So when 'producer' calls sleep (in fact schedule()), the scheduler will just treat this as another preemption point and the 'producer' will immediately or soon continue to run. So indeed we can think of this as "another wakeup source", but the source is in the CPU scheduler itself. This is the standard way waitqueues are used in the kernel...

Nice! Thanks for the explanation. I will take it into account if needed.

> > Lastly, just for your information, I need to explain how Dept works a little more so that you don't misunderstand it.
> >
> > Assuming the consumer and producer guarantee not to lead to a deadlock, like the following, Dept won't report it as a problem:
> >
> >	consumer			producer
> >
> >					sleep
> >	wakeup work_done
> >					queue work
> >	sleep
> >					wakeup work_queued
> >	do work
> >					sleep
> >	wakeup work_done
> >					queue work
> >	sleep
> >					wakeup work_queued
> >	do work
> >					sleep
> >	...				...
> >
> > Dept does not consider all waits preceding an event, but only waits that might lead to a deadlock. In this case, Dept works with each region independently.
> >
> > That is, Dept does not build dependencies across different regions. So you don't have to worry about unreasonable false positives that much.
> >
> > Thoughts?
>
> Thanks for the explanation! And what exactly defines the 'regions'? When some process goes to sleep on some waitqueue, does this define the start of a region at the place where all the other processes are at that moment, and is a wakeup of the waitqueue the end of the region?

Yes. Let me explain it more for better understanding. (I copied this from the discussion I had with Matthew.)

   ideal view
   -----------
	context X			context Y

	request event E
	...
	write REQUESTEVENT
					when (notice REQUESTEVENT written)
	...				notice the request from X [S]
					--- ideally region 1 starts here
	wait for the event		...
	sleep				if (can see REQUESTEVENT written)
	...					it's on the way to the event
	...				...
					--- ideally region 1 ends here
					finally the event [E]

Dept basically works with the above view with regard to waits and events. But it's very hard to identify the ideal [S] point in practice. So Dept instead identifies the [S] point by checking WAITSTART with memory barriers like the following, which makes Dept work conservatively:

   Dept's view
   ------------
	context X			context Y

	request event E
	...
	write REQUESTEVENT
					when (notice REQUESTEVENT written)
	...				notice the request from X
					--- region 2 (Dept gives up) starts
	wait for the event		...
	write barrier
	write WAITSTART
	read barrier
	sleep				when (notice WAITSTART written)
					ensure the request has come [S]
					--- region 2 (Dept gives up) ends
					--- region 3 starts here
	...				if (can see WAITSTART written)
	...					it's on the way to the event
	...				...
					--- region 3 ends here
					finally the event [E]

In short, Dept works with region 3.

Thanks,
Byungchul
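The WAITSTART handshake sketched above is essentially a store/barrier/load flag protocol. Below is a self-contained illustration in kernel-style C; the struct and field names are invented for this example and are not Dept's actual data structures:

	#include <linux/compiler.h>
	#include <asm/barrier.h>

	/* Invented for illustration - not Dept's real types. */
	struct dept_handshake {
		int requested;		/* "REQUESTEVENT" in the diagram */
		int waitstart;		/* "WAITSTART" in the diagram */
	};

	/* Context X: request the event, then publish that the wait began. */
	static void waiter_side(struct dept_handshake *h)
	{
		WRITE_ONCE(h->requested, 1);	/* request event E */
		smp_wmb();		/* order: requested before waitstart */
		WRITE_ONCE(h->waitstart, 1);
		/* ... sleep ... */
	}

	/* Context Y: if WAITSTART is visible, the request must be too: [S]. */
	static bool event_side(struct dept_handshake *h)
	{
		if (!READ_ONCE(h->waitstart))
			return false;	/* region 2: Dept gives up */
		smp_rmb();		/* pairs with smp_wmb() above */
		/* h->requested is guaranteed visible; region 3 can start. */
		return READ_ONCE(h->requested) == 1;
	}

The barrier pairing is what lets context Y treat "WAITSTART seen" as a conservative but safe stand-in for the ideal [S] point.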
On Thu, Mar 03, 2022 at 09:36:25AM -0500, Theodore Ts'o wrote:
> On Thu, Mar 03, 2022 at 02:23:33PM +0900, Byungchul Park wrote:
> > I totally agree with you. *They aren't really locks; it's just waits and wakeups.* That's exactly why I decided to develop Dept. Unlike Lockdep, Dept is not interested in locks, but focuses on waits and wakeup sources themselves. I think you've gotten Dept quite wrong. Please ask me more if there are things you doubt about Dept.
>
> So the question is this --- do you now understand why, even though there is a circular dependency, nothing gets stalled in the interactions between the two wait channels?

I found a point where the two wait channels don't lead to a deadlock in some cases, thanks to Jan Kara. I will fix it so that Dept won't complain about it.

Thanks,
Byungchul

>
> - Ted
On Fri, Mar 04, 2022 at 09:42:37AM +0900, Byungchul Park wrote:
>
> All contexts waiting for any of the events in the circular dependency chain will definitely be stuck if there is a circular dependency, as I explained. So we need another wakeup source to break the circle. In the ext4 code, you might have such a wakeup source for breaking the circle.
>
> What I agreed with is:
>
>    The case where 1) the circular dependency is unavoidable, 2) there is another wakeup source for breaking the circle, and 3) the duration of the sleep is short enough, should be acceptable.
>
> Sounds good?

These dependencies are part of every single ext4 metadata update, and if there were any unnecessary sleeps, this would be a major performance gap; this is a very well studied part of ext4.

There are some places where we sleep, sure. In some cases start_this_handle() needs to wait for a commit to complete, and the commit thread might need to sleep for I/O to complete. But the moment the thing that we're waiting for is complete, we wake up all of the processes on the wait queue. In the case where we wait for I/O completion, though, that wakeup is coming from the device driver, when it receives the I/O completion interrupt from the hard drive. Is that considered an "external source"? Maybe DEPT doesn't recognize that this is certain to happen just as day follows the night? (Well, maybe the I/O completion interrupt might not happen if the disk drive bursts into flames --- but then, you've got bigger problems. :-)

In any case, if DEPT is going to report these circular dependencies as "bugs that MUST be fixed", it's going to be pure noise, and I will ignore all DEPT reports and will push back on having Lockdep replaced by DEPT --- because Lockdep gives us actionable reports, and if DEPT can't tell the difference between a valid programming pattern and a bug, then it's worse than useless.

Sounds good?

					- Ted
On Fri, Mar 04, 2022 at 12:20:02PM +0900, Byungchul Park wrote:
>
> I found a point where the two wait channels don't lead to a deadlock in some cases, thanks to Jan Kara. I will fix it so that Dept won't complain about it.

I sent my last (admittedly cranky) message before you sent this. I'm glad you finally understood Jan's explanation. I was trying to tell you the same thing, but apparently I failed to communicate in a sufficiently clear manner.

In any case, what Jan described is a fundamental part of how wait queues work, and I'm kind of amazed that you were able to implement DEPT without understanding it. (But maybe that is why some of the DEPT reports were completely incomprehensible to me; I couldn't interpret why in the world DEPT was saying there was a problem.)

In any case, the thing I would ask is a little humility. We regularly use lockdep, and we run a huge number of stress tests, throughout each development cycle. So if DEPT is issuing lots of reports about apparently circular dependencies, please try to be open to the thought that the fault is in DEPT, and don't try to argue with maintainers that their code MUST be buggy --- that since you don't understand our code, and DEPT must be theoretically perfect, it is up to the Maintainers to prove to you that their code is correct.

I am going to gently suggest that it is at least as likely, if not more likely, that the failure is in DEPT or in your understanding of how kernel wait channels and locking work. After all, why would it be that we haven't found these problems via our other QA practices?

Cheers,

					- Ted
On Fri, Mar 04, 2022 at 10:26:23PM -0500, Theodore Ts'o wrote:
> On Fri, Mar 04, 2022 at 09:42:37AM +0900, Byungchul Park wrote:
> >
> > All contexts waiting for any of the events in the circular dependency chain will definitely be stuck if there is a circular dependency, as I explained. So we need another wakeup source to break the circle. In the ext4 code, you might have such a wakeup source for breaking the circle.
> >
> > What I agreed with is:
> >
> >    The case where 1) the circular dependency is unavoidable, 2) there is another wakeup source for breaking the circle, and 3) the duration of the sleep is short enough, should be acceptable.
> >
> > Sounds good?
>
> These dependencies are part of every single ext4 metadata update, and if there were any unnecessary sleeps, this would be a major performance gap; this is a very well studied part of ext4.
>
> There are some places where we sleep, sure. In some cases start_this_handle() needs to wait for a commit to complete, and the commit thread might need to sleep for I/O to complete. But the moment the thing that we're waiting for is complete, we wake up all of the processes on the wait queue. In the case where we wait for I/O completion, though, that wakeup is coming from the device driver, when it receives the I/O completion interrupt from the hard drive. Is that considered an "external source"? Maybe DEPT doesn't recognize that this is certain to happen just as day follows the night? (Well, maybe the I/O completion interrupt might not happen if the disk drive bursts into flames --- but then, you've got bigger problems. :-)

Almost everything you've been blaming Dept for is total nonsense. Based on what you're saying, I'm convinced that you don't understand even 1% of how Dept works. You don't even try to understand it before blaming it.

You don't have to understand and support it. But I can't respond to you if you keep saying silly things that way.

> In any case, if DEPT is going to report these circular dependencies as "bugs that MUST be fixed", it's going to be pure noise, and I will ignore all DEPT reports and will push back on having Lockdep replaced

Dept is going to be improved so that what you are concerned about won't be reported.

> by DEPT --- because Lockdep gives us actionable reports, and if DEPT

Right. Dept should give actionable reports, too.

> can't tell the difference between a valid programming pattern and a bug, then it's worse than useless.

Needless to say.
On Fri, Mar 04, 2022 at 10:40:35PM -0500, Theodore Ts'o wrote:
> On Fri, Mar 04, 2022 at 12:20:02PM +0900, Byungchul Park wrote:
> >
> > I found a point where the two wait channels don't lead to a deadlock in some cases, thanks to Jan Kara. I will fix it so that Dept won't complain about it.
>
> I sent my last (admittedly cranky) message before you sent this. I'm glad you finally understood Jan's explanation. I was trying to tell

Not finally. I've understood him whenever he tried to tell me something.

> you the same thing, but apparently I failed to communicate in a

I don't think so. Your point and Jan's point are different. All he has said makes sense. But what you say does not.

> sufficiently clear manner. In any case, what Jan described is a fundamental part of how wait queues work, and I'm kind of amazed that you were able to implement DEPT without understanding it. (But maybe

Of course, it was possible because all that Dept has to know for its basic operation is waits and events. The subtle things like what Jan told me help Dept be better.

> that is why some of the DEPT reports were completely incomprehensible

It's because you are blindly blaming it without understanding how Dept works at all. I will fix those that must be fixed. Don't worry.

> to me; I couldn't interpret why in the world DEPT was saying there was a problem.)

I can tell you if you really want to understand why. But I can't if you are like this.

> In any case, the thing I would ask is a little humility. We regularly use lockdep, and we run a huge number of stress tests, throughout each development cycle.

Sure.

> So if DEPT is issuing lots of reports about apparently circular dependencies, please try to be open to the thought that the fault is

No one was claiming that Dept doesn't have faults. I think you worry too much.

> in DEPT, and don't try to argue with maintainers that their code MUST be buggy --- that since you don't understand our code, and DEPT must be

No one argued that their code must be buggy, either. So I don't think you have to worry about what's never happened.

> theoretically perfect, it is up to the Maintainers to prove to you that their code is correct.
>
> I am going to gently suggest that it is at least as likely, if not more likely, that the failure is in DEPT or in your understanding of

No doubt. I already think so. But it doesn't mean that I have to keep quiet without discussing how to improve Dept. I will keep improving Dept in a reasonable way.

> how kernel wait channels and locking work. After all, why would it be that we haven't found these problems via our other QA practices?

Let's talk more once you understand at least 10% of how Dept works. Otherwise, I don't think we can talk in a productive way.
On Sat, Mar 05, 2022 at 11:15:38PM +0900, Byungchul Park wrote:
> On Fri, Mar 04, 2022 at 10:26:23PM -0500, Theodore Ts'o wrote:
> > On Fri, Mar 04, 2022 at 09:42:37AM +0900, Byungchul Park wrote:
> > >
> > > All contexts waiting for any of the events in the circular dependency chain will definitely be stuck if there is a circular dependency, as I explained. So we need another wakeup source to break the circle. In the ext4 code, you might have such a wakeup source for breaking the circle.
> > >
> > > What I agreed with is:
> > >
> > >    The case where 1) the circular dependency is unavoidable, 2) there is another wakeup source for breaking the circle, and 3) the duration of the sleep is short enough, should be acceptable.
> > >
> > > Sounds good?
> >
> > These dependencies are part of every single ext4 metadata update, and if there were any unnecessary sleeps, this would be a major performance gap; this is a very well studied part of ext4.
> >
> > There are some places where we sleep, sure. In some cases start_this_handle() needs to wait for a commit to complete, and the commit thread might need to sleep for I/O to complete. But the moment the thing that we're waiting for is complete, we wake up all of the processes on the wait queue. In the case where we wait for I/O completion, though, that wakeup is coming from the device driver, when it receives the I/O completion interrupt from the hard drive. Is that considered an "external source"? Maybe DEPT doesn't recognize that this is certain to happen just as day follows the night? (Well, maybe the I/O completion interrupt might not happen if the disk drive bursts into flames --- but then, you've got bigger problems. :-)
>
> Almost everything you've been blaming Dept for is total nonsense. Based on what you're saying, I'm convinced that you don't understand even 1% of how Dept works. You don't even try to understand it before blaming it.
>
> You don't have to understand and support it. But I can't respond to you if you keep saying silly things that way.

Byungchul, other than ext4, have there been any DEPT reports that other subsystem maintainers agree were valid use cases?

Regarding false positives, just to note that lockdep is not without its share of false positives either. It's just that (as you know) the signal-to-noise ratio should be high for it to be useful. I've put up with lockdep's false positives just because it occasionally saves me from catastrophe.

> > In any case, if DEPT is going to report these circular dependencies as "bugs that MUST be fixed", it's going to be pure noise, and I will ignore all DEPT reports and will push back on having Lockdep replaced
>
> Dept is going to be improved so that what you are concerned about won't be reported.

Yeah, I am looking forward to learning more about it; however, I was wondering about the following: lockdep can already be used for modeling "resource acquire/release" and "resource wait" semantics that are unrelated to locks, as we do in mm reclaim. I am wondering why we cannot just use those existing lockdep mechanisms for the wait/wake use cases (assuming we can agree that circular dependencies related to wait/wake are a bad thing). Or perhaps there's a reason why Peter Zijlstra did not use lockdep for wait/wake dependencies (such as multiple wake sources), considering he wrote a lot of that code.

Keep kicking ass brother, you're doing great.

Thanks,
Joel
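The existing facility Joel refers to is real: a lockdep_map can model acquire/release semantics for things that are not locks, which is how mm's fs_reclaim annotation and the workqueue flush annotations work. Here is a minimal sketch of that mechanism, with an invented map name ("my_resource") standing in for whatever resource is being modeled:

	#include <linux/lockdep.h>

	/* An invented lockdep map modeling a non-lock resource, in the
	 * style of mm's fs_reclaim map. */
	static struct lockdep_map my_resource_map =
		STATIC_LOCKDEP_MAP_INIT("my_resource", &my_resource_map);

	/* Annotate "this path can wait for the resource": lockdep records
	 * an edge from every lock currently held to my_resource_map. */
	static void might_wait_for_my_resource(void)
	{
		lock_map_acquire(&my_resource_map);
		lock_map_release(&my_resource_map);
	}

	/* Annotate "this path produces the resource" by holding the map
	 * across the producing section; any lock taken inside now forms
	 * an edge from my_resource_map to that lock, so a cycle between
	 * the two sides becomes reportable. */
	static void produce_my_resource(void)
	{
		lock_map_acquire(&my_resource_map);
		/* ... do the work that waiters depend on ... */
		lock_map_release(&my_resource_map);
	}

This is the same scheme workqueues use to let lockdep catch flush-vs-work deadlocks today; the open question Joel raises is whether the multi-wakeup-source cases DEPT targets can be expressed this way.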
Hi,

Sorry to butt in as an outsider, but this seems like a shockingly disrespectful discussion for such a wide CC list. I don't want to make rules for how you discuss things (I very rarely contribute), and I see the value in a frank discussion, but maybe you could continue with a reduced CC list? I find it unlikely that I am the only one who could do without this.

Best regards,
Reimar Döffinger

> On 5 Mar 2022, at 15:55, Byungchul Park <byungchul.park@lge.com> wrote:
>
> On Fri, Mar 04, 2022 at 10:40:35PM -0500, Theodore Ts'o wrote:
>> On Fri, Mar 04, 2022 at 12:20:02PM +0900, Byungchul Park wrote:
>>>
>>> I found a point where the two wait channels don't lead to a deadlock in some cases, thanks to Jan Kara. I will fix it so that Dept won't complain about it.
>>
>> I sent my last (admittedly cranky) message before you sent this. I'm glad you finally understood Jan's explanation. I was trying to tell
>
> Not finally. I've understood him whenever he tried to tell me something.
>
>> you the same thing, but apparently I failed to communicate in a
>
> I don't think so. Your point and Jan's point are different. All he has said makes sense. But what you say does not.
>
>> sufficiently clear manner. In any case, what Jan described is a fundamental part of how wait queues work, and I'm kind of amazed that you were able to implement DEPT without understanding it. (But maybe
>
> Of course, it was possible because all that Dept has to know for its basic operation is waits and events. The subtle things like what Jan told me help Dept be better.
>
>> that is why some of the DEPT reports were completely incomprehensible
>
> It's because you are blindly blaming it without understanding how Dept works at all. I will fix those that must be fixed. Don't worry.
>
>> to me; I couldn't interpret why in the world DEPT was saying there was a problem.)
>
> I can tell you if you really want to understand why. But I can't if you are like this.
>
>> In any case, the thing I would ask is a little humility. We regularly use lockdep, and we run a huge number of stress tests, throughout each development cycle.
>
> Sure.
>
>> So if DEPT is issuing lots of reports about apparently circular dependencies, please try to be open to the thought that the fault is
>
> No one was claiming that Dept doesn't have faults. I think you worry too much.
>
>> in DEPT, and don't try to argue with maintainers that their code MUST be buggy --- that since you don't understand our code, and DEPT must be
>
> No one argued that their code must be buggy, either. So I don't think you have to worry about what's never happened.
>
>> theoretically perfect, it is up to the Maintainers to prove to you that their code is correct.
>>
>> I am going to gently suggest that it is at least as likely, if not more likely, that the failure is in DEPT or in your understanding of
>
> No doubt. I already think so. But it doesn't mean that I have to keep quiet without discussing how to improve Dept. I will keep improving Dept in a reasonable way.
>
>> how kernel wait channels and locking work. After all, why would it be that we haven't found these problems via our other QA practices?
>
> Let's talk more once you understand at least 10% of how Dept works. Otherwise, I don't think we can talk in a productive way.
On Sat, Mar 05, 2022 at 11:55:34PM +0900, Byungchul Park wrote:
> > that is why some of the DEPT reports were completely incomprehensible
>
> It's because you are blindly blaming it without understanding how Dept works at all. I will fix those that must be fixed. Don't worry.

Users of DEPT must not have to understand how DEPT works in order to understand and use DEPT reports. If you think I don't understand how DEPT works, I'm going to gently suggest that this means the DEPT reports aren't clear enough, and/or the DEPT documentation needs to be *substantially* improved, or both --- and these need to happen before DEPT is ready to be merged.

> > So if DEPT is issuing lots of reports about apparently circular
> > dependencies, please try to be open to the thought that the fault is
>
> No one was claiming that Dept doesn't have faults. I think you worry too much.

In that case, may I ask that you add back an RFC to the subject prefix (e.g., [PATCH RFC -v5])? Or maybe even add the subject prefix NOT YET READY? I have seen cases where, after a patch series gets to PATCH -v22, people assume that it *must* be ready, as opposed to what it really means, which is "the author is just persistently reposting and rebasing the patch series over and over again". It would be helpful if you directly acknowledged, in each patch submission, that it is not yet ready for prime time.

After all, right now, DEPT has generated two reports in ext4, both of which were false positives, and both of which have required a lot of maintainer time to prove to you that they were in fact false positives. So are we all agreed that DEPT is not ready for prime time?

> No one argued that their code must be buggy, either. So I don't think
> you have to worry about what's never happened.

Well, you kept on insisting that ext4 must have a circular dependency, and that depending on a "rescue wakeup" is bad programming practice, but you'll reluctantly agree to make DEPT accept "rescue wakeups" if that is the will of the developers. My concern here is that the fundamental concept of "rescue wakeups" is wrong; I don't see how you can distinguish between a valid wakeup and one that you and DEPT are going to somehow characterize as dodgy.

Consider: a process can first subscribe to multiple wait queues, and arrange to be woken up by a timeout, and then call schedule() to go to sleep. So it is not waiting on a single wait channel, but potentially *multiple* wakeup sources. If you are going to prove that the kernel is not going to make forward progress, you need to prove that *all* of the ways that the process might wake up aren't going to happen for some reason.

Just because one wakeup source seems to form a circular dependency proves nothing, since another wakeup source might be the designed and architected way that the code makes forward progress. You seem to be assuming that one wakeup source is somehow the "correct" one, and the other ways that the process could be woken up are a "rescue wakeup source", and you seem to believe that relying on a "rescue wakeup source" is bad. But in the case of a process which has called prepare-to-wait on more than one wait queue, how is DEPT going to distinguish between your "morally correct" wakeup source and the "rescue wakeup source"?

> No doubt. I already think so. But it doesn't mean that I have to keep
> quiet without discussing how to improve Dept. I will keep improving
> Dept in a reasonable way.

Well, I don't want to be in a position of having to prove that every single DEPT report in my code that doesn't make sense to me is nonsense, or else DEPT will get merged.

So maybe we need to reverse the burden of proof. Instead of just sending a DEPT report, and then asking the maintainers to explain why it is a false positive, how about if *you* use the DEPT report to examine the subsystem code, and then explain, in plain English, how you think this could trigger in real life or cause a performance problem in real life, or perhaps provide a script or C reproducer that triggers the supposed deadlock?

Yes, that means you will need to understand the code in question, but hopefully the DEPT reports should be clear enough that someone who isn't a deep expert in the code should be able to spot the bug. If not, and if only a few deep experts of the code in question will be able to decipher the DEPT report and figure out a fix, that's really not ideal.

If DEPT can find a real bug and you can show that Lockdep wouldn't have been able to find it, then that would be proof that it is making a real contribution. That would be a real benefit. At the same time, DEPT will hopefully be able to demonstrate a false positive rate which is low enough that the benefits clearly outweigh the costs.

At the moment, I believe the scoreboard for DEPT with respect to ext4 is zero real bugs found, and two false positives, both of which have required significant rounds of e-mail before the subsystem maintainers were able to prove to you that it was, indeed, DEPT reporting a false positive. Do you now understand why I am so concerned that you aren't putting an RFC or NOT YET READY in the subject line?

					- Ted

P.S. If DEPT had a CONFIG_EXPERIMENTAL, with a disclaimer in the KConfig that some of its reports might be false positives, that might be another way of easing my fears about this getting used by Syzkaller and generating a lot of burdensome triage work for the maintainers.
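Ted's point above - that a sleeping task can legitimately have several simultaneous wakeup sources - is easy to make concrete. The sketch below puts one task on two wait queues with a timeout armed, so three independent events can end the same sleep; wq_a, wq_b, cond_a and cond_b are invented names for illustration:

	#include <linux/wait.h>
	#include <linux/sched.h>
	#include <linux/jiffies.h>

	/* Stand-in condition checks; real code would test whatever state
	 * the two wait queues signal. */
	static bool cond_a(void);
	static bool cond_b(void);

	static void wait_on_two_sources(wait_queue_head_t *wq_a,
					wait_queue_head_t *wq_b)
	{
		DEFINE_WAIT(wait_a);
		DEFINE_WAIT(wait_b);

		/* Subscribe to both queues before checking the conditions. */
		prepare_to_wait(wq_a, &wait_a, TASK_INTERRUPTIBLE);
		prepare_to_wait(wq_b, &wait_b, TASK_INTERRUPTIBLE);

		if (!cond_a() && !cond_b())
			schedule_timeout(HZ);	/* the timer is a third source */

		/* A wake_up() on either queue, a signal, or the timer gets
		 * us here; treating any single edge as "the" dependency
		 * would miss the other guarantees of forward progress. */
		finish_wait(wq_a, &wait_a);
		finish_wait(wq_b, &wait_b);
	}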
Ted wrote: > On Sat, Mar 05, 2022 at 11:55:34PM +0900, Byungchul Park wrote: > > > that is why some of the DEPT reports were completely incomprehensible > > > > It's because you are blinded to blame at it without understanding how > > Dept works at all. I will fix those that must be fixed. Don't worry. > > Users of DEPT must not have to understand how DEPT works in order to Users must not have to understand how Dept works for sure, and haters must not blame things based on what they guess wrong. > understand and use DEPT reports. If you think I don't understand how > DEPT work, I'm going to gently suggest that this means DEPT reports > are clear enough, and/or DEPT documentation needs to be > *substantially* improved, or both --- and these needs to happen before > DEPT is ready to be merged. Okay. > > > So if DEPT is issuing lots of reports about apparently circular > > > dependencies, please try to be open to the thought that the fault is > > > > No one was convinced that Dept doesn't have a fault. I think your > > worries are too much. > > In that case, may I ask that you add back a RFC to the subject prefix > (e.g., [PATCH RFC -v5]?) Or maybe even add the subject prefix NOT YET I will. > READY? I have seen cases when after a patch series get to PATCH -v22, > and then people assume that it *must* be ready, as opposed what it > really means, which is "the author is just persistently reposting and > rebasing the patch series over and over again". It would be helpful > if you directly acknowledge, in each patch submission, that it is not > yet ready for prime time. > > After all, right now, DEPT has generated two reports in ext4, both of > which were false positives, and both of which have required a lot of > maintainer times to prove to you that they were in fact false > positives. So are we all agreed that DEPT is not ready for prime > time? Yes. > > No one argued that their code must be buggy, either. So I don't think > > you have to worry about what's never happened. > > Well, you kept on insisting that ext4 must have a circular dependency, > and that depending on a "rescue wakeup" is bad programming practice, > but you'll reluctantly agree to make DEPT accept "rescue wakeups" if > that is the will of the developers. My concern here is the > fundmaental concept of "rescue wakeups" is wrong; I don't see how you > can distinguish between a valid wakeup and one that you and DEPT is > going to somehow characterize as dodgy. Your concern on it makes sense. I need to explain how I think about it more, but not now cuz I guess the other folks alrealy got tired enough. Let's talk about it later when needed again. > Consider: a process can first subscribe to multiple wait queues, and > arrange to be woken up by a timeout, and then call schedule() to go to > sleep. So it is not waiting on a single wait channel, but potentially > *multiple* wakeup sources. If you are going to prove that kernel is > not going to make forward progress, you need to prove that *all* ways > that process might not wake up aren't going to happen for some reason. > > Just because one wakeup source seems to form a circular dependency > proves nothing, since another wakeup source might be the designed and > architected way that code makes forward progress. I also think it's legal if the design is intended. But it's not if not. This is what I meant. Again, it's not about ext4. 
> You seem to be assuming that one wakeup source is somehow the
> "correct" one, and the other ways that the process could be woken up
> are a "rescue wakeup source", and you seem to believe that relying on
> a "rescue wakeup source" is bad. But in the case of a process which

It depends on whether or not it's intended, IMHO.

> has called prepare-to-wait on more than one wait queue, how is DEPT
> going to distinguish between your "morally correct" wakeup source and
> the "rescue wakeup source"?

Sure, it should be done manually. I should do it on my own when that
kind of issue arises. Again, ext4 is not such a case because, based on
what Jan explained, there's no real circular dependency wrt the commit
wq, the done wq and so on.

> > No doubt. I already think so. But that doesn't mean I have to keep
> > quiet rather than discussing how to improve Dept. I will keep
> > improving Dept in a reasonable way.
>
> Well, I don't want to be in a position of having to prove that every
> single DEPT report in my code that doesn't make sense to me is
> nonsense, or else DEPT will get merged.
>
> So maybe we need to reverse the burden of proof.

I will keep that in mind.

> Instead of just sending a DEPT report, and then asking the maintainers
> to explain why it is a false positive, how about if *you* use the DEPT
> report to examine the subsystem code, and then explain, in plain
> English, how you think this could trigger in real life, or cause a
> performance problem in real life, or perhaps provide a script or C
> reproducer that triggers the supposed deadlock?

Makes sense. Let me try.

> Yes, that means you will need to understand the code in question, but
> hopefully the DEPT reports should be clear enough that someone who
> isn't a deep expert in the code should be able to spot the bug. If
> not, and if only a few deep experts in the code in question will be
> able to decipher the DEPT report and figure out a fix, that's really
> not ideal.

Agreed. Just FYI, I've never blamed you for not being an expert on
Dept.

> If DEPT can find a real bug and you can show that Lockdep wouldn't
> have been able to find it, then that would be proof that it is making
> a real contribution. That would be a real benefit. At the same time,
> DEPT will hopefully be able to demonstrate a false positive rate which
> is low enough that the benefits clearly outweigh the costs.
>
> At the moment, I believe the scoreboard for DEPT with respect to ext4
> is zero real bugs found, and two false positives, both of which have
> required significant rounds of e-mail before the subsystem maintainers
> were able to prove to you that it was, indeed, DEPT reporting a false
> positive.

Right. But we've barely talked in a productive way. We've talked about
other things far more than about proving what you're mentioning.

> Do you now understand why I am so concerned that you aren't putting an
> RFC or NOT YET READY in the subject line?

Yes. I will.

> 					- Ted
>
> P.S. If DEPT had a CONFIG_EXPERIMENTAL, with a disclaimer in the
> Kconfig that some of its reports might be false positives, that might
> be another way of easing my fears that it will get used by Syzkaller
> and generate a lot of burdensome triage work for the maintainers.

Thank you for the straightforward explanation this time,

	Byungchul
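To illustrate the multi-source wait pattern being debated above, here
is a minimal sketch using the stock kernel wait-queue API. The queue
names and the cond_met() predicate are hypothetical, invented only for
illustration; the wait primitives themselves are the real ones:

    #include <linux/wait.h>
    #include <linux/sched.h>
    #include <linux/jiffies.h>

    /* Hypothetical wait queues and condition, for illustration only. */
    static DECLARE_WAIT_QUEUE_HEAD(wq_primary);
    static DECLARE_WAIT_QUEUE_HEAD(wq_rescue);

    static bool cond_met(void)
    {
            return false;   /* placeholder predicate */
    }

    /* Simplified: a real caller would loop until cond_met() holds. */
    static void wait_with_multiple_sources(void)
    {
            DEFINE_WAIT(w1);
            DEFINE_WAIT(w2);

            /* Subscribe to two wait queues; a wake_up() on either
             * queue resumes this task.
             */
            prepare_to_wait(&wq_primary, &w1, TASK_UNINTERRUPTIBLE);
            prepare_to_wait(&wq_rescue, &w2, TASK_UNINTERRUPTIBLE);

            /* The timeout is a third, independent wakeup source. */
            if (!cond_met())
                    schedule_timeout(HZ);

            finish_wait(&wq_primary, &w1);
            finish_wait(&wq_rescue, &w2);
    }

An analysis that reasons only about the wq_primary dependency could
conclude the sleeper is stuck, even though wq_rescue or the timeout
may be the architected path to forward progress - which is exactly the
ambiguity the thread is arguing over.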
On Sun, Mar 06, 2022 at 07:51:42PM +0900, Byungchul Park wrote:
> > > Users of DEPT must not have to understand how DEPT works in order to
> >
> Users must not have to understand how Dept works, for sure, and haters
> must not blame things based on what they guess wrong.

For the record, I don't hate DEPT. I *fear* that DEPT will result in
my getting spammed with a huge number of false positives once it gets
merged and automated testing systems like Syzkaller, the zero-day test
robot, etc., get hold of it and start generating hundreds of automated
reports.

And when I tried to read the DEPT reports and the DEPT documentation,
I found that its explanation for why ext4 had a circular dependency
simply did not make sense. If my struggles to understand why DEPT was
issuing a false positive are "guessing", then how do we have
discussions over how to make DEPT better?

> > called prepare-to-wait on more than one wait queue, how is DEPT going
> > to distinguish between your "morally correct" wakeup source and the
> > "rescue wakeup source"?
>
> Sure, it should be done manually. I should do it on my own when that
> kind of issue arises.

The question here is how often will it need to be done, and how easy
will it be to "do it manually"? Suppose we mark all of the DEPT false
positives before it gets merged; how easy will it be to suppress
future false positives as the kernel evolves?

Perhaps one method is to have a way to take a particular wait queue,
or call to schedule(), or even an entire kernel source file, and opt
it out from DEPT analysis? That way, if DEPT gets merged, and a
maintainer starts getting spammed by bogus (or incomprehensible)
reports, there is a simple way they can annotate their source code to
prevent DEPT from analyzing code that it is apparently not able to
understand correctly.

That way we don't necessarily need to have a debate over how close to
zero percent false positives is necessary before DEPT can get merged.
And we avoid needing to force maintainers to prove that a DEPT report
is a false positive, which is from my experience hard to do, since
they get accused of being DEPT haters and of not understanding DEPT.

					- Ted
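For what it's worth, a sketch of what such an opt-out could look like.
None of these annotations exist in the posted series; every name below
is purely hypothetical, shown only to make the proposal concrete.
Lockdep's existing lockdep_set_novalidate_class() is precedent for the
per-object form:

    /* All names here are hypothetical -- not part of the posted DEPT
     * series -- shown only to make the opt-out proposal concrete.
     */

    /* Per-object: exclude one wait channel from DEPT analysis. */
    dept_skip(&my_waitqueue.dmap);          /* hypothetical API and field */

    /* Per-call-site: mark one sleep as exempt from analysis. */
    dept_exempt_begin();                    /* hypothetical */
    schedule();
    dept_exempt_end();                      /* hypothetical */

    /*
     * Per-file: mirror how KASAN/KCSAN are switched off per object
     * file in a Makefile, e.g.:
     *
     *      DEPT_SANITIZE_foo.o := n        (hypothetical knob)
     */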
On Sat, Mar 05, 2022 at 03:05:23PM +0000, Joel Fernandes wrote:
> On Sat, Mar 05, 2022 at 11:15:38PM +0900, Byungchul Park wrote:
> > Almost everything you've been blaming Dept for is total nonsense.
> > Based on what you're saying, I'm convinced that you don't understand
> > how Dept works even 1%. You don't even try to understand it before
> > blaming it.
> >
> > You don't have to understand and support it. But I can't respond to
> > you if you keep saying silly things that way.
>
> Byungchul, other than ext4 have there been any DEPT reports that other
> subsystem maintainers agree were valid use cases?

Not yet.

> Regarding false positives, just to note, lockdep is not without its
> share of false positives. Just that (as you know) the signal-to-noise
> ratio should be high for it to be useful. I've put up with lockdep's
> false positives just because it occasionally saves me from
> catastrophe.

I love your insight. Agreed. A tool is useful only when it's
*actually* helpful. I hope Dept will be so.

> > > In any case, if DEPT is going to report these "circular dependencies
> > > as bugs that MUST be fixed", it's going to be pure noise and I will
> > > ignore all DEPT reports, and will push back on having Lockdep replaced
> >
> > Dept is going to be improved so that what you are concerned about
> > won't be reported.
>
> Yeah, I am looking forward to learning more about it. However, I was
> wondering about the following: lockdep can already be used for
> modeling "resource acquire/release" and "resource wait" semantics that
> are unrelated to locks, like we do in mm reclaim. I am wondering why
> we cannot just use those existing lockdep mechanisms for the wait/wake
> use cases (assuming that we can agree

1. Lockdep basically can't work with general waits/events happening
   across contexts. To get over this, manual tagging of
   acquire/release can be used at each section that we suspect (see
   the sketch after this list). But unfortunately, we cannot use that
   method if we cannot easily identify the sections. Furthermore, it's
   inevitable that we miss sections that shouldn't get missed.

2. Some cases should be correctly tracked via a wait/event model, not
   an acquisition-order model. For example, a read-lock on an rwlock
   should be defined as a waiter waiting for write-unlock, and a
   write-lock on an rwlock as a waiter waiting for either read-unlock
   or write-unlock. If we instead try to track those cases using
   acquisition order, it cannot completely work. Don't you think that
   looks weird?

3. Tracking what we didn't track before means both stronger detection
   and the emergence of new false positives, exactly as with Lockdep
   at its beginning, when it started to track what we hadn't tracked
   before. Even though that emergence was tolerated at the time, now
   that Lockdep has become stable enough, folks will be stricter about
   new emergences. It gets even worse if valid reports are being
   prevented by false positives. For that reason, a multi-reporting
   capability is essential. I was thinking of improving Lockdep to
   allow multi-reporting, but that might require changing more than
   developing a new tool from scratch, and it might be even harder
   because Lockdep already works reasonably well. So even for
   Lockdep's sake, I thought the new thing should be developed
   independently, leaving Lockdep as it is.

4. (Minor reason.) The concept and naming of acquisition and release
   are not meant for general wait/event, and neither are the design
   and implementation. I wanted to address the issue as soon as
   possible, before we squeeze Lockdep even further for general
   wait/event use and the kernel code gets weird.
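For reference, the existing mechanism point 1 alludes to is lockdep's
acquire/release annotations applied to a non-lock resource, as mm's
fs_reclaim tracking does. A minimal sketch following that pattern; the
map and function names here are illustrative:

    #include <linux/lockdep.h>

    /* A lockdep map standing in for a resource that is not a lock;
     * the name is illustrative. The pattern follows __fs_reclaim_map
     * in mm/page_alloc.c.
     */
    static struct lockdep_map my_resource_map =
            STATIC_LOCKDEP_MAP_INIT("my_resource", &my_resource_map);

    static void my_resource_wait_begin(void)
    {
            /* Pretend the resource was "acquired" so that lockdep
             * orders it against real locks taken inside the section.
             */
            lock_map_acquire(&my_resource_map);
    }

    static void my_resource_wait_end(void)
    {
            lock_map_release(&my_resource_map);
    }

Note that this only works when the begin/end of the suspect section
can be identified by hand - exactly the limitation point 1 describes;
waits and wakeups that cross contexts have no bracketed section to tag.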
Of course, that doesn't mean Dept is more stable than Lockdep. But I
can tell that Dept does what a dependency tool should do, and we need
to make the code go right.

> that circular dependencies related to wait/wake are a bad thing). Or
> perhaps there's a reason why Peter Zijlstra did not use lockdep for
> wait/wake dependencies (such as multiple wake sources), considering
> he wrote a lot of that code.
>
> Keep kicking ass brother, you're doing great.

Thank you! I'll see this through properly so as not to disappoint you!

	Thanks,
	Byungchul
On Sun, Mar 06, 2022 at 09:19:10AM -0500, Theodore Ts'o wrote:
> On Sun, Mar 06, 2022 at 07:51:42PM +0900, Byungchul Park wrote:
> > > > Users of DEPT must not have to understand how DEPT works in order to
> > >
> > Users must not have to understand how Dept works, for sure, and
> > haters must not blame things based on what they guess wrong.
>
> For the record, I don't hate DEPT. I *fear* that DEPT will result in
> my getting spammed with a huge number of false positives once it gets
> merged and automated testing systems like Syzkaller, the zero-day
> test robot, etc., get hold of it and start generating hundreds of
> automated reports.

Agreed. Dept should not be part of an *automated testing system* until
it finally works as well as Lockdep in terms of false positives.
However, it's impossible to achieve that out of tree.

Besides automated testing systems, Dept works great in the middle of
developing something that is complicated in terms of synchronization.
Developers wouldn't have to worry anymore about real reports - ones
that should be reported - getting drowned out by a false positive.

I will explicitly mark it EXPERIMENTAL and state "Dept might
false-alarm" in the Kconfig until it's considered a rarely
false-alarming tool.

> > Sure, it should be done manually. I should do it on my own when that
> > kind of issue arises.
>
> The question here is how often will it need to be done, and how easy

I guess it's going to happen rarely. I want to see, too.

> will it be to "do it manually"? Suppose we mark all of the DEPT false

Very easy - equal to or easier than the way we do it for lockdep. But
the objects of interest would be wait/event objects rather than locks.

> positives before it gets merged; how easy will it be to suppress
> future false positives as the kernel evolves?

The same as - or even better than - what we do for Lockdep. And we'd
better consider those activities a form of code documentation. Not
only making things just work, but also organizing the code and
documenting it in the code, is very meaningful.

> Perhaps one method is to have a way to take a particular wait queue,
> or call to schedule(), or even an entire kernel source file, and opt
> it out from DEPT analysis? That way, if DEPT gets merged, and a
> maintainer starts getting spammed by bogus (or

That could be done once Dept gets stable. Hopefully, now that Dept
works very conservatively, there might not be as many false positives
as you're concerned about. The situation is under control.

> That way we don't necessarily need to have a debate over how close to
> zero percent false positives is necessary before DEPT can get merged.

Nonsense. I would agree with you if it had been that way when Lockdep
was merged. But I'll try to achieve almost zero false positives -
again, that's impossible to do out of tree.

> And we avoid needing to force maintainers to prove that a DEPT report

So... it'd be okay if Dept goes in without being part of an automated
testing system. Right?

> is a false positive, which is from my experience hard to do, since
> they get accused of being DEPT haters and of not understanding DEPT.

Honestly, it's not a problem of them not understanding domains other
than what they are familiar with, but another issue. I won't mention
it.

And it sounds like you'd do nothing unless something turns out to be
100% problematic. That's definitely the *easiest* way to maintain
something, because it's the same as not checking its correctness at
all. Even if it's very hard to do, repeatedly checking whether the
code really is safe is what surely should be done.
Again, I understand it would be freaking hard. But that doesn't mean
we should avoid it.

Here, there seem to be two points you'd like to make:

1. A fundamental question: does Dept track waits and events correctly?
2. Even if so, can Dept handle all the subtle things in the kernel?

For 1, I'm willing to respond to whatever it is. And not only me - we
can make it work perfectly if the concept and direction are *right*.

For 2, I need to ask questions and try my best to fix whatever issues
exist.

Again: Dept won't be part of an *automated testing system* until it
finally works as well as Lockdep in terms of false positives.
Hopefully you are okay with that.

---
Byungchul
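As a point of comparison for "the way we do it for lockdep" mentioned
above, these are the real mechanisms used today to quiet a lockdep
false positive; the surrounding structure and variable names are
illustrative, the annotations themselves are lockdep's actual API:

    #include <linux/lockdep.h>
    #include <linux/rwsem.h>

    /* Illustrative key name; one static key per distinct class. */
    static struct lock_class_key my_key;

    static void reclassify(struct rw_semaphore *sem)
    {
            /* 1) Move the lock into its own class so it no longer
             *    shares a (falsely) conflicting class with another
             *    lock.
             */
            lockdep_set_class(sem, &my_key);
    }

    static void nested(struct rw_semaphore *parent,
                       struct rw_semaphore *child)
    {
            /* 2) Annotate a legitimate nested acquisition of two
             *    locks of the same class, e.g. parent/child objects.
             */
            down_write(parent);
            down_write_nested(child, SINGLE_DEPTH_NESTING);
            up_write(child);
            up_write(parent);
    }

A last resort also exists: lockdep_set_novalidate_class() opts a lock
out of validation entirely. The open question in this thread is
whether Dept's wait/event objects will admit an equally small
annotation surface.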