Message ID | 20140703081318.GB20612@localhost.localdomain (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On 07/03/2014 04:13 PM, Liu Bo wrote: > On Wed, Jul 02, 2014 at 01:41:52PM -0700, Marc MERLIN wrote: >> This got triggered by an rsync I think. I'm not sure which of my btrfs FS >> has the issue yet since BUG_ON isn't very helpful as discussed earlier. >> >> [160562.925463] parent transid verify failed on 2776298520576 wanted 41015 found 18120 >> [160562.950297] ------------[ cut here ]------------ >> [160562.965904] kernel BUG at fs/btrfs/locking.c:269! >> >> But shouldn't messages like 'parent transid verify failed' print which device >> this happened on to give the operator a hint on where the problem is? >> >> Could someone do a pass at those and make sure they all print the device >> ID/name? >> >> Bug below: >> >> Full log before the crash: >> INFO: task btrfs-transacti:3358 blocked for more than 120 seconds. >> Not tainted 3.15.1-amd64-i915-preempt-20140216jbp #1 >> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. >> btrfs-transacti D 0000000000000000 0 3358 2 0x00000000 >> ffff8800c50ebc50 0000000000000046 ffff8800c50ebc20 ffff8800c50ebfd8 >> ffff8800c6914390 00000000000141c0 7fffffffffffffff ffff8801433b8f10 >> 0000000000000002 ffffffff8161c9b0 7fffffffffffffff ffff8800c50ebc60 >> Call Trace: >> [<ffffffff8161c9b0>] ? sock_rps_reset_flow+0x32/0x32 >> [<ffffffff8161d3c6>] schedule+0x73/0x75 >> [<ffffffff8161c9e9>] schedule_timeout+0x39/0x129 >> [<ffffffff8107653d>] ? get_parent_ip+0xd/0x3c >> [<ffffffff8162338f>] ? preempt_count_add+0x7a/0x8d >> [<ffffffff8161dbac>] __wait_for_common+0x11a/0x159 >> [<ffffffff8107810f>] ? wake_up_state+0x12/0x12 >> [<ffffffff8161dc0f>] wait_for_completion+0x24/0x26 >> [<ffffffff81237ce6>] btrfs_wait_and_free_delalloc_work+0x16/0x28 >> [<ffffffff8123fd3a>] btrfs_run_ordered_operations+0x1e7/0x21e >> [<ffffffff81229aa4>] btrfs_flush_all_pending_stuffs+0x4e/0x55 >> [<ffffffff8122b25a>] btrfs_commit_transaction+0x20d/0x8b0 >> [<ffffffff81227b41>] transaction_kthread+0xf8/0x1ab >> [<ffffffff81227a49>] ? btrfs_cleanup_transaction+0x44c/0x44c >> [<ffffffff8106b4b4>] kthread+0xae/0xb6 >> [<ffffffff8106b406>] ? __kthread_parkme+0x61/0x61 >> [<ffffffff8162667c>] ret_from_fork+0x7c/0xb0 >> [<ffffffff8106b406>] ? __kthread_parkme+0x61/0x61 >> INFO: task kworker/u8:13:13157 blocked for more than 120 seconds. >> Not tainted 3.15.1-amd64-i915-preempt-20140216jbp #1 >> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. >> kworker/u8:13 D 0000000000000000 0 13157 2 0x00000080 >> Workqueue: btrfs-flush_delalloc normal_work_helper >> ffff8800041cfc00 0000000000000046 ffff8800041cfbd0 ffff8800041cffd8 >> ffff8800034f40d0 00000000000141c0 ffff88021f2941c0 ffff8800034f40d0 >> ffff8800041cfca0 ffffffff810fdc2f 0000000000000002 ffff8800041cfc10 >> Call Trace: >> [<ffffffff810fdc2f>] ? wait_on_page_read+0x3c/0x3c >> [<ffffffff8161d3c6>] schedule+0x73/0x75 >> [<ffffffff8161d56b>] io_schedule+0x60/0x7a >> [<ffffffff810fdc3d>] sleep_on_page+0xe/0x12 >> [<ffffffff8161d7fd>] __wait_on_bit+0x48/0x7a >> [<ffffffff810fdbdd>] wait_on_page_bit+0x7a/0x7c >> [<ffffffff81084821>] ? autoremove_wake_function+0x34/0x34 >> [<ffffffff810fef03>] filemap_fdatawait_range+0x7e/0x126 >> [<ffffffff8122f1cf>] ? btrfs_submit_direct+0x3f4/0x3f4 >> [<ffffffff8122d7aa>] ? btrfs_writepages+0x28/0x2a >> [<ffffffff811084c6>] ? do_writepages+0x1e/0x2c >> [<ffffffff810ff38e>] ? __filemap_fdatawrite_range+0x55/0x57 >> [<ffffffff8124006f>] btrfs_wait_ordered_range+0x6a/0x11a >> [<ffffffff8122fe01>] btrfs_run_delalloc_work+0x27/0x69 >> [<ffffffff812508db>] normal_work_helper+0xfe/0x240 >> [<ffffffff81065d7e>] process_one_work+0x195/0x2d2 >> [<ffffffff81066020>] worker_thread+0x136/0x205 >> [<ffffffff81065eea>] ? process_scheduled_works+0x2f/0x2f >> [<ffffffff8106b4b4>] kthread+0xae/0xb6 >> [<ffffffff8106b406>] ? __kthread_parkme+0x61/0x61 >> [<ffffffff8162667c>] ret_from_fork+0x7c/0xb0 >> [<ffffffff8106b406>] ? __kthread_parkme+0x61/0x61 >> INFO: task btrfs-transacti:3358 blocked for more than 120 seconds. >> Not tainted 3.15.1-amd64-i915-preempt-20140216jbp #1 >> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. >> btrfs-transacti D 0000000000000000 0 3358 2 0x00000000 >> ffff8800c50ebc50 0000000000000046 ffff8800c50ebc20 ffff8800c50ebfd8 >> ffff8800c6914390 00000000000141c0 7fffffffffffffff ffff8801433b8f10 >> 0000000000000002 ffffffff8161c9b0 7fffffffffffffff ffff8800c50ebc60 >> Call Trace: >> [<ffffffff8161c9b0>] ? sock_rps_reset_flow+0x32/0x32 >> [<ffffffff8161d3c6>] schedule+0x73/0x75 >> [<ffffffff8161c9e9>] schedule_timeout+0x39/0x129 >> [<ffffffff8107653d>] ? get_parent_ip+0xd/0x3c >> [<ffffffff8162338f>] ? preempt_count_add+0x7a/0x8d >> [<ffffffff8161dbac>] __wait_for_common+0x11a/0x159 >> [<ffffffff8107810f>] ? wake_up_state+0x12/0x12 >> [<ffffffff8161dc0f>] wait_for_completion+0x24/0x26 >> [<ffffffff81237ce6>] btrfs_wait_and_free_delalloc_work+0x16/0x28 >> [<ffffffff8123fd3a>] btrfs_run_ordered_operations+0x1e7/0x21e >> [<ffffffff81229aa4>] btrfs_flush_all_pending_stuffs+0x4e/0x55 >> [<ffffffff8122b25a>] btrfs_commit_transaction+0x20d/0x8b0 >> [<ffffffff81227b41>] transaction_kthread+0xf8/0x1ab >> [<ffffffff81227a49>] ? btrfs_cleanup_transaction+0x44c/0x44c >> [<ffffffff8106b4b4>] kthread+0xae/0xb6 >> [<ffffffff8106b406>] ? __kthread_parkme+0x61/0x61 >> [<ffffffff8162667c>] ret_from_fork+0x7c/0xb0 >> [<ffffffff8106b406>] ? __kthread_parkme+0x61/0x61 >> INFO: task kworker/u8:13:13157 blocked for more than 120 seconds. >> Not tainted 3.15.1-amd64-i915-preempt-20140216jbp #1 >> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. >> kworker/u8:13 D 0000000000000000 0 13157 2 0x00000080 >> Workqueue: btrfs-flush_delalloc normal_work_helper >> ffff8800041cfc00 0000000000000046 ffff8800041cfbd0 ffff8800041cffd8 >> ffff8800034f40d0 00000000000141c0 ffff88021f2941c0 ffff8800034f40d0 >> ffff8800041cfca0 ffffffff810fdc2f 0000000000000002 ffff8800041cfc10 >> Call Trace: >> [<ffffffff810fdc2f>] ? wait_on_page_read+0x3c/0x3c >> [<ffffffff8161d3c6>] schedule+0x73/0x75 >> [<ffffffff8161d56b>] io_schedule+0x60/0x7a >> [<ffffffff810fdc3d>] sleep_on_page+0xe/0x12 >> [<ffffffff8161d7fd>] __wait_on_bit+0x48/0x7a >> [<ffffffff810fdbdd>] wait_on_page_bit+0x7a/0x7c >> [<ffffffff81084821>] ? autoremove_wake_function+0x34/0x34 >> [<ffffffff810fef03>] filemap_fdatawait_range+0x7e/0x126 >> [<ffffffff8122f1cf>] ? btrfs_submit_direct+0x3f4/0x3f4 >> [<ffffffff8122d7aa>] ? btrfs_writepages+0x28/0x2a >> [<ffffffff811084c6>] ? do_writepages+0x1e/0x2c >> [<ffffffff810ff38e>] ? __filemap_fdatawrite_range+0x55/0x57 >> [<ffffffff8124006f>] btrfs_wait_ordered_range+0x6a/0x11a >> [<ffffffff8122fe01>] btrfs_run_delalloc_work+0x27/0x69 >> [<ffffffff812508db>] normal_work_helper+0xfe/0x240 >> [<ffffffff81065d7e>] process_one_work+0x195/0x2d2 >> [<ffffffff81066020>] worker_thread+0x136/0x205 >> [<ffffffff81065eea>] ? process_scheduled_works+0x2f/0x2f >> [<ffffffff8106b4b4>] kthread+0xae/0xb6 >> [<ffffffff8106b406>] ? __kthread_parkme+0x61/0x61 >> [<ffffffff8162667c>] ret_from_fork+0x7c/0xb0 >> [<ffffffff8106b406>] ? __kthread_parkme+0x61/0x61 >> parent transid verify failed on 2776298520576 wanted 41015 found 18120 >> ------------[ cut here ]------------ >> kernel BUG at fs/btrfs/locking.c:269! >> invalid opcode: 0000 [#1] PREEMPT SMP >> Modules linked in: udp_diag tcp_diag inet_diag ip6table_filter ip6_tables ebtable_nat ebtables tun ppdev lp autofs4 binfmt_misc kl5kusb105 deflate ctr twofish_avx_x86_64 twofish_x86_64_3way twofish_x86_64 twofish_generic twofish_common camellia_x86_64 camellia_generic serpent_avx_x86_64 serpent_sse2_x86_64 glue_helper lrw serpent_generic blowfish_x86_64 blowfish_generic blowfish_common cast5_avx_x86_64 ablk_helper cast5_generic cast_common des_generic cmac xcbc rmd160 sha512_ssse3 sha512_generic ftdi_sio crypto_null keyspan af_key xfrm_algo dm_mirror dm_region_hash dm_log nfsd nfs_acl auth_rpcgss nfs fscache lockd sunrpc ipt_REJECT xt_conntrack xt_nat xt_tcpudp xt_LOG iptable_mangle iptable_filter aes_x86_64 lm85 hwmon_vid dm_snapshot dm_bufio iptable_nat ip_tables nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_conntrack_ftp ipt_MASQUERADE nf_nat x_tables nf_conntrack sg st snd_pcm_oss snd_mixer_oss fuse eeepc_wmi snd_hda_codec_realtek snd_hda_codec_generic asus_wmi micro ! > co! >> de snd_cmipci gameport sparse_keymap kvm_intel snd_hda_intel snd_opl3_lib rfkill kvm snd_mpu401_uart snd_seq_midi tpm_infineon snd_seq_midi_event snd_hda_controller snd_hda_codec rc_ati_x10 snd_hwdep snd_pcm snd_seq snd_rawmidi ati_remote asix battery coretemp libphy processor snd_seq_device snd_timer wmi usbnet evdev rc_core tpm_tis intel_rapl parport_pc snd parport tpm pcspkr lpc_ich i2c_i801 pl2303 xhci_hcd intel_powerclamp ezusb soundcore x86_pkg_temp_thermal usbserial xts gf128mul dm_crypt dm_mod raid456 async_raid6_recov async_pq async_xor async_memcpy async_tx e1000e ptp pps_core crc32c_intel ghash_clmulni_intel sata_sil24 cryptd crct10dif_pclmul r8169 ehci_pci thermal ehci_hcd crc32_pclmul fan mii sata_mv usbcore usb_common [last unloaded: kl5kusb105] >> CPU: 0 PID: 25515 Comm: kworker/u8:2 Not tainted 3.15.1-amd64-i915-preempt-20140216jbp #1 >> Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3806 08/20/2012 >> Workqueue: btrfs-endio-write normal_work_helper >> task: ffff880019d58610 ti: ffff880004c24000 task.ti: ffff880004c24000 >> RIP: 0010:[<ffffffff8125751c>] [<ffffffff8125751c>] btrfs_assert_tree_read_locked+0x10/0x14 >> RSP: 0018:ffff880004c27848 EFLAGS: 00010246 >> RAX: 0000000000000000 RBX: ffff8800353cadd0 RCX: 0000000000000000 >> RDX: 000000000000001e RSI: ffff8801d6e46e10 RDI: ffff8800353cadd0 >> RBP: ffff880004c27848 R08: 0000000000000000 R09: ffff880041c943c8 >> R10: ffffffffffffffff R11: 0000000000000000 R12: 000000000000a037 >> R13: 0000000000000001 R14: ffff88020f032c00 R15: ffff8801957d201f >> FS: 0000000000000000(0000) GS:ffff88021f200000(0000) knlGS:0000000000000000 >> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >> CR2: 00000000f777c064 CR3: 0000000045787000 CR4: 00000000000407f0 >> Stack: >> ffff880004c27868 ffffffff812578bd ffff8800353cadd0 000000000000a037 >> ffff880004c278b0 ffffffff81224799 0000000000000000 0000000000000000 >> 0000000000000000 0000000000000000 0000000000000000 ffff8802121b2800 >> Call Trace: >> [<ffffffff812578bd>] btrfs_tree_read_unlock_blocking+0x5c/0xb7 >> [<ffffffff81224799>] verify_parent_transid+0x138/0x150 >> [<ffffffff81224867>] btree_read_extent_buffer_pages.constprop.45+0x7f/0x100 >> [<ffffffff8122722a>] btrfs_read_buffer+0x2c/0x2e >> [<ffffffff8120e7cf>] read_block_for_search.isra.26+0xae/0x287 >> [<ffffffff8121078d>] btrfs_search_old_slot+0x3e6/0x6bc >> [<ffffffff812454f9>] ? release_extent_buffer+0xad/0xb8 >> [<ffffffff812747cd>] __resolve_indirect_refs+0x16c/0x50d >> [<ffffffff81275a48>] find_parent_nodes+0x8e6/0x90a >> [<ffffffff81275aee>] __btrfs_find_all_roots+0x82/0xe0 >> [<ffffffff81275bb2>] btrfs_find_all_roots+0x4a/0x6a >> [<ffffffff81278c4a>] btrfs_qgroup_account_ref+0x10c/0x403 >> [<ffffffff8113fb2c>] ? __cache_free.isra.40+0x1b4/0x1c3 >> [<ffffffff81217de9>] btrfs_delayed_refs_qgroup_accounting+0xb5/0xe8 >> [<ffffffff8122bf37>] __btrfs_end_transaction+0x60/0x2cd >> [<ffffffff8122c1b4>] btrfs_end_transaction+0x10/0x12 >> [<ffffffff812327fd>] btrfs_finish_ordered_io+0x307/0x3bd >> [<ffffffff81232ab2>] finish_ordered_fn+0x15/0x17 >> [<ffffffff812508db>] normal_work_helper+0xfe/0x240 >> [<ffffffff81065d7e>] process_one_work+0x195/0x2d2 >> [<ffffffff81066020>] worker_thread+0x136/0x205 >> [<ffffffff81065eea>] ? process_scheduled_works+0x2f/0x2f >> [<ffffffff8106b4b4>] kthread+0xae/0xb6 >> [<ffffffff8106b406>] ? __kthread_parkme+0x61/0x61 >> [<ffffffff8162667c>] ret_from_fork+0x7c/0xb0 >> [<ffffffff8106b406>] ? __kthread_parkme+0x61/0x61 >> Code: 89 e5 f0 ff 0f 5d c3 55 48 89 e5 f0 ff 0f 74 04 31 c0 eb 05 b8 01 00 00 00 5d c3 66 66 66 66 90 8b 47 58 55 48 89 e5 85 c0 75 02 <0f> 0b 5d c3 66 66 66 66 90 55 48 89 e5 41 55 41 89 f5 41 54 53 >> RIP [<ffffffff8125751c>] btrfs_assert_tree_read_locked+0x10/0x14 >> RSP <ffff880004c27848> >> ---[ end trace b88f907c1447016f ]--- >> Kernel panic - not syncing: Fatal exception >> Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff) >> drm_kms_helper: panic occurred, switching back to text console >> Rebooting in 20 seconds.. >> ACPI MEMORY or I/O RESET_REG. >> > After looking at the stack, it seems that we lack a check for @need_lock since eb > is taking it when @need_lock is true, could you please try this? Josef also sent a patch to address this problem before. https://patchwork.kernel.org/patch/4424521/ > > thanks, > -liubo > > diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c > index 8bb4aa1..f00165d 100644 > --- a/fs/btrfs/disk-io.c > +++ b/fs/btrfs/disk-io.c > @@ -369,7 +369,8 @@ static int verify_parent_transid(struct extent_io_tree > *io_tree, > out: > unlock_extent_cached(io_tree, eb->start, eb->start + eb->len - 1, > &cached_state, GFP_NOFS); > - btrfs_tree_read_unlock_blocking(eb); > + if (need_lock) > + btrfs_tree_read_unlock_blocking(eb); > return ret; > } > > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > . > -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Jul 03, 2014 at 04:20:47PM +0800, Wang Shilong wrote: > On 07/03/2014 04:13 PM, Liu Bo wrote: > >On Wed, Jul 02, 2014 at 01:41:52PM -0700, Marc MERLIN wrote: > >>This got triggered by an rsync I think. I'm not sure which of my btrfs FS > >>has the issue yet since BUG_ON isn't very helpful as discussed earlier. > >> > >>[160562.925463] parent transid verify failed on 2776298520576 wanted 41015 found 18120 > >>[160562.950297] ------------[ cut here ]------------ > >>[160562.965904] kernel BUG at fs/btrfs/locking.c:269! > >> > >>But shouldn't messages like 'parent transid verify failed' print which device > >>this happened on to give the operator a hint on where the problem is? > >> > >>Could someone do a pass at those and make sure they all print the device > >>ID/name? > >> > >>Bug below: > >> > >>Full log before the crash: > >>INFO: task btrfs-transacti:3358 blocked for more than 120 seconds. > >> Not tainted 3.15.1-amd64-i915-preempt-20140216jbp #1 > >>"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > >>btrfs-transacti D 0000000000000000 0 3358 2 0x00000000 > >> ffff8800c50ebc50 0000000000000046 ffff8800c50ebc20 ffff8800c50ebfd8 > >> ffff8800c6914390 00000000000141c0 7fffffffffffffff ffff8801433b8f10 > >> 0000000000000002 ffffffff8161c9b0 7fffffffffffffff ffff8800c50ebc60 > >>Call Trace: > >> [<ffffffff8161c9b0>] ? sock_rps_reset_flow+0x32/0x32 > >> [<ffffffff8161d3c6>] schedule+0x73/0x75 > >> [<ffffffff8161c9e9>] schedule_timeout+0x39/0x129 > >> [<ffffffff8107653d>] ? get_parent_ip+0xd/0x3c > >> [<ffffffff8162338f>] ? preempt_count_add+0x7a/0x8d > >> [<ffffffff8161dbac>] __wait_for_common+0x11a/0x159 > >> [<ffffffff8107810f>] ? wake_up_state+0x12/0x12 > >> [<ffffffff8161dc0f>] wait_for_completion+0x24/0x26 > >> [<ffffffff81237ce6>] btrfs_wait_and_free_delalloc_work+0x16/0x28 > >> [<ffffffff8123fd3a>] btrfs_run_ordered_operations+0x1e7/0x21e > >> [<ffffffff81229aa4>] btrfs_flush_all_pending_stuffs+0x4e/0x55 > >> [<ffffffff8122b25a>] btrfs_commit_transaction+0x20d/0x8b0 > >> [<ffffffff81227b41>] transaction_kthread+0xf8/0x1ab > >> [<ffffffff81227a49>] ? btrfs_cleanup_transaction+0x44c/0x44c > >> [<ffffffff8106b4b4>] kthread+0xae/0xb6 > >> [<ffffffff8106b406>] ? __kthread_parkme+0x61/0x61 > >> [<ffffffff8162667c>] ret_from_fork+0x7c/0xb0 > >> [<ffffffff8106b406>] ? __kthread_parkme+0x61/0x61 > >>INFO: task kworker/u8:13:13157 blocked for more than 120 seconds. > >> Not tainted 3.15.1-amd64-i915-preempt-20140216jbp #1 > >>"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > >>kworker/u8:13 D 0000000000000000 0 13157 2 0x00000080 > >>Workqueue: btrfs-flush_delalloc normal_work_helper > >> ffff8800041cfc00 0000000000000046 ffff8800041cfbd0 ffff8800041cffd8 > >> ffff8800034f40d0 00000000000141c0 ffff88021f2941c0 ffff8800034f40d0 > >> ffff8800041cfca0 ffffffff810fdc2f 0000000000000002 ffff8800041cfc10 > >>Call Trace: > >> [<ffffffff810fdc2f>] ? wait_on_page_read+0x3c/0x3c > >> [<ffffffff8161d3c6>] schedule+0x73/0x75 > >> [<ffffffff8161d56b>] io_schedule+0x60/0x7a > >> [<ffffffff810fdc3d>] sleep_on_page+0xe/0x12 > >> [<ffffffff8161d7fd>] __wait_on_bit+0x48/0x7a > >> [<ffffffff810fdbdd>] wait_on_page_bit+0x7a/0x7c > >> [<ffffffff81084821>] ? autoremove_wake_function+0x34/0x34 > >> [<ffffffff810fef03>] filemap_fdatawait_range+0x7e/0x126 > >> [<ffffffff8122f1cf>] ? btrfs_submit_direct+0x3f4/0x3f4 > >> [<ffffffff8122d7aa>] ? btrfs_writepages+0x28/0x2a > >> [<ffffffff811084c6>] ? do_writepages+0x1e/0x2c > >> [<ffffffff810ff38e>] ? __filemap_fdatawrite_range+0x55/0x57 > >> [<ffffffff8124006f>] btrfs_wait_ordered_range+0x6a/0x11a > >> [<ffffffff8122fe01>] btrfs_run_delalloc_work+0x27/0x69 > >> [<ffffffff812508db>] normal_work_helper+0xfe/0x240 > >> [<ffffffff81065d7e>] process_one_work+0x195/0x2d2 > >> [<ffffffff81066020>] worker_thread+0x136/0x205 > >> [<ffffffff81065eea>] ? process_scheduled_works+0x2f/0x2f > >> [<ffffffff8106b4b4>] kthread+0xae/0xb6 > >> [<ffffffff8106b406>] ? __kthread_parkme+0x61/0x61 > >> [<ffffffff8162667c>] ret_from_fork+0x7c/0xb0 > >> [<ffffffff8106b406>] ? __kthread_parkme+0x61/0x61 > >>INFO: task btrfs-transacti:3358 blocked for more than 120 seconds. > >> Not tainted 3.15.1-amd64-i915-preempt-20140216jbp #1 > >>"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > >>btrfs-transacti D 0000000000000000 0 3358 2 0x00000000 > >> ffff8800c50ebc50 0000000000000046 ffff8800c50ebc20 ffff8800c50ebfd8 > >> ffff8800c6914390 00000000000141c0 7fffffffffffffff ffff8801433b8f10 > >> 0000000000000002 ffffffff8161c9b0 7fffffffffffffff ffff8800c50ebc60 > >>Call Trace: > >> [<ffffffff8161c9b0>] ? sock_rps_reset_flow+0x32/0x32 > >> [<ffffffff8161d3c6>] schedule+0x73/0x75 > >> [<ffffffff8161c9e9>] schedule_timeout+0x39/0x129 > >> [<ffffffff8107653d>] ? get_parent_ip+0xd/0x3c > >> [<ffffffff8162338f>] ? preempt_count_add+0x7a/0x8d > >> [<ffffffff8161dbac>] __wait_for_common+0x11a/0x159 > >> [<ffffffff8107810f>] ? wake_up_state+0x12/0x12 > >> [<ffffffff8161dc0f>] wait_for_completion+0x24/0x26 > >> [<ffffffff81237ce6>] btrfs_wait_and_free_delalloc_work+0x16/0x28 > >> [<ffffffff8123fd3a>] btrfs_run_ordered_operations+0x1e7/0x21e > >> [<ffffffff81229aa4>] btrfs_flush_all_pending_stuffs+0x4e/0x55 > >> [<ffffffff8122b25a>] btrfs_commit_transaction+0x20d/0x8b0 > >> [<ffffffff81227b41>] transaction_kthread+0xf8/0x1ab > >> [<ffffffff81227a49>] ? btrfs_cleanup_transaction+0x44c/0x44c > >> [<ffffffff8106b4b4>] kthread+0xae/0xb6 > >> [<ffffffff8106b406>] ? __kthread_parkme+0x61/0x61 > >> [<ffffffff8162667c>] ret_from_fork+0x7c/0xb0 > >> [<ffffffff8106b406>] ? __kthread_parkme+0x61/0x61 > >>INFO: task kworker/u8:13:13157 blocked for more than 120 seconds. > >> Not tainted 3.15.1-amd64-i915-preempt-20140216jbp #1 > >>"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > >>kworker/u8:13 D 0000000000000000 0 13157 2 0x00000080 > >>Workqueue: btrfs-flush_delalloc normal_work_helper > >> ffff8800041cfc00 0000000000000046 ffff8800041cfbd0 ffff8800041cffd8 > >> ffff8800034f40d0 00000000000141c0 ffff88021f2941c0 ffff8800034f40d0 > >> ffff8800041cfca0 ffffffff810fdc2f 0000000000000002 ffff8800041cfc10 > >>Call Trace: > >> [<ffffffff810fdc2f>] ? wait_on_page_read+0x3c/0x3c > >> [<ffffffff8161d3c6>] schedule+0x73/0x75 > >> [<ffffffff8161d56b>] io_schedule+0x60/0x7a > >> [<ffffffff810fdc3d>] sleep_on_page+0xe/0x12 > >> [<ffffffff8161d7fd>] __wait_on_bit+0x48/0x7a > >> [<ffffffff810fdbdd>] wait_on_page_bit+0x7a/0x7c > >> [<ffffffff81084821>] ? autoremove_wake_function+0x34/0x34 > >> [<ffffffff810fef03>] filemap_fdatawait_range+0x7e/0x126 > >> [<ffffffff8122f1cf>] ? btrfs_submit_direct+0x3f4/0x3f4 > >> [<ffffffff8122d7aa>] ? btrfs_writepages+0x28/0x2a > >> [<ffffffff811084c6>] ? do_writepages+0x1e/0x2c > >> [<ffffffff810ff38e>] ? __filemap_fdatawrite_range+0x55/0x57 > >> [<ffffffff8124006f>] btrfs_wait_ordered_range+0x6a/0x11a > >> [<ffffffff8122fe01>] btrfs_run_delalloc_work+0x27/0x69 > >> [<ffffffff812508db>] normal_work_helper+0xfe/0x240 > >> [<ffffffff81065d7e>] process_one_work+0x195/0x2d2 > >> [<ffffffff81066020>] worker_thread+0x136/0x205 > >> [<ffffffff81065eea>] ? process_scheduled_works+0x2f/0x2f > >> [<ffffffff8106b4b4>] kthread+0xae/0xb6 > >> [<ffffffff8106b406>] ? __kthread_parkme+0x61/0x61 > >> [<ffffffff8162667c>] ret_from_fork+0x7c/0xb0 > >> [<ffffffff8106b406>] ? __kthread_parkme+0x61/0x61 > >>parent transid verify failed on 2776298520576 wanted 41015 found 18120 > >>------------[ cut here ]------------ > >>kernel BUG at fs/btrfs/locking.c:269! > >>invalid opcode: 0000 [#1] PREEMPT SMP > >>Modules linked in: udp_diag tcp_diag inet_diag ip6table_filter ip6_tables ebtable_nat ebtables tun ppdev lp autofs4 binfmt_misc kl5kusb105 deflate ctr twofish_avx_x86_64 twofish_x86_64_3way twofish_x86_64 twofish_generic twofish_common camellia_x86_64 camellia_generic serpent_avx_x86_64 serpent_sse2_x86_64 glue_helper lrw serpent_generic blowfish_x86_64 blowfish_generic blowfish_common cast5_avx_x86_64 ablk_helper cast5_generic cast_common des_generic cmac xcbc rmd160 sha512_ssse3 sha512_generic ftdi_sio crypto_null keyspan af_key xfrm_algo dm_mirror dm_region_hash dm_log nfsd nfs_acl auth_rpcgss nfs fscache lockd sunrpc ipt_REJECT xt_conntrack xt_nat xt_tcpudp xt_LOG iptable_mangle iptable_filter aes_x86_64 lm85 hwmon_vid dm_snapshot dm_bufio iptable_nat ip_tables nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_conntrack_ftp ipt_MASQUERADE nf_nat x_tables nf_conntrack sg st snd_pcm_oss snd_mixer_oss fuse eeepc_wmi snd_hda_codec_realtek snd_hda_codec_generic asus_wmi micr o! > > co! > >> de snd_cmipci gameport sparse_keymap kvm_intel snd_hda_intel snd_opl3_lib rfkill kvm snd_mpu401_uart snd_seq_midi tpm_infineon snd_seq_midi_event snd_hda_controller snd_hda_codec rc_ati_x10 snd_hwdep snd_pcm snd_seq snd_rawmidi ati_remote asix battery coretemp libphy processor snd_seq_device snd_timer wmi usbnet evdev rc_core tpm_tis intel_rapl parport_pc snd parport tpm pcspkr lpc_ich i2c_i801 pl2303 xhci_hcd intel_powerclamp ezusb soundcore x86_pkg_temp_thermal usbserial xts gf128mul dm_crypt dm_mod raid456 async_raid6_recov async_pq async_xor async_memcpy async_tx e1000e ptp pps_core crc32c_intel ghash_clmulni_intel sata_sil24 cryptd crct10dif_pclmul r8169 ehci_pci thermal ehci_hcd crc32_pclmul fan mii sata_mv usbcore usb_common [last unloaded: kl5kusb105] > >>CPU: 0 PID: 25515 Comm: kworker/u8:2 Not tainted 3.15.1-amd64-i915-preempt-20140216jbp #1 > >>Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3806 08/20/2012 > >>Workqueue: btrfs-endio-write normal_work_helper > >>task: ffff880019d58610 ti: ffff880004c24000 task.ti: ffff880004c24000 > >>RIP: 0010:[<ffffffff8125751c>] [<ffffffff8125751c>] btrfs_assert_tree_read_locked+0x10/0x14 > >>RSP: 0018:ffff880004c27848 EFLAGS: 00010246 > >>RAX: 0000000000000000 RBX: ffff8800353cadd0 RCX: 0000000000000000 > >>RDX: 000000000000001e RSI: ffff8801d6e46e10 RDI: ffff8800353cadd0 > >>RBP: ffff880004c27848 R08: 0000000000000000 R09: ffff880041c943c8 > >>R10: ffffffffffffffff R11: 0000000000000000 R12: 000000000000a037 > >>R13: 0000000000000001 R14: ffff88020f032c00 R15: ffff8801957d201f > >>FS: 0000000000000000(0000) GS:ffff88021f200000(0000) knlGS:0000000000000000 > >>CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > >>CR2: 00000000f777c064 CR3: 0000000045787000 CR4: 00000000000407f0 > >>Stack: > >> ffff880004c27868 ffffffff812578bd ffff8800353cadd0 000000000000a037 > >> ffff880004c278b0 ffffffff81224799 0000000000000000 0000000000000000 > >> 0000000000000000 0000000000000000 0000000000000000 ffff8802121b2800 > >>Call Trace: > >> [<ffffffff812578bd>] btrfs_tree_read_unlock_blocking+0x5c/0xb7 > >> [<ffffffff81224799>] verify_parent_transid+0x138/0x150 > >> [<ffffffff81224867>] btree_read_extent_buffer_pages.constprop.45+0x7f/0x100 > >> [<ffffffff8122722a>] btrfs_read_buffer+0x2c/0x2e > >> [<ffffffff8120e7cf>] read_block_for_search.isra.26+0xae/0x287 > >> [<ffffffff8121078d>] btrfs_search_old_slot+0x3e6/0x6bc > >> [<ffffffff812454f9>] ? release_extent_buffer+0xad/0xb8 > >> [<ffffffff812747cd>] __resolve_indirect_refs+0x16c/0x50d > >> [<ffffffff81275a48>] find_parent_nodes+0x8e6/0x90a > >> [<ffffffff81275aee>] __btrfs_find_all_roots+0x82/0xe0 > >> [<ffffffff81275bb2>] btrfs_find_all_roots+0x4a/0x6a > >> [<ffffffff81278c4a>] btrfs_qgroup_account_ref+0x10c/0x403 > >> [<ffffffff8113fb2c>] ? __cache_free.isra.40+0x1b4/0x1c3 > >> [<ffffffff81217de9>] btrfs_delayed_refs_qgroup_accounting+0xb5/0xe8 > >> [<ffffffff8122bf37>] __btrfs_end_transaction+0x60/0x2cd > >> [<ffffffff8122c1b4>] btrfs_end_transaction+0x10/0x12 > >> [<ffffffff812327fd>] btrfs_finish_ordered_io+0x307/0x3bd > >> [<ffffffff81232ab2>] finish_ordered_fn+0x15/0x17 > >> [<ffffffff812508db>] normal_work_helper+0xfe/0x240 > >> [<ffffffff81065d7e>] process_one_work+0x195/0x2d2 > >> [<ffffffff81066020>] worker_thread+0x136/0x205 > >> [<ffffffff81065eea>] ? process_scheduled_works+0x2f/0x2f > >> [<ffffffff8106b4b4>] kthread+0xae/0xb6 > >> [<ffffffff8106b406>] ? __kthread_parkme+0x61/0x61 > >> [<ffffffff8162667c>] ret_from_fork+0x7c/0xb0 > >> [<ffffffff8106b406>] ? __kthread_parkme+0x61/0x61 > >>Code: 89 e5 f0 ff 0f 5d c3 55 48 89 e5 f0 ff 0f 74 04 31 c0 eb 05 b8 01 00 00 00 5d c3 66 66 66 66 90 8b 47 58 55 48 89 e5 85 c0 75 02 <0f> 0b 5d c3 66 66 66 66 90 55 48 89 e5 41 55 41 89 f5 41 54 53 > >>RIP [<ffffffff8125751c>] btrfs_assert_tree_read_locked+0x10/0x14 > >> RSP <ffff880004c27848> > >>---[ end trace b88f907c1447016f ]--- > >>Kernel panic - not syncing: Fatal exception > >>Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff) > >>drm_kms_helper: panic occurred, switching back to text console > >>Rebooting in 20 seconds.. > >>ACPI MEMORY or I/O RESET_REG. > >> > >After looking at the stack, it seems that we lack a check for @need_lock since eb > >is taking it when @need_lock is true, could you please try this? > Josef also sent a patch to address this problem before. > > https://patchwork.kernel.org/patch/4424521/ oops, thanks for that. -liubo > > > >thanks, > >-liubo > > > >diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c > >index 8bb4aa1..f00165d 100644 > >--- a/fs/btrfs/disk-io.c > >+++ b/fs/btrfs/disk-io.c > >@@ -369,7 +369,8 @@ static int verify_parent_transid(struct extent_io_tree > >*io_tree, > > out: > > unlock_extent_cached(io_tree, eb->start, eb->start + eb->len - 1, > > &cached_state, GFP_NOFS); > >- btrfs_tree_read_unlock_blocking(eb); > >+ if (need_lock) > >+ btrfs_tree_read_unlock_blocking(eb); > > return ret; > > } > >-- > >To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > >the body of a message to majordomo@vger.kernel.org > >More majordomo info at http://vger.kernel.org/majordomo-info.html > >. > > > > -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Thanks for the patch. Hopefully this will make it to the next 3.15.x
kernel.
I also went back to 3.14 anyway since the 'blocked for 120 seconds' look
like another instance of deadlocks we've been discussing here.
But just curious:
> >>[160562.925463] parent transid verify failed on 2776298520576 wanted 41015 found 18120
What should I be doing about this?
Does it mean that I do have some kind of corruption/damage on my
filesystem?
Also, is it possible to have all these messages state which devid they
occurred on? I don't even know which device I should be worrying about
right now, and although I'm running scrub now, my understanding is that
scrub doesn't actually look at FS structures and is likely to miss this
anyway.
Thanks,
Marc
On Thu, Jul 03, 2014 at 06:44:21AM -0700, Marc MERLIN wrote: > Thanks for the patch. Hopefully this will make it to the next 3.15.x > kernel. > > I also went back to 3.14 anyway since the 'blocked for 120 seconds' look > like another instance of deadlocks we've been discussing here. > > But just curious: > > > >>[160562.925463] parent transid verify failed on 2776298520576 wanted 41015 found 18120 > > What should I be doing about this? > Does it mean that I do have some kind of corruption/damage on my > filesystem? > If there is another copy for the block(RAID1, DUP, RAID5/6), it'd try to read the copy and repair the crc with the good one, it's all we can do about it. > Also, is it possible to have all these messages state which devid they > occurred on? I don't even know which device I should be worrying about > right now, and although I'm running scrub now, my understanding is that > scrub doesn't actually look at FS structures and is likely to miss this > anyway. Yes we can but it'd need a bit more effort, for now, all device msg we've seen in panic info comes from sb->s_id which points to @fs_info->latest_device. thanks, -liubo > > Thanks, > Marc > -- > "A mouse is a device used to point at the xterm you want to type in" - A.S.R. > Microsoft is to operating systems .... > .... what McDonalds is to gourmet cooking > Home page: http://marc.merlins.org/ | PGP 1024R/763BE901 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 07/03/2014 09:44 PM, Marc MERLIN wrote: > Thanks for the patch. Hopefully this will make it to the next 3.15.x > kernel. > > I also went back to 3.14 anyway since the 'blocked for 120 seconds' look > like another instance of deadlocks we've been discussing here. > > But just curious: > >>>> [160562.925463] parent transid verify failed on 2776298520576 wanted 41015 found 18120 > What should I be doing about this? > Does it mean that I do have some kind of corruption/damage on my > filesystem? I am afraid, scrub maybe could not fix such kind of errors, all scrub doing is to verify whether checksums match and if possible use good mirrors to rewrite bad one. Such errors seem imply contention itself is corrupted, we may have passed checksum check after ending io, but we fail generation check afterwards. So backups are always good.....:-) > > Also, is it possible to have all these messages state which devid they > occurred on? I don't even know which device I should be worrying about > right now, and although I'm running scrub now, my understanding is that > scrub doesn't actually look at FS structures and is likely to miss this > anyway. To get physical device name, we still need mirror num to know which device we are locating. Feel free to correct me if i miss something here.:-) Thanks, Wang > > Thanks, > Marc -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Jul 04, 2014 at 11:07:22AM +0800, Liu Bo wrote: > > > >>[160562.925463] parent transid verify failed on 2776298520576 wanted 41015 found 18120 > > > > What should I be doing about this? > > Does it mean that I do have some kind of corruption/damage on my > > filesystem? > > > If there is another copy for the block(RAID1, DUP, RAID5/6), it'd try to read > the copy and repair the crc with the good one, it's all we can do about it. Right. It's not quite my question though. I mean I don't know what device it's on, never mind what file is affected. If I know which file is corrupted, I can simply delete it and restore from backup, no biggie. Right now I don't even know which one of my 3 btrfs filesystems (over 10TB) has this problem. That makes the message kind of problematic: "you have a problem, but not I'm not giving you any fighting chance of finding out where" :) > > Also, is it possible to have all these messages state which devid they > > occurred on? I don't even know which device I should be worrying about > > right now, and although I'm running scrub now, my understanding is that > > scrub doesn't actually look at FS structures and is likely to miss this > > anyway. > > Yes we can but it'd need a bit more effort, for now, all device msg we've seen > in panic info comes from sb->s_id which points to @fs_info->latest_device. Food for though, as is the message is unfortunately close to useless, except to an FS developer with a system that has only one btrfs filesystem. On Fri, Jul 04, 2014 at 11:50:25AM +0800, Wang Shilong wrote: > I am afraid, scrub maybe could not fix such kind of errors, all scrub > doing is to verify whether checksums match and if possible use good > mirrors to rewrite bad one. I wouldn't be bothered if scrub can't fix it, but it would be good if it could tell me. > Such errors seem imply contention itself is corrupted, we may have passed > checksum check after ending io, but we fail generation check afterwards. So should I really replace scrub with find / -type f -print0 | xargs grep . >/dev/null ? Basically we need something that will scan the filesystem and ensure that all files are reachable correctly without causing filesystem problems, and if one is bad, output the name of the bad file(s). Scrub only does a half job of that it seems. > To get physical device name, we still need mirror num to know which device > we are locating. Ok, so it's missing for now and therefore the code can't easily report it, I understand. Well, I explained the problem, ext4 and others of course tell me which devid an error is on, hopefully btrfs will able to do so in the near future. Back to the original problem, would you agree that find / -type f -print0 | xargs grep . >/dev/nul? may do a better job scanning the entire FS for problems than scrub would? Thanks, Marc
On 07/04/2014 12:11 PM, Marc MERLIN wrote: > On Fri, Jul 04, 2014 at 11:07:22AM +0800, Liu Bo wrote: >>>>>> [160562.925463] parent transid verify failed on 2776298520576 wanted 41015 found 18120 >>> What should I be doing about this? >>> Does it mean that I do have some kind of corruption/damage on my >>> filesystem? >>> >> If there is another copy for the block(RAID1, DUP, RAID5/6), it'd try to read >> the copy and repair the crc with the good one, it's all we can do about it. > Right. It's not quite my question though. > I mean I don't know what device it's on, never mind what file is affected. > If I know which file is corrupted, I can simply delete it and restore from > backup, no biggie. > Right now I don't even know which one of my 3 btrfs filesystems (over 10TB) > has this problem. That makes the message kind of problematic: "you have a > problem, but not I'm not giving you any fighting chance of finding out > where" :) > >>> Also, is it possible to have all these messages state which devid they >>> occurred on? I don't even know which device I should be worrying about >>> right now, and although I'm running scrub now, my understanding is that >>> scrub doesn't actually look at FS structures and is likely to miss this >>> anyway. >> Yes we can but it'd need a bit more effort, for now, all device msg we've seen >> in panic info comes from sb->s_id which points to @fs_info->latest_device. > Food for though, as is the message is unfortunately close to useless, except > to an FS developer with a system that has only one btrfs filesystem. > > On Fri, Jul 04, 2014 at 11:50:25AM +0800, Wang Shilong wrote: >> I am afraid, scrub maybe could not fix such kind of errors, all scrub >> doing is to verify whether checksums match and if possible use good >> mirrors to rewrite bad one. > I wouldn't be bothered if scrub can't fix it, but it would be good if it > could tell me. > >> Such errors seem imply contention itself is corrupted, we may have passed >> checksum check after ending io, but we fail generation check afterwards. > > So should I really replace scrub with > find / -type f -print0 | xargs grep . >/dev/null ? > > Basically we need something that will scan the filesystem and ensure that > all files are reachable correctly without causing filesystem problems, and > if one is bad, output the name of the bad file(s). > Scrub only does a half job of that it seems. > >> To get physical device name, we still need mirror num to know which device >> we are locating. > Ok, so it's missing for now and therefore the code can't easily report it, > I understand. > > Well, I explained the problem, ext4 and others of course tell me which devid > an error is on, hopefully btrfs will able to do so in the near future. So it is ok for you to print one of btrfs filesystem device(for example device name) ? maybe it is not really physical address the metadata locates in, this is easier. > > Back to the original problem, would you agree that > find / -type f -print0 | xargs grep . >/dev/nul? > may do a better job scanning the entire FS for problems than scrub would? > > Thanks, > Marc -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 07/04/2014 11:07 AM, Liu Bo wrote: > On Thu, Jul 03, 2014 at 06:44:21AM -0700, Marc MERLIN wrote: >> Thanks for the patch. Hopefully this will make it to the next 3.15.x >> kernel. >> >> I also went back to 3.14 anyway since the 'blocked for 120 seconds' look >> like another instance of deadlocks we've been discussing here. >> >> But just curious: >> >>>>> [160562.925463] parent transid verify failed on 2776298520576 wanted 41015 found 18120 >> What should I be doing about this? >> Does it mean that I do have some kind of corruption/damage on my >> filesystem? >> > If there is another copy for the block(RAID1, DUP, RAID5/6), it'd try to read > the copy and repair the crc with the good one, it's all we can do about it. > >> Also, is it possible to have all these messages state which devid they >> occurred on? I don't even know which device I should be worrying about >> right now, and although I'm running scrub now, my understanding is that >> scrub doesn't actually look at FS structures and is likely to miss this >> anyway. > Yes we can but it'd need a bit more effort, for now, all device msg we've seen > in panic info comes from sb->s_id which points to @fs_info->latest_device. You means something like this: + printk_ratelimited("BTRFS (device: %s) parent transid verify failed on %llu wanted %llu found %llu\n", + eb->fs_info->sb->s_id, eb->start, + parent_transid, btrfs_header_generation(eb)); > > thanks, > -liubo > >> Thanks, >> Marc >> -- >> "A mouse is a device used to point at the xterm you want to type in" - A.S.R. >> Microsoft is to operating systems .... >> .... what McDonalds is to gourmet cooking >> Home page: http://marc.merlins.org/ | PGP 1024R/763BE901 > . > -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Jul 04, 2014 at 01:29:29PM +0800, Wang Shilong wrote: > >Well, I explained the problem, ext4 and others of course tell me which > >devid > >an error is on, hopefully btrfs will able to do so in the near future. > > So it is ok for you to print one of btrfs filesystem device(for example > device name) ? maybe it is not really physical address the metadata > locates in, this is easier. Yes, the device name is great, now I can see which of my 3 filesystems has a problem, that's a start. Next would be knowing which filename this occurred in, but I understand this would be harder to get from that point in the code. Ideally scrub should be able to find that problem and report it, at least I would know which filesystem to rescan for errors: > >Back to the original problem, would you agree that > >find / -type f -print0 | xargs grep . >/dev/nul? I'll also have to try this to see if I get lucky with it :) > + printk_ratelimited("BTRFS (device: %s) parent transid verify > failed on %llu wanted %llu found %llu\n", > + eb->fs_info->sb->s_id, eb->start, > + parent_transid, btrfs_header_generation(eb)); That looks great. Ideally all such errors would look like this. Thanks for looking into this, I appreciate it. Best, Marc
On 07/04/2014 02:02 PM, Marc MERLIN wrote: > On Fri, Jul 04, 2014 at 01:29:29PM +0800, Wang Shilong wrote: >>> Well, I explained the problem, ext4 and others of course tell me which >>> devid >>> an error is on, hopefully btrfs will able to do so in the near future. >> So it is ok for you to print one of btrfs filesystem device(for example >> device name) ? maybe it is not really physical address the metadata >> locates in, this is easier. > Yes, the device name is great, now I can see which of my 3 filesystems has a > problem, that's a start. > Next would be knowing which filename this occurred in, but I understand this > would be harder to get from that point in the code. > Ideally scrub should be able to find that problem and report it, at least I > would know which filesystem to rescan for errors: > >>> Back to the original problem, would you agree that >>> find / -type f -print0 | xargs grep . >/dev/nul? > I'll also have to try this to see if I get lucky with it :) > >> + printk_ratelimited("BTRFS (device: %s) parent transid verify >> failed on %llu wanted %llu found %llu\n", >> + eb->fs_info->sb->s_id, eb->start, >> + parent_transid, btrfs_header_generation(eb)); > That looks great. Ideally all such errors would look like this. OK, i found there is a mirror num recorded in struct @extent_buffer, so it is not diffcult to locate the real physical address that this corrupt metadata block locates. But another question is that if such problems happen, it should also help little, because usually other mirrors maybe have the same errors... Looking at btrfs codes, if it found such error which means checksum has passed but generation verification fails it dosen't even try other mirrors if existed, so i supposed such kinds of errors is fatal and could be corrected by btrfs itself.. Anyway, let me see if there are any other output even dose not output btrfs specific info. Thanks, > > Thanks for looking into this, I appreciate it. > > Best, > Marc -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 07/04/2014 02:02 PM, Marc MERLIN wrote: > On Fri, Jul 04, 2014 at 01:29:29PM +0800, Wang Shilong wrote: >>> Well, I explained the problem, ext4 and others of course tell me which >>> devid >>> an error is on, hopefully btrfs will able to do so in the near future. >> So it is ok for you to print one of btrfs filesystem device(for example >> device name) ? maybe it is not really physical address the metadata >> locates in, this is easier. > Yes, the device name is great, now I can see which of my 3 filesystems has a > problem, that's a start. > Next would be knowing which filename this occurred in, but I understand this > would be harder to get from that point in the code. > Ideally scrub should be able to find that problem and report it, at least I > would know which filesystem to rescan for errors: So there is a problem, ususally such generation verifications errors is related to Btrfs metdata block.it is maybe just a Btrfs node , not related to any actually files, or even a leaf that contains more that one file.... If this is a read/write path from normal fs/file root, we may output its' root..but if this is something like extent root...i think it helps little.... > >>> Back to the original problem, would you agree that >>> find / -type f -print0 | xargs grep . >/dev/nul? > I'll also have to try this to see if I get lucky with it :) > >> + printk_ratelimited("BTRFS (device: %s) parent transid verify >> failed on %llu wanted %llu found %llu\n", >> + eb->fs_info->sb->s_id, eb->start, >> + parent_transid, btrfs_header_generation(eb)); > That looks great. Ideally all such errors would look like this. > > Thanks for looking into this, I appreciate it. > > Best, > Marc -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 8bb4aa1..f00165d 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -369,7 +369,8 @@ static int verify_parent_transid(struct extent_io_tree *io_tree, out: unlock_extent_cached(io_tree, eb->start, eb->start + eb->len - 1, &cached_state, GFP_NOFS); - btrfs_tree_read_unlock_blocking(eb); + if (need_lock) + btrfs_tree_read_unlock_blocking(eb); return ret; }