Message ID | 1422375078-6916-1-git-send-email-fdmanana@suse.com (mailing list archive)
State      | Superseded, archived
On Tue, Jan 27, 2015 at 11:11 AM, Filipe Manana <fdmanana@suse.com> wrote:
> While running a scrub on a kernel with CONFIG_DEBUG_PAGEALLOC=y, I got
> the following trace:

This actually trades one bug for another:

[ 1928.950319] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:621
[ 1928.967334] in_atomic(): 1, irqs_disabled(): 0, pid: 149670, name: fsstress
[ 1928.981324] INFO: lockdep is turned off.
[ 1928.989244] CPU: 24 PID: 149670 Comm: fsstress Tainted: G W 3.19.0-rc7-mason+ #41
[ 1929.006418] Hardware name: ZTSYSTEMS Echo Ridge T4 /A9DRPF-10D, BIOS 1.07 05/10/2012
[ 1929.022207] ffffffff81a22cf8 ffff881076e03b78 ffffffff816b8dd9 ffff881076e03b78
[ 1929.037267] ffff880d8e828710 ffff881076e03ba8 ffffffff810856c4 ffff881076e03bc8
[ 1929.052315] 0000000000000000 000000000000026d ffffffff81a22cf8 ffff881076e03bd8
[ 1929.067381] Call Trace:
[ 1929.072344] <IRQ> [<ffffffff816b8dd9>] dump_stack+0x4f/0x6e
[ 1929.083968] [<ffffffff810856c4>] ___might_sleep+0x174/0x230
[ 1929.095352] [<ffffffff810857d2>] __might_sleep+0x52/0x90
[ 1929.106223] [<ffffffff816bb68f>] mutex_lock_nested+0x2f/0x3b0
[ 1929.117951] [<ffffffff810ab37d>] ? trace_hardirqs_on+0xd/0x10
[ 1929.129708] [<ffffffffa05dc838>] scrub_pending_bio_dec+0x38/0x70 [btrfs]
[ 1929.143370] [<ffffffffa05dd0e0>] scrub_parity_bio_endio+0x50/0x70 [btrfs]
[ 1929.157191] [<ffffffff812fa603>] bio_endio+0x53/0xa0
[ 1929.167382] [<ffffffffa05f96bc>] rbio_orig_end_io+0x7c/0xa0 [btrfs]
[ 1929.180161] [<ffffffffa05f97ba>] raid_write_parity_end_io+0x5a/0x80 [btrfs]
[ 1929.194318] [<ffffffff812fa603>] bio_endio+0x53/0xa0
[ 1929.204496] [<ffffffff8130401b>] blk_update_request+0x1eb/0x450
[ 1929.216569] [<ffffffff81096e58>] ? trigger_load_balance+0x78/0x500
[ 1929.229176] [<ffffffff8144c74d>] scsi_end_request+0x3d/0x1f0
[ 1929.240740] [<ffffffff8144ccac>] scsi_io_completion+0xac/0x5b0
[ 1929.252654] [<ffffffff81441c50>] scsi_finish_command+0xf0/0x150
[ 1929.264725] [<ffffffff8144d317>] scsi_softirq_done+0x147/0x170
[ 1929.276635] [<ffffffff8130ace6>] blk_done_softirq+0x86/0xa0
[ 1929.288014] [<ffffffff8105d92e>] __do_softirq+0xde/0x600
[ 1929.298885] [<ffffffff8105df6d>] irq_exit+0xbd/0xd0
[ 1929.308879] [<ffffffff81034ea5>] smp_call_function_single_interrupt+0x35/0x40
[ 1929.323455] [<ffffffff816c126f>] call_function_single_interrupt+0x6f/0x80
[ 1929.337270] <EOI> [<ffffffff811fc745>] ? sync_inodes_sb+0x1b5/0x2a0
[ 1929.350261] [<ffffffff811fc728>] ? sync_inodes_sb+0x198/0x2a0
[ 1929.361991] [<ffffffff816badcf>] ? wait_for_completion+0xef/0x120
[ 1929.374423] [<ffffffff812028d0>] ? fdatawrite_one_bdev+0x20/0x20
[ 1929.386671] [<ffffffff812028d0>] ? fdatawrite_one_bdev+0x20/0x20
[ 1929.398930] [<ffffffff812028ed>] sync_inodes_one_sb+0x1d/0x30
[ 1929.410668] [<ffffffff811cf4c6>] iterate_supers+0xb6/0xf0
[ 1929.421712] [<ffffffff81202935>] sys_sync+0x35/0x90
[ 1929.431704] [<ffffffff816bfed2>] system_call_fastpath+0x12/0x17

So we'll have to either put in a refcount or a spinlock instead.

-chris
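[ For illustration only: a minimal sketch of the refcount alternative Chris
suggests. The refs field, the scrub_put_ctx() helper, and the pairing with
scrub_pending_bio_inc() are hypothetical additions, not part of the patch
below; the idea is that the end-io path pins the context across the wakeup
instead of taking a sleeping mutex in atomic (softirq) context. ]

/*
 * Sketch (hypothetical names): every pending bio holds a reference on
 * the scrub context, and btrfs_scrub_dev() holds its own reference for
 * the whole scrub, so the context cannot be freed while a wake_up() on
 * its embedded wait queue is still in flight. No mutex is taken here,
 * which is safe from bio end-io (softirq) context.
 */
struct scrub_ctx {
	atomic_t		refs;		/* hypothetical: pins sctx */
	atomic_t		bios_in_flight;
	wait_queue_head_t	list_wait;
	/* ... remaining fields unchanged ... */
};

static void scrub_put_ctx(struct scrub_ctx *sctx)
{
	if (atomic_dec_and_test(&sctx->refs))
		scrub_free_ctx(sctx);	/* last user frees the context */
}

static void scrub_pending_bio_inc(struct scrub_ctx *sctx)
{
	atomic_inc(&sctx->refs);	/* bio pins sctx until end-io */
	atomic_inc(&sctx->bios_in_flight);
}

static void scrub_pending_bio_dec(struct scrub_ctx *sctx)
{
	atomic_dec(&sctx->bios_in_flight);
	wake_up(&sctx->list_wait);	/* sctx still pinned by our ref */
	scrub_put_ctx(sctx);		/* drop ref only after the wakeup */
}

[ With this scheme the lifetime of sctx is tied to the counter, so the
end-io path never needs a lock at all; a plain spinlock held across the
wakeup would be the other option Chris mentions. ]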
diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 8f75b99..8252a5e 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -306,8 +306,17 @@ static void scrub_pending_bio_inc(struct scrub_ctx *sctx)
 
 static void scrub_pending_bio_dec(struct scrub_ctx *sctx)
 {
+	struct btrfs_fs_info *fs_info = sctx->dev_root->fs_info;
+
+	/*
+	 * Hold the scrub_lock while doing the wakeup to ensure the
+	 * sctx (and its wait queue list_wait) isn't destroyed/freed
+	 * during the wakeup.
+	 */
+	mutex_lock(&fs_info->scrub_lock);
 	atomic_dec(&sctx->bios_in_flight);
 	wake_up(&sctx->list_wait);
+	mutex_unlock(&fs_info->scrub_lock);
 }
 
 static void __scrub_blocked_if_needed(struct btrfs_fs_info *fs_info)
@@ -379,10 +388,15 @@ static void scrub_pending_trans_workers_dec(struct scrub_ctx *sctx)
 	mutex_lock(&fs_info->scrub_lock);
 	atomic_dec(&fs_info->scrubs_running);
 	atomic_dec(&fs_info->scrubs_paused);
-	mutex_unlock(&fs_info->scrub_lock);
 	atomic_dec(&sctx->workers_pending);
 	wake_up(&fs_info->scrub_pause_wait);
+	/*
+	 * Hold the scrub_lock while doing the wakeup to ensure the
+	 * sctx (and its wait queue list_wait) isn't destroyed/freed
+	 * during the wakeup.
+	 */
 	wake_up(&sctx->list_wait);
+	mutex_unlock(&fs_info->scrub_lock);
 }
 
 static void scrub_free_csums(struct scrub_ctx *sctx)
While running a scrub on a kernel with CONFIG_DEBUG_PAGEALLOC=y, I got
the following trace:

[68127.807663] BUG: unable to handle kernel paging request at ffff8803f8947a50
[68127.807663] IP: [<ffffffff8107da31>] do_raw_spin_lock+0x94/0x122
[68127.807663] PGD 3003067 PUD 43e1f5067 PMD 43e030067 PTE 80000003f8947060
[68127.807663] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
[68127.807663] Modules linked in: dm_flakey dm_mod crc32c_generic btrfs xor raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc loop parport_pc processor parpo
[68127.807663] CPU: 2 PID: 3081 Comm: kworker/u8:5 Not tainted 3.18.0-rc6-btrfs-next-3+ #4
[68127.807663] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
[68127.807663] Workqueue: btrfs-btrfs-scrub btrfs_scrub_helper [btrfs]
[68127.807663] task: ffff880101fc5250 ti: ffff8803f097c000 task.ti: ffff8803f097c000
[68127.807663] RIP: 0010:[<ffffffff8107da31>] [<ffffffff8107da31>] do_raw_spin_lock+0x94/0x122
[68127.807663] RSP: 0018:ffff8803f097fbb8 EFLAGS: 00010093
[68127.807663] RAX: 0000000028dd386c RBX: ffff8803f8947a50 RCX: 0000000028dd3854
[68127.807663] RDX: 0000000000000018 RSI: 0000000000000002 RDI: 0000000000000001
[68127.807663] RBP: ffff8803f097fbd8 R08: 0000000000000004 R09: 0000000000000001
[68127.807663] R10: ffff880102620980 R11: ffff8801f3e8c900 R12: 000000000001d390
[68127.807663] R13: 00000000cabd13c8 R14: ffff8803f8947800 R15: ffff88037c574f00
[68127.807663] FS: 0000000000000000(0000) GS:ffff88043dd00000(0000) knlGS:0000000000000000
[68127.807663] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[68127.807663] CR2: ffff8803f8947a50 CR3: 00000000b6481000 CR4: 00000000000006e0
[68127.807663] Stack:
[68127.807663]  ffffffff823942a8 ffff8803f8947a50 ffff8802a3416f80 0000000000000000
[68127.807663]  ffff8803f097fc18 ffffffff8141e7c0 ffffffff81072948 000000000034f314
[68127.807663]  ffff8803f097fc08 0000000000000292 ffff8803f097fc48 ffff8803f8947a50
[68127.807663] Call Trace:
[68127.807663]  [<ffffffff8141e7c0>] _raw_spin_lock_irqsave+0x4b/0x55
[68127.807663]  [<ffffffff81072948>] ? __wake_up+0x22/0x4b
[68127.807663]  [<ffffffff81072948>] __wake_up+0x22/0x4b
[68127.807663]  [<ffffffffa0392327>] scrub_pending_bio_dec+0x32/0x36 [btrfs]
[68127.807663]  [<ffffffffa0395e70>] scrub_bio_end_io_worker+0x5a3/0x5c9 [btrfs]
[68127.807663]  [<ffffffff810e0c7c>] ? time_hardirqs_off+0x15/0x28
[68127.807663]  [<ffffffff81078106>] ? trace_hardirqs_off_caller+0x4c/0xb9
[68127.807663]  [<ffffffffa0372a7c>] normal_work_helper+0xf1/0x238 [btrfs]
[68127.807663]  [<ffffffffa0372d3d>] btrfs_scrub_helper+0x12/0x14 [btrfs]
[68127.807663]  [<ffffffff810582d2>] process_one_work+0x1e4/0x3b6
[68127.807663]  [<ffffffff81078180>] ? trace_hardirqs_off+0xd/0xf
[68127.807663]  [<ffffffff81058dc9>] worker_thread+0x1fb/0x2a8
[68127.807663]  [<ffffffff81058bce>] ? rescuer_thread+0x219/0x219
[68127.807663]  [<ffffffff8105cd75>] kthread+0xdb/0xe3
[68127.807663]  [<ffffffff8105cc9a>] ? __kthread_parkme+0x67/0x67
[68127.807663]  [<ffffffff8141f1ec>] ret_from_fork+0x7c/0xb0
[68127.807663]  [<ffffffff8105cc9a>] ? __kthread_parkme+0x67/0x67
[68127.807663] Code: 39 c2 75 14 8d 8a 00 00 01 00 89 d0 f0 0f b1 0b 39 d0 0f 84 81 00 00 00 4c 69 2d 27 86 99 00 fa 00 00 00 45 31 e4 4d 39 ec 74 2b <8b> 13 89 d0 c1 e8 10 66 39 c2 75
[68127.807663] RIP [<ffffffff8107da31>] do_raw_spin_lock+0x94/0x122
[68127.807663] RSP <ffff8803f097fbb8>
[68127.807663] CR2: ffff8803f8947a50
[68127.807663] ---[ end trace d7045aac00a66cd8 ]---

This is due to a very tiny time window: right before wake_up() releases the
wait queue's lock, the scrub context object (sctx), which embeds the wait
queue, is freed by the main scrub task:

        CPU 1                                   CPU 2

                                     btrfs_scrub_dev()
                                       wait sctx->bios_in_flight == 0
 scrub_bio_end_io_worker()
   scrub_pending_bio_dec()
     __wake_up()
       spin_lock_irqsave(&sctx->list_wait->lock, flags)
       default_wake_function()
         wake up task at CPU 2
                                       wait sctx->workers_pending == 0
                                       mutex_lock(&fs_info->scrub_lock)
                                       (...)
                                       mutex_unlock(&fs_info->scrub_lock)
                                       scrub_free_ctx(sctx)
                                         kfree(sctx)
       spin_unlock_irqrestore(&sctx->list_wait->lock, flags)

Fix this by holding the scrub lock while doing the wakeup.

This isn't a recent regression; the issue has been around since the scrub
feature was added (2011, commit a2de733c78fa7af51ba9670482fa7d392aa67c57).

Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/scrub.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)
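[ For readers following the race diagram, an abridged sketch of the waiter
side of btrfs_scrub_dev() (simplified, not the verbatim kernel code) shows
why the free can overlap the wakeup: nothing orders scrub_free_ctx() after
a concurrent __wake_up() that is still holding the embedded wait queue's
lock. ]

/*
 * Abridged waiter side (sketch): once the counters reach zero the task
 * proceeds to tear down sctx. The patch above makes the teardown safe by
 * forcing the waker to hold fs_info->scrub_lock across its wake_up(), so
 * the free below cannot run until the waker has fully left list_wait.
 */
static void btrfs_scrub_dev_waiter_sketch(struct btrfs_fs_info *fs_info,
					  struct scrub_ctx *sctx)
{
	/* Sleep until the end-io path has decremented the counters... */
	wait_event(sctx->list_wait,
		   atomic_read(&sctx->bios_in_flight) == 0);
	wait_event(sctx->list_wait,
		   atomic_read(&sctx->workers_pending) == 0);

	mutex_lock(&fs_info->scrub_lock);
	/* ... detach the scrub from the device ... */
	mutex_unlock(&fs_info->scrub_lock);

	/*
	 * ... then free the context that embeds list_wait. Without the
	 * patch, a waker can still be inside __wake_up() at this point.
	 */
	scrub_free_ctx(sctx);
}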