Message ID | 20180316183627.15476-1-jeffm@suse.com (mailing list archive) |
---|---|
State | New, archived |
On Fri, Mar 16, 2018 at 11:36 AM, <jeffm@suse.com> wrote: > From: Jeff Mahoney <jeffm@suse.com> > > While running btrfs/011, I hit the following lockdep splat. > > This is the important bit: > pcpu_alloc+0x1ac/0x5e0 > __percpu_counter_init+0x4e/0xb0 > btrfs_init_fs_root+0x99/0x1c0 [btrfs] > btrfs_get_fs_root.part.54+0x5b/0x150 [btrfs] > resolve_indirect_refs+0x130/0x830 [btrfs] > find_parent_nodes+0x69e/0xff0 [btrfs] > btrfs_find_all_roots_safe+0xa0/0x110 [btrfs] > btrfs_find_all_roots+0x50/0x70 [btrfs] > btrfs_qgroup_prepare_account_extents+0x53/0x90 [btrfs] > btrfs_commit_transaction+0x3ce/0x9b0 [btrfs] > > The percpu_counter_init call in btrfs_alloc_subvolume_writers > uses GFP_KERNEL, which we can't do during transaction commit. > > This switches it to GFP_NOFS. > > ======================================================== > WARNING: possible irq lock inversion dependency detected > 4.12.14-kvmsmall #8 Tainted: G W > -------------------------------------------------------- > kswapd0/50 just changed the state of lock: > (&delayed_node->mutex){+.+.-.}, at: [<ffffffffc06994fa>] __btrfs_release_delayed_node+0x3a/0x1f0 [btrfs] > but this lock took another, RECLAIM_FS-unsafe lock in the past: > (pcpu_alloc_mutex){+.+.+.} > > and interrupts could create inverse lock ordering between them. 
> > other info that might help us debug this: > Chain exists of: > &delayed_node->mutex --> &found->groups_sem --> pcpu_alloc_mutex > > Possible interrupt unsafe locking scenario: > > CPU0 CPU1 > ---- ---- > lock(pcpu_alloc_mutex); > local_irq_disable(); > lock(&delayed_node->mutex); > lock(&found->groups_sem); > <Interrupt> > lock(&delayed_node->mutex); > > *** DEADLOCK *** > > 2 locks held by kswapd0/50: > #0: (shrinker_rwsem){++++..}, at: [<ffffffff811dc11f>] shrink_slab+0x7f/0x5b0 > #1: (&type->s_umount_key#30){+++++.}, at: [<ffffffff8126dec6>] trylock_super+0x16/0x50 > > the shortest dependencies between 2nd lock and 1st lock: > -> (pcpu_alloc_mutex){+.+.+.} ops: 4904 { > HARDIRQ-ON-W at: > __mutex_lock+0x4e/0x8c0 > pcpu_alloc+0x1ac/0x5e0 > alloc_kmem_cache_cpus.isra.70+0x25/0xa0 > __do_tune_cpucache+0x2c/0x220 > do_tune_cpucache+0x26/0xc0 > enable_cpucache+0x6d/0xf0 > kmem_cache_init_late+0x42/0x75 > start_kernel+0x343/0x4cb > x86_64_start_kernel+0x127/0x134 > secondary_startup_64+0xa5/0xb0 > SOFTIRQ-ON-W at: > __mutex_lock+0x4e/0x8c0 > pcpu_alloc+0x1ac/0x5e0 > alloc_kmem_cache_cpus.isra.70+0x25/0xa0 > __do_tune_cpucache+0x2c/0x220 > do_tune_cpucache+0x26/0xc0 > enable_cpucache+0x6d/0xf0 > kmem_cache_init_late+0x42/0x75 > start_kernel+0x343/0x4cb > x86_64_start_kernel+0x127/0x134 > secondary_startup_64+0xa5/0xb0 > RECLAIM_FS-ON-W at: > __kmalloc+0x47/0x310 > pcpu_extend_area_map+0x2b/0xc0 > pcpu_alloc+0x3ec/0x5e0 > alloc_kmem_cache_cpus.isra.70+0x25/0xa0 > __do_tune_cpucache+0x2c/0x220 > do_tune_cpucache+0x26/0xc0 > enable_cpucache+0x6d/0xf0 > __kmem_cache_create+0x1bf/0x390 > create_cache+0xba/0x1b0 > kmem_cache_create+0x1f8/0x2b0 > ksm_init+0x6f/0x19d > do_one_initcall+0x50/0x1b0 > kernel_init_freeable+0x201/0x289 > kernel_init+0xa/0x100 > ret_from_fork+0x3a/0x50 > INITIAL USE at: > __mutex_lock+0x4e/0x8c0 > pcpu_alloc+0x1ac/0x5e0 > alloc_kmem_cache_cpus.isra.70+0x25/0xa0 > setup_cpu_cache+0x2f/0x1f0 > __kmem_cache_create+0x1bf/0x390 > 
create_boot_cache+0x8b/0xb1 > kmem_cache_init+0xa1/0x19e > start_kernel+0x270/0x4cb > x86_64_start_kernel+0x127/0x134 > secondary_startup_64+0xa5/0xb0 > } > ... key at: [<ffffffff821d8e70>] pcpu_alloc_mutex+0x70/0xa0 > ... acquired at: > pcpu_alloc+0x1ac/0x5e0 > __percpu_counter_init+0x4e/0xb0 > btrfs_init_fs_root+0x99/0x1c0 [btrfs] > btrfs_get_fs_root.part.54+0x5b/0x150 [btrfs] > resolve_indirect_refs+0x130/0x830 [btrfs] > find_parent_nodes+0x69e/0xff0 [btrfs] > btrfs_find_all_roots_safe+0xa0/0x110 [btrfs] > btrfs_find_all_roots+0x50/0x70 [btrfs] > btrfs_qgroup_prepare_account_extents+0x53/0x90 [btrfs] > btrfs_commit_transaction+0x3ce/0x9b0 [btrfs] > transaction_kthread+0x176/0x1b0 [btrfs] > kthread+0x102/0x140 > ret_from_fork+0x3a/0x50 > > -> (&fs_info->commit_root_sem){++++..} ops: 1566382 { > HARDIRQ-ON-W at: > down_write+0x3e/0xa0 > cache_block_group+0x287/0x420 [btrfs] > find_free_extent+0x106c/0x12d0 [btrfs] > btrfs_reserve_extent+0xd8/0x170 [btrfs] > cow_file_range.isra.66+0x133/0x470 [btrfs] > run_delalloc_range+0x121/0x410 [btrfs] > writepage_delalloc.isra.50+0xfe/0x180 [btrfs] > __extent_writepage+0x19a/0x360 [btrfs] > extent_write_cache_pages.constprop.56+0x249/0x3e0 [btrfs] > extent_writepages+0x4d/0x60 [btrfs] > do_writepages+0x1a/0x70 > __filemap_fdatawrite_range+0xa7/0xe0 > btrfs_rename+0x5ee/0xdb0 [btrfs] > vfs_rename+0x52a/0x7e0 > SyS_rename+0x351/0x3b0 > do_syscall_64+0x79/0x1e0 > entry_SYSCALL_64_after_hwframe+0x42/0xb7 > HARDIRQ-ON-R at: > down_read+0x35/0x90 > caching_thread+0x57/0x560 [btrfs] > normal_work_helper+0x1c0/0x5e0 [btrfs] > process_one_work+0x1e0/0x5c0 > worker_thread+0x44/0x390 > kthread+0x102/0x140 > ret_from_fork+0x3a/0x50 > SOFTIRQ-ON-W at: > down_write+0x3e/0xa0 > cache_block_group+0x287/0x420 [btrfs] > find_free_extent+0x106c/0x12d0 [btrfs] > btrfs_reserve_extent+0xd8/0x170 [btrfs] > cow_file_range.isra.66+0x133/0x470 [btrfs] > run_delalloc_range+0x121/0x410 [btrfs] > writepage_delalloc.isra.50+0xfe/0x180 [btrfs] > 
__extent_writepage+0x19a/0x360 [btrfs] > extent_write_cache_pages.constprop.56+0x249/0x3e0 [btrfs] > extent_writepages+0x4d/0x60 [btrfs] > do_writepages+0x1a/0x70 > __filemap_fdatawrite_range+0xa7/0xe0 > btrfs_rename+0x5ee/0xdb0 [btrfs] > vfs_rename+0x52a/0x7e0 > SyS_rename+0x351/0x3b0 > do_syscall_64+0x79/0x1e0 > entry_SYSCALL_64_after_hwframe+0x42/0xb7 > SOFTIRQ-ON-R at: > down_read+0x35/0x90 > caching_thread+0x57/0x560 [btrfs] > normal_work_helper+0x1c0/0x5e0 [btrfs] > process_one_work+0x1e0/0x5c0 > worker_thread+0x44/0x390 > kthread+0x102/0x140 > ret_from_fork+0x3a/0x50 > INITIAL USE at: > down_write+0x3e/0xa0 > cache_block_group+0x287/0x420 [btrfs] > find_free_extent+0x106c/0x12d0 [btrfs] > btrfs_reserve_extent+0xd8/0x170 [btrfs] > cow_file_range.isra.66+0x133/0x470 [btrfs] > run_delalloc_range+0x121/0x410 [btrfs] > writepage_delalloc.isra.50+0xfe/0x180 [btrfs] > __extent_writepage+0x19a/0x360 [btrfs] > extent_write_cache_pages.constprop.56+0x249/0x3e0 [btrfs] > extent_writepages+0x4d/0x60 [btrfs] > do_writepages+0x1a/0x70 > __filemap_fdatawrite_range+0xa7/0xe0 > btrfs_rename+0x5ee/0xdb0 [btrfs] > vfs_rename+0x52a/0x7e0 > SyS_rename+0x351/0x3b0 > do_syscall_64+0x79/0x1e0 > entry_SYSCALL_64_after_hwframe+0x42/0xb7 > } > ... key at: [<ffffffffc0729578>] __key.61970+0x0/0xfffffffffff9aa88 [btrfs] > ... 
acquired at: > cache_block_group+0x287/0x420 [btrfs] > find_free_extent+0x106c/0x12d0 [btrfs] > btrfs_reserve_extent+0xd8/0x170 [btrfs] > btrfs_alloc_tree_block+0x12f/0x4c0 [btrfs] > btrfs_create_tree+0xbb/0x2a0 [btrfs] > btrfs_create_uuid_tree+0x37/0x140 [btrfs] > open_ctree+0x23c0/0x2660 [btrfs] > btrfs_mount+0xd36/0xf90 [btrfs] > mount_fs+0x3a/0x160 > vfs_kern_mount+0x66/0x150 > btrfs_mount+0x18c/0xf90 [btrfs] > mount_fs+0x3a/0x160 > vfs_kern_mount+0x66/0x150 > do_mount+0x1c1/0xcc0 > SyS_mount+0x7e/0xd0 > do_syscall_64+0x79/0x1e0 > entry_SYSCALL_64_after_hwframe+0x42/0xb7 > > -> (&found->groups_sem){++++..} ops: 2134587 { > HARDIRQ-ON-W at: > down_write+0x3e/0xa0 > __link_block_group+0x34/0x130 [btrfs] > btrfs_read_block_groups+0x33d/0x7b0 [btrfs] > open_ctree+0x2054/0x2660 [btrfs] > btrfs_mount+0xd36/0xf90 [btrfs] > mount_fs+0x3a/0x160 > vfs_kern_mount+0x66/0x150 > btrfs_mount+0x18c/0xf90 [btrfs] > mount_fs+0x3a/0x160 > vfs_kern_mount+0x66/0x150 > do_mount+0x1c1/0xcc0 > SyS_mount+0x7e/0xd0 > do_syscall_64+0x79/0x1e0 > entry_SYSCALL_64_after_hwframe+0x42/0xb7 > HARDIRQ-ON-R at: > down_read+0x35/0x90 > btrfs_calc_num_tolerated_disk_barrier_failures+0x113/0x1f0 [btrfs] > open_ctree+0x207b/0x2660 [btrfs] > btrfs_mount+0xd36/0xf90 [btrfs] > mount_fs+0x3a/0x160 > vfs_kern_mount+0x66/0x150 > btrfs_mount+0x18c/0xf90 [btrfs] > mount_fs+0x3a/0x160 > vfs_kern_mount+0x66/0x150 > do_mount+0x1c1/0xcc0 > SyS_mount+0x7e/0xd0 > do_syscall_64+0x79/0x1e0 > entry_SYSCALL_64_after_hwframe+0x42/0xb7 > SOFTIRQ-ON-W at: > down_write+0x3e/0xa0 > __link_block_group+0x34/0x130 [btrfs] > btrfs_read_block_groups+0x33d/0x7b0 [btrfs] > open_ctree+0x2054/0x2660 [btrfs] > btrfs_mount+0xd36/0xf90 [btrfs] > mount_fs+0x3a/0x160 > vfs_kern_mount+0x66/0x150 > btrfs_mount+0x18c/0xf90 [btrfs] > mount_fs+0x3a/0x160 > vfs_kern_mount+0x66/0x150 > do_mount+0x1c1/0xcc0 > SyS_mount+0x7e/0xd0 > do_syscall_64+0x79/0x1e0 > entry_SYSCALL_64_after_hwframe+0x42/0xb7 > SOFTIRQ-ON-R at: > down_read+0x35/0x90 > 
btrfs_calc_num_tolerated_disk_barrier_failures+0x113/0x1f0 [btrfs] > open_ctree+0x207b/0x2660 [btrfs] > btrfs_mount+0xd36/0xf90 [btrfs] > mount_fs+0x3a/0x160 > vfs_kern_mount+0x66/0x150 > btrfs_mount+0x18c/0xf90 [btrfs] > mount_fs+0x3a/0x160 > vfs_kern_mount+0x66/0x150 > do_mount+0x1c1/0xcc0 > SyS_mount+0x7e/0xd0 > do_syscall_64+0x79/0x1e0 > entry_SYSCALL_64_after_hwframe+0x42/0xb7 > INITIAL USE at: > down_write+0x3e/0xa0 > __link_block_group+0x34/0x130 [btrfs] > btrfs_read_block_groups+0x33d/0x7b0 [btrfs] > open_ctree+0x2054/0x2660 [btrfs] > btrfs_mount+0xd36/0xf90 [btrfs] > mount_fs+0x3a/0x160 > vfs_kern_mount+0x66/0x150 > btrfs_mount+0x18c/0xf90 [btrfs] > mount_fs+0x3a/0x160 > vfs_kern_mount+0x66/0x150 > do_mount+0x1c1/0xcc0 > SyS_mount+0x7e/0xd0 > do_syscall_64+0x79/0x1e0 > entry_SYSCALL_64_after_hwframe+0x42/0xb7 > } > ... key at: [<ffffffffc0729488>] __key.59101+0x0/0xfffffffffff9ab78 [btrfs] > ... acquired at: > find_free_extent+0xcb4/0x12d0 [btrfs] > btrfs_reserve_extent+0xd8/0x170 [btrfs] > btrfs_alloc_tree_block+0x12f/0x4c0 [btrfs] > __btrfs_cow_block+0x110/0x5b0 [btrfs] > btrfs_cow_block+0xd7/0x290 [btrfs] > btrfs_search_slot+0x1f6/0x960 [btrfs] > btrfs_lookup_inode+0x2a/0x90 [btrfs] > __btrfs_update_delayed_inode+0x65/0x210 [btrfs] > btrfs_commit_inode_delayed_inode+0x121/0x130 [btrfs] > btrfs_evict_inode+0x3fe/0x6a0 [btrfs] > evict+0xc4/0x190 > __dentry_kill+0xbf/0x170 > dput+0x2ae/0x2f0 > SyS_rename+0x2a6/0x3b0 > do_syscall_64+0x79/0x1e0 > entry_SYSCALL_64_after_hwframe+0x42/0xb7 > > -> (&delayed_node->mutex){+.+.-.} ops: 5580204 { > HARDIRQ-ON-W at: > __mutex_lock+0x4e/0x8c0 > btrfs_delayed_update_inode+0x46/0x6e0 [btrfs] > btrfs_update_inode+0x83/0x110 [btrfs] > btrfs_dirty_inode+0x62/0xe0 [btrfs] > touch_atime+0x8c/0xb0 > do_generic_file_read+0x818/0xb10 > __vfs_read+0xdc/0x150 > vfs_read+0x8a/0x130 > SyS_read+0x45/0xa0 > do_syscall_64+0x79/0x1e0 > entry_SYSCALL_64_after_hwframe+0x42/0xb7 > SOFTIRQ-ON-W at: > __mutex_lock+0x4e/0x8c0 > 
btrfs_delayed_update_inode+0x46/0x6e0 [btrfs] > btrfs_update_inode+0x83/0x110 [btrfs] > btrfs_dirty_inode+0x62/0xe0 [btrfs] > touch_atime+0x8c/0xb0 > do_generic_file_read+0x818/0xb10 > __vfs_read+0xdc/0x150 > vfs_read+0x8a/0x130 > SyS_read+0x45/0xa0 > do_syscall_64+0x79/0x1e0 > entry_SYSCALL_64_after_hwframe+0x42/0xb7 > IN-RECLAIM_FS-W at: > __mutex_lock+0x4e/0x8c0 > __btrfs_release_delayed_node+0x3a/0x1f0 [btrfs] > btrfs_evict_inode+0x22c/0x6a0 [btrfs] > evict+0xc4/0x190 > dispose_list+0x35/0x50 > prune_icache_sb+0x42/0x50 > super_cache_scan+0x139/0x190 > shrink_slab+0x262/0x5b0 > shrink_node+0x2eb/0x2f0 > kswapd+0x2eb/0x890 > kthread+0x102/0x140 > ret_from_fork+0x3a/0x50 > INITIAL USE at: > __mutex_lock+0x4e/0x8c0 > btrfs_delayed_update_inode+0x46/0x6e0 [btrfs] > btrfs_update_inode+0x83/0x110 [btrfs] > btrfs_dirty_inode+0x62/0xe0 [btrfs] > touch_atime+0x8c/0xb0 > do_generic_file_read+0x818/0xb10 > __vfs_read+0xdc/0x150 > vfs_read+0x8a/0x130 > SyS_read+0x45/0xa0 > do_syscall_64+0x79/0x1e0 > entry_SYSCALL_64_after_hwframe+0x42/0xb7 > } > ... key at: [<ffffffffc072d488>] __key.56935+0x0/0xfffffffffff96b78 [btrfs] > ... acquired at: > __lock_acquire+0x264/0x11c0 > lock_acquire+0xbd/0x1e0 > __mutex_lock+0x4e/0x8c0 > __btrfs_release_delayed_node+0x3a/0x1f0 [btrfs] > btrfs_evict_inode+0x22c/0x6a0 [btrfs] > evict+0xc4/0x190 > dispose_list+0x35/0x50 > prune_icache_sb+0x42/0x50 > super_cache_scan+0x139/0x190 > shrink_slab+0x262/0x5b0 > shrink_node+0x2eb/0x2f0 > kswapd+0x2eb/0x890 > kthread+0x102/0x140 > ret_from_fork+0x3a/0x50 > > stack backtrace: > CPU: 1 PID: 50 Comm: kswapd0 Tainted: G W 4.12.14-kvmsmall #8 SLE15 (unreleased) > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014 > Call Trace: > dump_stack+0x78/0xb7 > print_irq_inversion_bug.part.38+0x19f/0x1aa > check_usage_forwards+0x102/0x120 > ? ret_from_fork+0x3a/0x50 > ? check_usage_backwards+0x110/0x110 > mark_lock+0x16c/0x270 > __lock_acquire+0x264/0x11c0 > ? 
pagevec_lookup_entries+0x1a/0x30 > ? truncate_inode_pages_range+0x2b3/0x7f0 > lock_acquire+0xbd/0x1e0 > ? __btrfs_release_delayed_node+0x3a/0x1f0 [btrfs] > __mutex_lock+0x4e/0x8c0 > ? __btrfs_release_delayed_node+0x3a/0x1f0 [btrfs] > ? __btrfs_release_delayed_node+0x3a/0x1f0 [btrfs] > ? btrfs_evict_inode+0x1f6/0x6a0 [btrfs] > __btrfs_release_delayed_node+0x3a/0x1f0 [btrfs] > btrfs_evict_inode+0x22c/0x6a0 [btrfs] > evict+0xc4/0x190 > dispose_list+0x35/0x50 > prune_icache_sb+0x42/0x50 > super_cache_scan+0x139/0x190 > shrink_slab+0x262/0x5b0 > shrink_node+0x2eb/0x2f0 > kswapd+0x2eb/0x890 > kthread+0x102/0x140 > ? mem_cgroup_shrink_node+0x2c0/0x2c0 > ? kthread_create_on_node+0x40/0x40 > ret_from_fork+0x3a/0x50 > Looks OK to me. Reviewed-by: Liu Bo <bo.liu@linux.alibaba.com> thanks, liubo > Signed-off-by: Jeff Mahoney <jeffm@suse.com> > --- > fs/btrfs/disk-io.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c > index 21f34ad0d411..eb6bb3169a9e 100644 > --- a/fs/btrfs/disk-io.c > +++ b/fs/btrfs/disk-io.c > @@ -1108,7 +1108,7 @@ static struct btrfs_subvolume_writers *btrfs_alloc_subvolume_writers(void) > if (!writers) > return ERR_PTR(-ENOMEM); > > - ret = percpu_counter_init(&writers->counter, 0, GFP_KERNEL); > + ret = percpu_counter_init(&writers->counter, 0, GFP_NOFS); > if (ret < 0) { > kfree(writers); > return ERR_PTR(ret); > -- > 2.15.1 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 16.03.2018 20:36, jeffm@suse.com wrote:
> From: Jeff Mahoney <jeffm@suse.com>
>
> While running btrfs/011, I hit the following lockdep splat.
>
> This is the important bit:
>   pcpu_alloc+0x1ac/0x5e0
>   __percpu_counter_init+0x4e/0xb0
>   btrfs_init_fs_root+0x99/0x1c0 [btrfs]
>   btrfs_get_fs_root.part.54+0x5b/0x150 [btrfs]
>   resolve_indirect_refs+0x130/0x830 [btrfs]
>   find_parent_nodes+0x69e/0xff0 [btrfs]
>   btrfs_find_all_roots_safe+0xa0/0x110 [btrfs]
>   btrfs_find_all_roots+0x50/0x70 [btrfs]
>   btrfs_qgroup_prepare_account_extents+0x53/0x90 [btrfs]
>   btrfs_commit_transaction+0x3ce/0x9b0 [btrfs]
>
> The percpu_counter_init call in btrfs_alloc_subvolume_writers
> uses GFP_KERNEL, which we can't do during transaction commit.
>
> This switches it to GFP_NOFS.

Given there is effort underway to actually kill GFP_NOFS and replace it
with the context annotation routines, shouldn't instead use those
routines directly ?

<snip>
On 3/16/18 2:48 PM, Nikolay Borisov wrote:
>
>
> On 16.03.2018 20:36, jeffm@suse.com wrote:
>> From: Jeff Mahoney <jeffm@suse.com>
>>
>> While running btrfs/011, I hit the following lockdep splat.
>>
>> This is the important bit:
>>   pcpu_alloc+0x1ac/0x5e0
>>   __percpu_counter_init+0x4e/0xb0
>>   btrfs_init_fs_root+0x99/0x1c0 [btrfs]
>>   btrfs_get_fs_root.part.54+0x5b/0x150 [btrfs]
>>   resolve_indirect_refs+0x130/0x830 [btrfs]
>>   find_parent_nodes+0x69e/0xff0 [btrfs]
>>   btrfs_find_all_roots_safe+0xa0/0x110 [btrfs]
>>   btrfs_find_all_roots+0x50/0x70 [btrfs]
>>   btrfs_qgroup_prepare_account_extents+0x53/0x90 [btrfs]
>>   btrfs_commit_transaction+0x3ce/0x9b0 [btrfs]
>>
>> The percpu_counter_init call in btrfs_alloc_subvolume_writers
>> uses GFP_KERNEL, which we can't do during transaction commit.
>>
>> This switches it to GFP_NOFS.
>
> Given there is effort underway to actually kill GFP_NOFS and replace it
> with the context annotation routines, shouldn't instead use those
> routines directly ?

I don't think those have landed yet. When they do, it should obsolete
the gfp flags here in any context since we can also read roots from code
that doesn't need GFP_NOFS.

-Jeff
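For readers unfamiliar with the "context annotation routines" mentioned above, here is a minimal user-space sketch of the idea, not kernel code. As far as I know the kernel interface for this is memalloc_nofs_save() and memalloc_nofs_restore(); everything below (the `model_` names, the flag values, the per-thread marker) is a hypothetical stand-in chosen for illustration. The point it tries to show is that a callee with a hard-coded GFP_KERNEL is implicitly downgraded while the caller is inside a NOFS scope, which is why the gfp argument at each call site would stop mattering.

```c
#include <assert.h>

/* Hypothetical stand-ins for gfp bits; the values are illustrative only. */
#define MODEL_GFP_IO     0x1u
#define MODEL_GFP_FS     0x2u
#define MODEL_GFP_KERNEL (MODEL_GFP_IO | MODEL_GFP_FS)
#define MODEL_GFP_NOFS   (MODEL_GFP_IO)

/* Per-thread "no FS reclaim" marker (assumed model of a per-task flag). */
static _Thread_local unsigned int model_task_nofs;

/* Enter a NOFS scope; return the previous state so scopes can nest. */
unsigned int model_nofs_save(void)
{
    unsigned int old = model_task_nofs;

    model_task_nofs = 1;
    return old;
}

/* Leave a NOFS scope by putting back whatever the matching save returned. */
void model_nofs_restore(unsigned int old)
{
    assert(old == 0 || old == 1);
    model_task_nofs = old;
}

/* What the allocator would actually use for the caller's requested flags. */
unsigned int model_effective_gfp(unsigned int gfp)
{
    return model_task_nofs ? (gfp & ~MODEL_GFP_FS) : gfp;
}

/* A callee that hard-codes GFP_KERNEL, like a helper we cannot change. */
unsigned int model_hidden_alloc(void)
{
    return model_effective_gfp(MODEL_GFP_KERNEL);
}

/* Walks through nested scopes; returns 0 if every step behaves as expected. */
int model_nofs_demo(void)
{
    unsigned int outer, inner;

    if (model_hidden_alloc() != MODEL_GFP_KERNEL)
        return 1;               /* outside any scope: flags unchanged */

    outer = model_nofs_save();
    if (model_hidden_alloc() != MODEL_GFP_NOFS)
        return 2;               /* inside scope: FS bit masked off */

    inner = model_nofs_save();
    model_nofs_restore(inner);
    if (model_hidden_alloc() != MODEL_GFP_NOFS)
        return 3;               /* still inside the outer scope */

    model_nofs_restore(outer);
    if (model_hidden_alloc() != MODEL_GFP_KERNEL)
        return 4;               /* scope closed: back to normal */

    return 0;
}
```

In this model a transaction-commit path would wrap its body in one save/restore pair instead of auditing every allocation underneath it.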
On Fri, Mar 16, 2018 at 02:36:27PM -0400, jeffm@suse.com wrote:
> From: Jeff Mahoney <jeffm@suse.com>
>
> While running btrfs/011, I hit the following lockdep splat.
>
> This is the important bit:
>   pcpu_alloc+0x1ac/0x5e0
>   __percpu_counter_init+0x4e/0xb0
>   btrfs_init_fs_root+0x99/0x1c0 [btrfs]
>   btrfs_get_fs_root.part.54+0x5b/0x150 [btrfs]
>   resolve_indirect_refs+0x130/0x830 [btrfs]
>   find_parent_nodes+0x69e/0xff0 [btrfs]
>   btrfs_find_all_roots_safe+0xa0/0x110 [btrfs]
>   btrfs_find_all_roots+0x50/0x70 [btrfs]
>   btrfs_qgroup_prepare_account_extents+0x53/0x90 [btrfs]
>   btrfs_commit_transaction+0x3ce/0x9b0 [btrfs]
>
> The percpu_counter_init call in btrfs_alloc_subvolume_writers
> uses GFP_KERNEL, which we can't do during transaction commit.
>
> This switches it to GFP_NOFS.

> Signed-off-by: Jeff Mahoney <jeffm@suse.com>
> ---
>  fs/btrfs/disk-io.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 21f34ad0d411..eb6bb3169a9e 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -1108,7 +1108,7 @@ static struct btrfs_subvolume_writers *btrfs_alloc_subvolume_writers(void)
>          if (!writers)
>                  return ERR_PTR(-ENOMEM);
>
> -        ret = percpu_counter_init(&writers->counter, 0, GFP_KERNEL);
> +        ret = percpu_counter_init(&writers->counter, 0, GFP_NOFS);

A line above the diff context is another allocation that does GFP_NOFS,
so one of the gfp flags were wrong.

Looks like there's another instance where percpu allocates with
GFP_KERNEL: create_space_info that can be called from the path that
allocates chunks, so this also looks like a NOFS candidate.

And in the same function, there's another indirect and hidden GFP_KERNEL
allocation from kobject_init_and_add. So in this case we can't fix all
the gfp problems at the call site and will have to use the scoped
approach eventually.

I haven't found any instance of such lockdep reports in my logs (over a
long period), so it's quite unlikely to end up in the recursive
allocation.

Patch added to next, thanks.
On 3/16/18 4:12 PM, David Sterba wrote:
> On Fri, Mar 16, 2018 at 02:36:27PM -0400, jeffm@suse.com wrote:
>> From: Jeff Mahoney <jeffm@suse.com>
>>
>> While running btrfs/011, I hit the following lockdep splat.
>>
>> This is the important bit:
>>   pcpu_alloc+0x1ac/0x5e0
>>   __percpu_counter_init+0x4e/0xb0
>>   btrfs_init_fs_root+0x99/0x1c0 [btrfs]
>>   btrfs_get_fs_root.part.54+0x5b/0x150 [btrfs]
>>   resolve_indirect_refs+0x130/0x830 [btrfs]
>>   find_parent_nodes+0x69e/0xff0 [btrfs]
>>   btrfs_find_all_roots_safe+0xa0/0x110 [btrfs]
>>   btrfs_find_all_roots+0x50/0x70 [btrfs]
>>   btrfs_qgroup_prepare_account_extents+0x53/0x90 [btrfs]
>>   btrfs_commit_transaction+0x3ce/0x9b0 [btrfs]
>>
>> The percpu_counter_init call in btrfs_alloc_subvolume_writers
>> uses GFP_KERNEL, which we can't do during transaction commit.
>>
>> This switches it to GFP_NOFS.
>
>> Signed-off-by: Jeff Mahoney <jeffm@suse.com>
>> ---
>>  fs/btrfs/disk-io.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>> index 21f34ad0d411..eb6bb3169a9e 100644
>> --- a/fs/btrfs/disk-io.c
>> +++ b/fs/btrfs/disk-io.c
>> @@ -1108,7 +1108,7 @@ static struct btrfs_subvolume_writers *btrfs_alloc_subvolume_writers(void)
>>          if (!writers)
>>                  return ERR_PTR(-ENOMEM);
>>
>> -        ret = percpu_counter_init(&writers->counter, 0, GFP_KERNEL);
>> +        ret = percpu_counter_init(&writers->counter, 0, GFP_NOFS);
>
> A line above the diff context is another allocation that does GFP_NOFS,
> so one of the gfp flags were wrong.

This one was wrong. It was initially implicitly GFP_KERNEL until Tejun
added the gfp_t argument and used GFP_KERNEL for most of the sites.
Since that was effectively a no-op, it was the right thing for him to do
without asking every subsystem maintainer their preference.

> Looks like there's another instance where percpu allocates with
> GFP_KERNEL: create_space_info that can be called from the path that
> allocates chunks, so this also looks like a NOFS candidate.

That's probably for the same reason.

> And in the same function, there's another indirect and hidden GFP_KERNEL
> allocation from kobject_init_and_add. So in this case we can't fix all
> the gfp problems at the call site and will have to use the scoped
> approach eventually.

Yep. That's not a huge barrier, though. We can push the kobject_add
into a workqueue pretty easily.

> I haven't found any instance of such lockdep reports in my logs (over a
> long period), so it's quite unlikely to end up in the recursive
> allocation.
>
> Patch added to next, thanks.

When hunting to see if this had already been fixed, I did find two
reports. One from Qu from April of last year and another from Mike
Galbraith in 2016.

-Jeff
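To make the "push the kobject_add into a workqueue" idea above concrete, here is a small user-space sketch of deferring work out of a restricted context. It is only a model under stated assumptions: the `model_` types and functions are hypothetical, the queue is a plain singly linked list rather than the kernel workqueue API, and setting `sysfs_added` stands in for the real kobject_add() call. The work item is embedded in its owner so that queuing from the restricted path needs no allocation at all.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical work item, embedded in its owner so queuing never allocates. */
struct model_work {
    void (*fn)(struct model_work *work);
    struct model_work *next;
    int queued;
};

/* Hypothetical stand-in for an object that needs a sysfs entry later. */
struct model_space_info {
    int sysfs_added;
    struct model_work sysfs_work;
};

static struct model_work *model_pending_head;
static struct model_work **model_pending_tail = &model_pending_head;

/* Called from the restricted context: only links the item into the list. */
void model_queue_work(struct model_work *work)
{
    assert(work->fn != NULL);
    if (work->queued)
        return;                 /* already pending, nothing to do */
    work->queued = 1;
    work->next = NULL;
    *model_pending_tail = work;
    model_pending_tail = &work->next;
}

/* Called later, from a context where blocking allocations are safe. */
int model_run_pending(void)
{
    int ran = 0;

    while (model_pending_head) {
        struct model_work *work = model_pending_head;

        model_pending_head = work->next;
        if (!model_pending_head)
            model_pending_tail = &model_pending_head;
        work->queued = 0;
        work->fn(work);
        ran++;
    }
    return ran;
}

/* Recover the owner from the embedded item, then do the deferred step. */
static void model_add_sysfs(struct model_work *work)
{
    struct model_space_info *info = (struct model_space_info *)
        ((char *)work - offsetof(struct model_space_info, sysfs_work));

    info->sysfs_added = 1;      /* stands in for kobject_add() */
}

void model_space_info_init(struct model_space_info *info)
{
    info->sysfs_added = 0;
    info->sysfs_work.fn = model_add_sysfs;
    info->sysfs_work.next = NULL;
    info->sysfs_work.queued = 0;
}

/* Returns 0 if the deferral behaves as expected. */
int model_defer_demo(void)
{
    struct model_space_info info;

    model_space_info_init(&info);
    model_queue_work(&info.sysfs_work);
    model_queue_work(&info.sysfs_work);     /* second request is a no-op */
    if (info.sysfs_added)
        return 1;               /* nothing may run before the worker does */
    if (model_run_pending() != 1)
        return 2;
    if (!info.sysfs_added)
        return 3;
    if (model_run_pending() != 0)
        return 4;
    return 0;
}
```

The trade-off in this sketch is that the sysfs entry appears slightly later than the object itself, so anything that depends on it has to tolerate that window.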
On 3/16/18 4:12 PM, David Sterba wrote:
> On Fri, Mar 16, 2018 at 02:36:27PM -0400, jeffm@suse.com wrote:
>> From: Jeff Mahoney <jeffm@suse.com>
>>
>> While running btrfs/011, I hit the following lockdep splat.
>>
>> This is the important bit:
>>   pcpu_alloc+0x1ac/0x5e0
>>   __percpu_counter_init+0x4e/0xb0
>>   btrfs_init_fs_root+0x99/0x1c0 [btrfs]
>>   btrfs_get_fs_root.part.54+0x5b/0x150 [btrfs]
>>   resolve_indirect_refs+0x130/0x830 [btrfs]
>>   find_parent_nodes+0x69e/0xff0 [btrfs]
>>   btrfs_find_all_roots_safe+0xa0/0x110 [btrfs]
>>   btrfs_find_all_roots+0x50/0x70 [btrfs]
>>   btrfs_qgroup_prepare_account_extents+0x53/0x90 [btrfs]
>>   btrfs_commit_transaction+0x3ce/0x9b0 [btrfs]
>>
>> The percpu_counter_init call in btrfs_alloc_subvolume_writers
>> uses GFP_KERNEL, which we can't do during transaction commit.
>>
>> This switches it to GFP_NOFS.
>
>> Signed-off-by: Jeff Mahoney <jeffm@suse.com>
>> ---
>>  fs/btrfs/disk-io.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>> index 21f34ad0d411..eb6bb3169a9e 100644
>> --- a/fs/btrfs/disk-io.c
>> +++ b/fs/btrfs/disk-io.c
>> @@ -1108,7 +1108,7 @@ static struct btrfs_subvolume_writers *btrfs_alloc_subvolume_writers(void)
>>          if (!writers)
>>                  return ERR_PTR(-ENOMEM);
>>
>> -        ret = percpu_counter_init(&writers->counter, 0, GFP_KERNEL);
>> +        ret = percpu_counter_init(&writers->counter, 0, GFP_NOFS);
>
> A line above the diff context is another allocation that does GFP_NOFS,
> so one of the gfp flags were wrong.
>
> Looks like there's another instance where percpu allocates with
> GFP_KERNEL: create_space_info that can be called from the path that
> allocates chunks, so this also looks like a NOFS candidate.

We can get rid of this case entirely. Those call sites should be
removed since the space_infos are all allocated at mount time.

-Jeff
On Mon, Mar 19, 2018 at 01:52:05PM -0400, Jeff Mahoney wrote:
> On 3/16/18 4:12 PM, David Sterba wrote:
> > On Fri, Mar 16, 2018 at 02:36:27PM -0400, jeffm@suse.com wrote:
> >> From: Jeff Mahoney <jeffm@suse.com>
> >>
> >> While running btrfs/011, I hit the following lockdep splat.
> >>
> >> This is the important bit:
> >>   pcpu_alloc+0x1ac/0x5e0
> >>   __percpu_counter_init+0x4e/0xb0
> >>   btrfs_init_fs_root+0x99/0x1c0 [btrfs]
> >>   btrfs_get_fs_root.part.54+0x5b/0x150 [btrfs]
> >>   resolve_indirect_refs+0x130/0x830 [btrfs]
> >>   find_parent_nodes+0x69e/0xff0 [btrfs]
> >>   btrfs_find_all_roots_safe+0xa0/0x110 [btrfs]
> >>   btrfs_find_all_roots+0x50/0x70 [btrfs]
> >>   btrfs_qgroup_prepare_account_extents+0x53/0x90 [btrfs]
> >>   btrfs_commit_transaction+0x3ce/0x9b0 [btrfs]
> >>
> >> The percpu_counter_init call in btrfs_alloc_subvolume_writers
> >> uses GFP_KERNEL, which we can't do during transaction commit.
> >>
> >> This switches it to GFP_NOFS.
> >
> >> Signed-off-by: Jeff Mahoney <jeffm@suse.com>
> >> ---
> >>  fs/btrfs/disk-io.c | 2 +-
> >>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> >> index 21f34ad0d411..eb6bb3169a9e 100644
> >> --- a/fs/btrfs/disk-io.c
> >> +++ b/fs/btrfs/disk-io.c
> >> @@ -1108,7 +1108,7 @@ static struct btrfs_subvolume_writers *btrfs_alloc_subvolume_writers(void)
> >>          if (!writers)
> >>                  return ERR_PTR(-ENOMEM);
> >>
> >> -        ret = percpu_counter_init(&writers->counter, 0, GFP_KERNEL);
> >> +        ret = percpu_counter_init(&writers->counter, 0, GFP_NOFS);
> >
> > A line above the diff context is another allocation that does GFP_NOFS,
> > so one of the gfp flags were wrong.
> >
> > Looks like there's another instance where percpu allocates with
> > GFP_KERNEL: create_space_info that can be called from the path that
> > allocates chunks, so this also looks like a NOFS candidate.
>
> We can get rid of this case entirely. Those call sites should be
> removed since the space_infos are all allocated at mount time.

That would be great and make a few things simpler. So this means that
__find_space_info never fails once the space infos are properly
initialized, right? That was my concern in do_chunk_alloc and
btrfs_make_block_group (that's called from __btrfs_alloc_chunk).
On 3/19/18 2:08 PM, David Sterba wrote:
> On Mon, Mar 19, 2018 at 01:52:05PM -0400, Jeff Mahoney wrote:
>> On 3/16/18 4:12 PM, David Sterba wrote:
>>> On Fri, Mar 16, 2018 at 02:36:27PM -0400, jeffm@suse.com wrote:
>>>> From: Jeff Mahoney <jeffm@suse.com>
>>>>
>>>> While running btrfs/011, I hit the following lockdep splat.
>>>>
>>>> This is the important bit:
>>>>   pcpu_alloc+0x1ac/0x5e0
>>>>   __percpu_counter_init+0x4e/0xb0
>>>>   btrfs_init_fs_root+0x99/0x1c0 [btrfs]
>>>>   btrfs_get_fs_root.part.54+0x5b/0x150 [btrfs]
>>>>   resolve_indirect_refs+0x130/0x830 [btrfs]
>>>>   find_parent_nodes+0x69e/0xff0 [btrfs]
>>>>   btrfs_find_all_roots_safe+0xa0/0x110 [btrfs]
>>>>   btrfs_find_all_roots+0x50/0x70 [btrfs]
>>>>   btrfs_qgroup_prepare_account_extents+0x53/0x90 [btrfs]
>>>>   btrfs_commit_transaction+0x3ce/0x9b0 [btrfs]
>>>>
>>>> The percpu_counter_init call in btrfs_alloc_subvolume_writers
>>>> uses GFP_KERNEL, which we can't do during transaction commit.
>>>>
>>>> This switches it to GFP_NOFS.
>>>
>>>> Signed-off-by: Jeff Mahoney <jeffm@suse.com>
>>>> ---
>>>>  fs/btrfs/disk-io.c | 2 +-
>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>>>> index 21f34ad0d411..eb6bb3169a9e 100644
>>>> --- a/fs/btrfs/disk-io.c
>>>> +++ b/fs/btrfs/disk-io.c
>>>> @@ -1108,7 +1108,7 @@ static struct btrfs_subvolume_writers *btrfs_alloc_subvolume_writers(void)
>>>>          if (!writers)
>>>>                  return ERR_PTR(-ENOMEM);
>>>>
>>>> -        ret = percpu_counter_init(&writers->counter, 0, GFP_KERNEL);
>>>> +        ret = percpu_counter_init(&writers->counter, 0, GFP_NOFS);
>>>
>>> A line above the diff context is another allocation that does GFP_NOFS,
>>> so one of the gfp flags were wrong.
>>>
>>> Looks like there's another instance where percpu allocates with
>>> GFP_KERNEL: create_space_info that can be called from the path that
>>> allocates chunks, so this also looks like a NOFS candidate.
>>
>> We can get rid of this case entirely. Those call sites should be
>> removed since the space_infos are all allocated at mount time.
>
> That would be great and make a few things simpler. So this means that
> __find_space_info never fails once the space infos are properly
> initialized, right? That was my concern in do_chunk_alloc and
> btrfs_make_block_group (that's called from __btrfs_alloc_chunk).

That's a different case. The raid levels are added when the first block
group of a particular raid level is loaded up. That can happen when the
block groups are read in initially, where it should be safe to use
GFP_KERNEL, or when a chunk of a new type is allocated. The thing is
that a chunk of a new type will only be allocated when we're converting
via balance, so we may be able to do the kobject_add for the raid level
when we start the balance rather than wait for it to create the block
group.

-Jeff
======================================================== WARNING: possible irq lock inversion dependency detected 4.12.14-kvmsmall #8 Tainted: G W -------------------------------------------------------- kswapd0/50 just changed the state of lock: (&delayed_node->mutex){+.+.-.}, at: [<ffffffffc06994fa>] __btrfs_release_delayed_node+0x3a/0x1f0 [btrfs] but this lock took another, RECLAIM_FS-unsafe lock in the past: (pcpu_alloc_mutex){+.+.+.} and interrupts could create inverse lock ordering between them. other info that might help us debug this: Chain exists of: &delayed_node->mutex --> &found->groups_sem --> pcpu_alloc_mutex Possible interrupt unsafe locking scenario: CPU0 CPU1 ---- ---- lock(pcpu_alloc_mutex); local_irq_disable(); lock(&delayed_node->mutex); lock(&found->groups_sem); <Interrupt> lock(&delayed_node->mutex); *** DEADLOCK *** 2 locks held by kswapd0/50: #0: (shrinker_rwsem){++++..}, at: [<ffffffff811dc11f>] shrink_slab+0x7f/0x5b0 #1: (&type->s_umount_key#30){+++++.}, at: [<ffffffff8126dec6>] trylock_super+0x16/0x50 the shortest dependencies between 2nd lock and 1st lock: -> (pcpu_alloc_mutex){+.+.+.} ops: 4904 { HARDIRQ-ON-W at: __mutex_lock+0x4e/0x8c0 pcpu_alloc+0x1ac/0x5e0 alloc_kmem_cache_cpus.isra.70+0x25/0xa0 __do_tune_cpucache+0x2c/0x220 do_tune_cpucache+0x26/0xc0 enable_cpucache+0x6d/0xf0 kmem_cache_init_late+0x42/0x75 start_kernel+0x343/0x4cb x86_64_start_kernel+0x127/0x134 secondary_startup_64+0xa5/0xb0 SOFTIRQ-ON-W at: __mutex_lock+0x4e/0x8c0 pcpu_alloc+0x1ac/0x5e0 alloc_kmem_cache_cpus.isra.70+0x25/0xa0 __do_tune_cpucache+0x2c/0x220 do_tune_cpucache+0x26/0xc0 enable_cpucache+0x6d/0xf0 kmem_cache_init_late+0x42/0x75 start_kernel+0x343/0x4cb x86_64_start_kernel+0x127/0x134 secondary_startup_64+0xa5/0xb0 RECLAIM_FS-ON-W at: __kmalloc+0x47/0x310 pcpu_extend_area_map+0x2b/0xc0 pcpu_alloc+0x3ec/0x5e0 alloc_kmem_cache_cpus.isra.70+0x25/0xa0 __do_tune_cpucache+0x2c/0x220 do_tune_cpucache+0x26/0xc0 enable_cpucache+0x6d/0xf0 
__kmem_cache_create+0x1bf/0x390 create_cache+0xba/0x1b0 kmem_cache_create+0x1f8/0x2b0 ksm_init+0x6f/0x19d do_one_initcall+0x50/0x1b0 kernel_init_freeable+0x201/0x289 kernel_init+0xa/0x100 ret_from_fork+0x3a/0x50 INITIAL USE at: __mutex_lock+0x4e/0x8c0 pcpu_alloc+0x1ac/0x5e0 alloc_kmem_cache_cpus.isra.70+0x25/0xa0 setup_cpu_cache+0x2f/0x1f0 __kmem_cache_create+0x1bf/0x390 create_boot_cache+0x8b/0xb1 kmem_cache_init+0xa1/0x19e start_kernel+0x270/0x4cb x86_64_start_kernel+0x127/0x134 secondary_startup_64+0xa5/0xb0 } ... key at: [<ffffffff821d8e70>] pcpu_alloc_mutex+0x70/0xa0 ... acquired at: pcpu_alloc+0x1ac/0x5e0 __percpu_counter_init+0x4e/0xb0 btrfs_init_fs_root+0x99/0x1c0 [btrfs] btrfs_get_fs_root.part.54+0x5b/0x150 [btrfs] resolve_indirect_refs+0x130/0x830 [btrfs] find_parent_nodes+0x69e/0xff0 [btrfs] btrfs_find_all_roots_safe+0xa0/0x110 [btrfs] btrfs_find_all_roots+0x50/0x70 [btrfs] btrfs_qgroup_prepare_account_extents+0x53/0x90 [btrfs] btrfs_commit_transaction+0x3ce/0x9b0 [btrfs] transaction_kthread+0x176/0x1b0 [btrfs] kthread+0x102/0x140 ret_from_fork+0x3a/0x50 -> (&fs_info->commit_root_sem){++++..} ops: 1566382 { HARDIRQ-ON-W at: down_write+0x3e/0xa0 cache_block_group+0x287/0x420 [btrfs] find_free_extent+0x106c/0x12d0 [btrfs] btrfs_reserve_extent+0xd8/0x170 [btrfs] cow_file_range.isra.66+0x133/0x470 [btrfs] run_delalloc_range+0x121/0x410 [btrfs] writepage_delalloc.isra.50+0xfe/0x180 [btrfs] __extent_writepage+0x19a/0x360 [btrfs] extent_write_cache_pages.constprop.56+0x249/0x3e0 [btrfs] extent_writepages+0x4d/0x60 [btrfs] do_writepages+0x1a/0x70 __filemap_fdatawrite_range+0xa7/0xe0 btrfs_rename+0x5ee/0xdb0 [btrfs] vfs_rename+0x52a/0x7e0 SyS_rename+0x351/0x3b0 do_syscall_64+0x79/0x1e0 entry_SYSCALL_64_after_hwframe+0x42/0xb7 HARDIRQ-ON-R at: down_read+0x35/0x90 caching_thread+0x57/0x560 [btrfs] normal_work_helper+0x1c0/0x5e0 [btrfs] process_one_work+0x1e0/0x5c0 worker_thread+0x44/0x390 kthread+0x102/0x140 ret_from_fork+0x3a/0x50 SOFTIRQ-ON-W at: 
      down_write+0x3e/0xa0
      cache_block_group+0x287/0x420 [btrfs]
      find_free_extent+0x106c/0x12d0 [btrfs]
      btrfs_reserve_extent+0xd8/0x170 [btrfs]
      cow_file_range.isra.66+0x133/0x470 [btrfs]
      run_delalloc_range+0x121/0x410 [btrfs]
      writepage_delalloc.isra.50+0xfe/0x180 [btrfs]
      __extent_writepage+0x19a/0x360 [btrfs]
      extent_write_cache_pages.constprop.56+0x249/0x3e0 [btrfs]
      extent_writepages+0x4d/0x60 [btrfs]
      do_writepages+0x1a/0x70
      __filemap_fdatawrite_range+0xa7/0xe0
      btrfs_rename+0x5ee/0xdb0 [btrfs]
      vfs_rename+0x52a/0x7e0
      SyS_rename+0x351/0x3b0
      do_syscall_64+0x79/0x1e0
      entry_SYSCALL_64_after_hwframe+0x42/0xb7
    SOFTIRQ-ON-R at:
      down_read+0x35/0x90
      caching_thread+0x57/0x560 [btrfs]
      normal_work_helper+0x1c0/0x5e0 [btrfs]
      process_one_work+0x1e0/0x5c0
      worker_thread+0x44/0x390
      kthread+0x102/0x140
      ret_from_fork+0x3a/0x50
    INITIAL USE at:
      down_write+0x3e/0xa0
      cache_block_group+0x287/0x420 [btrfs]
      find_free_extent+0x106c/0x12d0 [btrfs]
      btrfs_reserve_extent+0xd8/0x170 [btrfs]
      cow_file_range.isra.66+0x133/0x470 [btrfs]
      run_delalloc_range+0x121/0x410 [btrfs]
      writepage_delalloc.isra.50+0xfe/0x180 [btrfs]
      __extent_writepage+0x19a/0x360 [btrfs]
      extent_write_cache_pages.constprop.56+0x249/0x3e0 [btrfs]
      extent_writepages+0x4d/0x60 [btrfs]
      do_writepages+0x1a/0x70
      __filemap_fdatawrite_range+0xa7/0xe0
      btrfs_rename+0x5ee/0xdb0 [btrfs]
      vfs_rename+0x52a/0x7e0
      SyS_rename+0x351/0x3b0
      do_syscall_64+0x79/0x1e0
      entry_SYSCALL_64_after_hwframe+0x42/0xb7
 }
 ... key at: [<ffffffffc0729578>] __key.61970+0x0/0xfffffffffff9aa88 [btrfs]
 ... acquired at:
    cache_block_group+0x287/0x420 [btrfs]
    find_free_extent+0x106c/0x12d0 [btrfs]
    btrfs_reserve_extent+0xd8/0x170 [btrfs]
    btrfs_alloc_tree_block+0x12f/0x4c0 [btrfs]
    btrfs_create_tree+0xbb/0x2a0 [btrfs]
    btrfs_create_uuid_tree+0x37/0x140 [btrfs]
    open_ctree+0x23c0/0x2660 [btrfs]
    btrfs_mount+0xd36/0xf90 [btrfs]
    mount_fs+0x3a/0x160
    vfs_kern_mount+0x66/0x150
    btrfs_mount+0x18c/0xf90 [btrfs]
    mount_fs+0x3a/0x160
    vfs_kern_mount+0x66/0x150
    do_mount+0x1c1/0xcc0
    SyS_mount+0x7e/0xd0
    do_syscall_64+0x79/0x1e0
    entry_SYSCALL_64_after_hwframe+0x42/0xb7

 -> (&found->groups_sem){++++..} ops: 2134587 {
    HARDIRQ-ON-W at:
      down_write+0x3e/0xa0
      __link_block_group+0x34/0x130 [btrfs]
      btrfs_read_block_groups+0x33d/0x7b0 [btrfs]
      open_ctree+0x2054/0x2660 [btrfs]
      btrfs_mount+0xd36/0xf90 [btrfs]
      mount_fs+0x3a/0x160
      vfs_kern_mount+0x66/0x150
      btrfs_mount+0x18c/0xf90 [btrfs]
      mount_fs+0x3a/0x160
      vfs_kern_mount+0x66/0x150
      do_mount+0x1c1/0xcc0
      SyS_mount+0x7e/0xd0
      do_syscall_64+0x79/0x1e0
      entry_SYSCALL_64_after_hwframe+0x42/0xb7
    HARDIRQ-ON-R at:
      down_read+0x35/0x90
      btrfs_calc_num_tolerated_disk_barrier_failures+0x113/0x1f0 [btrfs]
      open_ctree+0x207b/0x2660 [btrfs]
      btrfs_mount+0xd36/0xf90 [btrfs]
      mount_fs+0x3a/0x160
      vfs_kern_mount+0x66/0x150
      btrfs_mount+0x18c/0xf90 [btrfs]
      mount_fs+0x3a/0x160
      vfs_kern_mount+0x66/0x150
      do_mount+0x1c1/0xcc0
      SyS_mount+0x7e/0xd0
      do_syscall_64+0x79/0x1e0
      entry_SYSCALL_64_after_hwframe+0x42/0xb7
    SOFTIRQ-ON-W at:
      down_write+0x3e/0xa0
      __link_block_group+0x34/0x130 [btrfs]
      btrfs_read_block_groups+0x33d/0x7b0 [btrfs]
      open_ctree+0x2054/0x2660 [btrfs]
      btrfs_mount+0xd36/0xf90 [btrfs]
      mount_fs+0x3a/0x160
      vfs_kern_mount+0x66/0x150
      btrfs_mount+0x18c/0xf90 [btrfs]
      mount_fs+0x3a/0x160
      vfs_kern_mount+0x66/0x150
      do_mount+0x1c1/0xcc0
      SyS_mount+0x7e/0xd0
      do_syscall_64+0x79/0x1e0
      entry_SYSCALL_64_after_hwframe+0x42/0xb7
    SOFTIRQ-ON-R at:
      down_read+0x35/0x90
      btrfs_calc_num_tolerated_disk_barrier_failures+0x113/0x1f0 [btrfs]
      open_ctree+0x207b/0x2660 [btrfs]
      btrfs_mount+0xd36/0xf90 [btrfs]
      mount_fs+0x3a/0x160
      vfs_kern_mount+0x66/0x150
      btrfs_mount+0x18c/0xf90 [btrfs]
      mount_fs+0x3a/0x160
      vfs_kern_mount+0x66/0x150
      do_mount+0x1c1/0xcc0
      SyS_mount+0x7e/0xd0
      do_syscall_64+0x79/0x1e0
      entry_SYSCALL_64_after_hwframe+0x42/0xb7
    INITIAL USE at:
      down_write+0x3e/0xa0
      __link_block_group+0x34/0x130 [btrfs]
      btrfs_read_block_groups+0x33d/0x7b0 [btrfs]
      open_ctree+0x2054/0x2660 [btrfs]
      btrfs_mount+0xd36/0xf90 [btrfs]
      mount_fs+0x3a/0x160
      vfs_kern_mount+0x66/0x150
      btrfs_mount+0x18c/0xf90 [btrfs]
      mount_fs+0x3a/0x160
      vfs_kern_mount+0x66/0x150
      do_mount+0x1c1/0xcc0
      SyS_mount+0x7e/0xd0
      do_syscall_64+0x79/0x1e0
      entry_SYSCALL_64_after_hwframe+0x42/0xb7
 }
 ... key at: [<ffffffffc0729488>] __key.59101+0x0/0xfffffffffff9ab78 [btrfs]
 ... acquired at:
    find_free_extent+0xcb4/0x12d0 [btrfs]
    btrfs_reserve_extent+0xd8/0x170 [btrfs]
    btrfs_alloc_tree_block+0x12f/0x4c0 [btrfs]
    __btrfs_cow_block+0x110/0x5b0 [btrfs]
    btrfs_cow_block+0xd7/0x290 [btrfs]
    btrfs_search_slot+0x1f6/0x960 [btrfs]
    btrfs_lookup_inode+0x2a/0x90 [btrfs]
    __btrfs_update_delayed_inode+0x65/0x210 [btrfs]
    btrfs_commit_inode_delayed_inode+0x121/0x130 [btrfs]
    btrfs_evict_inode+0x3fe/0x6a0 [btrfs]
    evict+0xc4/0x190
    __dentry_kill+0xbf/0x170
    dput+0x2ae/0x2f0
    SyS_rename+0x2a6/0x3b0
    do_syscall_64+0x79/0x1e0
    entry_SYSCALL_64_after_hwframe+0x42/0xb7

 -> (&delayed_node->mutex){+.+.-.} ops: 5580204 {
    HARDIRQ-ON-W at:
      __mutex_lock+0x4e/0x8c0
      btrfs_delayed_update_inode+0x46/0x6e0 [btrfs]
      btrfs_update_inode+0x83/0x110 [btrfs]
      btrfs_dirty_inode+0x62/0xe0 [btrfs]
      touch_atime+0x8c/0xb0
      do_generic_file_read+0x818/0xb10
      __vfs_read+0xdc/0x150
      vfs_read+0x8a/0x130
      SyS_read+0x45/0xa0
      do_syscall_64+0x79/0x1e0
      entry_SYSCALL_64_after_hwframe+0x42/0xb7
    SOFTIRQ-ON-W at:
      __mutex_lock+0x4e/0x8c0
      btrfs_delayed_update_inode+0x46/0x6e0 [btrfs]
      btrfs_update_inode+0x83/0x110 [btrfs]
      btrfs_dirty_inode+0x62/0xe0 [btrfs]
      touch_atime+0x8c/0xb0
      do_generic_file_read+0x818/0xb10
      __vfs_read+0xdc/0x150
      vfs_read+0x8a/0x130
      SyS_read+0x45/0xa0
      do_syscall_64+0x79/0x1e0
      entry_SYSCALL_64_after_hwframe+0x42/0xb7
    IN-RECLAIM_FS-W at:
      __mutex_lock+0x4e/0x8c0
      __btrfs_release_delayed_node+0x3a/0x1f0 [btrfs]
      btrfs_evict_inode+0x22c/0x6a0 [btrfs]
      evict+0xc4/0x190
      dispose_list+0x35/0x50
      prune_icache_sb+0x42/0x50
      super_cache_scan+0x139/0x190
      shrink_slab+0x262/0x5b0
      shrink_node+0x2eb/0x2f0
      kswapd+0x2eb/0x890
      kthread+0x102/0x140
      ret_from_fork+0x3a/0x50
    INITIAL USE at:
      __mutex_lock+0x4e/0x8c0
      btrfs_delayed_update_inode+0x46/0x6e0 [btrfs]
      btrfs_update_inode+0x83/0x110 [btrfs]
      btrfs_dirty_inode+0x62/0xe0 [btrfs]
      touch_atime+0x8c/0xb0
      do_generic_file_read+0x818/0xb10
      __vfs_read+0xdc/0x150
      vfs_read+0x8a/0x130
      SyS_read+0x45/0xa0
      do_syscall_64+0x79/0x1e0
      entry_SYSCALL_64_after_hwframe+0x42/0xb7
 }
 ... key at: [<ffffffffc072d488>] __key.56935+0x0/0xfffffffffff96b78 [btrfs]
 ... acquired at:
    __lock_acquire+0x264/0x11c0
    lock_acquire+0xbd/0x1e0
    __mutex_lock+0x4e/0x8c0
    __btrfs_release_delayed_node+0x3a/0x1f0 [btrfs]
    btrfs_evict_inode+0x22c/0x6a0 [btrfs]
    evict+0xc4/0x190
    dispose_list+0x35/0x50
    prune_icache_sb+0x42/0x50
    super_cache_scan+0x139/0x190
    shrink_slab+0x262/0x5b0
    shrink_node+0x2eb/0x2f0
    kswapd+0x2eb/0x890
    kthread+0x102/0x140
    ret_from_fork+0x3a/0x50

stack backtrace:
CPU: 1 PID: 50 Comm: kswapd0 Tainted: G W 4.12.14-kvmsmall #8 SLE15 (unreleased)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
Call Trace:
 dump_stack+0x78/0xb7
 print_irq_inversion_bug.part.38+0x19f/0x1aa
 check_usage_forwards+0x102/0x120
 ? ret_from_fork+0x3a/0x50
 ? check_usage_backwards+0x110/0x110
 mark_lock+0x16c/0x270
 __lock_acquire+0x264/0x11c0
 ? pagevec_lookup_entries+0x1a/0x30
 ? truncate_inode_pages_range+0x2b3/0x7f0
 lock_acquire+0xbd/0x1e0
 ? __btrfs_release_delayed_node+0x3a/0x1f0 [btrfs]
 __mutex_lock+0x4e/0x8c0
 ? __btrfs_release_delayed_node+0x3a/0x1f0 [btrfs]
 ? __btrfs_release_delayed_node+0x3a/0x1f0 [btrfs]
 ? btrfs_evict_inode+0x1f6/0x6a0 [btrfs]
 __btrfs_release_delayed_node+0x3a/0x1f0 [btrfs]
 btrfs_evict_inode+0x22c/0x6a0 [btrfs]
 evict+0xc4/0x190
 dispose_list+0x35/0x50
 prune_icache_sb+0x42/0x50
 super_cache_scan+0x139/0x190
 shrink_slab+0x262/0x5b0
 shrink_node+0x2eb/0x2f0
 kswapd+0x2eb/0x890
 kthread+0x102/0x140
 ? mem_cgroup_shrink_node+0x2c0/0x2c0
 ? kthread_create_on_node+0x40/0x40
 ret_from_fork+0x3a/0x50

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
---
 fs/btrfs/disk-io.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 21f34ad0d411..eb6bb3169a9e 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1108,7 +1108,7 @@ static struct btrfs_subvolume_writers *btrfs_alloc_subvolume_writers(void)
 	if (!writers)
 		return ERR_PTR(-ENOMEM);
 
-	ret = percpu_counter_init(&writers->counter, 0, GFP_KERNEL);
+	ret = percpu_counter_init(&writers->counter, 0, GFP_NOFS);
 	if (ret < 0) {
 		kfree(writers);
 		return ERR_PTR(ret);
From: Jeff Mahoney <jeffm@suse.com>

While running btrfs/011, I hit the following lockdep splat.

This is the important bit:
   pcpu_alloc+0x1ac/0x5e0
   __percpu_counter_init+0x4e/0xb0
   btrfs_init_fs_root+0x99/0x1c0 [btrfs]
   btrfs_get_fs_root.part.54+0x5b/0x150 [btrfs]
   resolve_indirect_refs+0x130/0x830 [btrfs]
   find_parent_nodes+0x69e/0xff0 [btrfs]
   btrfs_find_all_roots_safe+0xa0/0x110 [btrfs]
   btrfs_find_all_roots+0x50/0x70 [btrfs]
   btrfs_qgroup_prepare_account_extents+0x53/0x90 [btrfs]
   btrfs_commit_transaction+0x3ce/0x9b0 [btrfs]

The percpu_counter_init call in btrfs_alloc_subvolume_writers
uses GFP_KERNEL, which we can't do during transaction commit.

This switches it to GFP_NOFS.
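
For readers less familiar with the flags involved, here is a rough userspace model of the difference between the two. It is only a sketch and not kernel code: the MODEL_* names, the bit values, and both helper functions are made up for illustration. The one thing taken from the kernel is the composition, where GFP_NOFS is GFP_KERNEL without the __GFP_FS bit, so an allocation made with it is not allowed to recurse into filesystem reclaim.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit values; the real ones differ between kernel versions. */
#define MODEL_GFP_IO      0x1u
#define MODEL_GFP_FS      0x2u
#define MODEL_GFP_RECLAIM 0x4u

/* Same composition as the kernel's GFP_KERNEL and GFP_NOFS. */
#define MODEL_GFP_KERNEL (MODEL_GFP_RECLAIM | MODEL_GFP_IO | MODEL_GFP_FS)
#define MODEL_GFP_NOFS   (MODEL_GFP_RECLAIM | MODEL_GFP_IO)

/* True if an allocation with these flags may call back into the filesystem. */
bool may_enter_fs_reclaim(unsigned int gfp)
{
	return (gfp & MODEL_GFP_FS) != 0;
}

/*
 * Simplified form of the rule from the commit message: an allocation made
 * during transaction commit must not enter filesystem reclaim.
 * This is an illustration only, not how lockdep itself decides.
 */
bool alloc_is_safe(unsigned int gfp, bool in_transaction_commit)
{
	return !(in_transaction_commit && may_enter_fs_reclaim(gfp));
}

/* Sanity check: the only difference between the two masks is the FS bit. */
void gfp_model_selfcheck(void)
{
	assert((MODEL_GFP_KERNEL & ~MODEL_GFP_FS) == MODEL_GFP_NOFS);
}
```

In this model, alloc_is_safe(MODEL_GFP_KERNEL, true) is false and alloc_is_safe(MODEL_GFP_NOFS, true) is true, which is the distinction the one-line change relies on.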