btrfs: fix lockdep splat in btrfs_alloc_subvolume_writers
diff mbox

Message ID 20180316183627.15476-1-jeffm@suse.com
State New
Headers show

Commit Message

Jeff Mahoney March 16, 2018, 6:36 p.m. UTC
From: Jeff Mahoney <jeffm@suse.com>

While running btrfs/011, I hit the following lockdep splat.

This is the important bit:
   pcpu_alloc+0x1ac/0x5e0
   __percpu_counter_init+0x4e/0xb0
   btrfs_init_fs_root+0x99/0x1c0 [btrfs]
   btrfs_get_fs_root.part.54+0x5b/0x150 [btrfs]
   resolve_indirect_refs+0x130/0x830 [btrfs]
   find_parent_nodes+0x69e/0xff0 [btrfs]
   btrfs_find_all_roots_safe+0xa0/0x110 [btrfs]
   btrfs_find_all_roots+0x50/0x70 [btrfs]
   btrfs_qgroup_prepare_account_extents+0x53/0x90 [btrfs]
   btrfs_commit_transaction+0x3ce/0x9b0 [btrfs]

The percpu_counter_init call in btrfs_alloc_subvolume_writers
uses GFP_KERNEL, which we can't do during transaction commit.

This switches it to GFP_NOFS.

Comments

Liu Bo March 16, 2018, 6:45 p.m. UTC | #1
On Fri, Mar 16, 2018 at 11:36 AM,  <jeffm@suse.com> wrote:
> From: Jeff Mahoney <jeffm@suse.com>
>
> While running btrfs/011, I hit the following lockdep splat.
>
> This is the important bit:
>    pcpu_alloc+0x1ac/0x5e0
>    __percpu_counter_init+0x4e/0xb0
>    btrfs_init_fs_root+0x99/0x1c0 [btrfs]
>    btrfs_get_fs_root.part.54+0x5b/0x150 [btrfs]
>    resolve_indirect_refs+0x130/0x830 [btrfs]
>    find_parent_nodes+0x69e/0xff0 [btrfs]
>    btrfs_find_all_roots_safe+0xa0/0x110 [btrfs]
>    btrfs_find_all_roots+0x50/0x70 [btrfs]
>    btrfs_qgroup_prepare_account_extents+0x53/0x90 [btrfs]
>    btrfs_commit_transaction+0x3ce/0x9b0 [btrfs]
>
> The percpu_counter_init call in btrfs_alloc_subvolume_writers
> uses GFP_KERNEL, which we can't do during transaction commit.
>
> This switches it to GFP_NOFS.
>
> ========================================================
> WARNING: possible irq lock inversion dependency detected
> 4.12.14-kvmsmall #8 Tainted: G        W
> --------------------------------------------------------
> kswapd0/50 just changed the state of lock:
>  (&delayed_node->mutex){+.+.-.}, at: [<ffffffffc06994fa>] __btrfs_release_delayed_node+0x3a/0x1f0 [btrfs]
> but this lock took another, RECLAIM_FS-unsafe lock in the past:
>  (pcpu_alloc_mutex){+.+.+.}
>
> and interrupts could create inverse lock ordering between them.
>
> other info that might help us debug this:
> Chain exists of:
>   &delayed_node->mutex --> &found->groups_sem --> pcpu_alloc_mutex
>
>  Possible interrupt unsafe locking scenario:
>
>        CPU0                    CPU1
>        ----                    ----
>   lock(pcpu_alloc_mutex);
>                                local_irq_disable();
>                                lock(&delayed_node->mutex);
>                                lock(&found->groups_sem);
>   <Interrupt>
>     lock(&delayed_node->mutex);
>
>  *** DEADLOCK ***
>
> 2 locks held by kswapd0/50:
>  #0:  (shrinker_rwsem){++++..}, at: [<ffffffff811dc11f>] shrink_slab+0x7f/0x5b0
>  #1:  (&type->s_umount_key#30){+++++.}, at: [<ffffffff8126dec6>] trylock_super+0x16/0x50
>
> the shortest dependencies between 2nd lock and 1st lock:
>    -> (pcpu_alloc_mutex){+.+.+.} ops: 4904 {
>       HARDIRQ-ON-W at:
>                           __mutex_lock+0x4e/0x8c0
>                           pcpu_alloc+0x1ac/0x5e0
>                           alloc_kmem_cache_cpus.isra.70+0x25/0xa0
>                           __do_tune_cpucache+0x2c/0x220
>                           do_tune_cpucache+0x26/0xc0
>                           enable_cpucache+0x6d/0xf0
>                           kmem_cache_init_late+0x42/0x75
>                           start_kernel+0x343/0x4cb
>                           x86_64_start_kernel+0x127/0x134
>                           secondary_startup_64+0xa5/0xb0
>       SOFTIRQ-ON-W at:
>                           __mutex_lock+0x4e/0x8c0
>                           pcpu_alloc+0x1ac/0x5e0
>                           alloc_kmem_cache_cpus.isra.70+0x25/0xa0
>                           __do_tune_cpucache+0x2c/0x220
>                           do_tune_cpucache+0x26/0xc0
>                           enable_cpucache+0x6d/0xf0
>                           kmem_cache_init_late+0x42/0x75
>                           start_kernel+0x343/0x4cb
>                           x86_64_start_kernel+0x127/0x134
>                           secondary_startup_64+0xa5/0xb0
>       RECLAIM_FS-ON-W at:
>                              __kmalloc+0x47/0x310
>                              pcpu_extend_area_map+0x2b/0xc0
>                              pcpu_alloc+0x3ec/0x5e0
>                              alloc_kmem_cache_cpus.isra.70+0x25/0xa0
>                              __do_tune_cpucache+0x2c/0x220
>                              do_tune_cpucache+0x26/0xc0
>                              enable_cpucache+0x6d/0xf0
>                              __kmem_cache_create+0x1bf/0x390
>                              create_cache+0xba/0x1b0
>                              kmem_cache_create+0x1f8/0x2b0
>                              ksm_init+0x6f/0x19d
>                              do_one_initcall+0x50/0x1b0
>                              kernel_init_freeable+0x201/0x289
>                              kernel_init+0xa/0x100
>                              ret_from_fork+0x3a/0x50
>       INITIAL USE at:
>                          __mutex_lock+0x4e/0x8c0
>                          pcpu_alloc+0x1ac/0x5e0
>                          alloc_kmem_cache_cpus.isra.70+0x25/0xa0
>                          setup_cpu_cache+0x2f/0x1f0
>                          __kmem_cache_create+0x1bf/0x390
>                          create_boot_cache+0x8b/0xb1
>                          kmem_cache_init+0xa1/0x19e
>                          start_kernel+0x270/0x4cb
>                          x86_64_start_kernel+0x127/0x134
>                          secondary_startup_64+0xa5/0xb0
>     }
>     ... key      at: [<ffffffff821d8e70>] pcpu_alloc_mutex+0x70/0xa0
>     ... acquired at:
>    pcpu_alloc+0x1ac/0x5e0
>    __percpu_counter_init+0x4e/0xb0
>    btrfs_init_fs_root+0x99/0x1c0 [btrfs]
>    btrfs_get_fs_root.part.54+0x5b/0x150 [btrfs]
>    resolve_indirect_refs+0x130/0x830 [btrfs]
>    find_parent_nodes+0x69e/0xff0 [btrfs]
>    btrfs_find_all_roots_safe+0xa0/0x110 [btrfs]
>    btrfs_find_all_roots+0x50/0x70 [btrfs]
>    btrfs_qgroup_prepare_account_extents+0x53/0x90 [btrfs]
>    btrfs_commit_transaction+0x3ce/0x9b0 [btrfs]
>    transaction_kthread+0x176/0x1b0 [btrfs]
>    kthread+0x102/0x140
>    ret_from_fork+0x3a/0x50
>
>   -> (&fs_info->commit_root_sem){++++..} ops: 1566382 {
>      HARDIRQ-ON-W at:
>                         down_write+0x3e/0xa0
>                         cache_block_group+0x287/0x420 [btrfs]
>                         find_free_extent+0x106c/0x12d0 [btrfs]
>                         btrfs_reserve_extent+0xd8/0x170 [btrfs]
>                         cow_file_range.isra.66+0x133/0x470 [btrfs]
>                         run_delalloc_range+0x121/0x410 [btrfs]
>                         writepage_delalloc.isra.50+0xfe/0x180 [btrfs]
>                         __extent_writepage+0x19a/0x360 [btrfs]
>                         extent_write_cache_pages.constprop.56+0x249/0x3e0 [btrfs]
>                         extent_writepages+0x4d/0x60 [btrfs]
>                         do_writepages+0x1a/0x70
>                         __filemap_fdatawrite_range+0xa7/0xe0
>                         btrfs_rename+0x5ee/0xdb0 [btrfs]
>                         vfs_rename+0x52a/0x7e0
>                         SyS_rename+0x351/0x3b0
>                         do_syscall_64+0x79/0x1e0
>                         entry_SYSCALL_64_after_hwframe+0x42/0xb7
>      HARDIRQ-ON-R at:
>                         down_read+0x35/0x90
>                         caching_thread+0x57/0x560 [btrfs]
>                         normal_work_helper+0x1c0/0x5e0 [btrfs]
>                         process_one_work+0x1e0/0x5c0
>                         worker_thread+0x44/0x390
>                         kthread+0x102/0x140
>                         ret_from_fork+0x3a/0x50
>      SOFTIRQ-ON-W at:
>                         down_write+0x3e/0xa0
>                         cache_block_group+0x287/0x420 [btrfs]
>                         find_free_extent+0x106c/0x12d0 [btrfs]
>                         btrfs_reserve_extent+0xd8/0x170 [btrfs]
>                         cow_file_range.isra.66+0x133/0x470 [btrfs]
>                         run_delalloc_range+0x121/0x410 [btrfs]
>                         writepage_delalloc.isra.50+0xfe/0x180 [btrfs]
>                         __extent_writepage+0x19a/0x360 [btrfs]
>                         extent_write_cache_pages.constprop.56+0x249/0x3e0 [btrfs]
>                         extent_writepages+0x4d/0x60 [btrfs]
>                         do_writepages+0x1a/0x70
>                         __filemap_fdatawrite_range+0xa7/0xe0
>                         btrfs_rename+0x5ee/0xdb0 [btrfs]
>                         vfs_rename+0x52a/0x7e0
>                         SyS_rename+0x351/0x3b0
>                         do_syscall_64+0x79/0x1e0
>                         entry_SYSCALL_64_after_hwframe+0x42/0xb7
>      SOFTIRQ-ON-R at:
>                         down_read+0x35/0x90
>                         caching_thread+0x57/0x560 [btrfs]
>                         normal_work_helper+0x1c0/0x5e0 [btrfs]
>                         process_one_work+0x1e0/0x5c0
>                         worker_thread+0x44/0x390
>                         kthread+0x102/0x140
>                         ret_from_fork+0x3a/0x50
>      INITIAL USE at:
>                        down_write+0x3e/0xa0
>                        cache_block_group+0x287/0x420 [btrfs]
>                        find_free_extent+0x106c/0x12d0 [btrfs]
>                        btrfs_reserve_extent+0xd8/0x170 [btrfs]
>                        cow_file_range.isra.66+0x133/0x470 [btrfs]
>                        run_delalloc_range+0x121/0x410 [btrfs]
>                        writepage_delalloc.isra.50+0xfe/0x180 [btrfs]
>                        __extent_writepage+0x19a/0x360 [btrfs]
>                        extent_write_cache_pages.constprop.56+0x249/0x3e0 [btrfs]
>                        extent_writepages+0x4d/0x60 [btrfs]
>                        do_writepages+0x1a/0x70
>                        __filemap_fdatawrite_range+0xa7/0xe0
>                        btrfs_rename+0x5ee/0xdb0 [btrfs]
>                        vfs_rename+0x52a/0x7e0
>                        SyS_rename+0x351/0x3b0
>                        do_syscall_64+0x79/0x1e0
>                        entry_SYSCALL_64_after_hwframe+0x42/0xb7
>    }
>    ... key      at: [<ffffffffc0729578>] __key.61970+0x0/0xfffffffffff9aa88 [btrfs]
>    ... acquired at:
>    cache_block_group+0x287/0x420 [btrfs]
>    find_free_extent+0x106c/0x12d0 [btrfs]
>    btrfs_reserve_extent+0xd8/0x170 [btrfs]
>    btrfs_alloc_tree_block+0x12f/0x4c0 [btrfs]
>    btrfs_create_tree+0xbb/0x2a0 [btrfs]
>    btrfs_create_uuid_tree+0x37/0x140 [btrfs]
>    open_ctree+0x23c0/0x2660 [btrfs]
>    btrfs_mount+0xd36/0xf90 [btrfs]
>    mount_fs+0x3a/0x160
>    vfs_kern_mount+0x66/0x150
>    btrfs_mount+0x18c/0xf90 [btrfs]
>    mount_fs+0x3a/0x160
>    vfs_kern_mount+0x66/0x150
>    do_mount+0x1c1/0xcc0
>    SyS_mount+0x7e/0xd0
>    do_syscall_64+0x79/0x1e0
>    entry_SYSCALL_64_after_hwframe+0x42/0xb7
>
>  -> (&found->groups_sem){++++..} ops: 2134587 {
>     HARDIRQ-ON-W at:
>                       down_write+0x3e/0xa0
>                       __link_block_group+0x34/0x130 [btrfs]
>                       btrfs_read_block_groups+0x33d/0x7b0 [btrfs]
>                       open_ctree+0x2054/0x2660 [btrfs]
>                       btrfs_mount+0xd36/0xf90 [btrfs]
>                       mount_fs+0x3a/0x160
>                       vfs_kern_mount+0x66/0x150
>                       btrfs_mount+0x18c/0xf90 [btrfs]
>                       mount_fs+0x3a/0x160
>                       vfs_kern_mount+0x66/0x150
>                       do_mount+0x1c1/0xcc0
>                       SyS_mount+0x7e/0xd0
>                       do_syscall_64+0x79/0x1e0
>                       entry_SYSCALL_64_after_hwframe+0x42/0xb7
>     HARDIRQ-ON-R at:
>                       down_read+0x35/0x90
>                       btrfs_calc_num_tolerated_disk_barrier_failures+0x113/0x1f0 [btrfs]
>                       open_ctree+0x207b/0x2660 [btrfs]
>                       btrfs_mount+0xd36/0xf90 [btrfs]
>                       mount_fs+0x3a/0x160
>                       vfs_kern_mount+0x66/0x150
>                       btrfs_mount+0x18c/0xf90 [btrfs]
>                       mount_fs+0x3a/0x160
>                       vfs_kern_mount+0x66/0x150
>                       do_mount+0x1c1/0xcc0
>                       SyS_mount+0x7e/0xd0
>                       do_syscall_64+0x79/0x1e0
>                       entry_SYSCALL_64_after_hwframe+0x42/0xb7
>     SOFTIRQ-ON-W at:
>                       down_write+0x3e/0xa0
>                       __link_block_group+0x34/0x130 [btrfs]
>                       btrfs_read_block_groups+0x33d/0x7b0 [btrfs]
>                       open_ctree+0x2054/0x2660 [btrfs]
>                       btrfs_mount+0xd36/0xf90 [btrfs]
>                       mount_fs+0x3a/0x160
>                       vfs_kern_mount+0x66/0x150
>                       btrfs_mount+0x18c/0xf90 [btrfs]
>                       mount_fs+0x3a/0x160
>                       vfs_kern_mount+0x66/0x150
>                       do_mount+0x1c1/0xcc0
>                       SyS_mount+0x7e/0xd0
>                       do_syscall_64+0x79/0x1e0
>                       entry_SYSCALL_64_after_hwframe+0x42/0xb7
>     SOFTIRQ-ON-R at:
>                       down_read+0x35/0x90
>                       btrfs_calc_num_tolerated_disk_barrier_failures+0x113/0x1f0 [btrfs]
>                       open_ctree+0x207b/0x2660 [btrfs]
>                       btrfs_mount+0xd36/0xf90 [btrfs]
>                       mount_fs+0x3a/0x160
>                       vfs_kern_mount+0x66/0x150
>                       btrfs_mount+0x18c/0xf90 [btrfs]
>                       mount_fs+0x3a/0x160
>                       vfs_kern_mount+0x66/0x150
>                       do_mount+0x1c1/0xcc0
>                       SyS_mount+0x7e/0xd0
>                       do_syscall_64+0x79/0x1e0
>                       entry_SYSCALL_64_after_hwframe+0x42/0xb7
>     INITIAL USE at:
>                      down_write+0x3e/0xa0
>                      __link_block_group+0x34/0x130 [btrfs]
>                      btrfs_read_block_groups+0x33d/0x7b0 [btrfs]
>                      open_ctree+0x2054/0x2660 [btrfs]
>                      btrfs_mount+0xd36/0xf90 [btrfs]
>                      mount_fs+0x3a/0x160
>                      vfs_kern_mount+0x66/0x150
>                      btrfs_mount+0x18c/0xf90 [btrfs]
>                      mount_fs+0x3a/0x160
>                      vfs_kern_mount+0x66/0x150
>                      do_mount+0x1c1/0xcc0
>                      SyS_mount+0x7e/0xd0
>                      do_syscall_64+0x79/0x1e0
>                      entry_SYSCALL_64_after_hwframe+0x42/0xb7
>   }
>   ... key      at: [<ffffffffc0729488>] __key.59101+0x0/0xfffffffffff9ab78 [btrfs]
>   ... acquired at:
>    find_free_extent+0xcb4/0x12d0 [btrfs]
>    btrfs_reserve_extent+0xd8/0x170 [btrfs]
>    btrfs_alloc_tree_block+0x12f/0x4c0 [btrfs]
>    __btrfs_cow_block+0x110/0x5b0 [btrfs]
>    btrfs_cow_block+0xd7/0x290 [btrfs]
>    btrfs_search_slot+0x1f6/0x960 [btrfs]
>    btrfs_lookup_inode+0x2a/0x90 [btrfs]
>    __btrfs_update_delayed_inode+0x65/0x210 [btrfs]
>    btrfs_commit_inode_delayed_inode+0x121/0x130 [btrfs]
>    btrfs_evict_inode+0x3fe/0x6a0 [btrfs]
>    evict+0xc4/0x190
>    __dentry_kill+0xbf/0x170
>    dput+0x2ae/0x2f0
>    SyS_rename+0x2a6/0x3b0
>    do_syscall_64+0x79/0x1e0
>    entry_SYSCALL_64_after_hwframe+0x42/0xb7
>
> -> (&delayed_node->mutex){+.+.-.} ops: 5580204 {
>    HARDIRQ-ON-W at:
>                     __mutex_lock+0x4e/0x8c0
>                     btrfs_delayed_update_inode+0x46/0x6e0 [btrfs]
>                     btrfs_update_inode+0x83/0x110 [btrfs]
>                     btrfs_dirty_inode+0x62/0xe0 [btrfs]
>                     touch_atime+0x8c/0xb0
>                     do_generic_file_read+0x818/0xb10
>                     __vfs_read+0xdc/0x150
>                     vfs_read+0x8a/0x130
>                     SyS_read+0x45/0xa0
>                     do_syscall_64+0x79/0x1e0
>                     entry_SYSCALL_64_after_hwframe+0x42/0xb7
>    SOFTIRQ-ON-W at:
>                     __mutex_lock+0x4e/0x8c0
>                     btrfs_delayed_update_inode+0x46/0x6e0 [btrfs]
>                     btrfs_update_inode+0x83/0x110 [btrfs]
>                     btrfs_dirty_inode+0x62/0xe0 [btrfs]
>                     touch_atime+0x8c/0xb0
>                     do_generic_file_read+0x818/0xb10
>                     __vfs_read+0xdc/0x150
>                     vfs_read+0x8a/0x130
>                     SyS_read+0x45/0xa0
>                     do_syscall_64+0x79/0x1e0
>                     entry_SYSCALL_64_after_hwframe+0x42/0xb7
>    IN-RECLAIM_FS-W at:
>                        __mutex_lock+0x4e/0x8c0
>                        __btrfs_release_delayed_node+0x3a/0x1f0 [btrfs]
>                        btrfs_evict_inode+0x22c/0x6a0 [btrfs]
>                        evict+0xc4/0x190
>                        dispose_list+0x35/0x50
>                        prune_icache_sb+0x42/0x50
>                        super_cache_scan+0x139/0x190
>                        shrink_slab+0x262/0x5b0
>                        shrink_node+0x2eb/0x2f0
>                        kswapd+0x2eb/0x890
>                        kthread+0x102/0x140
>                        ret_from_fork+0x3a/0x50
>    INITIAL USE at:
>                    __mutex_lock+0x4e/0x8c0
>                    btrfs_delayed_update_inode+0x46/0x6e0 [btrfs]
>                    btrfs_update_inode+0x83/0x110 [btrfs]
>                    btrfs_dirty_inode+0x62/0xe0 [btrfs]
>                    touch_atime+0x8c/0xb0
>                    do_generic_file_read+0x818/0xb10
>                    __vfs_read+0xdc/0x150
>                    vfs_read+0x8a/0x130
>                    SyS_read+0x45/0xa0
>                    do_syscall_64+0x79/0x1e0
>                    entry_SYSCALL_64_after_hwframe+0x42/0xb7
>  }
>  ... key      at: [<ffffffffc072d488>] __key.56935+0x0/0xfffffffffff96b78 [btrfs]
>  ... acquired at:
>    __lock_acquire+0x264/0x11c0
>    lock_acquire+0xbd/0x1e0
>    __mutex_lock+0x4e/0x8c0
>    __btrfs_release_delayed_node+0x3a/0x1f0 [btrfs]
>    btrfs_evict_inode+0x22c/0x6a0 [btrfs]
>    evict+0xc4/0x190
>    dispose_list+0x35/0x50
>    prune_icache_sb+0x42/0x50
>    super_cache_scan+0x139/0x190
>    shrink_slab+0x262/0x5b0
>    shrink_node+0x2eb/0x2f0
>    kswapd+0x2eb/0x890
>    kthread+0x102/0x140
>    ret_from_fork+0x3a/0x50
>
> stack backtrace:
> CPU: 1 PID: 50 Comm: kswapd0 Tainted: G        W        4.12.14-kvmsmall #8 SLE15 (unreleased)
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
> Call Trace:
>  dump_stack+0x78/0xb7
>  print_irq_inversion_bug.part.38+0x19f/0x1aa
>  check_usage_forwards+0x102/0x120
>  ? ret_from_fork+0x3a/0x50
>  ? check_usage_backwards+0x110/0x110
>  mark_lock+0x16c/0x270
>  __lock_acquire+0x264/0x11c0
>  ? pagevec_lookup_entries+0x1a/0x30
>  ? truncate_inode_pages_range+0x2b3/0x7f0
>  lock_acquire+0xbd/0x1e0
>  ? __btrfs_release_delayed_node+0x3a/0x1f0 [btrfs]
>  __mutex_lock+0x4e/0x8c0
>  ? __btrfs_release_delayed_node+0x3a/0x1f0 [btrfs]
>  ? __btrfs_release_delayed_node+0x3a/0x1f0 [btrfs]
>  ? btrfs_evict_inode+0x1f6/0x6a0 [btrfs]
>  __btrfs_release_delayed_node+0x3a/0x1f0 [btrfs]
>  btrfs_evict_inode+0x22c/0x6a0 [btrfs]
>  evict+0xc4/0x190
>  dispose_list+0x35/0x50
>  prune_icache_sb+0x42/0x50
>  super_cache_scan+0x139/0x190
>  shrink_slab+0x262/0x5b0
>  shrink_node+0x2eb/0x2f0
>  kswapd+0x2eb/0x890
>  kthread+0x102/0x140
>  ? mem_cgroup_shrink_node+0x2c0/0x2c0
>  ? kthread_create_on_node+0x40/0x40
>  ret_from_fork+0x3a/0x50
>

Looks OK to me.

Reviewed-by: Liu Bo <bo.liu@linux.alibaba.com>

thanks,
liubo

> Signed-off-by: Jeff Mahoney <jeffm@suse.com>
> ---
>  fs/btrfs/disk-io.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 21f34ad0d411..eb6bb3169a9e 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -1108,7 +1108,7 @@ static struct btrfs_subvolume_writers *btrfs_alloc_subvolume_writers(void)
>         if (!writers)
>                 return ERR_PTR(-ENOMEM);
>
> -       ret = percpu_counter_init(&writers->counter, 0, GFP_KERNEL);
> +       ret = percpu_counter_init(&writers->counter, 0, GFP_NOFS);
>         if (ret < 0) {
>                 kfree(writers);
>                 return ERR_PTR(ret);
> --
> 2.15.1
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Nikolay Borisov March 16, 2018, 6:48 p.m. UTC | #2
On 16.03.2018 20:36, jeffm@suse.com wrote:
> From: Jeff Mahoney <jeffm@suse.com>
> 
> While running btrfs/011, I hit the following lockdep splat.
> 
> This is the important bit:
>    pcpu_alloc+0x1ac/0x5e0
>    __percpu_counter_init+0x4e/0xb0
>    btrfs_init_fs_root+0x99/0x1c0 [btrfs]
>    btrfs_get_fs_root.part.54+0x5b/0x150 [btrfs]
>    resolve_indirect_refs+0x130/0x830 [btrfs]
>    find_parent_nodes+0x69e/0xff0 [btrfs]
>    btrfs_find_all_roots_safe+0xa0/0x110 [btrfs]
>    btrfs_find_all_roots+0x50/0x70 [btrfs]
>    btrfs_qgroup_prepare_account_extents+0x53/0x90 [btrfs]
>    btrfs_commit_transaction+0x3ce/0x9b0 [btrfs]
> 
> The percpu_counter_init call in btrfs_alloc_subvolume_writers
> uses GFP_KERNEL, which we can't do during transaction commit.
> 
> This switches it to GFP_NOFS.

Given there is effort underway to actually kill GFP_NOFS and replace it
with the context annotation routines, shouldn't instead use those
routines directly ?

<snip>
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jeff Mahoney March 16, 2018, 6:54 p.m. UTC | #3
On 3/16/18 2:48 PM, Nikolay Borisov wrote:
> 
> 
> On 16.03.2018 20:36, jeffm@suse.com wrote:
>> From: Jeff Mahoney <jeffm@suse.com>
>>
>> While running btrfs/011, I hit the following lockdep splat.
>>
>> This is the important bit:
>>    pcpu_alloc+0x1ac/0x5e0
>>    __percpu_counter_init+0x4e/0xb0
>>    btrfs_init_fs_root+0x99/0x1c0 [btrfs]
>>    btrfs_get_fs_root.part.54+0x5b/0x150 [btrfs]
>>    resolve_indirect_refs+0x130/0x830 [btrfs]
>>    find_parent_nodes+0x69e/0xff0 [btrfs]
>>    btrfs_find_all_roots_safe+0xa0/0x110 [btrfs]
>>    btrfs_find_all_roots+0x50/0x70 [btrfs]
>>    btrfs_qgroup_prepare_account_extents+0x53/0x90 [btrfs]
>>    btrfs_commit_transaction+0x3ce/0x9b0 [btrfs]
>>
>> The percpu_counter_init call in btrfs_alloc_subvolume_writers
>> uses GFP_KERNEL, which we can't do during transaction commit.
>>
>> This switches it to GFP_NOFS.
> 
> Given there is effort underway to actually kill GFP_NOFS and replace it
> with the context annotation routines, shouldn't instead use those
> routines directly ?

I don't think those have landed yet.  When they do, it should obsolete
the gfp flags here in any context since we can also read roots from code
that doesn't need GFP_NOFS.

-Jeff
David Sterba March 16, 2018, 8:12 p.m. UTC | #4
On Fri, Mar 16, 2018 at 02:36:27PM -0400, jeffm@suse.com wrote:
> From: Jeff Mahoney <jeffm@suse.com>
> 
> While running btrfs/011, I hit the following lockdep splat.
> 
> This is the important bit:
>    pcpu_alloc+0x1ac/0x5e0
>    __percpu_counter_init+0x4e/0xb0
>    btrfs_init_fs_root+0x99/0x1c0 [btrfs]
>    btrfs_get_fs_root.part.54+0x5b/0x150 [btrfs]
>    resolve_indirect_refs+0x130/0x830 [btrfs]
>    find_parent_nodes+0x69e/0xff0 [btrfs]
>    btrfs_find_all_roots_safe+0xa0/0x110 [btrfs]
>    btrfs_find_all_roots+0x50/0x70 [btrfs]
>    btrfs_qgroup_prepare_account_extents+0x53/0x90 [btrfs]
>    btrfs_commit_transaction+0x3ce/0x9b0 [btrfs]
> 
> The percpu_counter_init call in btrfs_alloc_subvolume_writers
> uses GFP_KERNEL, which we can't do during transaction commit.
> 
> This switches it to GFP_NOFS.

> Signed-off-by: Jeff Mahoney <jeffm@suse.com>
> ---
>  fs/btrfs/disk-io.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 21f34ad0d411..eb6bb3169a9e 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -1108,7 +1108,7 @@ static struct btrfs_subvolume_writers *btrfs_alloc_subvolume_writers(void)
>  	if (!writers)
>  		return ERR_PTR(-ENOMEM);
>  
> -	ret = percpu_counter_init(&writers->counter, 0, GFP_KERNEL);
> +	ret = percpu_counter_init(&writers->counter, 0, GFP_NOFS);

A line above the diff context is another allocation that does GFP_NOFS,
so one of the gfp flags were wrong.

Looks like there's another instance where percpu allocates with
GFP_KERNEL: create_space_info that can be called from the path that
allocates chunks, so this also looks like a NOFS candidate.

And in the same function, there's another indirect and hidden GFP_KERNEL
allocation from kobject_init_and_add. So in this case we can't fix all
the gfp problems at the call site and will have to use the scoped
approach eventually.

I haven't found any instance of such lockdep reports in my logs (over a
long period), so it's quite unlikely to end up in the recursive
allocation.

Patch added to next, thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jeff Mahoney March 16, 2018, 9:21 p.m. UTC | #5
On 3/16/18 4:12 PM, David Sterba wrote:
> On Fri, Mar 16, 2018 at 02:36:27PM -0400, jeffm@suse.com wrote:
>> From: Jeff Mahoney <jeffm@suse.com>
>>
>> While running btrfs/011, I hit the following lockdep splat.
>>
>> This is the important bit:
>>    pcpu_alloc+0x1ac/0x5e0
>>    __percpu_counter_init+0x4e/0xb0
>>    btrfs_init_fs_root+0x99/0x1c0 [btrfs]
>>    btrfs_get_fs_root.part.54+0x5b/0x150 [btrfs]
>>    resolve_indirect_refs+0x130/0x830 [btrfs]
>>    find_parent_nodes+0x69e/0xff0 [btrfs]
>>    btrfs_find_all_roots_safe+0xa0/0x110 [btrfs]
>>    btrfs_find_all_roots+0x50/0x70 [btrfs]
>>    btrfs_qgroup_prepare_account_extents+0x53/0x90 [btrfs]
>>    btrfs_commit_transaction+0x3ce/0x9b0 [btrfs]
>>
>> The percpu_counter_init call in btrfs_alloc_subvolume_writers
>> uses GFP_KERNEL, which we can't do during transaction commit.
>>
>> This switches it to GFP_NOFS.
> 
>> Signed-off-by: Jeff Mahoney <jeffm@suse.com>
>> ---
>>  fs/btrfs/disk-io.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>> index 21f34ad0d411..eb6bb3169a9e 100644
>> --- a/fs/btrfs/disk-io.c
>> +++ b/fs/btrfs/disk-io.c
>> @@ -1108,7 +1108,7 @@ static struct btrfs_subvolume_writers *btrfs_alloc_subvolume_writers(void)
>>  	if (!writers)
>>  		return ERR_PTR(-ENOMEM);
>>  
>> -	ret = percpu_counter_init(&writers->counter, 0, GFP_KERNEL);
>> +	ret = percpu_counter_init(&writers->counter, 0, GFP_NOFS);
> 
> A line above the diff context is another allocation that does GFP_NOFS,
> so one of the gfp flags were wrong.

This one was wrong.  It was initially implicitly GFP_KERNEL until Tejun
added the gfp_t argument and used GFP_KERNEL for most of the sites.
Since that was effectively a no-op, it was the right thing for him to do
without asking every subsystem maintainer their preference.

> Looks like there's another instance where percpu allocates with
> GFP_KERNEL: create_space_info that can be called from the path that
> allocates chunks, so this also looks like a NOFS candidate.

That's probably for the same reason.

> And in the same function, there's another indirect and hidden GFP_KERNEL
> allocation from kobject_init_and_add. So in this case we can't fix all
> the gfp problems at the call site and will have to use the scoped
> approach eventually.

Yep.  That's not a huge barrier, though.  We can push the kobject_add
into a workqueue pretty easily.

> I haven't found any instance of such lockdep reports in my logs (over a
> long period), so it's quite unlikely to end up in the recursive
> allocation.
> 
> Patch added to next, thanks. 

When hunting to see if this had already been fixed, I did find two
reports.  One from Qu from April of last year and another from Mike
Galbraith in 2016.

-Jeff
Jeff Mahoney March 19, 2018, 5:52 p.m. UTC | #6
On 3/16/18 4:12 PM, David Sterba wrote:
> On Fri, Mar 16, 2018 at 02:36:27PM -0400, jeffm@suse.com wrote:
>> From: Jeff Mahoney <jeffm@suse.com>
>>
>> While running btrfs/011, I hit the following lockdep splat.
>>
>> This is the important bit:
>>    pcpu_alloc+0x1ac/0x5e0
>>    __percpu_counter_init+0x4e/0xb0
>>    btrfs_init_fs_root+0x99/0x1c0 [btrfs]
>>    btrfs_get_fs_root.part.54+0x5b/0x150 [btrfs]
>>    resolve_indirect_refs+0x130/0x830 [btrfs]
>>    find_parent_nodes+0x69e/0xff0 [btrfs]
>>    btrfs_find_all_roots_safe+0xa0/0x110 [btrfs]
>>    btrfs_find_all_roots+0x50/0x70 [btrfs]
>>    btrfs_qgroup_prepare_account_extents+0x53/0x90 [btrfs]
>>    btrfs_commit_transaction+0x3ce/0x9b0 [btrfs]
>>
>> The percpu_counter_init call in btrfs_alloc_subvolume_writers
>> uses GFP_KERNEL, which we can't do during transaction commit.
>>
>> This switches it to GFP_NOFS.
> 
>> Signed-off-by: Jeff Mahoney <jeffm@suse.com>
>> ---
>>  fs/btrfs/disk-io.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>> index 21f34ad0d411..eb6bb3169a9e 100644
>> --- a/fs/btrfs/disk-io.c
>> +++ b/fs/btrfs/disk-io.c
>> @@ -1108,7 +1108,7 @@ static struct btrfs_subvolume_writers *btrfs_alloc_subvolume_writers(void)
>>  	if (!writers)
>>  		return ERR_PTR(-ENOMEM);
>>  
>> -	ret = percpu_counter_init(&writers->counter, 0, GFP_KERNEL);
>> +	ret = percpu_counter_init(&writers->counter, 0, GFP_NOFS);
> 
> A line above the diff context is another allocation that does GFP_NOFS,
> so one of the gfp flags were wrong.
> 
> Looks like there's another instance where percpu allocates with
> GFP_KERNEL: create_space_info that can be called from the path that
> allocates chunks, so this also looks like a NOFS candidate.

We can get rid of this case entirely.  Those call sites should be
removed since the space_infos are all allocated at mount time.

-Jeff
David Sterba March 19, 2018, 6:08 p.m. UTC | #7
On Mon, Mar 19, 2018 at 01:52:05PM -0400, Jeff Mahoney wrote:
> On 3/16/18 4:12 PM, David Sterba wrote:
> > On Fri, Mar 16, 2018 at 02:36:27PM -0400, jeffm@suse.com wrote:
> >> From: Jeff Mahoney <jeffm@suse.com>
> >>
> >> While running btrfs/011, I hit the following lockdep splat.
> >>
> >> This is the important bit:
> >>    pcpu_alloc+0x1ac/0x5e0
> >>    __percpu_counter_init+0x4e/0xb0
> >>    btrfs_init_fs_root+0x99/0x1c0 [btrfs]
> >>    btrfs_get_fs_root.part.54+0x5b/0x150 [btrfs]
> >>    resolve_indirect_refs+0x130/0x830 [btrfs]
> >>    find_parent_nodes+0x69e/0xff0 [btrfs]
> >>    btrfs_find_all_roots_safe+0xa0/0x110 [btrfs]
> >>    btrfs_find_all_roots+0x50/0x70 [btrfs]
> >>    btrfs_qgroup_prepare_account_extents+0x53/0x90 [btrfs]
> >>    btrfs_commit_transaction+0x3ce/0x9b0 [btrfs]
> >>
> >> The percpu_counter_init call in btrfs_alloc_subvolume_writers
> >> uses GFP_KERNEL, which we can't do during transaction commit.
> >>
> >> This switches it to GFP_NOFS.
> > 
> >> Signed-off-by: Jeff Mahoney <jeffm@suse.com>
> >> ---
> >>  fs/btrfs/disk-io.c | 2 +-
> >>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> >> index 21f34ad0d411..eb6bb3169a9e 100644
> >> --- a/fs/btrfs/disk-io.c
> >> +++ b/fs/btrfs/disk-io.c
> >> @@ -1108,7 +1108,7 @@ static struct btrfs_subvolume_writers *btrfs_alloc_subvolume_writers(void)
> >>  	if (!writers)
> >>  		return ERR_PTR(-ENOMEM);
> >>  
> >> -	ret = percpu_counter_init(&writers->counter, 0, GFP_KERNEL);
> >> +	ret = percpu_counter_init(&writers->counter, 0, GFP_NOFS);
> > 
> > A line above the diff context is another allocation that does GFP_NOFS,
> > so one of the gfp flags were wrong.
> > 
> > Looks like there's another instance where percpu allocates with
> > GFP_KERNEL: create_space_info that can be called from the path that
> > allocates chunks, so this also looks like a NOFS candidate.
> 
> We can get rid of this case entirely.  Those call sites should be
> removed since the space_infos are all allocated at mount time.

That would be great and make a few things simpler. So this means that
__find_space_info never fails once the space infos are properly
initialized, right? That was my concern in do_chunk_alloc and
btrfs_make_block_group (that's called from __btrfs_alloc_chunk).

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jeff Mahoney March 19, 2018, 9:15 p.m. UTC | #8
On 3/19/18 2:08 PM, David Sterba wrote:
> On Mon, Mar 19, 2018 at 01:52:05PM -0400, Jeff Mahoney wrote:
>> On 3/16/18 4:12 PM, David Sterba wrote:
>>> On Fri, Mar 16, 2018 at 02:36:27PM -0400, jeffm@suse.com wrote:
>>>> From: Jeff Mahoney <jeffm@suse.com>
>>>>
>>>> While running btrfs/011, I hit the following lockdep splat.
>>>>
>>>> This is the important bit:
>>>>    pcpu_alloc+0x1ac/0x5e0
>>>>    __percpu_counter_init+0x4e/0xb0
>>>>    btrfs_init_fs_root+0x99/0x1c0 [btrfs]
>>>>    btrfs_get_fs_root.part.54+0x5b/0x150 [btrfs]
>>>>    resolve_indirect_refs+0x130/0x830 [btrfs]
>>>>    find_parent_nodes+0x69e/0xff0 [btrfs]
>>>>    btrfs_find_all_roots_safe+0xa0/0x110 [btrfs]
>>>>    btrfs_find_all_roots+0x50/0x70 [btrfs]
>>>>    btrfs_qgroup_prepare_account_extents+0x53/0x90 [btrfs]
>>>>    btrfs_commit_transaction+0x3ce/0x9b0 [btrfs]
>>>>
>>>> The percpu_counter_init call in btrfs_alloc_subvolume_writers
>>>> uses GFP_KERNEL, which we can't do during transaction commit.
>>>>
>>>> This switches it to GFP_NOFS.
>>>
>>>> Signed-off-by: Jeff Mahoney <jeffm@suse.com>
>>>> ---
>>>>  fs/btrfs/disk-io.c | 2 +-
>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>>>> index 21f34ad0d411..eb6bb3169a9e 100644
>>>> --- a/fs/btrfs/disk-io.c
>>>> +++ b/fs/btrfs/disk-io.c
>>>> @@ -1108,7 +1108,7 @@ static struct btrfs_subvolume_writers *btrfs_alloc_subvolume_writers(void)
>>>>  	if (!writers)
>>>>  		return ERR_PTR(-ENOMEM);
>>>>  
>>>> -	ret = percpu_counter_init(&writers->counter, 0, GFP_KERNEL);
>>>> +	ret = percpu_counter_init(&writers->counter, 0, GFP_NOFS);
>>>
>>> A line above the diff context is another allocation that does GFP_NOFS,
>>> so one of the gfp flags were wrong.
>>>
>>> Looks like there's another instance where percpu allocates with
>>> GFP_KERNEL: create_space_info that can be called from the path that
>>> allocates chunks, so this also looks like a NOFS candidate.
>>
>> We can get rid of this case entirely.  Those call sites should be
>> removed since the space_infos are all allocated at mount time.
> 
> That would be great and make a few things simpler. So this means that
> __find_space_info never fails once the space infos are properly
> initialized, right? That was my concern in do_chunk_alloc and
> btrfs_make_block_group (that's called from __btrfs_alloc_chunk).

That's a different case.  The raid levels are added when the first block
group of a particular read level is loaded up.  That can happen when the
block groups are read in initially, where it should be safe to use
GFP_KERNEL or when a chunk of a new type is allocated.  The thing is
that a chunk of a new type will only be allocated when we're converting
via balance, so we may be able to do the kobject_add for the raid level
when we start the balance rather than wait for it to create the block group.

-Jeff

Patch
diff mbox

========================================================
WARNING: possible irq lock inversion dependency detected
4.12.14-kvmsmall #8 Tainted: G        W
--------------------------------------------------------
kswapd0/50 just changed the state of lock:
 (&delayed_node->mutex){+.+.-.}, at: [<ffffffffc06994fa>] __btrfs_release_delayed_node+0x3a/0x1f0 [btrfs]
but this lock took another, RECLAIM_FS-unsafe lock in the past:
 (pcpu_alloc_mutex){+.+.+.}

and interrupts could create inverse lock ordering between them.

other info that might help us debug this:
Chain exists of:
  &delayed_node->mutex --> &found->groups_sem --> pcpu_alloc_mutex

 Possible interrupt unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(pcpu_alloc_mutex);
                               local_irq_disable();
                               lock(&delayed_node->mutex);
                               lock(&found->groups_sem);
  <Interrupt>
    lock(&delayed_node->mutex);

 *** DEADLOCK ***

2 locks held by kswapd0/50:
 #0:  (shrinker_rwsem){++++..}, at: [<ffffffff811dc11f>] shrink_slab+0x7f/0x5b0
 #1:  (&type->s_umount_key#30){+++++.}, at: [<ffffffff8126dec6>] trylock_super+0x16/0x50

the shortest dependencies between 2nd lock and 1st lock:
   -> (pcpu_alloc_mutex){+.+.+.} ops: 4904 {
      HARDIRQ-ON-W at:
                          __mutex_lock+0x4e/0x8c0
                          pcpu_alloc+0x1ac/0x5e0
                          alloc_kmem_cache_cpus.isra.70+0x25/0xa0
                          __do_tune_cpucache+0x2c/0x220
                          do_tune_cpucache+0x26/0xc0
                          enable_cpucache+0x6d/0xf0
                          kmem_cache_init_late+0x42/0x75
                          start_kernel+0x343/0x4cb
                          x86_64_start_kernel+0x127/0x134
                          secondary_startup_64+0xa5/0xb0
      SOFTIRQ-ON-W at:
                          __mutex_lock+0x4e/0x8c0
                          pcpu_alloc+0x1ac/0x5e0
                          alloc_kmem_cache_cpus.isra.70+0x25/0xa0
                          __do_tune_cpucache+0x2c/0x220
                          do_tune_cpucache+0x26/0xc0
                          enable_cpucache+0x6d/0xf0
                          kmem_cache_init_late+0x42/0x75
                          start_kernel+0x343/0x4cb
                          x86_64_start_kernel+0x127/0x134
                          secondary_startup_64+0xa5/0xb0
      RECLAIM_FS-ON-W at:
                             __kmalloc+0x47/0x310
                             pcpu_extend_area_map+0x2b/0xc0
                             pcpu_alloc+0x3ec/0x5e0
                             alloc_kmem_cache_cpus.isra.70+0x25/0xa0
                             __do_tune_cpucache+0x2c/0x220
                             do_tune_cpucache+0x26/0xc0
                             enable_cpucache+0x6d/0xf0
                             __kmem_cache_create+0x1bf/0x390
                             create_cache+0xba/0x1b0
                             kmem_cache_create+0x1f8/0x2b0
                             ksm_init+0x6f/0x19d
                             do_one_initcall+0x50/0x1b0
                             kernel_init_freeable+0x201/0x289
                             kernel_init+0xa/0x100
                             ret_from_fork+0x3a/0x50
      INITIAL USE at:
                         __mutex_lock+0x4e/0x8c0
                         pcpu_alloc+0x1ac/0x5e0
                         alloc_kmem_cache_cpus.isra.70+0x25/0xa0
                         setup_cpu_cache+0x2f/0x1f0
                         __kmem_cache_create+0x1bf/0x390
                         create_boot_cache+0x8b/0xb1
                         kmem_cache_init+0xa1/0x19e
                         start_kernel+0x270/0x4cb
                         x86_64_start_kernel+0x127/0x134
                         secondary_startup_64+0xa5/0xb0
    }
    ... key      at: [<ffffffff821d8e70>] pcpu_alloc_mutex+0x70/0xa0
    ... acquired at:
   pcpu_alloc+0x1ac/0x5e0
   __percpu_counter_init+0x4e/0xb0
   btrfs_init_fs_root+0x99/0x1c0 [btrfs]
   btrfs_get_fs_root.part.54+0x5b/0x150 [btrfs]
   resolve_indirect_refs+0x130/0x830 [btrfs]
   find_parent_nodes+0x69e/0xff0 [btrfs]
   btrfs_find_all_roots_safe+0xa0/0x110 [btrfs]
   btrfs_find_all_roots+0x50/0x70 [btrfs]
   btrfs_qgroup_prepare_account_extents+0x53/0x90 [btrfs]
   btrfs_commit_transaction+0x3ce/0x9b0 [btrfs]
   transaction_kthread+0x176/0x1b0 [btrfs]
   kthread+0x102/0x140
   ret_from_fork+0x3a/0x50

  -> (&fs_info->commit_root_sem){++++..} ops: 1566382 {
     HARDIRQ-ON-W at:
                        down_write+0x3e/0xa0
                        cache_block_group+0x287/0x420 [btrfs]
                        find_free_extent+0x106c/0x12d0 [btrfs]
                        btrfs_reserve_extent+0xd8/0x170 [btrfs]
                        cow_file_range.isra.66+0x133/0x470 [btrfs]
                        run_delalloc_range+0x121/0x410 [btrfs]
                        writepage_delalloc.isra.50+0xfe/0x180 [btrfs]
                        __extent_writepage+0x19a/0x360 [btrfs]
                        extent_write_cache_pages.constprop.56+0x249/0x3e0 [btrfs]
                        extent_writepages+0x4d/0x60 [btrfs]
                        do_writepages+0x1a/0x70
                        __filemap_fdatawrite_range+0xa7/0xe0
                        btrfs_rename+0x5ee/0xdb0 [btrfs]
                        vfs_rename+0x52a/0x7e0
                        SyS_rename+0x351/0x3b0
                        do_syscall_64+0x79/0x1e0
                        entry_SYSCALL_64_after_hwframe+0x42/0xb7
     HARDIRQ-ON-R at:
                        down_read+0x35/0x90
                        caching_thread+0x57/0x560 [btrfs]
                        normal_work_helper+0x1c0/0x5e0 [btrfs]
                        process_one_work+0x1e0/0x5c0
                        worker_thread+0x44/0x390
                        kthread+0x102/0x140
                        ret_from_fork+0x3a/0x50
     SOFTIRQ-ON-W at:
                        down_write+0x3e/0xa0
                        cache_block_group+0x287/0x420 [btrfs]
                        find_free_extent+0x106c/0x12d0 [btrfs]
                        btrfs_reserve_extent+0xd8/0x170 [btrfs]
                        cow_file_range.isra.66+0x133/0x470 [btrfs]
                        run_delalloc_range+0x121/0x410 [btrfs]
                        writepage_delalloc.isra.50+0xfe/0x180 [btrfs]
                        __extent_writepage+0x19a/0x360 [btrfs]
                        extent_write_cache_pages.constprop.56+0x249/0x3e0 [btrfs]
                        extent_writepages+0x4d/0x60 [btrfs]
                        do_writepages+0x1a/0x70
                        __filemap_fdatawrite_range+0xa7/0xe0
                        btrfs_rename+0x5ee/0xdb0 [btrfs]
                        vfs_rename+0x52a/0x7e0
                        SyS_rename+0x351/0x3b0
                        do_syscall_64+0x79/0x1e0
                        entry_SYSCALL_64_after_hwframe+0x42/0xb7
     SOFTIRQ-ON-R at:
                        down_read+0x35/0x90
                        caching_thread+0x57/0x560 [btrfs]
                        normal_work_helper+0x1c0/0x5e0 [btrfs]
                        process_one_work+0x1e0/0x5c0
                        worker_thread+0x44/0x390
                        kthread+0x102/0x140
                        ret_from_fork+0x3a/0x50
     INITIAL USE at:
                       down_write+0x3e/0xa0
                       cache_block_group+0x287/0x420 [btrfs]
                       find_free_extent+0x106c/0x12d0 [btrfs]
                       btrfs_reserve_extent+0xd8/0x170 [btrfs]
                       cow_file_range.isra.66+0x133/0x470 [btrfs]
                       run_delalloc_range+0x121/0x410 [btrfs]
                       writepage_delalloc.isra.50+0xfe/0x180 [btrfs]
                       __extent_writepage+0x19a/0x360 [btrfs]
                       extent_write_cache_pages.constprop.56+0x249/0x3e0 [btrfs]
                       extent_writepages+0x4d/0x60 [btrfs]
                       do_writepages+0x1a/0x70
                       __filemap_fdatawrite_range+0xa7/0xe0
                       btrfs_rename+0x5ee/0xdb0 [btrfs]
                       vfs_rename+0x52a/0x7e0
                       SyS_rename+0x351/0x3b0
                       do_syscall_64+0x79/0x1e0
                       entry_SYSCALL_64_after_hwframe+0x42/0xb7
   }
   ... key      at: [<ffffffffc0729578>] __key.61970+0x0/0xfffffffffff9aa88 [btrfs]
   ... acquired at:
   cache_block_group+0x287/0x420 [btrfs]
   find_free_extent+0x106c/0x12d0 [btrfs]
   btrfs_reserve_extent+0xd8/0x170 [btrfs]
   btrfs_alloc_tree_block+0x12f/0x4c0 [btrfs]
   btrfs_create_tree+0xbb/0x2a0 [btrfs]
   btrfs_create_uuid_tree+0x37/0x140 [btrfs]
   open_ctree+0x23c0/0x2660 [btrfs]
   btrfs_mount+0xd36/0xf90 [btrfs]
   mount_fs+0x3a/0x160
   vfs_kern_mount+0x66/0x150
   btrfs_mount+0x18c/0xf90 [btrfs]
   mount_fs+0x3a/0x160
   vfs_kern_mount+0x66/0x150
   do_mount+0x1c1/0xcc0
   SyS_mount+0x7e/0xd0
   do_syscall_64+0x79/0x1e0
   entry_SYSCALL_64_after_hwframe+0x42/0xb7

 -> (&found->groups_sem){++++..} ops: 2134587 {
    HARDIRQ-ON-W at:
                      down_write+0x3e/0xa0
                      __link_block_group+0x34/0x130 [btrfs]
                      btrfs_read_block_groups+0x33d/0x7b0 [btrfs]
                      open_ctree+0x2054/0x2660 [btrfs]
                      btrfs_mount+0xd36/0xf90 [btrfs]
                      mount_fs+0x3a/0x160
                      vfs_kern_mount+0x66/0x150
                      btrfs_mount+0x18c/0xf90 [btrfs]
                      mount_fs+0x3a/0x160
                      vfs_kern_mount+0x66/0x150
                      do_mount+0x1c1/0xcc0
                      SyS_mount+0x7e/0xd0
                      do_syscall_64+0x79/0x1e0
                      entry_SYSCALL_64_after_hwframe+0x42/0xb7
    HARDIRQ-ON-R at:
                      down_read+0x35/0x90
                      btrfs_calc_num_tolerated_disk_barrier_failures+0x113/0x1f0 [btrfs]
                      open_ctree+0x207b/0x2660 [btrfs]
                      btrfs_mount+0xd36/0xf90 [btrfs]
                      mount_fs+0x3a/0x160
                      vfs_kern_mount+0x66/0x150
                      btrfs_mount+0x18c/0xf90 [btrfs]
                      mount_fs+0x3a/0x160
                      vfs_kern_mount+0x66/0x150
                      do_mount+0x1c1/0xcc0
                      SyS_mount+0x7e/0xd0
                      do_syscall_64+0x79/0x1e0
                      entry_SYSCALL_64_after_hwframe+0x42/0xb7
    SOFTIRQ-ON-W at:
                      down_write+0x3e/0xa0
                      __link_block_group+0x34/0x130 [btrfs]
                      btrfs_read_block_groups+0x33d/0x7b0 [btrfs]
                      open_ctree+0x2054/0x2660 [btrfs]
                      btrfs_mount+0xd36/0xf90 [btrfs]
                      mount_fs+0x3a/0x160
                      vfs_kern_mount+0x66/0x150
                      btrfs_mount+0x18c/0xf90 [btrfs]
                      mount_fs+0x3a/0x160
                      vfs_kern_mount+0x66/0x150
                      do_mount+0x1c1/0xcc0
                      SyS_mount+0x7e/0xd0
                      do_syscall_64+0x79/0x1e0
                      entry_SYSCALL_64_after_hwframe+0x42/0xb7
    SOFTIRQ-ON-R at:
                      down_read+0x35/0x90
                      btrfs_calc_num_tolerated_disk_barrier_failures+0x113/0x1f0 [btrfs]
                      open_ctree+0x207b/0x2660 [btrfs]
                      btrfs_mount+0xd36/0xf90 [btrfs]
                      mount_fs+0x3a/0x160
                      vfs_kern_mount+0x66/0x150
                      btrfs_mount+0x18c/0xf90 [btrfs]
                      mount_fs+0x3a/0x160
                      vfs_kern_mount+0x66/0x150
                      do_mount+0x1c1/0xcc0
                      SyS_mount+0x7e/0xd0
                      do_syscall_64+0x79/0x1e0
                      entry_SYSCALL_64_after_hwframe+0x42/0xb7
    INITIAL USE at:
                     down_write+0x3e/0xa0
                     __link_block_group+0x34/0x130 [btrfs]
                     btrfs_read_block_groups+0x33d/0x7b0 [btrfs]
                     open_ctree+0x2054/0x2660 [btrfs]
                     btrfs_mount+0xd36/0xf90 [btrfs]
                     mount_fs+0x3a/0x160
                     vfs_kern_mount+0x66/0x150
                     btrfs_mount+0x18c/0xf90 [btrfs]
                     mount_fs+0x3a/0x160
                     vfs_kern_mount+0x66/0x150
                     do_mount+0x1c1/0xcc0
                     SyS_mount+0x7e/0xd0
                     do_syscall_64+0x79/0x1e0
                     entry_SYSCALL_64_after_hwframe+0x42/0xb7
  }
  ... key      at: [<ffffffffc0729488>] __key.59101+0x0/0xfffffffffff9ab78 [btrfs]
  ... acquired at:
   find_free_extent+0xcb4/0x12d0 [btrfs]
   btrfs_reserve_extent+0xd8/0x170 [btrfs]
   btrfs_alloc_tree_block+0x12f/0x4c0 [btrfs]
   __btrfs_cow_block+0x110/0x5b0 [btrfs]
   btrfs_cow_block+0xd7/0x290 [btrfs]
   btrfs_search_slot+0x1f6/0x960 [btrfs]
   btrfs_lookup_inode+0x2a/0x90 [btrfs]
   __btrfs_update_delayed_inode+0x65/0x210 [btrfs]
   btrfs_commit_inode_delayed_inode+0x121/0x130 [btrfs]
   btrfs_evict_inode+0x3fe/0x6a0 [btrfs]
   evict+0xc4/0x190
   __dentry_kill+0xbf/0x170
   dput+0x2ae/0x2f0
   SyS_rename+0x2a6/0x3b0
   do_syscall_64+0x79/0x1e0
   entry_SYSCALL_64_after_hwframe+0x42/0xb7

-> (&delayed_node->mutex){+.+.-.} ops: 5580204 {
   HARDIRQ-ON-W at:
                    __mutex_lock+0x4e/0x8c0
                    btrfs_delayed_update_inode+0x46/0x6e0 [btrfs]
                    btrfs_update_inode+0x83/0x110 [btrfs]
                    btrfs_dirty_inode+0x62/0xe0 [btrfs]
                    touch_atime+0x8c/0xb0
                    do_generic_file_read+0x818/0xb10
                    __vfs_read+0xdc/0x150
                    vfs_read+0x8a/0x130
                    SyS_read+0x45/0xa0
                    do_syscall_64+0x79/0x1e0
                    entry_SYSCALL_64_after_hwframe+0x42/0xb7
   SOFTIRQ-ON-W at:
                    __mutex_lock+0x4e/0x8c0
                    btrfs_delayed_update_inode+0x46/0x6e0 [btrfs]
                    btrfs_update_inode+0x83/0x110 [btrfs]
                    btrfs_dirty_inode+0x62/0xe0 [btrfs]
                    touch_atime+0x8c/0xb0
                    do_generic_file_read+0x818/0xb10
                    __vfs_read+0xdc/0x150
                    vfs_read+0x8a/0x130
                    SyS_read+0x45/0xa0
                    do_syscall_64+0x79/0x1e0
                    entry_SYSCALL_64_after_hwframe+0x42/0xb7
   IN-RECLAIM_FS-W at:
                       __mutex_lock+0x4e/0x8c0
                       __btrfs_release_delayed_node+0x3a/0x1f0 [btrfs]
                       btrfs_evict_inode+0x22c/0x6a0 [btrfs]
                       evict+0xc4/0x190
                       dispose_list+0x35/0x50
                       prune_icache_sb+0x42/0x50
                       super_cache_scan+0x139/0x190
                       shrink_slab+0x262/0x5b0
                       shrink_node+0x2eb/0x2f0
                       kswapd+0x2eb/0x890
                       kthread+0x102/0x140
                       ret_from_fork+0x3a/0x50
   INITIAL USE at:
                   __mutex_lock+0x4e/0x8c0
                   btrfs_delayed_update_inode+0x46/0x6e0 [btrfs]
                   btrfs_update_inode+0x83/0x110 [btrfs]
                   btrfs_dirty_inode+0x62/0xe0 [btrfs]
                   touch_atime+0x8c/0xb0
                   do_generic_file_read+0x818/0xb10
                   __vfs_read+0xdc/0x150
                   vfs_read+0x8a/0x130
                   SyS_read+0x45/0xa0
                   do_syscall_64+0x79/0x1e0
                   entry_SYSCALL_64_after_hwframe+0x42/0xb7
 }
 ... key      at: [<ffffffffc072d488>] __key.56935+0x0/0xfffffffffff96b78 [btrfs]
 ... acquired at:
   __lock_acquire+0x264/0x11c0
   lock_acquire+0xbd/0x1e0
   __mutex_lock+0x4e/0x8c0
   __btrfs_release_delayed_node+0x3a/0x1f0 [btrfs]
   btrfs_evict_inode+0x22c/0x6a0 [btrfs]
   evict+0xc4/0x190
   dispose_list+0x35/0x50
   prune_icache_sb+0x42/0x50
   super_cache_scan+0x139/0x190
   shrink_slab+0x262/0x5b0
   shrink_node+0x2eb/0x2f0
   kswapd+0x2eb/0x890
   kthread+0x102/0x140
   ret_from_fork+0x3a/0x50

stack backtrace:
CPU: 1 PID: 50 Comm: kswapd0 Tainted: G        W        4.12.14-kvmsmall #8 SLE15 (unreleased)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
Call Trace:
 dump_stack+0x78/0xb7
 print_irq_inversion_bug.part.38+0x19f/0x1aa
 check_usage_forwards+0x102/0x120
 ? ret_from_fork+0x3a/0x50
 ? check_usage_backwards+0x110/0x110
 mark_lock+0x16c/0x270
 __lock_acquire+0x264/0x11c0
 ? pagevec_lookup_entries+0x1a/0x30
 ? truncate_inode_pages_range+0x2b3/0x7f0
 lock_acquire+0xbd/0x1e0
 ? __btrfs_release_delayed_node+0x3a/0x1f0 [btrfs]
 __mutex_lock+0x4e/0x8c0
 ? __btrfs_release_delayed_node+0x3a/0x1f0 [btrfs]
 ? __btrfs_release_delayed_node+0x3a/0x1f0 [btrfs]
 ? btrfs_evict_inode+0x1f6/0x6a0 [btrfs]
 __btrfs_release_delayed_node+0x3a/0x1f0 [btrfs]
 btrfs_evict_inode+0x22c/0x6a0 [btrfs]
 evict+0xc4/0x190
 dispose_list+0x35/0x50
 prune_icache_sb+0x42/0x50
 super_cache_scan+0x139/0x190
 shrink_slab+0x262/0x5b0
 shrink_node+0x2eb/0x2f0
 kswapd+0x2eb/0x890
 kthread+0x102/0x140
 ? mem_cgroup_shrink_node+0x2c0/0x2c0
 ? kthread_create_on_node+0x40/0x40
 ret_from_fork+0x3a/0x50

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
---
 fs/btrfs/disk-io.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 21f34ad0d411..eb6bb3169a9e 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1108,7 +1108,7 @@  static struct btrfs_subvolume_writers *btrfs_alloc_subvolume_writers(void)
 	if (!writers)
 		return ERR_PTR(-ENOMEM);
 
-	ret = percpu_counter_init(&writers->counter, 0, GFP_KERNEL);
+	ret = percpu_counter_init(&writers->counter, 0, GFP_NOFS);
 	if (ret < 0) {
 		kfree(writers);
 		return ERR_PTR(ret);