btrfs: discard: reduce the block group ref when grabbing from unused block group list

Message ID 20200703070550.39299-1-wqu@suse.com
State New
Series
  • btrfs: discard: reduce the block group ref when grabbing from unused block group list

Commit Message

Qu Wenruo July 3, 2020, 7:05 a.m. UTC
[BUG]
The following small test script can trigger ASSERT() at unmount time:

  mkfs.btrfs -f $dev
  mount $dev $mnt
  mount -o remount,discard=async $mnt
  umount $mnt

The call trace:
  assertion failed: atomic_read(&block_group->count) == 1, in fs/btrfs/block-group.c:3431
  ------------[ cut here ]------------
  kernel BUG at fs/btrfs/ctree.h:3204!
  invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
  CPU: 4 PID: 10389 Comm: umount Tainted: G           O      5.8.0-rc3-custom+ #68
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
  Call Trace:
   btrfs_free_block_groups.cold+0x22/0x55 [btrfs]
   close_ctree+0x2cb/0x323 [btrfs]
   btrfs_put_super+0x15/0x17 [btrfs]
   generic_shutdown_super+0x72/0x110
   kill_anon_super+0x18/0x30
   btrfs_kill_super+0x17/0x30 [btrfs]
   deactivate_locked_super+0x3b/0xa0
   deactivate_super+0x40/0x50
   cleanup_mnt+0x135/0x190
   __cleanup_mnt+0x12/0x20
   task_work_run+0x64/0xb0
   __prepare_exit_to_usermode+0x1bc/0x1c0
   __syscall_return_slowpath+0x47/0x230
   do_syscall_64+0x64/0xb0
   entry_SYSCALL_64_after_hwframe+0x44/0xa9

The code:
                ASSERT(atomic_read(&block_group->count) == 1);
                btrfs_put_block_group(block_group);

[CAUSE]
Obviously some btrfs_get_block_group() call is missing its matching
btrfs_put_block_group() call.

The offending btrfs_get_block_group() happens here:

  void btrfs_mark_bg_unused(struct btrfs_block_group *bg)
  {
  	struct btrfs_fs_info *fs_info = bg->fs_info;

  	if (list_empty(&bg->bg_list)) {
  		/* The unused_bgs list holds its own reference. */
  		btrfs_get_block_group(bg);
  		list_add_tail(&bg->bg_list, &fs_info->unused_bgs);
  	}
  }

So every call site that removes a block group from the unused_bgs list
should drop the reference that the list was holding on that block group.

However, async discard does not follow this convention:

  void btrfs_discard_punt_unused_bgs_list(struct btrfs_fs_info *fs_info)
  {
  	list_for_each_entry_safe(block_group, next, &fs_info->unused_bgs,
  				 bg_list) {
  		list_del_init(&block_group->bg_list);
  		btrfs_discard_queue_work(&fs_info->discard_ctl, block_group);
  	}
  }

And btrfs_discard_queue_work() doesn't call btrfs_put_block_group()
either, so the reference taken in btrfs_mark_bg_unused() is leaked.

[FIX]
Fix the problem by dropping the reference when we grab the block group
from the unused_bgs list.

Reported-by: Marcos Paulo de Souza <marcos@mpdesouza.com>
Fixes: 6e80d4f8c422 ("btrfs: handle empty block_group removal for async discard")
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/discard.c | 1 +
 1 file changed, 1 insertion(+)

Comments

Qu Wenruo July 3, 2020, 1:01 p.m. UTC | #1
On 2020/7/3 3:05 PM, Qu Wenruo wrote:
> Reported-by: Marcos Paulo de Souza <marcos@mpdesouza.com>

My bad, the Reported-by tag should use his awesome suse mail address.

David, would you please fix this at merge time?

Thanks,
Qu
> Fixes: 6e80d4f8c422 ("btrfs: handle empty block_group removal for async discard")
> Signed-off-by: Qu Wenruo <wqu@suse.com>
David Sterba July 3, 2020, 1:31 p.m. UTC | #2
On Fri, Jul 03, 2020 at 09:01:42PM +0800, Qu Wenruo wrote:
> > group from unused_bgs list.
> > 
> > Reported-by: Marcos Paulo de Souza <marcos@mpdesouza.com>
> 
> My bad, the reported by tag should use his awesome suse mail address.
> 
> David, would you please fix this at merge time?

Yeah, no problem.
Marcos Paulo de Souza July 4, 2020, 9:45 p.m. UTC | #3
On Fri, 2020-07-03 at 15:05 +0800, Qu Wenruo wrote:
> [FIX]
> Fix the problem by reducing the reference count when we grab the block
> group from unused_bgs list.

xfstests is happy about the change.

Tested-by: Marcos Paulo de Souza <mpdesouza@suse.com>

Anand Jain July 6, 2020, 5:18 a.m. UTC | #4
We should have a set of remount test cases.


Looks good.
Reviewed-by: Anand Jain <anand.jain@oracle.com>
David Sterba July 7, 2020, 2:06 p.m. UTC | #5
On Fri, Jul 03, 2020 at 03:05:50PM +0800, Qu Wenruo wrote:
> Reported-by: Marcos Paulo de Souza <marcos@mpdesouza.com>
> Fixes: 6e80d4f8c422 ("btrfs: handle empty block_group removal for async discard")
> Signed-off-by: Qu Wenruo <wqu@suse.com>

Added to misc-next, thanks.

Patch

diff --git a/fs/btrfs/discard.c b/fs/btrfs/discard.c
index 5615320fa659..741c7e19c32f 100644
--- a/fs/btrfs/discard.c
+++ b/fs/btrfs/discard.c
@@ -619,6 +619,7 @@  void btrfs_discard_punt_unused_bgs_list(struct btrfs_fs_info *fs_info)
 	list_for_each_entry_safe(block_group, next, &fs_info->unused_bgs,
 				 bg_list) {
 		list_del_init(&block_group->bg_list);
+		btrfs_put_block_group(block_group);
 		btrfs_discard_queue_work(&fs_info->discard_ctl, block_group);
 	}
 	spin_unlock(&fs_info->unused_bgs_lock);