mbox series

[v2,0/2] don't hold dev_replace rwsem over whole of btrfs_map_block

Message ID 20240508114016.18119-1-jth@kernel.org (mailing list archive)
Headers show
Series don't hold dev_replace rwsem over whole of btrfs_map_block | expand

Message

Johannes Thumshirn May 8, 2024, 11:40 a.m. UTC
From: Johannes Thumshirn <johannes.thumshirn@wdc.com>

This is the v2 of 'btrfs: don't hold dev_replace rwsem over whole of
btrfs_map_block' sent as a series as I've added a 2nd patch, which
I've came accross while looking at the code.

@Filipe, unfortunately I can't find the original report from the CI
anymore, so I don't have the stacktrace handy.

Changes to RFC:
- Incorporated Filipe's review
- Added patch #2
Link to RFC:
https://lore.kernel.org/linux-btrfs/2454cd4eb1694d37056e492af32b23743c63202b.1714663442.git.jth@kernel.org/

Johannes Thumshirn (2):
  btrfs: don't hold dev_replace rwsem over whole of btrfs_map_block
  btrfs: pass 'struct btrfs_io_geometry' into handle_ops_on_dev_replace

 fs/btrfs/volumes.c | 52 ++++++++++++++++++++++++++--------------------
 1 file changed, 29 insertions(+), 23 deletions(-)

Comments

Filipe Manana May 8, 2024, 12:38 p.m. UTC | #1
On Wed, May 8, 2024 at 12:40 PM Johannes Thumshirn <jth@kernel.org> wrote:
>
> From: Johannes Thumshirn <johannes.thumshirn@wdc.com>
>
> This is the v2 of 'btrfs: don't hold dev_replace rwsem over whole of
> btrfs_map_block' sent as a series as I've added a 2nd patch, which
> I've came accross while looking at the code.
>
> @Filipe, unfortunately I can't find the original report from the CI
> anymore, so I don't have the stacktrace handy.
>
> Changes to RFC:
> - Incorporated Filipe's review
> - Added patch #2
> Link to RFC:
> https://lore.kernel.org/linux-btrfs/2454cd4eb1694d37056e492af32b23743c63202b.1714663442.git.jth@kernel.org/
>
> Johannes Thumshirn (2):
>   btrfs: don't hold dev_replace rwsem over whole of btrfs_map_block
>   btrfs: pass 'struct btrfs_io_geometry' into handle_ops_on_dev_replace
>
>  fs/btrfs/volumes.c | 52 ++++++++++++++++++++++++++--------------------
>  1 file changed, 29 insertions(+), 23 deletions(-)

Reviewed-by: Filipe Manana <fdmanana@suse.com>

Looks good, thanks.

>
> --
> 2.35.3
>
David Sterba May 9, 2024, 5:39 p.m. UTC | #2
On Wed, May 08, 2024 at 01:40:14PM +0200, Johannes Thumshirn wrote:
> From: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> 
> This is the v2 of 'btrfs: don't hold dev_replace rwsem over whole of
> btrfs_map_block' sent as a series as I've added a 2nd patch, which
> I've came accross while looking at the code.
> 
> @Filipe, unfortunately I can't find the original report from the CI
> anymore, so I don't have the stacktrace handy.

I can reproduce it. Please edit it a bit for the of changelog.

[ 1022.275649] ============================================
[ 1022.276151] WARNING: possible recursive locking detected
[ 1022.277976] 6.9.0-rc7-default+ #417 Not tainted
[ 1022.278419] --------------------------------------------
[ 1022.278916] btrfs/27578 is trying to acquire lock:
[ 1022.279372] ffff888107385ec8 (&fs_info->dev_replace.rwsem){++++}-{3:3}, at: btrfs_map_block+0x25b/0x11e0
[ 1022.280228] 
               but task is already holding lock:
[ 1022.280786] ffff888107385ec8 (&fs_info->dev_replace.rwsem){++++}-{3:3}, at: btrfs_map_block+0x25b/0x11e0
[ 1022.281722] 
               other info that might help us debug this:
[ 1022.282405]  Possible unsafe locking scenario:

[ 1022.282863]        CPU0
[ 1022.283087]        ----
[ 1022.283311]   lock(&fs_info->dev_replace.rwsem);
[ 1022.283676]   lock(&fs_info->dev_replace.rwsem);
[ 1022.284036] 
                *** DEADLOCK ***

[ 1022.284526]  May be due to missing lock nesting notation

[ 1022.285077] 1 lock held by btrfs/27578:
[ 1022.285394]  #0: ffff888107385ec8 (&fs_info->dev_replace.rwsem){++++}-{3:3}, at: btrfs_map_block+0x25b/0x11e0
[ 1022.286109] 
               stack backtrace:
[ 1022.286482] CPU: 3 PID: 27578 Comm: btrfs Not tainted 6.9.0-rc7-default+ #417
[ 1022.287008] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-3-gd478f380-rebuilt.opensuse.org 04/01/2014
[ 1022.287924] Call Trace:
[ 1022.288189]  <TASK>
[ 1022.288433]  dump_stack_lvl+0x61/0x80
[ 1022.288799]  validate_chain+0x6ef/0xcc0
[ 1022.289179]  ? check_prev_add+0x12f0/0x12f0
[ 1022.289581]  ? mark_lock+0xe3/0xa10
[ 1022.289936]  __lock_acquire+0x8cf/0xda0
[ 1022.290324]  lock_acquire+0x149/0x3e0
[ 1022.290673]  ? btrfs_map_block+0x25b/0x11e0
[ 1022.291086]  ? lock_sync+0xd0/0xd0
[ 1022.291432]  ? lock_is_held_type+0x83/0xf0
[ 1022.291831]  ? __might_resched+0x162/0x240
[ 1022.292232]  ? rcu_read_unlock+0x80/0x80
[ 1022.292617]  down_read+0x9a/0x370
[ 1022.292959]  ? btrfs_map_block+0x25b/0x11e0
[ 1022.293419]  ? rwsem_down_read_slowpath+0x740/0x740
[ 1022.293876]  ? do_raw_read_unlock+0x23/0x50
[ 1022.294278]  ? _raw_read_unlock+0x1a/0x30
[ 1022.294667]  btrfs_map_block+0x25b/0x11e0
[ 1022.295057]  ? btrfs_map_discard+0x4e0/0x4e0
[ 1022.295478]  ? btrfs_bio_counter_sub+0x70/0x70
[ 1022.295912]  btrfs_submit_chunk+0x1a2/0x950
[ 1022.296313]  ? btrfs_bio_init+0x80/0x80
[ 1022.296691]  ? bio_check_pages_dirty+0x390/0x390
[ 1022.297159]  ? do_raw_spin_unlock+0x8d/0xe0
[ 1022.297652]  ? alloc_test_extent_buffer+0x2c0/0x2c0
[ 1022.298168]  btrfs_submit_bio+0x1c/0x40
[ 1022.298627]  read_extent_buffer_pages+0x26e/0x3b0
[ 1022.299068]  btrfs_read_extent_buffer+0xe1/0x300
[ 1022.299605]  read_tree_block+0x55/0x80
[ 1022.300039]  read_block_for_search+0x3ca/0x520
[ 1022.300500]  ? btrfs_release_path+0x1e0/0x1e0
[ 1022.300937]  ? lock_is_held_type+0x83/0xf0
[ 1022.301414]  ? btrfs_search_slot_get_root+0x280/0x520
[ 1022.301961]  ? __might_sleep+0x28/0xd0
[ 1022.302360]  btrfs_search_slot+0x4a3/0x13b0
[ 1022.302801]  ? balance_level+0x15c0/0x15c0
[ 1022.303195]  ? rcu_is_watching+0x1c/0x40
[ 1022.303648]  ? kmem_cache_alloc+0x262/0x2f0
[ 1022.304076]  ? find_held_lock+0x8a/0xa0
[ 1022.304543]  btrfs_get_raid_extent_offset+0x14e/0x600
[ 1022.305091]  ? btrfs_insert_raid_extent+0x160/0x160
[ 1022.305617]  ? down_read+0xb7/0x370
[ 1022.306027]  ? down_read+0x2e/0x370
[ 1022.306446]  ? set_io_stripe+0x9d/0x140
[ 1022.306924]  btrfs_map_block+0x3e7/0x11e0
[ 1022.307404]  ? btrfs_map_discard+0x4e0/0x4e0
[ 1022.307917]  ? scrub_write_endio+0x1c0/0x1c0
[ 1022.308422]  ? btrfs_bio_init+0x67/0x80
[ 1022.308842]  scrub_submit_extent_sector_read.isra.0+0x308/0x630
[ 1022.309434]  ? scrub_stripe_submit_repair_read+0x440/0x440
[ 1022.310062]  ? _find_next_bit+0x37/0xc0
[ 1022.310483]  ? scrub_submit_initial_read+0x139/0x3e0
[ 1022.311040]  submit_initial_group_read+0x175/0x330
[ 1022.311617]  ? reacquire_held_locks+0x280/0x280
[ 1022.312118]  ? scrub_submit_initial_read+0x3e0/0x3e0
[ 1022.312725]  ? scrub_simple_mirror.isra.0+0x1e9/0x380
[ 1022.313292]  scrub_simple_mirror.isra.0+0x2e7/0x380
[ 1022.313808]  scrub_stripe+0x76d/0x7d0
[ 1022.314255]  ? __lock_acquired+0x1ee/0x400
[ 1022.314740]  ? scrub_setup_ctx+0x210/0x210
[ 1022.315212]  ? btrfs_find_chunk_map+0x37/0x50
[ 1022.315696]  ? do_raw_read_unlock+0x23/0x50
[ 1022.316228]  scrub_chunk+0x137/0x200
[ 1022.316690]  scrub_enumerate_chunks+0x539/0x960
[ 1022.317169]  ? scrub_chunk+0x200/0x200
[ 1022.317608]  ? bit_wait_io_timeout+0x10/0xc0
[ 1022.318101]  ? __up_read+0x175/0x4c0
[ 1022.318558]  ? _down_write_nest_lock+0x190/0x1c0
[ 1022.319067]  ? btrfs_dev_name+0xb0/0xb0
[ 1022.319514]  ? btrfs_scrub_dev+0x313/0x820
[ 1022.320034]  btrfs_scrub_dev+0x3e4/0x820
[ 1022.320518]  ? cleanup_transaction+0x100/0x100
[ 1022.321043]  ? scrub_print_warning_inode+0x540/0x540
[ 1022.321613]  btrfs_dev_replace_start+0x41f/0x550
[ 1022.322150]  ? btrfs_dev_replace_finishing+0xa60/0xa60
[ 1022.322735]  ? do_raw_spin_trylock+0xc7/0x110
[ 1022.323205]  ? do_raw_spin_lock+0x1a0/0x1a0
[ 1022.323581]  btrfs_dev_replace_by_ioctl+0x8e/0xf0
[ 1022.323951]  btrfs_ioctl+0x4cf/0x17c0
[ 1022.324333]  ? do_sigaction+0x234/0x420
[ 1022.324652]  ? btrfs_ioctl_get_supported_features+0x20/0x20
[ 1022.325069]  ? sigaction_compat_abi+0x10/0x10
[ 1022.325494]  ? lock_release+0x9a/0x1a0
[ 1022.325837]  ? __x64_sys_rt_sigaction+0x185/0x1b0
[ 1022.326203]  ? __x64_sys_sigprocmask+0x230/0x230
[ 1022.326682]  ? lockdep_hardirqs_on_prepare+0x13c/0x200
[ 1022.327094]  __x64_sys_ioctl+0xbc/0xe0
[ 1022.327407]  do_syscall_64+0x5b/0xf0
[ 1022.327745]  entry_SYSCALL_64_after_hwframe+0x4b/0x53
[ 1022.328217] RIP: 0033:0x7f756e53f3df
[ 1022.330048] RSP: 002b:00007ffd174401c0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 1022.330774] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f756e53f3df
[ 1022.331499] RDX: 00007ffd174410b0 RSI: 00000000ca289435 RDI: 0000000000000003
[ 1022.332176] RBP: 00007ffd17440280 R08: 0000000000000000 R09: 0000000000000007
[ 1022.332820] R10: 0000000000000008 R11: 0000000000000246 R12: 00007ffd17443f9e
[ 1022.333452] R13: 0000555f1341a2a0 R14: 0000000000000000 R15: 0000000000000004
[ 1022.334082]  </TASK>
David Sterba May 9, 2024, 5:43 p.m. UTC | #3
On Wed, May 08, 2024 at 01:40:14PM +0200, Johannes Thumshirn wrote:
> From: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> 
> This is the v2 of 'btrfs: don't hold dev_replace rwsem over whole of
> btrfs_map_block' sent as a series as I've added a 2nd patch, which
> I've came accross while looking at the code.
> 
> @Filipe, unfortunately I can't find the original report from the CI
> anymore, so I don't have the stacktrace handy.

And right after the stack trace I found this, it could be the same
use-after-free that Filipe fixed, I haven't analyzed it:

[ 1040.699694] BUG: KASAN: slab-use-after-free in btrfs_insert_one_raid_extent+0x1e2/0x430
[ 1040.700735] Read of size 8 at addr ffff888113385880 by task kworker/u32:12/13766

[ 1040.701948] CPU: 3 PID: 13766 Comm: kworker/u32:12 Not tainted 6.9.0-rc7-default+ #417
[ 1040.702834] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-3-gd478f380-rebuilt.opensuse.org 04/01/2014
[ 1040.704167] Workqueue: btrfs-endio-write btrfs_work_helper
[ 1040.704861] Call Trace:
[ 1040.705242]  <TASK>
[ 1040.705581]  dump_stack_lvl+0x61/0x80
[ 1040.706069]  print_address_description.constprop.0+0x75/0x310
[ 1040.706652]  ? btrfs_insert_one_raid_extent+0x1e2/0x430
[ 1040.707315]  print_report+0x118/0x216
[ 1040.707837]  ? btrfs_insert_one_raid_extent+0x1e2/0x430
[ 1040.708434]  ? __virt_addr_valid+0x11e/0x2b0
[ 1040.708877]  ? btrfs_insert_one_raid_extent+0x1e2/0x430
[ 1040.709507]  kasan_report+0x11d/0x1f0
[ 1040.710013]  ? btrfs_insert_one_raid_extent+0x1e2/0x430
[ 1040.710677]  btrfs_insert_one_raid_extent+0x1e2/0x430
[ 1040.711313]  ? record_root_in_trans+0x164/0x1a0
[ 1040.711781]  ? btrfs_lru_cache_clear+0x80/0x80
[ 1040.712232]  ? btrfs_record_root_in_trans+0xa8/0xc0
[ 1040.712716]  ? start_transaction+0x154/0xa50
[ 1040.713154]  btrfs_insert_raid_extent+0x9d/0x160
[ 1040.713681]  btrfs_finish_one_ordered+0x5af/0xb60
[ 1040.714220]  ? rcu_is_watching+0x1c/0x40
[ 1040.714712]  ? lock_acquire+0xc1/0x3e0
[ 1040.715220]  ? btrfs_unlink_subvol+0x650/0x650
[ 1040.715705]  ? do_raw_spin_trylock+0xc7/0x110
[ 1040.716274]  btrfs_work_helper+0x118/0x310
[ 1040.716822]  ? rcu_is_watching+0x1c/0x40
[ 1040.717329]  process_one_work+0x4be/0x9f0
[ 1040.717750]  ? pwq_dec_nr_in_flight+0x250/0x250
[ 1040.718350]  ? __list_add_valid_or_report+0x3f/0x70
[ 1040.718980]  worker_thread+0x38d/0x680
[ 1040.720200]  ? __kthread_parkme+0xc8/0xe0
[ 1040.720764]  ? rescuer_thread+0x5e0/0x5e0
[ 1040.721307]  kthread+0x171/0x1b0
[ 1040.723690]  ? kthread+0xe5/0x1b0
[ 1040.724181]  ? kthread_complete_and_exit+0x20/0x20
[ 1040.724782]  ret_from_fork+0x28/0x50
[ 1040.726471]  ? kthread_complete_and_exit+0x20/0x20
[ 1040.726930]  ret_from_fork_asm+0x11/0x20
[ 1040.727415]  </TASK>

[ 1040.727863] Allocated by task 27528:
[ 1040.728109]  kasan_save_stack+0x37/0x60
[ 1040.728360]  kasan_save_track+0x10/0x30
[ 1040.728602]  __kasan_kmalloc+0x83/0x90
[ 1040.728845]  btrfs_alloc_device+0xa4/0x2e0
[ 1040.729094]  btrfs_init_dev_replace_tgtdev.constprop.0+0x1a4/0x4b0
[ 1040.729444]  btrfs_dev_replace_start+0xdf/0x550
[ 1040.729713]  btrfs_dev_replace_by_ioctl+0x8e/0xf0
[ 1040.730010]  btrfs_ioctl+0x4cf/0x17c0
[ 1040.730292]  __x64_sys_ioctl+0xbc/0xe0
[ 1040.730836]  do_syscall_64+0x5b/0xf0
[ 1040.731352]  entry_SYSCALL_64_after_hwframe+0x4b/0x53

[ 1040.732308] Freed by task 27578:
[ 1040.732780]  kasan_save_stack+0x37/0x60
[ 1040.733458]  kasan_save_track+0x10/0x30
[ 1040.733982]  kasan_save_free_info+0x37/0x50
[ 1040.734454]  poison_slab_object+0x19d/0x1e0
[ 1040.734908]  __kasan_slab_free+0x10/0x30
[ 1040.735287]  kfree+0xf8/0x2d0
[ 1040.735750]  btrfs_rm_dev_replace_free_srcdev+0x7a/0x150
[ 1040.736377]  btrfs_dev_replace_finishing+0x90d/0xa60
[ 1040.736983]  btrfs_dev_replace_start+0x429/0x550
[ 1040.737552]  btrfs_dev_replace_by_ioctl+0x8e/0xf0
[ 1040.738113]  btrfs_ioctl+0x4cf/0x17c0
[ 1040.738601]  __x64_sys_ioctl+0xbc/0xe0
[ 1040.739155]  do_syscall_64+0x5b/0xf0
[ 1040.739759]  entry_SYSCALL_64_after_hwframe+0x4b/0x53

[ 1040.740712] The buggy address belongs to the object at ffff888113385800
                which belongs to the cache kmalloc-1k of size 1024
[ 1040.741932] The buggy address is located 128 bytes inside of
                freed 1024-byte region [ffff888113385800, ffff888113385c00)

[ 1040.743275] The buggy address belongs to the physical page:
[ 1040.743997] page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x113380
[ 1040.745033] head: order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
[ 1040.745867] flags: 0x4500000000840(slab|head|section=34|zone=2)
[ 1040.746766] page_type: 0xffffffff()
[ 1040.747241] raw: 0004500000000840 ffff8881000430c0 ffff88813f3c7210 ffff88813f304e10
[ 1040.747981] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ 1040.748723] head: 0004500000000840 ffff8881000430c0 ffff88813f3c7210 ffff88813f304e10
[ 1040.749474] head: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
[ 1040.750291] head: 0004500000000003 ffff88813f2ce001 dead000000000122 00000000ffffffff
[ 1040.751227] head: 0000000800000000 0000000000000000 00000000ffffffff 0000000000000000
[ 1040.751961] page dumped because: kasan: bad access detected

[ 1040.752735] Memory state around the buggy address:
[ 1040.753213]  ffff888113385780: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 1040.753960]  ffff888113385800: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1040.754660] >ffff888113385880: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1040.755360]                    ^
[ 1040.755725]  ffff888113385900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1040.756435]  ffff888113385980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1040.757274] ==================================================================
[ 1041.613686] BTRFS info (device vdd): scrub: started on devid 1
[ 1041.613740] BTRFS info (device vdd): scrub: started on devid 2
[ 1042.472852] assertion failed: i < stripe->nr_sectors, in fs/btrfs/scrub.c:753
[ 1042.473680] ------------[ cut here ]------------
[ 1042.474180] kernel BUG at fs/btrfs/scrub.c:753!
[ 1042.474550] invalid opcode: 0000 [#1] PREEMPT SMP KASAN
[ 1042.474954] CPU: 0 PID: 27589 Comm: btrfs Tainted: G    B              6.9.0-rc7-default+ #417
[ 1042.475640] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-3-gd478f380-rebuilt.opensuse.org 04/01/2014
[ 1042.476821] RIP: 0010:calc_sector_number.cold+0x3d/0x5d
[ 1042.477428] Code: a5 ff 8b 4c 24 08 e9 a9 8d 7c ff b9 f1 02 00 00 48 c7 c2 00 ac 38 a4 48 c7 c6 e0 b1 38 a4 48 c7 c7 c0 ac 38 a4 e8 0e 8d fb ff <0f> 0b 89 ca 48 89 ee 48 c7 c7 c0 1b ea a4 89 4c 24 08 e8 d7 52 a5
[ 1042.479196] RSP: 0018:ffff8881066cf5a0 EFLAGS: 00010286
[ 1042.479840] RAX: 0000000000000041 RBX: ffff8881033e89c0 RCX: 0000000000000000
[ 1042.480706] RDX: 0000000000000041 RSI: ffffffffa3137b3d RDI: ffffed1020cd9ea7
[ 1042.481511] RBP: 000000000000000f R08: 0000000000000000 R09: fffffbfff4957b9c
[ 1042.482360] R10: 0000000000000003 R11: 0000000000000001 R12: ffff88810654f080
[ 1042.483170] R13: 0000000000000010 R14: ffff88813f3c6ec0 R15: ffff8881033e8a72
[ 1042.483957] FS:  00007fd36967e6c0(0000) GS:ffff888119600000(0000) knlGS:0000000000000000
[ 1042.484860] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1042.485584] CR2: 00005586a760b560 CR3: 0000000100e82000 CR4: 00000000000006b0
[ 1042.486319] Call Trace:
[ 1042.486620]  <TASK>
[ 1042.486940]  ? show_trace_log_lvl+0x1e9/0x2e0
[ 1042.487501]  ? scrub_read_endio+0x5a/0x290
[ 1042.488060]  ? __die+0x4f/0x8c
[ 1042.488513]  ? die+0x25/0x40
[ 1042.488855]  ? do_trap+0x125/0x180
[ 1042.489277]  ? calc_sector_number.cold+0x3d/0x5d
[ 1042.489878]  ? calc_sector_number.cold+0x3d/0x5d
[ 1042.490488]  ? do_error_trap+0x8b/0x130
[ 1042.491002]  ? calc_sector_number.cold+0x3d/0x5d
[ 1042.491541]  ? handle_invalid_op+0x38/0x40
[ 1042.492009]  ? calc_sector_number.cold+0x3d/0x5d
[ 1042.492542]  ? exc_invalid_op+0x29/0x40
[ 1042.493035]  ? asm_exc_invalid_op+0x16/0x20
[ 1042.493575]  ? this_cpu_in_panic+0x1d/0x30
[ 1042.494034]  ? calc_sector_number.cold+0x3d/0x5d
[ 1042.494535]  scrub_read_endio+0x5a/0x290
[ 1042.495058]  scrub_submit_extent_sector_read.isra.0+0x60a/0x630
[ 1042.495695]  ? scrub_stripe_submit_repair_read+0x440/0x440
[ 1042.496321]  ? _find_next_bit+0x37/0xc0
[ 1042.496855]  ? scrub_submit_initial_read+0x139/0x3e0
[ 1042.497367]  submit_initial_group_read+0x175/0x330
[ 1042.497926]  ? lock_sync+0xd0/0xd0
[ 1042.498329]  ? scrub_submit_initial_read+0x3e0/0x3e0
[ 1042.498885]  ? rcu_read_unlock+0x80/0x80
[ 1042.499383]  scrub_simple_mirror.isra.0+0x2e7/0x380
[ 1042.499965]  scrub_stripe+0x76d/0x7d0
[ 1042.500469]  ? rcu_is_watching+0x1c/0x40
[ 1042.500898]  ? rcu_is_watching+0x1c/0x40
[ 1042.501399]  ? scrub_setup_ctx+0x210/0x210
[ 1042.501887]  ? __down_write_trylock+0xf8/0x260
[ 1042.502428]  ? down_read_trylock+0x50/0x50
[ 1042.502977]  ? __might_resched+0x162/0x240
[ 1042.503462]  ? rcu_read_unlock+0x80/0x80
[ 1042.503980]  ? do_raw_read_unlock+0x23/0x50
[ 1042.504501]  scrub_chunk+0x137/0x200
[ 1042.505023]  scrub_enumerate_chunks+0x539/0x960
[ 1042.505594]  ? scrub_chunk+0x200/0x200
[ 1042.506084]  ? do_raw_spin_unlock+0x8d/0xe0
[ 1042.506562]  ? bit_wait_io_timeout+0x10/0xc0
[ 1042.507087]  ? do_raw_spin_trylock+0xc7/0x110
[ 1042.507635]  btrfs_scrub_dev+0x3e4/0x820
[ 1042.508105]  ? scrub_print_warning_inode+0x540/0x540
[ 1042.508666]  ? mnt_get_write_access+0x6a/0x120
[ 1042.509177]  ? mnt_get_write_access+0xfd/0x120
[ 1042.509788]  btrfs_ioctl+0x79d/0x17c0
[ 1042.510236]  ? btrfs_ioctl_get_supported_features+0x20/0x20
[ 1042.510852]  ? lock_sync+0xd0/0xd0
[ 1042.511274]  ? do_raw_spin_trylock+0xc7/0x110
[ 1042.511751]  ? do_raw_spin_lock+0x1a0/0x1a0
[ 1042.512374]  ? rcu_is_watching+0x1c/0x40
[ 1042.512957]  ? rcu_is_watching+0x1c/0x40
[ 1042.513538]  ? lock_release+0x52/0x1a0
[ 1042.514104]  ? __fget_files+0x13b/0x230
[ 1042.514669]  __x64_sys_ioctl+0xbc/0xe0
[ 1042.515235]  do_syscall_64+0x5b/0xf0
[ 1042.515777]  entry_SYSCALL_64_after_hwframe+0x4b/0x53
[ 1042.516471] RIP: 0033:0x7fd369fa03df
[ 1042.516968] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 18 48 8b 44 24 18 64 48 2b 04 25 28 00 00
[ 1042.518755] RSP: 002b:00007fd36967dc80 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 1042.519784] RAX: ffffffffffffffda RBX: 00005627eeea1758 RCX: 00007fd369fa03df
[ 1042.520548] RDX: 00005627eeea1758 RSI: 00000000c400941b RDI: 0000000000000003
[ 1042.521301] RBP: 0000000000000000 R08: 00007fd36967e6c0 R09: 32203a6b63617473
[ 1042.521944] R10: 0000000000000000 R11: 0000000000000246 R12: fffffffffffffdb8
[ 1042.522786] R13: 0000000000000002 R14: 00007ffecdb6f450 R15: 00007fd368e7e000
[ 1042.523695]  </TASK>
[ 1042.523965] Modules linked in: loop
[ 1042.524367] ---[ end trace 0000000000000000 ]---
[ 1042.524856] RIP: 0010:calc_sector_number.cold+0x3d/0x5d
[ 1042.525382] Code: a5 ff 8b 4c 24 08 e9 a9 8d 7c ff b9 f1 02 00 00 48 c7 c2 00 ac 38 a4 48 c7 c6 e0 b1 38 a4 48 c7 c7 c0 ac 38 a4 e8 0e 8d fb ff <0f> 0b 89 ca 48 89 ee 48 c7 c7 c0 1b ea a4 89 4c 24 08 e8 d7 52 a5
[ 1042.527082] RSP: 0018:ffff8881066cf5a0 EFLAGS: 00010286
[ 1042.527686] RAX: 0000000000000041 RBX: ffff8881033e89c0 RCX: 0000000000000000
[ 1042.528388] RDX: 0000000000000041 RSI: ffffffffa3137b3d RDI: ffffed1020cd9ea7
[ 1042.529109] RBP: 000000000000000f R08: 0000000000000000 R09: fffffbfff4957b9c
[ 1042.529830] R10: 0000000000000003 R11: 0000000000000001 R12: ffff88810654f080
[ 1042.530583] R13: 0000000000000010 R14: ffff88813f3c6ec0 R15: ffff8881033e8a72
[ 1042.531412] FS:  00007fd36967e6c0(0000) GS:ffff888119600000(0000) knlGS:0000000000000000
[ 1042.534519] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1042.535162] CR2: 00005586a760b560 CR3: 0000000100e82000 CR4: 00000000000006b0
Johannes Thumshirn May 10, 2024, 9:09 a.m. UTC | #4
On 09.05.24 19:40, David Sterba wrote:
> On Wed, May 08, 2024 at 01:40:14PM +0200, Johannes Thumshirn wrote:
>> From: Johannes Thumshirn <johannes.thumshirn@wdc.com>
>>
>> This is the v2 of 'btrfs: don't hold dev_replace rwsem over whole of
>> btrfs_map_block' sent as a series as I've added a 2nd patch, which
>> I've came accross while looking at the code.
>>
>> @Filipe, unfortunately I can't find the original report from the CI
>> anymore, so I don't have the stacktrace handy.
> 
> I can reproduce it. Please edit it a bit for the of changelog.
> 

Thanks a lot, added it to the changelog.
Johannes Thumshirn May 10, 2024, 9:11 a.m. UTC | #5
On 09.05.24 19:43, David Sterba wrote:
> On Wed, May 08, 2024 at 01:40:14PM +0200, Johannes Thumshirn wrote:
>> From: Johannes Thumshirn <johannes.thumshirn@wdc.com>
>>
>> This is the v2 of 'btrfs: don't hold dev_replace rwsem over whole of
>> btrfs_map_block' sent as a series as I've added a 2nd patch, which
>> I've came accross while looking at the code.
>>
>> @Filipe, unfortunately I can't find the original report from the CI
>> anymore, so I don't have the stacktrace handy.
> 
> And right after the stack trace I found this, it could be the same
> use-after-free that Filipe fixed, I haven't analyzed it:

 From a first glance, this looks different but related. I /think/ we 
need to protect the btrfs_device in btrfs_insert_one_raid_extent with 
the dev_replace rwsem as well. I'll look into it.


> [ 1040.699694] BUG: KASAN: slab-use-after-free in btrfs_insert_one_raid_extent+0x1e2/0x430
> [ 1040.700735] Read of size 8 at addr ffff888113385880 by task kworker/u32:12/13766
> 
> [ 1040.701948] CPU: 3 PID: 13766 Comm: kworker/u32:12 Not tainted 6.9.0-rc7-default+ #417
> [ 1040.702834] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-3-gd478f380-rebuilt.opensuse.org 04/01/2014
> [ 1040.704167] Workqueue: btrfs-endio-write btrfs_work_helper
> [ 1040.704861] Call Trace:
> [ 1040.705242]  <TASK>
> [ 1040.705581]  dump_stack_lvl+0x61/0x80
> [ 1040.706069]  print_address_description.constprop.0+0x75/0x310
> [ 1040.706652]  ? btrfs_insert_one_raid_extent+0x1e2/0x430
> [ 1040.707315]  print_report+0x118/0x216
> [ 1040.707837]  ? btrfs_insert_one_raid_extent+0x1e2/0x430
> [ 1040.708434]  ? __virt_addr_valid+0x11e/0x2b0
> [ 1040.708877]  ? btrfs_insert_one_raid_extent+0x1e2/0x430
> [ 1040.709507]  kasan_report+0x11d/0x1f0
> [ 1040.710013]  ? btrfs_insert_one_raid_extent+0x1e2/0x430
> [ 1040.710677]  btrfs_insert_one_raid_extent+0x1e2/0x430
> [ 1040.711313]  ? record_root_in_trans+0x164/0x1a0
> [ 1040.711781]  ? btrfs_lru_cache_clear+0x80/0x80
> [ 1040.712232]  ? btrfs_record_root_in_trans+0xa8/0xc0
> [ 1040.712716]  ? start_transaction+0x154/0xa50
> [ 1040.713154]  btrfs_insert_raid_extent+0x9d/0x160
> [ 1040.713681]  btrfs_finish_one_ordered+0x5af/0xb60
> [ 1040.714220]  ? rcu_is_watching+0x1c/0x40
> [ 1040.714712]  ? lock_acquire+0xc1/0x3e0
> [ 1040.715220]  ? btrfs_unlink_subvol+0x650/0x650
> [ 1040.715705]  ? do_raw_spin_trylock+0xc7/0x110
> [ 1040.716274]  btrfs_work_helper+0x118/0x310
> [ 1040.716822]  ? rcu_is_watching+0x1c/0x40
> [ 1040.717329]  process_one_work+0x4be/0x9f0
> [ 1040.717750]  ? pwq_dec_nr_in_flight+0x250/0x250
> [ 1040.718350]  ? __list_add_valid_or_report+0x3f/0x70
> [ 1040.718980]  worker_thread+0x38d/0x680
> [ 1040.720200]  ? __kthread_parkme+0xc8/0xe0
> [ 1040.720764]  ? rescuer_thread+0x5e0/0x5e0
> [ 1040.721307]  kthread+0x171/0x1b0
> [ 1040.723690]  ? kthread+0xe5/0x1b0
> [ 1040.724181]  ? kthread_complete_and_exit+0x20/0x20
> [ 1040.724782]  ret_from_fork+0x28/0x50
> [ 1040.726471]  ? kthread_complete_and_exit+0x20/0x20
> [ 1040.726930]  ret_from_fork_asm+0x11/0x20
> [ 1040.727415]  </TASK>
> 
> [ 1040.727863] Allocated by task 27528:
> [ 1040.728109]  kasan_save_stack+0x37/0x60
> [ 1040.728360]  kasan_save_track+0x10/0x30
> [ 1040.728602]  __kasan_kmalloc+0x83/0x90
> [ 1040.728845]  btrfs_alloc_device+0xa4/0x2e0
> [ 1040.729094]  btrfs_init_dev_replace_tgtdev.constprop.0+0x1a4/0x4b0
> [ 1040.729444]  btrfs_dev_replace_start+0xdf/0x550
> [ 1040.729713]  btrfs_dev_replace_by_ioctl+0x8e/0xf0
> [ 1040.730010]  btrfs_ioctl+0x4cf/0x17c0
> [ 1040.730292]  __x64_sys_ioctl+0xbc/0xe0
> [ 1040.730836]  do_syscall_64+0x5b/0xf0
> [ 1040.731352]  entry_SYSCALL_64_after_hwframe+0x4b/0x53
> 
> [ 1040.732308] Freed by task 27578:
> [ 1040.732780]  kasan_save_stack+0x37/0x60
> [ 1040.733458]  kasan_save_track+0x10/0x30
> [ 1040.733982]  kasan_save_free_info+0x37/0x50
> [ 1040.734454]  poison_slab_object+0x19d/0x1e0
> [ 1040.734908]  __kasan_slab_free+0x10/0x30
> [ 1040.735287]  kfree+0xf8/0x2d0
> [ 1040.735750]  btrfs_rm_dev_replace_free_srcdev+0x7a/0x150
> [ 1040.736377]  btrfs_dev_replace_finishing+0x90d/0xa60
> [ 1040.736983]  btrfs_dev_replace_start+0x429/0x550
> [ 1040.737552]  btrfs_dev_replace_by_ioctl+0x8e/0xf0
> [ 1040.738113]  btrfs_ioctl+0x4cf/0x17c0
> [ 1040.738601]  __x64_sys_ioctl+0xbc/0xe0
> [ 1040.739155]  do_syscall_64+0x5b/0xf0
> [ 1040.739759]  entry_SYSCALL_64_after_hwframe+0x4b/0x53
> 
> [ 1040.740712] The buggy address belongs to the object at ffff888113385800
>                  which belongs to the cache kmalloc-1k of size 1024
> [ 1040.741932] The buggy address is located 128 bytes inside of
>                  freed 1024-byte region [ffff888113385800, ffff888113385c00)
> 
> [ 1040.743275] The buggy address belongs to the physical page:
> [ 1040.743997] page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x113380
> [ 1040.745033] head: order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
> [ 1040.745867] flags: 0x4500000000840(slab|head|section=34|zone=2)
> [ 1040.746766] page_type: 0xffffffff()
> [ 1040.747241] raw: 0004500000000840 ffff8881000430c0 ffff88813f3c7210 ffff88813f304e10
> [ 1040.747981] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
> [ 1040.748723] head: 0004500000000840 ffff8881000430c0 ffff88813f3c7210 ffff88813f304e10
> [ 1040.749474] head: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
> [ 1040.750291] head: 0004500000000003 ffff88813f2ce001 dead000000000122 00000000ffffffff
> [ 1040.751227] head: 0000000800000000 0000000000000000 00000000ffffffff 0000000000000000
> [ 1040.751961] page dumped because: kasan: bad access detected
> 
> [ 1040.752735] Memory state around the buggy address:
> [ 1040.753213]  ffff888113385780: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> [ 1040.753960]  ffff888113385800: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> [ 1040.754660] >ffff888113385880: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> [ 1040.755360]                    ^
> [ 1040.755725]  ffff888113385900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> [ 1040.756435]  ffff888113385980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> [ 1040.757274] ==================================================================
> [ 1041.613686] BTRFS info (device vdd): scrub: started on devid 1
> [ 1041.613740] BTRFS info (device vdd): scrub: started on devid 2
> [ 1042.472852] assertion failed: i < stripe->nr_sectors, in fs/btrfs/scrub.c:753
> [ 1042.473680] ------------[ cut here ]------------
> [ 1042.474180] kernel BUG at fs/btrfs/scrub.c:753!
> [ 1042.474550] invalid opcode: 0000 [#1] PREEMPT SMP KASAN
> [ 1042.474954] CPU: 0 PID: 27589 Comm: btrfs Tainted: G    B              6.9.0-rc7-default+ #417
> [ 1042.475640] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-3-gd478f380-rebuilt.opensuse.org 04/01/2014
> [ 1042.476821] RIP: 0010:calc_sector_number.cold+0x3d/0x5d
> [ 1042.477428] Code: a5 ff 8b 4c 24 08 e9 a9 8d 7c ff b9 f1 02 00 00 48 c7 c2 00 ac 38 a4 48 c7 c6 e0 b1 38 a4 48 c7 c7 c0 ac 38 a4 e8 0e 8d fb ff <0f> 0b 89 ca 48 89 ee 48 c7 c7 c0 1b ea a4 89 4c 24 08 e8 d7 52 a5
> [ 1042.479196] RSP: 0018:ffff8881066cf5a0 EFLAGS: 00010286
> [ 1042.479840] RAX: 0000000000000041 RBX: ffff8881033e89c0 RCX: 0000000000000000
> [ 1042.480706] RDX: 0000000000000041 RSI: ffffffffa3137b3d RDI: ffffed1020cd9ea7
> [ 1042.481511] RBP: 000000000000000f R08: 0000000000000000 R09: fffffbfff4957b9c
> [ 1042.482360] R10: 0000000000000003 R11: 0000000000000001 R12: ffff88810654f080
> [ 1042.483170] R13: 0000000000000010 R14: ffff88813f3c6ec0 R15: ffff8881033e8a72
> [ 1042.483957] FS:  00007fd36967e6c0(0000) GS:ffff888119600000(0000) knlGS:0000000000000000
> [ 1042.484860] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 1042.485584] CR2: 00005586a760b560 CR3: 0000000100e82000 CR4: 00000000000006b0
> [ 1042.486319] Call Trace:
> [ 1042.486620]  <TASK>
> [ 1042.486940]  ? show_trace_log_lvl+0x1e9/0x2e0
> [ 1042.487501]  ? scrub_read_endio+0x5a/0x290
> [ 1042.488060]  ? __die+0x4f/0x8c
> [ 1042.488513]  ? die+0x25/0x40
> [ 1042.488855]  ? do_trap+0x125/0x180
> [ 1042.489277]  ? calc_sector_number.cold+0x3d/0x5d
> [ 1042.489878]  ? calc_sector_number.cold+0x3d/0x5d
> [ 1042.490488]  ? do_error_trap+0x8b/0x130
> [ 1042.491002]  ? calc_sector_number.cold+0x3d/0x5d
> [ 1042.491541]  ? handle_invalid_op+0x38/0x40
> [ 1042.492009]  ? calc_sector_number.cold+0x3d/0x5d
> [ 1042.492542]  ? exc_invalid_op+0x29/0x40
> [ 1042.493035]  ? asm_exc_invalid_op+0x16/0x20
> [ 1042.493575]  ? this_cpu_in_panic+0x1d/0x30
> [ 1042.494034]  ? calc_sector_number.cold+0x3d/0x5d
> [ 1042.494535]  scrub_read_endio+0x5a/0x290
> [ 1042.495058]  scrub_submit_extent_sector_read.isra.0+0x60a/0x630
> [ 1042.495695]  ? scrub_stripe_submit_repair_read+0x440/0x440
> [ 1042.496321]  ? _find_next_bit+0x37/0xc0
> [ 1042.496855]  ? scrub_submit_initial_read+0x139/0x3e0
> [ 1042.497367]  submit_initial_group_read+0x175/0x330
> [ 1042.497926]  ? lock_sync+0xd0/0xd0
> [ 1042.498329]  ? scrub_submit_initial_read+0x3e0/0x3e0
> [ 1042.498885]  ? rcu_read_unlock+0x80/0x80
> [ 1042.499383]  scrub_simple_mirror.isra.0+0x2e7/0x380
> [ 1042.499965]  scrub_stripe+0x76d/0x7d0
> [ 1042.500469]  ? rcu_is_watching+0x1c/0x40
> [ 1042.500898]  ? rcu_is_watching+0x1c/0x40
> [ 1042.501399]  ? scrub_setup_ctx+0x210/0x210
> [ 1042.501887]  ? __down_write_trylock+0xf8/0x260
> [ 1042.502428]  ? down_read_trylock+0x50/0x50
> [ 1042.502977]  ? __might_resched+0x162/0x240
> [ 1042.503462]  ? rcu_read_unlock+0x80/0x80
> [ 1042.503980]  ? do_raw_read_unlock+0x23/0x50
> [ 1042.504501]  scrub_chunk+0x137/0x200
> [ 1042.505023]  scrub_enumerate_chunks+0x539/0x960
> [ 1042.505594]  ? scrub_chunk+0x200/0x200
> [ 1042.506084]  ? do_raw_spin_unlock+0x8d/0xe0
> [ 1042.506562]  ? bit_wait_io_timeout+0x10/0xc0
> [ 1042.507087]  ? do_raw_spin_trylock+0xc7/0x110
> [ 1042.507635]  btrfs_scrub_dev+0x3e4/0x820
> [ 1042.508105]  ? scrub_print_warning_inode+0x540/0x540
> [ 1042.508666]  ? mnt_get_write_access+0x6a/0x120
> [ 1042.509177]  ? mnt_get_write_access+0xfd/0x120
> [ 1042.509788]  btrfs_ioctl+0x79d/0x17c0
> [ 1042.510236]  ? btrfs_ioctl_get_supported_features+0x20/0x20
> [ 1042.510852]  ? lock_sync+0xd0/0xd0
> [ 1042.511274]  ? do_raw_spin_trylock+0xc7/0x110
> [ 1042.511751]  ? do_raw_spin_lock+0x1a0/0x1a0
> [ 1042.512374]  ? rcu_is_watching+0x1c/0x40
> [ 1042.512957]  ? rcu_is_watching+0x1c/0x40
> [ 1042.513538]  ? lock_release+0x52/0x1a0
> [ 1042.514104]  ? __fget_files+0x13b/0x230
> [ 1042.514669]  __x64_sys_ioctl+0xbc/0xe0
> [ 1042.515235]  do_syscall_64+0x5b/0xf0
> [ 1042.515777]  entry_SYSCALL_64_after_hwframe+0x4b/0x53
> [ 1042.516471] RIP: 0033:0x7fd369fa03df
> [ 1042.516968] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 18 48 8b 44 24 18 64 48 2b 04 25 28 00 00
> [ 1042.518755] RSP: 002b:00007fd36967dc80 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> [ 1042.519784] RAX: ffffffffffffffda RBX: 00005627eeea1758 RCX: 00007fd369fa03df
> [ 1042.520548] RDX: 00005627eeea1758 RSI: 00000000c400941b RDI: 0000000000000003
> [ 1042.521301] RBP: 0000000000000000 R08: 00007fd36967e6c0 R09: 32203a6b63617473
> [ 1042.521944] R10: 0000000000000000 R11: 0000000000000246 R12: fffffffffffffdb8
> [ 1042.522786] R13: 0000000000000002 R14: 00007ffecdb6f450 R15: 00007fd368e7e000
> [ 1042.523695]  </TASK>
> [ 1042.523965] Modules linked in: loop
> [ 1042.524367] ---[ end trace 0000000000000000 ]---
> [ 1042.524856] RIP: 0010:calc_sector_number.cold+0x3d/0x5d
> [ 1042.525382] Code: a5 ff 8b 4c 24 08 e9 a9 8d 7c ff b9 f1 02 00 00 48 c7 c2 00 ac 38 a4 48 c7 c6 e0 b1 38 a4 48 c7 c7 c0 ac 38 a4 e8 0e 8d fb ff <0f> 0b 89 ca 48 89 ee 48 c7 c7 c0 1b ea a4 89 4c 24 08 e8 d7 52 a5
> [ 1042.527082] RSP: 0018:ffff8881066cf5a0 EFLAGS: 00010286
> [ 1042.527686] RAX: 0000000000000041 RBX: ffff8881033e89c0 RCX: 0000000000000000
> [ 1042.528388] RDX: 0000000000000041 RSI: ffffffffa3137b3d RDI: ffffed1020cd9ea7
> [ 1042.529109] RBP: 000000000000000f R08: 0000000000000000 R09: fffffbfff4957b9c
> [ 1042.529830] R10: 0000000000000003 R11: 0000000000000001 R12: ffff88810654f080
> [ 1042.530583] R13: 0000000000000010 R14: ffff88813f3c6ec0 R15: ffff8881033e8a72
> [ 1042.531412] FS:  00007fd36967e6c0(0000) GS:ffff888119600000(0000) knlGS:0000000000000000
> [ 1042.534519] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 1042.535162] CR2: 00005586a760b560 CR3: 0000000100e82000 CR4: 00000000000006b0
>