diff mbox series

btrfs: zoned: fix use-after-free in do_zone_finish

Message ID 4b26736862c49050ef907e6f326ab34c0e82c6b8.1709208898.git.jth@kernel.org (mailing list archive)
State New, archived
Headers show
Series btrfs: zoned: fix use-after-free in do_zone_finish | expand

Commit Message

Johannes Thumshirn Feb. 29, 2024, 12:16 p.m. UTC
From: Johannes Thumshirn <johannes.thumshirn@wdc.com>

Shinichiro reported the following use-after-free triggered by the device
replace operation in fstests btrfs/070.

 BTRFS info (device nullb1): scrub: finished on devid 1 with status: 0
 ==================================================================
 BUG: KASAN: slab-use-after-free in do_zone_finish+0x91a/0xb90 [btrfs]
 Read of size 8 at addr ffff8881543c8060 by task btrfs-cleaner/3494007

 CPU: 0 PID: 3494007 Comm: btrfs-cleaner Tainted: G        W          6.8.0-rc5-kts #1
 Hardware name: Supermicro Super Server/X11SPi-TF, BIOS 3.3 02/21/2020
 Call Trace:
  <TASK>
  dump_stack_lvl+0x5b/0x90
  print_report+0xcf/0x670
  ? __virt_addr_valid+0x200/0x3e0
  kasan_report+0xd8/0x110
  ? do_zone_finish+0x91a/0xb90 [btrfs]
  ? do_zone_finish+0x91a/0xb90 [btrfs]
  do_zone_finish+0x91a/0xb90 [btrfs]
  btrfs_delete_unused_bgs+0x5e1/0x1750 [btrfs]
  ? __pfx_btrfs_delete_unused_bgs+0x10/0x10 [btrfs]
  ? btrfs_put_root+0x2d/0x220 [btrfs]
  ? btrfs_clean_one_deleted_snapshot+0x299/0x430 [btrfs]
  cleaner_kthread+0x21e/0x380 [btrfs]
  ? __pfx_cleaner_kthread+0x10/0x10 [btrfs]
  kthread+0x2e3/0x3c0
  ? __pfx_kthread+0x10/0x10
  ret_from_fork+0x31/0x70
  ? __pfx_kthread+0x10/0x10
  ret_from_fork_asm+0x1b/0x30
  </TASK>

 Allocated by task 3493983:
  kasan_save_stack+0x33/0x60
  kasan_save_track+0x14/0x30
  __kasan_kmalloc+0xaa/0xb0
  btrfs_alloc_device+0xb3/0x4e0 [btrfs]
  device_list_add.constprop.0+0x993/0x1630 [btrfs]
  btrfs_scan_one_device+0x219/0x3d0 [btrfs]
  btrfs_control_ioctl+0x26e/0x310 [btrfs]
  __x64_sys_ioctl+0x134/0x1b0
  do_syscall_64+0x99/0x190
  entry_SYSCALL_64_after_hwframe+0x6e/0x76

 Freed by task 3494056:
  kasan_save_stack+0x33/0x60
  kasan_save_track+0x14/0x30
  kasan_save_free_info+0x3f/0x60
  poison_slab_object+0x102/0x170
  __kasan_slab_free+0x32/0x70
  kfree+0x11b/0x320
  btrfs_rm_dev_replace_free_srcdev+0xca/0x280 [btrfs]
  btrfs_dev_replace_finishing+0xd7e/0x14f0 [btrfs]
  btrfs_dev_replace_by_ioctl+0x1286/0x25a0 [btrfs]
  btrfs_ioctl+0xb27/0x57d0 [btrfs]
  __x64_sys_ioctl+0x134/0x1b0
  do_syscall_64+0x99/0x190
  entry_SYSCALL_64_after_hwframe+0x6e/0x76

 The buggy address belongs to the object at ffff8881543c8000
  which belongs to the cache kmalloc-1k of size 1024
 The buggy address is located 96 bytes inside of
  freed 1024-byte region [ffff8881543c8000, ffff8881543c8400)

 The buggy address belongs to the physical page:
 page:00000000fe2c1285 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1543c8
 head:00000000fe2c1285 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
 flags: 0x17ffffc0000840(slab|head|node=0|zone=2|lastcpupid=0x1fffff)
 page_type: 0xffffffff()
 raw: 0017ffffc0000840 ffff888100042dc0 ffffea0019e8f200 dead000000000002
 raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000
 page dumped because: kasan: bad access detected

 Memory state around the buggy address:
  ffff8881543c7f00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  ffff8881543c7f80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 >ffff8881543c8000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                                        ^
  ffff8881543c8080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
  ffff8881543c8100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

This UAF happens because we're accessing stale zone information of a
already removed btrfs_device in do_zone_finish().

The sequence of events is as follows:

btrfs_dev_replace_start
  btrfs_scrub_dev
   btrfs_dev_replace_finishing
    btrfs_dev_replace_update_device_in_mapping_tree <-- devices replaced
    btrfs_rm_dev_replace_free_srcdev
     btrfs_free_device                              <-- device freed

cleaner_kthread
 btrfs_delete_unused_bgs
  btrfs_zone_finish
   do_zone_finish              <-- refers the freed device

The reason for this is that we're using a cached pointer to the chunk_map
from the block group, but on device replace this cached pointer can
contain stale device entries.

So grab a fresh reference to the chunk map and don't rely on the cached
version.

Many thanks to Shinichiro for analyzing the bug.

Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
 fs/btrfs/zoned.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Filipe Manana Feb. 29, 2024, 12:57 p.m. UTC | #1
On Thu, Feb 29, 2024 at 12:37 PM Johannes Thumshirn <jth@kernel.org> wrote:
>
> From: Johannes Thumshirn <johannes.thumshirn@wdc.com>
>
> Shinichiro reported the following use-after-free triggered by the device
> replace operation in fstests btrfs/070.
>
>  BTRFS info (device nullb1): scrub: finished on devid 1 with status: 0
>  ==================================================================
>  BUG: KASAN: slab-use-after-free in do_zone_finish+0x91a/0xb90 [btrfs]
>  Read of size 8 at addr ffff8881543c8060 by task btrfs-cleaner/3494007
>
>  CPU: 0 PID: 3494007 Comm: btrfs-cleaner Tainted: G        W          6.8.0-rc5-kts #1
>  Hardware name: Supermicro Super Server/X11SPi-TF, BIOS 3.3 02/21/2020
>  Call Trace:
>   <TASK>
>   dump_stack_lvl+0x5b/0x90
>   print_report+0xcf/0x670
>   ? __virt_addr_valid+0x200/0x3e0
>   kasan_report+0xd8/0x110
>   ? do_zone_finish+0x91a/0xb90 [btrfs]
>   ? do_zone_finish+0x91a/0xb90 [btrfs]
>   do_zone_finish+0x91a/0xb90 [btrfs]
>   btrfs_delete_unused_bgs+0x5e1/0x1750 [btrfs]
>   ? __pfx_btrfs_delete_unused_bgs+0x10/0x10 [btrfs]
>   ? btrfs_put_root+0x2d/0x220 [btrfs]
>   ? btrfs_clean_one_deleted_snapshot+0x299/0x430 [btrfs]
>   cleaner_kthread+0x21e/0x380 [btrfs]
>   ? __pfx_cleaner_kthread+0x10/0x10 [btrfs]
>   kthread+0x2e3/0x3c0
>   ? __pfx_kthread+0x10/0x10
>   ret_from_fork+0x31/0x70
>   ? __pfx_kthread+0x10/0x10
>   ret_from_fork_asm+0x1b/0x30
>   </TASK>
>
>  Allocated by task 3493983:
>   kasan_save_stack+0x33/0x60
>   kasan_save_track+0x14/0x30
>   __kasan_kmalloc+0xaa/0xb0
>   btrfs_alloc_device+0xb3/0x4e0 [btrfs]
>   device_list_add.constprop.0+0x993/0x1630 [btrfs]
>   btrfs_scan_one_device+0x219/0x3d0 [btrfs]
>   btrfs_control_ioctl+0x26e/0x310 [btrfs]
>   __x64_sys_ioctl+0x134/0x1b0
>   do_syscall_64+0x99/0x190
>   entry_SYSCALL_64_after_hwframe+0x6e/0x76
>
>  Freed by task 3494056:
>   kasan_save_stack+0x33/0x60
>   kasan_save_track+0x14/0x30
>   kasan_save_free_info+0x3f/0x60
>   poison_slab_object+0x102/0x170
>   __kasan_slab_free+0x32/0x70
>   kfree+0x11b/0x320
>   btrfs_rm_dev_replace_free_srcdev+0xca/0x280 [btrfs]
>   btrfs_dev_replace_finishing+0xd7e/0x14f0 [btrfs]
>   btrfs_dev_replace_by_ioctl+0x1286/0x25a0 [btrfs]
>   btrfs_ioctl+0xb27/0x57d0 [btrfs]
>   __x64_sys_ioctl+0x134/0x1b0
>   do_syscall_64+0x99/0x190
>   entry_SYSCALL_64_after_hwframe+0x6e/0x76
>
>  The buggy address belongs to the object at ffff8881543c8000
>   which belongs to the cache kmalloc-1k of size 1024
>  The buggy address is located 96 bytes inside of
>   freed 1024-byte region [ffff8881543c8000, ffff8881543c8400)
>
>  The buggy address belongs to the physical page:
>  page:00000000fe2c1285 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1543c8
>  head:00000000fe2c1285 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
>  flags: 0x17ffffc0000840(slab|head|node=0|zone=2|lastcpupid=0x1fffff)
>  page_type: 0xffffffff()
>  raw: 0017ffffc0000840 ffff888100042dc0 ffffea0019e8f200 dead000000000002
>  raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000
>  page dumped because: kasan: bad access detected
>
>  Memory state around the buggy address:
>   ffff8881543c7f00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>   ffff8881543c7f80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>  >ffff8881543c8000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>                                                         ^
>   ffff8881543c8080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>   ffff8881543c8100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>
> This UAF happens because we're accessing stale zone information of a
> already removed btrfs_device in do_zone_finish().
>
> The sequence of events is as follows:
>
> btrfs_dev_replace_start
>   btrfs_scrub_dev
>    btrfs_dev_replace_finishing
>     btrfs_dev_replace_update_device_in_mapping_tree <-- devices replaced
>     btrfs_rm_dev_replace_free_srcdev
>      btrfs_free_device                              <-- device freed
>
> cleaner_kthread
>  btrfs_delete_unused_bgs
>   btrfs_zone_finish
>    do_zone_finish              <-- refers the freed device
>
> The reason for this is that we're using a cached pointer to the chunk_map
> from the block group, but on device replace this cached pointer can
> contain stale device entries.
>
> So grab a fresh reference to the chunk map and don't rely on the cached
> version.
>
> Many thanks to Shinichiro for analyzing the bug.
>
> Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> ---
>  fs/btrfs/zoned.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
> index 3317bebfca95..c27d2214136e 100644
> --- a/fs/btrfs/zoned.c
> +++ b/fs/btrfs/zoned.c
> @@ -2237,7 +2237,8 @@ static int do_zone_finish(struct btrfs_block_group *block_group, bool fully_writ
>         btrfs_clear_data_reloc_bg(block_group);
>         spin_unlock(&block_group->lock);
>
> -       map = block_group->physical_map;
> +       map = btrfs_find_chunk_map(fs_info, block_group->start,
> +                                  block_group->length);

This fits nicely within a single line, why the split?

So the whole problem comes from the fact that
block_group->physical_map is a clone of the original chunk map, which
the device replace operation didn't update.

Don't we need to update block_group->physical_map?
I don't see anywhere else updating it, it's only set when loading
block groups from disk at mount time or when creating a new one.
After a device is replaced it's never updated. So can't this
use-after-free happen in other code paths that use the stale chunk map
at block_group->physical_map?

Thanks.




>         for (i = 0; i < map->num_stripes; i++) {
>                 struct btrfs_device *device = map->stripes[i].dev;
>                 const u64 physical = map->stripes[i].physical;
> --
> 2.35.3
>
>
Johannes Thumshirn Feb. 29, 2024, 1:10 p.m. UTC | #2
On 29.02.24 13:58, Filipe Manana wrote:
> On Thu, Feb 29, 2024 at 12:37 PM Johannes Thumshirn <jth@kernel.org> wrote:
>>
>> From: Johannes Thumshirn <johannes.thumshirn@wdc.com>
>>
>> Shinichiro reported the following use-after-free triggered by the device
>> replace operation in fstests btrfs/070.
>>
>>   BTRFS info (device nullb1): scrub: finished on devid 1 with status: 0
>>   ==================================================================
>>   BUG: KASAN: slab-use-after-free in do_zone_finish+0x91a/0xb90 [btrfs]
>>   Read of size 8 at addr ffff8881543c8060 by task btrfs-cleaner/3494007
>>
>>   CPU: 0 PID: 3494007 Comm: btrfs-cleaner Tainted: G        W          6.8.0-rc5-kts #1
>>   Hardware name: Supermicro Super Server/X11SPi-TF, BIOS 3.3 02/21/2020
>>   Call Trace:
>>    <TASK>
>>    dump_stack_lvl+0x5b/0x90
>>    print_report+0xcf/0x670
>>    ? __virt_addr_valid+0x200/0x3e0
>>    kasan_report+0xd8/0x110
>>    ? do_zone_finish+0x91a/0xb90 [btrfs]
>>    ? do_zone_finish+0x91a/0xb90 [btrfs]
>>    do_zone_finish+0x91a/0xb90 [btrfs]
>>    btrfs_delete_unused_bgs+0x5e1/0x1750 [btrfs]
>>    ? __pfx_btrfs_delete_unused_bgs+0x10/0x10 [btrfs]
>>    ? btrfs_put_root+0x2d/0x220 [btrfs]
>>    ? btrfs_clean_one_deleted_snapshot+0x299/0x430 [btrfs]
>>    cleaner_kthread+0x21e/0x380 [btrfs]
>>    ? __pfx_cleaner_kthread+0x10/0x10 [btrfs]
>>    kthread+0x2e3/0x3c0
>>    ? __pfx_kthread+0x10/0x10
>>    ret_from_fork+0x31/0x70
>>    ? __pfx_kthread+0x10/0x10
>>    ret_from_fork_asm+0x1b/0x30
>>    </TASK>
>>
>>   Allocated by task 3493983:
>>    kasan_save_stack+0x33/0x60
>>    kasan_save_track+0x14/0x30
>>    __kasan_kmalloc+0xaa/0xb0
>>    btrfs_alloc_device+0xb3/0x4e0 [btrfs]
>>    device_list_add.constprop.0+0x993/0x1630 [btrfs]
>>    btrfs_scan_one_device+0x219/0x3d0 [btrfs]
>>    btrfs_control_ioctl+0x26e/0x310 [btrfs]
>>    __x64_sys_ioctl+0x134/0x1b0
>>    do_syscall_64+0x99/0x190
>>    entry_SYSCALL_64_after_hwframe+0x6e/0x76
>>
>>   Freed by task 3494056:
>>    kasan_save_stack+0x33/0x60
>>    kasan_save_track+0x14/0x30
>>    kasan_save_free_info+0x3f/0x60
>>    poison_slab_object+0x102/0x170
>>    __kasan_slab_free+0x32/0x70
>>    kfree+0x11b/0x320
>>    btrfs_rm_dev_replace_free_srcdev+0xca/0x280 [btrfs]
>>    btrfs_dev_replace_finishing+0xd7e/0x14f0 [btrfs]
>>    btrfs_dev_replace_by_ioctl+0x1286/0x25a0 [btrfs]
>>    btrfs_ioctl+0xb27/0x57d0 [btrfs]
>>    __x64_sys_ioctl+0x134/0x1b0
>>    do_syscall_64+0x99/0x190
>>    entry_SYSCALL_64_after_hwframe+0x6e/0x76
>>
>>   The buggy address belongs to the object at ffff8881543c8000
>>    which belongs to the cache kmalloc-1k of size 1024
>>   The buggy address is located 96 bytes inside of
>>    freed 1024-byte region [ffff8881543c8000, ffff8881543c8400)
>>
>>   The buggy address belongs to the physical page:
>>   page:00000000fe2c1285 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1543c8
>>   head:00000000fe2c1285 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
>>   flags: 0x17ffffc0000840(slab|head|node=0|zone=2|lastcpupid=0x1fffff)
>>   page_type: 0xffffffff()
>>   raw: 0017ffffc0000840 ffff888100042dc0 ffffea0019e8f200 dead000000000002
>>   raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000
>>   page dumped because: kasan: bad access detected
>>
>>   Memory state around the buggy address:
>>    ffff8881543c7f00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>    ffff8881543c7f80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>   >ffff8881543c8000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>>                                                          ^
>>    ffff8881543c8080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>>    ffff8881543c8100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>>
>> This UAF happens because we're accessing stale zone information of a
>> already removed btrfs_device in do_zone_finish().
>>
>> The sequence of events is as follows:
>>
>> btrfs_dev_replace_start
>>    btrfs_scrub_dev
>>     btrfs_dev_replace_finishing
>>      btrfs_dev_replace_update_device_in_mapping_tree <-- devices replaced
>>      btrfs_rm_dev_replace_free_srcdev
>>       btrfs_free_device                              <-- device freed
>>
>> cleaner_kthread
>>   btrfs_delete_unused_bgs
>>    btrfs_zone_finish
>>     do_zone_finish              <-- refers the freed device
>>
>> The reason for this is that we're using a cached pointer to the chunk_map
>> from the block group, but on device replace this cached pointer can
>> contain stale device entries.
>>
>> So grab a fresh reference to the chunk map and don't rely on the cached
>> version.
>>
>> Many thanks to Shinichiro for analyzing the bug.
>>
>> Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
>> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
>> ---
>>   fs/btrfs/zoned.c | 3 ++-
>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
>> index 3317bebfca95..c27d2214136e 100644
>> --- a/fs/btrfs/zoned.c
>> +++ b/fs/btrfs/zoned.c
>> @@ -2237,7 +2237,8 @@ static int do_zone_finish(struct btrfs_block_group *block_group, bool fully_writ
>>          btrfs_clear_data_reloc_bg(block_group);
>>          spin_unlock(&block_group->lock);
>>
>> -       map = block_group->physical_map;
>> +       map = btrfs_find_chunk_map(fs_info, block_group->start,
>> +                                  block_group->length);
> 
> This fits nicely within a single line, why the split?

Could be because I did use btrfs_find_chunk_map_nolock() before and lock 
the whole loop, which cause deadlocks...

> So the whole problem comes from the fact that
> block_group->physical_map is a clone of the original chunk map, which
> the device replace operation didn't update.
> 
> Don't we need to update block_group->physical_map?
> I don't see anywhere else updating it, it's only set when loading
> block groups from disk at mount time or when creating a new one.
> After a device is replaced it's never updated. So can't this
> use-after-free happen in other code paths that use the stale chunk map
> at block_group->physical_map?

Yes I can give it a shot, but to do this conveniently we'd need to add a 
back-pointer from the chunk map to the block group.

Let me see how that'll look.
Filipe Manana Feb. 29, 2024, 1:13 p.m. UTC | #3
On Thu, Feb 29, 2024 at 1:10 PM Johannes Thumshirn
<Johannes.Thumshirn@wdc.com> wrote:
>
> On 29.02.24 13:58, Filipe Manana wrote:
> > On Thu, Feb 29, 2024 at 12:37 PM Johannes Thumshirn <jth@kernel.org> wrote:
> >>
> >> From: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> >>
> >> Shinichiro reported the following use-after-free triggered by the device
> >> replace operation in fstests btrfs/070.
> >>
> >>   BTRFS info (device nullb1): scrub: finished on devid 1 with status: 0
> >>   ==================================================================
> >>   BUG: KASAN: slab-use-after-free in do_zone_finish+0x91a/0xb90 [btrfs]
> >>   Read of size 8 at addr ffff8881543c8060 by task btrfs-cleaner/3494007
> >>
> >>   CPU: 0 PID: 3494007 Comm: btrfs-cleaner Tainted: G        W          6.8.0-rc5-kts #1
> >>   Hardware name: Supermicro Super Server/X11SPi-TF, BIOS 3.3 02/21/2020
> >>   Call Trace:
> >>    <TASK>
> >>    dump_stack_lvl+0x5b/0x90
> >>    print_report+0xcf/0x670
> >>    ? __virt_addr_valid+0x200/0x3e0
> >>    kasan_report+0xd8/0x110
> >>    ? do_zone_finish+0x91a/0xb90 [btrfs]
> >>    ? do_zone_finish+0x91a/0xb90 [btrfs]
> >>    do_zone_finish+0x91a/0xb90 [btrfs]
> >>    btrfs_delete_unused_bgs+0x5e1/0x1750 [btrfs]
> >>    ? __pfx_btrfs_delete_unused_bgs+0x10/0x10 [btrfs]
> >>    ? btrfs_put_root+0x2d/0x220 [btrfs]
> >>    ? btrfs_clean_one_deleted_snapshot+0x299/0x430 [btrfs]
> >>    cleaner_kthread+0x21e/0x380 [btrfs]
> >>    ? __pfx_cleaner_kthread+0x10/0x10 [btrfs]
> >>    kthread+0x2e3/0x3c0
> >>    ? __pfx_kthread+0x10/0x10
> >>    ret_from_fork+0x31/0x70
> >>    ? __pfx_kthread+0x10/0x10
> >>    ret_from_fork_asm+0x1b/0x30
> >>    </TASK>
> >>
> >>   Allocated by task 3493983:
> >>    kasan_save_stack+0x33/0x60
> >>    kasan_save_track+0x14/0x30
> >>    __kasan_kmalloc+0xaa/0xb0
> >>    btrfs_alloc_device+0xb3/0x4e0 [btrfs]
> >>    device_list_add.constprop.0+0x993/0x1630 [btrfs]
> >>    btrfs_scan_one_device+0x219/0x3d0 [btrfs]
> >>    btrfs_control_ioctl+0x26e/0x310 [btrfs]
> >>    __x64_sys_ioctl+0x134/0x1b0
> >>    do_syscall_64+0x99/0x190
> >>    entry_SYSCALL_64_after_hwframe+0x6e/0x76
> >>
> >>   Freed by task 3494056:
> >>    kasan_save_stack+0x33/0x60
> >>    kasan_save_track+0x14/0x30
> >>    kasan_save_free_info+0x3f/0x60
> >>    poison_slab_object+0x102/0x170
> >>    __kasan_slab_free+0x32/0x70
> >>    kfree+0x11b/0x320
> >>    btrfs_rm_dev_replace_free_srcdev+0xca/0x280 [btrfs]
> >>    btrfs_dev_replace_finishing+0xd7e/0x14f0 [btrfs]
> >>    btrfs_dev_replace_by_ioctl+0x1286/0x25a0 [btrfs]
> >>    btrfs_ioctl+0xb27/0x57d0 [btrfs]
> >>    __x64_sys_ioctl+0x134/0x1b0
> >>    do_syscall_64+0x99/0x190
> >>    entry_SYSCALL_64_after_hwframe+0x6e/0x76
> >>
> >>   The buggy address belongs to the object at ffff8881543c8000
> >>    which belongs to the cache kmalloc-1k of size 1024
> >>   The buggy address is located 96 bytes inside of
> >>    freed 1024-byte region [ffff8881543c8000, ffff8881543c8400)
> >>
> >>   The buggy address belongs to the physical page:
> >>   page:00000000fe2c1285 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1543c8
> >>   head:00000000fe2c1285 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
> >>   flags: 0x17ffffc0000840(slab|head|node=0|zone=2|lastcpupid=0x1fffff)
> >>   page_type: 0xffffffff()
> >>   raw: 0017ffffc0000840 ffff888100042dc0 ffffea0019e8f200 dead000000000002
> >>   raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000
> >>   page dumped because: kasan: bad access detected
> >>
> >>   Memory state around the buggy address:
> >>    ffff8881543c7f00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> >>    ffff8881543c7f80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> >>   >ffff8881543c8000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> >>                                                          ^
> >>    ffff8881543c8080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> >>    ffff8881543c8100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> >>
> >> This UAF happens because we're accessing stale zone information of a
> >> already removed btrfs_device in do_zone_finish().
> >>
> >> The sequence of events is as follows:
> >>
> >> btrfs_dev_replace_start
> >>    btrfs_scrub_dev
> >>     btrfs_dev_replace_finishing
> >>      btrfs_dev_replace_update_device_in_mapping_tree <-- devices replaced
> >>      btrfs_rm_dev_replace_free_srcdev
> >>       btrfs_free_device                              <-- device freed
> >>
> >> cleaner_kthread
> >>   btrfs_delete_unused_bgs
> >>    btrfs_zone_finish
> >>     do_zone_finish              <-- refers the freed device
> >>
> >> The reason for this is that we're using a cached pointer to the chunk_map
> >> from the block group, but on device replace this cached pointer can
> >> contain stale device entries.
> >>
> >> So grab a fresh reference to the chunk map and don't rely on the cached
> >> version.
> >>
> >> Many thanks to Shinichiro for analyzing the bug.
> >>
> >> Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
> >> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> >> ---
> >>   fs/btrfs/zoned.c | 3 ++-
> >>   1 file changed, 2 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
> >> index 3317bebfca95..c27d2214136e 100644
> >> --- a/fs/btrfs/zoned.c
> >> +++ b/fs/btrfs/zoned.c
> >> @@ -2237,7 +2237,8 @@ static int do_zone_finish(struct btrfs_block_group *block_group, bool fully_writ
> >>          btrfs_clear_data_reloc_bg(block_group);
> >>          spin_unlock(&block_group->lock);
> >>
> >> -       map = block_group->physical_map;
> >> +       map = btrfs_find_chunk_map(fs_info, block_group->start,
> >> +                                  block_group->length);
> >
> > This fits nicely within a single line, why the split?
>
> Could be because I did use btrfs_find_chunk_map_nolock() before and lock
> the whole loop, which cause deadlocks...
>
> > So the whole problem comes from the fact that
> > block_group->physical_map is a clone of the original chunk map, which
> > the device replace operation didn't update.
> >
> > Don't we need to update block_group->physical_map?
> > I don't see anywhere else updating it, it's only set when loading
> > block groups from disk at mount time or when creating a new one.
> > After a device is replaced it's never updated. So can't this
> > use-after-free happen in other code paths that use the stale chunk map
> > at block_group->physical_map?
>
> Yes I can give it a shot, but to do this conveniently we'd need to add a
> back-pointer from the chunk map to the block group.

Btw, the patch is also leaking the chunk map.

Because btrfs_find_chunk_map() adds a reference to the map, but then
there's no corresponding btrfs_free_chunk_map() call.

>
> Let me see how that'll look.
>
Damien Le Moal Feb. 29, 2024, 5:19 p.m. UTC | #4
On 2024/02/29 4:16, Johannes Thumshirn wrote:
> From: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> 
> Shinichiro reported the following use-after-free triggered by the device
> replace operation in fstests btrfs/070.
> 
>  BTRFS info (device nullb1): scrub: finished on devid 1 with status: 0
>  ==================================================================
>  BUG: KASAN: slab-use-after-free in do_zone_finish+0x91a/0xb90 [btrfs]
>  Read of size 8 at addr ffff8881543c8060 by task btrfs-cleaner/3494007
> 
>  CPU: 0 PID: 3494007 Comm: btrfs-cleaner Tainted: G        W          6.8.0-rc5-kts #1
>  Hardware name: Supermicro Super Server/X11SPi-TF, BIOS 3.3 02/21/2020
>  Call Trace:
>   <TASK>
>   dump_stack_lvl+0x5b/0x90
>   print_report+0xcf/0x670
>   ? __virt_addr_valid+0x200/0x3e0
>   kasan_report+0xd8/0x110
>   ? do_zone_finish+0x91a/0xb90 [btrfs]
>   ? do_zone_finish+0x91a/0xb90 [btrfs]
>   do_zone_finish+0x91a/0xb90 [btrfs]
>   btrfs_delete_unused_bgs+0x5e1/0x1750 [btrfs]
>   ? __pfx_btrfs_delete_unused_bgs+0x10/0x10 [btrfs]
>   ? btrfs_put_root+0x2d/0x220 [btrfs]
>   ? btrfs_clean_one_deleted_snapshot+0x299/0x430 [btrfs]
>   cleaner_kthread+0x21e/0x380 [btrfs]
>   ? __pfx_cleaner_kthread+0x10/0x10 [btrfs]
>   kthread+0x2e3/0x3c0
>   ? __pfx_kthread+0x10/0x10
>   ret_from_fork+0x31/0x70
>   ? __pfx_kthread+0x10/0x10
>   ret_from_fork_asm+0x1b/0x30
>   </TASK>
> 
>  Allocated by task 3493983:
>   kasan_save_stack+0x33/0x60
>   kasan_save_track+0x14/0x30
>   __kasan_kmalloc+0xaa/0xb0
>   btrfs_alloc_device+0xb3/0x4e0 [btrfs]
>   device_list_add.constprop.0+0x993/0x1630 [btrfs]
>   btrfs_scan_one_device+0x219/0x3d0 [btrfs]
>   btrfs_control_ioctl+0x26e/0x310 [btrfs]
>   __x64_sys_ioctl+0x134/0x1b0
>   do_syscall_64+0x99/0x190
>   entry_SYSCALL_64_after_hwframe+0x6e/0x76
> 
>  Freed by task 3494056:
>   kasan_save_stack+0x33/0x60
>   kasan_save_track+0x14/0x30
>   kasan_save_free_info+0x3f/0x60
>   poison_slab_object+0x102/0x170
>   __kasan_slab_free+0x32/0x70
>   kfree+0x11b/0x320
>   btrfs_rm_dev_replace_free_srcdev+0xca/0x280 [btrfs]
>   btrfs_dev_replace_finishing+0xd7e/0x14f0 [btrfs]
>   btrfs_dev_replace_by_ioctl+0x1286/0x25a0 [btrfs]
>   btrfs_ioctl+0xb27/0x57d0 [btrfs]
>   __x64_sys_ioctl+0x134/0x1b0
>   do_syscall_64+0x99/0x190
>   entry_SYSCALL_64_after_hwframe+0x6e/0x76
> 
>  The buggy address belongs to the object at ffff8881543c8000
>   which belongs to the cache kmalloc-1k of size 1024
>  The buggy address is located 96 bytes inside of
>   freed 1024-byte region [ffff8881543c8000, ffff8881543c8400)
> 
>  The buggy address belongs to the physical page:
>  page:00000000fe2c1285 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1543c8
>  head:00000000fe2c1285 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
>  flags: 0x17ffffc0000840(slab|head|node=0|zone=2|lastcpupid=0x1fffff)
>  page_type: 0xffffffff()
>  raw: 0017ffffc0000840 ffff888100042dc0 ffffea0019e8f200 dead000000000002
>  raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000
>  page dumped because: kasan: bad access detected
> 
>  Memory state around the buggy address:
>   ffff8881543c7f00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>   ffff8881543c7f80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>  >ffff8881543c8000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>                                                         ^
>   ffff8881543c8080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>   ffff8881543c8100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> 
> This UAF happens because we're accessing stale zone information of a
> already removed btrfs_device in do_zone_finish().
> 
> The sequence of events is as follows:
> 
> btrfs_dev_replace_start
>   btrfs_scrub_dev
>    btrfs_dev_replace_finishing
>     btrfs_dev_replace_update_device_in_mapping_tree <-- devices replaced
>     btrfs_rm_dev_replace_free_srcdev
>      btrfs_free_device                              <-- device freed
> 
> cleaner_kthread
>  btrfs_delete_unused_bgs
>   btrfs_zone_finish
>    do_zone_finish              <-- refers the freed device
> 
> The reason for this is that we're using a cached pointer to the chunk_map
> from the block group, but on device replace this cached pointer can
> contain stale device entries.
> 
> So grab a fresh reference to the chunk map and don't rely on the cached
> version.
> 
> Many thanks to Shinichiro for analyzing the bug.
> 
> Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

Same here... Fixes and Cc:stable tags ?

> ---
>  fs/btrfs/zoned.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
> index 3317bebfca95..c27d2214136e 100644
> --- a/fs/btrfs/zoned.c
> +++ b/fs/btrfs/zoned.c
> @@ -2237,7 +2237,8 @@ static int do_zone_finish(struct btrfs_block_group *block_group, bool fully_writ
>  	btrfs_clear_data_reloc_bg(block_group);
>  	spin_unlock(&block_group->lock);
>  
> -	map = block_group->physical_map;
> +	map = btrfs_find_chunk_map(fs_info, block_group->start,
> +				   block_group->length);
>  	for (i = 0; i < map->num_stripes; i++) {
>  		struct btrfs_device *device = map->stripes[i].dev;
>  		const u64 physical = map->stripes[i].physical;
David Sterba Feb. 29, 2024, 7:39 p.m. UTC | #5
On Thu, Feb 29, 2024 at 09:19:21AM -0800, Damien Le Moal wrote:
> On 2024/02/29 4:16, Johannes Thumshirn wrote:
> > From: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> > 
> > Shinichiro reported the following use-after-free triggered by the device
> > replace operation in fstests btrfs/070.
> > 
> >  BTRFS info (device nullb1): scrub: finished on devid 1 with status: 0
> >  ==================================================================
> >  BUG: KASAN: slab-use-after-free in do_zone_finish+0x91a/0xb90 [btrfs]
> >  Read of size 8 at addr ffff8881543c8060 by task btrfs-cleaner/3494007
> > 
> >  CPU: 0 PID: 3494007 Comm: btrfs-cleaner Tainted: G        W          6.8.0-rc5-kts #1
> >  Hardware name: Supermicro Super Server/X11SPi-TF, BIOS 3.3 02/21/2020
> >  Call Trace:
> >   <TASK>
> >   dump_stack_lvl+0x5b/0x90
> >   print_report+0xcf/0x670
> >   ? __virt_addr_valid+0x200/0x3e0
> >   kasan_report+0xd8/0x110
> >   ? do_zone_finish+0x91a/0xb90 [btrfs]
> >   ? do_zone_finish+0x91a/0xb90 [btrfs]
> >   do_zone_finish+0x91a/0xb90 [btrfs]
> >   btrfs_delete_unused_bgs+0x5e1/0x1750 [btrfs]
> >   ? __pfx_btrfs_delete_unused_bgs+0x10/0x10 [btrfs]
> >   ? btrfs_put_root+0x2d/0x220 [btrfs]
> >   ? btrfs_clean_one_deleted_snapshot+0x299/0x430 [btrfs]
> >   cleaner_kthread+0x21e/0x380 [btrfs]
> >   ? __pfx_cleaner_kthread+0x10/0x10 [btrfs]
> >   kthread+0x2e3/0x3c0
> >   ? __pfx_kthread+0x10/0x10
> >   ret_from_fork+0x31/0x70
> >   ? __pfx_kthread+0x10/0x10
> >   ret_from_fork_asm+0x1b/0x30
> >   </TASK>
> > 
> >  Allocated by task 3493983:
> >   kasan_save_stack+0x33/0x60
> >   kasan_save_track+0x14/0x30
> >   __kasan_kmalloc+0xaa/0xb0
> >   btrfs_alloc_device+0xb3/0x4e0 [btrfs]
> >   device_list_add.constprop.0+0x993/0x1630 [btrfs]
> >   btrfs_scan_one_device+0x219/0x3d0 [btrfs]
> >   btrfs_control_ioctl+0x26e/0x310 [btrfs]
> >   __x64_sys_ioctl+0x134/0x1b0
> >   do_syscall_64+0x99/0x190
> >   entry_SYSCALL_64_after_hwframe+0x6e/0x76
> > 
> >  Freed by task 3494056:
> >   kasan_save_stack+0x33/0x60
> >   kasan_save_track+0x14/0x30
> >   kasan_save_free_info+0x3f/0x60
> >   poison_slab_object+0x102/0x170
> >   __kasan_slab_free+0x32/0x70
> >   kfree+0x11b/0x320
> >   btrfs_rm_dev_replace_free_srcdev+0xca/0x280 [btrfs]
> >   btrfs_dev_replace_finishing+0xd7e/0x14f0 [btrfs]
> >   btrfs_dev_replace_by_ioctl+0x1286/0x25a0 [btrfs]
> >   btrfs_ioctl+0xb27/0x57d0 [btrfs]
> >   __x64_sys_ioctl+0x134/0x1b0
> >   do_syscall_64+0x99/0x190
> >   entry_SYSCALL_64_after_hwframe+0x6e/0x76
> > 
> >  The buggy address belongs to the object at ffff8881543c8000
> >   which belongs to the cache kmalloc-1k of size 1024
> >  The buggy address is located 96 bytes inside of
> >   freed 1024-byte region [ffff8881543c8000, ffff8881543c8400)
> > 
> >  The buggy address belongs to the physical page:
> >  page:00000000fe2c1285 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1543c8
> >  head:00000000fe2c1285 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
> >  flags: 0x17ffffc0000840(slab|head|node=0|zone=2|lastcpupid=0x1fffff)
> >  page_type: 0xffffffff()
> >  raw: 0017ffffc0000840 ffff888100042dc0 ffffea0019e8f200 dead000000000002
> >  raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000
> >  page dumped because: kasan: bad access detected
> > 
> >  Memory state around the buggy address:
> >   ffff8881543c7f00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> >   ffff8881543c7f80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> >  >ffff8881543c8000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> >                                                         ^
> >   ffff8881543c8080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> >   ffff8881543c8100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> > 
> > This UAF happens because we're accessing stale zone information of a
> > already removed btrfs_device in do_zone_finish().
> > 
> > The sequence of events is as follows:
> > 
> > btrfs_dev_replace_start
> >   btrfs_scrub_dev
> >    btrfs_dev_replace_finishing
> >     btrfs_dev_replace_update_device_in_mapping_tree <-- devices replaced
> >     btrfs_rm_dev_replace_free_srcdev
> >      btrfs_free_device                              <-- device freed
> > 
> > cleaner_kthread
> >  btrfs_delete_unused_bgs
> >   btrfs_zone_finish
> >    do_zone_finish              <-- refers the freed device
> > 
> > The reason for this is that we're using a cached pointer to the chunk_map
> > from the block group, but on device replace this cached pointer can
> > contain stale device entries.
> > 
> > So grab a fresh reference to the chunk map and don't rely on the cached
> > version.
> > 
> > Many thanks to Shinichiro for analyzing the bug.
> > 
> > Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
> > Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> 
> Same here... Fixes and Cc:stable tags ?

We don't require stable tags when posted it's not always clear where and
if it applies, I do an evaluation later.
diff mbox series

Patch

diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index 3317bebfca95..c27d2214136e 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -2237,7 +2237,8 @@  static int do_zone_finish(struct btrfs_block_group *block_group, bool fully_writ
 	btrfs_clear_data_reloc_bg(block_group);
 	spin_unlock(&block_group->lock);
 
-	map = block_group->physical_map;
+	map = btrfs_find_chunk_map(fs_info, block_group->start,
+				   block_group->length);
 	for (i = 0; i < map->num_stripes; i++) {
 		struct btrfs_device *device = map->stripes[i].dev;
 		const u64 physical = map->stripes[i].physical;