Message ID: 20220105143037.20542-1-jack@suse.cz (mailing list archive)
Series: bfq: Avoid use-after-free when moving processes between cgroups
On 2022/01/05 22:36, Jan Kara wrote:
> Hello,
>
> here is the second version of my patches to fix use-after-free issues in BFQ
> when processes with merged queues get moved to different cgroups. The patches
> have survived some beating in my test VM but so far I fail to reproduce the
> original KASAN reports so testing from people who can reproduce them is most
> welcome. Thanks!
>
> Changes since v1:
> * Added fix for bfq_put_cooperator()
> * Added fix to handle move between cgroups in bfq_merge_bio()
>
>                                                                 Honza
> Previous versions:
> Link: http://lore.kernel.org/r/20211223171425.3551-1-jack@suse.cz # v1
> .

Hi,

I reproduced the problem again with this patchset...

[   71.004788] BUG: KASAN: use-after-free in __bfq_deactivate_entity+0x21/0x290
[   71.006328] Read of size 1 at addr ffff88817a3dc0b0 by task rmmod/801
[   71.007723]
[   71.008068] CPU: 7 PID: 801 Comm: rmmod Tainted: G        W        5.16.0-rc5-next-2021127
[   71.009995] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_073836-4
[   71.012274] Call Trace:
[   71.012603]  <TASK>
[   71.012886]  dump_stack_lvl+0x34/0x44
[   71.013379]  print_address_description.constprop.0.cold+0xab/0x36b
[   71.014182]  ? __bfq_deactivate_entity+0x21/0x290
[   71.014795]  ? __bfq_deactivate_entity+0x21/0x290
[   71.015398]  kasan_report.cold+0x83/0xdf
[   71.015904]  ? _raw_read_lock_bh+0x20/0x40
[   71.016433]  ? __bfq_deactivate_entity+0x21/0x290
[   71.017033]  __bfq_deactivate_entity+0x21/0x290
[   71.017617]  bfq_pd_offline+0xc1/0x110
[   71.018105]  blkcg_deactivate_policy+0x14b/0x210
[   71.018699]  bfq_exit_queue+0xe5/0x100
[   71.019189]  blk_mq_exit_sched+0x113/0x140
[   71.019720]  elevator_exit+0x30/0x50
[   71.020186]  blk_release_queue+0xa8/0x160
[   71.020702]  kobject_put+0xd4/0x270
[   71.021158]  disk_release+0xc5/0xf0
[   71.021609]  device_release+0x56/0xe0
[   71.022086]  kobject_put+0xd4/0x270
[   71.022546]  null_del_dev.part.0+0xe6/0x220 [null_blk]
[   71.023228]  null_exit+0x62/0xa6 [null_blk]
[   71.023792]  __x64_sys_delete_module+0x20a/0x2f0
[   71.024387]  ? __ia32_sys_delete_module+0x2f0/0x2f0
[   71.025008]  ? fpregs_assert_state_consistent+0x55/0x60
[   71.025674]  ? exit_to_user_mode_prepare+0x39/0x1e0
[   71.026304]  do_syscall_64+0x35/0x80
[   71.026763]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[   71.027410] RIP: 0033:0x7fddec6bdfc7
[   71.027869] Code: 73 01 c3 48 8b 0d c1 de 2b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 8
[   71.030206] RSP: 002b:00007fff9486bc08 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
[   71.031160] RAX: ffffffffffffffda RBX: 00007fff9486bc68 RCX: 00007fddec6bdfc7
[   71.032058] RDX: 000000000000000a RSI: 0000000000000800 RDI: 000055fc863ac258
[   71.032962] RBP: 000055fc863ac1f0 R08: 00007fff9486ab81 R09: 0000000000000000
[   71.033861] R10: 00007fddec730260 R11: 0000000000000206 R12: 00007fff9486be30
[   71.034758] R13: 00007fff9486c45b R14: 000055fc863ab010 R15: 000055fc863ac1f0
[   71.035659]  </TASK>
[   71.035946]
[   71.036149] Allocated by task 620:
[   71.036585]  kasan_save_stack+0x1e/0x40
[   71.037076]  __kasan_kmalloc+0x81/0xa0
[   71.037558]  bfq_pd_alloc+0x72/0x110
[   71.038015]  blkg_alloc+0x252/0x2c0
[   71.038467]  blkg_create+0x38e/0x560
[   71.038927]  bio_associate_blkg_from_css+0x3bc/0x490
[   71.039563]  bio_associate_blkg+0x3b/0x90
[   71.040076]  mpage_alloc+0x7c/0xe0
[   71.040516]  do_mpage_readpage+0xb42/0xff0
[   71.041040]  mpage_readahead+0x1fd/0x3f0
[   71.041544]  read_pages+0x144/0x770
[   71.041994]  page_cache_ra_unbounded+0x2d2/0x380
[   71.042586]  filemap_get_pages+0x1bc/0xaf0
[   71.043115]  filemap_read+0x1bf/0x5a0
[   71.043585]  aio_read+0x1ca/0x2f0
[   71.044014]  io_submit_one+0x874/0x11c0
[   71.044511]  __x64_sys_io_submit+0xfa/0x250
[   71.045048]  do_syscall_64+0x35/0x80
[   71.045513]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[   71.046161]
[   71.046361] Freed by task 0:
[   71.046735]  kasan_save_stack+0x1e/0x40
[   71.047230]  kasan_set_track+0x21/0x30
[   71.047709]  kasan_set_free_info+0x20/0x30
[   71.048235]  __kasan_slab_free+0x103/0x180
[   71.048757]  kfree+0x9a/0x4b0
[   71.049144]  blkg_free.part.0+0x4a/0x90
[   71.049635]  rcu_do_batch+0x2e1/0x760
[   71.050111]  rcu_core+0x367/0x530
[   71.050536]  __do_softirq+0x119/0x3ba
[   71.051007]
[   71.051210] The buggy address belongs to the object at ffff88817a3dc000
[   71.051210]  which belongs to the cache kmalloc-2k of size 2048
[   71.052766] The buggy address is located 176 bytes inside of
[   71.052766]  2048-byte region [ffff88817a3dc000, ffff88817a3dc800)
[   71.054240] The buggy address belongs to the page:
[   71.054843] page:00000000ce62c7c2 refcount:1 mapcount:0 mapping:0000000000000000 index:0x08
[   71.056021] head:00000000ce62c7c2 order:3 compound_mapcount:0 compound_pincount:0
[   71.056961] flags: 0x17ffffc0010200(slab|head|node=0|zone=2|lastcpupid=0x1fffff)
[   71.057894] raw: 0017ffffc0010200 dead000000000100 dead000000000122 ffff888100042f00
[   71.058866] raw: 0000000000000000 0000000080080008 00000001ffffffff 0000000000000000
[   71.059840] page dumped because: kasan: bad access detected
[   71.060543]
[   71.060742] Memory state around the buggy address:
[   71.061349]  ffff88817a3dbf80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[   71.062256]  ffff88817a3dc000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[   71.063163] >ffff88817a3dc080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[   71.064129]                                      ^
[   71.064734]  ffff88817a3dc100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[   71.065639]  ffff88817a3dc180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[   71.066544] ==================================================================

Here is the caller of __bfq_deactivate_entity:

(gdb) list *(bfq_pd_offline+0xc1)
0xffffffff81c504f1 is in bfq_pd_offline (block/bfq-cgroup.c:942).
937                      * entities to the idle tree. It happens if, in some
938                      * of the calls to bfq_bfqq_move() performed by
939                      * bfq_reparent_active_queues(), the queue to move is
940                      * empty and gets expired.
941                      */
942                     bfq_flush_idle_tree(st);
943             }
944
945             __bfq_deactivate_entity(entity, false);
On Fri 07-01-22 17:15:43, yukuai (C) wrote: > 在 2022/01/05 22:36, Jan Kara 写道: > > Hello, > > > > here is the second version of my patches to fix use-after-free issues in BFQ > > when processes with merged queues get moved to different cgroups. The patches > > have survived some beating in my test VM but so far I fail to reproduce the > > original KASAN reports so testing from people who can reproduce them is most > > welcome. Thanks! > > > > Changes since v1: > > * Added fix for bfq_put_cooperator() > > * Added fix to handle move between cgroups in bfq_merge_bio() > > > > Honza > > Previous versions: > > Link: http://lore.kernel.org/r/20211223171425.3551-1-jack@suse.cz # v1 > > . > > > > Hi, > > I repoduced the problem again with this patchset... Thanks for testing! > [ 71.004788] BUG: KASAN: use-after-free in > __bfq_deactivate_entity+0x21/0x290 > [ 71.006328] Read of size 1 at addr ffff88817a3dc0b0 by task rmmod/801 > [ 71.007723] > [ 71.008068] CPU: 7 PID: 801 Comm: rmmod Tainted: G W > 5.16.0-rc5-next-2021127 > [ 71.009995] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS > ?-20190727_073836-4 > [ 71.012274] Call Trace: > [ 71.012603] <TASK> > [ 71.012886] dump_stack_lvl+0x34/0x44 > [ 71.013379] print_address_description.constprop.0.cold+0xab/0x36b > [ 71.014182] ? __bfq_deactivate_entity+0x21/0x290 > [ 71.014795] ? __bfq_deactivate_entity+0x21/0x290 > [ 71.015398] kasan_report.cold+0x83/0xdf > [ 71.015904] ? _raw_read_lock_bh+0x20/0x40 > [ 71.016433] ? __bfq_deactivate_entity+0x21/0x290 > [ 71.017033] __bfq_deactivate_entity+0x21/0x290 > [ 71.017617] bfq_pd_offline+0xc1/0x110 > [ 71.018105] blkcg_deactivate_policy+0x14b/0x210 ... > Here is the caller of __bfq_deactivate_entity: > (gdb) list *(bfq_pd_offline+0xc1) > 0xffffffff81c504f1 is in bfq_pd_offline (block/bfq-cgroup.c:942). > 937 * entities to the idle tree. It happens if, in some > 938 * of the calls to bfq_bfqq_move() performed by > 939 * bfq_reparent_active_queues(), the queue to move > is > 940 * empty and gets expired. > 941 */ > 942 bfq_flush_idle_tree(st); > 943 } > 944 > 945 __bfq_deactivate_entity(entity, false); So this is indeed strange. The group has in some of its idle service trees an entity whose entity->sched_data points to already freed cgroup. In particular it means the bfqq_entity_service_tree() leads to somewhere else than the 'st' we called bfq_flush_idle_tree() with. This looks like a different kind of problem AFAICT but still it needs fixing :). Can you please run your reproducer with my patches + the attached debug patch on top? Hopefully it should tell us more about the problematic entity and how it got there... Thanks! Honza
On Fri 07-01-22 15:58:53, Jan Kara wrote: > On Fri 07-01-22 17:15:43, yukuai (C) wrote: > > 在 2022/01/05 22:36, Jan Kara 写道: > > > Hello, > > > > > > here is the second version of my patches to fix use-after-free issues in BFQ > > > when processes with merged queues get moved to different cgroups. The patches > > > have survived some beating in my test VM but so far I fail to reproduce the > > > original KASAN reports so testing from people who can reproduce them is most > > > welcome. Thanks! > > > > > > Changes since v1: > > > * Added fix for bfq_put_cooperator() > > > * Added fix to handle move between cgroups in bfq_merge_bio() > > > > > > Honza > > > Previous versions: > > > Link: http://lore.kernel.org/r/20211223171425.3551-1-jack@suse.cz # v1 > > > . > > > > > > > Hi, > > > > I repoduced the problem again with this patchset... > > Thanks for testing! > > > [ 71.004788] BUG: KASAN: use-after-free in > > __bfq_deactivate_entity+0x21/0x290 > > [ 71.006328] Read of size 1 at addr ffff88817a3dc0b0 by task rmmod/801 > > [ 71.007723] > > [ 71.008068] CPU: 7 PID: 801 Comm: rmmod Tainted: G W > > 5.16.0-rc5-next-2021127 > > [ 71.009995] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS > > ?-20190727_073836-4 > > [ 71.012274] Call Trace: > > [ 71.012603] <TASK> > > [ 71.012886] dump_stack_lvl+0x34/0x44 > > [ 71.013379] print_address_description.constprop.0.cold+0xab/0x36b > > [ 71.014182] ? __bfq_deactivate_entity+0x21/0x290 > > [ 71.014795] ? __bfq_deactivate_entity+0x21/0x290 > > [ 71.015398] kasan_report.cold+0x83/0xdf > > [ 71.015904] ? _raw_read_lock_bh+0x20/0x40 > > [ 71.016433] ? __bfq_deactivate_entity+0x21/0x290 > > [ 71.017033] __bfq_deactivate_entity+0x21/0x290 > > [ 71.017617] bfq_pd_offline+0xc1/0x110 > > [ 71.018105] blkcg_deactivate_policy+0x14b/0x210 > ... > > > Here is the caller of __bfq_deactivate_entity: > > (gdb) list *(bfq_pd_offline+0xc1) > > 0xffffffff81c504f1 is in bfq_pd_offline (block/bfq-cgroup.c:942). > > 937 * entities to the idle tree. It happens if, in some > > 938 * of the calls to bfq_bfqq_move() performed by > > 939 * bfq_reparent_active_queues(), the queue to move > > is > > 940 * empty and gets expired. > > 941 */ > > 942 bfq_flush_idle_tree(st); > > 943 } > > 944 > > 945 __bfq_deactivate_entity(entity, false); > > So this is indeed strange. The group has in some of its idle service trees > an entity whose entity->sched_data points to already freed cgroup. In > particular it means the bfqq_entity_service_tree() leads to somewhere else > than the 'st' we called bfq_flush_idle_tree() with. This looks like a > different kind of problem AFAICT but still it needs fixing :). Can you > please run your reproducer with my patches + the attached debug patch on > top? Hopefully it should tell us more about the problematic entity and how > it got there... Thanks! Forgot to attach the patch. Here it is... Honza
在 2022/01/07 22:58, Jan Kara 写道: > On Fri 07-01-22 17:15:43, yukuai (C) wrote: >> 在 2022/01/05 22:36, Jan Kara 写道: >>> Hello, >>> >>> here is the second version of my patches to fix use-after-free issues in BFQ >>> when processes with merged queues get moved to different cgroups. The patches >>> have survived some beating in my test VM but so far I fail to reproduce the >>> original KASAN reports so testing from people who can reproduce them is most >>> welcome. Thanks! >>> >>> Changes since v1: >>> * Added fix for bfq_put_cooperator() >>> * Added fix to handle move between cgroups in bfq_merge_bio() >>> >>> Honza >>> Previous versions: >>> Link: http://lore.kernel.org/r/20211223171425.3551-1-jack@suse.cz # v1 >>> . >>> >> >> Hi, >> >> I repoduced the problem again with this patchset... > > Thanks for testing! > >> [ 71.004788] BUG: KASAN: use-after-free in >> __bfq_deactivate_entity+0x21/0x290 >> [ 71.006328] Read of size 1 at addr ffff88817a3dc0b0 by task rmmod/801 >> [ 71.007723] >> [ 71.008068] CPU: 7 PID: 801 Comm: rmmod Tainted: G W >> 5.16.0-rc5-next-2021127 >> [ 71.009995] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS >> ?-20190727_073836-4 >> [ 71.012274] Call Trace: >> [ 71.012603] <TASK> >> [ 71.012886] dump_stack_lvl+0x34/0x44 >> [ 71.013379] print_address_description.constprop.0.cold+0xab/0x36b >> [ 71.014182] ? __bfq_deactivate_entity+0x21/0x290 >> [ 71.014795] ? __bfq_deactivate_entity+0x21/0x290 >> [ 71.015398] kasan_report.cold+0x83/0xdf >> [ 71.015904] ? _raw_read_lock_bh+0x20/0x40 >> [ 71.016433] ? __bfq_deactivate_entity+0x21/0x290 >> [ 71.017033] __bfq_deactivate_entity+0x21/0x290 >> [ 71.017617] bfq_pd_offline+0xc1/0x110 >> [ 71.018105] blkcg_deactivate_policy+0x14b/0x210 > ... > >> Here is the caller of __bfq_deactivate_entity: >> (gdb) list *(bfq_pd_offline+0xc1) >> 0xffffffff81c504f1 is in bfq_pd_offline (block/bfq-cgroup.c:942). >> 937 * entities to the idle tree. It happens if, in some >> 938 * of the calls to bfq_bfqq_move() performed by >> 939 * bfq_reparent_active_queues(), the queue to move >> is >> 940 * empty and gets expired. >> 941 */ >> 942 bfq_flush_idle_tree(st); >> 943 } >> 944 >> 945 __bfq_deactivate_entity(entity, false); > > So this is indeed strange. The group has in some of its idle service trees > an entity whose entity->sched_data points to already freed cgroup. In > particular it means the bfqq_entity_service_tree() leads to somewhere else > than the 'st' we called bfq_flush_idle_tree() with. This looks like a > different kind of problem AFAICT but still it needs fixing :). Can you > please run your reproducer with my patches + the attached debug patch on > top? Hopefully it should tell us more about the problematic entity and how > it got there... Thanks! Hi, I'm not sure I understand what you mean... I reporduced again with your debug patch applied, however, no extra messages are printed. I think this is exactly the same problem we discussed before: 1) bfqq->new_bfqq is set, they are under g1 2) bfqq is moved to another group, and user thread of new_bfqq exit 3) g1 is offlied 3) io issued from bfqq will end up in new_bfqq, and the offlined g1 will be inserted to st of g1's parent. Currently such entity should be in active tree, however, I think it's prossible that it can end up in idle tree throuph deactivation of entity? 4) io is done, new_bfqq is deactivated 5) new_bfqq is freed, and then g1 is freed 6) the problem is triggered when g1's parent is offlined. Thanks, Kuai > > Honza >
在 2022/01/10 9:49, yukuai (C) 写道: > 在 2022/01/07 22:58, Jan Kara 写道: >> On Fri 07-01-22 17:15:43, yukuai (C) wrote: >>> 在 2022/01/05 22:36, Jan Kara 写道: >>>> Hello, >>>> >>>> here is the second version of my patches to fix use-after-free >>>> issues in BFQ >>>> when processes with merged queues get moved to different cgroups. >>>> The patches >>>> have survived some beating in my test VM but so far I fail to >>>> reproduce the >>>> original KASAN reports so testing from people who can reproduce them >>>> is most >>>> welcome. Thanks! >>>> >>>> Changes since v1: >>>> * Added fix for bfq_put_cooperator() >>>> * Added fix to handle move between cgroups in bfq_merge_bio() >>>> >>>> Honza >>>> Previous versions: >>>> Link: http://lore.kernel.org/r/20211223171425.3551-1-jack@suse.cz # v1 >>>> . >>>> >>> >>> Hi, >>> >>> I repoduced the problem again with this patchset... >> >> Thanks for testing! >> >>> [ 71.004788] BUG: KASAN: use-after-free in >>> __bfq_deactivate_entity+0x21/0x290 >>> [ 71.006328] Read of size 1 at addr ffff88817a3dc0b0 by task rmmod/801 >>> [ 71.007723] >>> [ 71.008068] CPU: 7 PID: 801 Comm: rmmod Tainted: G W >>> 5.16.0-rc5-next-2021127 >>> [ 71.009995] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), >>> BIOS >>> ?-20190727_073836-4 >>> [ 71.012274] Call Trace: >>> [ 71.012603] <TASK> >>> [ 71.012886] dump_stack_lvl+0x34/0x44 >>> [ 71.013379] print_address_description.constprop.0.cold+0xab/0x36b >>> [ 71.014182] ? __bfq_deactivate_entity+0x21/0x290 >>> [ 71.014795] ? __bfq_deactivate_entity+0x21/0x290 >>> [ 71.015398] kasan_report.cold+0x83/0xdf >>> [ 71.015904] ? _raw_read_lock_bh+0x20/0x40 >>> [ 71.016433] ? __bfq_deactivate_entity+0x21/0x290 >>> [ 71.017033] __bfq_deactivate_entity+0x21/0x290 >>> [ 71.017617] bfq_pd_offline+0xc1/0x110 >>> [ 71.018105] blkcg_deactivate_policy+0x14b/0x210 >> ... >> >>> Here is the caller of __bfq_deactivate_entity: >>> (gdb) list *(bfq_pd_offline+0xc1) >>> 0xffffffff81c504f1 is in bfq_pd_offline (block/bfq-cgroup.c:942). >>> 937 * entities to the idle tree. It happens if, >>> in some >>> 938 * of the calls to bfq_bfqq_move() performed by >>> 939 * bfq_reparent_active_queues(), the queue to >>> move >>> is >>> 940 * empty and gets expired. >>> 941 */ >>> 942 bfq_flush_idle_tree(st); >>> 943 } >>> 944 >>> 945 __bfq_deactivate_entity(entity, false); >> >> So this is indeed strange. The group has in some of its idle service >> trees >> an entity whose entity->sched_data points to already freed cgroup. In >> particular it means the bfqq_entity_service_tree() leads to somewhere >> else >> than the 'st' we called bfq_flush_idle_tree() with. This looks like a >> different kind of problem AFAICT but still it needs fixing :). Can you >> please run your reproducer with my patches + the attached debug patch on >> top? Hopefully it should tell us more about the problematic entity and >> how >> it got there... Thanks! > > Hi, > > I'm not sure I understand what you mean... I reporduced again with your > debug patch applied, however, no extra messages are printed. > > I think this is exactly the same problem we discussed before: > > 1) bfqq->new_bfqq is set, they are under g1 > 2) bfqq is moved to another group, and user thread of new_bfqq exit > 3) g1 is offlied > 3) io issued from bfqq will end up in new_bfqq, and the offlined > g1 will be inserted to st of g1's parent. > > Currently such entity should be in active tree, however, I think it's > prossible that it can end up in idle tree throuph deactivation of > entity? 
>
> 4) io is done, new_bfqq is deactivated
> 5) new_bfqq is freed, and then g1 is freed
> 6) the problem is triggered when g1's parent is offlined.

Hi,

I applied the following modification and the problem can't be
reproduced anymore.

diff --git a/block/bfq-cgroup.c b/block/bfq-cgroup.c
index 24a5c5329bcd..9b29af66635f 100644
--- a/block/bfq-cgroup.c
+++ b/block/bfq-cgroup.c
@@ -730,8 +730,19 @@ static struct bfq_group *__bfq_bic_change_cgroup(struct bfq_data *bfqd,
 
 	if (sync_bfqq) {
 		entity = &sync_bfqq->entity;
-		if (entity->sched_data != &bfqg->sched_data)
+		if (entity->sched_data != &bfqg->sched_data) {
+			if (sync_bfqq->new_bfqq) {
+				bfq_put_queue(sync_bfqq->new_bfqq);
+				sync_bfqq->new_bfqq = NULL;
+			}
+
+			if (bfq_bfqq_coop(sync_bfqq)) {
+				bic->stably_merged = false;
+				bfq_mark_bfqq_split_coop(sync_bfqq);
+			}
+
 			bfq_bfqq_move(bfqd, sync_bfqq, bfqg);
+		}

I'm not sure if we can set sync_bfqq->new_bfqq = NULL here, however,
this can make sure the problem here really is what we discussed
before...

>
> Thanks,
> Kuai
>>
>> Honza
>>
On Tue 11-01-22 16:28:35, yukuai (C) wrote: > 在 2022/01/10 9:49, yukuai (C) 写道: > > 在 2022/01/07 22:58, Jan Kara 写道: > > > On Fri 07-01-22 17:15:43, yukuai (C) wrote: > > > > 在 2022/01/05 22:36, Jan Kara 写道: > > > > > Hello, > > > > > > > > > > here is the second version of my patches to fix > > > > > use-after-free issues in BFQ > > > > > when processes with merged queues get moved to different > > > > > cgroups. The patches > > > > > have survived some beating in my test VM but so far I fail > > > > > to reproduce the > > > > > original KASAN reports so testing from people who can > > > > > reproduce them is most > > > > > welcome. Thanks! > > > > > > > > > > Changes since v1: > > > > > * Added fix for bfq_put_cooperator() > > > > > * Added fix to handle move between cgroups in bfq_merge_bio() > > > > > > > > > > Honza > > > > > Previous versions: > > > > > Link: http://lore.kernel.org/r/20211223171425.3551-1-jack@suse.cz # v1 > > > > > . > > > > > > > > > > > > > Hi, > > > > > > > > I repoduced the problem again with this patchset... > > > > > > Thanks for testing! > > > > > > > [ 71.004788] BUG: KASAN: use-after-free in > > > > __bfq_deactivate_entity+0x21/0x290 > > > > [ 71.006328] Read of size 1 at addr ffff88817a3dc0b0 by task rmmod/801 > > > > [ 71.007723] > > > > [ 71.008068] CPU: 7 PID: 801 Comm: rmmod Tainted: G W > > > > 5.16.0-rc5-next-2021127 > > > > [ 71.009995] Hardware name: QEMU Standard PC (i440FX + PIIX, > > > > 1996), BIOS > > > > ?-20190727_073836-4 > > > > [ 71.012274] Call Trace: > > > > [ 71.012603] <TASK> > > > > [ 71.012886] dump_stack_lvl+0x34/0x44 > > > > [ 71.013379] print_address_description.constprop.0.cold+0xab/0x36b > > > > [ 71.014182] ? __bfq_deactivate_entity+0x21/0x290 > > > > [ 71.014795] ? __bfq_deactivate_entity+0x21/0x290 > > > > [ 71.015398] kasan_report.cold+0x83/0xdf > > > > [ 71.015904] ? _raw_read_lock_bh+0x20/0x40 > > > > [ 71.016433] ? __bfq_deactivate_entity+0x21/0x290 > > > > [ 71.017033] __bfq_deactivate_entity+0x21/0x290 > > > > [ 71.017617] bfq_pd_offline+0xc1/0x110 > > > > [ 71.018105] blkcg_deactivate_policy+0x14b/0x210 > > > ... > > > > > > > Here is the caller of __bfq_deactivate_entity: > > > > (gdb) list *(bfq_pd_offline+0xc1) > > > > 0xffffffff81c504f1 is in bfq_pd_offline (block/bfq-cgroup.c:942). > > > > 937 * entities to the idle tree. It happens > > > > if, in some > > > > 938 * of the calls to bfq_bfqq_move() performed by > > > > 939 * bfq_reparent_active_queues(), the > > > > queue to move > > > > is > > > > 940 * empty and gets expired. > > > > 941 */ > > > > 942 bfq_flush_idle_tree(st); > > > > 943 } > > > > 944 > > > > 945 __bfq_deactivate_entity(entity, false); > > > > > > So this is indeed strange. The group has in some of its idle service > > > trees > > > an entity whose entity->sched_data points to already freed cgroup. In > > > particular it means the bfqq_entity_service_tree() leads to > > > somewhere else > > > than the 'st' we called bfq_flush_idle_tree() with. This looks like a > > > different kind of problem AFAICT but still it needs fixing :). Can you > > > please run your reproducer with my patches + the attached debug patch on > > > top? Hopefully it should tell us more about the problematic entity > > > and how > > > it got there... Thanks! > > > > Hi, > > > > I'm not sure I understand what you mean... I reporduced again with your > > debug patch applied, however, no extra messages are printed. 
> > > > I think this is exactly the same problem we discussed before: > > > > 1) bfqq->new_bfqq is set, they are under g1 > > 2) bfqq is moved to another group, and user thread of new_bfqq exit > > 3) g1 is offlied > > 3) io issued from bfqq will end up in new_bfqq, and the offlined > > g1 will be inserted to st of g1's parent. > > > > Currently such entity should be in active tree, however, I think it's > > prossible that it can end up in idle tree throuph deactivation of > > entity? > > > > 4) io is done, new_bfqq is deactivated > > 5) new_bfqq is freed, and then g1 is freed > > 6) the problem is triggered when g1's parent is offlined. > > Hi, > > I applied the following modification and the problem can't be > repoduced anymore. > > diff --git a/block/bfq-cgroup.c b/block/bfq-cgroup.c > index 24a5c5329bcd..9b29af66635f 100644 > --- a/block/bfq-cgroup.c > +++ b/block/bfq-cgroup.c > @@ -730,8 +730,19 @@ static struct bfq_group *__bfq_bic_change_cgroup(struct > bfq_data *bfqd, > > if (sync_bfqq) { > entity = &sync_bfqq->entity; > - if (entity->sched_data != &bfqg->sched_data) > + if (entity->sched_data != &bfqg->sched_data) { > + if (sync_bfqq->new_bfqq) { > + bfq_put_queue(sync_bfqq->new_bfqq); > + sync_bfqq->new_bfqq = NULL; > + } > + > + if (bfq_bfqq_coop(sync_bfqq)) { > + bic->stably_merged = false; > + bfq_mark_bfqq_split_coop(sync_bfqq); > + } > + > bfq_bfqq_move(bfqd, sync_bfqq, bfqg); > + } > > I'm not sure if we can set sync_bfqq->new_bfqq = NULL here, however, > this can make sure the problem here really is what we discussed > before... OK, if this fixed the problem for you then I'm wondering why WARN_ON_ONCE(sync_bfqq->new_bfqq); in my patch "bfq: Split shared queues on move between cgroups" didn't trigger? Now I see how sync_bfqq->new_bfqq can indeed be != NULL here so I agree with the change in principle (it has to use bfq_put_cooperator() though) but I'm wondering why the WARN_ON didn't trigger. Or you just didn't notice? Honza
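[Editor's note: for reference, a minimal sketch of the direction Jan suggests above, with the whole new_bfqq chain dropped via bfq_put_cooperator() rather than a single bfq_put_queue(), might look as follows. This is illustrative only, not the patch that was eventually posted, and it assumes bfq_put_cooperator() is made callable from bfq-cgroup.c (it is currently a local helper in bfq-iosched.c).]

```c
/*
 * Illustrative sketch only -- not the posted patch. Inside
 * __bfq_bic_change_cgroup(), when the queue is about to change groups,
 * abandon any scheduled queue merge and split a previously shared queue
 * before moving it.
 */
if (sync_bfqq) {
	entity = &sync_bfqq->entity;
	if (entity->sched_data != &bfqg->sched_data) {
		if (sync_bfqq->new_bfqq) {
			/* Drop the whole merge chain, not just its head. */
			bfq_put_cooperator(sync_bfqq);
			sync_bfqq->new_bfqq = NULL;
		}
		if (bfq_bfqq_coop(sync_bfqq)) {
			/* Undo a previous queue merge for this bic. */
			bic->stably_merged = false;
			bfq_mark_bfqq_split_coop(sync_bfqq);
		}
		bfq_bfqq_move(bfqd, sync_bfqq, bfqg);
	}
}
```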
在 2022/01/11 20:36, Jan Kara 写道: > On Tue 11-01-22 16:28:35, yukuai (C) wrote: >> 在 2022/01/10 9:49, yukuai (C) 写道: >>> 在 2022/01/07 22:58, Jan Kara 写道: >>>> On Fri 07-01-22 17:15:43, yukuai (C) wrote: >>>>> 在 2022/01/05 22:36, Jan Kara 写道: >>>>>> Hello, >>>>>> >>>>>> here is the second version of my patches to fix >>>>>> use-after-free issues in BFQ >>>>>> when processes with merged queues get moved to different >>>>>> cgroups. The patches >>>>>> have survived some beating in my test VM but so far I fail >>>>>> to reproduce the >>>>>> original KASAN reports so testing from people who can >>>>>> reproduce them is most >>>>>> welcome. Thanks! >>>>>> >>>>>> Changes since v1: >>>>>> * Added fix for bfq_put_cooperator() >>>>>> * Added fix to handle move between cgroups in bfq_merge_bio() >>>>>> >>>>>> Honza >>>>>> Previous versions: >>>>>> Link: http://lore.kernel.org/r/20211223171425.3551-1-jack@suse.cz # v1 >>>>>> . >>>>>> >>>>> >>>>> Hi, >>>>> >>>>> I repoduced the problem again with this patchset... >>>> >>>> Thanks for testing! >>>> >>>>> [ 71.004788] BUG: KASAN: use-after-free in >>>>> __bfq_deactivate_entity+0x21/0x290 >>>>> [ 71.006328] Read of size 1 at addr ffff88817a3dc0b0 by task rmmod/801 >>>>> [ 71.007723] >>>>> [ 71.008068] CPU: 7 PID: 801 Comm: rmmod Tainted: G W >>>>> 5.16.0-rc5-next-2021127 >>>>> [ 71.009995] Hardware name: QEMU Standard PC (i440FX + PIIX, >>>>> 1996), BIOS >>>>> ?-20190727_073836-4 >>>>> [ 71.012274] Call Trace: >>>>> [ 71.012603] <TASK> >>>>> [ 71.012886] dump_stack_lvl+0x34/0x44 >>>>> [ 71.013379] print_address_description.constprop.0.cold+0xab/0x36b >>>>> [ 71.014182] ? __bfq_deactivate_entity+0x21/0x290 >>>>> [ 71.014795] ? __bfq_deactivate_entity+0x21/0x290 >>>>> [ 71.015398] kasan_report.cold+0x83/0xdf >>>>> [ 71.015904] ? _raw_read_lock_bh+0x20/0x40 >>>>> [ 71.016433] ? __bfq_deactivate_entity+0x21/0x290 >>>>> [ 71.017033] __bfq_deactivate_entity+0x21/0x290 >>>>> [ 71.017617] bfq_pd_offline+0xc1/0x110 >>>>> [ 71.018105] blkcg_deactivate_policy+0x14b/0x210 >>>> ... >>>> >>>>> Here is the caller of __bfq_deactivate_entity: >>>>> (gdb) list *(bfq_pd_offline+0xc1) >>>>> 0xffffffff81c504f1 is in bfq_pd_offline (block/bfq-cgroup.c:942). >>>>> 937 * entities to the idle tree. It happens >>>>> if, in some >>>>> 938 * of the calls to bfq_bfqq_move() performed by >>>>> 939 * bfq_reparent_active_queues(), the >>>>> queue to move >>>>> is >>>>> 940 * empty and gets expired. >>>>> 941 */ >>>>> 942 bfq_flush_idle_tree(st); >>>>> 943 } >>>>> 944 >>>>> 945 __bfq_deactivate_entity(entity, false); >>>> >>>> So this is indeed strange. The group has in some of its idle service >>>> trees >>>> an entity whose entity->sched_data points to already freed cgroup. In >>>> particular it means the bfqq_entity_service_tree() leads to >>>> somewhere else >>>> than the 'st' we called bfq_flush_idle_tree() with. This looks like a >>>> different kind of problem AFAICT but still it needs fixing :). Can you >>>> please run your reproducer with my patches + the attached debug patch on >>>> top? Hopefully it should tell us more about the problematic entity >>>> and how >>>> it got there... Thanks! >>> >>> Hi, >>> >>> I'm not sure I understand what you mean... I reporduced again with your >>> debug patch applied, however, no extra messages are printed. 
>>> >>> I think this is exactly the same problem we discussed before: >>> >>> 1) bfqq->new_bfqq is set, they are under g1 >>> 2) bfqq is moved to another group, and user thread of new_bfqq exit >>> 3) g1 is offlied >>> 3) io issued from bfqq will end up in new_bfqq, and the offlined >>> g1 will be inserted to st of g1's parent. >>> >>> Currently such entity should be in active tree, however, I think it's >>> prossible that it can end up in idle tree throuph deactivation of >>> entity? >>> >>> 4) io is done, new_bfqq is deactivated >>> 5) new_bfqq is freed, and then g1 is freed >>> 6) the problem is triggered when g1's parent is offlined. >> >> Hi, >> >> I applied the following modification and the problem can't be >> repoduced anymore. >> >> diff --git a/block/bfq-cgroup.c b/block/bfq-cgroup.c >> index 24a5c5329bcd..9b29af66635f 100644 >> --- a/block/bfq-cgroup.c >> +++ b/block/bfq-cgroup.c >> @@ -730,8 +730,19 @@ static struct bfq_group *__bfq_bic_change_cgroup(struct >> bfq_data *bfqd, >> >> if (sync_bfqq) { >> entity = &sync_bfqq->entity; >> - if (entity->sched_data != &bfqg->sched_data) >> + if (entity->sched_data != &bfqg->sched_data) { >> + if (sync_bfqq->new_bfqq) { >> + bfq_put_queue(sync_bfqq->new_bfqq); >> + sync_bfqq->new_bfqq = NULL; >> + } >> + >> + if (bfq_bfqq_coop(sync_bfqq)) { >> + bic->stably_merged = false; >> + bfq_mark_bfqq_split_coop(sync_bfqq); >> + } >> + >> bfq_bfqq_move(bfqd, sync_bfqq, bfqg); >> + } >> >> I'm not sure if we can set sync_bfqq->new_bfqq = NULL here, however, >> this can make sure the problem here really is what we discussed >> before... > > OK, if this fixed the problem for you then I'm wondering why > > WARN_ON_ONCE(sync_bfqq->new_bfqq); > > in my patch "bfq: Split shared queues on move between cgroups" didn't > trigger? Now I see how sync_bfqq->new_bfqq can indeed be != NULL here so I > agree with the change in principle (it has to use bfq_put_cooperator() > though) but I'm wondering why the WARN_ON didn't trigger. Or you just > didn't notice? Hi, When the repoducer start running, the WARN_ON() will be triggered immediately, sorry that the missing log lead to misunderstanding. Thanks, Kuai > > Honza >
On Mon 10-01-22 09:49:34, yukuai (C) wrote: > 在 2022/01/07 22:58, Jan Kara 写道: > > On Fri 07-01-22 17:15:43, yukuai (C) wrote: > > > 在 2022/01/05 22:36, Jan Kara 写道: > > > > Hello, > > > > > > > > here is the second version of my patches to fix use-after-free issues in BFQ > > > > when processes with merged queues get moved to different cgroups. The patches > > > > have survived some beating in my test VM but so far I fail to reproduce the > > > > original KASAN reports so testing from people who can reproduce them is most > > > > welcome. Thanks! > > > > > > > > Changes since v1: > > > > * Added fix for bfq_put_cooperator() > > > > * Added fix to handle move between cgroups in bfq_merge_bio() > > > > > > > > Honza > > > > Previous versions: > > > > Link: http://lore.kernel.org/r/20211223171425.3551-1-jack@suse.cz # v1 > > > > . > > > > > > > > > > Hi, > > > > > > I repoduced the problem again with this patchset... > > > > Thanks for testing! > > > > > [ 71.004788] BUG: KASAN: use-after-free in > > > __bfq_deactivate_entity+0x21/0x290 > > > [ 71.006328] Read of size 1 at addr ffff88817a3dc0b0 by task rmmod/801 > > > [ 71.007723] > > > [ 71.008068] CPU: 7 PID: 801 Comm: rmmod Tainted: G W > > > 5.16.0-rc5-next-2021127 > > > [ 71.009995] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS > > > ?-20190727_073836-4 > > > [ 71.012274] Call Trace: > > > [ 71.012603] <TASK> > > > [ 71.012886] dump_stack_lvl+0x34/0x44 > > > [ 71.013379] print_address_description.constprop.0.cold+0xab/0x36b > > > [ 71.014182] ? __bfq_deactivate_entity+0x21/0x290 > > > [ 71.014795] ? __bfq_deactivate_entity+0x21/0x290 > > > [ 71.015398] kasan_report.cold+0x83/0xdf > > > [ 71.015904] ? _raw_read_lock_bh+0x20/0x40 > > > [ 71.016433] ? __bfq_deactivate_entity+0x21/0x290 > > > [ 71.017033] __bfq_deactivate_entity+0x21/0x290 > > > [ 71.017617] bfq_pd_offline+0xc1/0x110 > > > [ 71.018105] blkcg_deactivate_policy+0x14b/0x210 > > ... > > > > > Here is the caller of __bfq_deactivate_entity: > > > (gdb) list *(bfq_pd_offline+0xc1) > > > 0xffffffff81c504f1 is in bfq_pd_offline (block/bfq-cgroup.c:942). > > > 937 * entities to the idle tree. It happens if, in some > > > 938 * of the calls to bfq_bfqq_move() performed by > > > 939 * bfq_reparent_active_queues(), the queue to move > > > is > > > 940 * empty and gets expired. > > > 941 */ > > > 942 bfq_flush_idle_tree(st); > > > 943 } > > > 944 > > > 945 __bfq_deactivate_entity(entity, false); > > > > So this is indeed strange. The group has in some of its idle service trees > > an entity whose entity->sched_data points to already freed cgroup. In > > particular it means the bfqq_entity_service_tree() leads to somewhere else > > than the 'st' we called bfq_flush_idle_tree() with. This looks like a > > different kind of problem AFAICT but still it needs fixing :). Can you > > please run your reproducer with my patches + the attached debug patch on > > top? Hopefully it should tell us more about the problematic entity and how > > it got there... Thanks! > > Hi, > > I'm not sure I understand what you mean... I reporduced again with your > debug patch applied, however, no extra messages are printed. > > I think this is exactly the same problem we discussed before: > > 1) bfqq->new_bfqq is set, they are under g1 > 2) bfqq is moved to another group, and user thread of new_bfqq exit > 3) g1 is offlied > 3) io issued from bfqq will end up in new_bfqq, and the offlined > g1 will be inserted to st of g1's parent. Hmm, you are right. 
I was confused by the fact that bfq_setup_merge() is always immediately
(under the big bfq lock) followed by bfq_merge_bfqqs(), but that redirects
the BIC of just one process to the new bfqq. So the following is a bit more
detailed and graphical version of your scenario for future reference :):

Initial state, bfqq2 is shared by Process 2 & Process 3:

Process 1 (blkcg1)    Process 2 (blkcg1)    Process 3 (blkcg1)
     (BIC)                 (BIC)                 (BIC)
       |                     |                     |
       v                     v                     |
     bfqq1                 bfqq2<------------------/
        \                   /
         \                 /
          \               /
           -> BFQ group1 <-

Now, while processing a request for Process 2, we decide to merge bfqq2
into bfqq1. Situation after the merge:

Process 1 (blkcg1)    Process 2 (blkcg1)    Process 3 (blkcg1)
     (BIC)                 (BIC)                 (BIC)
       |                     |                     |
       |    /----------------/                     |
       v   v      new_bfqq                         |
     bfqq1<----------------bfqq2<------------------/
        \                   /
         \                 /
          \               /
           -> BFQ group1 <-

Processes 1 and 2 exit:

                                            Process 3 (blkcg1)
                                                 (BIC)
                                                   |
                  new_bfqq                         |
     bfqq1<----------------bfqq2<------------------/
        \                   /
         \                 /
          \               /
           -> BFQ group1 <-

Process 3 is moved to blkcg2 and submits IO, blkcg1 is offlined.
bfq_bic_update_cgroup() will change the picture to:

                                            Process 3 (blkcg2)
                                                 (BIC)
                                                   |
                  new_bfqq                         |
     bfqq1<----------------bfqq2<------------------/
       |                     |
       v                     v
  BFQ group1            BFQ group2

and the following bfq_merge_bfqqs() when submitting the request will
further modify the picture to:

          Process 3 (blkcg2)
               (BIC)
                 |
       /---------/
       v          new_bfqq
     bfqq1<----------------bfqq2
       |                     |
       v                     v
  BFQ group1            BFQ group2

and boom, we queue again the offlined BFQ group1.

                                                                Honza
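[Editor's note: the failure mode drawn above boils down to bfq_merge_bfqqs() redirecting a bic to a new_bfqq whose group may already have been offlined. One illustrative way to close that hole -- an assumption here, not necessarily the fix adopted by this series -- is to refuse to set up a merge between queues whose entities no longer share a parent, i.e. which now live in different groups. The helper below is hypothetical:]

```c
/*
 * Illustrative sketch (hypothetical helper, not from this series):
 * refuse cross-group queue merges so that bfqq->new_bfqq can never
 * point into a group the process has already left.
 */
static bool bfq_queues_may_merge(struct bfq_queue *bfqq,
				 struct bfq_queue *new_bfqq)
{
	/*
	 * Queues in the same bfq_group share the group entity as their
	 * parent; a mismatch means one of them was moved to another
	 * cgroup, and merging them could re-activate an already
	 * offlined group.
	 */
	return bfqq->entity.parent == new_bfqq->entity.parent;
}
```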