
[4/4] block, bfq: update pos_root for idle bfq_queue in bfq_bfqq_move()

Message ID 20211221032135.878550-5-yukuai3@huawei.com (mailing list archive)
State New, archived
Series block, bfq: minor cleanup and fix

Commit Message

Yu Kuai Dec. 21, 2021, 3:21 a.m. UTC
During code review, we found that if bfqq is not busy in
bfq_bfqq_move(), bfq_pos_tree_add_move() won't be called for the bfqq,
so bfqq->pos_root still points into the old bfqg. However, the reference
that bfqq holds on the old bfqg will be released, so it's possible that
the old bfqg gets freed. This is problematic because the freed bfqg can
still be accessed through bfqq->pos_root.

Fix the problem by calling bfq_pos_tree_add_move() for an idle bfqq as
well.

Fixes: e21b7a0b9887 ("block, bfq: add full hierarchical scheduling and cgroups support")
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 block/bfq-cgroup.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)
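
To make the race concrete, here is a rough sketch of the pre-patch ordering
inside bfq_bfqq_move() (names as in block/bfq-cgroup.c; the body is
paraphrased and heavily simplified, not the verbatim kernel source):

void bfq_bfqq_move(struct bfq_data *bfqd, struct bfq_queue *bfqq,
                   struct bfq_group *bfqg)
{
        struct bfq_entity *entity = &bfqq->entity;
        struct bfq_group *old_parent = bfqq_group(bfqq);

        /* ... detach entity from old_parent's scheduling data ... */

        bfqg_and_blkg_put(old_parent);  /* may drop the last reference on old_parent */

        entity->parent = bfqg->my_entity;
        entity->sched_data = &bfqg->sched_data;
        bfqg_and_blkg_get(bfqg);        /* pin the new bfqg and its blkg */

        if (bfq_bfqq_busy(bfqq)) {
                if (unlikely(!bfqd->nonrot_with_queueing))
                        bfq_pos_tree_add_move(bfqd, bfqq);      /* re-homes bfqq->pos_root */
                bfq_activate_bfqq(bfqd, bfqq);
        }
        /*
         * If bfqq is not busy, bfq_pos_tree_add_move() is never reached, so
         * bfqq->pos_root may still point into old_parent's rb_root even
         * though the reference dropped above could have been the last one;
         * a later rb_erase() on that tree is then a use-after-free.
         */
}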

Comments

Jan Kara Dec. 21, 2021, 11:50 a.m. UTC | #1
On Tue 21-12-21 11:21:35, Yu Kuai wrote:
> During code review, we found that if bfqq is not busy in
> bfq_bfqq_move(), bfq_pos_tree_add_move() won't be called for the bfqq,
> thus bfqq->pos_root still points to the old bfqg. However, the ref
> that bfqq hold for the old bfqg will be released, so it's possible
> that the old bfqg can be freed. This is problematic because the freed
> bfqg can still be accessed by bfqq->pos_root.
> 
> Fix the problem by calling bfq_pos_tree_add_move() for idle bfqq
> as well.
> 
> Fixes: e21b7a0b9887 ("block, bfq: add full hierarchical scheduling and cgroups support")
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>

I'm just wondering, how can it happen that !bfq_bfqq_busy() queue is in
pos_tree? Because bfq_remove_request() takes care to remove bfqq from the
pos_tree...

								Honza

> ---
>  block/bfq-cgroup.c | 14 +++++++++-----
>  1 file changed, 9 insertions(+), 5 deletions(-)
> 
> diff --git a/block/bfq-cgroup.c b/block/bfq-cgroup.c
> index 8e8cf6b3d946..822dd28ecf53 100644
> --- a/block/bfq-cgroup.c
> +++ b/block/bfq-cgroup.c
> @@ -677,7 +677,6 @@ void bfq_bfqq_move(struct bfq_data *bfqd, struct bfq_queue *bfqq,
>  		bfq_deactivate_bfqq(bfqd, bfqq, false, false);
>  	else if (entity->on_st_or_in_serv)
>  		bfq_put_idle_entity(bfq_entity_service_tree(entity), entity);
> -	bfqg_and_blkg_put(old_parent);
>  
>  	if (entity->parent &&
>  	    entity->parent->last_bfqq_created == bfqq)
> @@ -690,11 +689,16 @@ void bfq_bfqq_move(struct bfq_data *bfqd, struct bfq_queue *bfqq,
>  	/* pin down bfqg and its associated blkg  */
>  	bfqg_and_blkg_get(bfqg);
>  
> -	if (bfq_bfqq_busy(bfqq)) {
> -		if (unlikely(!bfqd->nonrot_with_queueing))
> -			bfq_pos_tree_add_move(bfqd, bfqq);
> +	/*
> +	 * Don't leave the pos_root to old bfqg, since the ref to old bfqg will
> +	 * be released and the bfqg might be freed.
> +	 */
> +	if (unlikely(!bfqd->nonrot_with_queueing))
> +		bfq_pos_tree_add_move(bfqd, bfqq);
> +	bfqg_and_blkg_put(old_parent);
> +
> +	if (bfq_bfqq_busy(bfqq))
>  		bfq_activate_bfqq(bfqd, bfqq);
> -	}
>  
>  	if (!bfqd->in_service_queue && !bfqd->rq_in_driver)
>  		bfq_schedule_dispatch(bfqd);
> -- 
> 2.31.1
>
Yu Kuai Dec. 22, 2021, 3:12 a.m. UTC | #2
On 2021/12/21 19:50, Jan Kara wrote:
> On Tue 21-12-21 11:21:35, Yu Kuai wrote:
>> During code review, we found that if bfqq is not busy in
>> bfq_bfqq_move(), bfq_pos_tree_add_move() won't be called for the bfqq,
>> thus bfqq->pos_root still points to the old bfqg. However, the ref
>> that bfqq hold for the old bfqg will be released, so it's possible
>> that the old bfqg can be freed. This is problematic because the freed
>> bfqg can still be accessed by bfqq->pos_root.
>>
>> Fix the problem by calling bfq_pos_tree_add_move() for idle bfqq
>> as well.
>>
>> Fixes: e21b7a0b9887 ("block, bfq: add full hierarchical scheduling and cgroups support")
>> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
> 
> I'm just wondering, how can it happen that !bfq_bfqq_busy() queue is in
> pos_tree? Because bfq_remove_request() takes care to remove bfqq from the
> pos_tree...

Hi,

You're right that this is not a problem in the common case. The problem
seems to be related to queue merging and task migration, because I once
reproduced it with the same reproducer used for the problem where an
offlined bfqg can be inserted into the service tree. The UAF is exactly
in bfq_remove_request->rb_erase(). However, I didn't save the stack...

I guess this is because bfq_del_bfqq_busy() is called from
bfq_release_process_ref(), and queue merging prevents such a bfqq from
being freed; thus such a bfqq is not in the service tree, and its
pos_root can point to the old bfqg after
bfq_bic_update_cgroup->bfq_bfqq_move.

I haven't confirmed this; however, this patch itself only clears
bfqq->pos_root for an idle bfqq, so there should be no harm.

Thanks,
Kuai
> 
> 								Honza
> 
>> ---
>>   block/bfq-cgroup.c | 14 +++++++++-----
>>   1 file changed, 9 insertions(+), 5 deletions(-)
>>
>> diff --git a/block/bfq-cgroup.c b/block/bfq-cgroup.c
>> index 8e8cf6b3d946..822dd28ecf53 100644
>> --- a/block/bfq-cgroup.c
>> +++ b/block/bfq-cgroup.c
>> @@ -677,7 +677,6 @@ void bfq_bfqq_move(struct bfq_data *bfqd, struct bfq_queue *bfqq,
>>   		bfq_deactivate_bfqq(bfqd, bfqq, false, false);
>>   	else if (entity->on_st_or_in_serv)
>>   		bfq_put_idle_entity(bfq_entity_service_tree(entity), entity);
>> -	bfqg_and_blkg_put(old_parent);
>>   
>>   	if (entity->parent &&
>>   	    entity->parent->last_bfqq_created == bfqq)
>> @@ -690,11 +689,16 @@ void bfq_bfqq_move(struct bfq_data *bfqd, struct bfq_queue *bfqq,
>>   	/* pin down bfqg and its associated blkg  */
>>   	bfqg_and_blkg_get(bfqg);
>>   
>> -	if (bfq_bfqq_busy(bfqq)) {
>> -		if (unlikely(!bfqd->nonrot_with_queueing))
>> -			bfq_pos_tree_add_move(bfqd, bfqq);
>> +	/*
>> +	 * Don't leave the pos_root to old bfqg, since the ref to old bfqg will
>> +	 * be released and the bfqg might be freed.
>> +	 */
>> +	if (unlikely(!bfqd->nonrot_with_queueing))
>> +		bfq_pos_tree_add_move(bfqd, bfqq);
>> +	bfqg_and_blkg_put(old_parent);
>> +
>> +	if (bfq_bfqq_busy(bfqq))
>>   		bfq_activate_bfqq(bfqd, bfqq);
>> -	}
>>   
>>   	if (!bfqd->in_service_queue && !bfqd->rq_in_driver)
>>   		bfq_schedule_dispatch(bfqd);
>> -- 
>> 2.31.1
>>
Jan Kara Dec. 22, 2021, 2:17 p.m. UTC | #3
On Wed 22-12-21 11:12:45, yukuai (C) wrote:
> On 2021/12/21 19:50, Jan Kara wrote:
> > On Tue 21-12-21 11:21:35, Yu Kuai wrote:
> > > During code review, we found that if bfqq is not busy in
> > > bfq_bfqq_move(), bfq_pos_tree_add_move() won't be called for the bfqq,
> > > thus bfqq->pos_root still points to the old bfqg. However, the ref
> > > that bfqq hold for the old bfqg will be released, so it's possible
> > > that the old bfqg can be freed. This is problematic because the freed
> > > bfqg can still be accessed by bfqq->pos_root.
> > > 
> > > Fix the problem by calling bfq_pos_tree_add_move() for idle bfqq
> > > as well.
> > > 
> > > Fixes: e21b7a0b9887 ("block, bfq: add full hierarchical scheduling and cgroups support")
> > > Signed-off-by: Yu Kuai <yukuai3@huawei.com>
> > 
> > I'm just wondering, how can it happen that !bfq_bfqq_busy() queue is in
> > pos_tree? Because bfq_remove_request() takes care to remove bfqq from the
> > pos_tree...
> 
> Hi,
> 
> It's right this is not a problem in common case. The problem seems to
> relate to queue merging and task migration. Because I once reporduced
> it with the same reporducer for the problem that offlined bfqg can be
> inserted into service tree. The uaf is exactly in
> bfq_remove_request->rb_rease(). However I didn't save the stack...
> 
> I guess this is because bfq_del_bfqq_busy() is called from
> bfq_release_process_ref(), and queue merging prevert sunch bfqq to be
> freed, thus such bfqq is not in service tree, and it's pos_root can
> point to the old bfqg after bfq_bic_update_cgroup->bfq_bfqq_move.
> 
> I haven't confirmed this, however, this patch itself only cleared
> bfqq->pos_root for idle bfqq, there should be no harm.

Well, I agree this patch does no harm but in my opinion it is just papering
over the real problem which is that we leave bfqq without any request in
the pos_tree which can have also other unexpected consequences. I don't
think your scenario with bfq_release_process_ref() calling
bfq_del_bfqq_busy() really answers my question because we call
bfq_del_bfqq_busy() only if RB_EMPTY_ROOT(&bfqq->sort_list) (i.e., bfqq has
no requests) and when sort_list was becoming empty, bfq_remove_request()
should have removed bfqq from the pos_tree. So I think proper fix lies
elsewhere and I would not merge this patch until we better understand what
is happening in this case.

								Honza
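
For reference, the behaviour Jan refers to lives in bfq_remove_request():
when the last request leaves bfqq->sort_list, the queue is also taken out of
the pos_tree. Roughly (paraphrased from block/bfq-iosched.c, simplified and
not verbatim):

static void bfq_remove_request(struct request_queue *q, struct request *rq)
{
        struct bfq_queue *bfqq = RQ_BFQQ(rq);
        struct bfq_data *bfqd = bfqq->bfqd;

        /* ... unlink rq from bfqq ... */
        elv_rb_del(&bfqq->sort_list, rq);

        if (RB_EMPTY_ROOT(&bfqq->sort_list)) {
                if (bfq_bfqq_busy(bfqq) && bfqq != bfqd->in_service_queue)
                        bfq_del_bfqq_busy(bfqd, bfqq, false);

                /* the last request just left bfqq: drop it from the pos_tree too */
                if (bfqq->pos_root) {
                        rb_erase(&bfqq->pos_node, bfqq->pos_root);
                        bfqq->pos_root = NULL;
                }
        }

        /* ... */
}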
Yu Kuai Dec. 23, 2021, 1:03 a.m. UTC | #4
On 2021/12/22 22:17, Jan Kara wrote:
> On Wed 22-12-21 11:12:45, yukuai (C) wrote:
> On 2021/12/21 19:50, Jan Kara wrote:
>>> On Tue 21-12-21 11:21:35, Yu Kuai wrote:
>>>> During code review, we found that if bfqq is not busy in
>>>> bfq_bfqq_move(), bfq_pos_tree_add_move() won't be called for the bfqq,
>>>> thus bfqq->pos_root still points to the old bfqg. However, the ref
>>>> that bfqq hold for the old bfqg will be released, so it's possible
>>>> that the old bfqg can be freed. This is problematic because the freed
>>>> bfqg can still be accessed by bfqq->pos_root.
>>>>
>>>> Fix the problem by calling bfq_pos_tree_add_move() for idle bfqq
>>>> as well.
>>>>
>>>> Fixes: e21b7a0b9887 ("block, bfq: add full hierarchical scheduling and cgroups support")
>>>> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
>>>
>>> I'm just wondering, how can it happen that !bfq_bfqq_busy() queue is in
>>> pos_tree? Because bfq_remove_request() takes care to remove bfqq from the
>>> pos_tree...
>>
>> Hi,
>>
>> It's right this is not a problem in common case. The problem seems to
>> relate to queue merging and task migration. Because I once reporduced
>> it with the same reporducer for the problem that offlined bfqg can be
>> inserted into service tree. The uaf is exactly in
>> bfq_remove_request->rb_rease(). However I didn't save the stack...
>>
>> I guess this is because bfq_del_bfqq_busy() is called from
>> bfq_release_process_ref(), and queue merging prevert sunch bfqq to be
>> freed, thus such bfqq is not in service tree, and it's pos_root can
>> point to the old bfqg after bfq_bic_update_cgroup->bfq_bfqq_move.
>>
>> I haven't confirmed this, however, this patch itself only cleared
>> bfqq->pos_root for idle bfqq, there should be no harm.
> 
> Well, I agree this patch does no harm but in my opinion it is just papering
> over the real problem which is that we leave bfqq without any request in
> the pos_tree which can have also other unexpected consequences. I don't
> think your scenario with bfq_release_process_ref() calling
> bfq_del_bfqq_busy() really answers my question because we call
> bfq_del_bfqq_busy() only if RB_EMPTY_ROOT(&bfqq->sort_list) (i.e., bfqq has
> no requests) and when sort_list was becoming empty, bfq_remove_request()
> should have removed bfqq from the pos_tree. So I think proper fix lies
> elsewhere and I would not merge this patch until we better understand what
> is happening in this case.

Hi,

I'll try to reproduce the UAF and take a look at it.

Thanks,
Kuai
> 
> 								Honza
>
Yu Kuai Dec. 25, 2021, 1:19 a.m. UTC | #5
On 2021/12/22 22:17, Jan Kara wrote:
> On Wed 22-12-21 11:12:45, yukuai (C) wrote:
>> On 2021/12/21 19:50, Jan Kara wrote:
>>> On Tue 21-12-21 11:21:35, Yu Kuai wrote:
>>>> During code review, we found that if bfqq is not busy in
>>>> bfq_bfqq_move(), bfq_pos_tree_add_move() won't be called for the bfqq,
>>>> thus bfqq->pos_root still points to the old bfqg. However, the ref
>>>> that bfqq hold for the old bfqg will be released, so it's possible
>>>> that the old bfqg can be freed. This is problematic because the freed
>>>> bfqg can still be accessed by bfqq->pos_root.
>>>>
>>>> Fix the problem by calling bfq_pos_tree_add_move() for idle bfqq
>>>> as well.
>>>>
>>>> Fixes: e21b7a0b9887 ("block, bfq: add full hierarchical scheduling and cgroups support")
>>>> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
>>>
>>> I'm just wondering, how can it happen that !bfq_bfqq_busy() queue is in
>>> pos_tree? Because bfq_remove_request() takes care to remove bfqq from the
>>> pos_tree...
>>
>> Hi,
>>
>> It's right this is not a problem in common case. The problem seems to
>> relate to queue merging and task migration. Because I once reporduced
>> it with the same reporducer for the problem that offlined bfqg can be
>> inserted into service tree. The uaf is exactly in
>> bfq_remove_request->rb_rease(). However I didn't save the stack...
>>
>> I guess this is because bfq_del_bfqq_busy() is called from
>> bfq_release_process_ref(), and queue merging prevert sunch bfqq to be
>> freed, thus such bfqq is not in service tree, and it's pos_root can
>> point to the old bfqg after bfq_bic_update_cgroup->bfq_bfqq_move.
>>
>> I haven't confirmed this, however, this patch itself only cleared
>> bfqq->pos_root for idle bfqq, there should be no harm.
> 
> Well, I agree this patch does no harm but in my opinion it is just papering
> over the real problem which is that we leave bfqq without any request in
> the pos_tree which can have also other unexpected consequences. I don't
> think your scenario with bfq_release_process_ref() calling
> bfq_del_bfqq_busy() really answers my question because we call
> bfq_del_bfqq_busy() only if RB_EMPTY_ROOT(&bfqq->sort_list) (i.e., bfqq has
> no requests) and when sort_list was becoming empty, bfq_remove_request()
> should have removed bfqq from the pos_tree. So I think proper fix lies
> elsewhere and I would not merge this patch until we better understand what
> is happening in this case.
> 

Hi,

I reproduced this problem on v4.19; here is the stack:

[34094.992162] 
==================================================================
[34094.993121] BUG: KASAN: use-after-free in rb_erase+0x4e0/0x8c0
[34094.993121] Write of size 8 at addr ffff888126528258 by task 
kworker/3:1H/554
[34094.993121]
[34094.993121] CPU: 3 PID: 554 Comm: kworker/3:1H Not tainted 4.19.90+ #2
[34094.993121] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), 
BIOS ?-20190727_073836-4
[34094.993121] Workqueue: kblockd blk_mq_run_work_fn
[34094.993121] Call Trace:
[34094.993121]  dump_stack+0x76/0xa0
[34094.993121]  print_address_description+0x6c/0x237
[34094.993121]  ? rb_erase+0x4e0/0x8c0
[34094.993121]  kasan_report.cold+0x88/0x2a0
[34094.993121]  rb_erase+0x4e0/0x8c0
[34094.993121]  bfq_remove_request+0x239/0x4c0
[34094.993121]  bfq_dispatch_request+0x658/0x17b0
[34094.993121]  blk_mq_do_dispatch_sched+0x183/0x220
[34094.993121]  ? blk_mq_sched_free_hctx_data+0xe0/0xe0
[34094.993121]  ? __switch_to+0x3b2/0x6c0
[34094.993121]  blk_mq_sched_dispatch_requests+0x2ac/0x310
[34094.993121]  ? finish_task_switch+0xa4/0x370
[34094.993121]  ? dequeue_task_fair+0x216/0x360
[34094.993121]  ? blk_mq_sched_restart+0x40/0x40
[34094.993121]  ? __schedule+0x588/0xc10
[34094.993121]  __blk_mq_run_hw_queue+0x82/0x140
[34094.993121]  process_one_work+0x39d/0x770
[34094.993121]  worker_thread+0x78/0x5c0
[34094.993121]  ? process_one_work+0x770/0x770
[34094.993121]  kthread+0x1af/0x1d0
[34094.993121]  ? kthread_create_worker_on_cpu+0xd0/0xd0
[34094.993121]  ret_from_fork+0x1f/0x30
[34094.993121]
[34094.993121] Allocated by task 19184:
[34094.993121]  kasan_kmalloc+0xc2/0xe0
[34094.993121]  kmem_cache_alloc_node_trace+0xf9/0x220
[34094.993121]  bfq_pd_alloc+0x4c/0x510
[34094.993121]  blkg_alloc+0x237/0x310
[34094.993121]  blkg_create+0x499/0x5f0
[34094.993121]  blkg_lookup_create+0x140/0x1b0
[34094.993121]  generic_make_request_checks+0x5ce/0xad0
[34094.993121]  generic_make_request+0xd9/0x6b0
[34094.993121]  submit_bio+0xa6/0x240
[34094.993121]  mpage_readpages+0x29e/0x3b0
[34094.993121]  read_pages+0xdf/0x3a0
[34094.993121]  __do_page_cache_readahead+0x278/0x290
[34094.993121]  ondemand_readahead+0x275/0x460
[34094.993121]  generic_file_read_iter+0xc4a/0x1790
[34094.993121]  blkdev_read_iter+0x8c/0xc0
[34094.993121]  aio_read+0x174/0x260
[34094.993121]  io_submit_one+0x7d3/0x14b0
[34094.993121]  __x64_sys_io_submit+0xfe/0x230
[34094.993121]  do_syscall_64+0x6f/0x280
[34094.993121]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[34094.993121]
[34094.993121] Freed by task 9:
[34094.993121]  __kasan_slab_free+0x12f/0x180
[34094.993121]  kfree+0x92/0x1b0
[34094.993121]  blkg_free.part.0+0x4a/0xe0
[34094.993121]  rcu_process_callbacks+0x420/0x6c0
[34094.993121]  __do_softirq+0x109/0x36c
[34094.993121]
[34094.993121] The buggy address belongs to the object at ffff888126528000
[34094.993121]  which belongs to the cache kmalloc-2048 of size 2048
[34094.993121] The buggy address is located 600 bytes inside of
[34094.993121]  2048-byte region [ffff888126528000, ffff888126528800)
[34094.993121] The buggy address belongs to the page:
[34094.993121] page:ffffea0004994a00 count:1 mapcount:0 
mapping:ffff88810000e800 index:0xffff0
[34094.993121] flags: 0x17ffffc0008100(slab|head)
[34094.993121] raw: 0017ffffc0008100 dead000000000100 dead000000000200 
ffff88810000e800
[34094.993121] raw: ffff88812652c400 00000000800f0009 00000001ffffffff 
0000000000000000
[34094.993121] page dumped because: kasan: bad access detected
[34094.993121]
[34094.993121] Memory state around the buggy address:
[34094.993121]  ffff888126528100: fb fb fb fb fb fb fb fb fb fb fb fb fb 
fb fb fb
[34094.993121]  ffff888126528180: fb fb fb fb fb fb fb fb fb fb fb fb fb 
fb fb fb
[34094.993121] >ffff888126528200: fb fb fb fb fb fb fb fb fb fb fb fb fb 
fb fb fb
[34094.993121]                                                     ^
[34094.993121]  ffff888126528280: fb fb fb fb fb fb fb fb fb fb fb fb fb 
fb fb fb
[34094.993121]  ffff888126528300: fb fb fb fb fb fb fb fb fb fb fb fb fb 
fb fb fb
[34094.993121] 
==================================================================

I'll try to figure out the root cause; in the meantime, feel free to
chime in if you have any thoughts.

Thanks,
Kuai
> 								Honza
>
Yu Kuai Dec. 25, 2021, 7:44 a.m. UTC | #6
On 2021/12/25 9:19, yukuai (C) wrote:
> On 2021/12/22 22:17, Jan Kara wrote:
>> On Wed 22-12-21 11:12:45, yukuai (C) wrote:
>>> On 2021/12/21 19:50, Jan Kara wrote:
>>>> On Tue 21-12-21 11:21:35, Yu Kuai wrote:
>>>>> During code review, we found that if bfqq is not busy in
>>>>> bfq_bfqq_move(), bfq_pos_tree_add_move() won't be called for the bfqq,
>>>>> thus bfqq->pos_root still points to the old bfqg. However, the ref
>>>>> that bfqq hold for the old bfqg will be released, so it's possible
>>>>> that the old bfqg can be freed. This is problematic because the freed
>>>>> bfqg can still be accessed by bfqq->pos_root.
>>>>>
>>>>> Fix the problem by calling bfq_pos_tree_add_move() for idle bfqq
>>>>> as well.
>>>>>
>>>>> Fixes: e21b7a0b9887 ("block, bfq: add full hierarchical scheduling 
>>>>> and cgroups support")
>>>>> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
>>>>
>>>> I'm just wondering, how can it happen that !bfq_bfqq_busy() queue is in
>>>> pos_tree? Because bfq_remove_request() takes care to remove bfqq 
>>>> from the
>>>> pos_tree...
>>>
>>> Hi,
>>>
>>> It's right this is not a problem in common case. The problem seems to
>>> relate to queue merging and task migration. Because I once reporduced
>>> it with the same reporducer for the problem that offlined bfqg can be
>>> inserted into service tree. The uaf is exactly in
>>> bfq_remove_request->rb_rease(). However I didn't save the stack...
>>>
>>> I guess this is because bfq_del_bfqq_busy() is called from
>>> bfq_release_process_ref(), and queue merging prevert sunch bfqq to be
>>> freed, thus such bfqq is not in service tree, and it's pos_root can
>>> point to the old bfqg after bfq_bic_update_cgroup->bfq_bfqq_move.
>>>
>>> I haven't confirmed this, however, this patch itself only cleared
>>> bfqq->pos_root for idle bfqq, there should be no harm.
>>
>> Well, I agree this patch does no harm but in my opinion it is just 
>> papering
>> over the real problem which is that we leave bfqq without any request in
>> the pos_tree which can have also other unexpected consequences. I don't
>> think your scenario with bfq_release_process_ref() calling
>> bfq_del_bfqq_busy() really answers my question because we call
>> bfq_del_bfqq_busy() only if RB_EMPTY_ROOT(&bfqq->sort_list) (i.e., 
>> bfqq has
>> no requests) and when sort_list was becoming empty, bfq_remove_request()
>> should have removed bfqq from the pos_tree. So I think proper fix lies
>> elsewhere and I would not merge this patch until we better understand 
>> what
>> is happening in this case.
>>
> 
> Hi,
> 
> I reporduced this problem on v4.19, here is the stack:
> 
> [34094.992162] 
> ==================================================================
> [34094.993121] BUG: KASAN: use-after-free in rb_erase+0x4e0/0x8c0
> [34094.993121] Write of size 8 at addr ffff888126528258 by task 
> kworker/3:1H/554
> [34094.993121]
> [34094.993121] CPU: 3 PID: 554 Comm: kworker/3:1H Not tainted 4.19.90+ #2
> [34094.993121] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), 
> BIOS ?-20190727_073836-4
> [34094.993121] Workqueue: kblockd blk_mq_run_work_fn
> [34094.993121] Call Trace:
> [34094.993121]  dump_stack+0x76/0xa0
> [34094.993121]  print_address_description+0x6c/0x237
> [34094.993121]  ? rb_erase+0x4e0/0x8c0
> [34094.993121]  kasan_report.cold+0x88/0x2a0
> [34094.993121]  rb_erase+0x4e0/0x8c0
> [34094.993121]  bfq_remove_request+0x239/0x4c0
> [34094.993121]  bfq_dispatch_request+0x658/0x17b0
> [34094.993121]  blk_mq_do_dispatch_sched+0x183/0x220
> [34094.993121]  ? blk_mq_sched_free_hctx_data+0xe0/0xe0
> [34094.993121]  ? __switch_to+0x3b2/0x6c0
> [34094.993121]  blk_mq_sched_dispatch_requests+0x2ac/0x310
> [34094.993121]  ? finish_task_switch+0xa4/0x370
> [34094.993121]  ? dequeue_task_fair+0x216/0x360
> [34094.993121]  ? blk_mq_sched_restart+0x40/0x40
> [34094.993121]  ? __schedule+0x588/0xc10
> [34094.993121]  __blk_mq_run_hw_queue+0x82/0x140
> [34094.993121]  process_one_work+0x39d/0x770
> [34094.993121]  worker_thread+0x78/0x5c0
> [34094.993121]  ? process_one_work+0x770/0x770
> [34094.993121]  kthread+0x1af/0x1d0
> [34094.993121]  ? kthread_create_worker_on_cpu+0xd0/0xd0
> [34094.993121]  ret_from_fork+0x1f/0x30
> [34094.993121]
> [34094.993121] Allocated by task 19184:
> [34094.993121]  kasan_kmalloc+0xc2/0xe0
> [34094.993121]  kmem_cache_alloc_node_trace+0xf9/0x220
> [34094.993121]  bfq_pd_alloc+0x4c/0x510
> [34094.993121]  blkg_alloc+0x237/0x310
> [34094.993121]  blkg_create+0x499/0x5f0
> [34094.993121]  blkg_lookup_create+0x140/0x1b0
> [34094.993121]  generic_make_request_checks+0x5ce/0xad0
> [34094.993121]  generic_make_request+0xd9/0x6b0
> [34094.993121]  submit_bio+0xa6/0x240
> [34094.993121]  mpage_readpages+0x29e/0x3b0
> [34094.993121]  read_pages+0xdf/0x3a0
> [34094.993121]  __do_page_cache_readahead+0x278/0x290
> [34094.993121]  ondemand_readahead+0x275/0x460
> [34094.993121]  generic_file_read_iter+0xc4a/0x1790
> [34094.993121]  blkdev_read_iter+0x8c/0xc0
> [34094.993121]  aio_read+0x174/0x260
> [34094.993121]  io_submit_one+0x7d3/0x14b0
> [34094.993121]  __x64_sys_io_submit+0xfe/0x230
> [34094.993121]  do_syscall_64+0x6f/0x280
> [34094.993121]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [34094.993121]
> [34094.993121] Freed by task 9:
> [34094.993121]  __kasan_slab_free+0x12f/0x180
> [34094.993121]  kfree+0x92/0x1b0
> [34094.993121]  blkg_free.part.0+0x4a/0xe0
> [34094.993121]  rcu_process_callbacks+0x420/0x6c0
> [34094.993121]  __do_softirq+0x109/0x36c
> [34094.993121]
> [34094.993121] The buggy address belongs to the object at ffff888126528000
> [34094.993121]  which belongs to the cache kmalloc-2048 of size 2048
> [34094.993121] The buggy address is located 600 bytes inside of
> [34094.993121]  2048-byte region [ffff888126528000, ffff888126528800)
> [34094.993121] The buggy address belongs to the page:
> [34094.993121] page:ffffea0004994a00 count:1 mapcount:0 
> mapping:ffff88810000e800 index:0xffff0
> [34094.993121] flags: 0x17ffffc0008100(slab|head)
> [34094.993121] raw: 0017ffffc0008100 dead000000000100 dead000000000200 
> ffff88810000e800
> [34094.993121] raw: ffff88812652c400 00000000800f0009 00000001ffffffff 
> 0000000000000000
> [34094.993121] page dumped because: kasan: bad access detected
> [34094.993121]
> [34094.993121] Memory state around the buggy address:
> [34094.993121]  ffff888126528100: fb fb fb fb fb fb fb fb fb fb fb fb fb 
> fb fb fb
> [34094.993121]  ffff888126528180: fb fb fb fb fb fb fb fb fb fb fb fb fb 
> fb fb fb
> [34094.993121] >ffff888126528200: fb fb fb fb fb fb fb fb fb fb fb fb fb 
> fb fb fb
> [34094.993121]                                                     ^
> [34094.993121]  ffff888126528280: fb fb fb fb fb fb fb fb fb fb fb fb fb 
> fb fb fb
> [34094.993121]  ffff888126528300: fb fb fb fb fb fb fb fb fb fb fb fb fb 
> fb fb fb
> [34094.993121] 
> ==================================================================
> 
> I'll try to figure out the root cause, in the meantime, feel free to
> kick around if you have any througts.
> 
> Thansk,
> Kuai
>>                                 Honza
>>

Hi,

I finally figured out the root cause... It is introduced by a temporary
local fix for the problem that an offlined bfqg can be reinserted into
the service tree.

The temporary fix is as follows:

diff --git a/block/bfq-cgroup.c b/block/bfq-cgroup.c
index 24a5c5329bcd..ee1933cd9a43 100644
--- a/block/bfq-cgroup.c
+++ b/block/bfq-cgroup.c
@@ -935,6 +935,7 @@ static void bfq_pd_offline(struct blkg_policy_data *pd)

  put_async_queues:
         bfq_put_async_queues(bfqd, bfqg);
+       pd->plid = BLKCG_MAX_POLS;

         spin_unlock_irqrestore(&bfqd->lock, flags);
         /*
diff --git a/block/bfq-wf2q.c b/block/bfq-wf2q.c
index b74cc0da118e..fa2062244805 100644
--- a/block/bfq-wf2q.c
+++ b/block/bfq-wf2q.c
@@ -1692,6 +1692,15 @@ void bfq_del_bfqq_busy(struct bfq_data *bfqd, struct bfq_queue *bfqq,
   */
  void bfq_add_bfqq_busy(struct bfq_data *bfqd, struct bfq_queue *bfqq)
  {
+#ifdef CONFIG_BFQ_GROUP_IOSCHED
+       /* If parent group is offlined, move the bfqq to root group */
+       if (bfqq->entity.parent) {
+               struct bfq_group *bfqg = bfq_bfqq_to_bfqg(bfqq);
+
+               if (bfqg->pd.plid >= BLKCG_MAX_POLS)
+                       bfq_bfqq_move(bfqd, bfqq, bfqd->root_group);
+       }
+#endif
         bfq_log_bfqq(bfqd, bfqq, "add to busy");

I added bfq_bfqq_move() here, before bfq_mark_bfqq_busy(), which causes
the problem...

Thanks,
Kuai
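
Tying this back to the pos_root issue: with the temporary hunk above,
bfq_bfqq_move() runs before the queue is marked busy, so the unpatched
bfq_bfqq_move() takes the !bfq_bfqq_busy() path and never calls
bfq_pos_tree_add_move(). An annotated sketch of that flow (based on the
temporary fix above, not on upstream code):

void bfq_add_bfqq_busy(struct bfq_data *bfqd, struct bfq_queue *bfqq)
{
#ifdef CONFIG_BFQ_GROUP_IOSCHED
        /* temporary fix: parent group already offlined -> move bfqq to root */
        if (bfqq->entity.parent) {
                struct bfq_group *bfqg = bfq_bfqq_to_bfqg(bfqq);

                if (bfqg->pd.plid >= BLKCG_MAX_POLS) {
                        /*
                         * bfq_bfqq_busy(bfqq) is still false here, so the
                         * unpatched bfq_bfqq_move() skips
                         * bfq_pos_tree_add_move() and leaves bfqq->pos_root
                         * pointing into the offlined (soon-to-be-freed) bfqg,
                         * matching the rb_erase() use-after-free above.
                         */
                        bfq_bfqq_move(bfqd, bfqq, bfqd->root_group);
                }
        }
#endif
        bfq_log_bfqq(bfqd, bfqq, "add to busy");

        bfq_activate_bfqq(bfqd, bfqq);

        bfq_mark_bfqq_busy(bfqq);       /* bfqq only becomes busy after the move */
        /* ... */
}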
Jan Kara Jan. 3, 2022, 3:37 p.m. UTC | #7
On Sat 25-12-21 15:44:54, yukuai (C) wrote:
> On 2021/12/25 9:19, yukuai (C) wrote:
> > On 2021/12/22 22:17, Jan Kara wrote:
> > > On Wed 22-12-21 11:12:45, yukuai (C) wrote:
> > > > On 2021/12/21 19:50, Jan Kara wrote:
> > > > > On Tue 21-12-21 11:21:35, Yu Kuai wrote:
> > > > > > During code review, we found that if bfqq is not busy in
> > > > > > bfq_bfqq_move(), bfq_pos_tree_add_move() won't be called for the bfqq,
> > > > > > thus bfqq->pos_root still points to the old bfqg. However, the ref
> > > > > > that bfqq hold for the old bfqg will be released, so it's possible
> > > > > > that the old bfqg can be freed. This is problematic because the freed
> > > > > > bfqg can still be accessed by bfqq->pos_root.
> > > > > > 
> > > > > > Fix the problem by calling bfq_pos_tree_add_move() for idle bfqq
> > > > > > as well.
> > > > > > 
> > > > > > Fixes: e21b7a0b9887 ("block, bfq: add full hierarchical
> > > > > > scheduling and cgroups support")
> > > > > > Signed-off-by: Yu Kuai <yukuai3@huawei.com>
> > > > > 
> > > > > I'm just wondering, how can it happen that !bfq_bfqq_busy() queue is in
> > > > > pos_tree? Because bfq_remove_request() takes care to remove
> > > > > bfqq from the
> > > > > pos_tree...
> > > > 
> > > > Hi,
> > > > 
> > > > It's right this is not a problem in common case. The problem seems to
> > > > relate to queue merging and task migration. Because I once reporduced
> > > > it with the same reporducer for the problem that offlined bfqg can be
> > > > inserted into service tree. The uaf is exactly in
> > > > bfq_remove_request->rb_rease(). However I didn't save the stack...
> > > > 
> > > > I guess this is because bfq_del_bfqq_busy() is called from
> > > > bfq_release_process_ref(), and queue merging prevert sunch bfqq to be
> > > > freed, thus such bfqq is not in service tree, and it's pos_root can
> > > > point to the old bfqg after bfq_bic_update_cgroup->bfq_bfqq_move.
> > > > 
> > > > I haven't confirmed this, however, this patch itself only cleared
> > > > bfqq->pos_root for idle bfqq, there should be no harm.
> > > 
> > > Well, I agree this patch does no harm but in my opinion it is just
> > > papering
> > > over the real problem which is that we leave bfqq without any request in
> > > the pos_tree which can have also other unexpected consequences. I don't
> > > think your scenario with bfq_release_process_ref() calling
> > > bfq_del_bfqq_busy() really answers my question because we call
> > > bfq_del_bfqq_busy() only if RB_EMPTY_ROOT(&bfqq->sort_list) (i.e.,
> > > bfqq has
> > > no requests) and when sort_list was becoming empty, bfq_remove_request()
> > > should have removed bfqq from the pos_tree. So I think proper fix lies
> > > elsewhere and I would not merge this patch until we better
> > > understand what
> > > is happening in this case.
> > > 
> > 
> > Hi,
> > 
> > I reporduced this problem on v4.19, here is the stack:
> > 
> > [34094.992162]
> > ==================================================================
> > [34094.993121] BUG: KASAN: use-after-free in rb_erase+0x4e0/0x8c0
> > [34094.993121] Write of size 8 at addr ffff888126528258 by task
> > kworker/3:1H/554
> > [34094.993121]
> > [34094.993121] CPU: 3 PID: 554 Comm: kworker/3:1H Not tainted 4.19.90+ #2
> > [34094.993121] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> > BIOS ?-20190727_073836-4
> > [34094.993121] Workqueue: kblockd blk_mq_run_work_fn
> > [34094.993121] Call Trace:
> > [34094.993121]  dump_stack+0x76/0xa0
> > [34094.993121]  print_address_description+0x6c/0x237
> > [34094.993121]  ? rb_erase+0x4e0/0x8c0
> > [34094.993121]  kasan_report.cold+0x88/0x2a0
> > [34094.993121]  rb_erase+0x4e0/0x8c0
> > [34094.993121]  bfq_remove_request+0x239/0x4c0
> > [34094.993121]  bfq_dispatch_request+0x658/0x17b0
> > [34094.993121]  blk_mq_do_dispatch_sched+0x183/0x220
> > [34094.993121]  ? blk_mq_sched_free_hctx_data+0xe0/0xe0
> > [34094.993121]  ? __switch_to+0x3b2/0x6c0
> > [34094.993121]  blk_mq_sched_dispatch_requests+0x2ac/0x310
> > [34094.993121]  ? finish_task_switch+0xa4/0x370
> > [34094.993121]  ? dequeue_task_fair+0x216/0x360
> > [34094.993121]  ? blk_mq_sched_restart+0x40/0x40
> > [34094.993121]  ? __schedule+0x588/0xc10
> > [34094.993121]  __blk_mq_run_hw_queue+0x82/0x140
> > [34094.993121]  process_one_work+0x39d/0x770
> > [34094.993121]  worker_thread+0x78/0x5c0
> > [34094.993121]  ? process_one_work+0x770/0x770
> > [34094.993121]  kthread+0x1af/0x1d0
> > [34094.993121]  ? kthread_create_worker_on_cpu+0xd0/0xd0
> > [34094.993121]  ret_from_fork+0x1f/0x30
> > [34094.993121]
> > [34094.993121] Allocated by task 19184:
> > [34094.993121]  kasan_kmalloc+0xc2/0xe0
> > [34094.993121]  kmem_cache_alloc_node_trace+0xf9/0x220
> > [34094.993121]  bfq_pd_alloc+0x4c/0x510
> > [34094.993121]  blkg_alloc+0x237/0x310
> > [34094.993121]  blkg_create+0x499/0x5f0
> > [34094.993121]  blkg_lookup_create+0x140/0x1b0
> > [34094.993121]  generic_make_request_checks+0x5ce/0xad0
> > [34094.993121]  generic_make_request+0xd9/0x6b0
> > [34094.993121]  submit_bio+0xa6/0x240
> > [34094.993121]  mpage_readpages+0x29e/0x3b0
> > [34094.993121]  read_pages+0xdf/0x3a0
> > [34094.993121]  __do_page_cache_readahead+0x278/0x290
> > [34094.993121]  ondemand_readahead+0x275/0x460
> > [34094.993121]  generic_file_read_iter+0xc4a/0x1790
> > [34094.993121]  blkdev_read_iter+0x8c/0xc0
> > [34094.993121]  aio_read+0x174/0x260
> > [34094.993121]  io_submit_one+0x7d3/0x14b0
> > [34094.993121]  __x64_sys_io_submit+0xfe/0x230
> > [34094.993121]  do_syscall_64+0x6f/0x280
> > [34094.993121]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > [34094.993121]
> > [34094.993121] Freed by task 9:
> > [34094.993121]  __kasan_slab_free+0x12f/0x180
> > [34094.993121]  kfree+0x92/0x1b0
> > [34094.993121]  blkg_free.part.0+0x4a/0xe0
> > [34094.993121]  rcu_process_callbacks+0x420/0x6c0
> > [34094.993121]  __do_softirq+0x109/0x36c
> > [34094.993121]
> > [34094.993121] The buggy address belongs to the object at ffff888126528000
> > [34094.993121]  which belongs to the cache kmalloc-2048 of size 2048
> > [34094.993121] The buggy address is located 600 bytes inside of
> > [34094.993121]  2048-byte region [ffff888126528000, ffff888126528800)
> > [34094.993121] The buggy address belongs to the page:
> > [34094.993121] page:ffffea0004994a00 count:1 mapcount:0
> > mapping:ffff88810000e800 index:0xffff0
> > [34094.993121] flags: 0x17ffffc0008100(slab|head)
> > [34094.993121] raw: 0017ffffc0008100 dead000000000100 dead000000000200
> > ffff88810000e800
> > [34094.993121] raw: ffff88812652c400 00000000800f0009 00000001ffffffff
> > 0000000000000000
> > [34094.993121] page dumped because: kasan: bad access detected
> > [34094.993121]
> > [34094.993121] Memory state around the buggy address:
> > [34094.993121]  ffff888126528100: fb fb fb fb fb fb fb fb fb fb fb fb fb
> > fb fb fb
> > [34094.993121]  ffff888126528180: fb fb fb fb fb fb fb fb fb fb fb fb fb
> > fb fb fb
> > [34094.993121] >ffff888126528200: fb fb fb fb fb fb fb fb fb fb fb fb fb
> > fb fb fb
> > [34094.993121]                                                     ^
> > [34094.993121]  ffff888126528280: fb fb fb fb fb fb fb fb fb fb fb fb fb
> > fb fb fb
> > [34094.993121]  ffff888126528300: fb fb fb fb fb fb fb fb fb fb fb fb fb
> > fb fb fb
> > [34094.993121]
> > ==================================================================
> > 
> > I'll try to figure out the root cause, in the meantime, feel free to
> > kick around if you have any througts.
> > 
> > Thansk,
> > Kuai
> > >                                 Honza
> > > 
> 
> Hi,
> 
> I finally figure out the root cause... This is introduced by a temporary
> fix of the problem that offlined bfqg is reinserted into service tree.
> 
> The temporary fix is as follow:
> 
> diff --git a/block/bfq-cgroup.c b/block/bfq-cgroup.c
> index 24a5c5329bcd..ee1933cd9a43 100644
> --- a/block/bfq-cgroup.c
> +++ b/block/bfq-cgroup.c
> @@ -935,6 +935,7 @@ static void bfq_pd_offline(struct blkg_policy_data *pd)
> 
>  put_async_queues:
>         bfq_put_async_queues(bfqd, bfqg);
> +       pd->plid = BLKCG_MAX_POLS;
> 
>         spin_unlock_irqrestore(&bfqd->lock, flags);
>         /*
> diff --git a/block/bfq-wf2q.c b/block/bfq-wf2q.c
> index b74cc0da118e..fa2062244805 100644
> --- a/block/bfq-wf2q.c
> +++ b/block/bfq-wf2q.c
> @@ -1692,6 +1692,15 @@ void bfq_del_bfqq_busy(struct bfq_data *bfqd, struct
> bfq_queue *bfqq,
>   */
>  void bfq_add_bfqq_busy(struct bfq_data *bfqd, struct bfq_queue *bfqq)
>  {
> +#ifdef CONFIG_BFQ_GROUP_IOSCHED
> +       /* If parent group is offlined, move the bfqq to root group */
> +       if (bfqq->entity.parent) {
> +               struct bfq_group *bfqg = bfq_bfqq_to_bfqg(bfqq);
> +
> +               if (bfqg->pd.plid >= BLKCG_MAX_POLS)
> +                       bfq_bfqq_move(bfqd, bfqq, bfqd->root_group);
> +       }
> +#endif
>         bfq_log_bfqq(bfqd, bfqq, "add to busy");
> 
> I add bfq_bfqq_move() here before bfq_mark_bfqq_busy(), which cause
> the problem...

OK, thanks for following up on this. So do I understand you correctly that
the problem of an empty bfqq being in the pos_tree does not exist in the
current upstream kernel?

								Honza
Yu Kuai Jan. 7, 2022, 1:04 a.m. UTC | #8
On 2022/01/03 23:37, Jan Kara wrote:
> OK, thanks for following up on this. So do I understand you correctly that
> the problem with empty bfqq being in pos_tree does not exist in current
> upstream kernel?

Yes, I have dropped this patch in the v2 patchset.

Thanks,
Kuai

Patch

diff --git a/block/bfq-cgroup.c b/block/bfq-cgroup.c
index 8e8cf6b3d946..822dd28ecf53 100644
--- a/block/bfq-cgroup.c
+++ b/block/bfq-cgroup.c
@@ -677,7 +677,6 @@  void bfq_bfqq_move(struct bfq_data *bfqd, struct bfq_queue *bfqq,
 		bfq_deactivate_bfqq(bfqd, bfqq, false, false);
 	else if (entity->on_st_or_in_serv)
 		bfq_put_idle_entity(bfq_entity_service_tree(entity), entity);
-	bfqg_and_blkg_put(old_parent);
 
 	if (entity->parent &&
 	    entity->parent->last_bfqq_created == bfqq)
@@ -690,11 +689,16 @@  void bfq_bfqq_move(struct bfq_data *bfqd, struct bfq_queue *bfqq,
 	/* pin down bfqg and its associated blkg  */
 	bfqg_and_blkg_get(bfqg);
 
-	if (bfq_bfqq_busy(bfqq)) {
-		if (unlikely(!bfqd->nonrot_with_queueing))
-			bfq_pos_tree_add_move(bfqd, bfqq);
+	/*
+	 * Don't leave the pos_root to old bfqg, since the ref to old bfqg will
+	 * be released and the bfqg might be freed.
+	 */
+	if (unlikely(!bfqd->nonrot_with_queueing))
+		bfq_pos_tree_add_move(bfqd, bfqq);
+	bfqg_and_blkg_put(old_parent);
+
+	if (bfq_bfqq_busy(bfqq))
 		bfq_activate_bfqq(bfqd, bfqq);
-	}
 
 	if (!bfqd->in_service_queue && !bfqd->rq_in_driver)
 		bfq_schedule_dispatch(bfqd);