diff mbox series

[RFC,1/2] blk-mq: test tags bitmap before get request

Message ID 20210301021444.4134047-2-yuyufen@huawei.com (mailing list archive)
State New, archived
Headers show
Series blk-mq: improve blk_mq_tag_to_rq() | expand

Commit Message

Yufen Yu March 1, 2021, 2:14 a.m. UTC
For now, we set hctx->tags->rqs[i] when get driver tag successfully.
The request either comes from sched_tags->static_rqs[] with scheduler,
or comes from tags->static_rqs[] with no scheduler. But, the value won't
be clear when put driver tag. Thus, tags->rqs[i] still remain old request.

We can free these sched_tags->static_rqs[] requests when switch elevator,
update nr_requests or update nr_hw_queues. After that, unexpected access
of tags->rqs[i] may cause use-after-free crash.

For example, we reported use-after-free of request in nbd device
by syzkaller:

BUG: KASAN: use-after-free in blk_mq_request_started+0x24/0x40 block/blk-mq.c:644
Read of size 4 at addr ffff80036b77f9d4 by task kworker/u9:0/10086
Call trace:
 dump_backtrace+0x0/0x310 arch/arm64/kernel/time.c:78
 show_stack+0x28/0x38 arch/arm64/kernel/traps.c:158
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x144/0x1b4 lib/dump_stack.c:118
 print_address_description+0x68/0x2d0 mm/kasan/report.c:253
 kasan_report_error mm/kasan/report.c:351 [inline]
 kasan_report+0x134/0x2f0 mm/kasan/report.c:409
 check_memory_region_inline mm/kasan/kasan.c:260 [inline]
 __asan_load4+0x88/0xb0 mm/kasan/kasan.c:699
 __read_once_size include/linux/compiler.h:193 [inline]
 blk_mq_rq_state block/blk-mq.h:106 [inline]
 blk_mq_request_started+0x24/0x40 block/blk-mq.c:644
 nbd_read_stat drivers/block/nbd.c:670 [inline]
 recv_work+0x1bc/0x890 drivers/block/nbd.c:749
 process_one_work+0x3ec/0x9e0 kernel/workqueue.c:2156
 worker_thread+0x80/0x9d0 kernel/workqueue.c:2311
 kthread+0x1d8/0x1e0 kernel/kthread.c:255
 ret_from_fork+0x10/0x18 arch/arm64/kernel/entry.S:1174

The syzkaller test program sended a reply package to client
without client sending request. After receiving the package,
recv_work() try to get the remained request in tags->rqs[]
by tag, which have been free.

To avoid this type of problem, we may need to ensure the request
valid when get it by tag.

Signed-off-by: Yufen Yu <yuyufen@huawei.com>
---
 block/blk-mq.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

Comments

Damien Le Moal March 1, 2021, 2:49 a.m. UTC | #1
On 2021/03/01 11:13, Yufen Yu wrote:
> For now, we set hctx->tags->rqs[i] when get driver tag successfully.
> The request either comes from sched_tags->static_rqs[] with scheduler,
> or comes from tags->static_rqs[] with no scheduler. But, the value won't
> be clear when put driver tag. Thus, tags->rqs[i] still remain old request.
> 
> We can free these sched_tags->static_rqs[] requests when switch elevator,
> update nr_requests or update nr_hw_queues. After that, unexpected access
> of tags->rqs[i] may cause use-after-free crash.
> 
> For example, we reported use-after-free of request in nbd device
> by syzkaller:
> 
> BUG: KASAN: use-after-free in blk_mq_request_started+0x24/0x40 block/blk-mq.c:644
> Read of size 4 at addr ffff80036b77f9d4 by task kworker/u9:0/10086
> Call trace:
>  dump_backtrace+0x0/0x310 arch/arm64/kernel/time.c:78
>  show_stack+0x28/0x38 arch/arm64/kernel/traps.c:158
>  __dump_stack lib/dump_stack.c:77 [inline]
>  dump_stack+0x144/0x1b4 lib/dump_stack.c:118
>  print_address_description+0x68/0x2d0 mm/kasan/report.c:253
>  kasan_report_error mm/kasan/report.c:351 [inline]
>  kasan_report+0x134/0x2f0 mm/kasan/report.c:409
>  check_memory_region_inline mm/kasan/kasan.c:260 [inline]
>  __asan_load4+0x88/0xb0 mm/kasan/kasan.c:699
>  __read_once_size include/linux/compiler.h:193 [inline]
>  blk_mq_rq_state block/blk-mq.h:106 [inline]
>  blk_mq_request_started+0x24/0x40 block/blk-mq.c:644
>  nbd_read_stat drivers/block/nbd.c:670 [inline]
>  recv_work+0x1bc/0x890 drivers/block/nbd.c:749
>  process_one_work+0x3ec/0x9e0 kernel/workqueue.c:2156
>  worker_thread+0x80/0x9d0 kernel/workqueue.c:2311
>  kthread+0x1d8/0x1e0 kernel/kthread.c:255
>  ret_from_fork+0x10/0x18 arch/arm64/kernel/entry.S:1174
> 
> The syzkaller test program sended a reply package to client
> without client sending request. After receiving the package,
> recv_work() try to get the remained request in tags->rqs[]
> by tag, which have been free.
> 
> To avoid this type of problem, we may need to ensure the request
> valid when get it by tag.
> 
> Signed-off-by: Yufen Yu <yuyufen@huawei.com>
> ---
>  block/blk-mq.c | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index d4d7c1caa439..5362a7958b74 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -836,9 +836,17 @@ void blk_mq_delay_kick_requeue_list(struct request_queue *q,
>  }
>  EXPORT_SYMBOL(blk_mq_delay_kick_requeue_list);
>  
> +static int blk_mq_test_tag_bit(struct blk_mq_tags *tags, unsigned int tag)
> +{
> +	if (!blk_mq_tag_is_reserved(tags, tag))
> +		return sbitmap_test_bit(&tags->bitmap_tags->sb, tag);
> +	else

No need for else after a return.

> +		return sbitmap_test_bit(&tags->breserved_tags->sb, tag);
> +}
> +
>  struct request *blk_mq_tag_to_rq(struct blk_mq_tags *tags, unsigned int tag)
>  {
> -	if (tag < tags->nr_tags) {
> +	if (tag < tags->nr_tags && blk_mq_test_tag_bit(tags, tag)) {
>  		prefetch(tags->rqs[tag]);
>  		return tags->rqs[tag];
>  	}
>
Bart Van Assche March 1, 2021, 3:49 a.m. UTC | #2
On 2/28/21 6:14 PM, Yufen Yu wrote:
> For now, we set hctx->tags->rqs[i] when get driver tag successfully.
> The request either comes from sched_tags->static_rqs[] with scheduler,
> or comes from tags->static_rqs[] with no scheduler. But, the value won't
> be clear when put driver tag. Thus, tags->rqs[i] still remain old request.
> 
> We can free these sched_tags->static_rqs[] requests when switch elevator,
> update nr_requests or update nr_hw_queues. After that, unexpected access
> of tags->rqs[i] may cause use-after-free crash.
> 
> For example, we reported use-after-free of request in nbd device
> by syzkaller:
> 
> BUG: KASAN: use-after-free in blk_mq_request_started+0x24/0x40 block/blk-mq.c:644
> Read of size 4 at addr ffff80036b77f9d4 by task kworker/u9:0/10086
> Call trace:
>  dump_backtrace+0x0/0x310 arch/arm64/kernel/time.c:78
>  show_stack+0x28/0x38 arch/arm64/kernel/traps.c:158
>  __dump_stack lib/dump_stack.c:77 [inline]
>  dump_stack+0x144/0x1b4 lib/dump_stack.c:118
>  print_address_description+0x68/0x2d0 mm/kasan/report.c:253
>  kasan_report_error mm/kasan/report.c:351 [inline]
>  kasan_report+0x134/0x2f0 mm/kasan/report.c:409
>  check_memory_region_inline mm/kasan/kasan.c:260 [inline]
>  __asan_load4+0x88/0xb0 mm/kasan/kasan.c:699
>  __read_once_size include/linux/compiler.h:193 [inline]
>  blk_mq_rq_state block/blk-mq.h:106 [inline]
>  blk_mq_request_started+0x24/0x40 block/blk-mq.c:644
>  nbd_read_stat drivers/block/nbd.c:670 [inline]
>  recv_work+0x1bc/0x890 drivers/block/nbd.c:749
>  process_one_work+0x3ec/0x9e0 kernel/workqueue.c:2156
>  worker_thread+0x80/0x9d0 kernel/workqueue.c:2311
>  kthread+0x1d8/0x1e0 kernel/kthread.c:255
>  ret_from_fork+0x10/0x18 arch/arm64/kernel/entry.S:1174
> 
> The syzkaller test program sended a reply package to client
> without client sending request. After receiving the package,
> recv_work() try to get the remained request in tags->rqs[]
> by tag, which have been free.
> 
> To avoid this type of problem, we may need to ensure the request
> valid when get it by tag.
> 
> Signed-off-by: Yufen Yu <yuyufen@huawei.com>
> ---
>  block/blk-mq.c | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index d4d7c1caa439..5362a7958b74 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -836,9 +836,17 @@ void blk_mq_delay_kick_requeue_list(struct request_queue *q,
>  }
>  EXPORT_SYMBOL(blk_mq_delay_kick_requeue_list);
>  
> +static int blk_mq_test_tag_bit(struct blk_mq_tags *tags, unsigned int tag)
> +{
> +	if (!blk_mq_tag_is_reserved(tags, tag))
> +		return sbitmap_test_bit(&tags->bitmap_tags->sb, tag);
> +	else
> +		return sbitmap_test_bit(&tags->breserved_tags->sb, tag);
> +}
> +
>  struct request *blk_mq_tag_to_rq(struct blk_mq_tags *tags, unsigned int tag)
>  {
> -	if (tag < tags->nr_tags) {
> +	if (tag < tags->nr_tags && blk_mq_test_tag_bit(tags, tag)) {
>  		prefetch(tags->rqs[tag]);
>  		return tags->rqs[tag];
>  	}

Please do not slow down the hot path by inserting additional code in the
hot path. I am convinced that the race described in the patch
description can be fixed without changing the hot path. See also the
conversation I had recently with John Garry on linux-block.

Thanks,

Bart.

Bart.
Hannes Reinecke March 1, 2021, 7:54 a.m. UTC | #3
On 3/1/21 4:49 AM, Bart Van Assche wrote:
> On 2/28/21 6:14 PM, Yufen Yu wrote:
>> For now, we set hctx->tags->rqs[i] when get driver tag successfully.
>> The request either comes from sched_tags->static_rqs[] with scheduler,
>> or comes from tags->static_rqs[] with no scheduler. But, the value won't
>> be clear when put driver tag. Thus, tags->rqs[i] still remain old request.
>>
>> We can free these sched_tags->static_rqs[] requests when switch elevator,
>> update nr_requests or update nr_hw_queues. After that, unexpected access
>> of tags->rqs[i] may cause use-after-free crash.
>>
>> For example, we reported use-after-free of request in nbd device
>> by syzkaller:
>>
>> BUG: KASAN: use-after-free in blk_mq_request_started+0x24/0x40 block/blk-mq.c:644
>> Read of size 4 at addr ffff80036b77f9d4 by task kworker/u9:0/10086
>> Call trace:
>>   dump_backtrace+0x0/0x310 arch/arm64/kernel/time.c:78
>>   show_stack+0x28/0x38 arch/arm64/kernel/traps.c:158
>>   __dump_stack lib/dump_stack.c:77 [inline]
>>   dump_stack+0x144/0x1b4 lib/dump_stack.c:118
>>   print_address_description+0x68/0x2d0 mm/kasan/report.c:253
>>   kasan_report_error mm/kasan/report.c:351 [inline]
>>   kasan_report+0x134/0x2f0 mm/kasan/report.c:409
>>   check_memory_region_inline mm/kasan/kasan.c:260 [inline]
>>   __asan_load4+0x88/0xb0 mm/kasan/kasan.c:699
>>   __read_once_size include/linux/compiler.h:193 [inline]
>>   blk_mq_rq_state block/blk-mq.h:106 [inline]
>>   blk_mq_request_started+0x24/0x40 block/blk-mq.c:644
>>   nbd_read_stat drivers/block/nbd.c:670 [inline]
>>   recv_work+0x1bc/0x890 drivers/block/nbd.c:749
>>   process_one_work+0x3ec/0x9e0 kernel/workqueue.c:2156
>>   worker_thread+0x80/0x9d0 kernel/workqueue.c:2311
>>   kthread+0x1d8/0x1e0 kernel/kthread.c:255
>>   ret_from_fork+0x10/0x18 arch/arm64/kernel/entry.S:1174
>>
>> The syzkaller test program sended a reply package to client
>> without client sending request. After receiving the package,
>> recv_work() try to get the remained request in tags->rqs[]
>> by tag, which have been free.
>>
>> To avoid this type of problem, we may need to ensure the request
>> valid when get it by tag.
>>
>> Signed-off-by: Yufen Yu <yuyufen@huawei.com>
>> ---
>>   block/blk-mq.c | 10 +++++++++-
>>   1 file changed, 9 insertions(+), 1 deletion(-)
>>
>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>> index d4d7c1caa439..5362a7958b74 100644
>> --- a/block/blk-mq.c
>> +++ b/block/blk-mq.c
>> @@ -836,9 +836,17 @@ void blk_mq_delay_kick_requeue_list(struct request_queue *q,
>>   }
>>   EXPORT_SYMBOL(blk_mq_delay_kick_requeue_list);
>>   
>> +static int blk_mq_test_tag_bit(struct blk_mq_tags *tags, unsigned int tag)
>> +{
>> +	if (!blk_mq_tag_is_reserved(tags, tag))
>> +		return sbitmap_test_bit(&tags->bitmap_tags->sb, tag);
>> +	else
>> +		return sbitmap_test_bit(&tags->breserved_tags->sb, tag);
>> +}
>> +
>>   struct request *blk_mq_tag_to_rq(struct blk_mq_tags *tags, unsigned int tag)
>>   {
>> -	if (tag < tags->nr_tags) {
>> +	if (tag < tags->nr_tags && blk_mq_test_tag_bit(tags, tag)) {
>>   		prefetch(tags->rqs[tag]);
>>   		return tags->rqs[tag];
>>   	}
> 
> Please do not slow down the hot path by inserting additional code in the
> hot path. I am convinced that the race described in the patch
> description can be fixed without changing the hot path. See also the
> conversation I had recently with John Garry on linux-block.
> 
Seems to be cropping up everywhere now; anyway, I do agree with Bart here.
For the hot path (typically when looking up the associated command from 
within the interrupt routine) we really should not add any further code 
to not slow down processing.
Additionally, this is typically a firmware response so we can be 
reasonably certain that this is a response to valid command, so in 
nearly all cases the bit will be set.
(Pathological cases like spoofed response frames aside).

However, there another use case where blk_mq_tag_to_rq() is used, and 
that is for traversing outstanding commands eg during a device reset.
There we _have_ to ensure that the request is valid lest we run into
uninitialized values.

So I would advocate to have a slow path variant here which would 
validate the bitmap before trying to access the request.

Or, really, converting those drivers to use blk_mq_tagset_busy_iter().

Cheers,

Hannes
John Garry March 1, 2021, 12:20 p.m. UTC | #4
On 01/03/2021 07:54, Hannes Reinecke wrote:
>>>
>>> For example, we reported use-after-free of request in nbd device
>>> by syzkaller:
>>>
>>> BUG: KASAN: use-after-free in blk_mq_request_started+0x24/0x40 
>>> block/blk-mq.c:644
>>> Read of size 4 at addr ffff80036b77f9d4 by task kworker/u9:0/10086
>>> Call trace:
>>>   dump_backtrace+0x0/0x310 arch/arm64/kernel/time.c:78
>>>   show_stack+0x28/0x38 arch/arm64/kernel/traps.c:158
>>>   __dump_stack lib/dump_stack.c:77 [inline]
>>>   dump_stack+0x144/0x1b4 lib/dump_stack.c:118
>>>   print_address_description+0x68/0x2d0 mm/kasan/report.c:253
>>>   kasan_report_error mm/kasan/report.c:351 [inline]
>>>   kasan_report+0x134/0x2f0 mm/kasan/report.c:409
>>>   check_memory_region_inline mm/kasan/kasan.c:260 [inline]
>>>   __asan_load4+0x88/0xb0 mm/kasan/kasan.c:699
>>>   __read_once_size include/linux/compiler.h:193 [inline]
>>>   blk_mq_rq_state block/blk-mq.h:106 [inline]
>>>   blk_mq_request_started+0x24/0x40 block/blk-mq.c:644
>>>   nbd_read_stat drivers/block/nbd.c:670 [inline]
>>>   recv_work+0x1bc/0x890 drivers/block/nbd.c:749
>>>   process_one_work+0x3ec/0x9e0 kernel/workqueue.c:2156
>>>   worker_thread+0x80/0x9d0 kernel/workqueue.c:2311
>>>   kthread+0x1d8/0x1e0 kernel/kthread.c:255
>>>   ret_from_fork+0x10/0x18 arch/arm64/kernel/entry.S:1174
>>>
>>> The syzkaller test program sended a reply package to client
>>> without client sending request. After receiving the package,
>>> recv_work() try to get the remained request in tags->rqs[]
>>> by tag, which have been free.
>>>
>>> To avoid this type of problem, we may need to ensure the request
>>> valid when get it by tag.
>>>
>>> Signed-off-by: Yufen Yu <yuyufen@huawei.com>
>>> ---
>>>   block/blk-mq.c | 10 +++++++++-
>>>   1 file changed, 9 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>>> index d4d7c1caa439..5362a7958b74 100644
>>> --- a/block/blk-mq.c
>>> +++ b/block/blk-mq.c
>>> @@ -836,9 +836,17 @@ void blk_mq_delay_kick_requeue_list(struct 
>>> request_queue *q,
>>>   }
>>>   EXPORT_SYMBOL(blk_mq_delay_kick_requeue_list);
>>> +static int blk_mq_test_tag_bit(struct blk_mq_tags *tags, unsigned 
>>> int tag)
>>> +{
>>> +    if (!blk_mq_tag_is_reserved(tags, tag))
>>> +        return sbitmap_test_bit(&tags->bitmap_tags->sb, tag);
>>> +    else
>>> +        return sbitmap_test_bit(&tags->breserved_tags->sb, tag);
>>> +}
>>> +
>>>   struct request *blk_mq_tag_to_rq(struct blk_mq_tags *tags, unsigned 
>>> int tag)
>>>   {
>>> -    if (tag < tags->nr_tags) {
>>> +    if (tag < tags->nr_tags && blk_mq_test_tag_bit(tags, tag)) {
>>>           prefetch(tags->rqs[tag]);
>>>           return tags->rqs[tag];
>>>       }
>>
>> Please do not slow down the hot path by inserting additional code in the
>> hot path. I am convinced that the race described in the patch
>> description can be fixed without changing the hot path. See also the
>> conversation I had recently with John Garry on linux-block.
>>

I plan to send an updated series later this week:
https://lore.kernel.org/linux-block/28be6446-7e06-b03c-a373-39c5eef89c8a@huawei.com/

I'm inclined to say that the first patch there may solve all reasonable 
scenarios, but we should see how to fix all scenarios and decide on the 
path forward.

Thanks,
john

> Seems to be cropping up everywhere now; anyway, I do agree with Bart here.
> For the hot path (typically when looking up the associated command from 
> within the interrupt routine) we really should not add any further code 
> to not slow down processing.
> Additionally, this is typically a firmware response so we can be 
> reasonably certain that this is a response to valid command, so in 
> nearly all cases the bit will be set.
> (Pathological cases like spoofed response frames aside).
> 
> However, there another use case where blk_mq_tag_to_rq() is used, and 
> that is for traversing outstanding commands eg during a device reset.
> There we _have_ to ensure that the request is valid lest we run into
> uninitialized values.
> 
> So I would advocate to have a slow path variant here which would 
> validate the bitmap before trying to access the request.
> 
> Or, really, converting those drivers to use blk_mq_tagset_busy_iter().
diff mbox series

Patch

diff --git a/block/blk-mq.c b/block/blk-mq.c
index d4d7c1caa439..5362a7958b74 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -836,9 +836,17 @@  void blk_mq_delay_kick_requeue_list(struct request_queue *q,
 }
 EXPORT_SYMBOL(blk_mq_delay_kick_requeue_list);
 
+static int blk_mq_test_tag_bit(struct blk_mq_tags *tags, unsigned int tag)
+{
+	if (!blk_mq_tag_is_reserved(tags, tag))
+		return sbitmap_test_bit(&tags->bitmap_tags->sb, tag);
+	else
+		return sbitmap_test_bit(&tags->breserved_tags->sb, tag);
+}
+
 struct request *blk_mq_tag_to_rq(struct blk_mq_tags *tags, unsigned int tag)
 {
-	if (tag < tags->nr_tags) {
+	if (tag < tags->nr_tags && blk_mq_test_tag_bit(tags, tag)) {
 		prefetch(tags->rqs[tag]);
 		return tags->rqs[tag];
 	}