diff mbox series

[v5,5/6] nbd: convert to use blk_mq_find_and_get_req()

Message ID 20210909141256.2606682-6-yukuai3@huawei.com (mailing list archive)
State New, archived
Headers show
Series handle unexpected message from server | expand

Commit Message

Yu Kuai Sept. 9, 2021, 2:12 p.m. UTC
blk_mq_tag_to_rq() can only ensure to return valid request in
following situation:

1) client send request message to server first
submit_bio
...
 blk_mq_get_tag
 ...
 blk_mq_get_driver_tag
 ...
 nbd_queue_rq
  nbd_handle_cmd
   nbd_send_cmd

2) client receive respond message from server
recv_work
 nbd_read_stat
  blk_mq_tag_to_rq

If step 1) is missing, blk_mq_tag_to_rq() will return a stale
request, which might be freed. Thus convert to use
blk_mq_find_and_get_req() to make sure the returned request is not
freed.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/block/nbd.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

Comments

Ming Lei Sept. 14, 2021, 1:11 a.m. UTC | #1
On Thu, Sep 09, 2021 at 10:12:55PM +0800, Yu Kuai wrote:
> blk_mq_tag_to_rq() can only ensure to return valid request in
> following situation:
> 
> 1) client send request message to server first
> submit_bio
> ...
>  blk_mq_get_tag
>  ...
>  blk_mq_get_driver_tag
>  ...
>  nbd_queue_rq
>   nbd_handle_cmd
>    nbd_send_cmd
> 
> 2) client receive respond message from server
> recv_work
>  nbd_read_stat
>   blk_mq_tag_to_rq
> 
> If step 1) is missing, blk_mq_tag_to_rq() will return a stale
> request, which might be freed. Thus convert to use
> blk_mq_find_and_get_req() to make sure the returned request is not
> freed.

But NBD_CMD_INFLIGHT has been added for checking if the reply is
expected, do we still need blk_mq_find_and_get_req() for covering
this issue? BTW, request and its payload is pre-allocated, so there
isn't real use-after-free.


Thanks,
Ming
Yu Kuai Sept. 14, 2021, 3:11 a.m. UTC | #2
On 2021/09/14 9:11, Ming Lei wrote:
> On Thu, Sep 09, 2021 at 10:12:55PM +0800, Yu Kuai wrote:
>> blk_mq_tag_to_rq() can only ensure to return valid request in
>> following situation:
>>
>> 1) client send request message to server first
>> submit_bio
>> ...
>>   blk_mq_get_tag
>>   ...
>>   blk_mq_get_driver_tag
>>   ...
>>   nbd_queue_rq
>>    nbd_handle_cmd
>>     nbd_send_cmd
>>
>> 2) client receive respond message from server
>> recv_work
>>   nbd_read_stat
>>    blk_mq_tag_to_rq
>>
>> If step 1) is missing, blk_mq_tag_to_rq() will return a stale
>> request, which might be freed. Thus convert to use
>> blk_mq_find_and_get_req() to make sure the returned request is not
>> freed.
> 
> But NBD_CMD_INFLIGHT has been added for checking if the reply is
> expected, do we still need blk_mq_find_and_get_req() for covering
> this issue? BTW, request and its payload is pre-allocated, so there
> isn't real use-after-free.

Hi, Ming

Checking NBD_CMD_INFLIGHT relied on the request founded by tag is valid,
not the other way round.

nbd_read_stat
  req = blk_mq_tag_to_rq()
  cmd = blk_mq_rq_to_pdu(req)
  mutex_lock(cmd->lock)
  checking NBD_CMD_INFLIGHT

The checking doesn't have any effect on blk_mq_tag_to_rq().

Thanks,
Kuai
Ming Lei Sept. 14, 2021, 6:44 a.m. UTC | #3
On Tue, Sep 14, 2021 at 11:11:06AM +0800, yukuai (C) wrote:
> On 2021/09/14 9:11, Ming Lei wrote:
> > On Thu, Sep 09, 2021 at 10:12:55PM +0800, Yu Kuai wrote:
> > > blk_mq_tag_to_rq() can only ensure to return valid request in
> > > following situation:
> > > 
> > > 1) client send request message to server first
> > > submit_bio
> > > ...
> > >   blk_mq_get_tag
> > >   ...
> > >   blk_mq_get_driver_tag
> > >   ...
> > >   nbd_queue_rq
> > >    nbd_handle_cmd
> > >     nbd_send_cmd
> > > 
> > > 2) client receive respond message from server
> > > recv_work
> > >   nbd_read_stat
> > >    blk_mq_tag_to_rq
> > > 
> > > If step 1) is missing, blk_mq_tag_to_rq() will return a stale
> > > request, which might be freed. Thus convert to use
> > > blk_mq_find_and_get_req() to make sure the returned request is not
> > > freed.
> > 
> > But NBD_CMD_INFLIGHT has been added for checking if the reply is
> > expected, do we still need blk_mq_find_and_get_req() for covering
> > this issue? BTW, request and its payload is pre-allocated, so there
> > isn't real use-after-free.
> 
> Hi, Ming
> 
> Checking NBD_CMD_INFLIGHT relied on the request founded by tag is valid,
> not the other way round.
> 
> nbd_read_stat
>  req = blk_mq_tag_to_rq()
>  cmd = blk_mq_rq_to_pdu(req)
>  mutex_lock(cmd->lock)
>  checking NBD_CMD_INFLIGHT

Request and its payload is pre-allocated, and either req->ref or cmd->lock can
serve the same purpose here. Once cmd->lock is held, you can check if the cmd is
inflight or not. If it isn't inflight, just return -ENOENT. Is there any
problem to handle in this way?


Thanks,
Ming
Yu Kuai Sept. 14, 2021, 7:13 a.m. UTC | #4
On 2021/09/14 14:44, Ming Lei wrote:
> On Tue, Sep 14, 2021 at 11:11:06AM +0800, yukuai (C) wrote:
>> On 2021/09/14 9:11, Ming Lei wrote:
>>> On Thu, Sep 09, 2021 at 10:12:55PM +0800, Yu Kuai wrote:
>>>> blk_mq_tag_to_rq() can only ensure to return valid request in
>>>> following situation:
>>>>
>>>> 1) client send request message to server first
>>>> submit_bio
>>>> ...
>>>>    blk_mq_get_tag
>>>>    ...
>>>>    blk_mq_get_driver_tag
>>>>    ...
>>>>    nbd_queue_rq
>>>>     nbd_handle_cmd
>>>>      nbd_send_cmd
>>>>
>>>> 2) client receive respond message from server
>>>> recv_work
>>>>    nbd_read_stat
>>>>     blk_mq_tag_to_rq
>>>>
>>>> If step 1) is missing, blk_mq_tag_to_rq() will return a stale
>>>> request, which might be freed. Thus convert to use
>>>> blk_mq_find_and_get_req() to make sure the returned request is not
>>>> freed.
>>>
>>> But NBD_CMD_INFLIGHT has been added for checking if the reply is
>>> expected, do we still need blk_mq_find_and_get_req() for covering
>>> this issue? BTW, request and its payload is pre-allocated, so there
>>> isn't real use-after-free.
>>
>> Hi, Ming
>>
>> Checking NBD_CMD_INFLIGHT relied on the request founded by tag is valid,
>> not the other way round.
>>
>> nbd_read_stat
>>   req = blk_mq_tag_to_rq()
>>   cmd = blk_mq_rq_to_pdu(req)
>>   mutex_lock(cmd->lock)
>>   checking NBD_CMD_INFLIGHT
> 
> Request and its payload is pre-allocated, and either req->ref or cmd->lock can
> serve the same purpose here. Once cmd->lock is held, you can check if the cmd is
> inflight or not. If it isn't inflight, just return -ENOENT. Is there any
> problem to handle in this way?

Hi, Ming

in nbd_read_stat:

1) get a request by tag first
2) get nbd_cmd by the request
3) hold cmd->lock and check if cmd is inflight

If we want to check if the cmd is inflight in step 3), we have to do
setp 1) and 2) first. As I explained in patch 0, blk_mq_tag_to_rq()
can't make sure the returned request is not freed:

nbd_read_stat
			blk_mq_sched_free_requests
			 blk_mq_free_rqs
   blk_mq_tag_to_rq
   -> get rq before clear mapping
			  blk_mq_clear_rq_mapping
			  __free_pages -> rq is freed
   blk_mq_request_started -> UAF


Thanks,
Kuai
Ming Lei Sept. 14, 2021, 7:46 a.m. UTC | #5
On Tue, Sep 14, 2021 at 03:13:38PM +0800, yukuai (C) wrote:
> On 2021/09/14 14:44, Ming Lei wrote:
> > On Tue, Sep 14, 2021 at 11:11:06AM +0800, yukuai (C) wrote:
> > > On 2021/09/14 9:11, Ming Lei wrote:
> > > > On Thu, Sep 09, 2021 at 10:12:55PM +0800, Yu Kuai wrote:
> > > > > blk_mq_tag_to_rq() can only ensure to return valid request in
> > > > > following situation:
> > > > > 
> > > > > 1) client send request message to server first
> > > > > submit_bio
> > > > > ...
> > > > >    blk_mq_get_tag
> > > > >    ...
> > > > >    blk_mq_get_driver_tag
> > > > >    ...
> > > > >    nbd_queue_rq
> > > > >     nbd_handle_cmd
> > > > >      nbd_send_cmd
> > > > > 
> > > > > 2) client receive respond message from server
> > > > > recv_work
> > > > >    nbd_read_stat
> > > > >     blk_mq_tag_to_rq
> > > > > 
> > > > > If step 1) is missing, blk_mq_tag_to_rq() will return a stale
> > > > > request, which might be freed. Thus convert to use
> > > > > blk_mq_find_and_get_req() to make sure the returned request is not
> > > > > freed.
> > > > 
> > > > But NBD_CMD_INFLIGHT has been added for checking if the reply is
> > > > expected, do we still need blk_mq_find_and_get_req() for covering
> > > > this issue? BTW, request and its payload is pre-allocated, so there
> > > > isn't real use-after-free.
> > > 
> > > Hi, Ming
> > > 
> > > Checking NBD_CMD_INFLIGHT relied on the request founded by tag is valid,
> > > not the other way round.
> > > 
> > > nbd_read_stat
> > >   req = blk_mq_tag_to_rq()
> > >   cmd = blk_mq_rq_to_pdu(req)
> > >   mutex_lock(cmd->lock)
> > >   checking NBD_CMD_INFLIGHT
> > 
> > Request and its payload is pre-allocated, and either req->ref or cmd->lock can
> > serve the same purpose here. Once cmd->lock is held, you can check if the cmd is
> > inflight or not. If it isn't inflight, just return -ENOENT. Is there any
> > problem to handle in this way?
> 
> Hi, Ming
> 
> in nbd_read_stat:
> 
> 1) get a request by tag first
> 2) get nbd_cmd by the request
> 3) hold cmd->lock and check if cmd is inflight
> 
> If we want to check if the cmd is inflight in step 3), we have to do
> setp 1) and 2) first. As I explained in patch 0, blk_mq_tag_to_rq()
> can't make sure the returned request is not freed:
> 
> nbd_read_stat
> 			blk_mq_sched_free_requests
> 			 blk_mq_free_rqs
>   blk_mq_tag_to_rq
>   -> get rq before clear mapping
> 			  blk_mq_clear_rq_mapping
> 			  __free_pages -> rq is freed
>   blk_mq_request_started -> UAF

If the above can happen, blk_mq_find_and_get_req() may not fix it too, just
wondering why not take the following simpler way for avoiding the UAF?

diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index 5170a630778d..dfa5cce71f66 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -795,9 +795,13 @@ static void recv_work(struct work_struct *work)
 						     work);
 	struct nbd_device *nbd = args->nbd;
 	struct nbd_config *config = nbd->config;
+	struct request_queue *q = nbd->disk->queue;
 	struct nbd_cmd *cmd;
 	struct request *rq;
 
+	if (!percpu_ref_tryget(&q->q_usage_counter))
+                return;
+
 	while (1) {
 		cmd = nbd_read_stat(nbd, args->index);
 		if (IS_ERR(cmd)) {
@@ -813,6 +817,7 @@ static void recv_work(struct work_struct *work)
 		if (likely(!blk_should_fake_timeout(rq->q)))
 			blk_mq_complete_request(rq);
 	}
+	blk_queue_exit(q);
 	nbd_config_put(nbd);
 	atomic_dec(&config->recv_threads);
 	wake_up(&config->recv_wq);

Thanks,
Ming
Yu Kuai Sept. 14, 2021, 9:08 a.m. UTC | #6
On 2021/09/14 15:46, Ming Lei wrote:
> On Tue, Sep 14, 2021 at 03:13:38PM +0800, yukuai (C) wrote:
>> On 2021/09/14 14:44, Ming Lei wrote:
>>> On Tue, Sep 14, 2021 at 11:11:06AM +0800, yukuai (C) wrote:
>>>> On 2021/09/14 9:11, Ming Lei wrote:
>>>>> On Thu, Sep 09, 2021 at 10:12:55PM +0800, Yu Kuai wrote:
>>>>>> blk_mq_tag_to_rq() can only ensure to return valid request in
>>>>>> following situation:
>>>>>>
>>>>>> 1) client send request message to server first
>>>>>> submit_bio
>>>>>> ...
>>>>>>     blk_mq_get_tag
>>>>>>     ...
>>>>>>     blk_mq_get_driver_tag
>>>>>>     ...
>>>>>>     nbd_queue_rq
>>>>>>      nbd_handle_cmd
>>>>>>       nbd_send_cmd
>>>>>>
>>>>>> 2) client receive respond message from server
>>>>>> recv_work
>>>>>>     nbd_read_stat
>>>>>>      blk_mq_tag_to_rq
>>>>>>
>>>>>> If step 1) is missing, blk_mq_tag_to_rq() will return a stale
>>>>>> request, which might be freed. Thus convert to use
>>>>>> blk_mq_find_and_get_req() to make sure the returned request is not
>>>>>> freed.
>>>>>
>>>>> But NBD_CMD_INFLIGHT has been added for checking if the reply is
>>>>> expected, do we still need blk_mq_find_and_get_req() for covering
>>>>> this issue? BTW, request and its payload is pre-allocated, so there
>>>>> isn't real use-after-free.
>>>>
>>>> Hi, Ming
>>>>
>>>> Checking NBD_CMD_INFLIGHT relied on the request founded by tag is valid,
>>>> not the other way round.
>>>>
>>>> nbd_read_stat
>>>>    req = blk_mq_tag_to_rq()
>>>>    cmd = blk_mq_rq_to_pdu(req)
>>>>    mutex_lock(cmd->lock)
>>>>    checking NBD_CMD_INFLIGHT
>>>
>>> Request and its payload is pre-allocated, and either req->ref or cmd->lock can
>>> serve the same purpose here. Once cmd->lock is held, you can check if the cmd is
>>> inflight or not. If it isn't inflight, just return -ENOENT. Is there any
>>> problem to handle in this way?
>>
>> Hi, Ming
>>
>> in nbd_read_stat:
>>
>> 1) get a request by tag first
>> 2) get nbd_cmd by the request
>> 3) hold cmd->lock and check if cmd is inflight
>>
>> If we want to check if the cmd is inflight in step 3), we have to do
>> setp 1) and 2) first. As I explained in patch 0, blk_mq_tag_to_rq()
>> can't make sure the returned request is not freed:
>>
>> nbd_read_stat
>> 			blk_mq_sched_free_requests
>> 			 blk_mq_free_rqs
>>    blk_mq_tag_to_rq
>>    -> get rq before clear mapping
>> 			  blk_mq_clear_rq_mapping
>> 			  __free_pages -> rq is freed
>>    blk_mq_request_started -> UAF
> 
> If the above can happen, blk_mq_find_and_get_req() may not fix it too, just

Hi, Ming

Why can't blk_mq_find_and_get_req() fix it? I can't think of any
scenario that might have problem currently.

> wondering why not take the following simpler way for avoiding the UAF?
> 
> diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
> index 5170a630778d..dfa5cce71f66 100644
> --- a/drivers/block/nbd.c
> +++ b/drivers/block/nbd.c
> @@ -795,9 +795,13 @@ static void recv_work(struct work_struct *work)
>   						     work);
>   	struct nbd_device *nbd = args->nbd;
>   	struct nbd_config *config = nbd->config;
> +	struct request_queue *q = nbd->disk->queue;
>   	struct nbd_cmd *cmd;
>   	struct request *rq;
>   
> +	if (!percpu_ref_tryget(&q->q_usage_counter))
> +                return;
> +

We can't make sure freeze_queue is called before this, thus this approch
can't fix the problem, right?
  nbd_read_stat
     blk_mq_tag_to_rq
			elevator_switch
			 blk_mq_freeze_queue(q);
			 elevator_switch_mq
			  elevator_exit
			   blk_mq_sched_free_requests
     blk_mq_request_started -> UAF

Thanks,
Kuai

>   	while (1) {
>   		cmd = nbd_read_stat(nbd, args->index);
>   		if (IS_ERR(cmd)) {
> @@ -813,6 +817,7 @@ static void recv_work(struct work_struct *work)
>   		if (likely(!blk_should_fake_timeout(rq->q)))
>   			blk_mq_complete_request(rq);
>   	}
> +	blk_queue_exit(q);
>   	nbd_config_put(nbd);
>   	atomic_dec(&config->recv_threads);
>   	wake_up(&config->recv_wq);
> 
> Thanks,
> Ming
> 
> .
>
Yu Kuai Sept. 14, 2021, 9:12 a.m. UTC | #7
On 2021/09/14 17:08, yukuai (C) wrote:
> On 2021/09/14 15:46, Ming Lei wrote:
>> On Tue, Sep 14, 2021 at 03:13:38PM +0800, yukuai (C) wrote:
>>> On 2021/09/14 14:44, Ming Lei wrote:
>>>> On Tue, Sep 14, 2021 at 11:11:06AM +0800, yukuai (C) wrote:
>>>>> On 2021/09/14 9:11, Ming Lei wrote:
>>>>>> On Thu, Sep 09, 2021 at 10:12:55PM +0800, Yu Kuai wrote:
>>>>>>> blk_mq_tag_to_rq() can only ensure to return valid request in
>>>>>>> following situation:
>>>>>>>
>>>>>>> 1) client send request message to server first
>>>>>>> submit_bio
>>>>>>> ...
>>>>>>>     blk_mq_get_tag
>>>>>>>     ...
>>>>>>>     blk_mq_get_driver_tag
>>>>>>>     ...
>>>>>>>     nbd_queue_rq
>>>>>>>      nbd_handle_cmd
>>>>>>>       nbd_send_cmd
>>>>>>>
>>>>>>> 2) client receive respond message from server
>>>>>>> recv_work
>>>>>>>     nbd_read_stat
>>>>>>>      blk_mq_tag_to_rq
>>>>>>>
>>>>>>> If step 1) is missing, blk_mq_tag_to_rq() will return a stale
>>>>>>> request, which might be freed. Thus convert to use
>>>>>>> blk_mq_find_and_get_req() to make sure the returned request is not
>>>>>>> freed.
>>>>>>
>>>>>> But NBD_CMD_INFLIGHT has been added for checking if the reply is
>>>>>> expected, do we still need blk_mq_find_and_get_req() for covering
>>>>>> this issue? BTW, request and its payload is pre-allocated, so there
>>>>>> isn't real use-after-free.
>>>>>
>>>>> Hi, Ming
>>>>>
>>>>> Checking NBD_CMD_INFLIGHT relied on the request founded by tag is 
>>>>> valid,
>>>>> not the other way round.
>>>>>
>>>>> nbd_read_stat
>>>>>    req = blk_mq_tag_to_rq()
>>>>>    cmd = blk_mq_rq_to_pdu(req)
>>>>>    mutex_lock(cmd->lock)
>>>>>    checking NBD_CMD_INFLIGHT
>>>>
>>>> Request and its payload is pre-allocated, and either req->ref or 
>>>> cmd->lock can
>>>> serve the same purpose here. Once cmd->lock is held, you can check 
>>>> if the cmd is
>>>> inflight or not. If it isn't inflight, just return -ENOENT. Is there 
>>>> any
>>>> problem to handle in this way?
>>>
>>> Hi, Ming
>>>
>>> in nbd_read_stat:
>>>
>>> 1) get a request by tag first
>>> 2) get nbd_cmd by the request
>>> 3) hold cmd->lock and check if cmd is inflight
>>>
>>> If we want to check if the cmd is inflight in step 3), we have to do
>>> setp 1) and 2) first. As I explained in patch 0, blk_mq_tag_to_rq()
>>> can't make sure the returned request is not freed:
>>>
>>> nbd_read_stat
>>>             blk_mq_sched_free_requests
>>>              blk_mq_free_rqs
>>>    blk_mq_tag_to_rq
>>>    -> get rq before clear mapping
>>>               blk_mq_clear_rq_mapping
>>>               __free_pages -> rq is freed
>>>    blk_mq_request_started -> UAF
>>
>> If the above can happen, blk_mq_find_and_get_req() may not fix it too, 
>> just
> 
> Hi, Ming
> 
> Why can't blk_mq_find_and_get_req() fix it? I can't think of any
> scenario that might have problem currently.
> 
>> wondering why not take the following simpler way for avoiding the UAF?
>>
>> diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
>> index 5170a630778d..dfa5cce71f66 100644
>> --- a/drivers/block/nbd.c
>> +++ b/drivers/block/nbd.c
>> @@ -795,9 +795,13 @@ static void recv_work(struct work_struct *work)
>>                                work);
>>       struct nbd_device *nbd = args->nbd;
>>       struct nbd_config *config = nbd->config;
>> +    struct request_queue *q = nbd->disk->queue;
>>       struct nbd_cmd *cmd;
>>       struct request *rq;
>> +    if (!percpu_ref_tryget(&q->q_usage_counter))
>> +                return;
>> +
> 
> We can't make sure freeze_queue is called before this, thus this approch
> can't fix the problem, right?
>   nbd_read_stat
>      blk_mq_tag_to_rq
>              elevator_switch
>               blk_mq_freeze_queue(q);
>               elevator_switch_mq
>                elevator_exit
>                 blk_mq_sched_free_requests
>      blk_mq_request_started -> UAF

Hi, Ming

I forgot that if percpu_ref_tryget succeed here, blk_mq_free_queue()
will block untill blk_queue_exit() in nbd_read_stat().

Thanks,
Kuai
> 
> Thanks,
> Kuai
> 
>>       while (1) {
>>           cmd = nbd_read_stat(nbd, args->index);
>>           if (IS_ERR(cmd)) {
>> @@ -813,6 +817,7 @@ static void recv_work(struct work_struct *work)
>>           if (likely(!blk_should_fake_timeout(rq->q)))
>>               blk_mq_complete_request(rq);
>>       }
>> +    blk_queue_exit(q);
>>       nbd_config_put(nbd);
>>       atomic_dec(&config->recv_threads);
>>       wake_up(&config->recv_wq);
>>
>> Thanks,
>> Ming
>>
>> .
>>
Yu Kuai Sept. 14, 2021, 9:19 a.m. UTC | #8
On 在 2021/09/14 15:46, Ming Lei wrote:

> If the above can happen, blk_mq_find_and_get_req() may not fix it too, just
> wondering why not take the following simpler way for avoiding the UAF?
> 
> diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
> index 5170a630778d..dfa5cce71f66 100644
> --- a/drivers/block/nbd.c
> +++ b/drivers/block/nbd.c
> @@ -795,9 +795,13 @@ static void recv_work(struct work_struct *work)
>   						     work);
>   	struct nbd_device *nbd = args->nbd;
>   	struct nbd_config *config = nbd->config;
> +	struct request_queue *q = nbd->disk->queue;
>   	struct nbd_cmd *cmd;
>   	struct request *rq;
>   
> +	if (!percpu_ref_tryget(&q->q_usage_counter))
> +                return;
> +
>   	while (1) {
>   		cmd = nbd_read_stat(nbd, args->index);
>   		if (IS_ERR(cmd)) {
> @@ -813,6 +817,7 @@ static void recv_work(struct work_struct *work)
>   		if (likely(!blk_should_fake_timeout(rq->q)))
>   			blk_mq_complete_request(rq);
>   	}
> +	blk_queue_exit(q);
>   	nbd_config_put(nbd);
>   	atomic_dec(&config->recv_threads);
>   	wake_up(&config->recv_wq);
> 

Hi, Ming

This apporch is wrong.

If blk_mq_freeze_queue() is called, and nbd is waiting for all
request to complete. percpu_ref_tryget() will fail here, and deadlock
will occur because request can't complete in recv_work().

Thanks,
Kuai
Ming Lei Sept. 14, 2021, 2:33 p.m. UTC | #9
On Tue, Sep 14, 2021 at 05:08:00PM +0800, yukuai (C) wrote:
> On 2021/09/14 15:46, Ming Lei wrote:
> > On Tue, Sep 14, 2021 at 03:13:38PM +0800, yukuai (C) wrote:
> > > On 2021/09/14 14:44, Ming Lei wrote:
> > > > On Tue, Sep 14, 2021 at 11:11:06AM +0800, yukuai (C) wrote:
> > > > > On 2021/09/14 9:11, Ming Lei wrote:
> > > > > > On Thu, Sep 09, 2021 at 10:12:55PM +0800, Yu Kuai wrote:
> > > > > > > blk_mq_tag_to_rq() can only ensure to return valid request in
> > > > > > > following situation:
> > > > > > > 
> > > > > > > 1) client send request message to server first
> > > > > > > submit_bio
> > > > > > > ...
> > > > > > >     blk_mq_get_tag
> > > > > > >     ...
> > > > > > >     blk_mq_get_driver_tag
> > > > > > >     ...
> > > > > > >     nbd_queue_rq
> > > > > > >      nbd_handle_cmd
> > > > > > >       nbd_send_cmd
> > > > > > > 
> > > > > > > 2) client receive respond message from server
> > > > > > > recv_work
> > > > > > >     nbd_read_stat
> > > > > > >      blk_mq_tag_to_rq
> > > > > > > 
> > > > > > > If step 1) is missing, blk_mq_tag_to_rq() will return a stale
> > > > > > > request, which might be freed. Thus convert to use
> > > > > > > blk_mq_find_and_get_req() to make sure the returned request is not
> > > > > > > freed.
> > > > > > 
> > > > > > But NBD_CMD_INFLIGHT has been added for checking if the reply is
> > > > > > expected, do we still need blk_mq_find_and_get_req() for covering
> > > > > > this issue? BTW, request and its payload is pre-allocated, so there
> > > > > > isn't real use-after-free.
> > > > > 
> > > > > Hi, Ming
> > > > > 
> > > > > Checking NBD_CMD_INFLIGHT relied on the request founded by tag is valid,
> > > > > not the other way round.
> > > > > 
> > > > > nbd_read_stat
> > > > >    req = blk_mq_tag_to_rq()
> > > > >    cmd = blk_mq_rq_to_pdu(req)
> > > > >    mutex_lock(cmd->lock)
> > > > >    checking NBD_CMD_INFLIGHT
> > > > 
> > > > Request and its payload is pre-allocated, and either req->ref or cmd->lock can
> > > > serve the same purpose here. Once cmd->lock is held, you can check if the cmd is
> > > > inflight or not. If it isn't inflight, just return -ENOENT. Is there any
> > > > problem to handle in this way?
> > > 
> > > Hi, Ming
> > > 
> > > in nbd_read_stat:
> > > 
> > > 1) get a request by tag first
> > > 2) get nbd_cmd by the request
> > > 3) hold cmd->lock and check if cmd is inflight
> > > 
> > > If we want to check if the cmd is inflight in step 3), we have to do
> > > setp 1) and 2) first. As I explained in patch 0, blk_mq_tag_to_rq()
> > > can't make sure the returned request is not freed:
> > > 
> > > nbd_read_stat
> > > 			blk_mq_sched_free_requests
> > > 			 blk_mq_free_rqs
> > >    blk_mq_tag_to_rq
> > >    -> get rq before clear mapping
> > > 			  blk_mq_clear_rq_mapping
> > > 			  __free_pages -> rq is freed
> > >    blk_mq_request_started -> UAF
> > 
> > If the above can happen, blk_mq_find_and_get_req() may not fix it too, just
> 
> Hi, Ming
> 
> Why can't blk_mq_find_and_get_req() fix it? I can't think of any
> scenario that might have problem currently.

The principle behind blk_mq_find_and_get_req() is that if one request's
ref is grabbed, the queue's usage counter is guaranteed to be grabbed,
and this way isn't straight-forward.

Yeah, it can fix the issue, but I don't think it is good to call it in
fast path cause tags->lock is required.

> 
> > wondering why not take the following simpler way for avoiding the UAF?
> > 
> > diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
> > index 5170a630778d..dfa5cce71f66 100644
> > --- a/drivers/block/nbd.c
> > +++ b/drivers/block/nbd.c
> > @@ -795,9 +795,13 @@ static void recv_work(struct work_struct *work)
> >   						     work);
> >   	struct nbd_device *nbd = args->nbd;
> >   	struct nbd_config *config = nbd->config;
> > +	struct request_queue *q = nbd->disk->queue;
> >   	struct nbd_cmd *cmd;
> >   	struct request *rq;
> > +	if (!percpu_ref_tryget(&q->q_usage_counter))
> > +                return;
> > +
> 
> We can't make sure freeze_queue is called before this, thus this approch
> can't fix the problem, right?
>  nbd_read_stat
>     blk_mq_tag_to_rq
> 			elevator_switch
> 			 blk_mq_freeze_queue(q);
> 			 elevator_switch_mq
> 			  elevator_exit
> 			   blk_mq_sched_free_requests
>     blk_mq_request_started -> UAF

No, blk_mq_freeze_queue() waits until .q_usage_counter becomes zero, so
there won't be any concurrent nbd_read_stat() during switching elevator
if ->q_usage_counter is grabbed in recv_work().

Thanks,
Ming
Ming Lei Sept. 14, 2021, 2:37 p.m. UTC | #10
On Tue, Sep 14, 2021 at 05:19:31PM +0800, yukuai (C) wrote:
> On 在 2021/09/14 15:46, Ming Lei wrote:
> 
> > If the above can happen, blk_mq_find_and_get_req() may not fix it too, just
> > wondering why not take the following simpler way for avoiding the UAF?
> > 
> > diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
> > index 5170a630778d..dfa5cce71f66 100644
> > --- a/drivers/block/nbd.c
> > +++ b/drivers/block/nbd.c
> > @@ -795,9 +795,13 @@ static void recv_work(struct work_struct *work)
> >   						     work);
> >   	struct nbd_device *nbd = args->nbd;
> >   	struct nbd_config *config = nbd->config;
> > +	struct request_queue *q = nbd->disk->queue;
> >   	struct nbd_cmd *cmd;
> >   	struct request *rq;
> > +	if (!percpu_ref_tryget(&q->q_usage_counter))
> > +                return;
> > +
> >   	while (1) {
> >   		cmd = nbd_read_stat(nbd, args->index);
> >   		if (IS_ERR(cmd)) {
> > @@ -813,6 +817,7 @@ static void recv_work(struct work_struct *work)
> >   		if (likely(!blk_should_fake_timeout(rq->q)))
> >   			blk_mq_complete_request(rq);
> >   	}
> > +	blk_queue_exit(q);
> >   	nbd_config_put(nbd);
> >   	atomic_dec(&config->recv_threads);
> >   	wake_up(&config->recv_wq);
> > 
> 
> Hi, Ming
> 
> This apporch is wrong.
> 
> If blk_mq_freeze_queue() is called, and nbd is waiting for all
> request to complete. percpu_ref_tryget() will fail here, and deadlock
> will occur because request can't complete in recv_work().

No, percpu_ref_tryget() won't fail until ->q_usage_counter is zero, when
it is perfectly fine to do nothing in recv_work().

Thanks,
Ming
Yu Kuai Sept. 15, 2021, 1:54 a.m. UTC | #11
On 2021/09/14 22:37, Ming Lei wrote:
> On Tue, Sep 14, 2021 at 05:19:31PM +0800, yukuai (C) wrote:
>> On 在 2021/09/14 15:46, Ming Lei wrote:
>>
>>> If the above can happen, blk_mq_find_and_get_req() may not fix it too, just
>>> wondering why not take the following simpler way for avoiding the UAF?
>>>
>>> diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
>>> index 5170a630778d..dfa5cce71f66 100644
>>> --- a/drivers/block/nbd.c
>>> +++ b/drivers/block/nbd.c
>>> @@ -795,9 +795,13 @@ static void recv_work(struct work_struct *work)
>>>    						     work);
>>>    	struct nbd_device *nbd = args->nbd;
>>>    	struct nbd_config *config = nbd->config;
>>> +	struct request_queue *q = nbd->disk->queue;
>>>    	struct nbd_cmd *cmd;
>>>    	struct request *rq;
>>> +	if (!percpu_ref_tryget(&q->q_usage_counter))
>>> +                return;
>>> +
>>>    	while (1) {
>>>    		cmd = nbd_read_stat(nbd, args->index);
>>>    		if (IS_ERR(cmd)) {
>>> @@ -813,6 +817,7 @@ static void recv_work(struct work_struct *work)
>>>    		if (likely(!blk_should_fake_timeout(rq->q)))
>>>    			blk_mq_complete_request(rq);
>>>    	}
>>> +	blk_queue_exit(q);
>>>    	nbd_config_put(nbd);
>>>    	atomic_dec(&config->recv_threads);
>>>    	wake_up(&config->recv_wq);
>>>
>>
>> Hi, Ming
>>
>> This apporch is wrong.
>>
>> If blk_mq_freeze_queue() is called, and nbd is waiting for all
>> request to complete. percpu_ref_tryget() will fail here, and deadlock
>> will occur because request can't complete in recv_work().
> 
> No, percpu_ref_tryget() won't fail until ->q_usage_counter is zero, when
> it is perfectly fine to do nothing in recv_work().
> 

Hi Ming

This apporch is a good idea, however we should not get q_usage_counter
in reccv_work(), because It will block freeze queue.

How about get q_usage_counter in nbd_read_stat(), and put in error path
or after request completion?

Thanks
Kuai
Ming Lei Sept. 15, 2021, 3:16 a.m. UTC | #12
On Wed, Sep 15, 2021 at 09:54:09AM +0800, yukuai (C) wrote:
> On 2021/09/14 22:37, Ming Lei wrote:
> > On Tue, Sep 14, 2021 at 05:19:31PM +0800, yukuai (C) wrote:
> > > On 在 2021/09/14 15:46, Ming Lei wrote:
> > > 
> > > > If the above can happen, blk_mq_find_and_get_req() may not fix it too, just
> > > > wondering why not take the following simpler way for avoiding the UAF?
> > > > 
> > > > diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
> > > > index 5170a630778d..dfa5cce71f66 100644
> > > > --- a/drivers/block/nbd.c
> > > > +++ b/drivers/block/nbd.c
> > > > @@ -795,9 +795,13 @@ static void recv_work(struct work_struct *work)
> > > >    						     work);
> > > >    	struct nbd_device *nbd = args->nbd;
> > > >    	struct nbd_config *config = nbd->config;
> > > > +	struct request_queue *q = nbd->disk->queue;
> > > >    	struct nbd_cmd *cmd;
> > > >    	struct request *rq;
> > > > +	if (!percpu_ref_tryget(&q->q_usage_counter))
> > > > +                return;
> > > > +
> > > >    	while (1) {
> > > >    		cmd = nbd_read_stat(nbd, args->index);
> > > >    		if (IS_ERR(cmd)) {
> > > > @@ -813,6 +817,7 @@ static void recv_work(struct work_struct *work)
> > > >    		if (likely(!blk_should_fake_timeout(rq->q)))
> > > >    			blk_mq_complete_request(rq);
> > > >    	}
> > > > +	blk_queue_exit(q);
> > > >    	nbd_config_put(nbd);
> > > >    	atomic_dec(&config->recv_threads);
> > > >    	wake_up(&config->recv_wq);
> > > > 
> > > 
> > > Hi, Ming
> > > 
> > > This apporch is wrong.
> > > 
> > > If blk_mq_freeze_queue() is called, and nbd is waiting for all
> > > request to complete. percpu_ref_tryget() will fail here, and deadlock
> > > will occur because request can't complete in recv_work().
> > 
> > No, percpu_ref_tryget() won't fail until ->q_usage_counter is zero, when
> > it is perfectly fine to do nothing in recv_work().
> > 
> 
> Hi Ming
> 
> This apporch is a good idea, however we should not get q_usage_counter
> in reccv_work(), because It will block freeze queue.
> 
> How about get q_usage_counter in nbd_read_stat(), and put in error path
> or after request completion?

OK, looks I missed that nbd_read_stat() needs to wait for incoming reply
first, so how about the following change by partitioning nbd_read_stat()
into nbd_read_reply() and nbd_handle_reply()?

diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index 5170a630778d..477fe057fc93 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -683,38 +683,47 @@ static int nbd_send_cmd(struct nbd_device *nbd, struct nbd_cmd *cmd, int index)
 	return 0;
 }
 
-/* NULL returned = something went wrong, inform userspace */
-static struct nbd_cmd *nbd_read_stat(struct nbd_device *nbd, int index)
+static int nbd_read_reply(struct nbd_device *nbd, int index,
+		struct nbd_reply *reply)
 {
-	struct nbd_config *config = nbd->config;
 	int result;
-	struct nbd_reply reply;
-	struct nbd_cmd *cmd;
-	struct request *req = NULL;
-	u64 handle;
-	u16 hwq;
-	u32 tag;
-	struct kvec iov = {.iov_base = &reply, .iov_len = sizeof(reply)};
+	struct kvec iov = {.iov_base = reply, .iov_len = sizeof(*reply)};
 	struct iov_iter to;
-	int ret = 0;
 
-	reply.magic = 0;
+	reply->magic = 0;
 	iov_iter_kvec(&to, READ, &iov, 1, sizeof(reply));
 	result = sock_xmit(nbd, index, 0, &to, MSG_WAITALL, NULL);
-	if (result <= 0) {
-		if (!nbd_disconnected(config))
+	if (result < 0) {
+		if (!nbd_disconnected(nbd->config))
 			dev_err(disk_to_dev(nbd->disk),
 				"Receive control failed (result %d)\n", result);
-		return ERR_PTR(result);
+		return result;
 	}
 
-	if (ntohl(reply.magic) != NBD_REPLY_MAGIC) {
+	if (ntohl(reply->magic) != NBD_REPLY_MAGIC) {
 		dev_err(disk_to_dev(nbd->disk), "Wrong magic (0x%lx)\n",
-				(unsigned long)ntohl(reply.magic));
-		return ERR_PTR(-EPROTO);
+				(unsigned long)ntohl(reply->magic));
+		return -EPROTO;
 	}
 
-	memcpy(&handle, reply.handle, sizeof(handle));
+	return 0;
+}
+
+/* NULL returned = something went wrong, inform userspace */
+static struct nbd_cmd *nbd_handle_reply(struct nbd_device *nbd, int index,
+		struct nbd_reply *reply)
+{
+	struct nbd_config *config = nbd->config;
+	int result;
+	struct nbd_cmd *cmd;
+	struct request *req = NULL;
+	u64 handle;
+	u16 hwq;
+	u32 tag;
+	struct iov_iter to;
+	int ret = 0;
+
+	memcpy(&handle, reply->handle, sizeof(handle));
 	tag = nbd_handle_to_tag(handle);
 	hwq = blk_mq_unique_tag_to_hwq(tag);
 	if (hwq < nbd->tag_set.nr_hw_queues)
@@ -747,9 +756,9 @@ static struct nbd_cmd *nbd_read_stat(struct nbd_device *nbd, int index)
 		ret = -ENOENT;
 		goto out;
 	}
-	if (ntohl(reply.error)) {
+	if (ntohl(reply->error)) {
 		dev_err(disk_to_dev(nbd->disk), "Other side returned error (%d)\n",
-			ntohl(reply.error));
+			ntohl(reply->error));
 		cmd->status = BLK_STS_IOERR;
 		goto out;
 	}
@@ -795,24 +804,36 @@ static void recv_work(struct work_struct *work)
 						     work);
 	struct nbd_device *nbd = args->nbd;
 	struct nbd_config *config = nbd->config;
+	struct request_queue *q = nbd->disk->queue;
+	struct nbd_sock *nsock;
 	struct nbd_cmd *cmd;
 	struct request *rq;
 
 	while (1) {
-		cmd = nbd_read_stat(nbd, args->index);
-		if (IS_ERR(cmd)) {
-			struct nbd_sock *nsock = config->socks[args->index];
+		struct nbd_reply reply;
 
-			mutex_lock(&nsock->tx_lock);
-			nbd_mark_nsock_dead(nbd, nsock, 1);
-			mutex_unlock(&nsock->tx_lock);
+		if (nbd_read_reply(nbd, args->index, &reply))
 			break;
-		}
 
+		if (!percpu_ref_tryget(&q->q_usage_counter))
+			break;
+
+		cmd = nbd_handle_reply(nbd, args->index, &reply);
+		if (IS_ERR(cmd)) {
+			blk_queue_exit(q);
+			break;
+		}
 		rq = blk_mq_rq_from_pdu(cmd);
 		if (likely(!blk_should_fake_timeout(rq->q)))
 			blk_mq_complete_request(rq);
+		blk_queue_exit(q);
 	}
+
+	nsock = config->socks[args->index];
+	mutex_lock(&nsock->tx_lock);
+	nbd_mark_nsock_dead(nbd, nsock, 1);
+	mutex_unlock(&nsock->tx_lock);
+
 	nbd_config_put(nbd);
 	atomic_dec(&config->recv_threads);
 	wake_up(&config->recv_wq);


Thanks,
Ming
Yu Kuai Sept. 15, 2021, 3:36 a.m. UTC | #13
On 2021/09/15 11:16, Ming Lei wrote:
> On Wed, Sep 15, 2021 at 09:54:09AM +0800, yukuai (C) wrote:
>> On 2021/09/14 22:37, Ming Lei wrote:
>>> On Tue, Sep 14, 2021 at 05:19:31PM +0800, yukuai (C) wrote:
>>>> On 在 2021/09/14 15:46, Ming Lei wrote:
>>>>
>>>>> If the above can happen, blk_mq_find_and_get_req() may not fix it too, just
>>>>> wondering why not take the following simpler way for avoiding the UAF?
>>>>>
>>>>> diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
>>>>> index 5170a630778d..dfa5cce71f66 100644
>>>>> --- a/drivers/block/nbd.c
>>>>> +++ b/drivers/block/nbd.c
>>>>> @@ -795,9 +795,13 @@ static void recv_work(struct work_struct *work)
>>>>>     						     work);
>>>>>     	struct nbd_device *nbd = args->nbd;
>>>>>     	struct nbd_config *config = nbd->config;
>>>>> +	struct request_queue *q = nbd->disk->queue;
>>>>>     	struct nbd_cmd *cmd;
>>>>>     	struct request *rq;
>>>>> +	if (!percpu_ref_tryget(&q->q_usage_counter))
>>>>> +                return;
>>>>> +
>>>>>     	while (1) {
>>>>>     		cmd = nbd_read_stat(nbd, args->index);
>>>>>     		if (IS_ERR(cmd)) {
>>>>> @@ -813,6 +817,7 @@ static void recv_work(struct work_struct *work)
>>>>>     		if (likely(!blk_should_fake_timeout(rq->q)))
>>>>>     			blk_mq_complete_request(rq);
>>>>>     	}
>>>>> +	blk_queue_exit(q);
>>>>>     	nbd_config_put(nbd);
>>>>>     	atomic_dec(&config->recv_threads);
>>>>>     	wake_up(&config->recv_wq);
>>>>>
>>>>
>>>> Hi, Ming
>>>>
>>>> This apporch is wrong.
>>>>
>>>> If blk_mq_freeze_queue() is called, and nbd is waiting for all
>>>> request to complete. percpu_ref_tryget() will fail here, and deadlock
>>>> will occur because request can't complete in recv_work().
>>>
>>> No, percpu_ref_tryget() won't fail until ->q_usage_counter is zero, when
>>> it is perfectly fine to do nothing in recv_work().
>>>
>>
>> Hi Ming
>>
>> This apporch is a good idea, however we should not get q_usage_counter
>> in reccv_work(), because It will block freeze queue.
>>
>> How about get q_usage_counter in nbd_read_stat(), and put in error path
>> or after request completion?
> 
> OK, looks I missed that nbd_read_stat() needs to wait for incoming reply
> first, so how about the following change by partitioning nbd_read_stat()
> into nbd_read_reply() and nbd_handle_reply()?

Hi, Ming

The change looks good to me.

Do you want to send a patch to fix this?

Thanks,
Kuai
> 
> diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
> index 5170a630778d..477fe057fc93 100644
> --- a/drivers/block/nbd.c
> +++ b/drivers/block/nbd.c
> @@ -683,38 +683,47 @@ static int nbd_send_cmd(struct nbd_device *nbd, struct nbd_cmd *cmd, int index)
>   	return 0;
>   }
>   
> -/* NULL returned = something went wrong, inform userspace */
> -static struct nbd_cmd *nbd_read_stat(struct nbd_device *nbd, int index)
> +static int nbd_read_reply(struct nbd_device *nbd, int index,
> +		struct nbd_reply *reply)
>   {
> -	struct nbd_config *config = nbd->config;
>   	int result;
> -	struct nbd_reply reply;
> -	struct nbd_cmd *cmd;
> -	struct request *req = NULL;
> -	u64 handle;
> -	u16 hwq;
> -	u32 tag;
> -	struct kvec iov = {.iov_base = &reply, .iov_len = sizeof(reply)};
> +	struct kvec iov = {.iov_base = reply, .iov_len = sizeof(*reply)};
>   	struct iov_iter to;
> -	int ret = 0;
>   
> -	reply.magic = 0;
> +	reply->magic = 0;
>   	iov_iter_kvec(&to, READ, &iov, 1, sizeof(reply));
>   	result = sock_xmit(nbd, index, 0, &to, MSG_WAITALL, NULL);
> -	if (result <= 0) {
> -		if (!nbd_disconnected(config))
> +	if (result < 0) {
> +		if (!nbd_disconnected(nbd->config))
>   			dev_err(disk_to_dev(nbd->disk),
>   				"Receive control failed (result %d)\n", result);
> -		return ERR_PTR(result);
> +		return result;
>   	}
>   
> -	if (ntohl(reply.magic) != NBD_REPLY_MAGIC) {
> +	if (ntohl(reply->magic) != NBD_REPLY_MAGIC) {
>   		dev_err(disk_to_dev(nbd->disk), "Wrong magic (0x%lx)\n",
> -				(unsigned long)ntohl(reply.magic));
> -		return ERR_PTR(-EPROTO);
> +				(unsigned long)ntohl(reply->magic));
> +		return -EPROTO;
>   	}
>   
> -	memcpy(&handle, reply.handle, sizeof(handle));
> +	return 0;
> +}
> +
> +/* NULL returned = something went wrong, inform userspace */
> +static struct nbd_cmd *nbd_handle_reply(struct nbd_device *nbd, int index,
> +		struct nbd_reply *reply)
> +{
> +	struct nbd_config *config = nbd->config;
> +	int result;
> +	struct nbd_cmd *cmd;
> +	struct request *req = NULL;
> +	u64 handle;
> +	u16 hwq;
> +	u32 tag;
> +	struct iov_iter to;
> +	int ret = 0;
> +
> +	memcpy(&handle, reply->handle, sizeof(handle));
>   	tag = nbd_handle_to_tag(handle);
>   	hwq = blk_mq_unique_tag_to_hwq(tag);
>   	if (hwq < nbd->tag_set.nr_hw_queues)
> @@ -747,9 +756,9 @@ static struct nbd_cmd *nbd_read_stat(struct nbd_device *nbd, int index)
>   		ret = -ENOENT;
>   		goto out;
>   	}
> -	if (ntohl(reply.error)) {
> +	if (ntohl(reply->error)) {
>   		dev_err(disk_to_dev(nbd->disk), "Other side returned error (%d)\n",
> -			ntohl(reply.error));
> +			ntohl(reply->error));
>   		cmd->status = BLK_STS_IOERR;
>   		goto out;
>   	}
> @@ -795,24 +804,36 @@ static void recv_work(struct work_struct *work)
>   						     work);
>   	struct nbd_device *nbd = args->nbd;
>   	struct nbd_config *config = nbd->config;
> +	struct request_queue *q = nbd->disk->queue;
> +	struct nbd_sock *nsock;
>   	struct nbd_cmd *cmd;
>   	struct request *rq;
>   
>   	while (1) {
> -		cmd = nbd_read_stat(nbd, args->index);
> -		if (IS_ERR(cmd)) {
> -			struct nbd_sock *nsock = config->socks[args->index];
> +		struct nbd_reply reply;
>   
> -			mutex_lock(&nsock->tx_lock);
> -			nbd_mark_nsock_dead(nbd, nsock, 1);
> -			mutex_unlock(&nsock->tx_lock);
> +		if (nbd_read_reply(nbd, args->index, &reply))
>   			break;
> -		}
>   
> +		if (!percpu_ref_tryget(&q->q_usage_counter))
> +			break;
> +
> +		cmd = nbd_handle_reply(nbd, args->index, &reply);
> +		if (IS_ERR(cmd)) {
> +			blk_queue_exit(q);
> +			break;
> +		}
>   		rq = blk_mq_rq_from_pdu(cmd);
>   		if (likely(!blk_should_fake_timeout(rq->q)))
>   			blk_mq_complete_request(rq);
> +		blk_queue_exit(q);
>   	}
> +
> +	nsock = config->socks[args->index];
> +	mutex_lock(&nsock->tx_lock);
> +	nbd_mark_nsock_dead(nbd, nsock, 1);
> +	mutex_unlock(&nsock->tx_lock);
> +
>   	nbd_config_put(nbd);
>   	atomic_dec(&config->recv_threads);
>   	wake_up(&config->recv_wq);
> 
> 
> Thanks,
> Ming
> 
> .
>
Ming Lei Sept. 15, 2021, 3:46 a.m. UTC | #14
On Wed, Sep 15, 2021 at 11:36:47AM +0800, yukuai (C) wrote:
> On 2021/09/15 11:16, Ming Lei wrote:
> > On Wed, Sep 15, 2021 at 09:54:09AM +0800, yukuai (C) wrote:
> > > On 2021/09/14 22:37, Ming Lei wrote:
> > > > On Tue, Sep 14, 2021 at 05:19:31PM +0800, yukuai (C) wrote:
> > > > > On 在 2021/09/14 15:46, Ming Lei wrote:
> > > > > 
> > > > > > If the above can happen, blk_mq_find_and_get_req() may not fix it too, just
> > > > > > wondering why not take the following simpler way for avoiding the UAF?
> > > > > > 
> > > > > > diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
> > > > > > index 5170a630778d..dfa5cce71f66 100644
> > > > > > --- a/drivers/block/nbd.c
> > > > > > +++ b/drivers/block/nbd.c
> > > > > > @@ -795,9 +795,13 @@ static void recv_work(struct work_struct *work)
> > > > > >     						     work);
> > > > > >     	struct nbd_device *nbd = args->nbd;
> > > > > >     	struct nbd_config *config = nbd->config;
> > > > > > +	struct request_queue *q = nbd->disk->queue;
> > > > > >     	struct nbd_cmd *cmd;
> > > > > >     	struct request *rq;
> > > > > > +	if (!percpu_ref_tryget(&q->q_usage_counter))
> > > > > > +                return;
> > > > > > +
> > > > > >     	while (1) {
> > > > > >     		cmd = nbd_read_stat(nbd, args->index);
> > > > > >     		if (IS_ERR(cmd)) {
> > > > > > @@ -813,6 +817,7 @@ static void recv_work(struct work_struct *work)
> > > > > >     		if (likely(!blk_should_fake_timeout(rq->q)))
> > > > > >     			blk_mq_complete_request(rq);
> > > > > >     	}
> > > > > > +	blk_queue_exit(q);
> > > > > >     	nbd_config_put(nbd);
> > > > > >     	atomic_dec(&config->recv_threads);
> > > > > >     	wake_up(&config->recv_wq);
> > > > > > 
> > > > > 
> > > > > Hi, Ming
> > > > > 
> > > > > This apporch is wrong.
> > > > > 
> > > > > If blk_mq_freeze_queue() is called, and nbd is waiting for all
> > > > > request to complete. percpu_ref_tryget() will fail here, and deadlock
> > > > > will occur because request can't complete in recv_work().
> > > > 
> > > > No, percpu_ref_tryget() won't fail until ->q_usage_counter is zero, when
> > > > it is perfectly fine to do nothing in recv_work().
> > > > 
> > > 
> > > Hi Ming
> > > 
> > > This apporch is a good idea, however we should not get q_usage_counter
> > > in reccv_work(), because It will block freeze queue.
> > > 
> > > How about get q_usage_counter in nbd_read_stat(), and put in error path
> > > or after request completion?
> > 
> > OK, looks I missed that nbd_read_stat() needs to wait for incoming reply
> > first, so how about the following change by partitioning nbd_read_stat()
> > into nbd_read_reply() and nbd_handle_reply()?
> 
> Hi, Ming
> 
> The change looks good to me.
> 
> Do you want to send a patch to fix this?

I guess you may add inflight check or sort of change in nbd_read_stat(), so feel
free to fold it into your series.


Thanks,
Ming
diff mbox series

Patch

diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index 6d8cbf8be231..d298e2b9e6ee 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -729,12 +729,13 @@  static struct nbd_cmd *nbd_read_stat(struct nbd_device *nbd, int index)
 	tag = nbd_handle_to_tag(handle);
 	hwq = blk_mq_unique_tag_to_hwq(tag);
 	if (hwq < nbd->tag_set.nr_hw_queues)
-		req = blk_mq_tag_to_rq(nbd->tag_set.tags[hwq],
-				       blk_mq_unique_tag_to_tag(tag));
+		req = blk_mq_find_and_get_req(nbd->tag_set.tags[hwq],
+					      blk_mq_unique_tag_to_tag(tag));
 	if (!req || !blk_mq_request_started(req)) {
 		dev_err(disk_to_dev(nbd->disk), "Unexpected reply (%d) %p\n",
 			tag, req);
-		return ERR_PTR(-ENOENT);
+		ret = -ENOENT;
+		goto put_req;
 	}
 	trace_nbd_header_received(req, handle);
 	cmd = blk_mq_rq_to_pdu(req);
@@ -806,6 +807,14 @@  static struct nbd_cmd *nbd_read_stat(struct nbd_device *nbd, int index)
 out:
 	trace_nbd_payload_received(req, handle);
 	mutex_unlock(&cmd->lock);
+put_req:
+	/*
+	 * It's safe to drop refcnt here because request completion won't
+	 * concurent, thus if nbd_read_stat() successd, the request refcnt
+	 * won't drop to zero here.
+	 */
+	if (req)
+		blk_mq_put_rq_ref(req);
 	return ret ? ERR_PTR(ret) : cmd;
 }