diff mbox series

[v4,2/6] nbd: convert to use blk_mq_find_and_get_req()

Message ID 20210907140154.2134091-3-yukuai3@huawei.com (mailing list archive)
State New, archived
Headers show
Series handle unexpected message from server | expand

Commit Message

Yu Kuai Sept. 7, 2021, 2:01 p.m. UTC
blk_mq_tag_to_rq() can only ensure to return valid request in
following situation:

1) client send request message to server first
submit_bio
...
 blk_mq_get_tag
 ...
 blk_mq_get_driver_tag
 ...
 nbd_queue_rq
  nbd_handle_cmd
   nbd_send_cmd

2) client receive respond message from server
recv_work
 nbd_read_stat
  blk_mq_tag_to_rq

If step 1) is missing, blk_mq_tag_to_rq() will return a stale
request, which might be freed. Thus convert to use
blk_mq_find_and_get_req() to make sure the returned request is not
freed. However, there are still some problems if the request is
started, and this will be fixed in next patch.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/block/nbd.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

Comments

Ming Lei Sept. 8, 2021, 7:30 a.m. UTC | #1
On Tue, Sep 07, 2021 at 10:01:50PM +0800, Yu Kuai wrote:
> blk_mq_tag_to_rq() can only ensure to return valid request in
> following situation:
> 
> 1) client send request message to server first
> submit_bio
> ...
>  blk_mq_get_tag
>  ...
>  blk_mq_get_driver_tag
>  ...
>  nbd_queue_rq
>   nbd_handle_cmd
>    nbd_send_cmd
> 
> 2) client receive respond message from server
> recv_work
>  nbd_read_stat
>   blk_mq_tag_to_rq
> 
> If step 1) is missing, blk_mq_tag_to_rq() will return a stale
> request, which might be freed. Thus convert to use
> blk_mq_find_and_get_req() to make sure the returned request is not
> freed. However, there are still some problems if the request is
> started, and this will be fixed in next patch.
> 
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
> ---
>  drivers/block/nbd.c | 10 +++++++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
> index 5170a630778d..920da390635c 100644
> --- a/drivers/block/nbd.c
> +++ b/drivers/block/nbd.c
> @@ -718,12 +718,13 @@ static struct nbd_cmd *nbd_read_stat(struct nbd_device *nbd, int index)
>  	tag = nbd_handle_to_tag(handle);
>  	hwq = blk_mq_unique_tag_to_hwq(tag);
>  	if (hwq < nbd->tag_set.nr_hw_queues)
> -		req = blk_mq_tag_to_rq(nbd->tag_set.tags[hwq],
> -				       blk_mq_unique_tag_to_tag(tag));
> +		req = blk_mq_find_and_get_req(nbd->tag_set.tags[hwq],
> +					      blk_mq_unique_tag_to_tag(tag));
>  	if (!req || !blk_mq_request_started(req)) {
>  		dev_err(disk_to_dev(nbd->disk), "Unexpected reply (%d) %p\n",
>  			tag, req);
> -		return ERR_PTR(-ENOENT);
> +		ret = -ENOENT;
> +		goto put_req;
>  	}
>  	trace_nbd_header_received(req, handle);
>  	cmd = blk_mq_rq_to_pdu(req);
> @@ -785,6 +786,9 @@ static struct nbd_cmd *nbd_read_stat(struct nbd_device *nbd, int index)
>  out:
>  	trace_nbd_payload_received(req, handle);
>  	mutex_unlock(&cmd->lock);
> +put_req:
> +	if (req)
> +		blk_mq_put_rq_ref(req);
>  	return ret ? ERR_PTR(ret) : cmd;

After the request's refcnt is dropped, it can be freed immediately, then
one stale command is returned to caller.

Thanks,
Ming
Yu Kuai Sept. 8, 2021, 7:37 a.m. UTC | #2
On 2021/09/08 15:30, Ming Lei wrote:

>> +put_req:
>> +	if (req)
>> +		blk_mq_put_rq_ref(req);
>>   	return ret ? ERR_PTR(ret) : cmd;
> 
> After the request's refcnt is dropped, it can be freed immediately, then
> one stale command is returned to caller.

Hi, Ming

It's right this patch leave this problem. Please take a look at patch 3
and patch 4, the problem should be fixed with these patches.

Thanks,
Kuai
> 
> Thanks,
> Ming
> 
> .
>
Ming Lei Sept. 8, 2021, 8 a.m. UTC | #3
On Wed, Sep 08, 2021 at 03:37:06PM +0800, yukuai (C) wrote:
> On 2021/09/08 15:30, Ming Lei wrote:
> 
> > > +put_req:
> > > +	if (req)
> > > +		blk_mq_put_rq_ref(req);
> > >   	return ret ? ERR_PTR(ret) : cmd;
> > 
> > After the request's refcnt is dropped, it can be freed immediately, then
> > one stale command is returned to caller.
> 
> Hi, Ming
> 
> It's right this patch leave this problem. Please take a look at patch 3
> and patch 4, the problem should be fixed with these patches.

Not see it is addressed in patch 3 & 4, and it is one obvious fault in
patch 2, please fix it from beginning by moving the refcnt drop
into recv_work().

BTW, the approach in patch 3 looks fine, which is very similar with
SCSI's handling.

Thanks,
Ming
Yu Kuai Sept. 8, 2021, 8:29 a.m. UTC | #4
On 2021/09/08 16:00, Ming Lei wrote:
> On Wed, Sep 08, 2021 at 03:37:06PM +0800, yukuai (C) wrote:
>> On 2021/09/08 15:30, Ming Lei wrote:
>>
>>>> +put_req:
>>>> +	if (req)
>>>> +		blk_mq_put_rq_ref(req);
>>>>    	return ret ? ERR_PTR(ret) : cmd;
>>>
>>> After the request's refcnt is dropped, it can be freed immediately, then
>>> one stale command is returned to caller.
>>
>> Hi, Ming
>>
>> It's right this patch leave this problem. Please take a look at patch 3
>> and patch 4, the problem should be fixed with these patches.
> 
> Not see it is addressed in patch 3 & 4, and it is one obvious fault in
> patch 2, please fix it from beginning by moving the refcnt drop
> into recv_work().

Hi, Ming

With patch 3 & 4:

if nbd_read_stat() return a valid cmd, then the refcnt should not drop
to 0 before blk_mq_complete_request() in recv_work().

if nbd_read_stat() failed, it won't be a problem if the request is freed
immediately when refcnt is dropped in nbd_read_stat().

That's why I said that the problem will be fixed.

BTW, if we move the refcnt drop into recv_work, we can only do that if
nbd_read_stat() return a valid cmd. If we get a valid rq and failed in
the following checkings in nbd_read_stat(), it's better to drop the
refcnt in nbd_read_stat().

> 
> BTW, the approach in patch 3 looks fine, which is very similar with
> SCSI's handling.

Thanks for taking time reviewing these patches.
Kuai
> 
> Thanks,
> Ming
> 
> .
>
Ming Lei Sept. 8, 2021, 9:27 a.m. UTC | #5
On Wed, Sep 08, 2021 at 04:29:57PM +0800, yukuai (C) wrote:
> On 2021/09/08 16:00, Ming Lei wrote:
> > On Wed, Sep 08, 2021 at 03:37:06PM +0800, yukuai (C) wrote:
> > > On 2021/09/08 15:30, Ming Lei wrote:
> > > 
> > > > > +put_req:
> > > > > +	if (req)
> > > > > +		blk_mq_put_rq_ref(req);
> > > > >    	return ret ? ERR_PTR(ret) : cmd;
> > > > 
> > > > After the request's refcnt is dropped, it can be freed immediately, then
> > > > one stale command is returned to caller.
> > > 
> > > Hi, Ming
> > > 
> > > It's right this patch leave this problem. Please take a look at patch 3
> > > and patch 4, the problem should be fixed with these patches.
> > 
> > Not see it is addressed in patch 3 & 4, and it is one obvious fault in
> > patch 2, please fix it from beginning by moving the refcnt drop
> > into recv_work().
> 
> Hi, Ming
> 
> With patch 3 & 4:
> 
> if nbd_read_stat() return a valid cmd, then the refcnt should not drop
> to 0 before blk_mq_complete_request() in recv_work().

The valid cmd won't be timed out just between nbd_read_stat() and
calling blk_mq_complete_request()?

Yeah, the issue is addressed by patch 4, then please put 2 after
3 & 4, and suggest to add comment why request ref won't drop to zero
in recv_work().

> 
> if nbd_read_stat() failed, it won't be a problem if the request is freed
> immediately when refcnt is dropped in nbd_read_stat().
> 
> That's why I said that the problem will be fixed.
> 
> BTW, if we move the refcnt drop into recv_work, we can only do that if
> nbd_read_stat() return a valid cmd. If we get a valid rq and failed in
> the following checkings in nbd_read_stat(), it's better to drop the
> refcnt in nbd_read_stat().

The usual pattern is to drop the refcnt on when the object isn't used
any more, so it is perfectly fine to hold the ref until that time.


Thanks,
Ming
Yu Kuai Sept. 8, 2021, 11:03 a.m. UTC | #6
On 2021/09/08 17:27, Ming Lei wrote:
> On Wed, Sep 08, 2021 at 04:29:57PM +0800, yukuai (C) wrote:
>> On 2021/09/08 16:00, Ming Lei wrote:
>>> On Wed, Sep 08, 2021 at 03:37:06PM +0800, yukuai (C) wrote:
>>>> On 2021/09/08 15:30, Ming Lei wrote:
>>>>
>>>>>> +put_req:
>>>>>> +	if (req)
>>>>>> +		blk_mq_put_rq_ref(req);
>>>>>>     	return ret ? ERR_PTR(ret) : cmd;
>>>>>
>>>>> After the request's refcnt is dropped, it can be freed immediately, then
>>>>> one stale command is returned to caller.
>>>>
>>>> Hi, Ming
>>>>
>>>> It's right this patch leave this problem. Please take a look at patch 3
>>>> and patch 4, the problem should be fixed with these patches.
>>>
>>> Not see it is addressed in patch 3 & 4, and it is one obvious fault in
>>> patch 2, please fix it from beginning by moving the refcnt drop
>>> into recv_work().
>>
>> Hi, Ming
>>
>> With patch 3 & 4:
>>
>> if nbd_read_stat() return a valid cmd, then the refcnt should not drop
>> to 0 before blk_mq_complete_request() in recv_work().
> 
> The valid cmd won't be timed out just between nbd_read_stat() and
> calling blk_mq_complete_request()?
> 
> Yeah, the issue is addressed by patch 4, then please put 2 after
> 3 & 4, and suggest to add comment why request ref won't drop to zero
> in recv_work().

Hi, Ming

Thanks for the advice, will do this in next iteration.

Best regards
Kuai
> 
>>
>> if nbd_read_stat() failed, it won't be a problem if the request is freed
>> immediately when refcnt is dropped in nbd_read_stat().
>>
>> That's why I said that the problem will be fixed.
>>
>> BTW, if we move the refcnt drop into recv_work, we can only do that if
>> nbd_read_stat() return a valid cmd. If we get a valid rq and failed in
>> the following checkings in nbd_read_stat(), it's better to drop the
>> refcnt in nbd_read_stat().
> 
> The usual pattern is to drop the refcnt on when the object isn't used
> any more, so it is perfectly fine to hold the ref until that time.
> 
> 
> Thanks,
> Ming
> 
> .
>
diff mbox series

Patch

diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index 5170a630778d..920da390635c 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -718,12 +718,13 @@  static struct nbd_cmd *nbd_read_stat(struct nbd_device *nbd, int index)
 	tag = nbd_handle_to_tag(handle);
 	hwq = blk_mq_unique_tag_to_hwq(tag);
 	if (hwq < nbd->tag_set.nr_hw_queues)
-		req = blk_mq_tag_to_rq(nbd->tag_set.tags[hwq],
-				       blk_mq_unique_tag_to_tag(tag));
+		req = blk_mq_find_and_get_req(nbd->tag_set.tags[hwq],
+					      blk_mq_unique_tag_to_tag(tag));
 	if (!req || !blk_mq_request_started(req)) {
 		dev_err(disk_to_dev(nbd->disk), "Unexpected reply (%d) %p\n",
 			tag, req);
-		return ERR_PTR(-ENOENT);
+		ret = -ENOENT;
+		goto put_req;
 	}
 	trace_nbd_header_received(req, handle);
 	cmd = blk_mq_rq_to_pdu(req);
@@ -785,6 +786,9 @@  static struct nbd_cmd *nbd_read_stat(struct nbd_device *nbd, int index)
 out:
 	trace_nbd_payload_received(req, handle);
 	mutex_unlock(&cmd->lock);
+put_req:
+	if (req)
+		blk_mq_put_rq_ref(req);
 	return ret ? ERR_PTR(ret) : cmd;
 }