Message ID | 20210907140154.2134091-3-yukuai3@huawei.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | handle unexpected message from server | expand |
On Tue, Sep 07, 2021 at 10:01:50PM +0800, Yu Kuai wrote: > blk_mq_tag_to_rq() can only ensure to return valid request in > following situation: > > 1) client send request message to server first > submit_bio > ... > blk_mq_get_tag > ... > blk_mq_get_driver_tag > ... > nbd_queue_rq > nbd_handle_cmd > nbd_send_cmd > > 2) client receive respond message from server > recv_work > nbd_read_stat > blk_mq_tag_to_rq > > If step 1) is missing, blk_mq_tag_to_rq() will return a stale > request, which might be freed. Thus convert to use > blk_mq_find_and_get_req() to make sure the returned request is not > freed. However, there are still some problems if the request is > started, and this will be fixed in next patch. > > Signed-off-by: Yu Kuai <yukuai3@huawei.com> > --- > drivers/block/nbd.c | 10 +++++++--- > 1 file changed, 7 insertions(+), 3 deletions(-) > > diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c > index 5170a630778d..920da390635c 100644 > --- a/drivers/block/nbd.c > +++ b/drivers/block/nbd.c > @@ -718,12 +718,13 @@ static struct nbd_cmd *nbd_read_stat(struct nbd_device *nbd, int index) > tag = nbd_handle_to_tag(handle); > hwq = blk_mq_unique_tag_to_hwq(tag); > if (hwq < nbd->tag_set.nr_hw_queues) > - req = blk_mq_tag_to_rq(nbd->tag_set.tags[hwq], > - blk_mq_unique_tag_to_tag(tag)); > + req = blk_mq_find_and_get_req(nbd->tag_set.tags[hwq], > + blk_mq_unique_tag_to_tag(tag)); > if (!req || !blk_mq_request_started(req)) { > dev_err(disk_to_dev(nbd->disk), "Unexpected reply (%d) %p\n", > tag, req); > - return ERR_PTR(-ENOENT); > + ret = -ENOENT; > + goto put_req; > } > trace_nbd_header_received(req, handle); > cmd = blk_mq_rq_to_pdu(req); > @@ -785,6 +786,9 @@ static struct nbd_cmd *nbd_read_stat(struct nbd_device *nbd, int index) > out: > trace_nbd_payload_received(req, handle); > mutex_unlock(&cmd->lock); > +put_req: > + if (req) > + blk_mq_put_rq_ref(req); > return ret ? ERR_PTR(ret) : cmd; After the request's refcnt is dropped, it can be freed immediately, then one stale command is returned to caller. Thanks, Ming
On 2021/09/08 15:30, Ming Lei wrote: >> +put_req: >> + if (req) >> + blk_mq_put_rq_ref(req); >> return ret ? ERR_PTR(ret) : cmd; > > After the request's refcnt is dropped, it can be freed immediately, then > one stale command is returned to caller. Hi, Ming It's right this patch leave this problem. Please take a look at patch 3 and patch 4, the problem should be fixed with these patches. Thanks, Kuai > > Thanks, > Ming > > . >
On Wed, Sep 08, 2021 at 03:37:06PM +0800, yukuai (C) wrote: > On 2021/09/08 15:30, Ming Lei wrote: > > > > +put_req: > > > + if (req) > > > + blk_mq_put_rq_ref(req); > > > return ret ? ERR_PTR(ret) : cmd; > > > > After the request's refcnt is dropped, it can be freed immediately, then > > one stale command is returned to caller. > > Hi, Ming > > It's right this patch leave this problem. Please take a look at patch 3 > and patch 4, the problem should be fixed with these patches. Not see it is addressed in patch 3 & 4, and it is one obvious fault in patch 2, please fix it from beginning by moving the refcnt drop into recv_work(). BTW, the approach in patch 3 looks fine, which is very similar with SCSI's handling. Thanks, Ming
On 2021/09/08 16:00, Ming Lei wrote: > On Wed, Sep 08, 2021 at 03:37:06PM +0800, yukuai (C) wrote: >> On 2021/09/08 15:30, Ming Lei wrote: >> >>>> +put_req: >>>> + if (req) >>>> + blk_mq_put_rq_ref(req); >>>> return ret ? ERR_PTR(ret) : cmd; >>> >>> After the request's refcnt is dropped, it can be freed immediately, then >>> one stale command is returned to caller. >> >> Hi, Ming >> >> It's right this patch leave this problem. Please take a look at patch 3 >> and patch 4, the problem should be fixed with these patches. > > Not see it is addressed in patch 3 & 4, and it is one obvious fault in > patch 2, please fix it from beginning by moving the refcnt drop > into recv_work(). Hi, Ming With patch 3 & 4: if nbd_read_stat() return a valid cmd, then the refcnt should not drop to 0 before blk_mq_complete_request() in recv_work(). if nbd_read_stat() failed, it won't be a problem if the request is freed immediately when refcnt is dropped in nbd_read_stat(). That's why I said that the problem will be fixed. BTW, if we move the refcnt drop into recv_work, we can only do that if nbd_read_stat() return a valid cmd. If we get a valid rq and failed in the following checkings in nbd_read_stat(), it's better to drop the refcnt in nbd_read_stat(). > > BTW, the approach in patch 3 looks fine, which is very similar with > SCSI's handling. Thanks for taking time reviewing these patches. Kuai > > Thanks, > Ming > > . >
On Wed, Sep 08, 2021 at 04:29:57PM +0800, yukuai (C) wrote: > On 2021/09/08 16:00, Ming Lei wrote: > > On Wed, Sep 08, 2021 at 03:37:06PM +0800, yukuai (C) wrote: > > > On 2021/09/08 15:30, Ming Lei wrote: > > > > > > > > +put_req: > > > > > + if (req) > > > > > + blk_mq_put_rq_ref(req); > > > > > return ret ? ERR_PTR(ret) : cmd; > > > > > > > > After the request's refcnt is dropped, it can be freed immediately, then > > > > one stale command is returned to caller. > > > > > > Hi, Ming > > > > > > It's right this patch leave this problem. Please take a look at patch 3 > > > and patch 4, the problem should be fixed with these patches. > > > > Not see it is addressed in patch 3 & 4, and it is one obvious fault in > > patch 2, please fix it from beginning by moving the refcnt drop > > into recv_work(). > > Hi, Ming > > With patch 3 & 4: > > if nbd_read_stat() return a valid cmd, then the refcnt should not drop > to 0 before blk_mq_complete_request() in recv_work(). The valid cmd won't be timed out just between nbd_read_stat() and calling blk_mq_complete_request()? Yeah, the issue is addressed by patch 4, then please put 2 after 3 & 4, and suggest to add comment why request ref won't drop to zero in recv_work(). > > if nbd_read_stat() failed, it won't be a problem if the request is freed > immediately when refcnt is dropped in nbd_read_stat(). > > That's why I said that the problem will be fixed. > > BTW, if we move the refcnt drop into recv_work, we can only do that if > nbd_read_stat() return a valid cmd. If we get a valid rq and failed in > the following checkings in nbd_read_stat(), it's better to drop the > refcnt in nbd_read_stat(). The usual pattern is to drop the refcnt on when the object isn't used any more, so it is perfectly fine to hold the ref until that time. Thanks, Ming
On 2021/09/08 17:27, Ming Lei wrote: > On Wed, Sep 08, 2021 at 04:29:57PM +0800, yukuai (C) wrote: >> On 2021/09/08 16:00, Ming Lei wrote: >>> On Wed, Sep 08, 2021 at 03:37:06PM +0800, yukuai (C) wrote: >>>> On 2021/09/08 15:30, Ming Lei wrote: >>>> >>>>>> +put_req: >>>>>> + if (req) >>>>>> + blk_mq_put_rq_ref(req); >>>>>> return ret ? ERR_PTR(ret) : cmd; >>>>> >>>>> After the request's refcnt is dropped, it can be freed immediately, then >>>>> one stale command is returned to caller. >>>> >>>> Hi, Ming >>>> >>>> It's right this patch leave this problem. Please take a look at patch 3 >>>> and patch 4, the problem should be fixed with these patches. >>> >>> Not see it is addressed in patch 3 & 4, and it is one obvious fault in >>> patch 2, please fix it from beginning by moving the refcnt drop >>> into recv_work(). >> >> Hi, Ming >> >> With patch 3 & 4: >> >> if nbd_read_stat() return a valid cmd, then the refcnt should not drop >> to 0 before blk_mq_complete_request() in recv_work(). > > The valid cmd won't be timed out just between nbd_read_stat() and > calling blk_mq_complete_request()? > > Yeah, the issue is addressed by patch 4, then please put 2 after > 3 & 4, and suggest to add comment why request ref won't drop to zero > in recv_work(). Hi, Ming Thanks for the advice, will do this in next iteration. Best regards Kuai > >> >> if nbd_read_stat() failed, it won't be a problem if the request is freed >> immediately when refcnt is dropped in nbd_read_stat(). >> >> That's why I said that the problem will be fixed. >> >> BTW, if we move the refcnt drop into recv_work, we can only do that if >> nbd_read_stat() return a valid cmd. If we get a valid rq and failed in >> the following checkings in nbd_read_stat(), it's better to drop the >> refcnt in nbd_read_stat(). > > The usual pattern is to drop the refcnt on when the object isn't used > any more, so it is perfectly fine to hold the ref until that time. > > > Thanks, > Ming > > . >
diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c index 5170a630778d..920da390635c 100644 --- a/drivers/block/nbd.c +++ b/drivers/block/nbd.c @@ -718,12 +718,13 @@ static struct nbd_cmd *nbd_read_stat(struct nbd_device *nbd, int index) tag = nbd_handle_to_tag(handle); hwq = blk_mq_unique_tag_to_hwq(tag); if (hwq < nbd->tag_set.nr_hw_queues) - req = blk_mq_tag_to_rq(nbd->tag_set.tags[hwq], - blk_mq_unique_tag_to_tag(tag)); + req = blk_mq_find_and_get_req(nbd->tag_set.tags[hwq], + blk_mq_unique_tag_to_tag(tag)); if (!req || !blk_mq_request_started(req)) { dev_err(disk_to_dev(nbd->disk), "Unexpected reply (%d) %p\n", tag, req); - return ERR_PTR(-ENOENT); + ret = -ENOENT; + goto put_req; } trace_nbd_header_received(req, handle); cmd = blk_mq_rq_to_pdu(req); @@ -785,6 +786,9 @@ static struct nbd_cmd *nbd_read_stat(struct nbd_device *nbd, int index) out: trace_nbd_payload_received(req, handle); mutex_unlock(&cmd->lock); +put_req: + if (req) + blk_mq_put_rq_ref(req); return ret ? ERR_PTR(ret) : cmd; }
blk_mq_tag_to_rq() can only ensure to return valid request in following situation: 1) client send request message to server first submit_bio ... blk_mq_get_tag ... blk_mq_get_driver_tag ... nbd_queue_rq nbd_handle_cmd nbd_send_cmd 2) client receive respond message from server recv_work nbd_read_stat blk_mq_tag_to_rq If step 1) is missing, blk_mq_tag_to_rq() will return a stale request, which might be freed. Thus convert to use blk_mq_find_and_get_req() to make sure the returned request is not freed. However, there are still some problems if the request is started, and this will be fixed in next patch. Signed-off-by: Yu Kuai <yukuai3@huawei.com> --- drivers/block/nbd.c | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-)