Message ID: 20210107033149.15701-5-lengchao@huawei.com (mailing list archive)
State: New, archived
Series: avoid repeated request completion and IO error
> When a request fails to be queued, the blk_status_t is returned
> directly to blk-mq. If the blk_status_t is not BLK_STS_RESOURCE,
> BLK_STS_DEV_RESOURCE, or BLK_STS_ZONE_RESOURCE, blk-mq calls
> blk_mq_end_request to complete the request with BLK_STS_IOERR.
> In two scenarios the request should instead be retried and may
> succeed. First, with nvme multipath, the request may be retried
> successfully on another path, because the error is probably related to
> the path. Second, without multipath software, the request may be
> retried successfully after error recovery.
> If the request is completed with BLK_STS_IOERR in
> blk_mq_dispatch_rq_list, its state may have been changed to
> MQ_RQ_IN_FLIGHT. If the request is freed asynchronously, such as in
> nvme_submit_user_cmd, in an extreme scenario it will be freed twice
> during tear down.
> If a non-resource error occurs in queue_rq, the driver should directly
> call nvme_complete_rq, which completes the request and sets its state
> to MQ_RQ_COMPLETE. nvme_complete_rq will decide whether to retry, fail
> over, or end the request.
>
> Signed-off-by: Chao Leng <lengchao@huawei.com>
> ---
>  drivers/nvme/host/rdma.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
> index df9f6f4549f1..4a89bf44ecdc 100644
> --- a/drivers/nvme/host/rdma.c
> +++ b/drivers/nvme/host/rdma.c
> @@ -2093,7 +2093,7 @@ static blk_status_t nvme_rdma_queue_rq(struct blk_mq_hw_ctx *hctx,
>  unmap_qe:
>  	ib_dma_unmap_single(dev, req->sqe.dma, sizeof(struct nvme_command),
>  			DMA_TO_DEVICE);
> -	return ret;
> +	return nvme_try_complete_failed_req(rq, ret);

I don't understand this. There are errors that may not be related to
anything path related (SW bug, memory leak, mapping error, etc.); why
should we return this one-shot error?
On 2021/1/14 8:19, Sagi Grimberg wrote:
>> [...]
>
> I don't understand this. There are errors that may not be related to
> anything path related (SW bug, memory leak, mapping error, etc.); why
> should we return this one-shot error?
Although a failover retry is not required for such errors, if we return
the error to blk-mq a low-probability crash may happen: blk-mq does not
set the state of the request to MQ_RQ_COMPLETE before completing it, so
the request may be freed asynchronously, such as in
nvme_submit_user_cmd. If this races with error recovery, a double
completion of the request may happen.

So we can not return the error to blk-mq if the blk_status_t is not
BLK_STS_RESOURCE, BLK_STS_DEV_RESOURCE, or BLK_STS_ZONE_RESOURCE.
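The distinction drawn above can be sketched as a small userspace model. The enum values and helper name below are illustrative stand-ins, not the kernel's actual definitions from `<linux/blk_types.h>`: only the three resource statuses are safe to hand back to blk-mq from queue_rq, because blk-mq requeues those; anything else gets ended without first being marked MQ_RQ_COMPLETE.

```c
#include <assert.h>

/* Illustrative stand-ins for the kernel's blk_status_t values. */
enum blk_status {
	BLK_STS_OK = 0,
	BLK_STS_RESOURCE,
	BLK_STS_DEV_RESOURCE,
	BLK_STS_ZONE_RESOURCE,
	BLK_STS_IOERR,
};

/* Returns 1 if blk-mq handles this queue_rq return safely (ok, or
 * requeued for later), 0 if the driver would need to complete the
 * request itself to avoid the IN_FLIGHT asynchronous-free race
 * described above. */
static int blk_mq_handles_safely(enum blk_status sts)
{
	switch (sts) {
	case BLK_STS_OK:
	case BLK_STS_RESOURCE:
	case BLK_STS_DEV_RESOURCE:
	case BLK_STS_ZONE_RESOURCE:
		return 1;
	default:
		return 0;
	}
}
```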
>>> [...]
>>
>> I don't understand this. There are errors that may not be related to
>> anything path related (sw bug, memory leak, mapping error, etc.);
>> why should we return this one-shot error?
> Although a failover retry is not required for such errors, if we
> return the error to blk-mq a low-probability crash may happen: blk-mq
> does not set the state of the request to MQ_RQ_COMPLETE before
> completing it, so the request may be freed asynchronously, such as in
> nvme_submit_user_cmd. If this races with error recovery, a double
> completion of the request may happen.

Then fix that, don't work around it.

> So we can not return the error to blk-mq if the blk_status_t is not
> BLK_STS_RESOURCE, BLK_STS_DEV_RESOURCE, or BLK_STS_ZONE_RESOURCE.

This is not something we should be handling in nvme. Block drivers
should be able to fail queue_rq, and this all should live in the
block layer.
On 2021/1/15 5:25, Sagi Grimberg wrote:
>>>> [...]
>>>
>>> I don't understand this. There are errors that may not be related to
>>> anything path related (sw bug, memory leak, mapping error, etc.);
>>> why should we return this one-shot error?
>> Although a failover retry is not required for such errors, if we
>> return the error to blk-mq a low-probability crash may happen: blk-mq
>> does not set the state of the request to MQ_RQ_COMPLETE before
>> completing it, so the request may be freed asynchronously, such as in
>> nvme_submit_user_cmd. If this races with error recovery, a double
>> completion of the request may happen.
>
> Then fix that, don't work around it.

I'm not trying to work around it. The purpose of this is also to solve
the problem of nvme native multipathing.

>> So we can not return the error to blk-mq if the blk_status_t is not
>> BLK_STS_RESOURCE, BLK_STS_DEV_RESOURCE, or BLK_STS_ZONE_RESOURCE.
>
> This is not something we should be handling in nvme. Block drivers
> should be able to fail queue_rq, and this all should live in the
> block layer.

Of course, it is also an idea to repair the block drivers directly.
However, the block layer is unaware of nvme native multipathing, which
will cause requests to be completed with errors that should be avoided.

The scenario: use two HBAs for nvme native multipath, and then one HBA
faults. The blk_status_t from queue_rq is BLK_STS_IOERR, so blk-mq
calls blk_mq_end_request to complete the request, which bypasses nvme
native multipath. We expect the request to fail over to the healthy
HBA, but instead it is completed directly with BLK_STS_IOERR.

Both scenarios can be fixed by completing the request directly in
queue_rq.
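The failover expectation can be modeled with a tiny sketch of the completion-side decision. This is a simplification of what nvme_complete_rq does; the function name and integer flags here are invented for illustration and are not kernel APIs.

```c
#include <assert.h>

enum disposition { END, RETRY, FAILOVER };

/* Simplified model of the completion-side decision: a retryable error
 * fails over when native multipath has another usable path, retries
 * locally otherwise, and only ends the request once retries run out. */
static enum disposition decide(int error, int retryable,
			       int multipath_has_path, int retries_left)
{
	if (!error || !retryable)
		return END;
	if (multipath_has_path)
		return FAILOVER;
	return retries_left > 0 ? RETRY : END;
}
```

The point of the thread is that returning BLK_STS_IOERR from queue_rq never reaches this decision at all, so the FAILOVER branch can never be taken.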
>>>>> [...]
>>>>
>>>> I don't understand this.
>>>> There are errors that may not be related to
>>>> anything path related (sw bug, memory leak, mapping error, etc.);
>>>> why should we return this one-shot error?
>>> Although a failover retry is not required for such errors, if we
>>> return the error to blk-mq a low-probability crash may happen:
>>> blk-mq does not set the state of the request to MQ_RQ_COMPLETE
>>> before completing it, so the request may be freed asynchronously,
>>> such as in nvme_submit_user_cmd. If this races with error recovery,
>>> a double completion of the request may happen.
>>
>> Then fix that, don't work around it.
>
> I'm not trying to work around it. The purpose of this is also to solve
> the problem of nvme native multipathing.

Please explain how this is an nvme-multipath issue?

>>> So we can not return the error to blk-mq if the blk_status_t is not
>>> BLK_STS_RESOURCE, BLK_STS_DEV_RESOURCE, or BLK_STS_ZONE_RESOURCE.
>>
>> This is not something we should be handling in nvme. Block drivers
>> should be able to fail queue_rq, and this all should live in the
>> block layer.
>
> Of course, it is also an idea to repair the block drivers directly.
> However, the block layer is unaware of nvme native multipathing,

Nor should it be.

> which will cause requests to be completed with errors that should be
> avoided.

Not sure I understand. Requests should fail over for path related
errors; what queue_rq errors are expected to be failed over from your
perspective?

> The scenario: use two HBAs for nvme native multipath, and then one HBA
> faults.

What is the specific error the driver sees?

> The blk_status_t from queue_rq is BLK_STS_IOERR, so blk-mq calls
> blk_mq_end_request to complete the request, which bypasses nvme
> native multipath. We expect the request to fail over to the healthy
> HBA, but instead it is completed directly with BLK_STS_IOERR.
> Both scenarios can be fixed by completing the request directly in
> queue_rq.
Well, certainly this one-shot "always return 0 and complete the command
with HOST_PATH error" is not a good approach IMO.
On 2021/1/16 9:18, Sagi Grimberg wrote:
>>>>>> [...]
>>>>>
>>>>> I don't understand this.
>>>>> There are errors that may not be related to
>>>>> anything path related (sw bug, memory leak, mapping error, etc.);
>>>>> why should we return this one-shot error?
>>>> [...]
>>> This is not something we should be handling in nvme. Block drivers
>>> should be able to fail queue_rq, and this all should live in the
>>> block layer.
>> Of course, it is also an idea to repair the block drivers directly.
>> However, the block layer is unaware of nvme native multipathing,
>
> Nor should it be.
>
>> which will cause requests to be completed with errors that should be
>> avoided.
>
> Not sure I understand. Requests should fail over for path related
> errors; what queue_rq errors are expected to be failed over from your
> perspective?

Although failing over only for path-related errors would be the best
choice, it's almost impossible to achieve. The probability of
non-path-related errors is very low. Although these errors do not
require a failover retry, the cost of a failover retry is that the
request is completed with an error after a somewhat longer delay
(several retries). It's not the best choice, but I think it's
acceptable, because an HBA driver does not have path-related error
codes, only general error codes.
It is difficult to identify whether the general error codes are
path-related.

>> The scenario: use two HBAs for nvme native multipath, and then one
>> HBA faults.
>
> What is the specific error the driver sees?

The path-related error code is closely related to the HBA driver
implementation; in general it is EIO. I don't think it's a good idea to
assume which general error code the driver returns in the event of a
path error.

>> The blk_status_t from queue_rq is BLK_STS_IOERR, so blk-mq calls
>> blk_mq_end_request to complete the request, which bypasses nvme
>> native multipath. We expect the request to fail over to the healthy
>> HBA, but instead it is completed directly with BLK_STS_IOERR.
>> Both scenarios can be fixed by completing the request directly in
>> queue_rq.
>
> Well, certainly this one-shot "always return 0 and complete the
> command with HOST_PATH error" is not a good approach IMO.

So what's the better option? Just complete the request with a host path
error for non-ENOMEM and non-EAGAIN errors returned by the HBA driver?
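The rule being proposed here boils down to an errno classification. A userspace sketch, with the enum names invented for illustration, makes the assumption explicit: only -ENOMEM/-EAGAIN are treated as transient resource pressure, and everything else is treated as a path failure, which is precisely the assumption under debate.

```c
#include <assert.h>
#include <errno.h>

enum queue_err_action {
	ACTION_REQUEUE,		/* transient resource pressure: let blk-mq retry */
	ACTION_HOST_PATH_ERROR,	/* complete so nvme_complete_rq can fail over */
};

/* Proposed rule: only -ENOMEM/-EAGAIN from the HBA driver count as
 * resource errors; every other errno is classified as a path failure. */
static enum queue_err_action classify_hba_error(int err)
{
	if (err == -ENOMEM || err == -EAGAIN)
		return ACTION_REQUEUE;
	return ACTION_HOST_PATH_ERROR;
}
```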
On Mon, Jan 18, 2021 at 11:22:16AM +0800, Chao Leng wrote:
>> Well, certainly this one-shot "always return 0 and complete the
>> command with HOST_PATH error" is not a good approach IMO.
> So what's the better option? Just complete the request with a host
> path error for non-ENOMEM and non-EAGAIN errors returned by the HBA
> driver?

What HBA driver?
On 2021/1/19 1:49, Christoph Hellwig wrote:
> On Mon, Jan 18, 2021 at 11:22:16AM +0800, Chao Leng wrote:
>>> Well, certainly this one-shot "always return 0 and complete the
>>> command with HOST_PATH error" is not a good approach IMO.
>> So what's the better option? Just complete the request with a host
>> path error for non-ENOMEM and non-EAGAIN errors returned by the HBA
>> driver?
>
> What HBA driver?

mlx4 and mlx5.
>>>> This is not something we should be handling in nvme. Block drivers
>>>> should be able to fail queue_rq, and this all should live in the
>>>> block layer.
>>> Of course, it is also an idea to repair the block drivers directly.
>>> However, the block layer is unaware of nvme native multipathing,
>>
>> Nor should it be.
>>
>>> which will cause requests to be completed with errors that should be
>>> avoided.
>>
>> Not sure I understand. Requests should fail over for path related
>> errors; what queue_rq errors are expected to be failed over from your
>> perspective?
>
> Although failing over only for path-related errors would be the best
> choice, it's almost impossible to achieve. The probability of
> non-path-related errors is very low. Although these errors do not
> require a failover retry, the cost of a failover retry is that the
> request is completed with an error after a somewhat longer delay
> (several retries). It's not the best choice, but I think it's
> acceptable, because an HBA driver does not have path-related error
> codes, only general error codes. It is difficult to identify whether
> the general error codes are path-related.

If we have a SW bug or breakage that can happen occasionally, this can
result in constant failover rather than a simple failure. This is just
not a good approach IMO.

>>> The scenario: use two HBAs for nvme native multipath, and then one
>>> HBA faults.
>>
>> What is the specific error the driver sees?
>
> The path-related error code is closely related to the HBA driver
> implementation; in general it is EIO. I don't think it's a good idea
> to assume which general error code the driver returns in the event of
> a path error.

But is assuming that every error is a path error a good idea?

>>> The blk_status_t from queue_rq is BLK_STS_IOERR, so blk-mq calls
>>> blk_mq_end_request to complete the request, which bypasses nvme
>>> native multipath. We expect the request to fail over to the healthy
>>> HBA, but instead it is completed directly with BLK_STS_IOERR.
>>> Both scenarios can be fixed by completing the request directly in
>>> queue_rq.
>> Well, certainly this one-shot "always return 0 and complete the
>> command with HOST_PATH error" is not a good approach IMO.
> So what's the better option? Just complete the request with a host
> path error for non-ENOMEM and non-EAGAIN errors returned by the HBA
> driver?

Well, the correct thing to do here would be to clone the bio and fail
over if the end_io error status is BLK_STS_IOERR. That sucks because it
adds overhead, but this proposal doesn't sit well; it looks wrong to
me.

Alternatively, a more creative idea would be to encode the error status
somehow in the cookie returned from submit_bio, but that also feels
like a small(er) hack.
On 2021/1/21 5:35, Sagi Grimberg wrote:
> [...]
>
> If we have a SW bug or breakage that can happen occasionally, this can
> result in constant failover rather than a simple failure. This is just
> not a good approach IMO.
>
>> The path-related error code is closely related to the HBA driver
>> implementation; in general it is EIO. I don't think it's a good idea
>> to assume which general error code the driver returns in the event
>> of a path error.
>
> But is assuming that every error is a path error a good idea?

Of course not. But following the old code logic, treating !ENOMEM &&
!EAGAIN from the HBA drivers as a path error might be reasonable, I
think.
>>>> The blk_status_t from queue_rq is BLK_STS_IOERR, so blk-mq calls
>>>> blk_mq_end_request to complete the request, which bypasses nvme
>>>> native multipath. We expect the request to fail over to the healthy
>>>> HBA, but instead it is completed directly with BLK_STS_IOERR.
>>>> Both scenarios can be fixed by completing the request directly in
>>>> queue_rq.
>>> Well, certainly this one-shot "always return 0 and complete the
>>> command with HOST_PATH error" is not a good approach IMO.
>> So what's the better option? Just complete the request with a host
>> path error for non-ENOMEM and non-EAGAIN errors returned by the HBA
>> driver?
>
> Well, the correct thing to do here would be to clone the bio and fail
> over if the end_io error status is BLK_STS_IOERR. That sucks because
> it adds overhead, but this proposal doesn't sit well; it looks wrong
> to me.
>
> Alternatively, a more creative idea would be to encode the error
> status somehow in the cookie returned from submit_bio, but that also
> feels like a small(er) hack.

If the HBA driver returns !ENOMEM && !EAGAIN, queue_rq directly calls
nvme_complete_rq with NVME_SC_HOST_PATH_ERROR, like
nvmf_fail_nonready_command does. nvme_complete_rq will decide whether
to retry, fail over, or end the request. This may not be the best, but
there seems to be no better choice. I will try to send patch v2.
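The helper being proposed for v2 might take the following shape, sketched in userspace with stub types. The helper name comes from the patch, but this body and the stubs are a guess at the intended semantics, not kernel code; NVME_SC_HOST_PATH_ERROR's value is taken from the driver headers.

```c
#include <assert.h>

enum blk_status {
	BLK_STS_OK = 0,
	BLK_STS_RESOURCE,
	BLK_STS_DEV_RESOURCE,
	BLK_STS_ZONE_RESOURCE,
	BLK_STS_IOERR,
};

#define NVME_SC_HOST_PATH_ERROR 0x370

struct req_stub {
	int completed;
	int nvme_status;
};

static struct req_stub g_rq;

/* Stub for nvme_complete_rq(): marks the request complete so blk-mq
 * never touches it again; the real function then decides whether to
 * retry, fail over, or end the request. */
static void complete_rq_stub(struct req_stub *rq, int status)
{
	rq->completed = 1;
	rq->nvme_status = status;
}

/* Guessed semantics of nvme_try_complete_failed_req(): resource
 * statuses are still returned to blk-mq so it can requeue; any other
 * failure is completed inside the driver with a host-path error, and
 * queue_rq reports BLK_STS_OK so blk-mq leaves the request alone. */
static enum blk_status try_complete_failed_req(struct req_stub *rq,
					       enum blk_status ret)
{
	switch (ret) {
	case BLK_STS_OK:
	case BLK_STS_RESOURCE:
	case BLK_STS_DEV_RESOURCE:
	case BLK_STS_ZONE_RESOURCE:
		return ret;
	default:
		complete_rq_stub(rq, NVME_SC_HOST_PATH_ERROR);
		return BLK_STS_OK;
	}
}
```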
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index df9f6f4549f1..4a89bf44ecdc 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -2093,7 +2093,7 @@ static blk_status_t nvme_rdma_queue_rq(struct blk_mq_hw_ctx *hctx,
 unmap_qe:
 	ib_dma_unmap_single(dev, req->sqe.dma, sizeof(struct nvme_command),
 			DMA_TO_DEVICE);
-	return ret;
+	return nvme_try_complete_failed_req(rq, ret);
 }
 
 static int nvme_rdma_poll(struct blk_mq_hw_ctx *hctx)
When a request fails to be queued, the blk_status_t is returned
directly to blk-mq. If the blk_status_t is not BLK_STS_RESOURCE,
BLK_STS_DEV_RESOURCE, or BLK_STS_ZONE_RESOURCE, blk-mq calls
blk_mq_end_request to complete the request with BLK_STS_IOERR.

In two scenarios the request should instead be retried and may succeed.
First, with nvme multipath, the request may be retried successfully on
another path, because the error is probably related to the path.
Second, without multipath software, the request may be retried
successfully after error recovery.

If the request is completed with BLK_STS_IOERR in
blk_mq_dispatch_rq_list, its state may have been changed to
MQ_RQ_IN_FLIGHT. If the request is freed asynchronously, such as in
nvme_submit_user_cmd, in an extreme scenario it will be freed twice
during tear down.

If a non-resource error occurs in queue_rq, the driver should directly
call nvme_complete_rq, which completes the request and sets its state
to MQ_RQ_COMPLETE. nvme_complete_rq will decide whether to retry, fail
over, or end the request.

Signed-off-by: Chao Leng <lengchao@huawei.com>
---
 drivers/nvme/host/rdma.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)