diff mbox series

[v2,2/4] nvme: allow local retry for requests with REQ_FAILFAST_TRANSPORT set

Message ID 20210415231530.95464-3-snitzer@redhat.com (mailing list archive)
State New
Headers show
Series nvme: improve error handling and ana_state to work well with dm-multipath | expand

Commit Message

Mike Snitzer April 15, 2021, 11:15 p.m. UTC
From: Chao Leng <lengchao@huawei.com>

REQ_FAILFAST_TRANSPORT was designed for SCSI, because the SCSI protocol
does not define the local retry mechanism. SCSI implements a fuzzy
local retry mechanism, so REQ_FAILFAST_TRANSPORT is needed to allow
higher-level multipathing software to perform failover/retry.

NVMe is different with SCSI about this. It defines a local retry
mechanism and path error codes, so NVMe should retry local for non
path error. If path related error, whether to retry and how to retry
is still determined by higher-level multipathing's failover.

Unlike SCSI, NVMe shouldn't prevent retry if REQ_FAILFAST_TRANSPORT
because NVMe's local retry is needed -- as is NVMe specific logic to
categorize whether an error is path related.

In this way, the mechanism of NVMe multipath or other multipath are
now equivalent. The mechanism is: non path related error will be
retried locally, path related error is handled by multipath.

Signed-off-by: Chao Leng <lengchao@huawei.com>
[snitzer: edited header for grammar and clarity, also added code comment]
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
 drivers/nvme/host/core.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

Comments

Hannes Reinecke April 16, 2021, 2:01 p.m. UTC | #1
On 4/16/21 1:15 AM, Mike Snitzer wrote:
> From: Chao Leng <lengchao@huawei.com>
> 
> REQ_FAILFAST_TRANSPORT was designed for SCSI, because the SCSI protocol
> does not define the local retry mechanism. SCSI implements a fuzzy
> local retry mechanism, so REQ_FAILFAST_TRANSPORT is needed to allow
> higher-level multipathing software to perform failover/retry.
> 
> NVMe is different with SCSI about this. It defines a local retry
> mechanism and path error codes, so NVMe should retry local for non
> path error. If path related error, whether to retry and how to retry
> is still determined by higher-level multipathing's failover.
> 
> Unlike SCSI, NVMe shouldn't prevent retry if REQ_FAILFAST_TRANSPORT
> because NVMe's local retry is needed -- as is NVMe specific logic to
> categorize whether an error is path related.
> 
> In this way, the mechanism of NVMe multipath or other multipath are
> now equivalent. The mechanism is: non path related error will be
> retried locally, path related error is handled by multipath.
> 
> Signed-off-by: Chao Leng <lengchao@huawei.com>
> [snitzer: edited header for grammar and clarity, also added code comment]
> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
> ---
>  drivers/nvme/host/core.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index 540d6fd8ffef..4134cf3c7e48 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -306,7 +306,14 @@ static inline enum nvme_disposition nvme_decide_disposition(struct request *req)
>  	if (likely(nvme_req(req)->status == 0))
>  		return COMPLETE;
>  
> -	if (blk_noretry_request(req) ||
> +	/*
> +	 * REQ_FAILFAST_TRANSPORT is set by upper layer software that
> +	 * handles multipathing. Unlike SCSI, NVMe's error handling was
> +	 * specifically designed to handle local retry for non-path errors.
> +	 * As such, allow NVMe's local retry mechanism to be used for
> +	 * requests marked with REQ_FAILFAST_TRANSPORT.
> +	 */
> +	if ((req->cmd_flags & (REQ_FAILFAST_DEV | REQ_FAILFAST_DRIVER)) ||
>  	    (nvme_req(req)->status & NVME_SC_DNR) ||
>  	    nvme_req(req)->retries >= nvme_max_retries)
>  		return COMPLETE;
> 
Huh?

#define blk_noretry_request(rq) \
        ((rq)->cmd_flags & (REQ_FAILFAST_DEV|REQ_FAILFAST_TRANSPORT| \
                             REQ_FAILFAST_DRIVER))

making the only _actual_ change in your patch _not_ evaluating the
REQ_FAILFAST_DRIVER, which incidentally is only used by the NVMe core.
So what is it you're trying to solve?

Cheers,

Hannes
Mike Snitzer April 16, 2021, 2:53 p.m. UTC | #2
On Fri, Apr 16 2021 at 10:01am -0400,
Hannes Reinecke <hare@suse.de> wrote:

> On 4/16/21 1:15 AM, Mike Snitzer wrote:
> > From: Chao Leng <lengchao@huawei.com>
> > 
> > REQ_FAILFAST_TRANSPORT was designed for SCSI, because the SCSI protocol
> > does not define the local retry mechanism. SCSI implements a fuzzy
> > local retry mechanism, so REQ_FAILFAST_TRANSPORT is needed to allow
> > higher-level multipathing software to perform failover/retry.
> > 
> > NVMe is different with SCSI about this. It defines a local retry
> > mechanism and path error codes, so NVMe should retry local for non
> > path error. If path related error, whether to retry and how to retry
> > is still determined by higher-level multipathing's failover.
> > 
> > Unlike SCSI, NVMe shouldn't prevent retry if REQ_FAILFAST_TRANSPORT
> > because NVMe's local retry is needed -- as is NVMe specific logic to
> > categorize whether an error is path related.
> > 
> > In this way, the mechanism of NVMe multipath or other multipath are
> > now equivalent. The mechanism is: non path related error will be
> > retried locally, path related error is handled by multipath.
> > 
> > Signed-off-by: Chao Leng <lengchao@huawei.com>
> > [snitzer: edited header for grammar and clarity, also added code comment]
> > Signed-off-by: Mike Snitzer <snitzer@redhat.com>
> > ---
> >  drivers/nvme/host/core.c | 9 ++++++++-
> >  1 file changed, 8 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> > index 540d6fd8ffef..4134cf3c7e48 100644
> > --- a/drivers/nvme/host/core.c
> > +++ b/drivers/nvme/host/core.c
> > @@ -306,7 +306,14 @@ static inline enum nvme_disposition nvme_decide_disposition(struct request *req)
> >  	if (likely(nvme_req(req)->status == 0))
> >  		return COMPLETE;
> >  
> > -	if (blk_noretry_request(req) ||
> > +	/*
> > +	 * REQ_FAILFAST_TRANSPORT is set by upper layer software that
> > +	 * handles multipathing. Unlike SCSI, NVMe's error handling was
> > +	 * specifically designed to handle local retry for non-path errors.
> > +	 * As such, allow NVMe's local retry mechanism to be used for
> > +	 * requests marked with REQ_FAILFAST_TRANSPORT.
> > +	 */
> > +	if ((req->cmd_flags & (REQ_FAILFAST_DEV | REQ_FAILFAST_DRIVER)) ||
> >  	    (nvme_req(req)->status & NVME_SC_DNR) ||
> >  	    nvme_req(req)->retries >= nvme_max_retries)
> >  		return COMPLETE;
> > 
> Huh?
> 
> #define blk_noretry_request(rq) \
>         ((rq)->cmd_flags & (REQ_FAILFAST_DEV|REQ_FAILFAST_TRANSPORT| \
>                              REQ_FAILFAST_DRIVER))
> 
> making the only _actual_ change in your patch _not_ evaluating the
> REQ_FAILFAST_DRIVER, which incidentally is only used by the NVMe core.

No, not sure how you got there. I'd have thought the 5 references to
"REQ_FAILFAST_TRANSPORT" would've been sufficient ;)

This patch makes it so requests marked with REQ_FAILFAST_TRANSPORT are
allowed to use NVMe's local retry (that is required for non-transport
errors).

> So what is it you're trying to solve?

What the patch header, code and code comment detail.

Mike
Hannes Reinecke April 16, 2021, 3:20 p.m. UTC | #3
On 4/16/21 4:53 PM, Mike Snitzer wrote:
> On Fri, Apr 16 2021 at 10:01am -0400,
> Hannes Reinecke <hare@suse.de> wrote:
> 
>> On 4/16/21 1:15 AM, Mike Snitzer wrote:
>>> From: Chao Leng <lengchao@huawei.com>
>>>
>>> REQ_FAILFAST_TRANSPORT was designed for SCSI, because the SCSI protocol
>>> does not define the local retry mechanism. SCSI implements a fuzzy
>>> local retry mechanism, so REQ_FAILFAST_TRANSPORT is needed to allow
>>> higher-level multipathing software to perform failover/retry.
>>>
>>> NVMe is different with SCSI about this. It defines a local retry
>>> mechanism and path error codes, so NVMe should retry local for non
>>> path error. If path related error, whether to retry and how to retry
>>> is still determined by higher-level multipathing's failover.
>>>
>>> Unlike SCSI, NVMe shouldn't prevent retry if REQ_FAILFAST_TRANSPORT
>>> because NVMe's local retry is needed -- as is NVMe specific logic to
>>> categorize whether an error is path related.
>>>
>>> In this way, the mechanism of NVMe multipath or other multipath are
>>> now equivalent. The mechanism is: non path related error will be
>>> retried locally, path related error is handled by multipath.
>>>
>>> Signed-off-by: Chao Leng <lengchao@huawei.com>
>>> [snitzer: edited header for grammar and clarity, also added code comment]
>>> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
>>> ---
>>>  drivers/nvme/host/core.c | 9 ++++++++-
>>>  1 file changed, 8 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>>> index 540d6fd8ffef..4134cf3c7e48 100644
>>> --- a/drivers/nvme/host/core.c
>>> +++ b/drivers/nvme/host/core.c
>>> @@ -306,7 +306,14 @@ static inline enum nvme_disposition nvme_decide_disposition(struct request *req)
>>>  	if (likely(nvme_req(req)->status == 0))
>>>  		return COMPLETE;
>>>  
>>> -	if (blk_noretry_request(req) ||
>>> +	/*
>>> +	 * REQ_FAILFAST_TRANSPORT is set by upper layer software that
>>> +	 * handles multipathing. Unlike SCSI, NVMe's error handling was
>>> +	 * specifically designed to handle local retry for non-path errors.
>>> +	 * As such, allow NVMe's local retry mechanism to be used for
>>> +	 * requests marked with REQ_FAILFAST_TRANSPORT.
>>> +	 */
>>> +	if ((req->cmd_flags & (REQ_FAILFAST_DEV | REQ_FAILFAST_DRIVER)) ||
>>>  	    (nvme_req(req)->status & NVME_SC_DNR) ||
>>>  	    nvme_req(req)->retries >= nvme_max_retries)
>>>  		return COMPLETE;
>>>
>> Huh?
>>
>> #define blk_noretry_request(rq) \
>>         ((rq)->cmd_flags & (REQ_FAILFAST_DEV|REQ_FAILFAST_TRANSPORT| \
>>                              REQ_FAILFAST_DRIVER))
>>
>> making the only _actual_ change in your patch _not_ evaluating the
>> REQ_FAILFAST_DRIVER, which incidentally is only used by the NVMe core.
> 
> No, not sure how you got there. I'd have thought the 5 references to
> "REQ_FAILFAST_TRANSPORT" would've been sufficient ;)
> 

Ah. Misread stuff. You're excluding the REQ_FAILFAST_TRANSPORT here.
But then it's _actually_ similar to the next patch (which I've also
commented).

Wouldn't it be better to fold them into one patch and discuss things
together; especially as my comment to the next one might actually
achieve the same thing?

Cheers,

Hannes
Mike Snitzer April 16, 2021, 3:32 p.m. UTC | #4
On Fri, Apr 16 2021 at 11:20am -0400,
Hannes Reinecke <hare@suse.de> wrote:

> On 4/16/21 4:53 PM, Mike Snitzer wrote:
> > On Fri, Apr 16 2021 at 10:01am -0400,
> > Hannes Reinecke <hare@suse.de> wrote:
> > 
> >> On 4/16/21 1:15 AM, Mike Snitzer wrote:
> >>> From: Chao Leng <lengchao@huawei.com>
> >>>
> >>> REQ_FAILFAST_TRANSPORT was designed for SCSI, because the SCSI protocol
> >>> does not define the local retry mechanism. SCSI implements a fuzzy
> >>> local retry mechanism, so REQ_FAILFAST_TRANSPORT is needed to allow
> >>> higher-level multipathing software to perform failover/retry.
> >>>
> >>> NVMe is different with SCSI about this. It defines a local retry
> >>> mechanism and path error codes, so NVMe should retry local for non
> >>> path error. If path related error, whether to retry and how to retry
> >>> is still determined by higher-level multipathing's failover.
> >>>
> >>> Unlike SCSI, NVMe shouldn't prevent retry if REQ_FAILFAST_TRANSPORT
> >>> because NVMe's local retry is needed -- as is NVMe specific logic to
> >>> categorize whether an error is path related.
> >>>
> >>> In this way, the mechanism of NVMe multipath or other multipath are
> >>> now equivalent. The mechanism is: non path related error will be
> >>> retried locally, path related error is handled by multipath.
> >>>
> >>> Signed-off-by: Chao Leng <lengchao@huawei.com>
> >>> [snitzer: edited header for grammar and clarity, also added code comment]
> >>> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
> >>> ---
> >>>  drivers/nvme/host/core.c | 9 ++++++++-
> >>>  1 file changed, 8 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> >>> index 540d6fd8ffef..4134cf3c7e48 100644
> >>> --- a/drivers/nvme/host/core.c
> >>> +++ b/drivers/nvme/host/core.c
> >>> @@ -306,7 +306,14 @@ static inline enum nvme_disposition nvme_decide_disposition(struct request *req)
> >>>  	if (likely(nvme_req(req)->status == 0))
> >>>  		return COMPLETE;
> >>>  
> >>> -	if (blk_noretry_request(req) ||
> >>> +	/*
> >>> +	 * REQ_FAILFAST_TRANSPORT is set by upper layer software that
> >>> +	 * handles multipathing. Unlike SCSI, NVMe's error handling was
> >>> +	 * specifically designed to handle local retry for non-path errors.
> >>> +	 * As such, allow NVMe's local retry mechanism to be used for
> >>> +	 * requests marked with REQ_FAILFAST_TRANSPORT.
> >>> +	 */
> >>> +	if ((req->cmd_flags & (REQ_FAILFAST_DEV | REQ_FAILFAST_DRIVER)) ||
> >>>  	    (nvme_req(req)->status & NVME_SC_DNR) ||
> >>>  	    nvme_req(req)->retries >= nvme_max_retries)
> >>>  		return COMPLETE;
> >>>
> >> Huh?
> >>
> >> #define blk_noretry_request(rq) \
> >>         ((rq)->cmd_flags & (REQ_FAILFAST_DEV|REQ_FAILFAST_TRANSPORT| \
> >>                              REQ_FAILFAST_DRIVER))
> >>
> >> making the only _actual_ change in your patch _not_ evaluating the
> >> REQ_FAILFAST_DRIVER, which incidentally is only used by the NVMe core.
> > 
> > No, not sure how you got there. I'd have thought the 5 references to
> > "REQ_FAILFAST_TRANSPORT" would've been sufficient ;)
> > 
> 
> Ah. Misread stuff. You're excluding the REQ_FAILFAST_TRANSPORT here.
> But then it's _actually_ similar to the next patch (which I've also
> commented).
> 
> Wouldn't it be better to fold them into one patch and discuss things
> together; especially as my comment to the next one might actually
> achieve the same thing?

2 discrete things. This patch enables local retry.
Patch 3 allows proper failover via upper layer multipathing.

And as I replied, your suggestion about using DNR doesn't achieve the
same thing (said as much in reply to the patch 3 thread).

Mike
diff mbox series

Patch

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 540d6fd8ffef..4134cf3c7e48 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -306,7 +306,14 @@  static inline enum nvme_disposition nvme_decide_disposition(struct request *req)
 	if (likely(nvme_req(req)->status == 0))
 		return COMPLETE;
 
-	if (blk_noretry_request(req) ||
+	/*
+	 * REQ_FAILFAST_TRANSPORT is set by upper layer software that
+	 * handles multipathing. Unlike SCSI, NVMe's error handling was
+	 * specifically designed to handle local retry for non-path errors.
+	 * As such, allow NVMe's local retry mechanism to be used for
+	 * requests marked with REQ_FAILFAST_TRANSPORT.
+	 */
+	if ((req->cmd_flags & (REQ_FAILFAST_DEV | REQ_FAILFAST_DRIVER)) ||
 	    (nvme_req(req)->status & NVME_SC_DNR) ||
 	    nvme_req(req)->retries >= nvme_max_retries)
 		return COMPLETE;