diff mbox series

[v11,07/34] scsi: sd: Have scsi-ml retry read_capacity_16 errors

Message ID 20230905231547.83945-8-michael.christie@oracle.com (mailing list archive)
State Changes Requested
Headers show
Series [v11,01/34] scsi: Add helper to prep sense during error handling | expand

Commit Message

Mike Christie Sept. 5, 2023, 11:15 p.m. UTC
This has read_capacity_16 have scsi-ml retry errors instead of driving
them itself.

There are 2 behavior changes with this patch:
1. There is one behavior change where we no longer retry when
scsi_execute_cmd returns < 0, but we should be ok. We don't need to retry
for failures like the queue being removed, and for the case where there
are no tags/reqs since the block layer waits/retries for us. For possible
memory allocation failures from blk_rq_map_kern we use GFP_NOIO, so
retrying will probably not help.
2. For the specific UAs we checked for and retried, we would get
READ_CAPACITY_RETRIES_ON_RESET retries plus whatever retries were left
from the loop's retries. Each UA now gets READ_CAPACITY_RETRIES_ON_RESET
reties, and the other errors (not including medium not present) get up
to 3 retries.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/scsi/sd.c | 110 ++++++++++++++++++++++++++++++----------------
 1 file changed, 73 insertions(+), 37 deletions(-)

Comments

Martin Wilck Sept. 15, 2023, 8:21 p.m. UTC | #1
On Tue, 2023-09-05 at 18:15 -0500, Mike Christie wrote:
> This has read_capacity_16 have scsi-ml retry errors instead of
> driving
> them itself.
> 
> There are 2 behavior changes with this patch:
> 1. There is one behavior change where we no longer retry when
> scsi_execute_cmd returns < 0, but we should be ok. We don't need to
> retry
> for failures like the queue being removed, and for the case where
> there
> are no tags/reqs since the block layer waits/retries for us. For
> possible
> memory allocation failures from blk_rq_map_kern we use GFP_NOIO, so
> retrying will probably not help.
> 2. For the specific UAs we checked for and retried, we would get
> READ_CAPACITY_RETRIES_ON_RESET retries plus whatever retries were
> left
> from the loop's retries. Each UA now gets
> READ_CAPACITY_RETRIES_ON_RESET
> reties, and the other errors (not including medium not present) get
> up
> to 3 retries.

This is ok, but - just a thought - would it make sense to add a field
for maximum total retry count (summed over all failures)? That would
allow us to minimize behavior changes also in other cases.

> Signed-off-by: Mike Christie <michael.christie@oracle.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: John Garry <john.g.garry@oracle.com>
> Reviewed-by: Bart Van Assche <bvanassche@acm.org>

Reviewed-by: Martin Wilck <mwilck@suse.com>
Martin Wilck Sept. 15, 2023, 9:34 p.m. UTC | #2
On Fri, 2023-09-15 at 22:21 +0200, Martin Wilck wrote:
> On Tue, 2023-09-05 at 18:15 -0500, Mike Christie wrote:
> > This has read_capacity_16 have scsi-ml retry errors instead of
> > driving
> > them itself.
> > 
> > There are 2 behavior changes with this patch:
> > 1. There is one behavior change where we no longer retry when
> > scsi_execute_cmd returns < 0, but we should be ok. We don't need to
> > retry
> > for failures like the queue being removed, and for the case where
> > there
> > are no tags/reqs since the block layer waits/retries for us. For
> > possible
> > memory allocation failures from blk_rq_map_kern we use GFP_NOIO, so
> > retrying will probably not help.
> > 2. For the specific UAs we checked for and retried, we would get
> > READ_CAPACITY_RETRIES_ON_RESET retries plus whatever retries were
> > left
> > from the loop's retries. Each UA now gets
> > READ_CAPACITY_RETRIES_ON_RESET
> > reties, and the other errors (not including medium not present) get
> > up
> > to 3 retries.
> 
> This is ok, but - just a thought - would it make sense to add a field
> for maximum total retry count (summed over all failures)? That would
> allow us to minimize behavior changes also in other cases.

Could we perhaps use scmd->allowed for this purpose?

I noted that several callers of scsi_execute_cmd() in your patch set
set the allowed parameter, e.g. to sdkp->max_retries in 07/34.
But allowed doesn't seem to be used at all in the passthrough case,
so we might as well use it as an upper limit for the total number of
retries.
Mike Christie Sept. 18, 2023, 12:35 a.m. UTC | #3
On 9/15/23 4:34 PM, Martin Wilck wrote:
> On Fri, 2023-09-15 at 22:21 +0200, Martin Wilck wrote:
>> On Tue, 2023-09-05 at 18:15 -0500, Mike Christie wrote:
>>> This has read_capacity_16 have scsi-ml retry errors instead of
>>> driving
>>> them itself.
>>>
>>> There are 2 behavior changes with this patch:
>>> 1. There is one behavior change where we no longer retry when
>>> scsi_execute_cmd returns < 0, but we should be ok. We don't need to
>>> retry
>>> for failures like the queue being removed, and for the case where
>>> there
>>> are no tags/reqs since the block layer waits/retries for us. For
>>> possible
>>> memory allocation failures from blk_rq_map_kern we use GFP_NOIO, so
>>> retrying will probably not help.
>>> 2. For the specific UAs we checked for and retried, we would get
>>> READ_CAPACITY_RETRIES_ON_RESET retries plus whatever retries were
>>> left
>>> from the loop's retries. Each UA now gets
>>> READ_CAPACITY_RETRIES_ON_RESET
>>> reties, and the other errors (not including medium not present) get
>>> up
>>> to 3 retries.
>>
>> This is ok, but - just a thought - would it make sense to add a field
>> for maximum total retry count (summed over all failures)? That would
>> allow us to minimize behavior changes also in other cases.
> 
> Could we perhaps use scmd->allowed for this purpose?
> 
> I noted that several callers of scsi_execute_cmd() in your patch set
> set the allowed parameter, e.g. to sdkp->max_retries in 07/34.
> But allowed doesn't seem to be used at all in the passthrough case,

I think scmd->allowed is still used for errors that don't hit the
check_type goto in scsi_no_retry_cmd.

If the user sets up scsi failures for only UAs, and we got a
DID_TRANSPORT_DISRUPTED then we we do scsi_cmd_retry_allowed which
checks scmd->allowed.

In scsi_noretry_cmd we then hit the:

        if (!scsi_status_is_check_condition(scmd->result))
                return false;

and will retry.

> so we might as well use it as an upper limit for the total number of
> retries.
> 

If you really want an overall retry count let me know. I can just add
a total limit to the scsi_failures struct. You have been the only person
to ask about it and it didn't sound like you were sure if you wanted it
or not, so I haven't done it.
Martin Wilck Sept. 18, 2023, 4:48 p.m. UTC | #4
On Sun, 2023-09-17 at 19:35 -0500, Mike Christie wrote:
> On 9/15/23 4:34 PM, Martin Wilck wrote:
> > On Fri, 2023-09-15 at 22:21 +0200, Martin Wilck wrote:
> > > On Tue, 2023-09-05 at 18:15 -0500, Mike Christie wrote:
> > > > This has read_capacity_16 have scsi-ml retry errors instead of
> > > > driving
> > > > them itself.
> > > > 
> > > > There are 2 behavior changes with this patch:
> > > > 1. There is one behavior change where we no longer retry when
> > > > scsi_execute_cmd returns < 0, but we should be ok. We don't
> > > > need to
> > > > retry
> > > > for failures like the queue being removed, and for the case
> > > > where
> > > > there
> > > > are no tags/reqs since the block layer waits/retries for us.
> > > > For
> > > > possible
> > > > memory allocation failures from blk_rq_map_kern we use
> > > > GFP_NOIO, so
> > > > retrying will probably not help.
> > > > 2. For the specific UAs we checked for and retried, we would
> > > > get
> > > > READ_CAPACITY_RETRIES_ON_RESET retries plus whatever retries
> > > > were
> > > > left
> > > > from the loop's retries. Each UA now gets
> > > > READ_CAPACITY_RETRIES_ON_RESET
> > > > reties, and the other errors (not including medium not present)
> > > > get
> > > > up
> > > > to 3 retries.
> > > 
> > > This is ok, but - just a thought - would it make sense to add a
> > > field
> > > for maximum total retry count (summed over all failures)? That
> > > would
> > > allow us to minimize behavior changes also in other cases.
> > 
> > Could we perhaps use scmd->allowed for this purpose?
> > 
> > I noted that several callers of scsi_execute_cmd() in your patch
> > set
> > set the allowed parameter, e.g. to sdkp->max_retries in 07/34.
> > But allowed doesn't seem to be used at all in the passthrough case,
> 
> I think scmd->allowed is still used for errors that don't hit the
> check_type goto in scsi_no_retry_cmd.
> 
> If the user sets up scsi failures for only UAs, and we got a
> DID_TRANSPORT_DISRUPTED then we we do scsi_cmd_retry_allowed which
> checks scmd->allowed.
> 
> In scsi_noretry_cmd we then hit the:
> 
>         if (!scsi_status_is_check_condition(scmd->result))
>                 return false;
> 
> and will retry.

Right. But that's pretty confusing. Actually, I am confused about some
other things as well. 

I apologize for taking a step back and asking some more questions and
presenting some more thoughts about your patch set as a whole.

For passthrough commands, the simplified logic is now:

scsi_decide_disposition() 
{
         if (!scsi_device_online)
                return SUCCESS;
         if ((rtn = scsi_check_passthrough(scmd)) != SCSI_RETURN_NOT_HANDLED)
                return rtn;

         /* handle host byte for regular commands, 
            may return SUCESS, NEEDS_RETRY/ADD_TO_MLQUEUE, 
            FAILED, fall through, or jump to maybe_retry */

         /* handle status byte for regular commands, likewise */
          
 maybe_retry: /* [2] */
         if (scsi_cmd_retry_allowed(scmd) && !scsi_noretry_cmd(scmd))
                return NEEDS_RETRY;
         else
                return SUCCESS;
}

where scsi_noretry_cmd() has special treatment for passthrough commands
in certain cases (DID_TIME_OUT and CHECK_CONDITION).

Because you put the call to scsi_check_passthrough() before the
standard checks, some retries that the mid layer used to do will now
not be done if scsi_check_passthrough() returns SUCCESS. Also,
depending on the list of failures passed with the command, we might
miss the clauses in scsi_decide_disposition() where we previously
returned FAILED (I haven't reviewed the current callers, but at least
in theory it can happen). This will obviously change the runtime
behavior, in ways I wouldn't bet can't be detrimental.

Before your patch set, the checks we now do in scsi_check_passthrough()
were only performed in the case where the "regular command checks"
returned SUCCESS. If we want as little behavior change as possible, the
SUCCESS case is where we should plug in the call to
scsi_check_passthrough(). Or am I missing something? [1]

This way we'd preserve the current semantics of "retries" and "allowed"
which used to relate only to what were the "mid-layer retries" before
your patch set.

> > so we might as well use it as an upper limit for the total number of
> > retries.
> > 
> 
> If you really want an overall retry count let me know. I can just add
> a total limit to the scsi_failures struct. You have been the only person
> to ask about it and it didn't sound like you were sure if you wanted it
> or not, so I haven't done it.

The question whether we want an additional limit for the total "failure
retry" count is orthogonal to the above. My personal opinion is that we
do, as almost all call sites that I've seen in your set currently use
just a single retry count for all failures they handle.

I'm not sure whether the separate total retry count would have
practical relevance. I think that in most practical cases we won't have
a mix of standard "ML" retries and multiple different failure cases. I
suppose that in any given situation, it's more likely that there's a
single error condition which repeats.

I'm also unsure whether all this theoretical reasoning of mine warrants
you putting even more effort into this patch set, which has seen 11
revisions already. In general, you have much more experience with SCSI
error handling than I do.

Martin


[1] If we did that, we'd could also take into account that previously
the caller would have called scsi_execute_cmd() again, which would have
reset cmd->retries; so we could reset this field if
scsi_check_passthrough() decides to retry the command.

[2] It's weird, although unrelated to your patch set, that in the
maybe_retry clause we increment cmd->retries before checking
scsi_noretry_cmd().
Mike Christie Sept. 18, 2023, 6:45 p.m. UTC | #5
On 9/18/23 11:48 AM, Martin Wilck wrote:
> On Sun, 2023-09-17 at 19:35 -0500, Mike Christie wrote:
>> On 9/15/23 4:34 PM, Martin Wilck wrote:
>>> On Fri, 2023-09-15 at 22:21 +0200, Martin Wilck wrote:
>>>> On Tue, 2023-09-05 at 18:15 -0500, Mike Christie wrote:
>>>>> This has read_capacity_16 have scsi-ml retry errors instead of
>>>>> driving
>>>>> them itself.
>>>>>
>>>>> There are 2 behavior changes with this patch:
>>>>> 1. There is one behavior change where we no longer retry when
>>>>> scsi_execute_cmd returns < 0, but we should be ok. We don't
>>>>> need to
>>>>> retry
>>>>> for failures like the queue being removed, and for the case
>>>>> where
>>>>> there
>>>>> are no tags/reqs since the block layer waits/retries for us.
>>>>> For
>>>>> possible
>>>>> memory allocation failures from blk_rq_map_kern we use
>>>>> GFP_NOIO, so
>>>>> retrying will probably not help.
>>>>> 2. For the specific UAs we checked for and retried, we would
>>>>> get
>>>>> READ_CAPACITY_RETRIES_ON_RESET retries plus whatever retries
>>>>> were
>>>>> left
>>>>> from the loop's retries. Each UA now gets
>>>>> READ_CAPACITY_RETRIES_ON_RESET
>>>>> reties, and the other errors (not including medium not present)
>>>>> get
>>>>> up
>>>>> to 3 retries.
>>>>
>>>> This is ok, but - just a thought - would it make sense to add a
>>>> field
>>>> for maximum total retry count (summed over all failures)? That
>>>> would
>>>> allow us to minimize behavior changes also in other cases.
>>>
>>> Could we perhaps use scmd->allowed for this purpose?
>>>
>>> I noted that several callers of scsi_execute_cmd() in your patch
>>> set
>>> set the allowed parameter, e.g. to sdkp->max_retries in 07/34.
>>> But allowed doesn't seem to be used at all in the passthrough case,
>>
>> I think scmd->allowed is still used for errors that don't hit the
>> check_type goto in scsi_no_retry_cmd.
>>
>> If the user sets up scsi failures for only UAs, and we got a
>> DID_TRANSPORT_DISRUPTED then we we do scsi_cmd_retry_allowed which
>> checks scmd->allowed.
>>
>> In scsi_noretry_cmd we then hit the:
>>
>>         if (!scsi_status_is_check_condition(scmd->result))
>>                 return false;
>>
>> and will retry.
> 
> Right. But that's pretty confusing. Actually, I am confused about some
> other things as well. 
> 
> I apologize for taking a step back and asking some more questions and
> presenting some more thoughts about your patch set as a whole.
> 
> For passthrough commands, the simplified logic is now:
> 
> scsi_decide_disposition() 
> {
>          if (!scsi_device_online)
>                 return SUCCESS;
>          if ((rtn = scsi_check_passthrough(scmd)) != SCSI_RETURN_NOT_HANDLED)
>                 return rtn;
> 
>          /* handle host byte for regular commands, 
>             may return SUCESS, NEEDS_RETRY/ADD_TO_MLQUEUE, 
>             FAILED, fall through, or jump to maybe_retry */
> 
>          /* handle status byte for regular commands, likewise */
>           
>  maybe_retry: /* [2] */
>          if (scsi_cmd_retry_allowed(scmd) && !scsi_noretry_cmd(scmd))
>                 return NEEDS_RETRY;
>          else
>                 return SUCCESS;
> }
> 
> where scsi_noretry_cmd() has special treatment for passthrough commands
> in certain cases (DID_TIME_OUT and CHECK_CONDITION).
> 
> Because you put the call to scsi_check_passthrough() before the
> standard checks, some retries that the mid layer used to do will now
> not be done if scsi_check_passthrough() returns SUCCESS. Also,

Yeah, I did that on purpose to give scsi_execute_cmd more control over
whether to retry or not. I think you are looking at this more like
we want to be able to retry when scsi-ml decided not to.

For example, I was thinking multipath related code like the scsi_dh handlers
would want to fail for cases scsi-ml was currently retrying. Right now they
are setting REQ_FAILFAST_TRANSPORT but for most drivers that's not doing what
they think it does. Only DID_BUS_BUSY fast fails and I think the scsi_dh
failover code wants other errors like DID_TRANSPORT_DISRUPTED to fail so the
caller can decide based on something like the dm-multipath pg retries.


> depending on the list of failures passed with the command, we might
> miss the clauses in scsi_decide_disposition() where we previously
> returned FAILED (I haven't reviewed the current callers, but at least
> in theory it can happen). This will obviously change the runtime
> behavior, in ways I wouldn't bet can't be detrimental.


I think it's reverse of what you are thinking for the FAILED case
but I agree it's wrong (wrong but for different reasons).

If scsi_decide_disposition returns FAILED then runtime is bad, because
the scsi eh will start and then we have to wait for that.

The scsi_execute_cmd user can now actually bypass the EH for FAILED
so runtime will be shorter. However, I agree that it's wrong we can
bypass the EH because in some cases we need to kick the device or
cleanup cmds. So that should be fixed for sure and we should always
start the EH and go through that code path.


> 
> Before your patch set, the checks we now do in scsi_check_passthrough()
> were only performed in the case where the "regular command checks"
> returned SUCCESS. If we want as little behavior change as possible, the
> SUCCESS case is where we should plug in the call to
> scsi_check_passthrough(). Or am I missing something? [1]
> 
> This way we'd preserve the current semantics of "retries" and "allowed"
> which used to relate only to what were the "mid-layer retries" before
> your patch set.

It looks like we had different priorities. I was trying to allow
scsi_execute_cmd to be able to override scsi-ml retries, and not just allow
us to retry if scsi-ml decided not to.

Given I messed up on the FAILED case above, I agree doing the less
risky approach is better now. We can change it for multipath some other
day.


> 
>>> so we might as well use it as an upper limit for the total number of
>>> retries.
>>>
>>
>> If you really want an overall retry count let me know. I can just add
>> a total limit to the scsi_failures struct. You have been the only person
>> to ask about it and it didn't sound like you were sure if you wanted it
>> or not, so I haven't done it.
> 
> The question whether we want an additional limit for the total "failure
> retry" count is orthogonal to the above. My personal opinion is that we
> do, as almost all call sites that I've seen in your set currently use
> just a single retry count for all failures they handle.
> 
> I'm not sure whether the separate total retry count would have
> practical relevance. I think that in most practical cases we won't have
> a mix of standard "ML" retries and multiple different failure cases. I
> suppose that in any given situation, it's more likely that there's a
> single error condition which repeats.

I'm not sure what you are saying in the 2 paragraphs above.

We have:

1. scsi_execute_cmd users like read_capacity_10 which will retry the
device reset UA 10 times. Then it can retry that error 3 more time
(this was probably not intentional so can be ignored) and it can retry
any error other error except medium not present 3 times.

And then it calls scsi_execute_cmd with sdkp->max_retries so the scsi-ml
retried are whatever the user requested and 5 by default.

I think we wanted to keep this behavior.

2.  Then, for the initial device setup/discovery tests where the transport is
flakey during device discovery/setup, we can hit the DID_TRANSPORT_DISRUPTED
that multiple times, then we will get UAs after we relogin/reset the
transport/device. So I think we want to keep the behavior where a user
does

+       struct scsi_failure failures[] = {
+               {
+                       .sense = UNIT_ATTENTION,
+                       .asc = SCMD_FAILURE_ASC_ANY,
+                       .ascq = SCMD_FAILURE_ASCQ_ANY,
+                       .allowed = UA_RETRY_COUNT,
+                       .result = SAM_STAT_CHECK_CONDITION,
+               },
+               {}
+       };


scsi_execute_cmd(.... $NORMAL_CMD_RETRIES)

then we can retry transport errors NORMAL_CMD_RETRIES then once those
settle still retry UAs UA_RETRY_COUNT times.

3. Then we have the cases where the scsi_execute_cmd user retried
multiple errors and in this patchset we used to retry a total of N
times, but now can retry each error up to N times. For this it sounds
like you want to code it so we only do a total of N times so the
behavior is the same as before.

To handle all these I think I need the extra allowed field on the
scsi_failures.
Martin Wilck Sept. 19, 2023, 9:07 a.m. UTC | #6
On Mon, 2023-09-18 at 13:45 -0500, Mike Christie wrote:
> On 9/18/23 11:48 AM, Martin Wilck wrote:
> > On Sun, 2023-09-17 at 19:35 -0500, Mike Christie wrote:
> > > On 9/15/23 4:34 PM, Martin Wilck wrote:
> > > > On Fri, 2023-09-15 at 22:21 +0200, Martin Wilck wrote:
> > > > > On Tue, 2023-09-05 at 18:15 -0500, Mike Christie wrote:
> > > > > > This has read_capacity_16 have scsi-ml retry errors instead
> > > > > > of
> > > > > > driving
> > > > > > them itself.
> > > > > > 
> > > > > > There are 2 behavior changes with this patch:
> > > > > > 1. There is one behavior change where we no longer retry
> > > > > > when
> > > > > > scsi_execute_cmd returns < 0, but we should be ok. We don't
> > > > > > need to
> > > > > > retry
> > > > > > for failures like the queue being removed, and for the case
> > > > > > where
> > > > > > there
> > > > > > are no tags/reqs since the block layer waits/retries for
> > > > > > us.
> > > > > > For
> > > > > > possible
> > > > > > memory allocation failures from blk_rq_map_kern we use
> > > > > > GFP_NOIO, so
> > > > > > retrying will probably not help.
> > > > > > 2. For the specific UAs we checked for and retried, we
> > > > > > would
> > > > > > get
> > > > > > READ_CAPACITY_RETRIES_ON_RESET retries plus whatever
> > > > > > retries
> > > > > > were
> > > > > > left
> > > > > > from the loop's retries. Each UA now gets
> > > > > > READ_CAPACITY_RETRIES_ON_RESET
> > > > > > reties, and the other errors (not including medium not
> > > > > > present)
> > > > > > get
> > > > > > up
> > > > > > to 3 retries.
> > > > > 
> > > > > This is ok, but - just a thought - would it make sense to add
> > > > > a
> > > > > field
> > > > > for maximum total retry count (summed over all failures)?
> > > > > That
> > > > > would
> > > > > allow us to minimize behavior changes also in other cases.
> > > > 
> > > > Could we perhaps use scmd->allowed for this purpose?
> > > > 
> > > > I noted that several callers of scsi_execute_cmd() in your
> > > > patch
> > > > set
> > > > set the allowed parameter, e.g. to sdkp->max_retries in 07/34.
> > > > But allowed doesn't seem to be used at all in the passthrough
> > > > case,
> > > 
> > > I think scmd->allowed is still used for errors that don't hit the
> > > check_type goto in scsi_no_retry_cmd.
> > > 
> > > If the user sets up scsi failures for only UAs, and we got a
> > > DID_TRANSPORT_DISRUPTED then we we do scsi_cmd_retry_allowed
> > > which
> > > checks scmd->allowed.
> > > 
> > > In scsi_noretry_cmd we then hit the:
> > > 
> > >         if (!scsi_status_is_check_condition(scmd->result))
> > >                 return false;
> > > 
> > > and will retry.
> > 
> > Right. But that's pretty confusing. Actually, I am confused about
> > some
> > other things as well. 
> > 
> > I apologize for taking a step back and asking some more questions
> > and
> > presenting some more thoughts about your patch set as a whole.
> > 
> > For passthrough commands, the simplified logic is now:
> > 
> > scsi_decide_disposition() 
> > {
> >          if (!scsi_device_online)
> >                 return SUCCESS;
> >          if ((rtn = scsi_check_passthrough(scmd)) !=
> > SCSI_RETURN_NOT_HANDLED)
> >                 return rtn;
> > 
> >          /* handle host byte for regular commands, 
> >             may return SUCESS, NEEDS_RETRY/ADD_TO_MLQUEUE, 
> >             FAILED, fall through, or jump to maybe_retry */
> > 
> >          /* handle status byte for regular commands, likewise */
> >           
> >  maybe_retry: /* [2] */
> >          if (scsi_cmd_retry_allowed(scmd) &&
> > !scsi_noretry_cmd(scmd))
> >                 return NEEDS_RETRY;
> >          else
> >                 return SUCCESS;
> > }
> > 
> > where scsi_noretry_cmd() has special treatment for passthrough
> > commands
> > in certain cases (DID_TIME_OUT and CHECK_CONDITION).
> > 
> > Because you put the call to scsi_check_passthrough() before the
> > standard checks, some retries that the mid layer used to do will
> > now
> > not be done if scsi_check_passthrough() returns SUCCESS. Also,
> 
> Yeah, I did that on purpose to give scsi_execute_cmd more control
> over
> whether to retry or not. I think you are looking at this more like
> we want to be able to retry when scsi-ml decided not to.

I am not saying that giving the failure checks more control is wrong; I
lack experience to judge that. I was just pointing out a change in
behavior that wasn't obvious to me from the outset. IOW, I'd appreciate
a clear separation between the  formal change that moves failure
treatment for passthrough commands into the mid layer, and the semantic
changes regarding the ordering and precedence of the various cases in
scsi_decide_disposition().

> For example, I was thinking multipath related code like the scsi_dh
> handlers
> would want to fail for cases scsi-ml was currently retrying. Right
> now they
> are setting REQ_FAILFAST_TRANSPORT but for most drivers that's not
> doing what
> they think it does. Only DID_BUS_BUSY fast fails and I think the
> scsi_dh
> failover code wants other errors like DID_TRANSPORT_DISRUPTED to fail
> so the
> caller can decide based on something like the dm-multipath pg
> retries.

This makes sense for device handlers, but not so much for other
callers. Do we need to discuss a separate approach for commands sent by
device handlers?

Current device handlers have been written with the standard ML retry
handling in mind. The device handler changes in your patch set apply to
special CHECK CONDITION situations. These checks are executed before
the host byte checks in scsi_decide_disposition(). It's not obvious
from the code that none of these conditions can trigger if the host
byte is set [1], which would seem wrong. Perhaps we need separate
scsi_check_passthrough() handlers for host byte and status byte?

> > depending on the list of failures passed with the command, we might
> > miss the clauses in scsi_decide_disposition() where we previously
> > returned FAILED (I haven't reviewed the current callers, but at
> > least
> > in theory it can happen). This will obviously change the runtime
> > behavior, in ways I wouldn't bet can't be detrimental.
> 
> 
> I think it's reverse of what you are thinking for the FAILED case
> but I agree it's wrong (wrong but for different reasons).
> 
> If scsi_decide_disposition returns FAILED then runtime is bad,
> because
> the scsi eh will start and then we have to wait for that.
> 
> The scsi_execute_cmd user can now actually bypass the EH for FAILED
> so runtime will be shorter. However, I agree that it's wrong we can
> bypass the EH because in some cases we need to kick the device or
> cleanup cmds. So that should be fixed for sure and we should always
> start the EH and go through that code path.

Thanks for clarifying. I think I actually meant this, but my mind was
somewhat shady :-)

> > 
> > Before your patch set, the checks we now do in
> > scsi_check_passthrough()
> > were only performed in the case where the "regular command checks"
> > returned SUCCESS. If we want as little behavior change as possible,
> > the
> > SUCCESS case is where we should plug in the call to
> > scsi_check_passthrough(). Or am I missing something? [1]
> > 
> > This way we'd preserve the current semantics of "retries" and
> > "allowed"
> > which used to relate only to what were the "mid-layer retries"
> > before
> > your patch set.
> 
> It looks like we had different priorities. I was trying to allow
> scsi_execute_cmd to be able to override scsi-ml retries, and not just
> allow
> us to retry if scsi-ml decided not to.

As I said above, I have nothing in general against this approach.
I would just like to separate it logically from the part of the patch
set that simply moves code from HL to ML. And obviously, changes in the
logic of scsi_decide_disposition() need in-depth review, if possible by
better qualified people than myself. I've stared at this code a lot of
times, but I wouldn't say I understand the entire decision tree.

> Given I messed up on the FAILED case above, I agree doing the less
> risky approach is better now. We can change it for multipath some
> other
> day.

AFAICT, dm-multipath + scsi device handlers are currently doing a
decent job. If we want improvements in this area, we should cover ALUA,
too, which would probably require a different approach, as the ALUA
handler's logic in alua_rtpg() can't be easily mapped to a "struct
failure" array.

> 
> > 
> > > > so we might as well use it as an upper limit for the total
> > > > number of
> > > > retries.
> > > > 
> > > 
> > > If you really want an overall retry count let me know. I can just
> > > add
> > > a total limit to the scsi_failures struct. You have been the only
> > > person
> > > to ask about it and it didn't sound like you were sure if you
> > > wanted it
> > > or not, so I haven't done it.
> > 
> > The question whether we want an additional limit for the total
> > "failure
> > retry" count is orthogonal to the above. My personal opinion is
> > that we
> > do, as almost all call sites that I've seen in your set currently
> > use
> > just a single retry count for all failures they handle.
> > 
> > I'm not sure whether the separate total retry count would have
> > practical relevance. I think that in most practical cases we won't
> > have
> > a mix of standard "ML" retries and multiple different failure
> > cases. I
> > suppose that in any given situation, it's more likely that there's
> > a
> > single error condition which repeats.
> 
> I'm not sure what you are saying in the 2 paragraphs above.
> 
> We have:
> 
> 1. scsi_execute_cmd users like read_capacity_10 which will retry the
> device reset UA 10 times. Then it can retry that error 3 more time
> (this was probably not intentional so can be ignored) and it can
> retry
> any error other error except medium not present 3 times.
> 
> And then it calls scsi_execute_cmd with sdkp->max_retries so the
> scsi-ml
> retried are whatever the user requested and 5 by default.
> 
> I think we wanted to keep this behavior.

I don't object. In this case, if we had a "total failure retries"
field, it would be set to infinity (or 13).

When I talked about a 'mix of standard "ML" retries and multiple
different failure cases' and its practical relevance, I had some weird
hypothetical devices in mind. Ignore it if you wish, or look at [2].

> 2.  Then, for the initial device setup/discovery tests where the
> transport is
> flakey during device discovery/setup, we can hit the
> DID_TRANSPORT_DISRUPTED
> that multiple times, then we will get UAs after we relogin/reset the
> transport/device. So I think we want to keep the behavior where a
> user
> does
> 
> +       struct scsi_failure failures[] = {
> +               {
> +                       .sense = UNIT_ATTENTION,
> +                       .asc = SCMD_FAILURE_ASC_ANY,
> +                       .ascq = SCMD_FAILURE_ASCQ_ANY,
> +                       .allowed = UA_RETRY_COUNT,
> +                       .result = SAM_STAT_CHECK_CONDITION,
> +               },
> +               {}
> +       };
> 
> 
> scsi_execute_cmd(.... $NORMAL_CMD_RETRIES)
> 
> then we can retry transport errors NORMAL_CMD_RETRIES then once those
> settle still retry UAs UA_RETRY_COUNT times.

Yes, sure. I already agreed that we still need the standard "allowed"
field in its current form, and that we can't re-use it for failures.
In this case, with just one element in the failures array, a total
failure count wouldn't matter at all.
> 
> 3. Then we have the cases where the scsi_execute_cmd user retried
> multiple errors and in this patchset we used to retry a total of N
> times, but now can retry each error up to N times. For this it sounds
> like you want to code it so we only do a total of N times so the
> behavior is the same as before.
> 
> To handle all these I think I need the extra allowed field on the
> scsi_failures.
> 

Right. My impression was that some callers wanted to limit the overall
number of retries. But it's difficult to say what the original
intention of the authors was, and how much thought they put into it in
the first place.

Martin

[1] I think that if the host byte is set we never have a CC status and
valid sense data. But the host byte is ultimately up to drivers, so
this is not obvious from the code, and hard to prove.

[2] My thinking was as follows: the e.g. for read_capacity_10(), the
SCSI result can be a) CC/UA/3a/00 (device reset, which read_capacity_10
will retry 10x), or b) some code that ML retries (e.g. CC/UA/04/01), or
c) anything else, which s_r_10 will retry 3x. Case b) used to be
retried up to 3*sdkp->max_retries times, while with your patch it will
only be retried sdkp->max_retries 1times. 

It gets more confusing if you consider a device that would return these
error codes in an alternating fashion. Consider a device that returns
CC/UA/04/01, CC/UA/3a/00, CC/UA/04/01, CC/UA/3a/00,...
Previously, the CC/UA/04/01 would be hidden in the ML, because the ML
retry count would be reset with every call to scsi_execute_cmd(), and
CC/UA/3a/00 would be repeated 10x. Now, CC/UA/04/01 will use up retries
(first those from your SCMD_FAILURE_RESULT_ANY case, then the ML
retries) and eventually fail before the 10 retries for the device reset
case are exhausted. You can imagine out even more tricky scenarios.

This kind of weirdness is what I meant with 'a mix of standard "ML"
retries and multiple different failure cases'. Such cases are treated
differently now than they used to be. I wanted to point that out, and
at the same time I wanted to say that I doubt this has practical
relevance. *Real* devices will return CC/UA/3a/00 a couple of times in
a row, and then some other error, like you described.
Mike Christie Sept. 19, 2023, 6:02 p.m. UTC | #7
On 9/19/23 4:07 AM, Martin Wilck wrote:
> On Mon, 2023-09-18 at 13:45 -0500, Mike Christie wrote:
>> On 9/18/23 11:48 AM, Martin Wilck wrote:
>>> On Sun, 2023-09-17 at 19:35 -0500, Mike Christie wrote:
>>>> On 9/15/23 4:34 PM, Martin Wilck wrote:
>>>>> On Fri, 2023-09-15 at 22:21 +0200, Martin Wilck wrote:
>>>>>> On Tue, 2023-09-05 at 18:15 -0500, Mike Christie wrote:
>>>>>>> This has read_capacity_16 have scsi-ml retry errors instead
>>>>>>> of
>>>>>>> driving
>>>>>>> them itself.
>>>>>>>
>>>>>>> There are 2 behavior changes with this patch:
>>>>>>> 1. There is one behavior change where we no longer retry
>>>>>>> when
>>>>>>> scsi_execute_cmd returns < 0, but we should be ok. We don't
>>>>>>> need to
>>>>>>> retry
>>>>>>> for failures like the queue being removed, and for the case
>>>>>>> where
>>>>>>> there
>>>>>>> are no tags/reqs since the block layer waits/retries for
>>>>>>> us.
>>>>>>> For
>>>>>>> possible
>>>>>>> memory allocation failures from blk_rq_map_kern we use
>>>>>>> GFP_NOIO, so
>>>>>>> retrying will probably not help.
>>>>>>> 2. For the specific UAs we checked for and retried, we
>>>>>>> would
>>>>>>> get
>>>>>>> READ_CAPACITY_RETRIES_ON_RESET retries plus whatever
>>>>>>> retries
>>>>>>> were
>>>>>>> left
>>>>>>> from the loop's retries. Each UA now gets
>>>>>>> READ_CAPACITY_RETRIES_ON_RESET
>>>>>>> reties, and the other errors (not including medium not
>>>>>>> present)
>>>>>>> get
>>>>>>> up
>>>>>>> to 3 retries.
>>>>>>
>>>>>> This is ok, but - just a thought - would it make sense to add
>>>>>> a
>>>>>> field
>>>>>> for maximum total retry count (summed over all failures)?
>>>>>> That
>>>>>> would
>>>>>> allow us to minimize behavior changes also in other cases.
>>>>>
>>>>> Could we perhaps use scmd->allowed for this purpose?
>>>>>
>>>>> I noted that several callers of scsi_execute_cmd() in your
>>>>> patch
>>>>> set
>>>>> set the allowed parameter, e.g. to sdkp->max_retries in 07/34.
>>>>> But allowed doesn't seem to be used at all in the passthrough
>>>>> case,
>>>>
>>>> I think scmd->allowed is still used for errors that don't hit the
>>>> check_type goto in scsi_no_retry_cmd.
>>>>
>>>> If the user sets up scsi failures for only UAs, and we got a
>>>> DID_TRANSPORT_DISRUPTED then we we do scsi_cmd_retry_allowed
>>>> which
>>>> checks scmd->allowed.
>>>>
>>>> In scsi_noretry_cmd we then hit the:
>>>>
>>>>         if (!scsi_status_is_check_condition(scmd->result))
>>>>                 return false;
>>>>
>>>> and will retry.
>>>
>>> Right. But that's pretty confusing. Actually, I am confused about
>>> some
>>> other things as well. 
>>>
>>> I apologize for taking a step back and asking some more questions
>>> and
>>> presenting some more thoughts about your patch set as a whole.
>>>
>>> For passthrough commands, the simplified logic is now:
>>>
>>> scsi_decide_disposition() 
>>> {
>>>          if (!scsi_device_online)
>>>                 return SUCCESS;
>>>          if ((rtn = scsi_check_passthrough(scmd)) !=
>>> SCSI_RETURN_NOT_HANDLED)
>>>                 return rtn;
>>>
>>>          /* handle host byte for regular commands, 
>>>             may return SUCESS, NEEDS_RETRY/ADD_TO_MLQUEUE, 
>>>             FAILED, fall through, or jump to maybe_retry */
>>>
>>>          /* handle status byte for regular commands, likewise */
>>>           
>>>  maybe_retry: /* [2] */
>>>          if (scsi_cmd_retry_allowed(scmd) &&
>>> !scsi_noretry_cmd(scmd))
>>>                 return NEEDS_RETRY;
>>>          else
>>>                 return SUCCESS;
>>> }
>>>
>>> where scsi_noretry_cmd() has special treatment for passthrough
>>> commands
>>> in certain cases (DID_TIME_OUT and CHECK_CONDITION).
>>>
>>> Because you put the call to scsi_check_passthrough() before the
>>> standard checks, some retries that the mid layer used to do will
>>> now
>>> not be done if scsi_check_passthrough() returns SUCCESS. Also,
>>
>> Yeah, I did that on purpose to give scsi_execute_cmd more control
>> over
>> whether to retry or not. I think you are looking at this more like
>> we want to be able to retry when scsi-ml decided not to.
> 
> I am not saying that giving the failure checks more control is wrong; I
> lack experience to judge that. I was just pointing out a change in
> behavior that wasn't obvious to me from the outset. IOW, I'd appreciate
> a clear separation between the  formal change that moves failure
> treatment for passthrough commands into the mid layer, and the semantic
> changes regarding the ordering and precedence of the various cases in
> scsi_decide_disposition().

Yeah, agree now. The email is a mess because I wrote that below. I
should have written that here to save you time :)


> 
>> For example, I was thinking multipath related code like the scsi_dh
>> handlers
>> would want to fail for cases scsi-ml was currently retrying. Right
>> now they
>> are setting REQ_FAILFAST_TRANSPORT but for most drivers that's not
>> doing what
>> they think it does. Only DID_BUS_BUSY fast fails and I think the
>> scsi_dh
>> failover code wants other errors like DID_TRANSPORT_DISRUPTED to fail
>> so the
>> caller can decide based on something like the dm-multipath pg
>> retries.
> 
> This makes sense for device handlers, but not so much for other
> callers. Do we need to discuss a separate approach for commands sent by
> device handlers?
> 
> Current device handlers have been written with the standard ML retry
> handling in mind. The device handler changes in your patch set apply to
> special CHECK CONDITION situations. These checks are executed before
> the host byte checks in scsi_decide_disposition(). It's not obvious
> from the code that none of these conditions can trigger if the host
> byte is set [1], which would seem wrong. Perhaps we need separate
> scsi_check_passthrough() handlers for host byte and status byte?

I think we are talking about different things. I can do what you want
first then go for the behavior change later. We can discuss that later.
The second/other patchset will be smaller so it will be easier to
discuss.


Snip the chunk below because I agree with you.

And for the total retry issue, since I'm going to try and keep the
existing behavior as much as possible, I'll add code for that. It
will be easier to discuss code then.
diff mbox series

Patch

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 70178c1f3f8e..69188b13f5e2 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -2363,55 +2363,91 @@  static void read_capacity_error(struct scsi_disk *sdkp, struct scsi_device *sdp,
 static int read_capacity_16(struct scsi_disk *sdkp, struct scsi_device *sdp,
 						unsigned char *buffer)
 {
-	unsigned char cmd[16];
-	struct scsi_sense_hdr sshdr;
-	const struct scsi_exec_args exec_args = {
-		.sshdr = &sshdr,
+	static const u8 cmd[16] = {
+		[0] = SERVICE_ACTION_IN_16,
+		[1] = SAI_READ_CAPACITY_16,
+		[13] = RC16_LEN,
 	};
+	struct scsi_sense_hdr sshdr;
 	int sense_valid = 0;
 	int the_result;
-	int retries = 3, reset_retries = READ_CAPACITY_RETRIES_ON_RESET;
 	unsigned int alignment;
 	unsigned long long lba;
 	unsigned sector_size;
+	struct scsi_failure failures[] = {
+		/*
+		 * Fail immediately for Invalid Command Operation Code or
+		 * Invalid Field in CDB.
+		 */
+		{
+			.sense = ILLEGAL_REQUEST,
+			.asc = 0x20,
+			.allowed = 0,
+			.result = SAM_STAT_CHECK_CONDITION,
+		},
+		{
+			.sense = ILLEGAL_REQUEST,
+			.asc = 0x24,
+			.allowed = 0,
+			.result = SAM_STAT_CHECK_CONDITION,
+		},
+		/* Fail immediately for Medium Not Present */
+		{
+			.sense = UNIT_ATTENTION,
+			.asc = 0x3A,
+			.allowed = 0,
+			.result = SAM_STAT_CHECK_CONDITION,
+		},
+		{
+			.sense = NOT_READY,
+			.asc = 0x3A,
+			.ascq = 0x0,
+			.allowed = 0,
+			.result = SAM_STAT_CHECK_CONDITION,
+		},
+		{
+			.sense = UNIT_ATTENTION,
+			.asc = 0x29,
+			/* Device reset might occur several times */
+			.allowed = READ_CAPACITY_RETRIES_ON_RESET,
+			.result = SAM_STAT_CHECK_CONDITION,
+		},
+		/* Any other error not listed above retry */
+		{
+			.result = SCMD_FAILURE_RESULT_ANY,
+			.allowed = 3,
+		},
+		{}
+	};
+	const struct scsi_exec_args exec_args = {
+		.sshdr = &sshdr,
+		.failures = failures,
+	};
 
 	if (sdp->no_read_capacity_16)
 		return -EINVAL;
 
-	do {
-		memset(cmd, 0, 16);
-		cmd[0] = SERVICE_ACTION_IN_16;
-		cmd[1] = SAI_READ_CAPACITY_16;
-		cmd[13] = RC16_LEN;
-		memset(buffer, 0, RC16_LEN);
-
-		the_result = scsi_execute_cmd(sdp, cmd, REQ_OP_DRV_IN,
-					      buffer, RC16_LEN, SD_TIMEOUT,
-					      sdkp->max_retries, &exec_args);
-		if (the_result > 0) {
-			if (media_not_present(sdkp, &sshdr))
-				return -ENODEV;
+	memset(buffer, 0, RC16_LEN);
 
-			sense_valid = scsi_sense_valid(&sshdr);
-			if (sense_valid &&
-			    sshdr.sense_key == ILLEGAL_REQUEST &&
-			    (sshdr.asc == 0x20 || sshdr.asc == 0x24) &&
-			    sshdr.ascq == 0x00)
-				/* Invalid Command Operation Code or
-				 * Invalid Field in CDB, just retry
-				 * silently with RC10 */
-				return -EINVAL;
-			if (sense_valid &&
-			    sshdr.sense_key == UNIT_ATTENTION &&
-			    sshdr.asc == 0x29 && sshdr.ascq == 0x00)
-				/* Device reset might occur several times,
-				 * give it one more chance */
-				if (--reset_retries > 0)
-					continue;
-		}
-		retries--;
+	the_result = scsi_execute_cmd(sdp, cmd, REQ_OP_DRV_IN, buffer,
+				      RC16_LEN, SD_TIMEOUT, sdkp->max_retries,
+				      &exec_args);
 
-	} while (the_result && retries);
+	if (the_result > 0) {
+		if (media_not_present(sdkp, &sshdr))
+			return -ENODEV;
+
+		sense_valid = scsi_sense_valid(&sshdr);
+		if (sense_valid && sshdr.sense_key == ILLEGAL_REQUEST &&
+		    (sshdr.asc == 0x20 || sshdr.asc == 0x24) &&
+		     sshdr.ascq == 0x00) {
+			/*
+			 * Invalid Command Operation Code or Invalid Field in
+			 * CDB, just retry silently with RC10
+			 */
+			return -EINVAL;
+		}
+	}
 
 	if (the_result) {
 		sd_print_result(sdkp, "Read Capacity(16) failed", the_result);