Message ID | 1476384477-3060-1-git-send-email-krisman@linux.vnet.ibm.com (mailing list archive) |
---|---|
State | Changes Requested, archived |
Headers | show |
On Thu, 2016-10-13 at 15:47 -0300, Gabriel Krisman Bertazi wrote: > Hi James, > > Thanks for your review. Please see the v2 below. > > > OK, so really this isn't what you want, because blk_execute_req may > > have used several of your retries, so you now get a maximum > > possible set of retries at UNIT_ATTENTION_RETRIES*retries. You > > need to start from the returned req->retries, which probably means > > this loop needs to be inside __scsi_execute. > > Hmm, I was aware of that, but I saw there were other places that may > have run retries^2 times, like scsi_test_unit_ready and > scsi_mode_sense, if I read the code correctly. There's a lot of this type of retry multiplication, but we really do need to reduce it, not increase it. > But, I see your point and I fixed it on v2. I also updated the > second patch to rework these cases. > > Another thing that got me confused is where the blk layer updates > req->retries. > > What do you think about the v2 below? > > Thanks, > > -- >8 -- > > Usually, re-sending the SCSI command is enough to recover from a Unit > Attention (UA). This adds a generic retry code to the SCSI command > path in case of an UA, before giving up and returning the error > condition to the caller. > > I added the UA verification into scsi_execute instead of > scsi_execute_req because there are at least a few callers that invoke > scsi_execute directly and would benefit from the internal UA retry. > Also, I didn't use scsi_normalize_sense to not duplicate > functionality with scsi_execute_req_flags. Instead, scsi_execute > uses a small helper function that verifies only the UA condition > directly from the raw sense buffer. If this design is not OK, I can > refactor to use scsi_normalize_sense. > > This prevents us from duplicating the retry code in at least a few > places. In particular, it fixes an issue found in some IBM > enclosures, in which the device may return an Unit Attention during > probe, breaking the bind with the ses module: > > scsi 1:0:7:0: Failed to get diagnostic page 0x8000002 > scsi 1:0:7:0: Failed to bind enclosure -19 > > Link: https://patchwork.kernel.org/patch/9336763/ > Suggested-by: Brian King <brking@linux.vnet.ibm.com> > Suggested-by: James Bottomley <jejb@linux.vnet.ibm.com> > Signed-off-by: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com> > --- > drivers/scsi/scsi_lib.c | 27 ++++++++++++++++++++++++--- > include/scsi/scsi_common.h | 9 +++++++++ > 2 files changed, 33 insertions(+), 3 deletions(-) > > diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c > index c71344aebdbb..9c6623abf120 100644 > --- a/drivers/scsi/scsi_lib.c > +++ b/drivers/scsi/scsi_lib.c > @@ -187,15 +187,24 @@ int scsi_execute(struct scsi_device *sdev, > const unsigned char *cmd, > struct request *req; > int write = (data_direction == DMA_TO_DEVICE); > int ret = DRIVER_ERROR << 24; > + bool priv_sense = false; > > + if (!sense) { > + sense = kzalloc(SCSI_SENSE_BUFFERSIZE, GFP_NOIO); > + if (!sense) > + return ret; > + priv_sense = true; > + } I think here we might be able to stack allocate this and avoid a potential error return under memory pressure. The annoying thing is that blk_execute_rq does exactly this, so now we have an additional useless allocation bloating the stack, but I think it's worth it to get rid of priv_sense and all the error legs. > + retry: > req = blk_get_request(sdev->request_queue, write, > __GFP_RECLAIM); > if (IS_ERR(req)) > - return ret; > + goto free_sense; > blk_rq_set_block_pc(req); > > if (bufflen && blk_rq_map_kern(sdev->request_queue, > req, > buffer, bufflen, > __GFP_RECLAIM)) > - goto out; > + goto put_req; > > req->cmd_len = COMMAND_SIZE(cmd[0]); > memcpy(req->cmd, cmd, req->cmd_len); > @@ -210,6 +219,13 @@ int scsi_execute(struct scsi_device *sdev, const > unsigned char *cmd, > */ > blk_execute_rq(req->q, NULL, req, 1); > > + if (scsi_sense_unit_attention(sense) && req->retries > 0) { > + memset(sense, 0, SCSI_SENSE_BUFFERSIZE); > + retries = req->retries - 1; > + blk_put_request(req); > + goto retry; > + } OK, so this is more theory, but I think you can actually reuse the same request to go around this loop without doing a get/put. I've cc'd Jens to confirm, since no other driver I can find does this, but if it's legal, it saves freeing and reallocating the request. You can then replace the goto with a do { } while (...) which makes the loop obvious to the next person looking at this. > /* > * Some devices (USB mass-storage in particular) may > transfer > * garbage data together with a residue indicating that the > data > @@ -222,9 +238,14 @@ int scsi_execute(struct scsi_device *sdev, const > unsigned char *cmd, > if (resid) > *resid = req->resid_len; > ret = req->errors; > - out: > + > + put_req: > blk_put_request(req); > > + free_sense: > + if (priv_sense) > + kfree(sense); > + > return ret; > } > EXPORT_SYMBOL(scsi_execute); > diff --git a/include/scsi/scsi_common.h b/include/scsi/scsi_common.h > index 20bf7eaef05a..747b632d5b57 100644 > --- a/include/scsi/scsi_common.h > +++ b/include/scsi/scsi_common.h > @@ -58,6 +58,15 @@ static inline bool scsi_sense_valid(const struct > scsi_sense_hdr *sshdr) > return (sshdr->response_code & 0x70) == 0x70; > } > > +static inline bool scsi_sense_unit_attention(const char *sense) > +{ > + int resp = sense[0] & 0x7f; > + > + return ((resp & 0x70) && > + ((resp >= 0x72 && (sense[1] & 0xf) == > UNIT_ATTENTION) || > + (resp < 0x72 && (sense[2] & 0xf) == > UNIT_ATTENTION))); > +} I think also, to emphasise the locality of this function, just put it above scsi_execute inside scsi_lib.c ... that way no-one can proliferate its use. Other than the above, this looks much better. Thanks, James > extern bool scsi_normalize_sense(const u8 *sense_buffer, int sb_len, > struct scsi_sense_hdr *sshdr); > -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
James Bottomley <jejb@linux.vnet.ibm.com> writes: > On Thu, 2016-10-13 at 15:47 -0300, Gabriel Krisman Bertazi wrote: >> @@ -210,6 +219,13 @@ int scsi_execute(struct scsi_device *sdev, const >> unsigned char *cmd, >> */ >> blk_execute_rq(req->q, NULL, req, 1); >> >> + if (scsi_sense_unit_attention(sense) && req->retries > 0) { >> + memset(sense, 0, SCSI_SENSE_BUFFERSIZE); >> + retries = req->retries - 1; >> + blk_put_request(req); >> + goto retry; >> + } > > OK, so this is more theory, but I think you can actually reuse the same > request to go around this loop without doing a get/put. I've cc'd Jens > to confirm, since no other driver I can find does this, but if it's > legal, it saves freeing and reallocating the request. You can then > replace the goto with a do { } while (...) which makes the loop obvious > to the next person looking at this. Hi James, I don't think the block layer currently has the machinery to reuse the request. I think it would be easy to add for the MQ case but I don't know about SQ. If we don't clean up or reinit the request before re-sending, we'll hit the BUG_ON in blk_start_request: BUG_ON(test_bit(REQ_ATOM_COMPLETE, &req->atomic_flags)); Do you wanna take a v3 of the patch and fix this on a future patch, or should I be looking into patching the block layer interface? I'll be looking into it, but I need to get familiar with the SQ code first.
On 10/17/2016 11:17 AM, Gabriel Krisman Bertazi wrote: > James Bottomley <jejb@linux.vnet.ibm.com> writes: > >> On Thu, 2016-10-13 at 15:47 -0300, Gabriel Krisman Bertazi wrote: >>> @@ -210,6 +219,13 @@ int scsi_execute(struct scsi_device *sdev, const >>> unsigned char *cmd, >>> */ >>> blk_execute_rq(req->q, NULL, req, 1); >>> >>> + if (scsi_sense_unit_attention(sense) && req->retries > 0) { >>> + memset(sense, 0, SCSI_SENSE_BUFFERSIZE); >>> + retries = req->retries - 1; >>> + blk_put_request(req); >>> + goto retry; >>> + } >> >> OK, so this is more theory, but I think you can actually reuse the same >> request to go around this loop without doing a get/put. I've cc'd Jens >> to confirm, since no other driver I can find does this, but if it's >> legal, it saves freeing and reallocating the request. You can then >> replace the goto with a do { } while (...) which makes the loop obvious >> to the next person looking at this. > > Hi James, > > I don't think the block layer currently has the machinery to reuse the > request. I think it would be easy to add for the MQ case but I > don't know about SQ. If we don't clean up or reinit the request before > re-sending, we'll hit the BUG_ON in blk_start_request: > > BUG_ON(test_bit(REQ_ATOM_COMPLETE, &req->atomic_flags)); > > Do you wanna take a v3 of the patch and fix this on a future patch, or > should I be looking into patching the block layer interface? I'll be > looking into it, but I need to get familiar with the SQ code first. Would it make more sense to try to stick this retry logic in scsi_error.c in check_sense? Or do we not have enough information there to decide whether or not to retry 06/29? -Brian
On Mon, 2016-10-17 at 14:17 -0200, Gabriel Krisman Bertazi wrote: > James Bottomley <jejb@linux.vnet.ibm.com> writes: > > > On Thu, 2016-10-13 at 15:47 -0300, Gabriel Krisman Bertazi wrote: > > > @@ -210,6 +219,13 @@ int scsi_execute(struct scsi_device *sdev, > > > const > > > unsigned char *cmd, > > > */ > > > blk_execute_rq(req->q, NULL, req, 1); > > > > > > + if (scsi_sense_unit_attention(sense) && req->retries > > > > 0) { > > > + memset(sense, 0, SCSI_SENSE_BUFFERSIZE); > > > + retries = req->retries - 1; > > > + blk_put_request(req); > > > + goto retry; > > > + } > > > > OK, so this is more theory, but I think you can actually reuse the > > same > > request to go around this loop without doing a get/put. I've cc'd > > Jens > > to confirm, since no other driver I can find does this, but if it's > > legal, it saves freeing and reallocating the request. You can then > > replace the goto with a do { } while (...) which makes the loop > > obvious > > to the next person looking at this. > > Hi James, > > I don't think the block layer currently has the machinery to reuse > the request. I think it would be easy to add for the MQ case but I > don't know about SQ. If we don't clean up or reinit the request > before re-sending, we'll hit the BUG_ON in blk_start_request: > > BUG_ON(test_bit(REQ_ATOM_COMPLETE, &req->atomic_flags)); > > Do you wanna take a v3 of the patch and fix this on a future patch, That works. I certainly believe, looking at the code, that we can reuse the request, but in the absence from confirmation from Jens I'm certainly not going to insist on it. James > or should I be looking into patching the block layer interface? I'll > be looking into it, but I need to get familiar with the SQ code > first. -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c index c71344aebdbb..9c6623abf120 100644 --- a/drivers/scsi/scsi_lib.c +++ b/drivers/scsi/scsi_lib.c @@ -187,15 +187,24 @@ int scsi_execute(struct scsi_device *sdev, const unsigned char *cmd, struct request *req; int write = (data_direction == DMA_TO_DEVICE); int ret = DRIVER_ERROR << 24; + bool priv_sense = false; + if (!sense) { + sense = kzalloc(SCSI_SENSE_BUFFERSIZE, GFP_NOIO); + if (!sense) + return ret; + priv_sense = true; + } + + retry: req = blk_get_request(sdev->request_queue, write, __GFP_RECLAIM); if (IS_ERR(req)) - return ret; + goto free_sense; blk_rq_set_block_pc(req); if (bufflen && blk_rq_map_kern(sdev->request_queue, req, buffer, bufflen, __GFP_RECLAIM)) - goto out; + goto put_req; req->cmd_len = COMMAND_SIZE(cmd[0]); memcpy(req->cmd, cmd, req->cmd_len); @@ -210,6 +219,13 @@ int scsi_execute(struct scsi_device *sdev, const unsigned char *cmd, */ blk_execute_rq(req->q, NULL, req, 1); + if (scsi_sense_unit_attention(sense) && req->retries > 0) { + memset(sense, 0, SCSI_SENSE_BUFFERSIZE); + retries = req->retries - 1; + blk_put_request(req); + goto retry; + } + /* * Some devices (USB mass-storage in particular) may transfer * garbage data together with a residue indicating that the data @@ -222,9 +238,14 @@ int scsi_execute(struct scsi_device *sdev, const unsigned char *cmd, if (resid) *resid = req->resid_len; ret = req->errors; - out: + + put_req: blk_put_request(req); + free_sense: + if (priv_sense) + kfree(sense); + return ret; } EXPORT_SYMBOL(scsi_execute); diff --git a/include/scsi/scsi_common.h b/include/scsi/scsi_common.h index 20bf7eaef05a..747b632d5b57 100644 --- a/include/scsi/scsi_common.h +++ b/include/scsi/scsi_common.h @@ -58,6 +58,15 @@ static inline bool scsi_sense_valid(const struct scsi_sense_hdr *sshdr) return (sshdr->response_code & 0x70) == 0x70; } +static inline bool scsi_sense_unit_attention(const char *sense) +{ + int resp = sense[0] & 0x7f; + + return ((resp & 0x70) && + ((resp >= 0x72 && (sense[1] & 0xf) == UNIT_ATTENTION) || + (resp < 0x72 && (sense[2] & 0xf) == UNIT_ATTENTION))); +} + extern bool scsi_normalize_sense(const u8 *sense_buffer, int sb_len, struct scsi_sense_hdr *sshdr);