diff mbox

[v3,07/19] lpfc: Fix IO failure during hba reset testing with nvme io.

Message ID 20180130235903.5316-8-jsmart2021@gmail.com (mailing list archive)
State Accepted
Headers show

Commit Message

James Smart Jan. 30, 2018, 11:58 p.m. UTC
A stress test repeatedly resetting the adapter while performing
io would eventually report I/O failures and missing nvme namespaces.

The driver was setting the nvmefc_fcp_req->private pointer to NULL
during the IO completion routine before upcalling done().
If the transport was also running an abort for that IO, the driver
would fail the abort with message 6140. Failing the abort is not
allowed by the nvme-fc transport, as it mandates that the io must be
returned back to the transport. As that does not happen, the transport
controller delete has an outstanding reference and can't complete
teardown.

The NULL-ing of the private pointer should be done only when the io
is considered complete. It's complete when the adapter returns the
exchange with the "exchange busy" flag clear.

Move the NULL'ing of the structure to the done case. This leaves the
io contexts set while it is busy and until the subsequent XRI_ABORTED
completion which returns the exchange is received.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>

---
v3:
  Address review comment that preferred to continue to NULL if the
  io did complete in order to protect stale conditions. After review,
  we agreed, thus the above change.
---
 drivers/scsi/lpfc/lpfc_nvme.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
diff mbox

Patch

diff --git a/drivers/scsi/lpfc/lpfc_nvme.c b/drivers/scsi/lpfc/lpfc_nvme.c
index 81e3a4f10c3c..c6e5b9972585 100644
--- a/drivers/scsi/lpfc/lpfc_nvme.c
+++ b/drivers/scsi/lpfc/lpfc_nvme.c
@@ -980,14 +980,14 @@  lpfc_nvme_io_cmd_wqe_cmpl(struct lpfc_hba *phba, struct lpfc_iocbq *pwqeIn,
 			phba->cpucheck_cmpl_io[lpfc_ncmd->cpu]++;
 	}
 #endif
-	freqpriv = nCmd->private;
-	freqpriv->nvme_buf = NULL;
 
 	/* NVME targets need completion held off until the abort exchange
 	 * completes unless the NVME Rport is getting unregistered.
 	 */
 
 	if (!(lpfc_ncmd->flags & LPFC_SBUF_XBUSY)) {
+		freqpriv = nCmd->private;
+		freqpriv->nvme_buf = NULL;
 		nCmd->done(nCmd);
 		lpfc_ncmd->nvmeCmd = NULL;
 	}