From patchwork Tue Jan 22 18:14:29 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: James Smart <jsmart2021@gmail.com>
X-Patchwork-Id: 10775957
From: James Smart <jsmart2021@gmail.com>
To: linux-scsi@vger.kernel.org
Cc: James Smart <jsmart2021@gmail.com>,
        Dick Kennedy <dick.kennedy@broadcom.com>
Subject: [PATCH v2 24/26] lpfc: Fix nvmet issues when link bounce under IO
 load
Date: Tue, 22 Jan 2019 10:14:29 -0800
Message-Id: <20190122181431.8297-25-jsmart2021@gmail.com>
X-Mailer: git-send-email 2.13.7
In-Reply-To: <20190122181431.8297-1-jsmart2021@gmail.com>
References: <20190122181431.8297-1-jsmart2021@gmail.com>
Sender: linux-scsi-owner@vger.kernel.org
Precedence: bulk
List-ID: <linux-scsi.vger.kernel.org>
X-Mailing-List: linux-scsi@vger.kernel.org
X-Virus-Scanned: ClamAV using ClamSMTP

Various null pointer dereference and general protection fault panics
occur when there is a link bounce under load. A large number of
"error" messages 6413, indicating "bad release", are also logged.

The issues trace back to list corruption caused by missing or
inconsistent lock protection. The lockups are caused by nested locks in
the unsolicited abort path, which also calls the wrong abort processing
routine. Contexts were also released a second time while their aborts
were still active in the hardware.

Remove duplicate locks and add lock protection around list item
removal. Make lock handling consistent across the abort processing
routines. Prevent context release while the context is still on the
ABTS list.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
---
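For reviewers, the locking pattern described above can be sketched in
userspace roughly as follows. This is only an illustrative model under
stated assumptions, not driver code: every toy_* name, the flag values
and the assert-based lock are hypothetical stand-ins. It shows two
ideas: drop ctxlock before calling a routine that takes it again, and
unlink/repost a context only once it is released and no longer busy.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a non-recursive spinlock. Taking it twice
 * would deadlock in the kernel, so this model asserts instead.
 */
struct toy_lock {
	bool held;
};

static void toy_lock_acquire(struct toy_lock *l)
{
	assert(!l->held);	/* a nested acquire would be a lockup */
	l->held = true;
}

static void toy_lock_release(struct toy_lock *l)
{
	assert(l->held);
	l->held = false;
}

/* Illustrative flag values, not the driver's definitions. */
#define TOY_ABORT_OP	0x1
#define TOY_XBUSY	0x2
#define TOY_CTX_RLS	0x4

struct toy_hba {
	struct toy_lock abts_list_lock;
	int abts_list_len;	/* contexts still on the ABTS list */
	int ctx_reposted;	/* contexts handed back for reuse */
};

struct toy_ctx {
	struct toy_lock ctxlock;
	unsigned int flag;
	int abts_issued;
};

/* The callee takes ctxlock itself, so callers must not hold it. */
static void toy_issue_abort(struct toy_ctx *ctx)
{
	toy_lock_acquire(&ctx->ctxlock);
	ctx->abts_issued++;
	toy_lock_release(&ctx->ctxlock);
}

static void toy_xmt_abort(struct toy_ctx *ctx)
{
	toy_lock_acquire(&ctx->ctxlock);
	ctx->flag |= TOY_ABORT_OP;
	toy_lock_release(&ctx->ctxlock);	/* drop before calling down */
	toy_issue_abort(ctx);
}

/* Models the hardware reporting that the exchange is no longer busy. */
static void toy_xri_aborted(struct toy_ctx *ctx)
{
	toy_lock_acquire(&ctx->ctxlock);
	ctx->flag &= ~TOY_XBUSY;
	toy_lock_release(&ctx->ctxlock);
}

/* Abort completion: unlink and repost only a released, idle context. */
static bool toy_abort_cmpl(struct toy_hba *hba, struct toy_ctx *ctx)
{
	bool released = false;

	toy_lock_acquire(&ctx->ctxlock);
	if ((ctx->flag & TOY_CTX_RLS) && !(ctx->flag & TOY_XBUSY)) {
		toy_lock_acquire(&hba->abts_list_lock);
		hba->abts_list_len--;
		toy_lock_release(&hba->abts_list_lock);
		released = true;
	}
	ctx->flag &= ~TOY_ABORT_OP;
	toy_lock_release(&ctx->ctxlock);

	if (released)
		hba->ctx_reposted++;	/* never repost a context still listed */
	return released;
}
```

In this model, calling toy_issue_abort() with ctxlock still held trips
the assert in toy_lock_acquire(), which is the toy equivalent of the
nested-lock lockup. A completion that arrives while TOY_XBUSY is set
leaves the context on the list and does not repost it.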
 drivers/scsi/lpfc/lpfc_nvmet.c | 50 +++++++++++++++++++++++++++++++-----------
 1 file changed, 37 insertions(+), 13 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc_nvmet.c b/drivers/scsi/lpfc/lpfc_nvmet.c
index 0d10dfc74018..4aadb3d5e718 100644
--- a/drivers/scsi/lpfc/lpfc_nvmet.c
+++ b/drivers/scsi/lpfc/lpfc_nvmet.c
@@ -1032,7 +1032,6 @@ lpfc_nvmet_xmt_fcp_abort(struct nvmet_fc_target_port *tgtport,
 	atomic_inc(&lpfc_nvmep->xmt_fcp_abort);
 
 	spin_lock_irqsave(&ctxp->ctxlock, flags);
-	ctxp->state = LPFC_NVMET_STE_ABORT;
 
 	/* Since iaab/iaar are NOT set, we need to check
 	 * if the firmware is in process of aborting IO
@@ -1044,13 +1043,14 @@ lpfc_nvmet_xmt_fcp_abort(struct nvmet_fc_target_port *tgtport,
 	ctxp->flag |= LPFC_NVMET_ABORT_OP;
 
 	if (ctxp->flag & LPFC_NVMET_DEFER_WQFULL) {
+		spin_unlock_irqrestore(&ctxp->ctxlock, flags);
 		lpfc_nvmet_unsol_fcp_issue_abort(phba, ctxp, ctxp->sid,
 						 ctxp->oxid);
 		wq = ctxp->hdwq->nvme_wq;
-		spin_unlock_irqrestore(&ctxp->ctxlock, flags);
 		lpfc_nvmet_wqfull_flush(phba, wq, ctxp);
 		return;
 	}
+	spin_unlock_irqrestore(&ctxp->ctxlock, flags);
 
 	/* An state of LPFC_NVMET_STE_RCV means we have just received
 	 * the NVME command and have not started processing it.
@@ -1062,7 +1062,6 @@ lpfc_nvmet_xmt_fcp_abort(struct nvmet_fc_target_port *tgtport,
 	else
 		lpfc_nvmet_sol_fcp_issue_abort(phba, ctxp, ctxp->sid,
 					       ctxp->oxid);
-	spin_unlock_irqrestore(&ctxp->ctxlock, flags);
 }
 
 static void
@@ -1076,14 +1075,18 @@ lpfc_nvmet_xmt_fcp_release(struct nvmet_fc_target_port *tgtport,
 	unsigned long flags;
 	bool aborting = false;
 
-	if (ctxp->state != LPFC_NVMET_STE_DONE &&
-	    ctxp->state != LPFC_NVMET_STE_ABORT) {
+	spin_lock_irqsave(&ctxp->ctxlock, flags);
+	if (ctxp->flag & LPFC_NVMET_XBUSY)
+		lpfc_printf_log(phba, KERN_INFO, LOG_NVME_IOERR,
+				"6027 NVMET release with XBUSY flag x%x"
+				" oxid x%x\n",
+				ctxp->flag, ctxp->oxid);
+	else if (ctxp->state != LPFC_NVMET_STE_DONE &&
+		 ctxp->state != LPFC_NVMET_STE_ABORT)
 		lpfc_printf_log(phba, KERN_ERR, LOG_NVME_IOERR,
 				"6413 NVMET release bad state %d %d oxid x%x\n",
 				ctxp->state, ctxp->entry_cnt, ctxp->oxid);
-	}
 
-	spin_lock_irqsave(&ctxp->ctxlock, flags);
 	if ((ctxp->flag & LPFC_NVMET_ABORT_OP) ||
 	    (ctxp->flag & LPFC_NVMET_XBUSY)) {
 		aborting = true;
@@ -1523,6 +1526,7 @@ lpfc_sli4_nvmet_xri_aborted(struct lpfc_hba *phba,
 		if (ctxp->ctxbuf->sglq->sli4_xritag != xri)
 			continue;
 
+		spin_lock(&ctxp->ctxlock);
 		/* Check if we already received a free context call
 		 * and we have completed processing an abort situation.
 		 */
@@ -1532,6 +1536,7 @@ lpfc_sli4_nvmet_xri_aborted(struct lpfc_hba *phba,
 			released = true;
 		}
 		ctxp->flag &= ~LPFC_NVMET_XBUSY;
+		spin_unlock(&ctxp->ctxlock);
 		spin_unlock(&phba->sli4_hba.abts_nvmet_buf_list_lock);
 
 		rrq_empty = list_empty(&phba->active_rrq_list);
@@ -1563,7 +1568,6 @@ lpfc_sli4_nvmet_xri_aborted(struct lpfc_hba *phba,
 int
 lpfc_nvmet_rcv_unsol_abort(struct lpfc_vport *vport,
 			   struct fc_frame_header *fc_hdr)
-
 {
 #if (IS_ENABLED(CONFIG_NVME_TARGET_FC))
 	struct lpfc_hba *phba = vport->phba;
@@ -2696,15 +2700,17 @@ lpfc_nvmet_sol_fcp_abort_cmp(struct lpfc_hba *phba, struct lpfc_iocbq *cmdwqe,
 	if (ctxp->flag & LPFC_NVMET_ABORT_OP)
 		atomic_inc(&tgtp->xmt_fcp_abort_cmpl);
 
+	spin_lock_irqsave(&ctxp->ctxlock, flags);
 	ctxp->state = LPFC_NVMET_STE_DONE;
 
 	/* Check if we already received a free context call
 	 * and we have completed processing an abort situation.
 	 */
-	spin_lock_irqsave(&ctxp->ctxlock, flags);
 	if ((ctxp->flag & LPFC_NVMET_CTX_RLS) &&
 	    !(ctxp->flag & LPFC_NVMET_XBUSY)) {
+		spin_lock(&phba->sli4_hba.abts_nvmet_buf_list_lock);
 		list_del(&ctxp->list);
+		spin_unlock(&phba->sli4_hba.abts_nvmet_buf_list_lock);
 		released = true;
 	}
 	ctxp->flag &= ~LPFC_NVMET_ABORT_OP;
@@ -2770,6 +2776,7 @@ lpfc_nvmet_unsol_fcp_abort_cmp(struct lpfc_hba *phba, struct lpfc_iocbq *cmdwqe,
 	}
 
 	tgtp = (struct lpfc_nvmet_tgtport *)phba->targetport->private;
+	spin_lock_irqsave(&ctxp->ctxlock, flags);
 	if (ctxp->flag & LPFC_NVMET_ABORT_OP)
 		atomic_inc(&tgtp->xmt_fcp_abort_cmpl);
 
@@ -2784,10 +2791,11 @@ lpfc_nvmet_unsol_fcp_abort_cmp(struct lpfc_hba *phba, struct lpfc_iocbq *cmdwqe,
 	 * and we have completed processing an abort situation.
 	 */
 	ctxp->state = LPFC_NVMET_STE_DONE;
-	spin_lock_irqsave(&ctxp->ctxlock, flags);
 	if ((ctxp->flag & LPFC_NVMET_CTX_RLS) &&
 	    !(ctxp->flag & LPFC_NVMET_XBUSY)) {
+		spin_lock(&phba->sli4_hba.abts_nvmet_buf_list_lock);
 		list_del(&ctxp->list);
+		spin_unlock(&phba->sli4_hba.abts_nvmet_buf_list_lock);
 		released = true;
 	}
 	ctxp->flag &= ~LPFC_NVMET_ABORT_OP;
@@ -2993,12 +3001,15 @@ lpfc_nvmet_sol_fcp_issue_abort(struct lpfc_hba *phba,
 				(ndlp) ? ndlp->nlp_state : NLP_STE_MAX_STATE);
 
 		/* No failure to an ABTS request. */
+		spin_lock_irqsave(&ctxp->ctxlock, flags);
 		ctxp->flag &= ~LPFC_NVMET_ABORT_OP;
+		spin_unlock_irqrestore(&ctxp->ctxlock, flags);
 		return 0;
 	}
 
 	/* Issue ABTS for this WQE based on iotag */
 	ctxp->abort_wqeq = lpfc_sli_get_iocbq(phba);
+	spin_lock_irqsave(&ctxp->ctxlock, flags);
 	if (!ctxp->abort_wqeq) {
 		atomic_inc(&tgtp->xmt_abort_rsp_error);
 		lpfc_printf_log(phba, KERN_ERR, LOG_NVME_ABTS,
@@ -3006,11 +3017,13 @@ lpfc_nvmet_sol_fcp_issue_abort(struct lpfc_hba *phba,
 				"xri: x%x\n", ctxp->oxid);
 		/* No failure to an ABTS request. */
 		ctxp->flag &= ~LPFC_NVMET_ABORT_OP;
+		spin_unlock_irqrestore(&ctxp->ctxlock, flags);
 		return 0;
 	}
 	abts_wqeq = ctxp->abort_wqeq;
 	abts_wqe = &abts_wqeq->wqe;
 	ctxp->state = LPFC_NVMET_STE_ABORT;
+	spin_unlock_irqrestore(&ctxp->ctxlock, flags);
 
 	/* Announce entry to new IO submit field. */
 	lpfc_printf_log(phba, KERN_INFO, LOG_NVME_ABTS,
@@ -3031,7 +3044,9 @@ lpfc_nvmet_sol_fcp_issue_abort(struct lpfc_hba *phba,
 				"NVME Req now. hba_flag x%x oxid x%x\n",
 				phba->hba_flag, ctxp->oxid);
 		lpfc_sli_release_iocbq(phba, abts_wqeq);
+		spin_lock_irqsave(&ctxp->ctxlock, flags);
 		ctxp->flag &= ~LPFC_NVMET_ABORT_OP;
+		spin_unlock_irqrestore(&ctxp->ctxlock, flags);
 		return 0;
 	}
 
@@ -3044,7 +3059,9 @@ lpfc_nvmet_sol_fcp_issue_abort(struct lpfc_hba *phba,
 				"still pending on oxid x%x\n",
 				ctxp->oxid);
 		lpfc_sli_release_iocbq(phba, abts_wqeq);
+		spin_lock_irqsave(&ctxp->ctxlock, flags);
 		ctxp->flag &= ~LPFC_NVMET_ABORT_OP;
+		spin_unlock_irqrestore(&ctxp->ctxlock, flags);
 		return 0;
 	}
 
@@ -3099,7 +3116,9 @@ lpfc_nvmet_sol_fcp_issue_abort(struct lpfc_hba *phba,
 	}
 
 	atomic_inc(&tgtp->xmt_abort_rsp_error);
+	spin_lock_irqsave(&ctxp->ctxlock, flags);
 	ctxp->flag &= ~LPFC_NVMET_ABORT_OP;
+	spin_unlock_irqrestore(&ctxp->ctxlock, flags);
 	lpfc_sli_release_iocbq(phba, abts_wqeq);
 	lpfc_printf_log(phba, KERN_ERR, LOG_NVME_ABTS,
 			"6166 Failed ABORT issue_wqe with status x%x "
@@ -3108,7 +3127,6 @@ lpfc_nvmet_sol_fcp_issue_abort(struct lpfc_hba *phba,
 	return 1;
 }
 
-
 static int
 lpfc_nvmet_unsol_fcp_issue_abort(struct lpfc_hba *phba,
 				 struct lpfc_nvmet_rcv_ctx *ctxp,
@@ -3117,6 +3135,7 @@ lpfc_nvmet_unsol_fcp_issue_abort(struct lpfc_hba *phba,
 	struct lpfc_nvmet_tgtport *tgtp;
 	struct lpfc_iocbq *abts_wqeq;
 	unsigned long flags;
+	bool released = false;
 	int rc;
 
 	tgtp = (struct lpfc_nvmet_tgtport *)phba->targetport->private;
@@ -3154,8 +3173,12 @@ lpfc_nvmet_unsol_fcp_issue_abort(struct lpfc_hba *phba,
 
 aerr:
 	spin_lock_irqsave(&ctxp->ctxlock, flags);
-	if (ctxp->flag & LPFC_NVMET_CTX_RLS)
+	if (ctxp->flag & LPFC_NVMET_CTX_RLS) {
+		spin_lock(&phba->sli4_hba.abts_nvmet_buf_list_lock);
 		list_del(&ctxp->list);
+		spin_unlock(&phba->sli4_hba.abts_nvmet_buf_list_lock);
+		released = true;
+	}
 	ctxp->flag &= ~(LPFC_NVMET_ABORT_OP | LPFC_NVMET_CTX_RLS);
 	spin_unlock_irqrestore(&ctxp->ctxlock, flags);
 
@@ -3163,7 +3186,8 @@ lpfc_nvmet_unsol_fcp_issue_abort(struct lpfc_hba *phba,
 	lpfc_printf_log(phba, KERN_ERR, LOG_NVME_ABTS,
 			"6135 Failed to Issue ABTS for oxid x%x. Status x%x\n",
 			ctxp->oxid, rc);
-	lpfc_nvmet_ctxbuf_post(phba, ctxp->ctxbuf);
+	if (released)
+		lpfc_nvmet_ctxbuf_post(phba, ctxp->ctxbuf);
 	return 1;
 }