From patchwork Wed Dec 26 23:33:33 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Smart X-Patchwork-Id: 10743385 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 295A51800 for ; Wed, 26 Dec 2018 23:34:29 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 1A4D9283FF for ; Wed, 26 Dec 2018 23:34:29 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 0EEE12849B; Wed, 26 Dec 2018 23:34:29 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-8.0 required=2.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FROM,MAILING_LIST_MULTI,RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 8C819284ED for ; Wed, 26 Dec 2018 23:34:23 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727832AbeLZXeW (ORCPT ); Wed, 26 Dec 2018 18:34:22 -0500 Received: from mail-yw1-f68.google.com ([209.85.161.68]:36343 "EHLO mail-yw1-f68.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727679AbeLZXeV (ORCPT ); Wed, 26 Dec 2018 18:34:21 -0500 Received: by mail-yw1-f68.google.com with SMTP id i73so6766941ywg.3 for ; Wed, 26 Dec 2018 15:34:19 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=7YdQ576HvNw6nRmJASiMWNbx6drYi4Sl4kaGKxKm/rw=; b=LArlAsPnleOSasdz3LC0U/8oaIjJryhEcG//6KqBdx97/7tDl9ehXTEQjitcQwLAmj y+6zsa00Qf/CTQ0KiWvhddI+B7kSQgh57qmWYDZeDzrAgI4hUMkTYwFHK0Y5/hQPzc7F 0AVdYARXcJA3OO1vvMluJBwDatP+WMVw3BOM7r/BnQOnAmOnScXu7wTLozJTwGRuh1ix ltGzORnHsP+MhuMjRmDgSwECFYe/gcF9HjWEF0snw7dW0pDfN/pBDCPPMdtAhvTXsqem IZTlIHezchQ3fVJ8oxqwzMBZ92Lid+zAcEhGIS6+WnlHXdxOTZQIaUyVYBYz8AezNrXv kpIA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=7YdQ576HvNw6nRmJASiMWNbx6drYi4Sl4kaGKxKm/rw=; b=gUNXe17QkWk4W6QzDG/FfDosK5bArQGb+QoJjvS90PtYQY9iFojOxQw6OkMqseIELL GrNZXEdX4QzhCiOniDkPHQmWzVEfgdSAp13wId1FeOWpPyEBmyvpGUKtdprEpXjJXEyM IPCbOlAicX296W00p3TnktOt5six6hqVRzpYnuOUR/7ZuFkVeQwd1+miUfovqLl8yf4s a5C30e9ap8HfibmPYcZEqppCjTOrL0irWh+dgLcgoWlD2enQ48b2lu8OKC2w+X8nRmSc A1VaWCKqRTcylIxImkpVuXlUxkEpmrX7Z+tdLe0cR+sNLSE2dySzxs24V5nnl//uShoS +/dQ== X-Gm-Message-State: AA+aEWbTG5g9i2iWesrjyDVtbOOVZ8Sgj4CkklXad9hN4/5YAMub6EmB eTuQbgSm2n3oWAGhTekcdC1qOu5A X-Google-Smtp-Source: AFSGD/Wp0MU0XjK+DcpyKeU/Q/VvJE/B3W4p28HGRPDQGA+sTZVRb4lDePA0kCo/+QmhVabfrUmPQw== X-Received: by 2002:a81:e14:: with SMTP id 20mr21126942ywo.87.1545867258882; Wed, 26 Dec 2018 15:34:18 -0800 (PST) Received: from os42.localdomain ([192.19.228.250]) by smtp.gmail.com with ESMTPSA id h145sm13616483ywc.72.2018.12.26.15.34.17 (version=TLS1_2 cipher=ECDHE-RSA-AES128-SHA bits=128/128); Wed, 26 Dec 2018 15:34:18 -0800 (PST) From: James Smart To: linux-scsi@vger.kernel.org Cc: James Smart , Dick Kennedy Subject: [PATCH 24/25] lpfc: Fix nvmet issues when link bounce under IO load Date: Wed, 26 Dec 2018 15:33:33 -0800 Message-Id: <20181226233334.27518-25-jsmart2021@gmail.com> X-Mailer: git-send-email 2.13.7 In-Reply-To: <20181226233334.27518-1-jsmart2021@gmail.com> References: <20181226233334.27518-1-jsmart2021@gmail.com> Sender: linux-scsi-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-scsi@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Various null pointer dereference and general protection fault panics occur when there is a link bounce under load. There are a large number of "error" message 6413 indicating "bad release". The issues resolve to list corruptions due to missing or inconsistent lock protection. Lockups are due to nested locks in the unsolicited abort path. The unsolicited abort path calls the wrong abort processing routine. There was also duplicate context release while aborts were still active in the hardware. Removed duplicate locks and added lock protection around list item removal. Commonized lock handling around the abort processing routines. Prevent context release while still in ABTS list. Signed-off-by: Dick Kennedy Signed-off-by: James Smart Reviewed-by: Hannes Reinecke --- drivers/scsi/lpfc/lpfc_nvmet.c | 50 +++++++++++++++++++++++++++++++----------- 1 file changed, 37 insertions(+), 13 deletions(-) diff --git a/drivers/scsi/lpfc/lpfc_nvmet.c b/drivers/scsi/lpfc/lpfc_nvmet.c index 0d10dfc74018..4aadb3d5e718 100644 --- a/drivers/scsi/lpfc/lpfc_nvmet.c +++ b/drivers/scsi/lpfc/lpfc_nvmet.c @@ -1032,7 +1032,6 @@ lpfc_nvmet_xmt_fcp_abort(struct nvmet_fc_target_port *tgtport, atomic_inc(&lpfc_nvmep->xmt_fcp_abort); spin_lock_irqsave(&ctxp->ctxlock, flags); - ctxp->state = LPFC_NVMET_STE_ABORT; /* Since iaab/iaar are NOT set, we need to check * if the firmware is in process of aborting IO @@ -1044,13 +1043,14 @@ lpfc_nvmet_xmt_fcp_abort(struct nvmet_fc_target_port *tgtport, ctxp->flag |= LPFC_NVMET_ABORT_OP; if (ctxp->flag & LPFC_NVMET_DEFER_WQFULL) { + spin_unlock_irqrestore(&ctxp->ctxlock, flags); lpfc_nvmet_unsol_fcp_issue_abort(phba, ctxp, ctxp->sid, ctxp->oxid); wq = ctxp->hdwq->nvme_wq; - spin_unlock_irqrestore(&ctxp->ctxlock, flags); lpfc_nvmet_wqfull_flush(phba, wq, ctxp); return; } + spin_unlock_irqrestore(&ctxp->ctxlock, flags); /* An state of LPFC_NVMET_STE_RCV means we have just received * the NVME command and have not started processing it. @@ -1062,7 +1062,6 @@ lpfc_nvmet_xmt_fcp_abort(struct nvmet_fc_target_port *tgtport, else lpfc_nvmet_sol_fcp_issue_abort(phba, ctxp, ctxp->sid, ctxp->oxid); - spin_unlock_irqrestore(&ctxp->ctxlock, flags); } static void @@ -1076,14 +1075,18 @@ lpfc_nvmet_xmt_fcp_release(struct nvmet_fc_target_port *tgtport, unsigned long flags; bool aborting = false; - if (ctxp->state != LPFC_NVMET_STE_DONE && - ctxp->state != LPFC_NVMET_STE_ABORT) { + spin_lock_irqsave(&ctxp->ctxlock, flags); + if (ctxp->flag & LPFC_NVMET_XBUSY) + lpfc_printf_log(phba, KERN_INFO, LOG_NVME_IOERR, + "6027 NVMET release with XBUSY flag x%x" + " oxid x%x\n", + ctxp->flag, ctxp->oxid); + else if (ctxp->state != LPFC_NVMET_STE_DONE && + ctxp->state != LPFC_NVMET_STE_ABORT) lpfc_printf_log(phba, KERN_ERR, LOG_NVME_IOERR, "6413 NVMET release bad state %d %d oxid x%x\n", ctxp->state, ctxp->entry_cnt, ctxp->oxid); - } - spin_lock_irqsave(&ctxp->ctxlock, flags); if ((ctxp->flag & LPFC_NVMET_ABORT_OP) || (ctxp->flag & LPFC_NVMET_XBUSY)) { aborting = true; @@ -1523,6 +1526,7 @@ lpfc_sli4_nvmet_xri_aborted(struct lpfc_hba *phba, if (ctxp->ctxbuf->sglq->sli4_xritag != xri) continue; + spin_lock(&ctxp->ctxlock); /* Check if we already received a free context call * and we have completed processing an abort situation. */ @@ -1532,6 +1536,7 @@ lpfc_sli4_nvmet_xri_aborted(struct lpfc_hba *phba, released = true; } ctxp->flag &= ~LPFC_NVMET_XBUSY; + spin_unlock(&ctxp->ctxlock); spin_unlock(&phba->sli4_hba.abts_nvmet_buf_list_lock); rrq_empty = list_empty(&phba->active_rrq_list); @@ -1563,7 +1568,6 @@ lpfc_sli4_nvmet_xri_aborted(struct lpfc_hba *phba, int lpfc_nvmet_rcv_unsol_abort(struct lpfc_vport *vport, struct fc_frame_header *fc_hdr) - { #if (IS_ENABLED(CONFIG_NVME_TARGET_FC)) struct lpfc_hba *phba = vport->phba; @@ -2696,15 +2700,17 @@ lpfc_nvmet_sol_fcp_abort_cmp(struct lpfc_hba *phba, struct lpfc_iocbq *cmdwqe, if (ctxp->flag & LPFC_NVMET_ABORT_OP) atomic_inc(&tgtp->xmt_fcp_abort_cmpl); + spin_lock_irqsave(&ctxp->ctxlock, flags); ctxp->state = LPFC_NVMET_STE_DONE; /* Check if we already received a free context call * and we have completed processing an abort situation. */ - spin_lock_irqsave(&ctxp->ctxlock, flags); if ((ctxp->flag & LPFC_NVMET_CTX_RLS) && !(ctxp->flag & LPFC_NVMET_XBUSY)) { + spin_lock(&phba->sli4_hba.abts_nvmet_buf_list_lock); list_del(&ctxp->list); + spin_unlock(&phba->sli4_hba.abts_nvmet_buf_list_lock); released = true; } ctxp->flag &= ~LPFC_NVMET_ABORT_OP; @@ -2770,6 +2776,7 @@ lpfc_nvmet_unsol_fcp_abort_cmp(struct lpfc_hba *phba, struct lpfc_iocbq *cmdwqe, } tgtp = (struct lpfc_nvmet_tgtport *)phba->targetport->private; + spin_lock_irqsave(&ctxp->ctxlock, flags); if (ctxp->flag & LPFC_NVMET_ABORT_OP) atomic_inc(&tgtp->xmt_fcp_abort_cmpl); @@ -2784,10 +2791,11 @@ lpfc_nvmet_unsol_fcp_abort_cmp(struct lpfc_hba *phba, struct lpfc_iocbq *cmdwqe, * and we have completed processing an abort situation. */ ctxp->state = LPFC_NVMET_STE_DONE; - spin_lock_irqsave(&ctxp->ctxlock, flags); if ((ctxp->flag & LPFC_NVMET_CTX_RLS) && !(ctxp->flag & LPFC_NVMET_XBUSY)) { + spin_lock(&phba->sli4_hba.abts_nvmet_buf_list_lock); list_del(&ctxp->list); + spin_unlock(&phba->sli4_hba.abts_nvmet_buf_list_lock); released = true; } ctxp->flag &= ~LPFC_NVMET_ABORT_OP; @@ -2993,12 +3001,15 @@ lpfc_nvmet_sol_fcp_issue_abort(struct lpfc_hba *phba, (ndlp) ? ndlp->nlp_state : NLP_STE_MAX_STATE); /* No failure to an ABTS request. */ + spin_lock_irqsave(&ctxp->ctxlock, flags); ctxp->flag &= ~LPFC_NVMET_ABORT_OP; + spin_unlock_irqrestore(&ctxp->ctxlock, flags); return 0; } /* Issue ABTS for this WQE based on iotag */ ctxp->abort_wqeq = lpfc_sli_get_iocbq(phba); + spin_lock_irqsave(&ctxp->ctxlock, flags); if (!ctxp->abort_wqeq) { atomic_inc(&tgtp->xmt_abort_rsp_error); lpfc_printf_log(phba, KERN_ERR, LOG_NVME_ABTS, @@ -3006,11 +3017,13 @@ lpfc_nvmet_sol_fcp_issue_abort(struct lpfc_hba *phba, "xri: x%x\n", ctxp->oxid); /* No failure to an ABTS request. */ ctxp->flag &= ~LPFC_NVMET_ABORT_OP; + spin_unlock_irqrestore(&ctxp->ctxlock, flags); return 0; } abts_wqeq = ctxp->abort_wqeq; abts_wqe = &abts_wqeq->wqe; ctxp->state = LPFC_NVMET_STE_ABORT; + spin_unlock_irqrestore(&ctxp->ctxlock, flags); /* Announce entry to new IO submit field. */ lpfc_printf_log(phba, KERN_INFO, LOG_NVME_ABTS, @@ -3031,7 +3044,9 @@ lpfc_nvmet_sol_fcp_issue_abort(struct lpfc_hba *phba, "NVME Req now. hba_flag x%x oxid x%x\n", phba->hba_flag, ctxp->oxid); lpfc_sli_release_iocbq(phba, abts_wqeq); + spin_lock_irqsave(&ctxp->ctxlock, flags); ctxp->flag &= ~LPFC_NVMET_ABORT_OP; + spin_unlock_irqrestore(&ctxp->ctxlock, flags); return 0; } @@ -3044,7 +3059,9 @@ lpfc_nvmet_sol_fcp_issue_abort(struct lpfc_hba *phba, "still pending on oxid x%x\n", ctxp->oxid); lpfc_sli_release_iocbq(phba, abts_wqeq); + spin_lock_irqsave(&ctxp->ctxlock, flags); ctxp->flag &= ~LPFC_NVMET_ABORT_OP; + spin_unlock_irqrestore(&ctxp->ctxlock, flags); return 0; } @@ -3099,7 +3116,9 @@ lpfc_nvmet_sol_fcp_issue_abort(struct lpfc_hba *phba, } atomic_inc(&tgtp->xmt_abort_rsp_error); + spin_lock_irqsave(&ctxp->ctxlock, flags); ctxp->flag &= ~LPFC_NVMET_ABORT_OP; + spin_unlock_irqrestore(&ctxp->ctxlock, flags); lpfc_sli_release_iocbq(phba, abts_wqeq); lpfc_printf_log(phba, KERN_ERR, LOG_NVME_ABTS, "6166 Failed ABORT issue_wqe with status x%x " @@ -3108,7 +3127,6 @@ lpfc_nvmet_sol_fcp_issue_abort(struct lpfc_hba *phba, return 1; } - static int lpfc_nvmet_unsol_fcp_issue_abort(struct lpfc_hba *phba, struct lpfc_nvmet_rcv_ctx *ctxp, @@ -3117,6 +3135,7 @@ lpfc_nvmet_unsol_fcp_issue_abort(struct lpfc_hba *phba, struct lpfc_nvmet_tgtport *tgtp; struct lpfc_iocbq *abts_wqeq; unsigned long flags; + bool released = false; int rc; tgtp = (struct lpfc_nvmet_tgtport *)phba->targetport->private; @@ -3154,8 +3173,12 @@ lpfc_nvmet_unsol_fcp_issue_abort(struct lpfc_hba *phba, aerr: spin_lock_irqsave(&ctxp->ctxlock, flags); - if (ctxp->flag & LPFC_NVMET_CTX_RLS) + if (ctxp->flag & LPFC_NVMET_CTX_RLS) { + spin_lock(&phba->sli4_hba.abts_nvmet_buf_list_lock); list_del(&ctxp->list); + spin_unlock(&phba->sli4_hba.abts_nvmet_buf_list_lock); + released = true; + } ctxp->flag &= ~(LPFC_NVMET_ABORT_OP | LPFC_NVMET_CTX_RLS); spin_unlock_irqrestore(&ctxp->ctxlock, flags); @@ -3163,7 +3186,8 @@ lpfc_nvmet_unsol_fcp_issue_abort(struct lpfc_hba *phba, lpfc_printf_log(phba, KERN_ERR, LOG_NVME_ABTS, "6135 Failed to Issue ABTS for oxid x%x. Status x%x\n", ctxp->oxid, rc); - lpfc_nvmet_ctxbuf_post(phba, ctxp->ctxbuf); + if (released) + lpfc_nvmet_ctxbuf_post(phba, ctxp->ctxbuf); return 1; }