From patchwork Tue Mar 12 23:30:14 2019
X-Patchwork-Submitter: James Smart
X-Patchwork-Id: 10850233
From: James Smart
To: linux-scsi@vger.kernel.org
Cc: James Smart, Dick Kennedy
Subject: [PATCH 11/30] lpfc: Coordinate adapter error handling with offline handling
Date: Tue, 12 Mar 2019 16:30:14 -0700
Message-Id: <20190312233033.32670-12-jsmart2021@gmail.com>
In-Reply-To: <20190312233033.32670-1-jsmart2021@gmail.com>
References: <20190312233033.32670-1-jsmart2021@gmail.com>
X-Mailing-List: linux-scsi@vger.kernel.org

The driver periodically checks for adapter error in a background thread.
If the thread detects an error, the adapter is reset, which includes
deleting and reallocating the workqueues on the adapter. At the same
time, a user-space request to offline the adapter may be running many of
the same steps in parallel on a different thread. Because the memory was
freed out from under it, the parallel offline request hit a bad pointer.

Add coordination between the two threads. The error recovery thread
takes precedence: when an error is detected, a flag is set on the adapter
to indicate that the error thread is tearing down the adapter. Before
doing that work, however, the error thread checks for a flag set by the
offline flow and, if it is set, waits for the offline flow to complete
before proceeding with error handling. Similarly, the offline thread
first checks whether the error thread is resetting the adapter and, if
so, waits for the error thread to finish. Only then does it set its own
flag and offline the adapter.

Signed-off-by: Dick Kennedy
Signed-off-by: James Smart
---
 drivers/scsi/lpfc/lpfc_attr.c | 19 +++++++++++++++++++
 drivers/scsi/lpfc/lpfc_init.c | 19 +++++++++++++++++++
 drivers/scsi/lpfc/lpfc_sli.c  |  6 +++---
 drivers/scsi/lpfc/lpfc_sli.h  |  4 ++++
 4 files changed, 45 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc_attr.c b/drivers/scsi/lpfc/lpfc_attr.c
index 5d6c874c44e7..61745f590916 100644
--- a/drivers/scsi/lpfc/lpfc_attr.c
+++ b/drivers/scsi/lpfc/lpfc_attr.c
@@ -1204,6 +1204,20 @@ lpfc_do_offline(struct lpfc_hba *phba, uint32_t type)

 	psli = &phba->sli;

+	/*
+	 * If freeing of the queues has already started, don't access them.
+	 * Otherwise set FREE_WAIT to indicate that the queues are in use,
+	 * which holds off the freeing process until we finish.
+	 */
+	spin_lock_irq(&phba->hbalock);
+	if (!(psli->sli_flag & LPFC_QUEUE_FREE_INIT)) {
+		psli->sli_flag |= LPFC_QUEUE_FREE_WAIT;
+	} else {
+		spin_unlock_irq(&phba->hbalock);
+		goto skip_wait;
+	}
+	spin_unlock_irq(&phba->hbalock);
+
 	/* Wait a little for things to settle down, but not
 	 * long enough for dev loss timeout to expire.
 	 */
@@ -1225,6 +1239,11 @@ lpfc_do_offline(struct lpfc_hba *phba, uint32_t type)
 		}
 	}
 out:
+	spin_lock_irq(&phba->hbalock);
+	psli->sli_flag &= ~LPFC_QUEUE_FREE_WAIT;
+	spin_unlock_irq(&phba->hbalock);
+
+skip_wait:
 	init_completion(&online_compl);
 	rc = lpfc_workq_post_event(phba, &status, &online_compl, type);
 	if (rc == 0)
diff --git a/drivers/scsi/lpfc/lpfc_init.c b/drivers/scsi/lpfc/lpfc_init.c
index 4a470f80f601..440b631c2155 100644
--- a/drivers/scsi/lpfc/lpfc_init.c
+++ b/drivers/scsi/lpfc/lpfc_init.c
@@ -9132,6 +9132,20 @@ lpfc_sli4_release_hdwq(struct lpfc_hba *phba)
 void
 lpfc_sli4_queue_destroy(struct lpfc_hba *phba)
 {
+	/*
+	 * Set FREE_INIT before beginning to free the queues.
+	 * Then wait for the users of the queues to acknowledge
+	 * the release by clearing FREE_WAIT.
+	 */
+	spin_lock_irq(&phba->hbalock);
+	phba->sli.sli_flag |= LPFC_QUEUE_FREE_INIT;
+	while (phba->sli.sli_flag & LPFC_QUEUE_FREE_WAIT) {
+		spin_unlock_irq(&phba->hbalock);
+		msleep(20);
+		spin_lock_irq(&phba->hbalock);
+	}
+	spin_unlock_irq(&phba->hbalock);
+
 	/* Release HBA eqs */
 	if (phba->sli4_hba.hdwq)
 		lpfc_sli4_release_hdwq(phba);
@@ -9170,6 +9184,11 @@ lpfc_sli4_queue_destroy(struct lpfc_hba *phba)

 	/* Everything on this list has been freed */
 	INIT_LIST_HEAD(&phba->sli4_hba.lpfc_wq_list);
+
+	/* Done with freeing the queues */
+	spin_lock_irq(&phba->hbalock);
+	phba->sli.sli_flag &= ~LPFC_QUEUE_FREE_INIT;
+	spin_unlock_irq(&phba->hbalock);
 }

 int
diff --git a/drivers/scsi/lpfc/lpfc_sli.c b/drivers/scsi/lpfc/lpfc_sli.c
index 9d7fc2d4f6d0..ea80f3e60699 100644
--- a/drivers/scsi/lpfc/lpfc_sli.c
+++ b/drivers/scsi/lpfc/lpfc_sli.c
@@ -14416,6 +14416,9 @@ lpfc_sli4_queue_free(struct lpfc_queue *queue)
 	if (!queue)
 		return;

+	if (!list_empty(&queue->wq_list))
+		list_del(&queue->wq_list);
+
 	while (!list_empty(&queue->page_list)) {
 		list_remove_head(&queue->page_list, dmabuf, struct lpfc_dmabuf,
 				 list);
@@ -14431,9 +14434,6 @@ lpfc_sli4_queue_free(struct lpfc_queue *queue)
 	if (!list_empty(&queue->cpu_list))
 		list_del(&queue->cpu_list);

-	if (!list_empty(&queue->wq_list))
-		list_del(&queue->wq_list);
-
 	kfree(queue);
 	return;
 }
diff --git a/drivers/scsi/lpfc/lpfc_sli.h b/drivers/scsi/lpfc/lpfc_sli.h
index 1153a6c91bde..467b8270f7fd 100644
--- a/drivers/scsi/lpfc/lpfc_sli.h
+++ b/drivers/scsi/lpfc/lpfc_sli.h
@@ -327,6 +327,10 @@ struct lpfc_sli {
 #define LPFC_SLI_ASYNC_MBX_BLK	0x2000 /* Async mailbox is blocked */
 #define LPFC_SLI_SUPPRESS_RSP	0x4000 /* Suppress RSP feature is supported */
 #define LPFC_SLI_USE_EQDR	0x8000 /* EQ Delay Register is supported */
+#define LPFC_QUEUE_FREE_INIT	0x10000 /* Queue freeing is in progress */
+#define LPFC_QUEUE_FREE_WAIT	0x20000 /* Hold Queue free as it is being
+					 * used outside worker thread
+					 */

 	struct lpfc_sli_ring *sli3_ring;
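
The flag handshake in this patch can be checked with a small single-threaded model. The sketch below is not driver code: `struct model`, `offline_step`, `destroy_step`, and `check_all_schedules` are hypothetical names, `FREE_INIT` and `FREE_WAIT` stand in for `LPFC_QUEUE_FREE_INIT` and `LPFC_QUEUE_FREE_WAIT`, and each step is assumed to correspond to one critical section under `hbalock`, following the diff. It enumerates bounded interleavings of the two paths and counts any schedule in which the queues would be freed while `FREE_WAIT` is still set.

```c
#include <assert.h>

#define FREE_INIT 0x1u	/* models LPFC_QUEUE_FREE_INIT (assumption) */
#define FREE_WAIT 0x2u	/* models LPFC_QUEUE_FREE_WAIT (assumption) */

struct model {
	unsigned int flags;	/* models psli->sli_flag */
	int off_pc;		/* next step of the offline path, 3 = done */
	int des_pc;		/* next step of the destroy path, 4 = done */
	int violations;		/* queues freed while FREE_WAIT was set */
};

/* One step of the offline path (modelled on lpfc_do_offline). */
static void offline_step(struct model *m)
{
	switch (m->off_pc) {
	case 0:	/* test FREE_INIT; otherwise pin the queues */
		if (m->flags & FREE_INIT) {
			m->off_pc = 3;		/* skip_wait */
		} else {
			m->flags |= FREE_WAIT;
			m->off_pc = 1;
		}
		break;
	case 1:	/* the wait loop that looks at the queues */
		m->off_pc = 2;
		break;
	case 2:	/* out: unpin the queues */
		m->flags &= ~FREE_WAIT;
		m->off_pc = 3;
		break;
	}
}

/* One step of the destroy path (modelled on lpfc_sli4_queue_destroy). */
static void destroy_step(struct model *m)
{
	switch (m->des_pc) {
	case 0:	/* announce that freeing is starting */
		m->flags |= FREE_INIT;
		m->des_pc = 1;
		break;
	case 1:	/* poll; stay here while FREE_WAIT is set */
		if (!(m->flags & FREE_WAIT))
			m->des_pc = 2;
		break;
	case 2:	/* free the queues; must never happen while pinned */
		if (m->flags & FREE_WAIT)
			m->violations++;
		m->des_pc = 3;
		break;
	case 3:	/* done with freeing the queues */
		m->flags &= ~FREE_INIT;
		m->des_pc = 4;
		break;
	}
}

/* Run every 12-step schedule; return the total number of violations. */
int check_all_schedules(void)
{
	int total = 0;

	for (unsigned int sched = 0; sched < (1u << 12); sched++) {
		struct model m = { 0 };

		/* Bit i of sched picks which path takes step i. */
		for (int i = 0; i < 12; i++) {
			if (sched & (1u << i))
				offline_step(&m);
			else
				destroy_step(&m);
		}
		/* Drain whatever is left so both paths finish. */
		while (m.off_pc != 3)
			offline_step(&m);
		while (m.des_pc != 4)
			destroy_step(&m);

		assert(m.flags == 0);	/* both flags are cleared at the end */
		total += m.violations;
	}
	return total;
}
```

Under these assumptions, `check_all_schedules()` returns 0 when called from a `main()`. Removing the `FREE_INIT` test in case 0 of `offline_step` makes some schedules report a violation, which illustrates why the offline path must test that flag before it sets `FREE_WAIT`.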