From patchwork Thu Jun 27 10:17:33 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Sumit Saxena
X-Patchwork-Id: 13714120
From: Sumit Saxena
To: martin.petersen@oracle.com, helgaas@kernel.org,
	sathya.prakash@broadcom.com, sumit.saxena@broadcom.com,
	chandrakanth.patil@broadcom.com, ranjan.kumar@broadcom.com,
	prayas.patel@broadcom.com
Cc: linux-scsi@vger.kernel.org, linux-pci@vger.kernel.org
Subject: [PATCH v5 1/3] mpi3mr: Support PCI Error Recovery callback handlers
Date: Thu, 27 Jun 2024 15:47:33 +0530
Message-Id: <20240627101735.18286-2-sumit.saxena@broadcom.com>
X-Mailer: git-send-email 2.31.1
In-Reply-To: <20240627101735.18286-1-sumit.saxena@broadcom.com>
References: <20240627101735.18286-1-sumit.saxena@broadcom.com>
Precedence: bulk
X-Mailing-List: linux-scsi@vger.kernel.org

PCI Error recovery support is required to recover the controller upon
detection of PCI errors. Add support for the PCI error recovery callback
handlers in mpi3mr driver.

Signed-off-by: Sathya Prakash
Signed-off-by: Ranjan Kumar
Signed-off-by: Sumit Saxena
---
 drivers/scsi/mpi3mr/mpi3mr.h    |   6 +
 drivers/scsi/mpi3mr/mpi3mr_os.c | 199 ++++++++++++++++++++++++++++++++
 2 files changed, 205 insertions(+)

diff --git a/drivers/scsi/mpi3mr/mpi3mr.h b/drivers/scsi/mpi3mr/mpi3mr.h
index c8968f12b9e6..2b1d5645ba9b 100644
--- a/drivers/scsi/mpi3mr/mpi3mr.h
+++ b/drivers/scsi/mpi3mr/mpi3mr.h
@@ -23,6 +23,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -129,6 +130,7 @@ extern atomic64_t event_counter;
 #define MPI3MR_PREPARE_FOR_RESET_TIMEOUT	180
 #define MPI3MR_RESET_ACK_TIMEOUT		30
 #define MPI3MR_MUR_TIMEOUT			120
+#define MPI3MR_RESET_TIMEOUT			510
 
 #define MPI3MR_WATCHDOG_INTERVAL		1000 /* in milli seconds */
 
@@ -1153,6 +1155,8 @@ struct scmd_priv {
  * @trace_release_trigger_active: Trace trigger active flag
  * @fw_release_trigger_active: Fw release trigger active flag
  * @snapdump_trigger_active: Snapdump trigger active flag
+ * @pci_err_recovery: PCI error recovery in progress
+ * @block_on_pci_err: Block IO during PCI error recovery
  */
 struct mpi3mr_ioc {
 	struct list_head list;
@@ -1353,6 +1357,8 @@ struct mpi3mr_ioc {
 	bool snapdump_trigger_active;
 	bool trace_release_trigger_active;
 	bool fw_release_trigger_active;
+	bool pci_err_recovery;
+	bool block_on_pci_err;
 };
 
 /**
diff --git a/drivers/scsi/mpi3mr/mpi3mr_os.c b/drivers/scsi/mpi3mr/mpi3mr_os.c
index eac179dc9370..0986b362e5f0 100644
--- a/drivers/scsi/mpi3mr/mpi3mr_os.c
+++ b/drivers/scsi/mpi3mr/mpi3mr_os.c
@@ -5546,6 +5546,197 @@ mpi3mr_resume(struct device *dev)
 	return 0;
 }
 
+/**
+ * mpi3mr_pcierr_error_detected - PCI error detected callback
+ * @pdev: PCI device instance
+ * @state: channel state
+ *
+ * This function is called by the PCI error recovery driver and
+ * based on the state passed the driver decides what actions to
+ * be recommended back to PCI driver.
+ *
+ * For all of the states if there is no valid mrioc or scsi host
+ * references in the PCI device then this function will return
+ * the result as disconnect.
+ *
+ * For normal state, this function will return the result as can
+ * recover.
+ *
+ * For frozen state, this function will block for any pending
+ * controller initialization or re-initialization to complete,
+ * stop any new interactions with the controller and return
+ * status as reset required.
+ *
+ * For permanent failure state, this function will mark the
+ * controller as unrecoverable and return status as disconnect.
+ *
+ * Returns: PCI_ERS_RESULT_NEED_RESET or CAN_RECOVER or
+ * DISCONNECT based on the controller state.
+ */
+static pci_ers_result_t
+mpi3mr_pcierr_error_detected(struct pci_dev *pdev, pci_channel_state_t state)
+{
+	struct Scsi_Host *shost;
+	struct mpi3mr_ioc *mrioc;
+	unsigned int timeout = MPI3MR_RESET_TIMEOUT;
+
+	dev_info(&pdev->dev, "%s: callback invoked state(%d)\n", __func__,
+	    state);
+
+	shost = pci_get_drvdata(pdev);
+	mrioc = shost_priv(shost);
+
+	switch (state) {
+	case pci_channel_io_normal:
+		return PCI_ERS_RESULT_CAN_RECOVER;
+	case pci_channel_io_frozen:
+		mrioc->pci_err_recovery = true;
+		mrioc->block_on_pci_err = true;
+		do {
+			if (mrioc->reset_in_progress || mrioc->is_driver_loading)
+				ssleep(1);
+			else
+				break;
+		} while (--timeout);
+
+		if (!timeout) {
+			mrioc->pci_err_recovery = true;
+			mrioc->block_on_pci_err = true;
+			mrioc->unrecoverable = 1;
+			mpi3mr_stop_watchdog(mrioc);
+			mpi3mr_flush_cmds_for_unrecovered_controller(mrioc);
+			return PCI_ERS_RESULT_DISCONNECT;
+		}
+
+		scsi_block_requests(mrioc->shost);
+		mpi3mr_stop_watchdog(mrioc);
+		mpi3mr_cleanup_resources(mrioc);
+		return PCI_ERS_RESULT_NEED_RESET;
+	case pci_channel_io_perm_failure:
+		mrioc->pci_err_recovery = true;
+		mrioc->block_on_pci_err = true;
+		mrioc->unrecoverable = 1;
+		mpi3mr_stop_watchdog(mrioc);
+		mpi3mr_flush_cmds_for_unrecovered_controller(mrioc);
+		return PCI_ERS_RESULT_DISCONNECT;
+	default:
+		return PCI_ERS_RESULT_DISCONNECT;
+	}
+}
+
+/**
+ * mpi3mr_pcierr_slot_reset - Post slot reset callback
+ * @pdev: PCI device instance
+ *
+ * This function is called by the PCI error recovery driver
+ * after a slot or link reset issued by it for the recovery, the
+ * driver is expected to bring back the controller and
+ * initialize it.
+ *
+ * This function restores PCI state and reinitializes controller
+ * resources and the controller, this blocks for any pending
+ * reset to complete.
+ *
+ * Returns: PCI_ERS_RESULT_DISCONNECT on failure or
+ * PCI_ERS_RESULT_RECOVERED
+ */
+static pci_ers_result_t mpi3mr_pcierr_slot_reset(struct pci_dev *pdev)
+{
+	struct Scsi_Host *shost;
+	struct mpi3mr_ioc *mrioc;
+	unsigned int timeout = MPI3MR_RESET_TIMEOUT;
+
+	dev_info(&pdev->dev, "%s: callback invoked\n", __func__);
+
+	shost = pci_get_drvdata(pdev);
+	mrioc = shost_priv(shost);
+
+	do {
+		if (mrioc->reset_in_progress)
+			ssleep(1);
+		else
+			break;
+	} while (--timeout);
+
+	if (!timeout)
+		goto out_failed;
+
+	pci_restore_state(pdev);
+
+	if (mpi3mr_setup_resources(mrioc)) {
+		ioc_err(mrioc, "setup resources failed\n");
+		goto out_failed;
+	}
+	mrioc->unrecoverable = 0;
+	mrioc->pci_err_recovery = false;
+
+	if (mpi3mr_soft_reset_handler(mrioc, MPI3MR_RESET_FROM_FIRMWARE, 0))
+		goto out_failed;
+
+	return PCI_ERS_RESULT_RECOVERED;
+
+out_failed:
+	mrioc->unrecoverable = 1;
+	mrioc->block_on_pci_err = false;
+	scsi_unblock_requests(shost);
+	mpi3mr_start_watchdog(mrioc);
+	return PCI_ERS_RESULT_DISCONNECT;
+}
+
+/**
+ * mpi3mr_pcierr_resume - PCI error recovery resume
+ * callback
+ * @pdev: PCI device instance
+ *
+ * This function enables all I/O and IOCTLs post reset issued as
+ * part of the PCI error recovery
+ *
+ * Return: Nothing.
+ */
+static void mpi3mr_pcierr_resume(struct pci_dev *pdev)
+{
+	struct Scsi_Host *shost;
+	struct mpi3mr_ioc *mrioc;
+
+	dev_info(&pdev->dev, "%s: callback invoked\n", __func__);
+
+	shost = pci_get_drvdata(pdev);
+	mrioc = shost_priv(shost);
+
+	if (mrioc->block_on_pci_err) {
+		mrioc->block_on_pci_err = false;
+		scsi_unblock_requests(shost);
+		mpi3mr_start_watchdog(mrioc);
+	}
+}
+
+/**
+ * mpi3mr_pcierr_mmio_enabled - PCI error recovery callback
+ * @pdev: PCI device instance
+ *
+ * This is called only if mpi3mr_pcierr_error_detected returns
+ * PCI_ERS_RESULT_CAN_RECOVER.
+ *
+ * Return: PCI_ERS_RESULT_DISCONNECT when the controller is
+ * unrecoverable or when the shost/mrioc reference cannot be
+ * found, else return PCI_ERS_RESULT_RECOVERED
+ */
+static pci_ers_result_t mpi3mr_pcierr_mmio_enabled(struct pci_dev *pdev)
+{
+	struct Scsi_Host *shost;
+	struct mpi3mr_ioc *mrioc;
+
+	dev_info(&pdev->dev, "%s: callback invoked\n", __func__);
+
+	shost = pci_get_drvdata(pdev);
+	mrioc = shost_priv(shost);
+
+	if (mrioc->unrecoverable)
+		return PCI_ERS_RESULT_DISCONNECT;
+
+	return PCI_ERS_RESULT_RECOVERED;
+}
+
 static const struct pci_device_id mpi3mr_pci_id_table[] = {
 	{
 		PCI_DEVICE_SUB(MPI3_MFGPAGE_VENDORID_BROADCOM,
@@ -5563,6 +5754,13 @@ static const struct pci_device_id mpi3mr_pci_id_table[] = {
 };
 MODULE_DEVICE_TABLE(pci, mpi3mr_pci_id_table);
 
+static struct pci_error_handlers mpi3mr_err_handler = {
+	.error_detected = mpi3mr_pcierr_error_detected,
+	.mmio_enabled = mpi3mr_pcierr_mmio_enabled,
+	.slot_reset = mpi3mr_pcierr_slot_reset,
+	.resume = mpi3mr_pcierr_resume,
+};
+
 static SIMPLE_DEV_PM_OPS(mpi3mr_pm_ops, mpi3mr_suspend, mpi3mr_resume);
 
 static struct pci_driver mpi3mr_pci_driver = {
@@ -5571,6 +5769,7 @@ static struct pci_driver mpi3mr_pci_driver = {
 	.probe = mpi3mr_probe,
 	.remove = mpi3mr_remove,
 	.shutdown = mpi3mr_shutdown,
+	.err_handler = &mpi3mr_err_handler,
 	.driver.pm = &mpi3mr_pm_ops,
 };
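
Reviewer aid, not part of the patch: the state handling described in the kernel-doc for mpi3mr_pcierr_error_detected() can be modeled in userspace roughly as below. This is a minimal sketch under stated assumptions. The enums, `struct ioc_model`, `model_tick()`, `model_error_detected()` and `self_check()` are hypothetical stand-ins invented for illustration; they are not kernel or driver APIs. The one-second `ssleep()` is replaced by a tick counter, and the watchdog, flush and cleanup calls are omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for pci_channel_state_t and pci_ers_result_t. */
enum chan_state { CHAN_IO_NORMAL, CHAN_IO_FROZEN, CHAN_IO_PERM_FAILURE };
enum ers_result { ERS_CAN_RECOVER, ERS_NEED_RESET, ERS_DISCONNECT };

/* Hypothetical model of the mrioc fields the callback reads and writes. */
struct ioc_model {
	bool reset_in_progress;
	bool is_driver_loading;
	bool pci_err_recovery;
	bool block_on_pci_err;
	bool unrecoverable;
	unsigned int busy_ticks;	/* polls left until pending work ends */
};

/* One poll step; stands in for ssleep(1) while init/reset is pending. */
static void model_tick(struct ioc_model *ioc)
{
	if (ioc->busy_ticks && --ioc->busy_ticks == 0) {
		ioc->reset_in_progress = false;
		ioc->is_driver_loading = false;
	}
}

/*
 * Mirrors the switch in the patch. 'timeout' plays the role of
 * MPI3MR_RESET_TIMEOUT and must be non-zero, as it is in the driver.
 */
static enum ers_result model_error_detected(struct ioc_model *ioc,
					    enum chan_state state,
					    unsigned int timeout)
{
	switch (state) {
	case CHAN_IO_NORMAL:
		return ERS_CAN_RECOVER;
	case CHAN_IO_FROZEN:
		ioc->pci_err_recovery = true;
		ioc->block_on_pci_err = true;
		do {
			if (!ioc->reset_in_progress && !ioc->is_driver_loading)
				break;
			model_tick(ioc);
		} while (--timeout);

		if (!timeout) {
			/* Pending work never finished: give up on the device. */
			ioc->unrecoverable = true;
			return ERS_DISCONNECT;
		}
		return ERS_NEED_RESET;
	case CHAN_IO_PERM_FAILURE:
		ioc->pci_err_recovery = true;
		ioc->block_on_pci_err = true;
		ioc->unrecoverable = true;
		return ERS_DISCONNECT;
	}
	return ERS_DISCONNECT;
}

/* Usage example: one call per channel state, plus a stuck controller. */
static void self_check(void)
{
	struct ioc_model idle = { 0 };
	struct ioc_model stuck = { .reset_in_progress = true, .busy_ticks = 100 };
	struct ioc_model dead = { 0 };

	assert(model_error_detected(&idle, CHAN_IO_NORMAL, 5) == ERS_CAN_RECOVER);
	assert(!idle.block_on_pci_err);

	assert(model_error_detected(&idle, CHAN_IO_FROZEN, 5) == ERS_NEED_RESET);
	assert(idle.block_on_pci_err && !idle.unrecoverable);

	assert(model_error_detected(&stuck, CHAN_IO_FROZEN, 5) == ERS_DISCONNECT);
	assert(stuck.unrecoverable);

	assert(model_error_detected(&dead, CHAN_IO_PERM_FAILURE, 5) == ERS_DISCONNECT);
	assert(dead.unrecoverable);
}
```

Calling `self_check()` from a test `main()` exercises each branch. The model only tracks flag transitions and return codes, so it says nothing about locking or about the ordering of the SCSI midlayer calls in the real driver.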