From patchwork Thu Sep 20 16:27:06 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Keith Busch X-Patchwork-Id: 10608105 X-Patchwork-Delegate: bhelgaas@google.com Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 765B8161F for ; Thu, 20 Sep 2018 16:26:30 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 644262DDB5 for ; Thu, 20 Sep 2018 16:26:30 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 587C42E0B9; Thu, 20 Sep 2018 16:26:30 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 1E23F2DEA4 for ; Thu, 20 Sep 2018 16:26:28 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727676AbeITWKn (ORCPT ); Thu, 20 Sep 2018 18:10:43 -0400 Received: from mga07.intel.com ([134.134.136.100]:21034 "EHLO mga07.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727990AbeITWKm (ORCPT ); Thu, 20 Sep 2018 18:10:42 -0400 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga105.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 20 Sep 2018 09:26:26 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.54,281,1534834800"; d="scan'208";a="258918162" Received: from unknown (HELO localhost.lm.intel.com) ([10.232.112.44]) by orsmga005.jf.intel.com with ESMTP; 20 Sep 2018 09:26:25 -0700 From: Keith Busch To: Linux PCI , Bjorn Helgaas Cc: Benjamin Herrenschmidt , Sinan Kaya , Thomas Tai , poza@codeaurora.org, Lukas Wunner , Christoph Hellwig , Mika Westerberg , Keith Busch Subject: [PATCHv4 01/12] PCI: portdrv: Initialize service drivers directly Date: Thu, 20 Sep 2018 10:27:06 -0600 Message-Id: <20180920162717.31066-2-keith.busch@intel.com> X-Mailer: git-send-email 2.13.6 In-Reply-To: <20180920162717.31066-1-keith.busch@intel.com> References: <20180920162717.31066-1-keith.busch@intel.com> Sender: linux-pci-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-pci@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP The PCI port driver saves the PCI state after initializing the device with the applicable service devices. This was, however, before the service drivers were even registered because pci probe happens before the device_initcall initialized those service drivers. The config space state that the services set up were not being saved. The end result would cause pci devices to not react to events that the drivers think they did if the pci state ever needed to be restored. This patch fixes this by changing the service drivers from using the init calls to having the portdrv driver calling the services directly. This will get the state saved as desired, while making the relationship between the port driver and the services under it more explicit in the code. Signed-off-by: Keith Busch --- drivers/pci/hotplug/pciehp_core.c | 3 +-- drivers/pci/pcie/aer.c | 3 +-- drivers/pci/pcie/dpc.c | 3 +-- drivers/pci/pcie/pme.c | 3 +-- drivers/pci/pcie/portdrv.h | 24 ++++++++++++++++++++++++ drivers/pci/pcie/portdrv_pci.c | 9 +++++++++ 6 files changed, 37 insertions(+), 8 deletions(-) diff --git a/drivers/pci/hotplug/pciehp_core.c b/drivers/pci/hotplug/pciehp_core.c index 68b20e667764..944047976569 100644 --- a/drivers/pci/hotplug/pciehp_core.c +++ b/drivers/pci/hotplug/pciehp_core.c @@ -287,7 +287,7 @@ static struct pcie_port_service_driver hpdriver_portdrv = { #endif /* PM */ }; -static int __init pcied_init(void) +int __init pcie_hp_init(void) { int retval = 0; @@ -298,4 +298,3 @@ static int __init pcied_init(void) return retval; } -device_initcall(pcied_init); diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c index 83180edd6ed4..637d638f73da 100644 --- a/drivers/pci/pcie/aer.c +++ b/drivers/pci/pcie/aer.c @@ -1569,10 +1569,9 @@ static struct pcie_port_service_driver aerdriver = { * * Invoked when AER root service driver is loaded. */ -static int __init aer_service_init(void) +int __init pcie_aer_init(void) { if (!pci_aer_available() || aer_acpi_firmware_first()) return -ENXIO; return pcie_port_service_register(&aerdriver); } -device_initcall(aer_service_init); diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c index f03279fc87cd..a1fd16bf1cab 100644 --- a/drivers/pci/pcie/dpc.c +++ b/drivers/pci/pcie/dpc.c @@ -282,8 +282,7 @@ static struct pcie_port_service_driver dpcdriver = { .reset_link = dpc_reset_link, }; -static int __init dpc_service_init(void) +int __init pcie_dpc_init(void) { return pcie_port_service_register(&dpcdriver); } -device_initcall(dpc_service_init); diff --git a/drivers/pci/pcie/pme.c b/drivers/pci/pcie/pme.c index 3ed67676ea2a..1a8b85051b1b 100644 --- a/drivers/pci/pcie/pme.c +++ b/drivers/pci/pcie/pme.c @@ -446,8 +446,7 @@ static struct pcie_port_service_driver pcie_pme_driver = { /** * pcie_pme_service_init - Register the PCIe PME service driver. */ -static int __init pcie_pme_service_init(void) +int __init pcie_pme_init(void) { return pcie_port_service_register(&pcie_pme_driver); } -device_initcall(pcie_pme_service_init); diff --git a/drivers/pci/pcie/portdrv.h b/drivers/pci/pcie/portdrv.h index d59afa42fc14..2498b2d34009 100644 --- a/drivers/pci/pcie/portdrv.h +++ b/drivers/pci/pcie/portdrv.h @@ -23,6 +23,30 @@ #define PCIE_PORT_DEVICE_MAXSERVICES 4 +#ifdef CONFIG_PCIEAER +int pcie_aer_init(void); +#else +static inline int pcie_aer_init(void) { return 0; } +#endif + +#ifdef CONFIG_HOTPLUG_PCI_PCIE +int pcie_hp_init(void); +#else +static inline int pcie_hp_init(void) { return 0; } +#endif + +#ifdef CONFIG_PCIE_PME +int pcie_pme_init(void); +#else +static inline int pcie_pme_init(void) { return 0; } +#endif + +#ifdef CONFIG_PCIE_DPC +int pcie_dpc_init(void); +#else +static inline int pcie_dpc_init(void) { return 0; } +#endif + /* Port Type */ #define PCIE_ANY_PORT (~0) diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c index eef22dc29140..23a5a0c2c3fe 100644 --- a/drivers/pci/pcie/portdrv_pci.c +++ b/drivers/pci/pcie/portdrv_pci.c @@ -226,11 +226,20 @@ static const struct dmi_system_id pcie_portdrv_dmi_table[] __initconst = { {} }; +static void __init pcie_init_services(void) +{ + pcie_aer_init(); + pcie_pme_init(); + pcie_dpc_init(); + pcie_hp_init(); +} + static int __init pcie_portdrv_init(void) { if (pcie_ports_disabled) return -EACCES; + pcie_init_services(); dmi_check_system(pcie_portdrv_dmi_table); return pci_register_driver(&pcie_portdriver); From patchwork Thu Sep 20 16:27:07 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Keith Busch X-Patchwork-Id: 10608099 X-Patchwork-Delegate: bhelgaas@google.com Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 4BE9A913 for ; Thu, 20 Sep 2018 16:26:29 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 3B9B52E00F for ; Thu, 20 Sep 2018 16:26:29 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 2F2C32E0E0; Thu, 20 Sep 2018 16:26:29 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 6B2BC2DDB5 for ; Thu, 20 Sep 2018 16:26:28 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727990AbeITWKn (ORCPT ); Thu, 20 Sep 2018 18:10:43 -0400 Received: from mga07.intel.com ([134.134.136.100]:21034 "EHLO mga07.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726540AbeITWKn (ORCPT ); Thu, 20 Sep 2018 18:10:43 -0400 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga105.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 20 Sep 2018 09:26:26 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.54,281,1534834800"; d="scan'208";a="258918165" Received: from unknown (HELO localhost.lm.intel.com) ([10.232.112.44]) by orsmga005.jf.intel.com with ESMTP; 20 Sep 2018 09:26:26 -0700 From: Keith Busch To: Linux PCI , Bjorn Helgaas Cc: Benjamin Herrenschmidt , Sinan Kaya , Thomas Tai , poza@codeaurora.org, Lukas Wunner , Christoph Hellwig , Mika Westerberg , Keith Busch Subject: [PATCHv4 02/12] PCI: portdrv: Restore pci state on slot reset Date: Thu, 20 Sep 2018 10:27:07 -0600 Message-Id: <20180920162717.31066-3-keith.busch@intel.com> X-Mailer: git-send-email 2.13.6 In-Reply-To: <20180920162717.31066-1-keith.busch@intel.com> References: <20180920162717.31066-1-keith.busch@intel.com> Sender: linux-pci-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-pci@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP The port's config space may be cleared after a link reset, which wipes out the bridge's bus and memory windows. We need to restore the config space that was saved during probe in order to successfully access downstream devices. Signed-off-by: Keith Busch --- drivers/pci/pcie/portdrv_pci.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c index 23a5a0c2c3fe..17256733fa43 100644 --- a/drivers/pci/pcie/portdrv_pci.c +++ b/drivers/pci/pcie/portdrv_pci.c @@ -146,6 +146,13 @@ static pci_ers_result_t pcie_portdrv_error_detected(struct pci_dev *dev, return PCI_ERS_RESULT_CAN_RECOVER; } +static pci_ers_result_t pcie_portdrv_slot_reset(struct pci_dev *dev) +{ + pci_restore_state(dev); + pci_save_state(dev); + return PCI_ERS_RESULT_RECOVERED; +} + static pci_ers_result_t pcie_portdrv_mmio_enabled(struct pci_dev *dev) { return PCI_ERS_RESULT_RECOVERED; @@ -185,6 +192,7 @@ static const struct pci_device_id port_pci_ids[] = { { static const struct pci_error_handlers pcie_portdrv_err_handler = { .error_detected = pcie_portdrv_error_detected, + .slot_reset = pcie_portdrv_slot_reset, .mmio_enabled = pcie_portdrv_mmio_enabled, .resume = pcie_portdrv_err_resume, }; From patchwork Thu Sep 20 16:27:08 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Keith Busch X-Patchwork-Id: 10608103 X-Patchwork-Delegate: bhelgaas@google.com Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 3BF84913 for ; Thu, 20 Sep 2018 16:26:30 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 2BEC62DDB5 for ; Thu, 20 Sep 2018 16:26:30 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 201962E100; Thu, 20 Sep 2018 16:26:30 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id AB5F22DDB5 for ; Thu, 20 Sep 2018 16:26:29 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728547AbeITWKo (ORCPT ); Thu, 20 Sep 2018 18:10:44 -0400 Received: from mga07.intel.com ([134.134.136.100]:21046 "EHLO mga07.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726540AbeITWKo (ORCPT ); Thu, 20 Sep 2018 18:10:44 -0400 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga105.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 20 Sep 2018 09:26:27 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.54,281,1534834800"; d="scan'208";a="258918168" Received: from unknown (HELO localhost.lm.intel.com) ([10.232.112.44]) by orsmga005.jf.intel.com with ESMTP; 20 Sep 2018 09:26:27 -0700 From: Keith Busch To: Linux PCI , Bjorn Helgaas Cc: Benjamin Herrenschmidt , Sinan Kaya , Thomas Tai , poza@codeaurora.org, Lukas Wunner , Christoph Hellwig , Mika Westerberg , Keith Busch Subject: [PATCHv4 03/12] PCI: DPC: Save and restore control state Date: Thu, 20 Sep 2018 10:27:08 -0600 Message-Id: <20180920162717.31066-4-keith.busch@intel.com> X-Mailer: git-send-email 2.13.6 In-Reply-To: <20180920162717.31066-1-keith.busch@intel.com> References: <20180920162717.31066-1-keith.busch@intel.com> Sender: linux-pci-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-pci@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP This patch provides DPC save and restore capabilities. This is necessary for the driver to observe DPC events in the event the configuration space needs to be restored after a reset. Signed-off-by: Keith Busch --- drivers/pci/pci.c | 2 ++ drivers/pci/pci.h | 8 +++++++ drivers/pci/pcie/dpc.c | 61 +++++++++++++++++++++++++++++++++++++++++++++----- 3 files changed, 65 insertions(+), 6 deletions(-) diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index c4801a83fcc5..bcb70b7c190c 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -1284,6 +1284,7 @@ int pci_save_state(struct pci_dev *dev) if (i != 0) return i; + pci_save_dpc_state(dev); return pci_save_vc_state(dev); } EXPORT_SYMBOL(pci_save_state); @@ -1389,6 +1390,7 @@ void pci_restore_state(struct pci_dev *dev) pci_restore_ats_state(dev); pci_restore_vc_state(dev); pci_restore_rebar_state(dev); + pci_restore_dpc_state(dev); pci_cleanup_aer_error_status_regs(dev); diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h index 6e0d1528d471..5a5c6099b253 100644 --- a/drivers/pci/pci.h +++ b/drivers/pci/pci.h @@ -346,6 +346,14 @@ int aer_get_device_error_info(struct pci_dev *dev, struct aer_err_info *info); void aer_print_error(struct pci_dev *dev, struct aer_err_info *info); #endif /* CONFIG_PCIEAER */ +#ifdef CONFIG_PCIE_DPC +void pci_save_dpc_state(struct pci_dev *dev); +void pci_restore_dpc_state(struct pci_dev *dev); +#else +void pci_save_dpc_state(struct pci_dev *dev) {} +void pci_restore_dpc_state(struct pci_dev *dev) {} +#endif + #ifdef CONFIG_PCI_ATS void pci_restore_ats_state(struct pci_dev *dev); #else diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c index a1fd16bf1cab..ed815a28512e 100644 --- a/drivers/pci/pcie/dpc.c +++ b/drivers/pci/pcie/dpc.c @@ -44,6 +44,58 @@ static const char * const rp_pio_error_string[] = { "Memory Request Completion Timeout", /* Bit Position 18 */ }; +static struct dpc_dev *to_dpc_dev(struct pci_dev *dev) +{ + struct device *device; + + device = pcie_port_find_device(dev, PCIE_PORT_SERVICE_DPC); + if (!device) + return NULL; + return get_service_data(to_pcie_device(device)); +} + +void pci_save_dpc_state(struct pci_dev *dev) +{ + struct dpc_dev *dpc; + struct pci_cap_saved_state *save_state; + u16 *cap; + + if (!pci_is_pcie(dev)) + return; + + dpc = to_dpc_dev(dev); + if (!dpc) + return; + + save_state = pci_find_saved_ext_cap(dev, PCI_EXT_CAP_ID_DPC); + if (!save_state) + return; + + cap = (u16 *)&save_state->cap.data[0]; + pci_read_config_word(dev, dpc->cap_pos + PCI_EXP_DPC_CTL, cap); +} + +void pci_restore_dpc_state(struct pci_dev *dev) +{ + struct dpc_dev *dpc; + struct pci_cap_saved_state *save_state; + u16 *cap; + + if (!pci_is_pcie(dev)) + return; + + dpc = to_dpc_dev(dev); + if (!dpc) + return; + + save_state = pci_find_saved_ext_cap(dev, PCI_EXT_CAP_ID_DPC); + if (!save_state) + return; + + cap = (u16 *)&save_state->cap.data[0]; + pci_write_config_word(dev, dpc->cap_pos + PCI_EXP_DPC_CTL, *cap); +} + static int dpc_wait_rp_inactive(struct dpc_dev *dpc) { unsigned long timeout = jiffies + HZ; @@ -67,18 +119,13 @@ static int dpc_wait_rp_inactive(struct dpc_dev *dpc) static pci_ers_result_t dpc_reset_link(struct pci_dev *pdev) { struct dpc_dev *dpc; - struct pcie_device *pciedev; - struct device *devdpc; - u16 cap; /* * DPC disables the Link automatically in hardware, so it has * already been reset by the time we get here. */ - devdpc = pcie_port_find_device(pdev, PCIE_PORT_SERVICE_DPC); - pciedev = to_pcie_device(devdpc); - dpc = get_service_data(pciedev); + dpc = to_dpc_dev(pdev); cap = dpc->cap_pos; /* @@ -259,6 +306,8 @@ static int dpc_probe(struct pcie_device *dev) FLAG(cap, PCI_EXP_DPC_CAP_POISONED_TLP), FLAG(cap, PCI_EXP_DPC_CAP_SW_TRIGGER), dpc->rp_log_size, FLAG(cap, PCI_EXP_DPC_CAP_DL_ACTIVE)); + + pci_add_ext_cap_save_buffer(pdev, PCI_EXT_CAP_ID_DPC, sizeof(u16)); return status; } From patchwork Thu Sep 20 16:27:09 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Keith Busch X-Patchwork-Id: 10608107 X-Patchwork-Delegate: bhelgaas@google.com Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 83B3217EE for ; Thu, 20 Sep 2018 16:26:30 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 73BDB2E00F for ; Thu, 20 Sep 2018 16:26:30 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 67D6D2E100; Thu, 20 Sep 2018 16:26:30 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 15D9F2E0E0 for ; Thu, 20 Sep 2018 16:26:30 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726540AbeITWKp (ORCPT ); Thu, 20 Sep 2018 18:10:45 -0400 Received: from mga07.intel.com ([134.134.136.100]:21046 "EHLO mga07.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730149AbeITWKo (ORCPT ); Thu, 20 Sep 2018 18:10:44 -0400 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga105.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 20 Sep 2018 09:26:28 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.54,281,1534834800"; d="scan'208";a="258918174" Received: from unknown (HELO localhost.lm.intel.com) ([10.232.112.44]) by orsmga005.jf.intel.com with ESMTP; 20 Sep 2018 09:26:27 -0700 From: Keith Busch To: Linux PCI , Bjorn Helgaas Cc: Benjamin Herrenschmidt , Sinan Kaya , Thomas Tai , poza@codeaurora.org, Lukas Wunner , Christoph Hellwig , Mika Westerberg , Keith Busch Subject: [PATCHv4 04/12] PCI: AER: Take reference on error devices Date: Thu, 20 Sep 2018 10:27:09 -0600 Message-Id: <20180920162717.31066-5-keith.busch@intel.com> X-Mailer: git-send-email 2.13.6 In-Reply-To: <20180920162717.31066-1-keith.busch@intel.com> References: <20180920162717.31066-1-keith.busch@intel.com> Sender: linux-pci-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-pci@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Error handling may be running in parallel with a hot removal. This patch reference counts the devices AER handling tracks so the device can not be freed while AER wants to reference it. Signed-off-by: Keith Busch --- drivers/pci/pcie/aer.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c index 637d638f73da..ffbbd759683c 100644 --- a/drivers/pci/pcie/aer.c +++ b/drivers/pci/pcie/aer.c @@ -866,7 +866,7 @@ void cper_print_aer(struct pci_dev *dev, int aer_severity, static int add_error_device(struct aer_err_info *e_info, struct pci_dev *dev) { if (e_info->error_dev_num < AER_MAX_MULTI_ERR_DEVICES) { - e_info->dev[e_info->error_dev_num] = dev; + e_info->dev[e_info->error_dev_num] = pci_dev_get(dev); e_info->error_dev_num++; return 0; } @@ -1013,6 +1013,7 @@ static void handle_error_source(struct pci_dev *dev, struct aer_err_info *info) pcie_do_nonfatal_recovery(dev); else if (info->severity == AER_FATAL) pcie_do_fatal_recovery(dev, PCIE_PORT_SERVICE_AER); + pci_dev_put(dev); } #ifdef CONFIG_ACPI_APEI_PCIEAER From patchwork Thu Sep 20 16:27:10 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Keith Busch X-Patchwork-Id: 10608109 X-Patchwork-Delegate: bhelgaas@google.com Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 18608913 for ; Thu, 20 Sep 2018 16:26:31 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 088D32DDB5 for ; Thu, 20 Sep 2018 16:26:31 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id F0C292E00F; Thu, 20 Sep 2018 16:26:30 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id A88492DDB5 for ; Thu, 20 Sep 2018 16:26:30 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730149AbeITWKp (ORCPT ); Thu, 20 Sep 2018 18:10:45 -0400 Received: from mga07.intel.com ([134.134.136.100]:21046 "EHLO mga07.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729361AbeITWKp (ORCPT ); Thu, 20 Sep 2018 18:10:45 -0400 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga105.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 20 Sep 2018 09:26:29 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.54,281,1534834800"; d="scan'208";a="258918179" Received: from unknown (HELO localhost.lm.intel.com) ([10.232.112.44]) by orsmga005.jf.intel.com with ESMTP; 20 Sep 2018 09:26:28 -0700 From: Keith Busch To: Linux PCI , Bjorn Helgaas Cc: Benjamin Herrenschmidt , Sinan Kaya , Thomas Tai , poza@codeaurora.org, Lukas Wunner , Christoph Hellwig , Mika Westerberg , Keith Busch Subject: [PATCHv4 05/12] PCI: AER: Don't read upstream ports below fatal errors Date: Thu, 20 Sep 2018 10:27:10 -0600 Message-Id: <20180920162717.31066-6-keith.busch@intel.com> X-Mailer: git-send-email 2.13.6 In-Reply-To: <20180920162717.31066-1-keith.busch@intel.com> References: <20180920162717.31066-1-keith.busch@intel.com> Sender: linux-pci-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-pci@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP The AER driver has never read the config space of an end device that reported a fatal error because the link to that device is considered unreliable. An ERR_FATAL from upstream port almost certainly detected that error on its upstream link, so we can't expect to reliably read its config space for the same reason. Signed-off-by: Keith Busch --- drivers/pci/pcie/aer.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c index ffbbd759683c..5c3ea7254c6a 100644 --- a/drivers/pci/pcie/aer.c +++ b/drivers/pci/pcie/aer.c @@ -1116,8 +1116,9 @@ int aer_get_device_error_info(struct pci_dev *dev, struct aer_err_info *info) &info->mask); if (!(info->status & ~info->mask)) return 0; - } else if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE || - info->severity == AER_NONFATAL) { + } else if (pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT || + pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM || + info->severity == AER_NONFATAL) { /* Link is still healthy for IO reads */ pci_read_config_dword(dev, pos + PCI_ERR_UNCOR_STATUS, From patchwork Thu Sep 20 16:27:11 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Keith Busch X-Patchwork-Id: 10608111 X-Patchwork-Delegate: bhelgaas@google.com Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 02CE8913 for ; Thu, 20 Sep 2018 16:26:32 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id E73902DDB5 for ; Thu, 20 Sep 2018 16:26:31 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id DB5DE2E00F; Thu, 20 Sep 2018 16:26:31 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 6EC4B2DDB5 for ; Thu, 20 Sep 2018 16:26:31 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1731520AbeITWKq (ORCPT ); Thu, 20 Sep 2018 18:10:46 -0400 Received: from mga07.intel.com ([134.134.136.100]:21046 "EHLO mga07.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729361AbeITWKq (ORCPT ); Thu, 20 Sep 2018 18:10:46 -0400 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga105.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 20 Sep 2018 09:26:30 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.54,281,1534834800"; d="scan'208";a="258918183" Received: from unknown (HELO localhost.lm.intel.com) ([10.232.112.44]) by orsmga005.jf.intel.com with ESMTP; 20 Sep 2018 09:26:29 -0700 From: Keith Busch To: Linux PCI , Bjorn Helgaas Cc: Benjamin Herrenschmidt , Sinan Kaya , Thomas Tai , poza@codeaurora.org, Lukas Wunner , Christoph Hellwig , Mika Westerberg , Keith Busch Subject: [PATCHv4 06/12] PCI: ERR: Use slot reset if available Date: Thu, 20 Sep 2018 10:27:11 -0600 Message-Id: <20180920162717.31066-7-keith.busch@intel.com> X-Mailer: git-send-email 2.13.6 In-Reply-To: <20180920162717.31066-1-keith.busch@intel.com> References: <20180920162717.31066-1-keith.busch@intel.com> Sender: linux-pci-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-pci@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP The secondary bus reset may have link side effects that a hot plug capable port may incorrectly react to. This patch will use the slot specific reset for hotplug ports, fixing the undesirable link down-up handling during error recovering. Signed-off-by: Keith Busch --- drivers/pci/pci.c | 33 +++++++++++++++++++++++++++++++++ drivers/pci/pci.h | 2 ++ drivers/pci/pcie/aer.c | 2 +- drivers/pci/pcie/err.c | 2 +- drivers/pci/slot.c | 2 +- 5 files changed, 38 insertions(+), 3 deletions(-) diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index bcb70b7c190c..ce4dafbfe2d1 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -5169,6 +5169,39 @@ static int pci_bus_reset(struct pci_bus *bus, int probe) return ret; } +/** + * pci_bus_error_reset - reset the bridge's subordinate bus + * @bridge: The parent device that connects to the bus to reset + * + * This function will first try to reset the slots on this bus if the method is + * available. If slot reset fails or is not available, this will fall back to a + * secondary bus reset. + */ +int pci_bus_error_reset(struct pci_dev *bridge) +{ + struct pci_bus *bus = bridge->subordinate; + + if (!bus) + return -ENOTTY; + + mutex_lock(&pci_slot_mutex); + if (!list_empty(&bus->slots)) { + struct pci_slot *slot; + + list_for_each_entry(slot, &bus->slots, list) + if (pci_probe_reset_slot(slot)) + goto bus_reset; + list_for_each_entry(slot, &bus->slots, list) + if (pci_slot_reset(slot, 0)) + goto bus_reset; + } + mutex_unlock(&pci_slot_mutex); + return 0; +bus_reset: + mutex_unlock(&pci_slot_mutex); + return pci_bus_reset(bridge->subordinate, 0); +} + /** * pci_probe_reset_bus - probe whether a PCI bus can be reset * @bus: PCI bus to probe diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h index 5a5c6099b253..807e31e24fd0 100644 --- a/drivers/pci/pci.h +++ b/drivers/pci/pci.h @@ -35,6 +35,7 @@ int pci_mmap_fits(struct pci_dev *pdev, int resno, struct vm_area_struct *vmai, int pci_probe_reset_function(struct pci_dev *dev); int pci_bridge_secondary_bus_reset(struct pci_dev *dev); +int pci_bus_error_reset(struct pci_dev *dev); /** * struct pci_platform_pm_ops - Firmware PM callbacks @@ -136,6 +137,7 @@ static inline void pci_remove_legacy_files(struct pci_bus *bus) { return; } /* Lock for read/write access to pci device and bus lists */ extern struct rw_semaphore pci_bus_sem; +extern struct mutex pci_slot_mutex; extern raw_spinlock_t pci_lock; diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c index 5c3ea7254c6a..1563e22600ec 100644 --- a/drivers/pci/pcie/aer.c +++ b/drivers/pci/pcie/aer.c @@ -1528,7 +1528,7 @@ static pci_ers_result_t aer_root_reset(struct pci_dev *dev) reg32 &= ~ROOT_PORT_INTR_ON_MESG_MASK; pci_write_config_dword(dev, pos + PCI_ERR_ROOT_COMMAND, reg32); - rc = pci_bridge_secondary_bus_reset(dev); + rc = pci_bus_error_reset(dev); pci_printk(KERN_DEBUG, dev, "Root Port link has been reset\n"); /* Clear Root Error Status */ diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c index cac406b6e936..62ab665f0f03 100644 --- a/drivers/pci/pcie/err.c +++ b/drivers/pci/pcie/err.c @@ -177,7 +177,7 @@ static pci_ers_result_t default_reset_link(struct pci_dev *dev) { int rc; - rc = pci_bridge_secondary_bus_reset(dev); + rc = pci_bus_error_reset(dev); pci_printk(KERN_DEBUG, dev, "downstream link has been reset\n"); return rc ? PCI_ERS_RESULT_DISCONNECT : PCI_ERS_RESULT_RECOVERED; } diff --git a/drivers/pci/slot.c b/drivers/pci/slot.c index 145cd953b518..3da03fcc6fbf 100644 --- a/drivers/pci/slot.c +++ b/drivers/pci/slot.c @@ -14,7 +14,7 @@ struct kset *pci_slots_kset; EXPORT_SYMBOL_GPL(pci_slots_kset); -static DEFINE_MUTEX(pci_slot_mutex); +DEFINE_MUTEX(pci_slot_mutex); static ssize_t pci_slot_attr_show(struct kobject *kobj, struct attribute *attr, char *buf) From patchwork Thu Sep 20 16:27:12 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Keith Busch X-Patchwork-Id: 10608113 X-Patchwork-Delegate: bhelgaas@google.com Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 1DB44913 for ; Thu, 20 Sep 2018 16:26:33 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 0C7982E00F for ; Thu, 20 Sep 2018 16:26:33 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id F36D82E0E0; Thu, 20 Sep 2018 16:26:32 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 4811C2E00F for ; Thu, 20 Sep 2018 16:26:32 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729361AbeITWKr (ORCPT ); Thu, 20 Sep 2018 18:10:47 -0400 Received: from mga07.intel.com ([134.134.136.100]:21046 "EHLO mga07.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727094AbeITWKr (ORCPT ); Thu, 20 Sep 2018 18:10:47 -0400 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga105.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 20 Sep 2018 09:26:30 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.54,281,1534834800"; d="scan'208";a="258918187" Received: from unknown (HELO localhost.lm.intel.com) ([10.232.112.44]) by orsmga005.jf.intel.com with ESMTP; 20 Sep 2018 09:26:30 -0700 From: Keith Busch To: Linux PCI , Bjorn Helgaas Cc: Benjamin Herrenschmidt , Sinan Kaya , Thomas Tai , poza@codeaurora.org, Lukas Wunner , Christoph Hellwig , Mika Westerberg , Keith Busch Subject: [PATCHv4 07/12] PCI: ERR: Handle fatal error recovery Date: Thu, 20 Sep 2018 10:27:12 -0600 Message-Id: <20180920162717.31066-8-keith.busch@intel.com> X-Mailer: git-send-email 2.13.6 In-Reply-To: <20180920162717.31066-1-keith.busch@intel.com> References: <20180920162717.31066-1-keith.busch@intel.com> Sender: linux-pci-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-pci@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP We don't need to be paranoid about the topology changing while handling an error. If the device has changed in a hotplug capable slot, we can rely on the presence detection handling to react to a changing topology. This patch restores the fatal error handling behavior that existed before merging DPC with AER with 7e9084b3674 ("PCI/AER: Handle ERR_FATAL with removal and re-enumeration of devices"). Signed-off-by: Keith Busch --- Documentation/PCI/pci-error-recovery.txt | 35 +++++---------- drivers/pci/pci.h | 4 +- drivers/pci/pcie/aer.c | 12 +++-- drivers/pci/pcie/dpc.c | 4 +- drivers/pci/pcie/err.c | 75 +++----------------------------- 5 files changed, 28 insertions(+), 102 deletions(-) diff --git a/Documentation/PCI/pci-error-recovery.txt b/Documentation/PCI/pci-error-recovery.txt index 688b69121e82..0b6bb3ef449e 100644 --- a/Documentation/PCI/pci-error-recovery.txt +++ b/Documentation/PCI/pci-error-recovery.txt @@ -110,7 +110,7 @@ The actual steps taken by a platform to recover from a PCI error event will be platform-dependent, but will follow the general sequence described below. -STEP 0: Error Event: ERR_NONFATAL +STEP 0: Error Event ------------------- A PCI bus error is detected by the PCI hardware. On powerpc, the slot is isolated, in that all I/O is blocked: all reads return 0xffffffff, @@ -228,7 +228,13 @@ proceeds to either STEP3 (Link Reset) or to STEP 5 (Resume Operations). If any driver returned PCI_ERS_RESULT_NEED_RESET, then the platform proceeds to STEP 4 (Slot Reset) -STEP 3: Slot Reset +STEP 3: Link Reset +------------------ +The platform resets the link. This is a PCI-Express specific step +and is done whenever a fatal error has been detected that can be +"solved" by resetting the link. + +STEP 4: Slot Reset ------------------ In response to a return value of PCI_ERS_RESULT_NEED_RESET, the @@ -314,7 +320,7 @@ Failure). >>> However, it probably should. -STEP 4: Resume Operations +STEP 5: Resume Operations ------------------------- The platform will call the resume() callback on all affected device drivers if all drivers on the segment have returned @@ -326,7 +332,7 @@ a result code. At this point, if a new error happens, the platform will restart a new error recovery sequence. -STEP 5: Permanent Failure +STEP 6: Permanent Failure ------------------------- A "permanent failure" has occurred, and the platform cannot recover the device. The platform will call error_detected() with a @@ -349,27 +355,6 @@ errors. See the discussion in powerpc/eeh-pci-error-recovery.txt for additional detail on real-life experience of the causes of software errors. -STEP 0: Error Event: ERR_FATAL -------------------- -PCI bus error is detected by the PCI hardware. On powerpc, the slot is -isolated, in that all I/O is blocked: all reads return 0xffffffff, all -writes are ignored. - -STEP 1: Remove devices --------------------- -Platform removes the devices depending on the error agent, it could be -this port for all subordinates or upstream component (likely downstream -port) - -STEP 2: Reset link --------------------- -The platform resets the link. This is a PCI-Express specific step and is -done whenever a fatal error has been detected that can be "solved" by -resetting the link. - -STEP 3: Re-enumerate the devices --------------------- -Initiates the re-enumeration. Conclusion; General Remarks --------------------------- diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h index 807e31e24fd0..028abe34a15b 100644 --- a/drivers/pci/pci.h +++ b/drivers/pci/pci.h @@ -433,8 +433,8 @@ static inline int pci_dev_specific_disable_acs_redir(struct pci_dev *dev) #endif /* PCI error reporting and recovery */ -void pcie_do_fatal_recovery(struct pci_dev *dev, u32 service); -void pcie_do_nonfatal_recovery(struct pci_dev *dev); +void pcie_do_recovery(struct pci_dev *dev, enum pci_channel_state state, + u32 service); bool pcie_wait_for_link(struct pci_dev *pdev, bool active); #ifdef CONFIG_PCIEASPM diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c index 1563e22600ec..0619ec5d7bb5 100644 --- a/drivers/pci/pcie/aer.c +++ b/drivers/pci/pcie/aer.c @@ -1010,9 +1010,11 @@ static void handle_error_source(struct pci_dev *dev, struct aer_err_info *info) info->status); pci_aer_clear_device_status(dev); } else if (info->severity == AER_NONFATAL) - pcie_do_nonfatal_recovery(dev); + pcie_do_recovery(dev, pci_channel_io_normal, + PCIE_PORT_SERVICE_AER); else if (info->severity == AER_FATAL) - pcie_do_fatal_recovery(dev, PCIE_PORT_SERVICE_AER); + pcie_do_recovery(dev, pci_channel_io_frozen, + PCIE_PORT_SERVICE_AER); pci_dev_put(dev); } @@ -1048,9 +1050,11 @@ static void aer_recover_work_func(struct work_struct *work) } cper_print_aer(pdev, entry.severity, entry.regs); if (entry.severity == AER_NONFATAL) - pcie_do_nonfatal_recovery(pdev); + pcie_do_recovery(pdev, pci_channel_io_normal, + PCIE_PORT_SERVICE_AER); else if (entry.severity == AER_FATAL) - pcie_do_fatal_recovery(pdev, PCIE_PORT_SERVICE_AER); + pcie_do_recovery(pdev, pci_channel_io_frozen, + PCIE_PORT_SERVICE_AER); pci_dev_put(pdev); } } diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c index ed815a28512e..23e063aefddf 100644 --- a/drivers/pci/pcie/dpc.c +++ b/drivers/pci/pcie/dpc.c @@ -216,7 +216,7 @@ static irqreturn_t dpc_handler(int irq, void *context) reason = (status & PCI_EXP_DPC_STATUS_TRIGGER_RSN) >> 1; ext_reason = (status & PCI_EXP_DPC_STATUS_TRIGGER_RSN_EXT) >> 5; - dev_warn(dev, "DPC %s detected, remove downstream devices\n", + dev_warn(dev, "DPC %s detected\n", (reason == 0) ? "unmasked uncorrectable error" : (reason == 1) ? "ERR_NONFATAL" : (reason == 2) ? "ERR_FATAL" : @@ -233,7 +233,7 @@ static irqreturn_t dpc_handler(int irq, void *context) } /* We configure DPC so it only triggers on ERR_FATAL */ - pcie_do_fatal_recovery(pdev, PCIE_PORT_SERVICE_DPC); + pcie_do_recovery(pdev, pci_channel_io_frozen, PCIE_PORT_SERVICE_DPC); return IRQ_HANDLED; } diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c index 62ab665f0f03..644f3f725ef0 100644 --- a/drivers/pci/pcie/err.c +++ b/drivers/pci/pcie/err.c @@ -271,83 +271,20 @@ static pci_ers_result_t broadcast_error_message(struct pci_dev *dev, return result_data.result; } -/** - * pcie_do_fatal_recovery - handle fatal error recovery process - * @dev: pointer to a pci_dev data structure of agent detecting an error - * - * Invoked when an error is fatal. Once being invoked, removes the devices - * beneath this AER agent, followed by reset link e.g. secondary bus reset - * followed by re-enumeration of devices. - */ -void pcie_do_fatal_recovery(struct pci_dev *dev, u32 service) -{ - struct pci_dev *udev; - struct pci_bus *parent; - struct pci_dev *pdev, *temp; - pci_ers_result_t result; - - if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) - udev = dev; - else - udev = dev->bus->self; - - parent = udev->subordinate; - pci_walk_bus(parent, pci_dev_set_disconnected, NULL); - - pci_lock_rescan_remove(); - pci_dev_get(dev); - list_for_each_entry_safe_reverse(pdev, temp, &parent->devices, - bus_list) { - pci_stop_and_remove_bus_device(pdev); - } - - result = reset_link(udev, service); - - if ((service == PCIE_PORT_SERVICE_AER) && - (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE)) { - /* - * If the error is reported by a bridge, we think this error - * is related to the downstream link of the bridge, so we - * do error recovery on all subordinates of the bridge instead - * of the bridge and clear the error status of the bridge. - */ - pci_aer_clear_fatal_status(dev); - pci_aer_clear_device_status(dev); - } - - if (result == PCI_ERS_RESULT_RECOVERED) { - if (pcie_wait_for_link(udev, true)) - pci_rescan_bus(udev->bus); - pci_info(dev, "Device recovery from fatal error successful\n"); - } else { - pci_uevent_ers(dev, PCI_ERS_RESULT_DISCONNECT); - pci_info(dev, "Device recovery from fatal error failed\n"); - } - - pci_dev_put(dev); - pci_unlock_rescan_remove(); -} - -/** - * pcie_do_nonfatal_recovery - handle nonfatal error recovery process - * @dev: pointer to a pci_dev data structure of agent detecting an error - * - * Invoked when an error is nonfatal/fatal. Once being invoked, broadcast - * error detected message to all downstream drivers within a hierarchy in - * question and return the returned code. - */ -void pcie_do_nonfatal_recovery(struct pci_dev *dev) +void pcie_do_recovery(struct pci_dev *dev, enum pci_channel_state state, + u32 service) { pci_ers_result_t status; - enum pci_channel_state state; - - state = pci_channel_io_normal; status = broadcast_error_message(dev, state, "error_detected", report_error_detected); + if (state == pci_channel_io_frozen && + reset_link(dev, service) != PCI_ERS_RESULT_RECOVERED) + goto failed; + if (status == PCI_ERS_RESULT_CAN_RECOVER) status = broadcast_error_message(dev, state, From patchwork Thu Sep 20 16:27:13 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Keith Busch X-Patchwork-Id: 10608115 X-Patchwork-Delegate: bhelgaas@google.com Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B45DA161F for ; Thu, 20 Sep 2018 16:26:34 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id A40152E00F for ; Thu, 20 Sep 2018 16:26:34 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 989CC2E100; Thu, 20 Sep 2018 16:26:34 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 15F522E00F for ; Thu, 20 Sep 2018 16:26:34 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1731757AbeITWKt (ORCPT ); Thu, 20 Sep 2018 18:10:49 -0400 Received: from mga07.intel.com ([134.134.136.100]:21046 "EHLO mga07.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727094AbeITWKs (ORCPT ); Thu, 20 Sep 2018 18:10:48 -0400 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga105.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 20 Sep 2018 09:26:31 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.54,281,1534834800"; d="scan'208";a="258918193" Received: from unknown (HELO localhost.lm.intel.com) ([10.232.112.44]) by orsmga005.jf.intel.com with ESMTP; 20 Sep 2018 09:26:31 -0700 From: Keith Busch To: Linux PCI , Bjorn Helgaas Cc: Benjamin Herrenschmidt , Sinan Kaya , Thomas Tai , poza@codeaurora.org, Lukas Wunner , Christoph Hellwig , Mika Westerberg , Keith Busch Subject: [PATCHv4 08/12] PCI: ERR: Always use the first downstream port Date: Thu, 20 Sep 2018 10:27:13 -0600 Message-Id: <20180920162717.31066-9-keith.busch@intel.com> X-Mailer: git-send-email 2.13.6 In-Reply-To: <20180920162717.31066-1-keith.busch@intel.com> References: <20180920162717.31066-1-keith.busch@intel.com> Sender: linux-pci-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-pci@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP The link reset always used the first bridge device, but AER broadcast error handling may have reported an end device. This means the reset may hit devices that were never notified of the impending error recovery. This patch uses the first downstream port in the hierarchy considered reliable. An error detected by a switch upstream port should mean it occurred on its upstream link, so the patch selects the parent device if the error is not a root or downstream port. This allows two other clean-ups. First, error handling can only run on bridges so this patch removes checks for end devices. Second, the first accessible port does not inherit the channel error state since we can access it, so the special cases for error detect and resume are no longer necessary. Signed-off-by: Keith Busch --- drivers/pci/pcie/err.c | 85 +++++++++++++------------------------------------- 1 file changed, 21 insertions(+), 64 deletions(-) diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c index 644f3f725ef0..0fa5e1417a4a 100644 --- a/drivers/pci/pcie/err.c +++ b/drivers/pci/pcie/err.c @@ -63,30 +63,12 @@ static int report_error_detected(struct pci_dev *dev, void *data) if (!dev->driver || !dev->driver->err_handler || !dev->driver->err_handler->error_detected) { - if (result_data->state == pci_channel_io_frozen && - dev->hdr_type != PCI_HEADER_TYPE_BRIDGE) { - /* - * In case of fatal recovery, if one of down- - * stream device has no driver. We might be - * unable to recover because a later insmod - * of a driver for this device is unaware of - * its hw state. - */ - pci_printk(KERN_DEBUG, dev, "device has %s\n", - dev->driver ? - "no AER-aware driver" : "no driver"); - } - /* - * If there's any device in the subtree that does not - * have an error_detected callback, returning - * PCI_ERS_RESULT_NO_AER_DRIVER prevents calling of - * the subsequent mmio_enabled/slot_reset/resume - * callbacks of "any" device in the subtree. All the - * devices in the subtree are left in the error state - * without recovery. + * If any device in the subtree does not have an error_detected + * callback, PCI_ERS_RESULT_NO_AER_DRIVER prevents subsequent + * error callbacks of "any" device in the subtree, and will + * exit in the disconnected error state. */ - if (dev->hdr_type != PCI_HEADER_TYPE_BRIDGE) vote = PCI_ERS_RESULT_NO_AER_DRIVER; else @@ -184,34 +166,23 @@ static pci_ers_result_t default_reset_link(struct pci_dev *dev) static pci_ers_result_t reset_link(struct pci_dev *dev, u32 service) { - struct pci_dev *udev; pci_ers_result_t status; struct pcie_port_service_driver *driver = NULL; - if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) { - /* Reset this port for all subordinates */ - udev = dev; - } else { - /* Reset the upstream component (likely downstream port) */ - udev = dev->bus->self; - } - - /* Use the aer driver of the component firstly */ - driver = pcie_port_find_service(udev, service); - + driver = pcie_port_find_service(dev, service); if (driver && driver->reset_link) { - status = driver->reset_link(udev); - } else if (udev->has_secondary_link) { - status = default_reset_link(udev); + status = driver->reset_link(dev); + } else if (dev->has_secondary_link) { + status = default_reset_link(dev); } else { pci_printk(KERN_DEBUG, dev, "no link-reset support at upstream device %s\n", - pci_name(udev)); + pci_name(dev)); return PCI_ERS_RESULT_DISCONNECT; } if (status != PCI_ERS_RESULT_RECOVERED) { pci_printk(KERN_DEBUG, dev, "link reset at upstream device %s failed\n", - pci_name(udev)); + pci_name(dev)); return PCI_ERS_RESULT_DISCONNECT; } @@ -243,31 +214,7 @@ static pci_ers_result_t broadcast_error_message(struct pci_dev *dev, else result_data.result = PCI_ERS_RESULT_RECOVERED; - if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) { - /* - * If the error is reported by a bridge, we think this error - * is related to the downstream link of the bridge, so we - * do error recovery on all subordinates of the bridge instead - * of the bridge and clear the error status of the bridge. - */ - if (cb == report_error_detected) - dev->error_state = state; - pci_walk_bus(dev->subordinate, cb, &result_data); - if (cb == report_resume) { - pci_aer_clear_device_status(dev); - pci_cleanup_aer_uncorrect_error_status(dev); - dev->error_state = pci_channel_io_normal; - } - } else { - /* - * If the error is reported by an end point, we think this - * error is related to the upstream link of the end point. - * The error is non fatal so the bus is ok; just invoke - * the callback for the function that logged the error. - */ - cb(dev, &result_data); - } - + pci_walk_bus(dev->subordinate, cb, &result_data); return result_data.result; } @@ -276,6 +223,14 @@ void pcie_do_recovery(struct pci_dev *dev, enum pci_channel_state state, { pci_ers_result_t status; + /* + * Error recovery runs on all subordinates of the first downstream port. + * If the downstream port detected the error, it is cleared at the end. + */ + if (!(pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT || + pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM)) + dev = dev->bus->self; + status = broadcast_error_message(dev, state, "error_detected", @@ -311,6 +266,8 @@ void pcie_do_recovery(struct pci_dev *dev, enum pci_channel_state state, "resume", report_resume); + pci_aer_clear_device_status(dev); + pci_cleanup_aer_uncorrect_error_status(dev); pci_info(dev, "AER: Device recovery successful\n"); return; From patchwork Thu Sep 20 16:27:14 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Keith Busch X-Patchwork-Id: 10608117 X-Patchwork-Delegate: bhelgaas@google.com Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D64B5174F for ; Thu, 20 Sep 2018 16:26:34 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id C58332E0B9 for ; Thu, 20 Sep 2018 16:26:34 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id B9D2A2E00F; Thu, 20 Sep 2018 16:26:34 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 37C992E0B9 for ; Thu, 20 Sep 2018 16:26:34 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727094AbeITWKt (ORCPT ); Thu, 20 Sep 2018 18:10:49 -0400 Received: from mga07.intel.com ([134.134.136.100]:21058 "EHLO mga07.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730915AbeITWKt (ORCPT ); Thu, 20 Sep 2018 18:10:49 -0400 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga105.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 20 Sep 2018 09:26:32 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.54,281,1534834800"; d="scan'208";a="258918199" Received: from unknown (HELO localhost.lm.intel.com) ([10.232.112.44]) by orsmga005.jf.intel.com with ESMTP; 20 Sep 2018 09:26:31 -0700 From: Keith Busch To: Linux PCI , Bjorn Helgaas Cc: Benjamin Herrenschmidt , Sinan Kaya , Thomas Tai , poza@codeaurora.org, Lukas Wunner , Christoph Hellwig , Mika Westerberg , Keith Busch Subject: [PATCHv4 09/12] PCI: ERR: Simplify broadcast callouts Date: Thu, 20 Sep 2018 10:27:14 -0600 Message-Id: <20180920162717.31066-10-keith.busch@intel.com> X-Mailer: git-send-email 2.13.6 In-Reply-To: <20180920162717.31066-1-keith.busch@intel.com> References: <20180920162717.31066-1-keith.busch@intel.com> Sender: linux-pci-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-pci@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP There is no point in having a generic broadcast function if it needs to have special cases for each callback it broadcasts. This patch abstracts the error broadcast to only the necessary information and removes the now unnecessary helper to walk the bus. Signed-off-by: Keith Busch --- drivers/pci/pcie/err.c | 107 ++++++++++++++++++------------------------------- 1 file changed, 38 insertions(+), 69 deletions(-) diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c index 0fa5e1417a4a..362a717c831a 100644 --- a/drivers/pci/pcie/err.c +++ b/drivers/pci/pcie/err.c @@ -19,11 +19,6 @@ #include "portdrv.h" #include "../pci.h" -struct aer_broadcast_data { - enum pci_channel_state state; - enum pci_ers_result result; -}; - static pci_ers_result_t merge_result(enum pci_ers_result orig, enum pci_ers_result new) { @@ -49,16 +44,15 @@ static pci_ers_result_t merge_result(enum pci_ers_result orig, return orig; } -static int report_error_detected(struct pci_dev *dev, void *data) +static int report_error_detected(struct pci_dev *dev, + enum pci_channel_state state, + enum pci_ers_result *result) { pci_ers_result_t vote; const struct pci_error_handlers *err_handler; - struct aer_broadcast_data *result_data; - - result_data = (struct aer_broadcast_data *) data; device_lock(&dev->dev); - dev->error_state = result_data->state; + dev->error_state = state; if (!dev->driver || !dev->driver->err_handler || @@ -75,22 +69,29 @@ static int report_error_detected(struct pci_dev *dev, void *data) vote = PCI_ERS_RESULT_NONE; } else { err_handler = dev->driver->err_handler; - vote = err_handler->error_detected(dev, result_data->state); + vote = err_handler->error_detected(dev, state); pci_uevent_ers(dev, PCI_ERS_RESULT_NONE); } - result_data->result = merge_result(result_data->result, vote); + *result = merge_result(*result, vote); device_unlock(&dev->dev); return 0; } +static int report_frozen_detected(struct pci_dev *dev, void *data) +{ + return report_error_detected(dev, pci_channel_io_frozen, data); +} + +static int report_normal_detected(struct pci_dev *dev, void *data) +{ + return report_error_detected(dev, pci_channel_io_normal, data); +} + static int report_mmio_enabled(struct pci_dev *dev, void *data) { - pci_ers_result_t vote; + pci_ers_result_t vote, *result = data; const struct pci_error_handlers *err_handler; - struct aer_broadcast_data *result_data; - - result_data = (struct aer_broadcast_data *) data; device_lock(&dev->dev); if (!dev->driver || @@ -100,7 +101,7 @@ static int report_mmio_enabled(struct pci_dev *dev, void *data) err_handler = dev->driver->err_handler; vote = err_handler->mmio_enabled(dev); - result_data->result = merge_result(result_data->result, vote); + *result = merge_result(*result, vote); out: device_unlock(&dev->dev); return 0; @@ -108,11 +109,8 @@ static int report_mmio_enabled(struct pci_dev *dev, void *data) static int report_slot_reset(struct pci_dev *dev, void *data) { - pci_ers_result_t vote; + pci_ers_result_t vote, *result = data; const struct pci_error_handlers *err_handler; - struct aer_broadcast_data *result_data; - - result_data = (struct aer_broadcast_data *) data; device_lock(&dev->dev); if (!dev->driver || @@ -122,7 +120,7 @@ static int report_slot_reset(struct pci_dev *dev, void *data) err_handler = dev->driver->err_handler; vote = err_handler->slot_reset(dev); - result_data->result = merge_result(result_data->result, vote); + *result = merge_result(*result, vote); out: device_unlock(&dev->dev); return 0; @@ -189,39 +187,11 @@ static pci_ers_result_t reset_link(struct pci_dev *dev, u32 service) return status; } -/** - * broadcast_error_message - handle message broadcast to downstream drivers - * @dev: pointer to from where in a hierarchy message is broadcasted down - * @state: error state - * @error_mesg: message to print - * @cb: callback to be broadcasted - * - * Invoked during error recovery process. Once being invoked, the content - * of error severity will be broadcasted to all downstream drivers in a - * hierarchy in question. - */ -static pci_ers_result_t broadcast_error_message(struct pci_dev *dev, - enum pci_channel_state state, - char *error_mesg, - int (*cb)(struct pci_dev *, void *)) -{ - struct aer_broadcast_data result_data; - - pci_printk(KERN_DEBUG, dev, "broadcast %s message\n", error_mesg); - result_data.state = state; - if (cb == report_error_detected) - result_data.result = PCI_ERS_RESULT_CAN_RECOVER; - else - result_data.result = PCI_ERS_RESULT_RECOVERED; - - pci_walk_bus(dev->subordinate, cb, &result_data); - return result_data.result; -} - void pcie_do_recovery(struct pci_dev *dev, enum pci_channel_state state, u32 service) { - pci_ers_result_t status; + pci_ers_result_t status = PCI_ERS_RESULT_CAN_RECOVER; + struct pci_bus *bus; /* * Error recovery runs on all subordinates of the first downstream port. @@ -230,21 +200,23 @@ void pcie_do_recovery(struct pci_dev *dev, enum pci_channel_state state, if (!(pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT || pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM)) dev = dev->bus->self; + bus = dev->subordinate; - status = broadcast_error_message(dev, - state, - "error_detected", - report_error_detected); + pci_dbg(dev, "broadcast error_detected message\n"); + if (state == pci_channel_io_frozen) + pci_walk_bus(bus, report_frozen_detected, &status); + else + pci_walk_bus(bus, report_normal_detected, &status); if (state == pci_channel_io_frozen && reset_link(dev, service) != PCI_ERS_RESULT_RECOVERED) goto failed; - if (status == PCI_ERS_RESULT_CAN_RECOVER) - status = broadcast_error_message(dev, - state, - "mmio_enabled", - report_mmio_enabled); + if (status == PCI_ERS_RESULT_CAN_RECOVER) { + status = PCI_ERS_RESULT_RECOVERED; + pci_dbg(dev, "broadcast mmio_enabled message\n"); + pci_walk_bus(bus, report_mmio_enabled, &status); + } if (status == PCI_ERS_RESULT_NEED_RESET) { /* @@ -252,19 +224,16 @@ void pcie_do_recovery(struct pci_dev *dev, enum pci_channel_state state, * functions to reset slot before calling * drivers' slot_reset callbacks? */ - status = broadcast_error_message(dev, - state, - "slot_reset", - report_slot_reset); + status = PCI_ERS_RESULT_RECOVERED; + pci_dbg(dev, "broadcast slot_reset message\n"); + pci_walk_bus(bus, report_slot_reset, &status); } if (status != PCI_ERS_RESULT_RECOVERED) goto failed; - broadcast_error_message(dev, - state, - "resume", - report_resume); + pci_dbg(dev, "broadcast resume message\n"); + pci_walk_bus(bus, report_resume, &status); pci_aer_clear_device_status(dev); pci_cleanup_aer_uncorrect_error_status(dev); From patchwork Thu Sep 20 16:27:15 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Keith Busch X-Patchwork-Id: 10608119 X-Patchwork-Delegate: bhelgaas@google.com Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A7E8F913 for ; Thu, 20 Sep 2018 16:26:35 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 985112E00F for ; Thu, 20 Sep 2018 16:26:35 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 8C7772E0E0; Thu, 20 Sep 2018 16:26:35 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 447D62E00F for ; Thu, 20 Sep 2018 16:26:35 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1731785AbeITWKt (ORCPT ); Thu, 20 Sep 2018 18:10:49 -0400 Received: from mga07.intel.com ([134.134.136.100]:21062 "EHLO mga07.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730915AbeITWKt (ORCPT ); Thu, 20 Sep 2018 18:10:49 -0400 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga105.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 20 Sep 2018 09:26:33 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.54,281,1534834800"; d="scan'208";a="258918206" Received: from unknown (HELO localhost.lm.intel.com) ([10.232.112.44]) by orsmga005.jf.intel.com with ESMTP; 20 Sep 2018 09:26:32 -0700 From: Keith Busch To: Linux PCI , Bjorn Helgaas Cc: Benjamin Herrenschmidt , Sinan Kaya , Thomas Tai , poza@codeaurora.org, Lukas Wunner , Christoph Hellwig , Mika Westerberg , Keith Busch Subject: [PATCHv4 10/12] PCI: ERR: Report current recovery status for udev Date: Thu, 20 Sep 2018 10:27:15 -0600 Message-Id: <20180920162717.31066-11-keith.busch@intel.com> X-Mailer: git-send-email 2.13.6 In-Reply-To: <20180920162717.31066-1-keith.busch@intel.com> References: <20180920162717.31066-1-keith.busch@intel.com> Sender: linux-pci-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-pci@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP A device still participates in error recovery even if it doesn't have the error callbacks. This patch provides the status for user event watchers. Signed-off-by: Keith Busch --- drivers/pci/pcie/err.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c index 362a717c831a..31e8a4314384 100644 --- a/drivers/pci/pcie/err.c +++ b/drivers/pci/pcie/err.c @@ -70,9 +70,8 @@ static int report_error_detected(struct pci_dev *dev, } else { err_handler = dev->driver->err_handler; vote = err_handler->error_detected(dev, state); - pci_uevent_ers(dev, PCI_ERS_RESULT_NONE); } - + pci_uevent_ers(dev, vote); *result = merge_result(*result, vote); device_unlock(&dev->dev); return 0; @@ -140,8 +139,8 @@ static int report_resume(struct pci_dev *dev, void *data) err_handler = dev->driver->err_handler; err_handler->resume(dev); - pci_uevent_ers(dev, PCI_ERS_RESULT_RECOVERED); out: + pci_uevent_ers(dev, PCI_ERS_RESULT_RECOVERED); device_unlock(&dev->dev); return 0; } From patchwork Thu Sep 20 16:27:16 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Keith Busch X-Patchwork-Id: 10608121 X-Patchwork-Delegate: bhelgaas@google.com Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 7039F161F for ; Thu, 20 Sep 2018 16:26:36 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 607032E00F for ; Thu, 20 Sep 2018 16:26:36 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 544322E0E0; Thu, 20 Sep 2018 16:26:36 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id E81EA2E00F for ; Thu, 20 Sep 2018 16:26:35 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1731832AbeITWKv (ORCPT ); Thu, 20 Sep 2018 18:10:51 -0400 Received: from mga07.intel.com ([134.134.136.100]:21062 "EHLO mga07.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730915AbeITWKu (ORCPT ); Thu, 20 Sep 2018 18:10:50 -0400 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga105.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 20 Sep 2018 09:26:34 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.54,281,1534834800"; d="scan'208";a="258918210" Received: from unknown (HELO localhost.lm.intel.com) ([10.232.112.44]) by orsmga005.jf.intel.com with ESMTP; 20 Sep 2018 09:26:33 -0700 From: Keith Busch To: Linux PCI , Bjorn Helgaas Cc: Benjamin Herrenschmidt , Sinan Kaya , Thomas Tai , poza@codeaurora.org, Lukas Wunner , Christoph Hellwig , Mika Westerberg , Keith Busch Subject: [PATCHv4 11/12] PCI: Unify device inaccessible Date: Thu, 20 Sep 2018 10:27:16 -0600 Message-Id: <20180920162717.31066-12-keith.busch@intel.com> X-Mailer: git-send-email 2.13.6 In-Reply-To: <20180920162717.31066-1-keith.busch@intel.com> References: <20180920162717.31066-1-keith.busch@intel.com> Sender: linux-pci-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-pci@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP This patch brings surprise removals and permanent failures together so we no longer need separate flags. The implementation enforces the error handling will not be able to override a surprise removal's permanent channel failure. Signed-off-by: Keith Busch --- drivers/pci/pci.h | 60 +++++++++++++++++++++++++++++++++++++++++++++----- drivers/pci/pcie/err.c | 10 ++++----- 2 files changed, 59 insertions(+), 11 deletions(-) diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h index 028abe34a15b..a244bd0c5ca7 100644 --- a/drivers/pci/pci.h +++ b/drivers/pci/pci.h @@ -295,21 +295,71 @@ struct pci_sriov { bool drivers_autoprobe; /* Auto probing of VFs by driver */ }; -/* pci_dev priv_flags */ -#define PCI_DEV_DISCONNECTED 0 -#define PCI_DEV_ADDED 1 +/** + * pci_dev_set_io_state - Set the new error state if possible. + * + * @dev - pci device to set new error_state + * @new - the state we want dev to be in + * + * Must be called with device_lock held. + * + * Returns true if state has been changed to the requested state. + */ +static inline bool pci_dev_set_io_state(struct pci_dev *dev, + pci_channel_state_t new) +{ + bool changed = false; + + device_lock_assert(&dev->dev); + switch (new) { + case pci_channel_io_perm_failure: + switch (dev->error_state) { + case pci_channel_io_frozen: + case pci_channel_io_normal: + case pci_channel_io_perm_failure: + changed = true; + break; + } + break; + case pci_channel_io_frozen: + switch (dev->error_state) { + case pci_channel_io_frozen: + case pci_channel_io_normal: + changed = true; + break; + } + break; + case pci_channel_io_normal: + switch (dev->error_state) { + case pci_channel_io_frozen: + case pci_channel_io_normal: + changed = true; + break; + } + break; + } + if (changed) + dev->error_state = new; + return changed; +} static inline int pci_dev_set_disconnected(struct pci_dev *dev, void *unused) { - set_bit(PCI_DEV_DISCONNECTED, &dev->priv_flags); + device_lock(&dev->dev); + pci_dev_set_io_state(dev, pci_channel_io_perm_failure); + device_unlock(&dev->dev); + return 0; } static inline bool pci_dev_is_disconnected(const struct pci_dev *dev) { - return test_bit(PCI_DEV_DISCONNECTED, &dev->priv_flags); + return dev->error_state == pci_channel_io_perm_failure; } +/* pci_dev priv_flags */ +#define PCI_DEV_ADDED 0 + static inline void pci_dev_assign_added(struct pci_dev *dev, bool added) { assign_bit(PCI_DEV_ADDED, &dev->priv_flags, added); diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c index 31e8a4314384..4da2a62b4f77 100644 --- a/drivers/pci/pcie/err.c +++ b/drivers/pci/pcie/err.c @@ -52,9 +52,8 @@ static int report_error_detected(struct pci_dev *dev, const struct pci_error_handlers *err_handler; device_lock(&dev->dev); - dev->error_state = state; - - if (!dev->driver || + if (!pci_dev_set_io_state(dev, state) || + !dev->driver || !dev->driver->err_handler || !dev->driver->err_handler->error_detected) { /* @@ -130,9 +129,8 @@ static int report_resume(struct pci_dev *dev, void *data) const struct pci_error_handlers *err_handler; device_lock(&dev->dev); - dev->error_state = pci_channel_io_normal; - - if (!dev->driver || + if (!pci_dev_set_io_state(dev, pci_channel_io_normal) || + !dev->driver || !dev->driver->err_handler || !dev->driver->err_handler->resume) goto out; From patchwork Thu Sep 20 16:27:17 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Keith Busch X-Patchwork-Id: 10608123 X-Patchwork-Delegate: bhelgaas@google.com Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 7AAEE913 for ; Thu, 20 Sep 2018 16:26:37 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 6A7092E00F for ; Thu, 20 Sep 2018 16:26:37 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 5E94A2E145; Thu, 20 Sep 2018 16:26:37 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id C8B252E0B9 for ; Thu, 20 Sep 2018 16:26:36 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727813AbeITWKv (ORCPT ); Thu, 20 Sep 2018 18:10:51 -0400 Received: from mga07.intel.com ([134.134.136.100]:21062 "EHLO mga07.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1731798AbeITWKv (ORCPT ); Thu, 20 Sep 2018 18:10:51 -0400 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga105.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 20 Sep 2018 09:26:34 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.54,281,1534834800"; d="scan'208";a="258918216" Received: from unknown (HELO localhost.lm.intel.com) ([10.232.112.44]) by orsmga005.jf.intel.com with ESMTP; 20 Sep 2018 09:26:34 -0700 From: Keith Busch To: Linux PCI , Bjorn Helgaas Cc: Benjamin Herrenschmidt , Sinan Kaya , Thomas Tai , poza@codeaurora.org, Lukas Wunner , Christoph Hellwig , Mika Westerberg , Keith Busch Subject: [PATCHv4 12/12] PCI: Make link active reporting detection generic Date: Thu, 20 Sep 2018 10:27:17 -0600 Message-Id: <20180920162717.31066-13-keith.busch@intel.com> X-Mailer: git-send-email 2.13.6 In-Reply-To: <20180920162717.31066-1-keith.busch@intel.com> References: <20180920162717.31066-1-keith.busch@intel.com> Sender: linux-pci-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-pci@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP The spec has timing requirements when waiting for a link to become active after a conventional reset. This patch implements those hard delays when waiting for an active link so pciehp and dpc drivers don't need to duplicate this. Since downstream ports that are HPC and DPC capable exist that do not implement data link layer active reporting, the pci driver will wait a fixed recommend time if the device reports it does not have link reporing capabilities. Signed-off-by: Keith Busch --- drivers/pci/hotplug/pciehp.h | 6 ------ drivers/pci/hotplug/pciehp_hpc.c | 22 ++-------------------- drivers/pci/pci.c | 33 +++++++++++++++++++++++++++------ drivers/pci/pcie/dpc.c | 4 +++- drivers/pci/probe.c | 1 + include/linux/pci.h | 1 + 6 files changed, 34 insertions(+), 33 deletions(-) diff --git a/drivers/pci/hotplug/pciehp.h b/drivers/pci/hotplug/pciehp.h index 3740f1a759c5..75fd52571107 100644 --- a/drivers/pci/hotplug/pciehp.h +++ b/drivers/pci/hotplug/pciehp.h @@ -62,11 +62,6 @@ do { \ * struct controller - PCIe hotplug controller * @pcie: pointer to the controller's PCIe port service device * @slot_cap: cached copy of the Slot Capabilities register - * @link_active_reporting: cached copy of Data Link Layer Link Active Reporting - * Capable bit in Link Capabilities register; if this bit is zero, the - * Data Link Layer Link Active bit in the Link Status register will never - * be set and the driver is thus confined to wait 1 second before assuming - * the link to a hotplugged device is up and accessing it * @slot_ctrl: cached copy of the Slot Control register * @ctrl_lock: serializes writes to the Slot Control register * @cmd_started: jiffies when the Slot Control register was last written; @@ -103,7 +98,6 @@ struct controller { struct pcie_device *pcie; u32 slot_cap; /* capabilities and quirks */ - unsigned int link_active_reporting:1; u16 slot_ctrl; /* control register access */ struct mutex ctrl_lock; diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c index 7b5f9db60d9a..f0f3f4a3dac4 100644 --- a/drivers/pci/hotplug/pciehp_hpc.c +++ b/drivers/pci/hotplug/pciehp_hpc.c @@ -214,13 +214,6 @@ bool pciehp_check_link_active(struct controller *ctrl) return ret; } -static void pcie_wait_link_active(struct controller *ctrl) -{ - struct pci_dev *pdev = ctrl_dev(ctrl); - - pcie_wait_for_link(pdev, true); -} - static bool pci_bus_check_dev(struct pci_bus *bus, int devfn) { u32 l; @@ -253,18 +246,9 @@ int pciehp_check_link_status(struct controller *ctrl) bool found; u16 lnk_status; - /* - * Data Link Layer Link Active Reporting must be capable for - * hot-plug capable downstream port. But old controller might - * not implement it. In this case, we wait for 1000 ms. - */ - if (ctrl->link_active_reporting) - pcie_wait_link_active(ctrl); - else - msleep(1000); + if (!pcie_wait_for_link(pdev, true)) + return -1; - /* wait 100ms before read pci conf, and try in 1s */ - msleep(100); found = pci_bus_check_dev(ctrl->pcie->port->subordinate, PCI_DEVFN(0, 0)); @@ -865,8 +849,6 @@ struct controller *pcie_init(struct pcie_device *dev) /* Check if Data Link Layer Link Active Reporting is implemented */ pcie_capability_read_dword(pdev, PCI_EXP_LNKCAP, &link_cap); - if (link_cap & PCI_EXP_LNKCAP_DLLLARC) - ctrl->link_active_reporting = 1; /* Clear all remaining event bits in Slot Status register. */ pcie_capability_write_word(pdev, PCI_EXP_SLTSTA, diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index ce4dafbfe2d1..2b4117011313 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -4501,21 +4501,42 @@ bool pcie_wait_for_link(struct pci_dev *pdev, bool active) bool ret; u16 lnk_status; + /* + * Some controllers might not implement link active reporting. In this + * case, we wait for 1000 + 100 ms. + */ + if (!pdev->link_active_reporting) { + msleep(1100); + return true; + } + + /* + * PCIe 4.0r1 6.6.1, a component must enter LTSSM Detect within 20ms, + * after which we should expect an link active if the reset was + * successful. If so, software must wait a minimum 100ms before sending + * configuration requests to devices downstream this port. + * + * If the link fails to activate, either the device was physically + * removed or the link is permanently failed. + */ + if (active) + msleep(20); for (;;) { pcie_capability_read_word(pdev, PCI_EXP_LNKSTA, &lnk_status); ret = !!(lnk_status & PCI_EXP_LNKSTA_DLLLA); if (ret == active) - return true; + break; if (timeout <= 0) break; msleep(10); timeout -= 10; } - - pci_info(pdev, "Data Link Layer Link Active not %s in 1000 msec\n", - active ? "set" : "cleared"); - - return false; + if (active && ret) + msleep(100); + else if (ret != active) + pci_info(pdev, "Data Link Layer Link Active not %s in 1000 msec\n", + active ? "set" : "cleared"); + return ret == active; } void pci_reset_secondary_bus(struct pci_dev *dev) diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c index 23e063aefddf..e435d12e61a0 100644 --- a/drivers/pci/pcie/dpc.c +++ b/drivers/pci/pcie/dpc.c @@ -140,10 +140,12 @@ static pci_ers_result_t dpc_reset_link(struct pci_dev *pdev) pci_write_config_word(pdev, cap + PCI_EXP_DPC_STATUS, PCI_EXP_DPC_STATUS_TRIGGER); + if (!pcie_wait_for_link(pdev, true)) + return PCI_ERS_RESULT_DISCONNECT; + return PCI_ERS_RESULT_RECOVERED; } - static void dpc_process_rp_pio_error(struct dpc_dev *dpc) { struct device *dev = &dpc->dev->device; diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c index 7c422ccbf9b4..ea2a6289e513 100644 --- a/drivers/pci/probe.c +++ b/drivers/pci/probe.c @@ -713,6 +713,7 @@ static void pci_set_bus_speed(struct pci_bus *bus) pcie_capability_read_dword(bridge, PCI_EXP_LNKCAP, &linkcap); bus->max_bus_speed = pcie_link_speed[linkcap & PCI_EXP_LNKCAP_SLS]; + bridge->link_active_reporting = !!(linkcap & PCI_EXP_LNKCAP_DLLLARC); pcie_capability_read_word(bridge, PCI_EXP_LNKSTA, &linksta); pcie_update_link_speed(bus, linksta); diff --git a/include/linux/pci.h b/include/linux/pci.h index 6925828f9f25..896b42032ec5 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -402,6 +402,7 @@ struct pci_dev { unsigned int has_secondary_link:1; unsigned int non_compliant_bars:1; /* Broken BARs; ignore them */ unsigned int is_probed:1; /* Device probing in progress */ + unsigned int link_active_reporting:1;/* Device capable of reporting link active */ pci_dev_flags_t dev_flags; atomic_t enable_cnt; /* pci_enable_device has been called */