From patchwork Fri Aug 31 21:26:29 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Keith Busch
X-Patchwork-Id: 10584461
X-Patchwork-Delegate: bhelgaas@google.com
From: Keith Busch
To: Linux PCI, Bjorn Helgaas
Cc: Benjamin
 Herrenschmidt, Sinan Kaya, Thomas Tai, poza@codeaurora.org,
 Lukas Wunner, Keith Busch
Subject: [PATCH 06/16] PCI/ERR: Remove devices on recovery failure
Date: Fri, 31 Aug 2018 15:26:29 -0600
Message-Id: <20180831212639.10196-7-keith.busch@intel.com>
X-Mailer: git-send-email 2.13.6
In-Reply-To: <20180831212639.10196-1-keith.busch@intel.com>
References: <20180831212639.10196-1-keith.busch@intel.com>
X-Mailing-List: linux-pci@vger.kernel.org

This patch removes devices connected through a bus that can't recover
from an error.

Signed-off-by: Keith Busch
---
 drivers/pci/pcie/err.c | 42 ++++++++++++++++++++++++++++++++++++++----
 1 file changed, 38 insertions(+), 4 deletions(-)

diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
index 44c55f7ceb39..45f574954fd6 100644
--- a/drivers/pci/pcie/err.c
+++ b/drivers/pci/pcie/err.c
@@ -166,6 +166,15 @@ static int report_resume(struct pci_dev *dev, void *data)
 	return 0;
 }
 
+static int report_disconnect(struct pci_dev *dev, void *data)
+{
+	device_lock(&dev->dev);
+	pci_dev_set_disconnected(dev, NULL);
+	pci_uevent_ers(dev, PCI_ERS_RESULT_DISCONNECT);
+	device_unlock(&dev->dev);
+	return 0;
+}
+
 /**
  * default_reset_link - default reset function
  * @dev: pointer to pci_dev data structure
@@ -271,6 +280,34 @@ static pci_ers_result_t broadcast_error_message(struct pci_dev *dev,
 	return result_data.result;
 }
 
+/**
+ * pcie_disconnect_device - Called when error handling ends with
+ * PCI_ERS_RESULT_DISCONNECT status.
+ *
+ * Reaching here means error handling has irrevocably failed. This function
+ * will ungracefully disconnect all the devices below the bus that has
+ * experienced the unrecoverable error.
+ *
+ * If the link is active after the removing all devices on the bus, this will
+ * attempt to re-enumerate the bus from scratch.
+ */
+static void pcie_disconnect_device(struct pci_dev *dev)
+{
+	struct pci_bus *bus = dev->subordinate;
+	struct pci_dev *child, *tmp;
+
+	broadcast_error_message(dev, PCI_ERS_RESULT_DISCONNECT,
+				"disconnect", report_disconnect);
+	pci_lock_rescan_remove();
+	list_for_each_entry_safe(child, tmp, &bus->devices, bus_list)
+		pci_stop_and_remove_bus_device(child);
+
+	pci_bridge_secondary_bus_reset(dev);
+	if (pcie_wait_for_link(dev, true))
+		pci_rescan_bus(bus);
+	pci_unlock_rescan_remove();
+}
+
 static void pcie_do_recovery(struct pci_dev *dev, enum pci_channel_state state,
 		u32 service)
 {
@@ -313,12 +350,9 @@ static void pcie_do_recovery(struct pci_dev *dev, enum pci_channel_state state,
 
 	pci_info(dev, "AER: Device recovery successful\n");
 	return;
-
 failed:
-	pci_uevent_ers(dev, PCI_ERS_RESULT_DISCONNECT);
-
-	/* TODO: Should kernel panic here? */
 	pci_info(dev, "AER: Device recovery failed\n");
+	pcie_disconnect_device(dev);
 }
 
 void pcie_do_fatal_recovery(struct pci_dev *dev, u32 service)
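
Illustrative note, not part of the patch: pcie_disconnect_device() walks
bus->devices with list_for_each_entry_safe() because each call to
pci_stop_and_remove_bus_device() unlinks the entry being visited, and it
rescans only if the link comes back after the secondary bus reset. The
userspace sketch below shows that same "remove everything safely, then
re-enumerate if the link is up" order. It is a rough analogue under
stated assumptions, not kernel code: struct toy_dev, struct toy_bus and
the toy_*() helpers are hypothetical names invented for this example,
the list is a plain singly linked list rather than the kernel's
struct list_head, and the "rescan" simply adds one fresh device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins; these are not the kernel's types. */
struct toy_dev {
	int id;
	struct toy_dev *next;
};

struct toy_bus {
	struct toy_dev *devices;	/* NULL-terminated singly linked list */
	bool link_up;			/* stands in for pcie_wait_for_link() */
	int next_id;			/* id given to the next enumerated device */
};

/* Analogue of enumerating one device onto the bus. Returns 0 on success. */
int toy_add_dev(struct toy_bus *bus)
{
	struct toy_dev *d;

	assert(bus != NULL);
	d = malloc(sizeof(*d));
	if (!d)
		return -1;
	d->id = bus->next_id++;
	d->next = bus->devices;
	bus->devices = d;
	return 0;
}

/* Count the devices currently linked on the bus. */
int toy_count(const struct toy_bus *bus)
{
	const struct toy_dev *d;
	int n = 0;

	assert(bus != NULL);
	for (d = bus->devices; d; d = d->next)
		n++;
	return n;
}

/*
 * Remove every device, then re-enumerate one device if the link is up.
 * Returns the number of devices removed.
 */
int toy_disconnect_bus(struct toy_bus *bus)
{
	struct toy_dev *child, *tmp;
	int removed = 0;

	assert(bus != NULL);

	/* Save the successor before freeing, as list_for_each_entry_safe() does. */
	for (child = bus->devices; child; child = tmp) {
		tmp = child->next;
		free(child);
		removed++;
	}
	bus->devices = NULL;

	/* Rescan only when the link is back; an allocation failure is ignored here. */
	if (bus->link_up)
		(void)toy_add_dev(bus);

	return removed;
}
```

A caller would fill a struct toy_bus with toy_add_dev(), set link_up to
model whether the link retrained, and call toy_disconnect_bus(). Reading
child->next after free(child) instead of saving it in tmp first would be
a use-after-free, which is the hazard the _safe iterator avoids in the
patch.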