Message ID | 1525323838-1735-6-git-send-email-poza@codeaurora.org (mailing list archive) |
---|---|
State | New, archived |
Delegated to: | Bjorn Helgaas |
Headers | show |
Hi Oza, Thank you for the patch! Yet something to improve: [auto build test ERROR on pci/next] [also build test ERROR on v4.17-rc3 next-20180503] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] url: https://github.com/0day-ci/linux/commits/Oza-Pawandeep/Address-error-and-recovery-for-AER-and-DPC/20180504-050017 base: https://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git next config: x86_64-randconfig-x015-201817 (attached as .config) compiler: gcc-7 (Debian 7.3.0-16) 7.3.0 reproduce: # save the attached .config to linux build tree make ARCH=x86_64 All errors (new ones prefixed by >>): drivers/pci/pcie/err.c: In function 'report_error_detected': >> drivers/pci/pcie/err.c:98:3: error: implicit declaration of function 'pci_uevent_ers'; did you mean 'pci_select_bars'? [-Werror=implicit-function-declaration] pci_uevent_ers(dev, PCI_ERS_RESULT_NONE); ^~~~~~~~~~~~~~ pci_select_bars drivers/pci/pcie/err.c: In function 'reset_link': drivers/pci/pcie/err.c:203:5: warning: 'driver' is used uninitialized in this function [-Wuninitialized] if (driver && driver->reset_link) { ^ cc1: some warnings being treated as errors vim +98 drivers/pci/pcie/err.c 52 53 static int report_error_detected(struct pci_dev *dev, void *data) 54 { 55 pci_ers_result_t vote; 56 const struct pci_error_handlers *err_handler; 57 struct aer_broadcast_data *result_data; 58 59 result_data = (struct aer_broadcast_data *) data; 60 61 device_lock(&dev->dev); 62 dev->error_state = result_data->state; 63 64 if (!dev->driver || 65 !dev->driver->err_handler || 66 !dev->driver->err_handler->error_detected) { 67 if (result_data->state == pci_channel_io_frozen && 68 dev->hdr_type != PCI_HEADER_TYPE_BRIDGE) { 69 /* 70 * In case of fatal recovery, if one of down- 71 * stream device has no driver. We might be 72 * unable to recover because a later insmod 73 * of a driver for this device is unaware of 74 * its hw state. 75 */ 76 pci_printk(KERN_DEBUG, dev, "device has %s\n", 77 dev->driver ? 78 "no AER-aware driver" : "no driver"); 79 } 80 81 /* 82 * If there's any device in the subtree that does not 83 * have an error_detected callback, returning 84 * PCI_ERS_RESULT_NO_AER_DRIVER prevents calling of 85 * the subsequent mmio_enabled/slot_reset/resume 86 * callbacks of "any" device in the subtree. All the 87 * devices in the subtree are left in the error state 88 * without recovery. 89 */ 90 91 if (dev->hdr_type != PCI_HEADER_TYPE_BRIDGE) 92 vote = PCI_ERS_RESULT_NO_AER_DRIVER; 93 else 94 vote = PCI_ERS_RESULT_NONE; 95 } else { 96 err_handler = dev->driver->err_handler; 97 vote = err_handler->error_detected(dev, result_data->state); > 98 pci_uevent_ers(dev, PCI_ERS_RESULT_NONE); 99 } 100 101 result_data->result = merge_result(result_data->result, vote); 102 device_unlock(&dev->dev); 103 return 0; 104 } 105 --- 0-DAY kernel test infrastructure Open Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation
Hi Oza, Thank you for the patch! Yet something to improve: [auto build test ERROR on pci/next] [also build test ERROR on v4.17-rc3 next-20180503] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] url: https://github.com/0day-ci/linux/commits/Oza-Pawandeep/Address-error-and-recovery-for-AER-and-DPC/20180504-050017 base: https://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git next config: i386-randconfig-a1-05030941 (attached as .config) compiler: gcc-4.9 (Debian 4.9.4-2) 4.9.4 reproduce: # save the attached .config to linux build tree make ARCH=i386 All errors (new ones prefixed by >>): drivers/pci/pcie/err.c: In function 'report_error_detected': >> drivers/pci/pcie/err.c:98:3: error: implicit declaration of function 'pci_uevent_ers' [-Werror=implicit-function-declaration] pci_uevent_ers(dev, PCI_ERS_RESULT_NONE); ^ drivers/pci/pcie/err.c: In function 'reset_link': drivers/pci/pcie/err.c:203:5: warning: 'driver' is used uninitialized in this function [-Wuninitialized] if (driver && driver->reset_link) { ^ cc1: some warnings being treated as errors vim +/pci_uevent_ers +98 drivers/pci/pcie/err.c 52 53 static int report_error_detected(struct pci_dev *dev, void *data) 54 { 55 pci_ers_result_t vote; 56 const struct pci_error_handlers *err_handler; 57 struct aer_broadcast_data *result_data; 58 59 result_data = (struct aer_broadcast_data *) data; 60 61 device_lock(&dev->dev); 62 dev->error_state = result_data->state; 63 64 if (!dev->driver || 65 !dev->driver->err_handler || 66 !dev->driver->err_handler->error_detected) { 67 if (result_data->state == pci_channel_io_frozen && 68 dev->hdr_type != PCI_HEADER_TYPE_BRIDGE) { 69 /* 70 * In case of fatal recovery, if one of down- 71 * stream device has no driver. We might be 72 * unable to recover because a later insmod 73 * of a driver for this device is unaware of 74 * its hw state. 75 */ 76 pci_printk(KERN_DEBUG, dev, "device has %s\n", 77 dev->driver ? 78 "no AER-aware driver" : "no driver"); 79 } 80 81 /* 82 * If there's any device in the subtree that does not 83 * have an error_detected callback, returning 84 * PCI_ERS_RESULT_NO_AER_DRIVER prevents calling of 85 * the subsequent mmio_enabled/slot_reset/resume 86 * callbacks of "any" device in the subtree. All the 87 * devices in the subtree are left in the error state 88 * without recovery. 89 */ 90 91 if (dev->hdr_type != PCI_HEADER_TYPE_BRIDGE) 92 vote = PCI_ERS_RESULT_NO_AER_DRIVER; 93 else 94 vote = PCI_ERS_RESULT_NONE; 95 } else { 96 err_handler = dev->driver->err_handler; 97 vote = err_handler->error_detected(dev, result_data->state); > 98 pci_uevent_ers(dev, PCI_ERS_RESULT_NONE); 99 } 100 101 result_data->result = merge_result(result_data->result, vote); 102 device_unlock(&dev->dev); 103 return 0; 104 } 105 --- 0-DAY kernel test infrastructure Open Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation
On 2018-05-03 10:33, Oza Pawandeep wrote: > This patch factors out error reporting callbacks, which are currently > tightly coupled with AER. > > DPC should be able to register callbacks and attempt recovery when DPC > trigger event occurs. > > Signed-off-by: Oza Pawandeep <poza@codeaurora.org> > > diff --git a/drivers/pci/pcie/Makefile b/drivers/pci/pcie/Makefile > index 800e1d4..03f4e0b 100644 > --- a/drivers/pci/pcie/Makefile > +++ b/drivers/pci/pcie/Makefile > @@ -2,7 +2,7 @@ > # > # Makefile for PCI Express features and port driver > > -pcieportdrv-y := portdrv_core.o portdrv_pci.o > +pcieportdrv-y := portdrv_core.o portdrv_pci.o err.o > > obj-$(CONFIG_PCIEPORTBUS) += pcieportdrv.o > > diff --git a/drivers/pci/pcie/aer/aerdrv.h > b/drivers/pci/pcie/aer/aerdrv.h > index 08b4584..b4c9506 100644 > --- a/drivers/pci/pcie/aer/aerdrv.h > +++ b/drivers/pci/pcie/aer/aerdrv.h > @@ -76,36 +76,6 @@ struct aer_rpc { > */ > }; > > -struct aer_broadcast_data { > - enum pci_channel_state state; > - enum pci_ers_result result; > -}; > - > -static inline pci_ers_result_t merge_result(enum pci_ers_result orig, > - enum pci_ers_result new) > -{ > - if (new == PCI_ERS_RESULT_NO_AER_DRIVER) > - return PCI_ERS_RESULT_NO_AER_DRIVER; > - > - if (new == PCI_ERS_RESULT_NONE) > - return orig; > - > - switch (orig) { > - case PCI_ERS_RESULT_CAN_RECOVER: > - case PCI_ERS_RESULT_RECOVERED: > - orig = new; > - break; > - case PCI_ERS_RESULT_DISCONNECT: > - if (new == PCI_ERS_RESULT_NEED_RESET) > - orig = PCI_ERS_RESULT_NEED_RESET; > - break; > - default: > - break; > - } > - > - return orig; > -} > - > extern struct bus_type pcie_port_bus_type; > void aer_isr(struct work_struct *work); > void aer_print_error(struct pci_dev *dev, struct aer_err_info *info); > diff --git a/drivers/pci/pcie/aer/aerdrv_core.c > b/drivers/pci/pcie/aer/aerdrv_core.c > index be4ee3b..51515d1 100644 > --- a/drivers/pci/pcie/aer/aerdrv_core.c > +++ b/drivers/pci/pcie/aer/aerdrv_core.c > @@ -228,191 +228,6 @@ static bool find_source_device(struct pci_dev > *parent, > return true; > } > > -static int report_error_detected(struct pci_dev *dev, void *data) > -{ > - pci_ers_result_t vote; > - const struct pci_error_handlers *err_handler; > - struct aer_broadcast_data *result_data; > - result_data = (struct aer_broadcast_data *) data; > - > - device_lock(&dev->dev); > - dev->error_state = result_data->state; > - > - if (!dev->driver || > - !dev->driver->err_handler || > - !dev->driver->err_handler->error_detected) { > - if (result_data->state == pci_channel_io_frozen && > - dev->hdr_type != PCI_HEADER_TYPE_BRIDGE) { > - /* > - * In case of fatal recovery, if one of down- > - * stream device has no driver. We might be > - * unable to recover because a later insmod > - * of a driver for this device is unaware of > - * its hw state. > - */ > - pci_printk(KERN_DEBUG, dev, "device has %s\n", > - dev->driver ? > - "no AER-aware driver" : "no driver"); > - } > - > - /* > - * If there's any device in the subtree that does not > - * have an error_detected callback, returning > - * PCI_ERS_RESULT_NO_AER_DRIVER prevents calling of > - * the subsequent mmio_enabled/slot_reset/resume > - * callbacks of "any" device in the subtree. All the > - * devices in the subtree are left in the error state > - * without recovery. > - */ > - > - if (dev->hdr_type != PCI_HEADER_TYPE_BRIDGE) > - vote = PCI_ERS_RESULT_NO_AER_DRIVER; > - else > - vote = PCI_ERS_RESULT_NONE; > - } else { > - err_handler = dev->driver->err_handler; > - vote = err_handler->error_detected(dev, result_data->state); > - pci_uevent_ers(dev, PCI_ERS_RESULT_NONE); > - } > - > - result_data->result = merge_result(result_data->result, vote); > - device_unlock(&dev->dev); > - return 0; > -} > - > -static int report_mmio_enabled(struct pci_dev *dev, void *data) > -{ > - pci_ers_result_t vote; > - const struct pci_error_handlers *err_handler; > - struct aer_broadcast_data *result_data; > - result_data = (struct aer_broadcast_data *) data; > - > - device_lock(&dev->dev); > - if (!dev->driver || > - !dev->driver->err_handler || > - !dev->driver->err_handler->mmio_enabled) > - goto out; > - > - err_handler = dev->driver->err_handler; > - vote = err_handler->mmio_enabled(dev); > - result_data->result = merge_result(result_data->result, vote); > -out: > - device_unlock(&dev->dev); > - return 0; > -} > - > -static int report_slot_reset(struct pci_dev *dev, void *data) > -{ > - pci_ers_result_t vote; > - const struct pci_error_handlers *err_handler; > - struct aer_broadcast_data *result_data; > - result_data = (struct aer_broadcast_data *) data; > - > - device_lock(&dev->dev); > - if (!dev->driver || > - !dev->driver->err_handler || > - !dev->driver->err_handler->slot_reset) > - goto out; > - > - err_handler = dev->driver->err_handler; > - vote = err_handler->slot_reset(dev); > - result_data->result = merge_result(result_data->result, vote); > -out: > - device_unlock(&dev->dev); > - return 0; > -} > - > -static int report_resume(struct pci_dev *dev, void *data) > -{ > - const struct pci_error_handlers *err_handler; > - > - device_lock(&dev->dev); > - dev->error_state = pci_channel_io_normal; > - > - if (!dev->driver || > - !dev->driver->err_handler || > - !dev->driver->err_handler->resume) > - goto out; > - > - err_handler = dev->driver->err_handler; > - err_handler->resume(dev); > - pci_uevent_ers(dev, PCI_ERS_RESULT_RECOVERED); > -out: > - device_unlock(&dev->dev); > - return 0; > -} > - > -/** > - * broadcast_error_message - handle message broadcast to downstream > drivers > - * @dev: pointer to from where in a hierarchy message is broadcasted > down > - * @state: error state > - * @error_mesg: message to print > - * @cb: callback to be broadcasted > - * > - * Invoked during error recovery process. Once being invoked, the > content > - * of error severity will be broadcasted to all downstream drivers in > a > - * hierarchy in question. > - */ > -static pci_ers_result_t broadcast_error_message(struct pci_dev *dev, > - enum pci_channel_state state, > - char *error_mesg, > - int (*cb)(struct pci_dev *, void *)) > -{ > - struct aer_broadcast_data result_data; > - > - pci_printk(KERN_DEBUG, dev, "broadcast %s message\n", error_mesg); > - result_data.state = state; > - if (cb == report_error_detected) > - result_data.result = PCI_ERS_RESULT_CAN_RECOVER; > - else > - result_data.result = PCI_ERS_RESULT_RECOVERED; > - > - if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) { > - /* > - * If the error is reported by a bridge, we think this error > - * is related to the downstream link of the bridge, so we > - * do error recovery on all subordinates of the bridge instead > - * of the bridge and clear the error status of the bridge. > - */ > - if (cb == report_error_detected) > - dev->error_state = state; > - pci_walk_bus(dev->subordinate, cb, &result_data); > - if (cb == report_resume) { > - pci_cleanup_aer_uncorrect_error_status(dev); > - dev->error_state = pci_channel_io_normal; > - } > - } else { > - /* > - * If the error is reported by an end point, we think this > - * error is related to the upstream link of the end point. > - */ > - if (state == pci_channel_io_normal) > - /* > - * the error is non fatal so the bus is ok, just invoke > - * the callback for the function that logged the error. > - */ > - cb(dev, &result_data); > - else > - pci_walk_bus(dev->bus, cb, &result_data); > - } > - > - return result_data.result; > -} > - > -/** > - * default_reset_link - default reset function > - * @dev: pointer to pci_dev data structure > - * > - * Invoked when performing link reset on a Downstream Port or a > - * Root Port with no aer driver. > - */ > -static pci_ers_result_t default_reset_link(struct pci_dev *dev) > -{ > - pci_reset_bridge_secondary_bus(dev); > - pci_printk(KERN_DEBUG, dev, "downstream link has been reset\n"); > - return PCI_ERS_RESULT_RECOVERED; > -} > - > static int find_aer_service_iter(struct device *device, void *data) > { > struct pcie_port_service_driver *service_driver, **drv; > @@ -430,7 +245,7 @@ static int find_aer_service_iter(struct device > *device, void *data) > return 0; > } > > -static struct pcie_port_service_driver *find_aer_service(struct > pci_dev *dev) > +struct pcie_port_service_driver *find_aer_service(struct pci_dev *dev) > { > struct pcie_port_service_driver *drv = NULL; > > @@ -439,143 +254,6 @@ static struct pcie_port_service_driver > *find_aer_service(struct pci_dev *dev) > return drv; > } > > -static pci_ers_result_t reset_link(struct pci_dev *dev) > -{ > - struct pci_dev *udev; > - pci_ers_result_t status; > - struct pcie_port_service_driver *driver; > - > - if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) { > - /* Reset this port for all subordinates */ > - udev = dev; > - } else { > - /* Reset the upstream component (likely downstream port) */ > - udev = dev->bus->self; > - } > - > - /* Use the aer driver of the component firstly */ > - driver = find_aer_service(udev); > - > - if (driver && driver->reset_link) { > - status = driver->reset_link(udev); > - } else if (udev->has_secondary_link) { > - status = default_reset_link(udev); > - } else { > - pci_printk(KERN_DEBUG, dev, "no link-reset support at upstream > device %s\n", > - pci_name(udev)); > - return PCI_ERS_RESULT_DISCONNECT; > - } > - > - if (status != PCI_ERS_RESULT_RECOVERED) { > - pci_printk(KERN_DEBUG, dev, "link reset at upstream device %s > failed\n", > - pci_name(udev)); > - return PCI_ERS_RESULT_DISCONNECT; > - } > - > - return status; > -} > - > -static pci_ers_result_t pcie_do_fatal_recovery(struct pci_dev *dev, > int severity) > -{ > - struct pci_dev *udev; > - struct pci_bus *parent; > - struct pci_dev *pdev, *temp; > - pci_ers_result_t result = PCI_ERS_RESULT_RECOVERED; > - > - if (severity == AER_FATAL) > - pci_cleanup_aer_uncorrect_error_status(dev); > - > - if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) > - udev = dev; > - else > - udev = dev->bus->self; > - > - parent = udev->subordinate; > - pci_lock_rescan_remove(); > - list_for_each_entry_safe_reverse(pdev, temp, &parent->devices, > - bus_list) { > - pci_dev_get(pdev); > - pci_dev_set_disconnected(pdev, NULL); > - if (pci_has_subordinate(pdev)) > - pci_walk_bus(pdev->subordinate, > - pci_dev_set_disconnected, NULL); > - pci_stop_and_remove_bus_device(pdev); > - pci_dev_put(pdev); > - } > - > - result = reset_link(udev); > - if (result == PCI_ERS_RESULT_RECOVERED) > - if (pcie_wait_for_link(udev, true)) > - pci_rescan_bus(udev->bus); > - > - pci_unlock_rescan_remove(); > - > - return result; > -} > - > -/** > - * pcie_do_recovery - handle nonfatal/fatal error recovery process > - * @dev: pointer to a pci_dev data structure of agent detecting an > error > - * @severity: error severity type > - * > - * Invoked when an error is nonfatal/fatal. Once being invoked, > broadcast > - * error detected message to all downstream drivers within a hierarchy > in > - * question and return the returned code. > - */ > -void pcie_do_recovery(struct pci_dev *dev, int severity) > -{ > - pci_ers_result_t status; > - enum pci_channel_state state; > - > - if (severity == AER_FATAL) { > - status = pcie_do_fatal_recovery(dev, severity); > - if (status != PCI_ERS_RESULT_RECOVERED) > - goto failed; > - return; > - } > - else > - state = pci_channel_io_normal; > - > - status = broadcast_error_message(dev, > - state, > - "error_detected", > - report_error_detected); > - > - if (status == PCI_ERS_RESULT_CAN_RECOVER) > - status = broadcast_error_message(dev, > - state, > - "mmio_enabled", > - report_mmio_enabled); > - > - if (status == PCI_ERS_RESULT_NEED_RESET) { > - /* > - * TODO: Should call platform-specific > - * functions to reset slot before calling > - * drivers' slot_reset callbacks? > - */ > - status = broadcast_error_message(dev, > - state, > - "slot_reset", > - report_slot_reset); > - } > - > - if (status != PCI_ERS_RESULT_RECOVERED) > - goto failed; > - > - broadcast_error_message(dev, > - state, > - "resume", > - report_resume); > - > - pci_info(dev, "AER: Device recovery successful\n"); > - return; > - > -failed: > - pci_uevent_ers(dev, PCI_ERS_RESULT_DISCONNECT); > - /* TODO: Should kernel panic here? */ > - pci_info(dev, "AER: Device recovery failed\n"); > -} > - > /** > * handle_error_source - handle logging error into an event log > * @aerdev: pointer to pcie_device data structure of the root port > diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c > new file mode 100644 > index 0000000..55df974 > --- /dev/null > +++ b/drivers/pci/pcie/err.c > @@ -0,0 +1,377 @@ > +// SPDX-License-Identifier: GPL-2.0 > +/* > + * This file implements the error recovery as a core part of PCIe > error > + * reporting. When a PCIe error is delivered, an error message will be > + * collected and printed to console, then, an error recovery procedure > + * will be executed by following the PCI error recovery rules. > + * > + * Copyright (C) 2006 Intel Corp. > + * Tom Long Nguyen (tom.l.nguyen@intel.com) > + * Zhang Yanmin (yanmin.zhang@intel.com) > + * > + */ > + > +#include <linux/pci.h> > +#include <linux/module.h> > +#include <linux/pci.h> > +#include <linux/kernel.h> > +#include <linux/errno.h> > +#include <linux/aer.h> > +#include "portdrv.h" > +#include "../pci.h" > + > +struct aer_broadcast_data { > + enum pci_channel_state state; > + enum pci_ers_result result; > +}; > + > +static pci_ers_result_t merge_result(enum pci_ers_result orig, > + enum pci_ers_result new) > +{ > + if (new == PCI_ERS_RESULT_NO_AER_DRIVER) > + return PCI_ERS_RESULT_NO_AER_DRIVER; > + > + if (new == PCI_ERS_RESULT_NONE) > + return orig; > + > + switch (orig) { > + case PCI_ERS_RESULT_CAN_RECOVER: > + case PCI_ERS_RESULT_RECOVERED: > + orig = new; > + break; > + case PCI_ERS_RESULT_DISCONNECT: > + if (new == PCI_ERS_RESULT_NEED_RESET) > + orig = PCI_ERS_RESULT_NEED_RESET; > + break; > + default: > + break; > + } > + > + return orig; > +} > + > +static int report_error_detected(struct pci_dev *dev, void *data) > +{ > + pci_ers_result_t vote; > + const struct pci_error_handlers *err_handler; > + struct aer_broadcast_data *result_data; > + > + result_data = (struct aer_broadcast_data *) data; > + > + device_lock(&dev->dev); > + dev->error_state = result_data->state; > + > + if (!dev->driver || > + !dev->driver->err_handler || > + !dev->driver->err_handler->error_detected) { > + if (result_data->state == pci_channel_io_frozen && > + dev->hdr_type != PCI_HEADER_TYPE_BRIDGE) { > + /* > + * In case of fatal recovery, if one of down- > + * stream device has no driver. We might be > + * unable to recover because a later insmod > + * of a driver for this device is unaware of > + * its hw state. > + */ > + pci_printk(KERN_DEBUG, dev, "device has %s\n", > + dev->driver ? > + "no AER-aware driver" : "no driver"); > + } > + > + /* > + * If there's any device in the subtree that does not > + * have an error_detected callback, returning > + * PCI_ERS_RESULT_NO_AER_DRIVER prevents calling of > + * the subsequent mmio_enabled/slot_reset/resume > + * callbacks of "any" device in the subtree. All the > + * devices in the subtree are left in the error state > + * without recovery. > + */ > + > + if (dev->hdr_type != PCI_HEADER_TYPE_BRIDGE) > + vote = PCI_ERS_RESULT_NO_AER_DRIVER; > + else > + vote = PCI_ERS_RESULT_NONE; > + } else { > + err_handler = dev->driver->err_handler; > + vote = err_handler->error_detected(dev, result_data->state); > + pci_uevent_ers(dev, PCI_ERS_RESULT_NONE); > + } > + > + result_data->result = merge_result(result_data->result, vote); > + device_unlock(&dev->dev); > + return 0; > +} > + > +static int report_mmio_enabled(struct pci_dev *dev, void *data) > +{ > + pci_ers_result_t vote; > + const struct pci_error_handlers *err_handler; > + struct aer_broadcast_data *result_data; > + > + result_data = (struct aer_broadcast_data *) data; > + > + device_lock(&dev->dev); > + if (!dev->driver || > + !dev->driver->err_handler || > + !dev->driver->err_handler->mmio_enabled) > + goto out; > + > + err_handler = dev->driver->err_handler; > + vote = err_handler->mmio_enabled(dev); > + result_data->result = merge_result(result_data->result, vote); > +out: > + device_unlock(&dev->dev); > + return 0; > +} > + > +static int report_slot_reset(struct pci_dev *dev, void *data) > +{ > + pci_ers_result_t vote; > + const struct pci_error_handlers *err_handler; > + struct aer_broadcast_data *result_data; > + > + result_data = (struct aer_broadcast_data *) data; > + > + device_lock(&dev->dev); > + if (!dev->driver || > + !dev->driver->err_handler || > + !dev->driver->err_handler->slot_reset) > + goto out; > + > + err_handler = dev->driver->err_handler; > + vote = err_handler->slot_reset(dev); > + result_data->result = merge_result(result_data->result, vote); > +out: > + device_unlock(&dev->dev); > + return 0; > +} > + > +static int report_resume(struct pci_dev *dev, void *data) > +{ > + const struct pci_error_handlers *err_handler; > + > + device_lock(&dev->dev); > + dev->error_state = pci_channel_io_normal; > + > + if (!dev->driver || > + !dev->driver->err_handler || > + !dev->driver->err_handler->resume) > + goto out; > + > + err_handler = dev->driver->err_handler; > + err_handler->resume(dev); > + pci_uevent_ers(dev, PCI_ERS_RESULT_RECOVERED); > +out: > + device_unlock(&dev->dev); > + return 0; > +} > + > +/** > + * default_reset_link - default reset function > + * @dev: pointer to pci_dev data structure > + * > + * Invoked when performing link reset on a Downstream Port or a > + * Root Port with no aer driver. > + */ > +static pci_ers_result_t default_reset_link(struct pci_dev *dev) > +{ > + pci_reset_bridge_secondary_bus(dev); > + pci_printk(KERN_DEBUG, dev, "downstream link has been reset\n"); > + return PCI_ERS_RESULT_RECOVERED; > +} > + > +static pci_ers_result_t reset_link(struct pci_dev *dev) > +{ > + struct pci_dev *udev; > + pci_ers_result_t status; > + struct pcie_port_service_driver *driver; > + > + if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) { > + /* Reset this port for all subordinates */ > + udev = dev; > + } else { > + /* Reset the upstream component (likely downstream port) */ > + udev = dev->bus->self; > + } > + > +#if IS_ENABLED(CONFIG_PCIEAER) > + /* Use the aer driver of the component firstly */ > + driver = find_aer_service(udev); > +#endif > + > + if (driver && driver->reset_link) { > + status = driver->reset_link(udev); > + } else if (udev->has_secondary_link) { > + status = default_reset_link(udev); > + } else { > + pci_printk(KERN_DEBUG, dev, "no link-reset support at upstream > device %s\n", > + pci_name(udev)); > + return PCI_ERS_RESULT_DISCONNECT; > + } > + > + if (status != PCI_ERS_RESULT_RECOVERED) { > + pci_printk(KERN_DEBUG, dev, "link reset at upstream device %s > failed\n", > + pci_name(udev)); > + return PCI_ERS_RESULT_DISCONNECT; > + } > + > + return status; > +} > + > +/** > + * broadcast_error_message - handle message broadcast to downstream > drivers > + * @dev: pointer to from where in a hierarchy message is broadcasted > down > + * @state: error state > + * @error_mesg: message to print > + * @cb: callback to be broadcasted > + * > + * Invoked during error recovery process. Once being invoked, the > content > + * of error severity will be broadcasted to all downstream drivers in > a > + * hierarchy in question. > + */ > +static pci_ers_result_t broadcast_error_message(struct pci_dev *dev, > + enum pci_channel_state state, > + char *error_mesg, > + int (*cb)(struct pci_dev *, void *)) > +{ > + struct aer_broadcast_data result_data; > + > + pci_printk(KERN_DEBUG, dev, "broadcast %s message\n", error_mesg); > + result_data.state = state; > + if (cb == report_error_detected) > + result_data.result = PCI_ERS_RESULT_CAN_RECOVER; > + else > + result_data.result = PCI_ERS_RESULT_RECOVERED; > + > + if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) { > + /* > + * If the error is reported by a bridge, we think this error > + * is related to the downstream link of the bridge, so we > + * do error recovery on all subordinates of the bridge instead > + * of the bridge and clear the error status of the bridge. > + */ > + if (cb == report_error_detected) > + dev->error_state = state; > + pci_walk_bus(dev->subordinate, cb, &result_data); > + if (cb == report_resume) { > + pci_cleanup_aer_uncorrect_error_status(dev); > + dev->error_state = pci_channel_io_normal; > + } > + } else { > + /* > + * If the error is reported by an end point, we think this > + * error is related to the upstream link of the end point. > + */ > + if (state == pci_channel_io_normal) > + /* > + * the error is non fatal so the bus is ok, just invoke > + * the callback for the function that logged the error. > + */ > + cb(dev, &result_data); > + else > + pci_walk_bus(dev->bus, cb, &result_data); > + } > + > + return result_data.result; > +} > + > +static pci_ers_result_t do_fatal_recovery(struct pci_dev *dev, int > severity) > +{ > + struct pci_dev *udev; > + struct pci_bus *parent; > + struct pci_dev *pdev, *temp; > + pci_ers_result_t result = PCI_ERS_RESULT_RECOVERED; > + > + if (severity == AER_FATAL) > + pci_cleanup_aer_uncorrect_error_status(dev); > + > + if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) > + udev = dev; > + else > + udev = dev->bus->self; > + > + parent = udev->subordinate; > + pci_lock_rescan_remove(); > + list_for_each_entry_safe_reverse(pdev, temp, &parent->devices, > + bus_list) { > + pci_dev_get(pdev); > + pci_dev_set_disconnected(pdev, NULL); > + if (pci_has_subordinate(pdev)) > + pci_walk_bus(pdev->subordinate, > + pci_dev_set_disconnected, NULL); > + pci_stop_and_remove_bus_device(pdev); > + pci_dev_put(pdev); > + } > + > + result = reset_link(udev); > + if (result == PCI_ERS_RESULT_RECOVERED) > + if (pcie_wait_for_link(udev, true)) > + pci_rescan_bus(udev->bus); > + > + pci_unlock_rescan_remove(); > + > + return result; > +} > + > +/** > + * pcie_do_recovery - handle nonfatal/fatal error recovery process > + * @dev: pointer to a pci_dev data structure of agent detecting an > error > + * @severity: error severity type > + * > + * Invoked when an error is nonfatal/fatal. Once being invoked, > broadcast > + * error detected message to all downstream drivers within a hierarchy > in > + * question and return the returned code. > + */ > +void pcie_do_recovery(struct pci_dev *dev, int severity) > +{ > + pci_ers_result_t status; > + enum pci_channel_state state; > + > + if (severity == AER_FATAL) { > + status = do_fatal_recovery(dev, severity); > + if (status != PCI_ERS_RESULT_RECOVERED) > + goto failed; > + return; > + } else > + state = pci_channel_io_normal; > + > + status = broadcast_error_message(dev, > + state, > + "error_detected", > + report_error_detected); > + > + if (status == PCI_ERS_RESULT_CAN_RECOVER) > + status = broadcast_error_message(dev, > + state, > + "mmio_enabled", > + report_mmio_enabled); > + > + if (status == PCI_ERS_RESULT_NEED_RESET) { > + /* > + * TODO: Should call platform-specific > + * functions to reset slot before calling > + * drivers' slot_reset callbacks? > + */ > + status = broadcast_error_message(dev, > + state, > + "slot_reset", > + report_slot_reset); > + } > + > + if (status != PCI_ERS_RESULT_RECOVERED) > + goto failed; > + > + broadcast_error_message(dev, > + state, > + "resume", > + report_resume); > + > + pci_info(dev, "AER: Device recovery successful\n"); > + return; > + > +failed: > + pci_uevent_ers(dev, PCI_ERS_RESULT_DISCONNECT); > + /* TODO: Should kernel panic here? */ > + pci_info(dev, "AER: Device recovery failed\n"); > +} > diff --git a/drivers/pci/pcie/portdrv.h b/drivers/pci/pcie/portdrv.h > index d0c6783..47c9824 100644 > --- a/drivers/pci/pcie/portdrv.h > +++ b/drivers/pci/pcie/portdrv.h > @@ -112,4 +112,5 @@ static inline bool pcie_pme_no_msi(void) { return > false; } > static inline void pcie_pme_interrupt_enable(struct pci_dev *dev, bool > en) {} > #endif /* !CONFIG_PCIE_PME */ > > +struct pcie_port_service_driver *find_aer_service(struct pci_dev > *dev); > #endif /* _PORTDRV_H_ */ Hi Bjorn, I will be fixing kbuild error (for x86) along with the comments you might have. Regards, Oza.
diff --git a/drivers/pci/pcie/Makefile b/drivers/pci/pcie/Makefile index 800e1d4..03f4e0b 100644 --- a/drivers/pci/pcie/Makefile +++ b/drivers/pci/pcie/Makefile @@ -2,7 +2,7 @@ # # Makefile for PCI Express features and port driver -pcieportdrv-y := portdrv_core.o portdrv_pci.o +pcieportdrv-y := portdrv_core.o portdrv_pci.o err.o obj-$(CONFIG_PCIEPORTBUS) += pcieportdrv.o diff --git a/drivers/pci/pcie/aer/aerdrv.h b/drivers/pci/pcie/aer/aerdrv.h index 08b4584..b4c9506 100644 --- a/drivers/pci/pcie/aer/aerdrv.h +++ b/drivers/pci/pcie/aer/aerdrv.h @@ -76,36 +76,6 @@ struct aer_rpc { */ }; -struct aer_broadcast_data { - enum pci_channel_state state; - enum pci_ers_result result; -}; - -static inline pci_ers_result_t merge_result(enum pci_ers_result orig, - enum pci_ers_result new) -{ - if (new == PCI_ERS_RESULT_NO_AER_DRIVER) - return PCI_ERS_RESULT_NO_AER_DRIVER; - - if (new == PCI_ERS_RESULT_NONE) - return orig; - - switch (orig) { - case PCI_ERS_RESULT_CAN_RECOVER: - case PCI_ERS_RESULT_RECOVERED: - orig = new; - break; - case PCI_ERS_RESULT_DISCONNECT: - if (new == PCI_ERS_RESULT_NEED_RESET) - orig = PCI_ERS_RESULT_NEED_RESET; - break; - default: - break; - } - - return orig; -} - extern struct bus_type pcie_port_bus_type; void aer_isr(struct work_struct *work); void aer_print_error(struct pci_dev *dev, struct aer_err_info *info); diff --git a/drivers/pci/pcie/aer/aerdrv_core.c b/drivers/pci/pcie/aer/aerdrv_core.c index be4ee3b..51515d1 100644 --- a/drivers/pci/pcie/aer/aerdrv_core.c +++ b/drivers/pci/pcie/aer/aerdrv_core.c @@ -228,191 +228,6 @@ static bool find_source_device(struct pci_dev *parent, return true; } -static int report_error_detected(struct pci_dev *dev, void *data) -{ - pci_ers_result_t vote; - const struct pci_error_handlers *err_handler; - struct aer_broadcast_data *result_data; - result_data = (struct aer_broadcast_data *) data; - - device_lock(&dev->dev); - dev->error_state = result_data->state; - - if (!dev->driver || - !dev->driver->err_handler || - !dev->driver->err_handler->error_detected) { - if (result_data->state == pci_channel_io_frozen && - dev->hdr_type != PCI_HEADER_TYPE_BRIDGE) { - /* - * In case of fatal recovery, if one of down- - * stream device has no driver. We might be - * unable to recover because a later insmod - * of a driver for this device is unaware of - * its hw state. - */ - pci_printk(KERN_DEBUG, dev, "device has %s\n", - dev->driver ? - "no AER-aware driver" : "no driver"); - } - - /* - * If there's any device in the subtree that does not - * have an error_detected callback, returning - * PCI_ERS_RESULT_NO_AER_DRIVER prevents calling of - * the subsequent mmio_enabled/slot_reset/resume - * callbacks of "any" device in the subtree. All the - * devices in the subtree are left in the error state - * without recovery. - */ - - if (dev->hdr_type != PCI_HEADER_TYPE_BRIDGE) - vote = PCI_ERS_RESULT_NO_AER_DRIVER; - else - vote = PCI_ERS_RESULT_NONE; - } else { - err_handler = dev->driver->err_handler; - vote = err_handler->error_detected(dev, result_data->state); - pci_uevent_ers(dev, PCI_ERS_RESULT_NONE); - } - - result_data->result = merge_result(result_data->result, vote); - device_unlock(&dev->dev); - return 0; -} - -static int report_mmio_enabled(struct pci_dev *dev, void *data) -{ - pci_ers_result_t vote; - const struct pci_error_handlers *err_handler; - struct aer_broadcast_data *result_data; - result_data = (struct aer_broadcast_data *) data; - - device_lock(&dev->dev); - if (!dev->driver || - !dev->driver->err_handler || - !dev->driver->err_handler->mmio_enabled) - goto out; - - err_handler = dev->driver->err_handler; - vote = err_handler->mmio_enabled(dev); - result_data->result = merge_result(result_data->result, vote); -out: - device_unlock(&dev->dev); - return 0; -} - -static int report_slot_reset(struct pci_dev *dev, void *data) -{ - pci_ers_result_t vote; - const struct pci_error_handlers *err_handler; - struct aer_broadcast_data *result_data; - result_data = (struct aer_broadcast_data *) data; - - device_lock(&dev->dev); - if (!dev->driver || - !dev->driver->err_handler || - !dev->driver->err_handler->slot_reset) - goto out; - - err_handler = dev->driver->err_handler; - vote = err_handler->slot_reset(dev); - result_data->result = merge_result(result_data->result, vote); -out: - device_unlock(&dev->dev); - return 0; -} - -static int report_resume(struct pci_dev *dev, void *data) -{ - const struct pci_error_handlers *err_handler; - - device_lock(&dev->dev); - dev->error_state = pci_channel_io_normal; - - if (!dev->driver || - !dev->driver->err_handler || - !dev->driver->err_handler->resume) - goto out; - - err_handler = dev->driver->err_handler; - err_handler->resume(dev); - pci_uevent_ers(dev, PCI_ERS_RESULT_RECOVERED); -out: - device_unlock(&dev->dev); - return 0; -} - -/** - * broadcast_error_message - handle message broadcast to downstream drivers - * @dev: pointer to from where in a hierarchy message is broadcasted down - * @state: error state - * @error_mesg: message to print - * @cb: callback to be broadcasted - * - * Invoked during error recovery process. Once being invoked, the content - * of error severity will be broadcasted to all downstream drivers in a - * hierarchy in question. - */ -static pci_ers_result_t broadcast_error_message(struct pci_dev *dev, - enum pci_channel_state state, - char *error_mesg, - int (*cb)(struct pci_dev *, void *)) -{ - struct aer_broadcast_data result_data; - - pci_printk(KERN_DEBUG, dev, "broadcast %s message\n", error_mesg); - result_data.state = state; - if (cb == report_error_detected) - result_data.result = PCI_ERS_RESULT_CAN_RECOVER; - else - result_data.result = PCI_ERS_RESULT_RECOVERED; - - if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) { - /* - * If the error is reported by a bridge, we think this error - * is related to the downstream link of the bridge, so we - * do error recovery on all subordinates of the bridge instead - * of the bridge and clear the error status of the bridge. - */ - if (cb == report_error_detected) - dev->error_state = state; - pci_walk_bus(dev->subordinate, cb, &result_data); - if (cb == report_resume) { - pci_cleanup_aer_uncorrect_error_status(dev); - dev->error_state = pci_channel_io_normal; - } - } else { - /* - * If the error is reported by an end point, we think this - * error is related to the upstream link of the end point. - */ - if (state == pci_channel_io_normal) - /* - * the error is non fatal so the bus is ok, just invoke - * the callback for the function that logged the error. - */ - cb(dev, &result_data); - else - pci_walk_bus(dev->bus, cb, &result_data); - } - - return result_data.result; -} - -/** - * default_reset_link - default reset function - * @dev: pointer to pci_dev data structure - * - * Invoked when performing link reset on a Downstream Port or a - * Root Port with no aer driver. - */ -static pci_ers_result_t default_reset_link(struct pci_dev *dev) -{ - pci_reset_bridge_secondary_bus(dev); - pci_printk(KERN_DEBUG, dev, "downstream link has been reset\n"); - return PCI_ERS_RESULT_RECOVERED; -} - static int find_aer_service_iter(struct device *device, void *data) { struct pcie_port_service_driver *service_driver, **drv; @@ -430,7 +245,7 @@ static int find_aer_service_iter(struct device *device, void *data) return 0; } -static struct pcie_port_service_driver *find_aer_service(struct pci_dev *dev) +struct pcie_port_service_driver *find_aer_service(struct pci_dev *dev) { struct pcie_port_service_driver *drv = NULL; @@ -439,143 +254,6 @@ static struct pcie_port_service_driver *find_aer_service(struct pci_dev *dev) return drv; } -static pci_ers_result_t reset_link(struct pci_dev *dev) -{ - struct pci_dev *udev; - pci_ers_result_t status; - struct pcie_port_service_driver *driver; - - if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) { - /* Reset this port for all subordinates */ - udev = dev; - } else { - /* Reset the upstream component (likely downstream port) */ - udev = dev->bus->self; - } - - /* Use the aer driver of the component firstly */ - driver = find_aer_service(udev); - - if (driver && driver->reset_link) { - status = driver->reset_link(udev); - } else if (udev->has_secondary_link) { - status = default_reset_link(udev); - } else { - pci_printk(KERN_DEBUG, dev, "no link-reset support at upstream device %s\n", - pci_name(udev)); - return PCI_ERS_RESULT_DISCONNECT; - } - - if (status != PCI_ERS_RESULT_RECOVERED) { - pci_printk(KERN_DEBUG, dev, "link reset at upstream device %s failed\n", - pci_name(udev)); - return PCI_ERS_RESULT_DISCONNECT; - } - - return status; -} - -static pci_ers_result_t pcie_do_fatal_recovery(struct pci_dev *dev, int severity) -{ - struct pci_dev *udev; - struct pci_bus *parent; - struct pci_dev *pdev, *temp; - pci_ers_result_t result = PCI_ERS_RESULT_RECOVERED; - - if (severity == AER_FATAL) - pci_cleanup_aer_uncorrect_error_status(dev); - - if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) - udev = dev; - else - udev = dev->bus->self; - - parent = udev->subordinate; - pci_lock_rescan_remove(); - list_for_each_entry_safe_reverse(pdev, temp, &parent->devices, - bus_list) { - pci_dev_get(pdev); - pci_dev_set_disconnected(pdev, NULL); - if (pci_has_subordinate(pdev)) - pci_walk_bus(pdev->subordinate, - pci_dev_set_disconnected, NULL); - pci_stop_and_remove_bus_device(pdev); - pci_dev_put(pdev); - } - - result = reset_link(udev); - if (result == PCI_ERS_RESULT_RECOVERED) - if (pcie_wait_for_link(udev, true)) - pci_rescan_bus(udev->bus); - - pci_unlock_rescan_remove(); - - return result; -} - -/** - * pcie_do_recovery - handle nonfatal/fatal error recovery process - * @dev: pointer to a pci_dev data structure of agent detecting an error - * @severity: error severity type - * - * Invoked when an error is nonfatal/fatal. Once being invoked, broadcast - * error detected message to all downstream drivers within a hierarchy in - * question and return the returned code. - */ -void pcie_do_recovery(struct pci_dev *dev, int severity) -{ - pci_ers_result_t status; - enum pci_channel_state state; - - if (severity == AER_FATAL) { - status = pcie_do_fatal_recovery(dev, severity); - if (status != PCI_ERS_RESULT_RECOVERED) - goto failed; - return; - } - else - state = pci_channel_io_normal; - - status = broadcast_error_message(dev, - state, - "error_detected", - report_error_detected); - - if (status == PCI_ERS_RESULT_CAN_RECOVER) - status = broadcast_error_message(dev, - state, - "mmio_enabled", - report_mmio_enabled); - - if (status == PCI_ERS_RESULT_NEED_RESET) { - /* - * TODO: Should call platform-specific - * functions to reset slot before calling - * drivers' slot_reset callbacks? - */ - status = broadcast_error_message(dev, - state, - "slot_reset", - report_slot_reset); - } - - if (status != PCI_ERS_RESULT_RECOVERED) - goto failed; - - broadcast_error_message(dev, - state, - "resume", - report_resume); - - pci_info(dev, "AER: Device recovery successful\n"); - return; - -failed: - pci_uevent_ers(dev, PCI_ERS_RESULT_DISCONNECT); - /* TODO: Should kernel panic here? */ - pci_info(dev, "AER: Device recovery failed\n"); -} - /** * handle_error_source - handle logging error into an event log * @aerdev: pointer to pcie_device data structure of the root port diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c new file mode 100644 index 0000000..55df974 --- /dev/null +++ b/drivers/pci/pcie/err.c @@ -0,0 +1,377 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * This file implements the error recovery as a core part of PCIe error + * reporting. When a PCIe error is delivered, an error message will be + * collected and printed to console, then, an error recovery procedure + * will be executed by following the PCI error recovery rules. + * + * Copyright (C) 2006 Intel Corp. + * Tom Long Nguyen (tom.l.nguyen@intel.com) + * Zhang Yanmin (yanmin.zhang@intel.com) + * + */ + +#include <linux/pci.h> +#include <linux/module.h> +#include <linux/pci.h> +#include <linux/kernel.h> +#include <linux/errno.h> +#include <linux/aer.h> +#include "portdrv.h" +#include "../pci.h" + +struct aer_broadcast_data { + enum pci_channel_state state; + enum pci_ers_result result; +}; + +static pci_ers_result_t merge_result(enum pci_ers_result orig, + enum pci_ers_result new) +{ + if (new == PCI_ERS_RESULT_NO_AER_DRIVER) + return PCI_ERS_RESULT_NO_AER_DRIVER; + + if (new == PCI_ERS_RESULT_NONE) + return orig; + + switch (orig) { + case PCI_ERS_RESULT_CAN_RECOVER: + case PCI_ERS_RESULT_RECOVERED: + orig = new; + break; + case PCI_ERS_RESULT_DISCONNECT: + if (new == PCI_ERS_RESULT_NEED_RESET) + orig = PCI_ERS_RESULT_NEED_RESET; + break; + default: + break; + } + + return orig; +} + +static int report_error_detected(struct pci_dev *dev, void *data) +{ + pci_ers_result_t vote; + const struct pci_error_handlers *err_handler; + struct aer_broadcast_data *result_data; + + result_data = (struct aer_broadcast_data *) data; + + device_lock(&dev->dev); + dev->error_state = result_data->state; + + if (!dev->driver || + !dev->driver->err_handler || + !dev->driver->err_handler->error_detected) { + if (result_data->state == pci_channel_io_frozen && + dev->hdr_type != PCI_HEADER_TYPE_BRIDGE) { + /* + * In case of fatal recovery, if one of down- + * stream device has no driver. We might be + * unable to recover because a later insmod + * of a driver for this device is unaware of + * its hw state. + */ + pci_printk(KERN_DEBUG, dev, "device has %s\n", + dev->driver ? + "no AER-aware driver" : "no driver"); + } + + /* + * If there's any device in the subtree that does not + * have an error_detected callback, returning + * PCI_ERS_RESULT_NO_AER_DRIVER prevents calling of + * the subsequent mmio_enabled/slot_reset/resume + * callbacks of "any" device in the subtree. All the + * devices in the subtree are left in the error state + * without recovery. + */ + + if (dev->hdr_type != PCI_HEADER_TYPE_BRIDGE) + vote = PCI_ERS_RESULT_NO_AER_DRIVER; + else + vote = PCI_ERS_RESULT_NONE; + } else { + err_handler = dev->driver->err_handler; + vote = err_handler->error_detected(dev, result_data->state); + pci_uevent_ers(dev, PCI_ERS_RESULT_NONE); + } + + result_data->result = merge_result(result_data->result, vote); + device_unlock(&dev->dev); + return 0; +} + +static int report_mmio_enabled(struct pci_dev *dev, void *data) +{ + pci_ers_result_t vote; + const struct pci_error_handlers *err_handler; + struct aer_broadcast_data *result_data; + + result_data = (struct aer_broadcast_data *) data; + + device_lock(&dev->dev); + if (!dev->driver || + !dev->driver->err_handler || + !dev->driver->err_handler->mmio_enabled) + goto out; + + err_handler = dev->driver->err_handler; + vote = err_handler->mmio_enabled(dev); + result_data->result = merge_result(result_data->result, vote); +out: + device_unlock(&dev->dev); + return 0; +} + +static int report_slot_reset(struct pci_dev *dev, void *data) +{ + pci_ers_result_t vote; + const struct pci_error_handlers *err_handler; + struct aer_broadcast_data *result_data; + + result_data = (struct aer_broadcast_data *) data; + + device_lock(&dev->dev); + if (!dev->driver || + !dev->driver->err_handler || + !dev->driver->err_handler->slot_reset) + goto out; + + err_handler = dev->driver->err_handler; + vote = err_handler->slot_reset(dev); + result_data->result = merge_result(result_data->result, vote); +out: + device_unlock(&dev->dev); + return 0; +} + +static int report_resume(struct pci_dev *dev, void *data) +{ + const struct pci_error_handlers *err_handler; + + device_lock(&dev->dev); + dev->error_state = pci_channel_io_normal; + + if (!dev->driver || + !dev->driver->err_handler || + !dev->driver->err_handler->resume) + goto out; + + err_handler = dev->driver->err_handler; + err_handler->resume(dev); + pci_uevent_ers(dev, PCI_ERS_RESULT_RECOVERED); +out: + device_unlock(&dev->dev); + return 0; +} + +/** + * default_reset_link - default reset function + * @dev: pointer to pci_dev data structure + * + * Invoked when performing link reset on a Downstream Port or a + * Root Port with no aer driver. + */ +static pci_ers_result_t default_reset_link(struct pci_dev *dev) +{ + pci_reset_bridge_secondary_bus(dev); + pci_printk(KERN_DEBUG, dev, "downstream link has been reset\n"); + return PCI_ERS_RESULT_RECOVERED; +} + +static pci_ers_result_t reset_link(struct pci_dev *dev) +{ + struct pci_dev *udev; + pci_ers_result_t status; + struct pcie_port_service_driver *driver; + + if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) { + /* Reset this port for all subordinates */ + udev = dev; + } else { + /* Reset the upstream component (likely downstream port) */ + udev = dev->bus->self; + } + +#if IS_ENABLED(CONFIG_PCIEAER) + /* Use the aer driver of the component firstly */ + driver = find_aer_service(udev); +#endif + + if (driver && driver->reset_link) { + status = driver->reset_link(udev); + } else if (udev->has_secondary_link) { + status = default_reset_link(udev); + } else { + pci_printk(KERN_DEBUG, dev, "no link-reset support at upstream device %s\n", + pci_name(udev)); + return PCI_ERS_RESULT_DISCONNECT; + } + + if (status != PCI_ERS_RESULT_RECOVERED) { + pci_printk(KERN_DEBUG, dev, "link reset at upstream device %s failed\n", + pci_name(udev)); + return PCI_ERS_RESULT_DISCONNECT; + } + + return status; +} + +/** + * broadcast_error_message - handle message broadcast to downstream drivers + * @dev: pointer to from where in a hierarchy message is broadcasted down + * @state: error state + * @error_mesg: message to print + * @cb: callback to be broadcasted + * + * Invoked during error recovery process. Once being invoked, the content + * of error severity will be broadcasted to all downstream drivers in a + * hierarchy in question. + */ +static pci_ers_result_t broadcast_error_message(struct pci_dev *dev, + enum pci_channel_state state, + char *error_mesg, + int (*cb)(struct pci_dev *, void *)) +{ + struct aer_broadcast_data result_data; + + pci_printk(KERN_DEBUG, dev, "broadcast %s message\n", error_mesg); + result_data.state = state; + if (cb == report_error_detected) + result_data.result = PCI_ERS_RESULT_CAN_RECOVER; + else + result_data.result = PCI_ERS_RESULT_RECOVERED; + + if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) { + /* + * If the error is reported by a bridge, we think this error + * is related to the downstream link of the bridge, so we + * do error recovery on all subordinates of the bridge instead + * of the bridge and clear the error status of the bridge. + */ + if (cb == report_error_detected) + dev->error_state = state; + pci_walk_bus(dev->subordinate, cb, &result_data); + if (cb == report_resume) { + pci_cleanup_aer_uncorrect_error_status(dev); + dev->error_state = pci_channel_io_normal; + } + } else { + /* + * If the error is reported by an end point, we think this + * error is related to the upstream link of the end point. + */ + if (state == pci_channel_io_normal) + /* + * the error is non fatal so the bus is ok, just invoke + * the callback for the function that logged the error. + */ + cb(dev, &result_data); + else + pci_walk_bus(dev->bus, cb, &result_data); + } + + return result_data.result; +} + +static pci_ers_result_t do_fatal_recovery(struct pci_dev *dev, int severity) +{ + struct pci_dev *udev; + struct pci_bus *parent; + struct pci_dev *pdev, *temp; + pci_ers_result_t result = PCI_ERS_RESULT_RECOVERED; + + if (severity == AER_FATAL) + pci_cleanup_aer_uncorrect_error_status(dev); + + if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) + udev = dev; + else + udev = dev->bus->self; + + parent = udev->subordinate; + pci_lock_rescan_remove(); + list_for_each_entry_safe_reverse(pdev, temp, &parent->devices, + bus_list) { + pci_dev_get(pdev); + pci_dev_set_disconnected(pdev, NULL); + if (pci_has_subordinate(pdev)) + pci_walk_bus(pdev->subordinate, + pci_dev_set_disconnected, NULL); + pci_stop_and_remove_bus_device(pdev); + pci_dev_put(pdev); + } + + result = reset_link(udev); + if (result == PCI_ERS_RESULT_RECOVERED) + if (pcie_wait_for_link(udev, true)) + pci_rescan_bus(udev->bus); + + pci_unlock_rescan_remove(); + + return result; +} + +/** + * pcie_do_recovery - handle nonfatal/fatal error recovery process + * @dev: pointer to a pci_dev data structure of agent detecting an error + * @severity: error severity type + * + * Invoked when an error is nonfatal/fatal. Once being invoked, broadcast + * error detected message to all downstream drivers within a hierarchy in + * question and return the returned code. + */ +void pcie_do_recovery(struct pci_dev *dev, int severity) +{ + pci_ers_result_t status; + enum pci_channel_state state; + + if (severity == AER_FATAL) { + status = do_fatal_recovery(dev, severity); + if (status != PCI_ERS_RESULT_RECOVERED) + goto failed; + return; + } else + state = pci_channel_io_normal; + + status = broadcast_error_message(dev, + state, + "error_detected", + report_error_detected); + + if (status == PCI_ERS_RESULT_CAN_RECOVER) + status = broadcast_error_message(dev, + state, + "mmio_enabled", + report_mmio_enabled); + + if (status == PCI_ERS_RESULT_NEED_RESET) { + /* + * TODO: Should call platform-specific + * functions to reset slot before calling + * drivers' slot_reset callbacks? + */ + status = broadcast_error_message(dev, + state, + "slot_reset", + report_slot_reset); + } + + if (status != PCI_ERS_RESULT_RECOVERED) + goto failed; + + broadcast_error_message(dev, + state, + "resume", + report_resume); + + pci_info(dev, "AER: Device recovery successful\n"); + return; + +failed: + pci_uevent_ers(dev, PCI_ERS_RESULT_DISCONNECT); + /* TODO: Should kernel panic here? */ + pci_info(dev, "AER: Device recovery failed\n"); +} diff --git a/drivers/pci/pcie/portdrv.h b/drivers/pci/pcie/portdrv.h index d0c6783..47c9824 100644 --- a/drivers/pci/pcie/portdrv.h +++ b/drivers/pci/pcie/portdrv.h @@ -112,4 +112,5 @@ static inline bool pcie_pme_no_msi(void) { return false; } static inline void pcie_pme_interrupt_enable(struct pci_dev *dev, bool en) {} #endif /* !CONFIG_PCIE_PME */ +struct pcie_port_service_driver *find_aer_service(struct pci_dev *dev); #endif /* _PORTDRV_H_ */
This patch factors out error reporting callbacks, which are currently tightly coupled with AER. DPC should be able to register callbacks and attempt recovery when DPC trigger event occurs. Signed-off-by: Oza Pawandeep <poza@codeaurora.org>