Message ID | 20180831212639.10196-14-keith.busch@intel.com (mailing list archive) |
---|---|
State | New, archived |
Delegated to: | Bjorn Helgaas |
Headers | show |
Series | PCI, error handling and hot plug | expand |
On Fri, Aug 31, 2018 at 03:26:36PM -0600, Keith Busch wrote: > + /* > + * Shutdown notification to ignore hotplug events during error > + * handling. We'll recheck the status when error handling completes. > + * > + * It is possible link event related to this error handling may have > + * triggered a the hotplug interrupt ahead of this notification, but we > + * can't do anything about that race. > + */ > + pcie_shutdown_notification(ctrl); No, if you look at pciehp_remove() you'll notice that I call pci_hp_del() before pcie_shutdown_notification(). The IRQ thread is needed to respond to user requests to enable/disable the slot. If you just release the IRQ, the sysfs interface is still there but will no longer function while the reset is handled. Not good. I think what you want to do instead is "down_write(&ctrl->reset_lock)", see commit 5b3f7b7d062b ("PCI: pciehp: Avoid slot access during reset"). And possibly mask PDCE/DLLSCE like pciehp_reset_slot() does. > +static void pciehp_slot_reset(struct pci_dev *dev) [...] > + /* > + * Cache presence detect change, but ignore other hotplug events that > + * may occur during error handling. Ports that implement only in-band > + * presence detection may inadvertently believe the device has changed, > + * so those devices will have to be re-enumerated. > + */ > + pciehp_get_adapter_changed(ctrl->slot, &changed); > + pcie_clear_hotplug_events(ctrl); > + if (changed) > + pciehp_request(ctrl, PCI_EXP_SLTSTA_PDC); > + pcie_init_notification(ctrl); > +} Hm, can we be certain that error handling does not affect PDC? Because pciehp_reset_slot() does mask it and Sinan once said that in-band presence detect may cause presence to flap as well during reset: https://lkml.org/lkml/2018/7/3/693 Thanks, Lukas
On Sun, Sep 02, 2018 at 12:39:55PM +0200, Lukas Wunner wrote: > On Fri, Aug 31, 2018 at 03:26:36PM -0600, Keith Busch wrote: > > + /* > > + * Shutdown notification to ignore hotplug events during error > > + * handling. We'll recheck the status when error handling completes. > > + * > > + * It is possible link event related to this error handling may have > > + * triggered a the hotplug interrupt ahead of this notification, but we > > + * can't do anything about that race. > > + */ > > + pcie_shutdown_notification(ctrl); > > No, if you look at pciehp_remove() you'll notice that I call pci_hp_del() > before pcie_shutdown_notification(). The IRQ thread is needed to respond > to user requests to enable/disable the slot. If you just release the IRQ, > the sysfs interface is still there but will no longer function while the > reset is handled. Not good. > > I think what you want to do instead is "down_write(&ctrl->reset_lock)", > see commit 5b3f7b7d062b ("PCI: pciehp: Avoid slot access during reset"). > And possibly mask PDCE/DLLSCE like pciehp_reset_slot() does. > > > > +static void pciehp_slot_reset(struct pci_dev *dev) > [...] > > + /* > > + * Cache presence detect change, but ignore other hotplug events that > > + * may occur during error handling. Ports that implement only in-band > > + * presence detection may inadvertently believe the device has changed, > > + * so those devices will have to be re-enumerated. > > + */ > > + pciehp_get_adapter_changed(ctrl->slot, &changed); > > + pcie_clear_hotplug_events(ctrl); > > + if (changed) > > + pciehp_request(ctrl, PCI_EXP_SLTSTA_PDC); > > + pcie_init_notification(ctrl); > > +} > > Hm, can we be certain that error handling does not affect PDC? No, I'm actually saying exactly the opposite in the code comments. The error handling may affect PDC if the port doesn't have an out-of-band presence detection capability. > Because pciehp_reset_slot() does mask it and Sinan once said that in-band > presence detect may cause presence to flap as well during reset: > https://lkml.org/lkml/2018/7/3/693 Right, but how do you know which PDC is okay to ignore and which one isn't? Hotplug events freqently trigger other error handling so we can't just ignore hot plug during error handling. Link events should be safe to ignore, though.
diff --git a/drivers/pci/hotplug/pciehp.h b/drivers/pci/hotplug/pciehp.h index 811cf83f956d..27abfb504b35 100644 --- a/drivers/pci/hotplug/pciehp.h +++ b/drivers/pci/hotplug/pciehp.h @@ -195,6 +195,7 @@ void pciehp_get_attention_status(struct slot *slot, u8 *status); void pciehp_set_attention_status(struct slot *slot, u8 status); void pciehp_get_latch_status(struct slot *slot, u8 *status); void pciehp_get_adapter_status(struct slot *slot, u8 *status); +void pciehp_get_adapter_changed(struct slot *slot, u8 *changed); int pciehp_query_power_fault(struct slot *slot); void pciehp_green_led_on(struct slot *slot); void pciehp_green_led_off(struct slot *slot); diff --git a/drivers/pci/hotplug/pciehp_core.c b/drivers/pci/hotplug/pciehp_core.c index ec48c9433ae5..acb4d864b4e7 100644 --- a/drivers/pci/hotplug/pciehp_core.c +++ b/drivers/pci/hotplug/pciehp_core.c @@ -301,6 +301,55 @@ static void pciehp_remove(struct pcie_device *dev) pciehp_release_ctrl(ctrl); } +static void pciehp_error_detected(struct pci_dev *dev) +{ + struct controller *ctrl; + struct pcie_device *pcie_dev; + struct device *device = pcie_port_find_device(dev, PCIE_PORT_SERVICE_HP); + + if (!device) + return; + + pcie_dev = to_pcie_device(device); + ctrl = get_service_data(pcie_dev); + + /* + * Shutdown notification to ignore hotplug events during error + * handling. We'll recheck the status when error handling completes. + * + * It is possible link event related to this error handling may have + * triggered a the hotplug interrupt ahead of this notification, but we + * can't do anything about that race. + */ + pcie_shutdown_notification(ctrl); +} + +static void pciehp_slot_reset(struct pci_dev *dev) +{ + u8 changed; + struct controller *ctrl; + struct pcie_device *pcie_dev; + struct device *device = pcie_port_find_device(dev, PCIE_PORT_SERVICE_HP); + + if (!device) + return; + + pcie_dev = to_pcie_device(device); + ctrl = get_service_data(pcie_dev); + + /* + * Cache presence detect change, but ignore other hotplug events that + * may occur during error handling. Ports that implement only in-band + * presence detection may inadvertently believe the device has changed, + * so those devices will have to be re-enumerated. + */ + pciehp_get_adapter_changed(ctrl->slot, &changed); + pcie_clear_hotplug_events(ctrl); + if (changed) + pciehp_request(ctrl, PCI_EXP_SLTSTA_PDC); + pcie_init_notification(ctrl); +} + #ifdef CONFIG_PM static int pciehp_suspend(struct pcie_device *dev) { @@ -340,6 +389,8 @@ static struct pcie_port_service_driver hpdriver_portdrv = { .probe = pciehp_probe, .remove = pciehp_remove, + .error_detected = pciehp_error_detected, + .slot_reset = pciehp_slot_reset, #ifdef CONFIG_PM .suspend = pciehp_suspend, diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c index c6116e516e1e..7c43336e08ba 100644 --- a/drivers/pci/hotplug/pciehp_hpc.c +++ b/drivers/pci/hotplug/pciehp_hpc.c @@ -398,6 +398,15 @@ void pciehp_get_adapter_status(struct slot *slot, u8 *status) *status = !!(slot_status & PCI_EXP_SLTSTA_PDS); } +void pciehp_get_adapter_changed(struct slot *slot, u8 *changed) +{ + struct pci_dev *pdev = ctrl_dev(slot->ctrl); + u16 slot_status; + + pcie_capability_read_word(pdev, PCI_EXP_SLTSTA, &slot_status); + *changed = !!(slot_status & PCI_EXP_SLTSTA_PDC); +} + int pciehp_query_power_fault(struct slot *slot) { struct pci_dev *pdev = ctrl_dev(slot->ctrl);
Error handling may trigger spurious link events, which may trigger hotplug handling to re-enumerate the topology. This patch temporarily disables notification during such error handling and checks the topology for changes after reset. Signed-off-by: Keith Busch <keith.busch@intel.com> --- drivers/pci/hotplug/pciehp.h | 1 + drivers/pci/hotplug/pciehp_core.c | 51 +++++++++++++++++++++++++++++++++++++++ drivers/pci/hotplug/pciehp_hpc.c | 9 +++++++ 3 files changed, 61 insertions(+)