Message ID | 20180831212639.10196-15-keith.busch@intel.com (mailing list archive) |
---|---|
State | New, archived |
Delegated to: | Bjorn Helgaas |
Headers | show |
Series | PCI, error handling and hot plug | expand |
On 8/31/2018 2:26 PM, Keith Busch wrote: > This is safe because the pciehp and DPC drivers share the same > interrupt. The DPC driver sets the bus state in the top-half interrupt > context, and the pciehp driver checks and masks off link events in its > bottom-half error handler. Where is this coming from? Is there a spec reference? DPC and HP interrupts can be implemented as MSI-x interrupts and could be unrelated interrupt IDs?
On Fri, Aug 31, 2018 at 03:18:15PM -0700, Sinan Kaya wrote: > On 8/31/2018 2:26 PM, Keith Busch wrote: > > This is safe because the pciehp and DPC drivers share the same > > interrupt. The DPC driver sets the bus state in the top-half interrupt > > context, and the pciehp driver checks and masks off link events in its > > bottom-half error handler. > > Where is this coming from? Is there a spec reference? > > DPC and HP interrupts can be implemented as MSI-x interrupts and could > be unrelated interrupt IDs? Darn, you're right. The kernel allocates up to 32 vectors and the port is free to divvy them up however it wants amont its supported services. It just so happened most of the ports I tested used the same one. There's no way to really close this race if they are separate vectors though, so maybe just leave this as a 'best effort' approach and update the change log accodingly.
On 8/31/2018 3:33 PM, Keith Busch wrote: > Darn, you're right. The kernel allocates up to 32 vectors and the port is > free to divvy them up however it wants amont its supported services. It > just so happened most of the ports I tested used the same one. There's > no way to really close this race if they are separate vectors though, > so maybe just leave this as a 'best effort' approach and update the > change log accodingly. or take the big hammer and move them into a single workqueue?
On Fri, Aug 31, 2018 at 03:55:58PM -0700, Sinan Kaya wrote: > On 8/31/2018 3:33 PM, Keith Busch wrote: > > Darn, you're right. The kernel allocates up to 32 vectors and the port is > > free to divvy them up however it wants amont its supported services. It > > just so happened most of the ports I tested used the same one. There's > > no way to really close this race if they are separate vectors though, > > so maybe just leave this as a 'best effort' approach and update the > > change log accodingly. > > or take the big hammer and move them into a single workqueue? That wouldn't really help if they're different interrupt vectors since they could happen in either order with a delay between the CPU seeing each of them.
On 8/31/2018 3:59 PM, Keith Busch wrote: > On Fri, Aug 31, 2018 at 03:55:58PM -0700, Sinan Kaya wrote: >> On 8/31/2018 3:33 PM, Keith Busch wrote: >>> Darn, you're right. The kernel allocates up to 32 vectors and the port is >>> free to divvy them up however it wants amont its supported services. It >>> just so happened most of the ports I tested used the same one. There's >>> no way to really close this race if they are separate vectors though, >>> so maybe just leave this as a 'best effort' approach and update the >>> change log accodingly. >> >> or take the big hammer and move them into a single workqueue? > > That wouldn't really help if they're different interrupt vectors since > they could happen in either order with a delay between the CPU seeing > each of them. > I was trying to make it look like single interrupt in software even though they are sourced from different interrupt handlers. But, you are right. Interrupts can arrive in arbitrary order.
On Fri, Aug 31, 2018 at 03:26:37PM -0600, Keith Busch wrote: > This patch adds a channel state to a subordinate bus. When a DPC event is > triggered, the DPC driver will set the channel state to frozen, and the > pciehp driver will ignore link events if the subordinate bus is being > managed by DPC error handling. > > This is safe because the pciehp and DPC drivers share the same > interrupt. The DPC driver sets the bus state in the top-half interrupt > context, and the pciehp driver checks and masks off link events in its > bottom-half error handler. I really liked Sinan's approach of checking in pciehp whether a fatal error is pending and waiting for it to be handled: https://patchwork.ozlabs.org/patch/959464/ This seemed to avoid any races with DPC and is small and simple. Can we pursue a solution along those lines? Thanks, Lukas
On Sun, Sep 02, 2018 at 04:27:14PM +0200, Lukas Wunner wrote: > On Fri, Aug 31, 2018 at 03:26:37PM -0600, Keith Busch wrote: > > This patch adds a channel state to a subordinate bus. When a DPC event is > > triggered, the DPC driver will set the channel state to frozen, and the > > pciehp driver will ignore link events if the subordinate bus is being > > managed by DPC error handling. > > > > This is safe because the pciehp and DPC drivers share the same > > interrupt. The DPC driver sets the bus state in the top-half interrupt > > context, and the pciehp driver checks and masks off link events in its > > bottom-half error handler. > > I really liked Sinan's approach of checking in pciehp whether a fatal > error is pending and waiting for it to be handled: > https://patchwork.ozlabs.org/patch/959464/ > > This seemed to avoid any races with DPC and is small and simple. > Can we pursue a solution along those lines? That introduces a completely different race between the error handling and hotplug threads. We don't control which interrupt fires first or any way ensure they're even the same event.
On Tue, Sep 04, 2018 at 08:16:02AM -0600, Keith Busch wrote: > On Sun, Sep 02, 2018 at 04:27:14PM +0200, Lukas Wunner wrote: > > On Fri, Aug 31, 2018 at 03:26:37PM -0600, Keith Busch wrote: > > > This patch adds a channel state to a subordinate bus. When a DPC event is > > > triggered, the DPC driver will set the channel state to frozen, and the > > > pciehp driver will ignore link events if the subordinate bus is being > > > managed by DPC error handling. > > > > > > This is safe because the pciehp and DPC drivers share the same > > > interrupt. The DPC driver sets the bus state in the top-half interrupt > > > context, and the pciehp driver checks and masks off link events in its > > > bottom-half error handler. > > > > I really liked Sinan's approach of checking in pciehp whether a fatal > > error is pending and waiting for it to be handled: > > https://patchwork.ozlabs.org/patch/959464/ > > > > This seemed to avoid any races with DPC and is small and simple. > > Can we pursue a solution along those lines? > > That introduces a completely different race between the error handling > and hotplug threads. We don't control which interrupt fires first or > any way ensure they're even the same event. pciehp may react quicker than dpc, hence needs to determine a fatal error is pending without relying on dpc. My understanding is that this is achieved by Sinan checking PCI_EXP_DEVSTA_FED directly from pciehp. For the case when dpc reacts quicker and clears the error before pciehp checks for PCI_EXP_DEVSTA_FED, you need an additional synchronization mechanism between dpc and pciehp, such as a flag that is set by dpc before clearing the error, and that is checked by pciehp. Though you need to take care that pciehp does not see a stale flag when the next error occurs. Thanks, Lukas
On Tue, Sep 04, 2018 at 04:40:14PM +0200, Lukas Wunner wrote: > On Tue, Sep 04, 2018 at 08:16:02AM -0600, Keith Busch wrote: > > On Sun, Sep 02, 2018 at 04:27:14PM +0200, Lukas Wunner wrote: > > > On Fri, Aug 31, 2018 at 03:26:37PM -0600, Keith Busch wrote: > > > > This patch adds a channel state to a subordinate bus. When a DPC event is > > > > triggered, the DPC driver will set the channel state to frozen, and the > > > > pciehp driver will ignore link events if the subordinate bus is being > > > > managed by DPC error handling. > > > > > > > > This is safe because the pciehp and DPC drivers share the same > > > > interrupt. The DPC driver sets the bus state in the top-half interrupt > > > > context, and the pciehp driver checks and masks off link events in its > > > > bottom-half error handler. > > > > > > I really liked Sinan's approach of checking in pciehp whether a fatal > > > error is pending and waiting for it to be handled: > > > https://patchwork.ozlabs.org/patch/959464/ > > > > > > This seemed to avoid any races with DPC and is small and simple. > > > Can we pursue a solution along those lines? > > > > That introduces a completely different race between the error handling > > and hotplug threads. We don't control which interrupt fires first or > > any way ensure they're even the same event. > > pciehp may react quicker than dpc, hence needs to determine a fatal > error is pending without relying on dpc. My understanding is that > this is achieved by Sinan checking PCI_EXP_DEVSTA_FED directly from > pciehp. That's only true if the bridge detects ERR_FATAL, which is one of several ways to trigger DPC or AER. If the message comes from the end device, then PCI_EXP_DEVSTA_FED won't be set in the bridge that pciehp can read. > For the case when dpc reacts quicker and clears the error before > pciehp checks for PCI_EXP_DEVSTA_FED, you need an additional > synchronization mechanism between dpc and pciehp, such as a flag > that is set by dpc before clearing the error, and that is checked > by pciehp. Though you need to take care that pciehp does not see > a stale flag when the next error occurs. Yes, the pci_bus error_state this patch creates was intended to be that flag.
diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c index 7c43336e08ba..a928dcccf78b 100644 --- a/drivers/pci/hotplug/pciehp_hpc.c +++ b/drivers/pci/hotplug/pciehp_hpc.c @@ -643,6 +643,15 @@ static irqreturn_t pciehp_ist(int irq, void *dev_id) synchronize_hardirq(irq); events = atomic_xchg(&ctrl->pending_events, 0); + + /* + * Ignore link events on the suborinate bus if error handling is + * active, as link down may be expected. We'll continue to handle + * presence detect changes. + */ + if (pdev->subordinate && pci_bus_offline(pdev->subordinate)) + events &= ~PCI_EXP_SLTSTA_DLLSC; + if (!events) { pci_config_pm_runtime_put(pdev); return IRQ_NONE; diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c index eadfd835af13..93ce26191a2b 100644 --- a/drivers/pci/pcie/dpc.c +++ b/drivers/pci/pcie/dpc.c @@ -92,7 +92,8 @@ static pci_ers_result_t dpc_reset_link(struct pci_dev *pdev) pci_write_config_word(pdev, cap + PCI_EXP_DPC_STATUS, PCI_EXP_DPC_STATUS_TRIGGER); - + if (pdev->subordinate) + pdev->subordinate->error_state = pci_channel_io_normal; return PCI_ERS_RESULT_RECOVERED; } @@ -204,8 +205,11 @@ static irqreturn_t dpc_irq(int irq, void *context) pci_write_config_word(pdev, cap + PCI_EXP_DPC_STATUS, PCI_EXP_DPC_STATUS_INTERRUPT); - if (status & PCI_EXP_DPC_STATUS_TRIGGER) + if (status & PCI_EXP_DPC_STATUS_TRIGGER) { + if (pdev->subordinate) + pdev->subordinate->error_state = pci_channel_io_frozen; return IRQ_WAKE_THREAD; + } return IRQ_HANDLED; } diff --git a/include/linux/pci.h b/include/linux/pci.h index e72ca8dd6241..a76e10049b08 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -573,9 +573,15 @@ struct pci_bus { struct device dev; struct bin_attribute *legacy_io; /* Legacy I/O for this bus */ struct bin_attribute *legacy_mem; /* Legacy mem */ + pci_channel_state_t error_state; /* Current connectivity state */ unsigned int is_added:1; }; +static inline int pci_bus_offline(struct pci_bus *bus) +{ + return (bus->error_state != pci_channel_io_normal); +} + #define to_pci_bus(n) container_of(n, struct pci_bus, dev) /*
This patch adds a channel state to a subordinate bus. When a DPC event is triggered, the DPC driver will set the channel state to frozen, and the pciehp driver will ignore link events if the subordinate bus is being managed by DPC error handling. This is safe because the pciehp and DPC drivers share the same interrupt. The DPC driver sets the bus state in the top-half interrupt context, and the pciehp driver checks and masks off link events in its bottom-half error handler. Signed-off-by: Keith Busch <keith.busch@intel.com> --- drivers/pci/hotplug/pciehp_hpc.c | 9 +++++++++ drivers/pci/pcie/dpc.c | 8 ++++++-- include/linux/pci.h | 6 ++++++ 3 files changed, 21 insertions(+), 2 deletions(-)