diff mbox series

[14/16] pciehp: Ignore link events during DPC event

Message ID 20180831212639.10196-15-keith.busch@intel.com (mailing list archive)
State New, archived
Delegated to: Bjorn Helgaas
Headers show
Series PCI, error handling and hot plug | expand

Commit Message

Keith Busch Aug. 31, 2018, 9:26 p.m. UTC
This patch adds a channel state to a subordinate bus. When a DPC event is
triggered, the DPC driver will set the channel state to frozen, and the
pciehp driver will ignore link events if the subordinate bus is being
managed by DPC error handling.

This is safe because the pciehp and DPC drivers share the same
interrupt. The DPC driver sets the bus state in the top-half interrupt
context, and the pciehp driver checks and masks off link events in its
bottom-half error handler.

Signed-off-by: Keith Busch <keith.busch@intel.com>
---
 drivers/pci/hotplug/pciehp_hpc.c | 9 +++++++++
 drivers/pci/pcie/dpc.c           | 8 ++++++--
 include/linux/pci.h              | 6 ++++++
 3 files changed, 21 insertions(+), 2 deletions(-)

Comments

Sinan Kaya Aug. 31, 2018, 10:18 p.m. UTC | #1
On 8/31/2018 2:26 PM, Keith Busch wrote:
> This is safe because the pciehp and DPC drivers share the same
> interrupt. The DPC driver sets the bus state in the top-half interrupt
> context, and the pciehp driver checks and masks off link events in its
> bottom-half error handler.

Where is this coming from? Is there a spec reference?

DPC and HP interrupts can be implemented as MSI-x interrupts and could
be unrelated interrupt IDs?
Keith Busch Aug. 31, 2018, 10:33 p.m. UTC | #2
On Fri, Aug 31, 2018 at 03:18:15PM -0700, Sinan Kaya wrote:
> On 8/31/2018 2:26 PM, Keith Busch wrote:
> > This is safe because the pciehp and DPC drivers share the same
> > interrupt. The DPC driver sets the bus state in the top-half interrupt
> > context, and the pciehp driver checks and masks off link events in its
> > bottom-half error handler.
> 
> Where is this coming from? Is there a spec reference?
> 
> DPC and HP interrupts can be implemented as MSI-x interrupts and could
> be unrelated interrupt IDs?

Darn, you're right. The kernel allocates up to 32 vectors and the port is
free to divvy them up however it wants amont its supported services. It
just so happened most of the ports I tested used the same one. There's
no way to really close this race if they are separate vectors though,
so maybe just leave this as  a 'best effort' approach and update the
change log accodingly.
Sinan Kaya Aug. 31, 2018, 10:55 p.m. UTC | #3
On 8/31/2018 3:33 PM, Keith Busch wrote:
> Darn, you're right. The kernel allocates up to 32 vectors and the port is
> free to divvy them up however it wants amont its supported services. It
> just so happened most of the ports I tested used the same one. There's
> no way to really close this race if they are separate vectors though,
> so maybe just leave this as  a 'best effort' approach and update the
> change log accodingly.

or take the big hammer and move them into a single workqueue?
Keith Busch Aug. 31, 2018, 10:59 p.m. UTC | #4
On Fri, Aug 31, 2018 at 03:55:58PM -0700, Sinan Kaya wrote:
> On 8/31/2018 3:33 PM, Keith Busch wrote:
> > Darn, you're right. The kernel allocates up to 32 vectors and the port is
> > free to divvy them up however it wants amont its supported services. It
> > just so happened most of the ports I tested used the same one. There's
> > no way to really close this race if they are separate vectors though,
> > so maybe just leave this as  a 'best effort' approach and update the
> > change log accodingly.
> 
> or take the big hammer and move them into a single workqueue?

That wouldn't really help if they're different interrupt vectors since
they could happen in either order with a delay between the CPU seeing
each of them.
Sinan Kaya Aug. 31, 2018, 11:07 p.m. UTC | #5
On 8/31/2018 3:59 PM, Keith Busch wrote:
> On Fri, Aug 31, 2018 at 03:55:58PM -0700, Sinan Kaya wrote:
>> On 8/31/2018 3:33 PM, Keith Busch wrote:
>>> Darn, you're right. The kernel allocates up to 32 vectors and the port is
>>> free to divvy them up however it wants amont its supported services. It
>>> just so happened most of the ports I tested used the same one. There's
>>> no way to really close this race if they are separate vectors though,
>>> so maybe just leave this as  a 'best effort' approach and update the
>>> change log accodingly.
>>
>> or take the big hammer and move them into a single workqueue?
> 
> That wouldn't really help if they're different interrupt vectors since
> they could happen in either order with a delay between the CPU seeing
> each of them.
> 

I was trying to make it look like single interrupt in software even though
they are sourced from different interrupt handlers.

But, you are right. Interrupts can arrive in arbitrary order.
Lukas Wunner Sept. 2, 2018, 2:27 p.m. UTC | #6
On Fri, Aug 31, 2018 at 03:26:37PM -0600, Keith Busch wrote:
> This patch adds a channel state to a subordinate bus. When a DPC event is
> triggered, the DPC driver will set the channel state to frozen, and the
> pciehp driver will ignore link events if the subordinate bus is being
> managed by DPC error handling.
> 
> This is safe because the pciehp and DPC drivers share the same
> interrupt. The DPC driver sets the bus state in the top-half interrupt
> context, and the pciehp driver checks and masks off link events in its
> bottom-half error handler.

I really liked Sinan's approach of checking in pciehp whether a fatal
error is pending and waiting for it to be handled:
https://patchwork.ozlabs.org/patch/959464/

This seemed to avoid any races with DPC and is small and simple.
Can we pursue a solution along those lines?

Thanks,

Lukas
Keith Busch Sept. 4, 2018, 2:16 p.m. UTC | #7
On Sun, Sep 02, 2018 at 04:27:14PM +0200, Lukas Wunner wrote:
> On Fri, Aug 31, 2018 at 03:26:37PM -0600, Keith Busch wrote:
> > This patch adds a channel state to a subordinate bus. When a DPC event is
> > triggered, the DPC driver will set the channel state to frozen, and the
> > pciehp driver will ignore link events if the subordinate bus is being
> > managed by DPC error handling.
> > 
> > This is safe because the pciehp and DPC drivers share the same
> > interrupt. The DPC driver sets the bus state in the top-half interrupt
> > context, and the pciehp driver checks and masks off link events in its
> > bottom-half error handler.
> 
> I really liked Sinan's approach of checking in pciehp whether a fatal
> error is pending and waiting for it to be handled:
> https://patchwork.ozlabs.org/patch/959464/
> 
> This seemed to avoid any races with DPC and is small and simple.
> Can we pursue a solution along those lines?

That introduces a completely different race between the error handling
and hotplug threads. We don't control  which interrupt fires first or
any way ensure they're even the same event.
Lukas Wunner Sept. 4, 2018, 2:40 p.m. UTC | #8
On Tue, Sep 04, 2018 at 08:16:02AM -0600, Keith Busch wrote:
> On Sun, Sep 02, 2018 at 04:27:14PM +0200, Lukas Wunner wrote:
> > On Fri, Aug 31, 2018 at 03:26:37PM -0600, Keith Busch wrote:
> > > This patch adds a channel state to a subordinate bus. When a DPC event is
> > > triggered, the DPC driver will set the channel state to frozen, and the
> > > pciehp driver will ignore link events if the subordinate bus is being
> > > managed by DPC error handling.
> > > 
> > > This is safe because the pciehp and DPC drivers share the same
> > > interrupt. The DPC driver sets the bus state in the top-half interrupt
> > > context, and the pciehp driver checks and masks off link events in its
> > > bottom-half error handler.
> > 
> > I really liked Sinan's approach of checking in pciehp whether a fatal
> > error is pending and waiting for it to be handled:
> > https://patchwork.ozlabs.org/patch/959464/
> > 
> > This seemed to avoid any races with DPC and is small and simple.
> > Can we pursue a solution along those lines?
> 
> That introduces a completely different race between the error handling
> and hotplug threads. We don't control  which interrupt fires first or
> any way ensure they're even the same event.

pciehp may react quicker than dpc, hence needs to determine a fatal
error is pending without relying on dpc.  My understanding is that
this is achieved by Sinan checking PCI_EXP_DEVSTA_FED directly from
pciehp.

For the case when dpc reacts quicker and clears the error before
pciehp checks for PCI_EXP_DEVSTA_FED, you need an additional
synchronization mechanism between dpc and pciehp, such as a flag
that is set by dpc before clearing the error, and that is checked
by pciehp.  Though you need to take care that pciehp does not see
a stale flag when the next error occurs.

Thanks,

Lukas
Keith Busch Sept. 4, 2018, 3:31 p.m. UTC | #9
On Tue, Sep 04, 2018 at 04:40:14PM +0200, Lukas Wunner wrote:
> On Tue, Sep 04, 2018 at 08:16:02AM -0600, Keith Busch wrote:
> > On Sun, Sep 02, 2018 at 04:27:14PM +0200, Lukas Wunner wrote:
> > > On Fri, Aug 31, 2018 at 03:26:37PM -0600, Keith Busch wrote:
> > > > This patch adds a channel state to a subordinate bus. When a DPC event is
> > > > triggered, the DPC driver will set the channel state to frozen, and the
> > > > pciehp driver will ignore link events if the subordinate bus is being
> > > > managed by DPC error handling.
> > > > 
> > > > This is safe because the pciehp and DPC drivers share the same
> > > > interrupt. The DPC driver sets the bus state in the top-half interrupt
> > > > context, and the pciehp driver checks and masks off link events in its
> > > > bottom-half error handler.
> > > 
> > > I really liked Sinan's approach of checking in pciehp whether a fatal
> > > error is pending and waiting for it to be handled:
> > > https://patchwork.ozlabs.org/patch/959464/
> > > 
> > > This seemed to avoid any races with DPC and is small and simple.
> > > Can we pursue a solution along those lines?
> > 
> > That introduces a completely different race between the error handling
> > and hotplug threads. We don't control  which interrupt fires first or
> > any way ensure they're even the same event.
> 
> pciehp may react quicker than dpc, hence needs to determine a fatal
> error is pending without relying on dpc.  My understanding is that
> this is achieved by Sinan checking PCI_EXP_DEVSTA_FED directly from
> pciehp.

That's only true if the bridge detects ERR_FATAL, which is one of several
ways to trigger DPC or AER. If the message comes from the end device,
then PCI_EXP_DEVSTA_FED won't be set in the bridge that pciehp can
read.

> For the case when dpc reacts quicker and clears the error before
> pciehp checks for PCI_EXP_DEVSTA_FED, you need an additional
> synchronization mechanism between dpc and pciehp, such as a flag
> that is set by dpc before clearing the error, and that is checked
> by pciehp.  Though you need to take care that pciehp does not see
> a stale flag when the next error occurs.

Yes, the pci_bus error_state this patch creates was intended to be
that flag.
diff mbox series

Patch

diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
index 7c43336e08ba..a928dcccf78b 100644
--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -643,6 +643,15 @@  static irqreturn_t pciehp_ist(int irq, void *dev_id)
 
 	synchronize_hardirq(irq);
 	events = atomic_xchg(&ctrl->pending_events, 0);
+
+	/*
+	 * Ignore link events on the suborinate bus if error handling is
+	 * active, as link down may be expected. We'll continue to handle
+	 * presence detect changes.
+	 */
+	if (pdev->subordinate && pci_bus_offline(pdev->subordinate))
+		events &= ~PCI_EXP_SLTSTA_DLLSC;
+
 	if (!events) {
 		pci_config_pm_runtime_put(pdev);
 		return IRQ_NONE;
diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
index eadfd835af13..93ce26191a2b 100644
--- a/drivers/pci/pcie/dpc.c
+++ b/drivers/pci/pcie/dpc.c
@@ -92,7 +92,8 @@  static pci_ers_result_t dpc_reset_link(struct pci_dev *pdev)
 
 	pci_write_config_word(pdev, cap + PCI_EXP_DPC_STATUS,
 			      PCI_EXP_DPC_STATUS_TRIGGER);
-
+	if (pdev->subordinate)
+		pdev->subordinate->error_state = pci_channel_io_normal;
 	return PCI_ERS_RESULT_RECOVERED;
 }
 
@@ -204,8 +205,11 @@  static irqreturn_t dpc_irq(int irq, void *context)
 
 	pci_write_config_word(pdev, cap + PCI_EXP_DPC_STATUS,
 			      PCI_EXP_DPC_STATUS_INTERRUPT);
-	if (status & PCI_EXP_DPC_STATUS_TRIGGER)
+	if (status & PCI_EXP_DPC_STATUS_TRIGGER) {
+		if (pdev->subordinate)
+			pdev->subordinate->error_state = pci_channel_io_frozen;
 		return IRQ_WAKE_THREAD;
+	}
 	return IRQ_HANDLED;
 }
 
diff --git a/include/linux/pci.h b/include/linux/pci.h
index e72ca8dd6241..a76e10049b08 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -573,9 +573,15 @@  struct pci_bus {
 	struct device		dev;
 	struct bin_attribute	*legacy_io;	/* Legacy I/O for this bus */
 	struct bin_attribute	*legacy_mem;	/* Legacy mem */
+	pci_channel_state_t error_state;	/* Current connectivity state */
 	unsigned int		is_added:1;
 };
 
+static inline int pci_bus_offline(struct pci_bus *bus)
+{
+	return (bus->error_state != pci_channel_io_normal);
+}
+
 #define to_pci_bus(n)	container_of(n, struct pci_bus, dev)
 
 /*