[14/16] pciehp: Ignore link events during DPC event

Message ID	20180831212639.10196-15-keith.busch@intel.com (mailing list archive)
State	New, archived
Delegated to:	Bjorn Helgaas
Headers	show Return-Path: <linux-pci-owner@kernel.org> From: Keith Busch <keith.busch@intel.com> To: Linux PCI <linux-pci@vger.kernel.org>, Bjorn Helgaas <bhelgaas@google.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>, Sinan Kaya <okaya@kernel.org>, Thomas Tai <thomas.tai@oracle.com>, poza@codeaurora.org, Lukas Wunner <lukas@wunner.de>, Keith Busch <keith.busch@intel.com> Subject: [PATCH 14/16] pciehp: Ignore link events during DPC event Date: Fri, 31 Aug 2018 15:26:37 -0600 Message-Id: <20180831212639.10196-15-keith.busch@intel.com> In-Reply-To: <20180831212639.10196-1-keith.busch@intel.com> References: <20180831212639.10196-1-keith.busch@intel.com> Sender: linux-pci-owner@vger.kernel.org Precedence: bulk
Series	PCI, error handling and hot plug \| expand [00/16] PCI, error handling and hot plug [01/16] PCI: Simplify disconnected marking [02/16] PCI: Fix pci_reset_bus [03/16] PCI/AER: Remove dead code [04/16] PCI/ERR: Use slot reset if available [05/16] PCI/ERR: Handle fatal error recovery [06/16] PCI/ERR: Remove devices on recovery failure [07/16] PCI/ERR: Always use the first downstream port [08/16] PCI/ERR: Simplify broadcast callouts [09/16] PCI/ERR: Report current recovery status for udev [10/16] PCI/portdrv: Provide pci error callbacks [11/16] PCI/portdrv: Restore pci state on slot reset [12/16] PCI/pciehp: Fix powerfault detection order [13/16] PCI/pciehp: Implement error handling callbacks [14/16] pciehp: Ignore link events during DPC event [15/16] PCI/DPC: Wait for reset complete [16/16] PCI: Unify device inaccessible

Keith Busch Aug. 31, 2018, 9:26 p.m. UTC

This patch adds a channel state to a subordinate bus. When a DPC event is
triggered, the DPC driver will set the channel state to frozen, and the
pciehp driver will ignore link events if the subordinate bus is being
managed by DPC error handling.

This is safe because the pciehp and DPC drivers share the same
interrupt. The DPC driver sets the bus state in the top-half interrupt
context, and the pciehp driver checks and masks off link events in its
bottom-half error handler.

Signed-off-by: Keith Busch <keith.busch@intel.com>
---
 drivers/pci/hotplug/pciehp_hpc.c | 9 +++++++++
 drivers/pci/pcie/dpc.c           | 8 ++++++--
 include/linux/pci.h              | 6 ++++++
 3 files changed, 21 insertions(+), 2 deletions(-)

Sinan Kaya Aug. 31, 2018, 10:18 p.m. UTC | #1

On 8/31/2018 2:26 PM, Keith Busch wrote:
> This is safe because the pciehp and DPC drivers share the same
> interrupt. The DPC driver sets the bus state in the top-half interrupt
> context, and the pciehp driver checks and masks off link events in its
> bottom-half error handler.

Where is this coming from? Is there a spec reference?

DPC and HP interrupts can be implemented as MSI-x interrupts and could
be unrelated interrupt IDs?

Keith Busch Aug. 31, 2018, 10:33 p.m. UTC | #2

On Fri, Aug 31, 2018 at 03:18:15PM -0700, Sinan Kaya wrote:
> On 8/31/2018 2:26 PM, Keith Busch wrote:
> > This is safe because the pciehp and DPC drivers share the same
> > interrupt. The DPC driver sets the bus state in the top-half interrupt
> > context, and the pciehp driver checks and masks off link events in its
> > bottom-half error handler.
> 
> Where is this coming from? Is there a spec reference?
> 
> DPC and HP interrupts can be implemented as MSI-x interrupts and could
> be unrelated interrupt IDs?

Darn, you're right. The kernel allocates up to 32 vectors and the port is
free to divvy them up however it wants amont its supported services. It
just so happened most of the ports I tested used the same one. There's
no way to really close this race if they are separate vectors though,
so maybe just leave this as  a 'best effort' approach and update the
change log accodingly.

Sinan Kaya Aug. 31, 2018, 10:55 p.m. UTC | #3

On 8/31/2018 3:33 PM, Keith Busch wrote:
> Darn, you're right. The kernel allocates up to 32 vectors and the port is
> free to divvy them up however it wants amont its supported services. It
> just so happened most of the ports I tested used the same one. There's
> no way to really close this race if they are separate vectors though,
> so maybe just leave this as  a 'best effort' approach and update the
> change log accodingly.

or take the big hammer and move them into a single workqueue?

Keith Busch Aug. 31, 2018, 10:59 p.m. UTC | #4

On Fri, Aug 31, 2018 at 03:55:58PM -0700, Sinan Kaya wrote:
> On 8/31/2018 3:33 PM, Keith Busch wrote:
> > Darn, you're right. The kernel allocates up to 32 vectors and the port is
> > free to divvy them up however it wants amont its supported services. It
> > just so happened most of the ports I tested used the same one. There's
> > no way to really close this race if they are separate vectors though,
> > so maybe just leave this as  a 'best effort' approach and update the
> > change log accodingly.
> 
> or take the big hammer and move them into a single workqueue?

That wouldn't really help if they're different interrupt vectors since
they could happen in either order with a delay between the CPU seeing
each of them.

Sinan Kaya Aug. 31, 2018, 11:07 p.m. UTC | #5

On 8/31/2018 3:59 PM, Keith Busch wrote:
> On Fri, Aug 31, 2018 at 03:55:58PM -0700, Sinan Kaya wrote:
>> On 8/31/2018 3:33 PM, Keith Busch wrote:
>>> Darn, you're right. The kernel allocates up to 32 vectors and the port is
>>> free to divvy them up however it wants amont its supported services. It
>>> just so happened most of the ports I tested used the same one. There's
>>> no way to really close this race if they are separate vectors though,
>>> so maybe just leave this as  a 'best effort' approach and update the
>>> change log accodingly.
>>
>> or take the big hammer and move them into a single workqueue?
> 
> That wouldn't really help if they're different interrupt vectors since
> they could happen in either order with a delay between the CPU seeing
> each of them.
> 

I was trying to make it look like single interrupt in software even though
they are sourced from different interrupt handlers.

But, you are right. Interrupts can arrive in arbitrary order.

Lukas Wunner Sept. 2, 2018, 2:27 p.m. UTC | #6

On Fri, Aug 31, 2018 at 03:26:37PM -0600, Keith Busch wrote:
> This patch adds a channel state to a subordinate bus. When a DPC event is
> triggered, the DPC driver will set the channel state to frozen, and the
> pciehp driver will ignore link events if the subordinate bus is being
> managed by DPC error handling.
> 
> This is safe because the pciehp and DPC drivers share the same
> interrupt. The DPC driver sets the bus state in the top-half interrupt
> context, and the pciehp driver checks and masks off link events in its
> bottom-half error handler.

I really liked Sinan's approach of checking in pciehp whether a fatal
error is pending and waiting for it to be handled:
https://patchwork.ozlabs.org/patch/959464/

This seemed to avoid any races with DPC and is small and simple.
Can we pursue a solution along those lines?

Thanks,

Lukas

Keith Busch Sept. 4, 2018, 2:16 p.m. UTC | #7

On Sun, Sep 02, 2018 at 04:27:14PM +0200, Lukas Wunner wrote:
> On Fri, Aug 31, 2018 at 03:26:37PM -0600, Keith Busch wrote:
> > This patch adds a channel state to a subordinate bus. When a DPC event is
> > triggered, the DPC driver will set the channel state to frozen, and the
> > pciehp driver will ignore link events if the subordinate bus is being
> > managed by DPC error handling.
> > 
> > This is safe because the pciehp and DPC drivers share the same
> > interrupt. The DPC driver sets the bus state in the top-half interrupt
> > context, and the pciehp driver checks and masks off link events in its
> > bottom-half error handler.
> 
> I really liked Sinan's approach of checking in pciehp whether a fatal
> error is pending and waiting for it to be handled:
> https://patchwork.ozlabs.org/patch/959464/
> 
> This seemed to avoid any races with DPC and is small and simple.
> Can we pursue a solution along those lines?

That introduces a completely different race between the error handling
and hotplug threads. We don't control  which interrupt fires first or
any way ensure they're even the same event.

Lukas Wunner Sept. 4, 2018, 2:40 p.m. UTC | #8

On Tue, Sep 04, 2018 at 08:16:02AM -0600, Keith Busch wrote:
> On Sun, Sep 02, 2018 at 04:27:14PM +0200, Lukas Wunner wrote:
> > On Fri, Aug 31, 2018 at 03:26:37PM -0600, Keith Busch wrote:
> > > This patch adds a channel state to a subordinate bus. When a DPC event is
> > > triggered, the DPC driver will set the channel state to frozen, and the
> > > pciehp driver will ignore link events if the subordinate bus is being
> > > managed by DPC error handling.
> > > 
> > > This is safe because the pciehp and DPC drivers share the same
> > > interrupt. The DPC driver sets the bus state in the top-half interrupt
> > > context, and the pciehp driver checks and masks off link events in its
> > > bottom-half error handler.
> > 
> > I really liked Sinan's approach of checking in pciehp whether a fatal
> > error is pending and waiting for it to be handled:
> > https://patchwork.ozlabs.org/patch/959464/
> > 
> > This seemed to avoid any races with DPC and is small and simple.
> > Can we pursue a solution along those lines?
> 
> That introduces a completely different race between the error handling
> and hotplug threads. We don't control  which interrupt fires first or
> any way ensure they're even the same event.

pciehp may react quicker than dpc, hence needs to determine a fatal
error is pending without relying on dpc.  My understanding is that
this is achieved by Sinan checking PCI_EXP_DEVSTA_FED directly from
pciehp.

For the case when dpc reacts quicker and clears the error before
pciehp checks for PCI_EXP_DEVSTA_FED, you need an additional
synchronization mechanism between dpc and pciehp, such as a flag
that is set by dpc before clearing the error, and that is checked
by pciehp.  Though you need to take care that pciehp does not see
a stale flag when the next error occurs.

Thanks,

Lukas

Keith Busch Sept. 4, 2018, 3:31 p.m. UTC | #9

On Tue, Sep 04, 2018 at 04:40:14PM +0200, Lukas Wunner wrote:
> On Tue, Sep 04, 2018 at 08:16:02AM -0600, Keith Busch wrote:
> > On Sun, Sep 02, 2018 at 04:27:14PM +0200, Lukas Wunner wrote:
> > > On Fri, Aug 31, 2018 at 03:26:37PM -0600, Keith Busch wrote:
> > > > This patch adds a channel state to a subordinate bus. When a DPC event is
> > > > triggered, the DPC driver will set the channel state to frozen, and the
> > > > pciehp driver will ignore link events if the subordinate bus is being
> > > > managed by DPC error handling.
> > > > 
> > > > This is safe because the pciehp and DPC drivers share the same
> > > > interrupt. The DPC driver sets the bus state in the top-half interrupt
> > > > context, and the pciehp driver checks and masks off link events in its
> > > > bottom-half error handler.
> > > 
> > > I really liked Sinan's approach of checking in pciehp whether a fatal
> > > error is pending and waiting for it to be handled:
> > > https://patchwork.ozlabs.org/patch/959464/
> > > 
> > > This seemed to avoid any races with DPC and is small and simple.
> > > Can we pursue a solution along those lines?
> > 
> > That introduces a completely different race between the error handling
> > and hotplug threads. We don't control  which interrupt fires first or
> > any way ensure they're even the same event.
> 
> pciehp may react quicker than dpc, hence needs to determine a fatal
> error is pending without relying on dpc.  My understanding is that
> this is achieved by Sinan checking PCI_EXP_DEVSTA_FED directly from
> pciehp.

That's only true if the bridge detects ERR_FATAL, which is one of several
ways to trigger DPC or AER. If the message comes from the end device,
then PCI_EXP_DEVSTA_FED won't be set in the bridge that pciehp can
read.

> For the case when dpc reacts quicker and clears the error before
> pciehp checks for PCI_EXP_DEVSTA_FED, you need an additional
> synchronization mechanism between dpc and pciehp, such as a flag
> that is set by dpc before clearing the error, and that is checked
> by pciehp.  Though you need to take care that pciehp does not see
> a stale flag when the next error occurs.

Yes, the pci_bus error_state this patch creates was intended to be
that flag.

[14/16] pciehp: Ignore link events during DPC event

Commit Message

Comments

Patch