
[V5,3/3] PCI: Mask and unmask hotplug interrupts during reset

Message ID 1530571967-19099-4-git-send-email-okaya@codeaurora.org (mailing list archive)
State New, archived

Commit Message

Sinan Kaya July 2, 2018, 10:52 p.m. UTC
If a bridge supports hotplug and observes a PCIe fatal error, the following
events happen:

1. AER driver removes the devices from the PCI tree on the fatal error.
2. AER driver brings down the link by issuing a secondary bus reset and
waits for the link to come up.
3. Hotplug driver observes a link down interrupt.
4. Hotplug driver tries to remove the devices while waiting for the rescan
lock, but the devices have already been removed by the AER driver, and the
AER driver is waiting for the link to come back up.
5. AER driver tries to re-enumerate devices after polling for the link
state to go up.
6. Hotplug driver obtains the lock and tries to remove the devices again.

If a bridge is hotplug capable, mask its hotplug interrupts before the
reset and unmask them afterwards.

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
 drivers/pci/pcie/err.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

Comments

Sinan Kaya July 3, 2018, 11:30 a.m. UTC | #1
On 2018-07-03 04:34, Lukas Wunner wrote:
> On Mon, Jul 02, 2018 at 06:52:47PM -0400, Sinan Kaya wrote:
>> If a bridge supports hotplug and observes a PCIe fatal error, the 
>> following
>> events happen:
>> 
>> 1. AER driver removes the devices from PCI tree on fatal error
>> 2. AER driver brings down the link by issuing a secondary bus reset 
>> waits
>> for the link to come up.
>> 3. Hotplug driver observes a link down interrupt
>> 4. Hotplug driver tries to remove the devices waiting for the rescan 
>> lock
>> but devices are already removed by the AER driver and AER driver is 
>> waiting
>> for the link to come back up.
>> 5. AER driver tries to re-enumerate devices after polling for the link
>> state to go up.
>> 6. Hotplug driver obtains the lock and tries to remove the devices 
>> again.
>> 
>> If a bridge is a hotplug capable bridge, mask hotplug interrupts 
>> before the
>> reset and unmask afterwards.
> 
> Would it work for you if you just amended the AER driver to skip
> removal and re-enumeration of devices if the port is a hotplug bridge?
> Just check for is_hotplug_bridge in struct pci_dev.

The reason why we want to remove devices before the secondary bus reset
is to quiesce PCIe bus traffic before issuing the reset.

Skipping this step might cause transactions to be lost in the middle of
the reset, as there will be active traffic flowing, and drivers will
suddenly start reading all 0xFFs.

I don't think we can skip this step.
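
For illustration, this is the usual all-ones check a driver could use to
notice that its device has dropped off the bus (a minimal sketch; the
helper name is made up):

/*
 * Sketch only: once the link is down, MMIO reads complete with all 1s.
 * A driver left running across the reset would see this on every read.
 */
static bool device_dropped_off_the_bus(void __iomem *regs)
{
	u32 val = readl(regs);	/* returns 0xffffffff on a dead link */

	return val == ~0U;
}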


> 
> That would seem like a much simpler solution, given that it is known
> that the link will flap on reset, causing the hotplug driver to remove
> and re-enumerate devices.  That would also cover cases where hotplug is
> handled by a different driver than pciehp, or by the platform firmware.
> 
> Thanks,
> 
> Lukas
Oza Pawandeep July 3, 2018, 1:11 p.m. UTC | #2
On 2018-07-03 17:00, okaya@codeaurora.org wrote:
> On 2018-07-03 04:34, Lukas Wunner wrote:
>> On Mon, Jul 02, 2018 at 06:52:47PM -0400, Sinan Kaya wrote:
>>> If a bridge supports hotplug and observes a PCIe fatal error, the 
>>> following
>>> events happen:
>>> 
>>> 1. AER driver removes the devices from PCI tree on fatal error
>>> 2. AER driver brings down the link by issuing a secondary bus reset 
>>> waits
>>> for the link to come up.
>>> 3. Hotplug driver observes a link down interrupt
>>> 4. Hotplug driver tries to remove the devices waiting for the rescan 
>>> lock
>>> but devices are already removed by the AER driver and AER driver is 
>>> waiting
>>> for the link to come back up.
>>> 5. AER driver tries to re-enumerate devices after polling for the 
>>> link
>>> state to go up.
>>> 6. Hotplug driver obtains the lock and tries to remove the devices 
>>> again.
>>> 
>>> If a bridge is a hotplug capable bridge, mask hotplug interrupts 
>>> before the
>>> reset and unmask afterwards.
>> 
>> Would it work for you if you just amended the AER driver to skip
>> removal and re-enumeration of devices if the port is a hotplug bridge?
>> Just check for is_hotplug_bridge in struct pci_dev.
> 
> The reason why we want to remove devices before secondary bus reset is
> to quiesce pcie bus traffic before issuing a reset.
> 
> Skipping this step might cause transactions to be lost in the middle
> of the reset as there will be active traffic flowing and drivers will
> suddenly start reading ffs.
> 
> I don't think we can skip this step.
> 

What if we only do conditional enumeration (leaving the device removal
followed by SBR as is)?

The following code does a little more work than our normal ERR_FATAL
path.

pciehp_unconfigure_device() does a little more than de-enumeration in
order to quiesce the bus:

		/*
		 * Ensure that no new Requests will be generated from
		 * the device.
		 */
		if (presence) {
			pci_read_config_word(dev, PCI_COMMAND, &command);
			command &= ~(PCI_COMMAND_MASTER | PCI_COMMAND_SERR);
			command |= PCI_COMMAND_INTX_DISABLE;
			pci_write_config_word(dev, PCI_COMMAND, command);
		}



> 
>> 
>> That would seem like a much simpler solution, given that it is known
>> that the link will flap on reset, causing the hotplug driver to remove
>> and re-enumerate devices.  That would also cover cases where hotplug 
>> is
>> handled by a different driver than pciehp, or by the platform 
>> firmware.
>> 
>> Thanks,
>> 
>> Lukas
Sinan Kaya July 3, 2018, 1:25 p.m. UTC | #3
> 
> what if we only have conditional enumeration ?  (leaving removing devices followed by SBR as is) ?
> 

If we leave it as is, the hotplug driver observes a link down interrupt as
soon as the secondary bus reset is issued. The hotplug driver will try
device removal while the AER driver is working on the currently observed
FATAL error, causing a race condition. The hotplug driver waits for the
rescan lock to be obtained.

> following code is doing little more extra work than our normal ERR_FATAL path.
> 

See this comment and the pseudo code below.

/*
 * pciehp has a 1:1 bus:slot relationship so we ultimately want a secondary
 * bus reset of the bridge, but at the same time we want to ensure that it is
 * not seen as a hot-unplug, followed by the hot-plug of the device. Thus,
 * disable link state notification and presence detection change notification
 * momentarily, if we see that they could interfere. Also, clear any spurious
 * events after.
 */
int pciehp_reset_slot(struct slot *slot, int probe)
{
	<mask hotplug interrupts>
	<issue secondary bus reset>
	<unmask hotplug interrupts>
}
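
For concreteness, here is a minimal sketch of what that pseudo-code could
expand to, written against the generic pcie_capability_*() accessors (the
real pciehp goes through its internal pcie_write_cmd() helpers, so treat
this as an approximation, not the actual implementation):

/*
 * Sketch: mask Data Link Layer State Changed and Presence Detect
 * Changed notifications around the reset, then clear the events the
 * reset latched so they are not delivered once re-enabled.
 */
static void pciehp_reset_slot_sketch(struct pci_dev *bridge)
{
	u16 mask = PCI_EXP_SLTCTL_DLLSCE | PCI_EXP_SLTCTL_PDCE;
	u16 events = PCI_EXP_SLTSTA_DLLSC | PCI_EXP_SLTSTA_PDC;

	/* <mask hotplug interrupts> */
	pcie_capability_clear_word(bridge, PCI_EXP_SLTCTL, mask);

	/* <issue secondary bus reset> */
	pci_reset_bridge_secondary_bus(bridge);

	/* clear the spurious link/presence events caused by the reset */
	pcie_capability_write_word(bridge, PCI_EXP_SLTSTA, events);

	/* <unmask hotplug interrupts> */
	pcie_capability_set_word(bridge, PCI_EXP_SLTCTL, mask);
}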

This patch is trying to mask the interrupt before the secondary bus reset
in the AER path, not after.

Otherwise, the hotplug driver will remove the devices as soon as it obtains
the rescan lock following AER recovery, since hotplug interrupts will be
pending.

> pciehp_unconfigure_device doing little more than enumeration to quiescence the bus.
> 
> /*
>          * Ensure that no new Requests will be generated from
>          * the device.
>          */
>         if (presence) {
>             pci_read_config_word(dev, PCI_COMMAND, &command);
>             command &= ~(PCI_COMMAND_MASTER | PCI_COMMAND_SERR);
>             command |= PCI_COMMAND_INTX_DISABLE;
>             pci_write_config_word(dev, PCI_COMMAND, command);
>         }
> 
> 
> 
>>
>>>
>>> That would seem like a much simpler solution, given that it is known
>>> that the link will flap on reset, causing the hotplug driver to remove
>>> and re-enumerate devices.  That would also cover cases where hotplug is
>>> handled by a different driver than pciehp, or by the platform firmware.
>>>
>>> Thanks,
>>>
>>> Lukas
>
Sinan Kaya July 3, 2018, 1:31 p.m. UTC | #4
On 7/3/2018 9:25 AM, Sinan Kaya wrote:
>> what if we only have conditional enumeration ?  (leaving removing devices followed by SBR as is) ?
>>
Sorry, I think I didn't quite get your question. 

Are you asking about the enumeration following link up while keeping the
current code as is?

The issue is observing a hotplug link down event in the middle of AER
recovery, as in my previous reply.

If we mask hotplug interrupts before the secondary bus reset via my patch,
then the hotplug driver will observe neither the link up nor the link down
interrupt.

If we don't mask hotplug interrupts, we have a race condition.

I hope I answered your question.
Oza Pawandeep July 3, 2018, 2:10 p.m. UTC | #5
On 2018-07-03 19:29, Lukas Wunner wrote:
> On Tue, Jul 03, 2018 at 09:31:24AM -0400, Sinan Kaya wrote:
>> Issue is observing hotplug link down event in the middle of AER 
>> recovery
>> as in my previous reply.
>> 
>> If we mask hotplug interrupts before secondary bus reset via my patch,
>> then hotplug driver will not observe both link up and link down 
>> interrupts.
>> 
>> If we don't mask hotplug interrupts, we have a race condition.
> 
> I assume that a bus reset not only causes a link and presence event but
> also clears the Presence Detect State bit in the Slot Status register
> and the Data Link Layer Link Active bit in the Link Status register
> momentarily.
> 
> pciehp may access those two bits concurrently to the AER driver
> performing a slot reset.  So it may not be sufficient to mask
> the interrupt.

I was just wondering: you are protecting the Presence Detect State bit
with reset_lock, mainly in pciehp_ist(), but with hotplug interrupts
disabled, is there another way that the hotplug code gets activated?

> 
> I've posted this patch to address the issue:
> https://patchwork.ozlabs.org/patch/930391/
> 
> Thanks,
> 
> Lukas
Oza Pawandeep July 3, 2018, 2:29 p.m. UTC | #6
On 2018-07-03 19:42, Lukas Wunner wrote:
> On Tue, Jul 03, 2018 at 07:30:28AM -0400, okaya@codeaurora.org wrote:
>> On 2018-07-03 04:34, Lukas Wunner wrote:
>> >On Mon, Jul 02, 2018 at 06:52:47PM -0400, Sinan Kaya wrote:
>> >>If a bridge supports hotplug and observes a PCIe fatal error, the
>> >>following
>> >>events happen:
>> >>
>> >>1. AER driver removes the devices from PCI tree on fatal error
>> >>2. AER driver brings down the link by issuing a secondary bus reset
>> >>waits
>> >>for the link to come up.
>> >>3. Hotplug driver observes a link down interrupt
>> >>4. Hotplug driver tries to remove the devices waiting for the rescan
>> >>lock
>> >>but devices are already removed by the AER driver and AER driver is
>> >>waiting
>> >>for the link to come back up.
>> >>5. AER driver tries to re-enumerate devices after polling for the link
>> >>state to go up.
>> >>6. Hotplug driver obtains the lock and tries to remove the devices
>> >>again.
>> >>
>> >>If a bridge is a hotplug capable bridge, mask hotplug interrupts before
>> >>the
>> >>reset and unmask afterwards.
>> >
>> >Would it work for you if you just amended the AER driver to skip
>> >removal and re-enumeration of devices if the port is a hotplug bridge?
>> >Just check for is_hotplug_bridge in struct pci_dev.
>> 
>> The reason why we want to remove devices before secondary bus reset is 
>> to
>> quiesce pcie bus traffic before issuing a reset.
>> 
>> Skipping this step might cause transactions to be lost in the middle 
>> of the
>> reset as there will be active traffic flowing and drivers will 
>> suddenly
>> start reading ffs.
> 
> Interesting, I think that merits a code comment.
> 
> FWIW, macOS has a "PCI pause" callback to quiesce a device:
> https://opensource.apple.com/source/IOPCIFamily/IOPCIFamily-239.1.2/pause.rtf
> 
> They're using it to reconfigure a device's BAR and bus number
> at runtime (sic!), e.g. if mmio windows need to be moved around
> on Thunderbolt hotplug if there's insufficient space:
> 
> 	"During pause reconfiguration, the following may be changed:
> 	 - device BAR registers
> 	 - the devices bus number
> 	 - registry properties reflecting these values ("ranges",
> 	   "assigned-addresses", "reg")
> 	 - device MSI block values for address and value, but not the
> 	   number of MSIs allocated"
> 
> Conceptually, "PCI pause" is similar to putting the device in a suspend
> state.  I'm wondering if suspending the devices below the bridge would
> make more sense than removing them in the AER driver.
> 

The code is shared not only by AER but also by DPC, where the devices are
removed if a DPC event happens.
Also, if the bridge is hotplug capable, then the devices beneath might
have changed, and resume might break.

> Lukas
Oza Pawandeep July 3, 2018, 3:12 p.m. UTC | #7
On 2018-07-03 20:04, Lukas Wunner wrote:
> On Mon, Jul 02, 2018 at 06:52:47PM -0400, Sinan Kaya wrote:
>> @@ -308,8 +310,17 @@ void pcie_do_fatal_recovery(struct pci_dev *dev, 
>> u32 service)
>>  		pci_dev_put(pdev);
>>  	}
>> 
>> +	hpsvc = pcie_port_find_service(udev, PCIE_PORT_SERVICE_HP);
>> +	hpdev = pcie_port_find_device(udev, PCIE_PORT_SERVICE_HP);
>> +
>> +	if (hpdev && hpsvc)
>> +		hpsvc->mask_irq(to_pcie_device(hpdev));
>> +
>>  	result = reset_link(udev, service);
>> 
>> +	if (hpdev && hpsvc)
>> +		hpsvc->unmask_irq(to_pcie_device(hpdev));
>> +
> 
> We've already got the ->reset_slot callback in struct hotplug_slot_ops,
> I'm wondering if we really need additional ones for this use case.
> 
> If you just do...
> 
> 	if (!pci_probe_reset_slot(dev->slot))
> 		pci_reset_slot(dev->slot)
> 	else {
> 		/* regular code path */
> 	}
> 
> would that not be equivalent?
> 

pcie_do_fatal_recovery() calls
	reset_link()
		which calls dpc_reset_link() or aer_root_reset() depending
		on the event.

dpc_reset_link() and aer_root_reset() do their own cleanup. In fact,
aer_root_reset() is the only function which actually triggers
pci_reset_bridge_secondary_bus().

So if we try to fit in your suggestion:

if (!pci_probe_reset_slot(dev->slot)) {
	pci_reset_slot(dev->slot);
	/* in this case aer_root_reset() must not call
	 * pci_reset_bridge_secondary_bus() */
	result = reset_link(udev, service);
} else {
	/* in this case aer_root_reset() must call
	 * pci_reset_bridge_secondary_bus(), since the bridge is not
	 * hotplug capable */
	result = reset_link(udev, service);
}

Did I get your suggestion right ?

Regards,
Oza.

> (It's worth noting though that pciehp is the only hotplug driver
> implementing the ->reset_slot callback.  If hotplug is handled by
> a different driver or by the platform firmware, devices may still
> be removed and re-enumerated twice.)
> 

> Thanks,
> 
> Lukas
Sinan Kaya July 3, 2018, 3:34 p.m. UTC | #8
On 7/3/2018 9:59 AM, Lukas Wunner wrote:
> On Tue, Jul 03, 2018 at 09:31:24AM -0400, Sinan Kaya wrote:
>> Issue is observing hotplug link down event in the middle of AER recovery
>> as in my previous reply.
>>
>> If we mask hotplug interrupts before secondary bus reset via my patch,
>> then hotplug driver will not observe both link up and link down interrupts.
>>
>> If we don't mask hotplug interrupts, we have a race condition.
> 
> I assume that a bus reset not only causes a link and presence event but
> also clears the Presence Detect State bit in the Slot Status register
> and the Data Link Layer Link Active bit in the Link Status register
> momentarily.
> 
> pciehp may access those two bits concurrently to the AER driver
> performing a slot reset.  So it may not be sufficient to mask
> the interrupt.
> 
> I've posted this patch to address the issue:
> https://patchwork.ozlabs.org/patch/930391/

Very interesting!

I missed this completely.

I know for a fact that a bus reset clears the Data Link Layer Active bit
as soon as the link goes down. It gets set again following link up.

Presence detect depends on the HW implementation.

QDT root ports, for instance, don't change presence detect, since nobody
actually removed the card.

If an implementation supports in-band presence detect, the answer is yes:
as soon as the link goes down, the presence detect bit will get cleared
until recovery.

It sounds like we need to combine your lock change with my proposal:

lock() in mask_irq() and unlock() in unmask_irq()
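
A minimal sketch of that combination (ctrl->reset_lock is from Lukas's
not-yet-merged pciehp series, so the field name is an assumption):

/*
 * Sketch: hold pciehp's reset_lock across the masked window so that
 * pciehp_ist() cannot sample PDS/DLLLA while AER/DPC resets the link.
 */
static void pciehp_mask_irq(struct pcie_device *dev)
{
	struct controller *ctrl = get_service_data(dev);

	down_write(&ctrl->reset_lock);
	pcie_capability_clear_word(dev->port, PCI_EXP_SLTCTL,
				   PCI_EXP_SLTCTL_DLLSCE | PCI_EXP_SLTCTL_PDCE);
}

static void pciehp_unmask_irq(struct pcie_device *dev)
{
	struct controller *ctrl = get_service_data(dev);

	/* clear events that latched while masked, then re-enable */
	pcie_capability_write_word(dev->port, PCI_EXP_SLTSTA,
				   PCI_EXP_SLTSTA_DLLSC | PCI_EXP_SLTSTA_PDC);
	pcie_capability_set_word(dev->port, PCI_EXP_SLTCTL,
				 PCI_EXP_SLTCTL_DLLSCE | PCI_EXP_SLTCTL_PDCE);
	up_write(&ctrl->reset_lock);
}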

> 
> Thanks,
> 
> Lukas
>
Sinan Kaya July 3, 2018, 3:43 p.m. UTC | #9
On 7/3/2018 10:34 AM, Lukas Wunner wrote:
> On Mon, Jul 02, 2018 at 06:52:47PM -0400, Sinan Kaya wrote:
>> @@ -308,8 +310,17 @@ void pcie_do_fatal_recovery(struct pci_dev *dev, u32 service)
>>  		pci_dev_put(pdev);
>>  	}
>>  
>> +	hpsvc = pcie_port_find_service(udev, PCIE_PORT_SERVICE_HP);
>> +	hpdev = pcie_port_find_device(udev, PCIE_PORT_SERVICE_HP);
>> +
>> +	if (hpdev && hpsvc)
>> +		hpsvc->mask_irq(to_pcie_device(hpdev));
>> +
>>  	result = reset_link(udev, service);
>>  
>> +	if (hpdev && hpsvc)
>> +		hpsvc->unmask_irq(to_pcie_device(hpdev));
>> +
> 
> We've already got the ->reset_slot callback in struct hotplug_slot_ops,
> I'm wondering if we really need additional ones for this use case.
> 
> If you just do...
> 
> 	if (!pci_probe_reset_slot(dev->slot))
> 		pci_reset_slot(dev->slot)
> 	else {
> 		/* regular code path */
> 	}
> 
> would that not be equivalent?

As I mentioned in my previous reply, pdev->slot is only valid for child
objects such as endpoints, not for a bridge, when using pciehp.

The pointer is NULL for the host bridge itself.

I used the reset_slot() callback in v4 of this implementation.

https://patchwork.kernel.org/patch/10494971/

However, as Oza explained, FATAL error handling gets called from two
different paths: AER and DPC.

If the link goes down due to DPC, calling pci_reset_slot() would be a
mistake, as DPC has its own recovery mechanism via the DPC capability
registers. pci_reset_slot() performs a secondary bus reset after masking
the hotplug interrupts.

Issuing a secondary bus reset in response to a DPC event would be a
mistake for recovery.

That's why I extracted the hotplug mask and unmask IRQ calls into the
service layer, so that I can mask hotplug interrupts regardless of whether
the source of the FATAL error is DPC or AER.

If the error source is DPC, recovery still goes to the DPC driver's
reset_link() callback for DPC-specific cleanup.

If the error source is AER, it still goes to the AER driver's reset_link()
callback for the secondary bus reset.

Remember that the AER driver completely bypasses pci_reset_slot() today.
The lock mechanism you are adding will not be useful for the FATAL error
case, where pci_reset_bridge_secondary_bus() is called directly.

pci_reset_slot() only gets called from external drivers such as VFIO to
initiate a reset of the slot if hotplug is supported.

> 
> (It's worth noting though that pciehp is the only hotplug driver
> implementing the ->reset_slot callback.  If hotplug is handled by
> a different driver or by the platform firmware, devices may still
> be removed and re-enumerated twice.)
> 
> Thanks,
> 
> Lukas
>
Sinan Kaya July 3, 2018, 3:49 p.m. UTC | #10
On 7/3/2018 11:12 AM, poza@codeaurora.org wrote:
> if (!pci_probe_reset_slot(dev->slot))
> {
>     pci_reset_slot(dev->slot)
>     result = reset_link(udev, service); >> in this case aer_root_reset must not call pci_reset_bridge_secondary_bus()
> } else
>     result = reset_link(udev, service); >> in this case aer_root_reset must call pci_reset_bridge_secondary_bus() [since bridge is not hotplug capable)

Here are the two different flows for the two different FATAL error sources:

dpc_irq
	link is down due to DPC
	pcie_do_fatal_recovery()
		pci_reset_slot()
			mask hotplug IRQ
			secondary bus reset
			unmask hotplug IRQ
			undefined behavior as link went down due to DPC
		dpc_reset_link()
			undefined behavior secondary bus reset happened
					   while a DPC event is pending
			link may or may not be up at this moment
			recover the link the DPC way, if HW can cope with this undefined behavior.


aer_irq
	link is up
	pcie_do_fatal_recovery()
		pci_reset_slot()
			mask hotplug IRQ
			secondary bus reset
			unmask hotplug IRQ
			link goes up
		aer_reset_link()
			secondary bus reset
			hotplug link down interrupt again

I tried to change pci_reset_slot() such that we do:

mask hotplug IRQ
go to AER/DPC reset_link based on error source
unmask hotplug IRQ
Sinan Kaya July 9, 2018, 2:48 p.m. UTC | #11
On 7/8/18, Lukas Wunner <lukas@wunner.de> wrote:
> On Tue, Jul 03, 2018 at 11:43:26AM -0400, Sinan Kaya wrote:
>> On 7/3/2018 10:34 AM, Lukas Wunner wrote:
>> > We've already got the ->reset_slot callback in struct hotplug_slot_ops,
>> > I'm wondering if we really need additional ones for this use case.
>>
>> As I have informed you before on my previous reply, the pdev->slot is
>> only valid for children objects such as endpoints not for a bridge when
>> using pciehp.
>>
>> The pointer is NULL for the host bridge itself.
>
> Right, sorry, I had misremembered how this works.  So essentially the
> pointer is only set for the devices "in" the slot, but not for the bridge
> "to" that slot.  If the slot isn't occupied, *no* pci_dev points the
> struct pci_slot.  Seems counter-intuitive to be honest.

That is true. There is a bit of history here. Back in PCI days, each PCI
device number would be a hotplug slot, and there could be multiple slots
under a bridge: a one-to-many relationship.

>
> Thanks for the explanation of the various reset codepaths, I'm afraid my
> understanding of that portion of the PCI core is limited.
>
> Browsing through drivers/pci/pci.c, I notice pci_dev_lock() and
> pci_dev_save_and_disable(), both are called from reset codepaths and
> apparently seek to stop access to the device during reset.  I'm wondering
> why DPC and AER remove devices in order to avoid accesses to them during
> reset, instead of utilizing these two functions?

This was the behavior until 4.18. Since 4.18, all devices are removed
and re-enumerated following a fatal error condition.

>
> My guess is that a lot of the reset code is historically grown and
> could be simplified and made more consistent, but that requires digging
> into the code and getting a complete picture.  I've sort of done that
> for pciehp, I think I'm now intimately familiar with 90% of it,
> so I'll be happy to review patches for it and answer questions,
> but I'm pretty much stumped when it comes to reset code in the PCI core.
>
> I treated the ->reset_slot() callback as one possible entry point into
> pciehp and asked myself if it's properly serialized with the rest of the
> driver and whether driver ->probe and ->remove is ordered such that
> the driver is always properly initialized when the entry point might be
> taken.  I did not look too closely at the codepaths actually leading to
> invocation of the ->reset_slot() callback.
>

Sure, this path gets called from vfio today, and possibly via a reset
file in sysfs. A bunch of things need to fail before hitting the slot
reset path.

However, vfio is a direct caller if hotplug is supported.

>
>> I was curious if we could use a single work queue for all pcie portdrv
>> services. This would also eliminate the need for locks that Lukas is
>> adding.
>>
>> If hotplug starts first,  hotplug code would run to completion before AER
>> and DPC service starts recovery.
>>
>> If DPC/AER starts first, my patch would mask the hotplug interrupts.
>>
>> My solution doesn't help if link down interrupt is observed before the
>> AER
>> or DPC services.
>
> If pciehp gets an interrupt quicker than dpc/aer, it will (at least with
> my patches) remove all devices, check if the presence bit is set,

Yup.

> and
> if so, try to bring the slot up again.

The hotplug driver should only observe a link down interrupt. The link
would come up in response to the secondary bus reset initiated by the
AER driver.

Can you point me to the code that would bring up the link in the hotplug
code? Maybe I am missing something.

> My (limited) understanding is
> that the link will stay down until dpc/aer react.
> pciehp_check_link_status()
> will wait 1 sec for the link, wait another 100 msec, then poll the vendor
> register for 1 sec before giving up.  So if dpc/aer are locked out for this
> long, they will only be able to reset the slot after 2100 msec.
>
> I've had a brief look at the PCIe r4.0 base spec and could not find
> anything about how pciehp and dpc/aer should coordinate.  Maybe that's
> an oversight, or the PCISIG just leaves this to the OS.
>
>
>> Another possibility is to add synchronization logic between these threads
>> obviously.
>
> Maybe call pci_channel_offline() in the poll loops of pcie_wait_for_link()
> and pci_bus_check_dev() to avoid waiting for the link if an error needs to
> be acted upon first?

Let me think about this.

>
> Thanks,
>
> Lukas
>
Sinan Kaya July 10, 2018, 6:30 p.m. UTC | #12
On Mon, Jul 9, 2018 at 12:00 PM, Lukas Wunner <lukas@wunner.de> wrote:
>
> On Mon, Jul 09, 2018 at 08:48:44AM -0600, Sinan Kaya wrote:
> > On 7/8/18, Lukas Wunner <lukas@wunner.de> wrote:
> > > On Tue, Jul 03, 2018 at 11:43:26AM -0400, Sinan Kaya wrote:
> > > > My solution doesn't help if link down interrupt is observed before the
> > > > AER
> > > > or DPC services.
> > >
> > > If pciehp gets an interrupt quicker than dpc/aer, it will (at least with
> > > my patches) remove all devices, check if the presence bit is set,
> > > and if so, try to bring the slot up again.
> >
> > Hotplug driver should only observe a link down interrupt. Link would
> > come up in response to a secondary bus reset initiated by the AER
> > driver.
>
> PCIe hotplug doesn't have separate Link Down and Link Up interrupts,
> there is only a Link State *Changed* event.
>
> > Can you point me to the code that would bring up the link in hp code?
>
> I was referring to the situation with my recently posted pciehp patches
> applied, in particular patch [21/32] ("PCI: pciehp: Become resilient to
> missed events"):
> https://patchwork.ozlabs.org/patch/930389/
>
> When I get a presence or link changed event, I turn the slot off.  That
> includes removing all devices in the slot.  Because even if the slot is
> still occupied or link is up, there was definitely a change and the safe
> behavior is to assume that the card in the slot is now a different one
> than before.
>

We do have a bit of a mess, unfortunately. The error handling and hotplug
drivers do not play nicely with each other.

When the hotplug driver observes a link down, we are not checking whether
the link down happened because the user really wanted to remove a card or
whether it originated from an error handling service such as AER/DPC.

I'm thinking that we could potentially check if a hotplug event is pending
at the entrance of fatal error handling. If it is pending, we could poll
until the status bit clears. That should flush the link down event, as in
the sketch below.

Even then, the hotplug link down indication seems to turn off the slot
power and LED.

If the AER/DPC service runs after the hotplug driver, the link won't come
back up, as the power to the slot has been turned off.

I'd like to hear Bjorn's opinion before we throw something else at this
problem.

> Afterwards, I check if the slot is occupied or link is up.  If either
> of those conditions is true, I try to bring the slot up again.
>
> Thanks,
>
> Lukas
Bjorn Helgaas July 20, 2018, 8:01 p.m. UTC | #13
On Tue, Jul 10, 2018 at 02:30:11PM -0400, Sinan Kaya wrote:
> On Mon, Jul 9, 2018 at 12:00 PM, Lukas Wunner <lukas@wunner.de> wrote:
> > On Mon, Jul 09, 2018 at 08:48:44AM -0600, Sinan Kaya wrote:
> > > On 7/8/18, Lukas Wunner <lukas@wunner.de> wrote:
> > > > On Tue, Jul 03, 2018 at 11:43:26AM -0400, Sinan Kaya wrote:
> > > > > My solution doesn't help if link down interrupt is observed
> > > > > before the AER or DPC services.
> > > >
> > > > If pciehp gets an interrupt quicker than dpc/aer, it will (at
> > > > least with my patches) remove all devices, check if the
> > > > presence bit is set, and if so, try to bring the slot up
> > > > again.
> > >
> > > Hotplug driver should only observe a link down interrupt. Link
> > > would come up in response to a secondary bus reset initiated by
> > > the AER driver.
> >
> > PCIe hotplug doesn't have separate Link Down and Link Up
> > interrupts, there is only a Link State *Changed* event.
> >
> > > Can you point me to the code that would bring up the link in hp
> > > code?
> >
> > I was referring to the situation with my recently posted pciehp
> > patches applied, in particular patch [21/32] ("PCI: pciehp: Become
> > resilient to missed events"):
> > https://patchwork.ozlabs.org/patch/930389/
> >
> > When I get a presence or link changed event, I turn the slot off.
> > That includes removing all devices in the slot.  Because even if
> > the slot is still occupied or link is up, there was definitely a
> > change and the safe behavior is to assume that the card in the
> > slot is now a different one than before.
> 
> We do have a bit of mess unfortunately. Error handling and hotplug
> drivers do not play nicely with each other.
> 
> When hotplug driver observes a link down, we are not checking if the
> link down happened because user really wanted to remove a card or if
> it was because it was originated by an error handling service such
> as AER/DPC.
> 
> I'm thinking that we could potentially check if a hotplug event is
> pending at the entrance of fatal error handling. If it is pending,
> we could poll until the status bit clears. That should flush the
> link down event.
> 
> Even then, link down indication of hotplug seem to turn off slot
> power and LED.
> 
> If AER/DPC service runs after the hotplug driver, link won't come
> back up as the power to the slot is turned off.
> 
> I'd like to hear about Bjorn's opinion before we throw something
> else into this problem.

You guys know way more about this than I do.

I think the separation of AER/DPC/pciehp into separate drivers is
somewhat artificial because there are many interdependencies.  The
driver model doesn't apply very well because there's only one
underlying piece of hardware, which forces us to use the portdrv as
sort of a multiplexer.  The fact that portdrv claims these bridges
also means normal drivers (e.g., for performance counters) can't use
the usual model.

All that is to say that if integrating these services more tightly
would help solve this problem, I'd be open to that.

Bjorn
Sinan Kaya July 21, 2018, 2:58 a.m. UTC | #14
On 7/20/2018 1:01 PM, Bjorn Helgaas wrote:
> On Tue, Jul 10, 2018 at 02:30:11PM -0400, Sinan Kaya wrote:
>> On Mon, Jul 9, 2018 at 12:00 PM, Lukas Wunner <lukas@wunner.de> wrote:
>>> On Mon, Jul 09, 2018 at 08:48:44AM -0600, Sinan Kaya wrote:
>>>> On 7/8/18, Lukas Wunner <lukas@wunner.de> wrote:
>>>>> On Tue, Jul 03, 2018 at 11:43:26AM -0400, Sinan Kaya wrote:
>>>>>> My solution doesn't help if link down interrupt is observed
>>>>>> before the AER or DPC services.
>>>>>
>>>>> If pciehp gets an interrupt quicker than dpc/aer, it will (at
>>>>> least with my patches) remove all devices, check if the
>>>>> presence bit is set, and if so, try to bring the slot up
>>>>> again.
>>>>
>>>> Hotplug driver should only observe a link down interrupt. Link
>>>> would come up in response to a secondary bus reset initiated by
>>>> the AER driver.
>>>
>>> PCIe hotplug doesn't have separate Link Down and Link Up
>>> interrupts, there is only a Link State *Changed* event.
>>>
>>>> Can you point me to the code that would bring up the link in hp
>>>> code?
>>>
>>> I was referring to the situation with my recently posted pciehp
>>> patches applied, in particular patch [21/32] ("PCI: pciehp: Become
>>> resilient to missed events"):
>>> https://patchwork.ozlabs.org/patch/930389/
>>>
>>> When I get a presence or link changed event, I turn the slot off.
>>> That includes removing all devices in the slot.  Because even if
>>> the slot is still occupied or link is up, there was definitely a
>>> change and the safe behavior is to assume that the card in the
>>> slot is now a different one than before.
>>
>> We do have a bit of mess unfortunately. Error handling and hotplug
>> drivers do not play nicely with each other.
>>
>> When hotplug driver observes a link down, we are not checking if the
>> link down happened because user really wanted to remove a card or if
>> it was because it was originated by an error handling service such
>> as AER/DPC.
>>
>> I'm thinking that we could potentially check if a hotplug event is
>> pending at the entrance of fatal error handling. If it is pending,
>> we could poll until the status bit clears. That should flush the
>> link down event.
>>
>> Even then, link down indication of hotplug seem to turn off slot
>> power and LED.
>>
>> If AER/DPC service runs after the hotplug driver, link won't come
>> back up as the power to the slot is turned off.
>>
>> I'd like to hear about Bjorn's opinion before we throw something
>> else into this problem.
> 
> You guys know way more about this than I do.
> 
> I think the separation of AER/DPC/pciehp into separate drivers is
> somewhat artificial because there are many interdependencies.  The
> driver model doesn't apply very well because there's only one
> underlying piece of hardware, which forces us to use the portdrv as
> sort of a multiplexer.  The fact that portdrv claims these bridges
> also means normal drivers (e.g., for performance counters) can't use
> the usual model.
> 
> All that is to say that if integrating these services more tightly
> would help solve this problem, I'd be open to that.

I was looking into how to get rid of portdrv for a while. It is a much
bigger task, to be honest. There are multiple levels of abstraction in
the code, as you highlighted.

My patch solves the problem if the AER interrupt happens before the
hotplug interrupt. We are masking the data link layer state changed
interrupt, so AER/DPC can perform their link operations without racing
with the hotplug driver.

We need to figure out how to gracefully return inside the hotplug driver
if a link down happened and there is an error pending.

My first question is why the hotplug driver reacts to the link event at
all if there was no actual device insertion/removal.

Would it help to keep track of presence detect changed interrupts since
the last link event?

If the counter is 0 and the device is present, the hotplug driver bails
out silently, for example, as in the sketch below.
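
A sketch of that idea (pdc_since_link_event is a hypothetical counter,
bumped by the ISR on each Presence Detect Changed event and reset on
link events):

/*
 * Sketch: if no presence change was seen since the last link event and
 * the card is still present, treat the link-down as recovery noise.
 */
static bool link_event_is_spurious(struct controller *ctrl)
{
	u16 sltsta;

	pcie_capability_read_word(ctrl_dev(ctrl), PCI_EXP_SLTSTA, &sltsta);

	return ctrl->pdc_since_link_event == 0 &&
	       (sltsta & PCI_EXP_SLTSTA_PDS);
}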

> 
> Bjorn
>
Sinan Kaya July 21, 2018, 6:07 a.m. UTC | #15
On 7/20/2018 7:58 PM, Sinan Kaya wrote:
> We need to figure out how to gracefully return inside the hotplug driver
> if a link down happened and there is an error pending.

How about adding the following into the hotplug ISR?

1. Check if firmware-first is disabled.
2. Check if there is a fatal error pending in the Device Status register
of the PCI Express capability on the root port.
3. Bail out from the hotplug routine if this is the case.
4. Otherwise, keep the existing behavior.
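
A minimal sketch of steps 1-3 (assuming native error handling can be
tested with pcie_aer_get_firmware_first(); if that helper is not visible
from pciehp, an equivalent test would be needed):

/*
 * Sketch: bail out of the hotplug ISR while a fatal error is pending
 * on the root port and the OS owns error recovery.
 */
static bool fatal_error_pending(struct pci_dev *port)
{
	u16 devsta;

	if (pcie_aer_get_firmware_first(port))
		return false;	/* firmware-first: OS does not own recovery */

	pcie_capability_read_word(port, PCI_EXP_DEVSTA, &devsta);
	return devsta & PCI_EXP_DEVSTA_FED;	/* Fatal Error Detected */
}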
Oza Pawandeep July 25, 2018, 8:29 a.m. UTC | #16
On 2018-07-21 11:37, Sinan Kaya wrote:
> On 7/20/2018 7:58 PM, Sinan Kaya wrote:
>> We need to figure out how to gracefully return inside the hotplug driver
>> if a link down happened and there is an error pending.
> 
> How about adding the following into the hotplug ISR?
> 
> 1. check if firmware first is disabled
> 2. check if there is a fatal error pending in the device_status 
> register
> of the PCI Express capability on the root port.
> 3. bail out from hotplug routine if this is the case.
> 4. otherwise, existing behavior.

This makes sense.

From Lukas's text:

"
The user may turn the slot on/off via sysfs.  If an Attention Button
is present, the user may also press that button to turn the slot on/off
after 5 seconds.  Either way, it may cause pciehp's IRQ thread to run
concurrently to a reset initiated by the AER driver, independently of
any events signalled by the slot.
"

So if the device gets removed and re-enumerated by a path other than the
hotplug ISR, that path (or any other similar path) also has to take care
of checking the ERR_FATAL status.

Regards,
Oza.

Patch

diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
index ae72f88..5a2d410 100644
--- a/drivers/pci/pcie/err.c
+++ b/drivers/pci/pcie/err.c
@@ -285,10 +285,12 @@  static pci_ers_result_t broadcast_error_message(struct pci_dev *dev,
  */
 void pcie_do_fatal_recovery(struct pci_dev *dev, u32 service)
 {
+	struct pcie_port_service_driver *hpsvc;
 	struct pci_dev *udev;
 	struct pci_bus *parent;
 	struct pci_dev *pdev, *temp;
 	pci_ers_result_t result;
+	struct device *hpdev;
 
 	if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE)
 		udev = dev;
@@ -308,8 +310,17 @@  void pcie_do_fatal_recovery(struct pci_dev *dev, u32 service)
 		pci_dev_put(pdev);
 	}
 
+	hpsvc = pcie_port_find_service(udev, PCIE_PORT_SERVICE_HP);
+	hpdev = pcie_port_find_device(udev, PCIE_PORT_SERVICE_HP);
+
+	if (hpdev && hpsvc)
+		hpsvc->mask_irq(to_pcie_device(hpdev));
+
 	result = reset_link(udev, service);
 
+	if (hpdev && hpsvc)
+		hpsvc->unmask_irq(to_pcie_device(hpdev));
+
 	if ((service == PCIE_PORT_SERVICE_AER) &&
 	    (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE)) {
 		/*