Message ID | 20200521173134.2456773-2-Jonathan.Cameron@huawei.com (mailing list archive) |
---|---|
State | Superseded, archived |
Delegated to: | Bjorn Helgaas |
Series | PCI/AER: handling for RCiEPs |
[+cc Sathy]

On Fri, May 22, 2020 at 01:31:33AM +0800, Jonathan Cameron wrote:
> pci_aer_clear_device_status() currently resets the device status even when
> firmware first handling is going on. In particular it resets it on the
> root port.
>
> This has been discussed previously
> https://lore.kernel.org/patchwork/patch/427375/.

I don't think this reference is really pertinent, is it? That patch
to b2c8881da764 changes pci_cleanup_aer_uncorrect_error_status() so it
doesn't clear PCI_ERR_UNCOR_STATUS in "firmware-first" mode.

But your patch only affects PCI_EXP_DEVSTA, not PCI_ERR_UNCOR_STATUS.

> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> ---
>  drivers/pci/pcie/aer.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index f4274d301235..43e78b97ace6 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -373,6 +373,9 @@ void pci_aer_clear_device_status(struct pci_dev *dev)
>  {
>  	u16 sta;
>
> +	if (pcie_aer_get_firmware_first(dev))
> +		return;

This needs to be adjusted because pcie_aer_get_firmware_first() no
longer exists after 708b20003624 ("PCI/AER: Remove HEST/FIRMWARE_FIRST
parsing for AER ownership").

This will use the _OSC AER ownership bit to gate clearing of the
status bits in the PCIe capability (not the AER capability).

I think that's the right thing to do, but it's certainly not obvious
from the _OSC description in the PCI Firmware Spec r3.2. I think we
need a pointer to the ECN that clarifies this, i.e., sec 4.5.1 of:

  System Firmware Intermediary (SFI) _OSC and DPC Updates ECN, Feb 24,
  2020, affecting PCI Firmware Specification, Rev. 3.2
  https://members.pcisig.com/wg/PCI-SIG/document/14076

>  	pcie_capability_read_word(dev, PCI_EXP_DEVSTA, &sta);
>  	pcie_capability_write_word(dev, PCI_EXP_DEVSTA, sta);
>  }
> --
> 2.19.1
>
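Editorial note on the hunk quoted above: reading PCI_EXP_DEVSTA and writing the same value back clears the register because the error bits of Device Status are write-1-to-clear (RW1C). The following is a minimal user-space sketch of that behaviour, not kernel code. `struct mock_devsta` and the `mock_*` helpers are hypothetical stand-ins invented for illustration; only the bit values are taken from the kernel's pci_regs.h.

```c
#include <stdint.h>

/* Error bits of the PCIe Device Status register. The values match the
 * kernel's include/uapi/linux/pci_regs.h; all four bits are RW1C. */
#define PCI_EXP_DEVSTA_CED   0x0001  /* Correctable Error Detected */
#define PCI_EXP_DEVSTA_NFED  0x0002  /* Non-Fatal Error Detected */
#define PCI_EXP_DEVSTA_FED   0x0004  /* Fatal Error Detected */
#define PCI_EXP_DEVSTA_URD   0x0008  /* Unsupported Request Detected */

#define DEVSTA_RW1C_MASK (PCI_EXP_DEVSTA_CED | PCI_EXP_DEVSTA_NFED | \
                          PCI_EXP_DEVSTA_FED | PCI_EXP_DEVSTA_URD)

/* Hypothetical stand-in for one device's Device Status register. */
struct mock_devsta {
    uint16_t value;
};

static uint16_t mock_devsta_read(const struct mock_devsta *reg)
{
    return reg->value;
}

/* RW1C write: each 1 written to an error bit clears it; 0 bits and the
 * read-only bits outside the mask are left untouched. */
static void mock_devsta_write(struct mock_devsta *reg, uint16_t val)
{
    reg->value &= (uint16_t)~(val & DEVSTA_RW1C_MASK);
}

/* Same read / write-back pattern as the quoted function body. */
static void mock_clear_device_status(struct mock_devsta *reg)
{
    uint16_t sta = mock_devsta_read(reg);

    mock_devsta_write(reg, sta);
}
```

Calling `mock_clear_device_status()` on a register holding CED and FED leaves both cleared, which is why the write-back erases state that firmware may still want to inspect when it owns error handling.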
Hi Jonathan,

On 6/16/20 10:47 AM, Bjorn Helgaas wrote:
> [+cc Sathy]
>
> On Fri, May 22, 2020 at 01:31:33AM +0800, Jonathan Cameron wrote:
>> pci_aer_clear_device_status() currently resets the device status even when
>> firmware first handling is going on. In particular it resets it on the
>> root port.
>>
>> This has been discussed previously
>> https://lore.kernel.org/patchwork/patch/427375/.
pci_aer_clear_device_status() is only used by handle_error_source(). And
I don't think handle_error_source() is called in FF mode. Can you
give more details on this issue ?
>
> I don't think this reference is really pertinent, is it? That patch
> to b2c8881da764 changes pci_cleanup_aer_uncorrect_error_status() so it
> doesn't clear PCI_ERR_UNCOR_STATUS in "firmware-first" mode.
>
> But your patch only affects PCI_EXP_DEVSTA, not PCI_ERR_UNCOR_STATUS.
>
>> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>> ---
>>  drivers/pci/pcie/aer.c | 3 +++
>>  1 file changed, 3 insertions(+)
>>
>> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
>> index f4274d301235..43e78b97ace6 100644
>> --- a/drivers/pci/pcie/aer.c
>> +++ b/drivers/pci/pcie/aer.c
>> @@ -373,6 +373,9 @@ void pci_aer_clear_device_status(struct pci_dev *dev)
>>  {
>>  	u16 sta;
>>
>> +	if (pcie_aer_get_firmware_first(dev))
>> +		return;
>
> This needs to be adjusted because pcie_aer_get_firmware_first() no
> longer exists after 708b20003624 ("PCI/AER: Remove HEST/FIRMWARE_FIRST
> parsing for AER ownership").
>
> This will use the _OSC AER ownership bit to gate clearing of the
> status bits in the PCIe capability (not the AER capability).
>
> I think that's the right thing to do, but it's certainly not obvious
> from the _OSC description in the PCI Firmware Spec r3.2. I think we
> need a pointer to the ECN that clarifies this, i.e., sec 4.5.1 of:
>
>   System Firmware Intermediary (SFI) _OSC and DPC Updates ECN, Feb 24,
>   2020, affecting PCI Firmware Specification, Rev. 3.2
>   https://members.pcisig.com/wg/PCI-SIG/document/14076
>
>>  	pcie_capability_read_word(dev, PCI_EXP_DEVSTA, &sta);
>>  	pcie_capability_write_word(dev, PCI_EXP_DEVSTA, sta);
>>  }
>> --
>> 2.19.1
>>
On Tue, 16 Jun 2020 12:47:31 -0500
Bjorn Helgaas <helgaas@kernel.org> wrote:

> [+cc Sathy]
>
> On Fri, May 22, 2020 at 01:31:33AM +0800, Jonathan Cameron wrote:
> > pci_aer_clear_device_status() currently resets the device status even when
> > firmware first handling is going on. In particular it resets it on the
> > root port.
> >
> > This has been discussed previously
> > https://lore.kernel.org/patchwork/patch/427375/.
>
> I don't think this reference is really pertinent, is it? That patch
> to b2c8881da764 changes pci_cleanup_aer_uncorrect_error_status() so it
> doesn't clear PCI_ERR_UNCOR_STATUS in "firmware-first" mode.
>
> But your patch only affects PCI_EXP_DEVSTA, not PCI_ERR_UNCOR_STATUS.

I'll be honest I've mostly forgotten my reasoning behind including that
reference. Might have been as simple as I got lost in the renames.

I'll drop the reference.

>
> > Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > ---
> >  drivers/pci/pcie/aer.c | 3 +++
> >  1 file changed, 3 insertions(+)
> >
> > diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> > index f4274d301235..43e78b97ace6 100644
> > --- a/drivers/pci/pcie/aer.c
> > +++ b/drivers/pci/pcie/aer.c
> > @@ -373,6 +373,9 @@ void pci_aer_clear_device_status(struct pci_dev *dev)
> >  {
> >  	u16 sta;
> >
> > +	if (pcie_aer_get_firmware_first(dev))
> > +		return;
>
> This needs to be adjusted because pcie_aer_get_firmware_first() no
> longer exists after 708b20003624 ("PCI/AER: Remove HEST/FIRMWARE_FIRST
> parsing for AER ownership").
>
> This will use the _OSC AER ownership bit to gate clearing of the
> status bits in the PCIe capability (not the AER capability).
>
> I think that's the right thing to do, but it's certainly not obvious
> from the _OSC description in the PCI Firmware Spec r3.2. I think we
> need a pointer to the ECN that clarifies this, i.e., sec 4.5.1 of:
>
>   System Firmware Intermediary (SFI) _OSC and DPC Updates ECN, Feb 24,
>   2020, affecting PCI Firmware Specification, Rev. 3.2
>   https://members.pcisig.com/wg/PCI-SIG/document/14076

Thanks. I'll add that (though can't check the document currently for
reasons you can probably figure out *sigh*)

Note this patch is rather tangential to patch 2 which is the one I really
need feedback on. Whilst this appeared to be wrong it is 'mostly harmless'.

>
> >  	pcie_capability_read_word(dev, PCI_EXP_DEVSTA, &sta);
> >  	pcie_capability_write_word(dev, PCI_EXP_DEVSTA, sta);
> >  }
> > --
> > 2.19.1
> >
On Tue, 16 Jun 2020 11:00:32 -0700
"Kuppuswamy, Sathyanarayanan" <sathyanarayanan.kuppuswamy@linux.intel.com> wrote:

> Hi Jonathan,
>
> On 6/16/20 10:47 AM, Bjorn Helgaas wrote:
> > [+cc Sathy]
> >
> > On Fri, May 22, 2020 at 01:31:33AM +0800, Jonathan Cameron wrote:
> >> pci_aer_clear_device_status() currently resets the device status even when
> >> firmware first handling is going on. In particular it resets it on the
> >> root port.
> >>
> >> This has been discussed previously
> >> https://lore.kernel.org/patchwork/patch/427375/.
> pci_aer_clear_device_status() is only used by handle_error_source(). And
> I don't think handle_error_source() is called in FF mode. Can you
> give more details on this issue ?

It's called in pcie_do_recovery

https://elixir.bootlin.com/linux/latest/source/drivers/pci/pcie/err.c#L200

Which is called from both handle_error_source and aer_recover_work_func.

indirectly called from ghes_handle_aer / ghes_do_proc

This particular flow will only happen (I think) on hardware reduced ACPI systems.

Jonathan

> >
> > I don't think this reference is really pertinent, is it? That patch
> > to b2c8881da764 changes pci_cleanup_aer_uncorrect_error_status() so it
> > doesn't clear PCI_ERR_UNCOR_STATUS in "firmware-first" mode.
> >
> > But your patch only affects PCI_EXP_DEVSTA, not PCI_ERR_UNCOR_STATUS.
> >
> >> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> >> ---
> >>  drivers/pci/pcie/aer.c | 3 +++
> >>  1 file changed, 3 insertions(+)
> >>
> >> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> >> index f4274d301235..43e78b97ace6 100644
> >> --- a/drivers/pci/pcie/aer.c
> >> +++ b/drivers/pci/pcie/aer.c
> >> @@ -373,6 +373,9 @@ void pci_aer_clear_device_status(struct pci_dev *dev)
> >>  {
> >>  	u16 sta;
> >>
> >> +	if (pcie_aer_get_firmware_first(dev))
> >> +		return;
> >
> > This needs to be adjusted because pcie_aer_get_firmware_first() no
> > longer exists after 708b20003624 ("PCI/AER: Remove HEST/FIRMWARE_FIRST
> > parsing for AER ownership").
> >
> > This will use the _OSC AER ownership bit to gate clearing of the
> > status bits in the PCIe capability (not the AER capability).
> >
> > I think that's the right thing to do, but it's certainly not obvious
> > from the _OSC description in the PCI Firmware Spec r3.2. I think we
> > need a pointer to the ECN that clarifies this, i.e., sec 4.5.1 of:
> >
> >   System Firmware Intermediary (SFI) _OSC and DPC Updates ECN, Feb 24,
> >   2020, affecting PCI Firmware Specification, Rev. 3.2
> >   https://members.pcisig.com/wg/PCI-SIG/document/14076
> >
> >>  	pcie_capability_read_word(dev, PCI_EXP_DEVSTA, &sta);
> >>  	pcie_capability_write_word(dev, PCI_EXP_DEVSTA, sta);
> >>  }
> >> --
> >> 2.19.1
> >>
>
Hi,

On 6/17/20 2:31 AM, Jonathan Cameron wrote:
> On Tue, 16 Jun 2020 11:00:32 -0700
> "Kuppuswamy, Sathyanarayanan" <sathyanarayanan.kuppuswamy@linux.intel.com> wrote:
>
>> Hi Jonathan,
>>
>> On 6/16/20 10:47 AM, Bjorn Helgaas wrote:
>>> [+cc Sathy]
>>>
>>> On Fri, May 22, 2020 at 01:31:33AM +0800, Jonathan Cameron wrote:
>>>> pci_aer_clear_device_status() currently resets the device status even when
>>>> firmware first handling is going on. In particular it resets it on the
>>>> root port.
>>>>
>>>> This has been discussed previously
>>>> https://lore.kernel.org/patchwork/patch/427375/.
>> pci_aer_clear_device_status() is only used by handle_error_source(). And
>> I don't think handle_error_source() is called in FF mode. Can you
>> give more details on this issue ?
>
> It's called in pcie_do_recovery
>
> https://elixir.bootlin.com/linux/latest/source/drivers/pci/pcie/err.c#L200
>
> Which is called from both handle_error_source and aer_recover_work_func.
>
> indirectly called from ghes_handle_aer / ghes_do_proc
>
> This particular flow will only happen (I think) on hardware reduced ACPI systems.
Ok. Makes sense.
>
> Jonathan
>
>>>
>>> I don't think this reference is really pertinent, is it? That patch
>>> to b2c8881da764 changes pci_cleanup_aer_uncorrect_error_status() so it
>>> doesn't clear PCI_ERR_UNCOR_STATUS in "firmware-first" mode.
>>>
>>> But your patch only affects PCI_EXP_DEVSTA, not PCI_ERR_UNCOR_STATUS.
>>>
>>>> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>>>> ---
>>>>  drivers/pci/pcie/aer.c | 3 +++
>>>>  1 file changed, 3 insertions(+)
>>>>
>>>> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
>>>> index f4274d301235..43e78b97ace6 100644
>>>> --- a/drivers/pci/pcie/aer.c
>>>> +++ b/drivers/pci/pcie/aer.c
>>>> @@ -373,6 +373,9 @@ void pci_aer_clear_device_status(struct pci_dev *dev)
>>>>  {
>>>>  	u16 sta;
>>>>
>>>> +	if (pcie_aer_get_firmware_first(dev))
use if (!pcie_aer_is_native(dev))
>>>> +		return;
>>>
>>> This needs to be adjusted because pcie_aer_get_firmware_first() no
>>> longer exists after 708b20003624 ("PCI/AER: Remove HEST/FIRMWARE_FIRST
>>> parsing for AER ownership").
>>>
>>> This will use the _OSC AER ownership bit to gate clearing of the
>>> status bits in the PCIe capability (not the AER capability).
>>>
>>> I think that's the right thing to do, but it's certainly not obvious
>>> from the _OSC description in the PCI Firmware Spec r3.2. I think we
>>> need a pointer to the ECN that clarifies this, i.e., sec 4.5.1 of:
>>>
>>>   System Firmware Intermediary (SFI) _OSC and DPC Updates ECN, Feb 24,
>>>   2020, affecting PCI Firmware Specification, Rev. 3.2
>>>   https://members.pcisig.com/wg/PCI-SIG/document/14076
>>>
>>>>  	pcie_capability_read_word(dev, PCI_EXP_DEVSTA, &sta);
>>>>  	pcie_capability_write_word(dev, PCI_EXP_DEVSTA, sta);
>>>>  }
>>>> --
>>>> 2.19.1
>>>>
>>
>
>
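Editorial note: the reply above suggests gating on `!pcie_aer_is_native(dev)`, and the thread describes the clearing step as reachable through pcie_do_recovery() from both the native path (handle_error_source) and the firmware-first path (aer_recover_work_func). A rough user-space model of that shape is sketched here under stated assumptions. Every `mock_*` name and the `aer_native` field are hypothetical stand-ins for illustration; this is not the patch that was eventually applied, and the real pcie_aer_is_native() is not reimplemented.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct pci_dev. */
struct mock_pci_dev {
    bool aer_native;   /* assumed: true when the OS owns AER (e.g. via _OSC) */
    uint16_t devsta;   /* modelled Device Status register, error bits RW1C */
};

/* Stand-in for the ownership query suggested in the review. */
static bool mock_aer_is_native(const struct mock_pci_dev *dev)
{
    return dev->aer_native;
}

/* Clearing step with the suggested gate: do nothing unless the OS owns AER. */
static void mock_clear_device_status(struct mock_pci_dev *dev)
{
    uint16_t sta;

    if (!mock_aer_is_native(dev))
        return;

    sta = dev->devsta;
    dev->devsta &= (uint16_t)~sta;  /* models the RW1C write-back */
}

/* Shared recovery step, reached from both entry points below. */
static void mock_do_recovery(struct mock_pci_dev *dev)
{
    mock_clear_device_status(dev);
}

/* Models the native AER path. */
static void mock_handle_error_source(struct mock_pci_dev *dev)
{
    mock_do_recovery(dev);
}

/* Models the firmware-first path, where errors are reported through GHES. */
static void mock_aer_recover_work_func(struct mock_pci_dev *dev)
{
    mock_do_recovery(dev);
}
```

Putting the gate inside the clearing helper rather than at each caller means the firmware-first entry point can keep sharing the recovery code while leaving Device Status untouched.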
diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index f4274d301235..43e78b97ace6 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -373,6 +373,9 @@ void pci_aer_clear_device_status(struct pci_dev *dev)
 {
 	u16 sta;
 
+	if (pcie_aer_get_firmware_first(dev))
+		return;
+
 	pcie_capability_read_word(dev, PCI_EXP_DEVSTA, &sta);
 	pcie_capability_write_word(dev, PCI_EXP_DEVSTA, sta);
 }
pci_aer_clear_device_status() currently resets the device status even when
firmware first handling is going on. In particular it resets it on the
root port.

This has been discussed previously
https://lore.kernel.org/patchwork/patch/427375/.

Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
 drivers/pci/pcie/aer.c | 3 +++
 1 file changed, 3 insertions(+)