[1/2] PCI/AER: Do not reset the device status if doing firmware first handling.

Message ID: 20200521173134.2456773-2-Jonathan.Cameron@huawei.com
State: Superseded, archived
Delegated to: Bjorn Helgaas
Series: PCI/AER: handling for RCiEPs

Commit Message

Jonathan Cameron May 21, 2020, 5:31 p.m. UTC
pci_aer_clear_device_status() currently resets the device status even when
firmware first handling is going on.  In particular it resets it on the
root port.

This has been discussed previously
https://lore.kernel.org/patchwork/patch/427375/.

Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
 drivers/pci/pcie/aer.c | 3 +++
 1 file changed, 3 insertions(+)

Comments

Bjorn Helgaas June 16, 2020, 5:47 p.m. UTC | #1
[+cc Sathy]

On Fri, May 22, 2020 at 01:31:33AM +0800, Jonathan Cameron wrote:
> pci_aer_clear_device_status() currently resets the device status even when
> firmware first handling is going on.  In particular it resets it on the
> root port.
>
> This has been discussed previously
> https://lore.kernel.org/patchwork/patch/427375/.

I don't think this reference is really pertinent, is it?  That patch
to b2c8881da764 changes pci_cleanup_aer_uncorrect_error_status() so it
doesn't clear PCI_ERR_UNCOR_STATUS in "firmware-first" mode.

But your patch only affects PCI_EXP_DEVSTA, not PCI_ERR_UNCOR_STATUS.
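
To make the distinction concrete, here is a minimal user-space sketch, not
kernel code. The offsets and bit masks are the values I believe are defined
in include/uapi/linux/pci_regs.h; struct fake_dev and the fake_* helpers are
hypothetical stand-ins for config-space access. It models the
read-then-write-back idiom on the write-1-to-clear bits of PCI_EXP_DEVSTA
and shows that PCI_ERR_UNCOR_STATUS, which lives in a different capability,
is left alone.

```c
#include <assert.h>
#include <stdint.h>

/* Values as found in include/uapi/linux/pci_regs.h. */
#define PCI_EXP_DEVSTA        0x0a    /* Device Status, PCIe capability */
#define PCI_EXP_DEVSTA_CED    0x0001  /* Correctable Error Detected */
#define PCI_EXP_DEVSTA_NFED   0x0002  /* Non-Fatal Error Detected */
#define PCI_EXP_DEVSTA_FED    0x0004  /* Fatal Error Detected */
#define PCI_EXP_DEVSTA_URD    0x0008  /* Unsupported Request Detected */
#define PCI_EXP_DEVSTA_TRPND  0x0020  /* Transactions Pending (read-only) */
#define PCI_ERR_UNCOR_STATUS  0x04    /* Uncorrectable Error Status, AER capability */

/* Hypothetical device model: one register from each capability. */
struct fake_dev {
	uint16_t devsta;        /* PCIe capability + PCI_EXP_DEVSTA */
	uint32_t uncor_status;  /* AER capability + PCI_ERR_UNCOR_STATUS */
};

/* Model a config write to Device Status: writing 1 clears an error bit. */
static void fake_write_devsta(struct fake_dev *dev, uint16_t val)
{
	const uint16_t rw1c = PCI_EXP_DEVSTA_CED | PCI_EXP_DEVSTA_NFED |
			      PCI_EXP_DEVSTA_FED | PCI_EXP_DEVSTA_URD;

	dev->devsta &= (uint16_t)~(val & rw1c);
}

/* Same shape as the function in the patch: read, then write the value back. */
static void fake_clear_device_status(struct fake_dev *dev)
{
	uint16_t sta = dev->devsta;

	fake_write_devsta(dev, sta);
}
```

In this model, calling fake_clear_device_status() on a device with error
bits set in devsta clears those bits and never touches uncor_status.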

> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> ---
>  drivers/pci/pcie/aer.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index f4274d301235..43e78b97ace6 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -373,6 +373,9 @@ void pci_aer_clear_device_status(struct pci_dev *dev)
>  {
>  	u16 sta;
>  
> +	if (pcie_aer_get_firmware_first(dev))
> +		return;

This needs to be adjusted because pcie_aer_get_firmware_first() no
longer exists after 708b20003624 ("PCI/AER: Remove HEST/FIRMWARE_FIRST
parsing for AER ownership").

This will use the _OSC AER ownership bit to gate clearing of the
status bits in the PCIe capability (not the AER capability).

I think that's the right thing to do, but it's certainly not obvious
from the _OSC description in the PCI Firmware Spec r3.2.  I think we
need a pointer to the ECN that clarifies this, i.e., sec 4.5.1 of:

  System Firmware Intermediary (SFI) _OSC and DPC Updates ECN, Feb 24,
  2020, affecting PCI Firmware Specification, Rev. 3.2
  https://members.pcisig.com/wg/PCI-SIG/document/14076
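
As a rough illustration of what gating on _OSC AER ownership could look
like, here is a self-contained user-space sketch. Everything in it is
hypothetical (struct osc_host, struct osc_dev, osc_aer_is_native(),
osc_clear_device_status(), DEVSTA_ERR_BITS); it only models the idea that
the OS leaves the Device Status error bits alone unless firmware granted it
AER control, and it is not the actual kernel implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The four write-1-to-clear error bits of the PCIe Device Status register. */
#define DEVSTA_ERR_BITS 0x000f

/* Hypothetical host bridge: records whether _OSC granted AER to the OS. */
struct osc_host {
	bool native_aer;
};

/* Hypothetical device below that host bridge. */
struct osc_dev {
	const struct osc_host *host;
	uint16_t devsta;
};

/* True when the OS, rather than firmware, owns AER for this device. */
static bool osc_aer_is_native(const struct osc_dev *dev)
{
	return dev->host->native_aer;
}

/*
 * Clear the Device Status error bits only when the OS owns AER.
 * Returns true if the register was cleared, false if it was left
 * for firmware to handle.
 */
static bool osc_clear_device_status(struct osc_dev *dev)
{
	if (!osc_aer_is_native(dev))
		return false;

	/* Models the read + write-back of the RW1C error bits. */
	dev->devsta &= (uint16_t)~DEVSTA_ERR_BITS;
	return true;
}
```

With native_aer false the status bits survive the call, which is the
behaviour the patch is after for firmware-first platforms.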

>  	pcie_capability_read_word(dev, PCI_EXP_DEVSTA, &sta);
>  	pcie_capability_write_word(dev, PCI_EXP_DEVSTA, sta);
>  }
> -- 
> 2.19.1
>
Kuppuswamy Sathyanarayanan June 16, 2020, 6 p.m. UTC | #2
Hi Jonathan,

On 6/16/20 10:47 AM, Bjorn Helgaas wrote:
> [+cc Sathy]
> 
> On Fri, May 22, 2020 at 01:31:33AM +0800, Jonathan Cameron wrote:
>> pci_aer_clear_device_status() currently resets the device status even when
>> firmware first handling is going on.  In particular it resets it on the
>> root port.
>>
>> This has been discussed previously
>> https://lore.kernel.org/patchwork/patch/427375/.

pci_aer_clear_device_status() is only used by handle_error_source(). And
I don't think handle_error_source() is called in FF mode. Can you
give more details on this issue?

> 
> I don't think this reference is really pertinent, is it?  That patch
> to b2c8881da764 changes pci_cleanup_aer_uncorrect_error_status() so it
> doesn't clear PCI_ERR_UNCOR_STATUS in "firmware-first" mode.
> 
> But your patch only affects PCI_EXP_DEVSTA, not PCI_ERR_UNCOR_STATUS.
> 
>> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>> ---
>>   drivers/pci/pcie/aer.c | 3 +++
>>   1 file changed, 3 insertions(+)
>>
>> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
>> index f4274d301235..43e78b97ace6 100644
>> --- a/drivers/pci/pcie/aer.c
>> +++ b/drivers/pci/pcie/aer.c
>> @@ -373,6 +373,9 @@ void pci_aer_clear_device_status(struct pci_dev *dev)
>>   {
>>   	u16 sta;
>>   
>> +	if (pcie_aer_get_firmware_first(dev))
>> +		return;
> 
> This needs to be adjusted because pcie_aer_get_firmware_first() no
> longer exists after 708b20003624 ("PCI/AER: Remove HEST/FIRMWARE_FIRST
> parsing for AER ownership").
> 
> This will use the _OSC AER ownership bit to gate clearing of the
> status bits in the PCIe capability (not the AER capability).
> 
> I think that's the right thing to do, but it's certainly not obvious
> from the _OSC description in the PCI Firmware Spec r3.2.  I think we
> need a pointer to the ECN that clarifies this, i.e., sec 4.5.1 of:
> 
>    System Firmware Intermediary (SFI) _OSC and DPC Updates ECN, Feb 24,
>    2020, affecting PCI Firmware Specification, Rev. 3.2
>    https://members.pcisig.com/wg/PCI-SIG/document/14076
> 
>>   	pcie_capability_read_word(dev, PCI_EXP_DEVSTA, &sta);
>>   	pcie_capability_write_word(dev, PCI_EXP_DEVSTA, sta);
>>   }
>> -- 
>> 2.19.1
>>
Jonathan Cameron June 17, 2020, 9:18 a.m. UTC | #3
On Tue, 16 Jun 2020 12:47:31 -0500
Bjorn Helgaas <helgaas@kernel.org> wrote:

> [+cc Sathy]
> 
> On Fri, May 22, 2020 at 01:31:33AM +0800, Jonathan Cameron wrote:
> > pci_aer_clear_device_status() currently resets the device status even when
> > firmware first handling is going on.  In particular it resets it on the
> > root port.
> >
> > This has been discussed previously
> > https://lore.kernel.org/patchwork/patch/427375/.  
> 
> I don't think this reference is really pertinent, is it?  That patch
> to b2c8881da764 changes pci_cleanup_aer_uncorrect_error_status() so it
> doesn't clear PCI_ERR_UNCOR_STATUS in "firmware-first" mode.
> 
> But your patch only affects PCI_EXP_DEVSTA, not PCI_ERR_UNCOR_STATUS.

I'll be honest, I've mostly forgotten my reasoning behind including that
reference.  It might have been as simple as getting lost in the renames.

I'll drop the reference.

> 
> > Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > ---
> >  drivers/pci/pcie/aer.c | 3 +++
> >  1 file changed, 3 insertions(+)
> > 
> > diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> > index f4274d301235..43e78b97ace6 100644
> > --- a/drivers/pci/pcie/aer.c
> > +++ b/drivers/pci/pcie/aer.c
> > @@ -373,6 +373,9 @@ void pci_aer_clear_device_status(struct pci_dev *dev)
> >  {
> >  	u16 sta;
> >  
> > +	if (pcie_aer_get_firmware_first(dev))
> > +		return;  
> 
> This needs to be adjusted because pcie_aer_get_firmware_first() no
> longer exists after 708b20003624 ("PCI/AER: Remove HEST/FIRMWARE_FIRST
> parsing for AER ownership").
> 
> This will use the _OSC AER ownership bit to gate clearing of the
> status bits in the PCIe capability (not the AER capability).
> 
> I think that's the right thing to do, but it's certainly not obvious
> from the _OSC description in the PCI Firmware Spec r3.2.  I think we
> need a pointer to the ECN that clarifies this, i.e., sec 4.5.1 of:
> 
>   System Firmware Intermediary (SFI) _OSC and DPC Updates ECN, Feb 24,
>   2020, affecting PCI Firmware Specification, Rev. 3.2
>   https://members.pcisig.com/wg/PCI-SIG/document/14076

Thanks. I'll add that (though can't check the document currently
for reasons you can probably figure out *sigh*)

Note this patch is rather tangential to patch 2, which is the one
I really need feedback on.  Whilst this appeared to be
wrong, it is 'mostly harmless'.

> 
> >  	pcie_capability_read_word(dev, PCI_EXP_DEVSTA, &sta);
> >  	pcie_capability_write_word(dev, PCI_EXP_DEVSTA, sta);
> >  }
> > -- 
> > 2.19.1
> >
Jonathan Cameron June 17, 2020, 9:31 a.m. UTC | #4
On Tue, 16 Jun 2020 11:00:32 -0700
"Kuppuswamy, Sathyanarayanan" <sathyanarayanan.kuppuswamy@linux.intel.com> wrote:

> Hi Jonathan,
> 
> On 6/16/20 10:47 AM, Bjorn Helgaas wrote:
> > [+cc Sathy]
> > 
> > On Fri, May 22, 2020 at 01:31:33AM +0800, Jonathan Cameron wrote:  
> >> pci_aer_clear_device_status() currently resets the device status even when
> >> firmware first handling is going on.  In particular it resets it on the
> >> root port.
> >>
> >> This has been discussed previously
> >> https://lore.kernel.org/patchwork/patch/427375/.  
> pci_aer_clear_device_status() is only used by handle_error_source(). And
> I don't think handle_error_source() is called in FF mode. Can you
> give more details on this issue ?

It's called in pcie_do_recovery():

https://elixir.bootlin.com/linux/latest/source/drivers/pci/pcie/err.c#L200

pcie_do_recovery() is called from both handle_error_source() and
aer_recover_work_func().  The latter is indirectly called from
ghes_handle_aer() / ghes_do_proc().

This particular flow will only happen (I think) on hardware-reduced ACPI systems.
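
The call flow described above can be sketched as a small user-space model.
The model_* functions and struct flow_dev are hypothetical; they only mirror
the names mentioned in this thread and the claim that both the native path
and the GHES (firmware-first) path reach pcie_do_recovery(). The gate
parameter stands in for applying or not applying the check from this patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical device: who owns AER, and how often DEVSTA was cleared. */
struct flow_dev {
	bool aer_native;
	int devsta_clears;
};

/* Stand-in for pci_aer_clear_device_status(), optionally gated. */
static void model_clear_device_status(struct flow_dev *dev, bool gate)
{
	if (gate && !dev->aer_native)
		return;
	dev->devsta_clears++;
}

/* Stand-in for pcie_do_recovery(): the common point of both paths. */
static void model_pcie_do_recovery(struct flow_dev *dev, bool gate)
{
	model_clear_device_status(dev, gate);
}

/* Native path: the AER driver handles the error itself. */
static void model_handle_error_source(struct flow_dev *dev, bool gate)
{
	model_pcie_do_recovery(dev, gate);
}

/* Firmware-first path: reached indirectly from the GHES handlers. */
static void model_aer_recover_work_func(struct flow_dev *dev, bool gate)
{
	model_pcie_do_recovery(dev, gate);
}
```

Without the gate, a firmware-owned device entering through
model_aer_recover_work_func() still has its status cleared; with the gate
it does not, while the native path is unaffected.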

Jonathan

> > 
> > I don't think this reference is really pertinent, is it?  That patch
> > to b2c8881da764 changes pci_cleanup_aer_uncorrect_error_status() so it
> > doesn't clear PCI_ERR_UNCOR_STATUS in "firmware-first" mode.
> > 
> > But your patch only affects PCI_EXP_DEVSTA, not PCI_ERR_UNCOR_STATUS.
> >   
> >> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> >> ---
> >>   drivers/pci/pcie/aer.c | 3 +++
> >>   1 file changed, 3 insertions(+)
> >>
> >> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> >> index f4274d301235..43e78b97ace6 100644
> >> --- a/drivers/pci/pcie/aer.c
> >> +++ b/drivers/pci/pcie/aer.c
> >> @@ -373,6 +373,9 @@ void pci_aer_clear_device_status(struct pci_dev *dev)
> >>   {
> >>   	u16 sta;
> >>   
> >> +	if (pcie_aer_get_firmware_first(dev))
> >> +		return;  
> > 
> > This needs to be adjusted because pcie_aer_get_firmware_first() no
> > longer exists after 708b20003624 ("PCI/AER: Remove HEST/FIRMWARE_FIRST
> > parsing for AER ownership").
> > 
> > This will use the _OSC AER ownership bit to gate clearing of the
> > status bits in the PCIe capability (not the AER capability).
> > 
> > I think that's the right thing to do, but it's certainly not obvious
> > from the _OSC description in the PCI Firmware Spec r3.2.  I think we
> > need a pointer to the ECN that clarifies this, i.e., sec 4.5.1 of:
> > 
> >    System Firmware Intermediary (SFI) _OSC and DPC Updates ECN, Feb 24,
> >    2020, affecting PCI Firmware Specification, Rev. 3.2
> >    https://members.pcisig.com/wg/PCI-SIG/document/14076
> >   
> >>   	pcie_capability_read_word(dev, PCI_EXP_DEVSTA, &sta);
> >>   	pcie_capability_write_word(dev, PCI_EXP_DEVSTA, sta);
> >>   }
> >> -- 
> >> 2.19.1
> >>  
>
Kuppuswamy Sathyanarayanan June 17, 2020, 8:57 p.m. UTC | #5
Hi,

On 6/17/20 2:31 AM, Jonathan Cameron wrote:
> On Tue, 16 Jun 2020 11:00:32 -0700
> "Kuppuswamy, Sathyanarayanan" <sathyanarayanan.kuppuswamy@linux.intel.com> wrote:
> 
>> Hi Jonathan,
>>
>> On 6/16/20 10:47 AM, Bjorn Helgaas wrote:
>>> [+cc Sathy]
>>>
>>> On Fri, May 22, 2020 at 01:31:33AM +0800, Jonathan Cameron wrote:
>>>> pci_aer_clear_device_status() currently resets the device status even when
>>>> firmware first handling is going on.  In particular it resets it on the
>>>> root port.
>>>>
>>>> This has been discussed previously
>>>> https://lore.kernel.org/patchwork/patch/427375/.
>> pci_aer_clear_device_status() is only used by handle_error_source(). And
>> I don't think handle_error_source() is called in FF mode. Can you
>> give more details on this issue ?
> 
> It's called in pcie_do_recovery
> 
> https://elixir.bootlin.com/linux/latest/source/drivers/pci/pcie/err.c#L200
> 
> Which is called from both handle_error_source and aer_recover_work_func.
> 
> indirectly called from ghes_handle_aer / ghes_do_proc
> 
> This particular flow will only happen (I think) on hardware reduced ACPI systems.

OK, makes sense.

> 
> Jonathan
> 
>>>
>>> I don't think this reference is really pertinent, is it?  That patch
>>> to b2c8881da764 changes pci_cleanup_aer_uncorrect_error_status() so it
>>> doesn't clear PCI_ERR_UNCOR_STATUS in "firmware-first" mode.
>>>
>>> But your patch only affects PCI_EXP_DEVSTA, not PCI_ERR_UNCOR_STATUS.
>>>    
>>>> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>>>> ---
>>>>    drivers/pci/pcie/aer.c | 3 +++
>>>>    1 file changed, 3 insertions(+)
>>>>
>>>> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
>>>> index f4274d301235..43e78b97ace6 100644
>>>> --- a/drivers/pci/pcie/aer.c
>>>> +++ b/drivers/pci/pcie/aer.c
>>>> @@ -373,6 +373,9 @@ void pci_aer_clear_device_status(struct pci_dev *dev)
>>>>    {
>>>>    	u16 sta;
>>>>    
>>>> +	if (pcie_aer_get_firmware_first(dev))

Use if (!pcie_aer_is_native(dev)) here instead.

>>>> +		return;
>>>
>>> This needs to be adjusted because pcie_aer_get_firmware_first() no
>>> longer exists after 708b20003624 ("PCI/AER: Remove HEST/FIRMWARE_FIRST
>>> parsing for AER ownership").
>>>
>>> This will use the _OSC AER ownership bit to gate clearing of the
>>> status bits in the PCIe capability (not the AER capability).
>>>
>>> I think that's the right thing to do, but it's certainly not obvious
>>> from the _OSC description in the PCI Firmware Spec r3.2.  I think we
>>> need a pointer to the ECN that clarifies this, i.e., sec 4.5.1 of:
>>>
>>>     System Firmware Intermediary (SFI) _OSC and DPC Updates ECN, Feb 24,
>>>     2020, affecting PCI Firmware Specification, Rev. 3.2
>>>     https://members.pcisig.com/wg/PCI-SIG/document/14076
>>>    
>>>>    	pcie_capability_read_word(dev, PCI_EXP_DEVSTA, &sta);
>>>>    	pcie_capability_write_word(dev, PCI_EXP_DEVSTA, sta);
>>>>    }
>>>> -- 
>>>> 2.19.1
>>>>   
>>
> 
>

Patch

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index f4274d301235..43e78b97ace6 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -373,6 +373,9 @@ void pci_aer_clear_device_status(struct pci_dev *dev)
 {
 	u16 sta;
 
+	if (pcie_aer_get_firmware_first(dev))
+		return;
+
 	pcie_capability_read_word(dev, PCI_EXP_DEVSTA, &sta);
 	pcie_capability_write_word(dev, PCI_EXP_DEVSTA, sta);
 }