
[V2] PCI: Clear errors logged in Secondary Status Register

Message ID 20240116143258.483235-1-vidyas@nvidia.com
State New
Delegated to: Bjorn Helgaas
Series [V2] PCI: Clear errors logged in Secondary Status Register

Commit Message

Vidya Sagar Jan. 16, 2024, 2:32 p.m. UTC
The enumeration process leaves the 'Received Master Abort' bit set in
the Secondary Status Register of the downstream port in the following
scenarios.

(1) The device connected to the downstream port has the ARI capability,
    so the kernel sets the 'ARI Forwarding Enable' bit in the Device
    Control 2 Register of the downstream port. The downstream port then
    forwards configuration requests that target devices below it, even
    when those devices don't exist. The downstream device returns
    completions with Unsupported Request (UR) status, which in turn sets
    the 'Received Master Abort' bit.

    In contrast, if the downstream device doesn't have the ARI
    capability, the 'ARI Forwarding Enable' bit in the downstream port
    is not set, and configuration requests targeting nonexistent
    downstream devices are terminated by the downstream port itself
    (section 6.13 of the PCI Express Base 6.0 spec), leaving the
    'Received Master Abort' bit unchanged.

(2) A PCIe switch is connected to the downstream port. When enumeration
    probes for devices that don't exist below the switch, the downstream
    port receives completions with UR status, which sets the 'Received
    Master Abort' bit.

Clear the 'Received Master Abort' bit so that the bridge is left in a
clean state after enumeration.

Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
---
V2:
* Changed commit message based on Bjorn's feedback

 drivers/pci/probe.c | 3 +++
 1 file changed, 3 insertions(+)
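The routing difference in scenario (1) of the commit message can be modeled in a few lines of C. This is a hypothetical user-space sketch, not kernel code: `struct model_dsp`, `model_cfg_read()` and `only_devfn0()` are invented names, and the rule that a port with ARI Forwarding Enable clear rejects a nonzero Device Number on its own (sending no TLP) is paraphrased from the PCIe Base spec rather than quoted from it. Only the `PCI_STATUS_REC_MASTER_ABORT` value (0x2000) comes from Linux's pci_regs.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit value as defined in Linux include/uapi/linux/pci_regs.h */
#define PCI_STATUS_REC_MASTER_ABORT 0x2000

/* Hypothetical model of a PCIe downstream port (not a kernel type). */
struct model_dsp {
	bool ari_fwd_enable;   /* Device Control 2: ARI Forwarding Enable */
	uint16_t sec_status;   /* Secondary Status register */
};

enum cfg_result {
	CFG_OK,            /* a Function responded */
	CFG_UR_LOCAL,      /* port terminated the request itself, no TLP sent */
	CFG_UR_COMPLETION, /* request forwarded, UR Completion received */
};

/* Example topology: only devfn 0 (Device 0, Function 0) exists. */
static bool only_devfn0(unsigned int devfn)
{
	return devfn == 0;
}

/*
 * Model of a config read to an 8-bit devfn on the port's secondary bus.
 * Without ARI forwarding, devfn is Device[7:3]/Function[2:0] and the port
 * rejects Device != 0 locally; with ARI forwarding, all 8 bits are a
 * Function Number and the request is forwarded on the link.
 */
static enum cfg_result model_cfg_read(struct model_dsp *port, unsigned int devfn,
				      bool (*present)(unsigned int devfn))
{
	assert(devfn < 256);

	if (!port->ari_fwd_enable && (devfn >> 3) != 0)
		return CFG_UR_LOCAL;          /* Secondary Status unchanged */

	if (present(devfn))
		return CFG_OK;

	/* The UR Completion is logged on the secondary side of the port. */
	port->sec_status |= PCI_STATUS_REC_MASTER_ABORT;
	return CFG_UR_COMPLETION;
}
```

With ARI forwarding clear, a read of devfn 0x08 (Device 1) returns `CFG_UR_LOCAL` and leaves `sec_status` at 0; after setting `ari_fwd_enable`, the same read returns `CFG_UR_COMPLETION` and sets the Received Master Abort bit, which matches the contrast drawn in (1).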

Comments

Bjorn Helgaas Jan. 22, 2024, 11 p.m. UTC | #1
On Tue, Jan 16, 2024 at 08:02:58PM +0530, Vidya Sagar wrote:
> The enumeration process leaves the 'Received Master Abort' bit set in
> the Secondary Status Register of the downstream port in the following
> scenarios.
> 
> (1) The device connected to the downstream port has ARI capability
>     and that makes the kernel set the 'ARI Forwarding Enable' bit in
>     the Device Control 2 Register of the downstream port. This
>     effectively makes the downstream port forward the configuration
>     requests targeting the devices downstream of it, even though they
>     don't exist in reality. It causes the downstream devices return
>     completions with UR set in the status in turn causing 'Received
>     Master Abort' bit set.
> 
>     In contrast, if the downstream device doesn't have ARI capability,
>     the 'ARI Forwarding Enable' bit in the downstream port is not set
>     and any configuration requests targeting the downstream devices
>     that don't exist are terminated (section 6.13 of PCI Express Base
>     6.0 spec) in the downstream port itself resulting in no change of
>     the 'Received Master Abort' bit.
> 
> (2) A PCIe switch is connected to the downstream port and when the
>     enumeration flow tries to explore the presence of devices that
>     don't really exist downstream of the switch, the downstream
>     port receives the completions with UR set causing the 'Received
>     Master Abort' bit set.

Are these the only possible ways this error is logged?  I expected
them to be logged when we enumerate below a Root Port that has nothing
attached, for example.

Does clearing them in pci_scan_bridge_extend() cover all ways this
error might be logged during enumeration?  I can't remember whether
all enumeration goes through this path.

Vidya Sagar March 14, 2024, 12:39 a.m. UTC | #2
On 23-01-2024 04:30, Bjorn Helgaas wrote:
> On Tue, Jan 16, 2024 at 08:02:58PM +0530, Vidya Sagar wrote:
>> The enumeration process leaves the 'Received Master Abort' bit set in
>> the Secondary Status Register of the downstream port in the following
>> scenarios.
>>
>> (1) The device connected to the downstream port has ARI capability
>>      and that makes the kernel set the 'ARI Forwarding Enable' bit in
>>      the Device Control 2 Register of the downstream port. This
>>      effectively makes the downstream port forward the configuration
>>      requests targeting the devices downstream of it, even though they
>>      don't exist in reality. It causes the downstream devices return
>>      completions with UR set in the status in turn causing 'Received
>>      Master Abort' bit set.
>>
>>      In contrast, if the downstream device doesn't have ARI capability,
>>      the 'ARI Forwarding Enable' bit in the downstream port is not set
>>      and any configuration requests targeting the downstream devices
>>      that don't exist are terminated (section 6.13 of PCI Express Base
>>      6.0 spec) in the downstream port itself resulting in no change of
>>      the 'Received Master Abort' bit.
>>
>> (2) A PCIe switch is connected to the downstream port and when the
>>      enumeration flow tries to explore the presence of devices that
>>      don't really exist downstream of the switch, the downstream
>>      port receives the completions with UR set causing the 'Received
>>      Master Abort' bit set.
> Are these the only possible ways this error is logged?  I expected
> them to be logged when we enumerate below a Root Port that has nothing
> attached, for example.
In this case, there won't be any TLP sent downstream. I covered this
scenario in the second paragraph of point (1) above.
> Does clearing them in pci_scan_bridge_extend() cover all ways this
> error might be logged during enumeration?  I can't remember whether
> all enumeration goes through this path.
So far in my testing, clearing it in pci_scan_bridge_extend() covers all 
the cases.

Vidya Sagar April 1, 2024, 7:59 a.m. UTC | #3
Hi Bjorn,
Just checking on this thread.
Is there anything else you would like me to clarify?

Thanks,
Vidya Sagar

Vidya Sagar April 18, 2024, 10:52 a.m. UTC | #4
Hi Bjorn,
Sorry to bother you.
Is this change ready to be accepted?

Thanks,
Vidya Sagar

Bjorn Helgaas April 23, 2024, 9:09 p.m. UTC | #5
On Tue, Jan 16, 2024 at 08:02:58PM +0530, Vidya Sagar wrote:
> The enumeration process leaves the 'Received Master Abort' bit set in
> the Secondary Status Register of the downstream port in the following
> scenarios.
> 
> (1) The device connected to the downstream port has ARI capability
>     and that makes the kernel set the 'ARI Forwarding Enable' bit in
>     the Device Control 2 Register of the downstream port. This
>     effectively makes the downstream port forward the configuration
>     requests targeting the devices downstream of it, even though they
>     don't exist in reality. It causes the downstream devices return
>     completions with UR set in the status in turn causing 'Received
>     Master Abort' bit set.
> 
>     In contrast, if the downstream device doesn't have ARI capability,
>     the 'ARI Forwarding Enable' bit in the downstream port is not set
>     and any configuration requests targeting the downstream devices
>     that don't exist are terminated (section 6.13 of PCI Express Base
>     6.0 spec) in the downstream port itself resulting in no change of
>     the 'Received Master Abort' bit.
> 
> (2) A PCIe switch is connected to the downstream port and when the
>     enumeration flow tries to explore the presence of devices that
>     don't really exist downstream of the switch, the downstream
>     port receives the completions with UR set causing the 'Received
>     Master Abort' bit set.
> 
> Clear 'Received Master Abort' bit to keep the bridge device in a clean
> state post enumeration.
> 
> Signed-off-by: Vidya Sagar <vidyas@nvidia.com>

Applied to pci/enumeration for v6.10, thanks!

I shortened the commit log because I think this happens all the time,
not just in the specific cases you mentioned above:

    PCI: Clear Secondary Status errors after enumeration

    We enumerate devices by attempting config reads to the Vendor ID of each
    possible device.  On conventional PCI, if no device responds, the read
    terminates with a Master Abort (PCI r3.0, sec 6.1).  On PCIe, the config
    read is terminated as an Unsupported Request (PCIe r6.0, sec 2.3.2,
    7.5.1.3.7).  In either case, if the read addressed a device below a bridge,
    it is logged by setting "Received Master Abort" in the bridge Secondary
    Status register.

    Clear any errors logged in the Secondary Status register after enumeration.
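The mechanism in the rewritten log (probe each Vendor ID, a missing device reads back as 0xffff, the bridge logs "Received Master Abort", software clears it afterwards) can be sketched as a small user-space model. Everything here is hypothetical except the bit value from pci_regs.h: `struct model_bridge`, the helper functions, and the placeholder Vendor ID 0x1234 are invented, and the scan probes Function 0 of all 32 Device Numbers the way conventional PCI enumeration would. The `SEC_STATUS_RW1C_MASK` value (bits 8 and 11-15) reflects the assumption that only the error bits are write-1-to-clear while the rest of the register is read-only or reserved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit value as defined in Linux include/uapi/linux/pci_regs.h */
#define PCI_STATUS_REC_MASTER_ABORT 0x2000
/* Error bits of Secondary Status (bits 8, 11-15); all are RW1C */
#define SEC_STATUS_RW1C_MASK        0xf900

/* Hypothetical bridge model; not a kernel structure. */
struct model_bridge {
	uint16_t sec_status;   /* Secondary Status register */
	bool present[256];     /* devfn values that respond on the secondary bus */
};

/* Vendor ID read: a missing Function reads back as 0xffff and the
 * bridge logs "Received Master Abort" in its Secondary Status. */
static uint16_t model_read_vendor_id(struct model_bridge *br, unsigned int devfn)
{
	assert(devfn < 256);
	if (br->present[devfn])
		return 0x1234;                 /* placeholder Vendor ID */
	br->sec_status |= PCI_STATUS_REC_MASTER_ABORT;
	return 0xffff;
}

/* Write-1-to-clear: each 1 written to an RW1C bit clears it; read-only
 * bits (e.g. 66 MHz Capable, DEVSEL timing) ignore the write. */
static void model_write_sec_status(struct model_bridge *br, uint16_t val)
{
	br->sec_status &= (uint16_t)~(val & SEC_STATUS_RW1C_MASK);
}

/* Probe Function 0 of every Device Number and count responders. */
static unsigned int model_scan(struct model_bridge *br)
{
	unsigned int found = 0;

	for (unsigned int dev = 0; dev < 32; dev++)
		if (model_read_vendor_id(br, dev << 3) != 0xffff)
			found++;
	return found;
}
```

In this model, calling `model_write_sec_status(br, 0xffff)` after `model_scan()` plays the role of the patch's `pci_write_config_word(dev, PCI_SEC_STATUS, 0xffff)`: one write clears every logged error, while a read-only bit such as 66 MHz Capable (0x0020) survives.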


Patch

diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 795534589b98..640d2871b061 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -1470,6 +1470,9 @@  static int pci_scan_bridge_extend(struct pci_bus *bus, struct pci_dev *dev,
 	}
 
 out:
+	/* Clear errors in the Secondary Status Register */
+	pci_write_config_word(dev, PCI_SEC_STATUS, 0xffff);
+
 	pci_write_config_word(dev, PCI_BRIDGE_CONTROL, bctl);
 
 	pm_runtime_put(&dev->dev);
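To check from user space whether a bridge still has errors logged after boot, one option is to read the Secondary Status word at offset 0x1e (`PCI_SEC_STATUS`) of its config space and decode the error bits. The sketch below is an illustration, not part of the patch; `read_sec_status()`, `print_sec_status_errors()` and the bit-name table are hypothetical helpers, and the bit names follow the PCI-to-PCI bridge definition of the register. It works on any `FILE *` holding a config-space image, so it can be tested against a temporary file.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Offset of Secondary Status in a Type 1 header (PCI_SEC_STATUS in pci_regs.h) */
#define PCI_SEC_STATUS 0x1e

/* Error bits of the Secondary Status register; all are RW1C. */
static const struct {
	uint16_t bit;
	const char *name;
} sec_status_bits[] = {
	{ 0x0100, "Master Data Parity Error" },
	{ 0x0800, "Signaled Target Abort" },
	{ 0x1000, "Received Target Abort" },
	{ 0x2000, "Received Master Abort" },
	{ 0x4000, "Received System Error" },
	{ 0x8000, "Detected Parity Error" },
};

/* Read the little-endian 16-bit Secondary Status from a config-space image. */
static int read_sec_status(FILE *cfg, uint16_t *val)
{
	unsigned char b[2];

	if (fseek(cfg, PCI_SEC_STATUS, SEEK_SET) != 0 ||
	    fread(b, 1, sizeof b, cfg) != sizeof b)
		return -1;
	*val = (uint16_t)(b[0] | (b[1] << 8));
	return 0;
}

/* Print the name of each logged error bit; return how many were set. */
static int print_sec_status_errors(FILE *out, uint16_t val)
{
	int n = 0;

	assert(out != NULL);
	for (size_t i = 0; i < sizeof sec_status_bits / sizeof sec_status_bits[0]; i++) {
		if (val & sec_status_bits[i].bit) {
			fprintf(out, "%s\n", sec_status_bits[i].name);
			n++;
		}
	}
	return n;
}
```

On a live Linux system the same helpers could be pointed at `/sys/bus/pci/devices/<BDF>/config` (with `<BDF>` a placeholder for the bridge's address) opened in `"rb"` mode; the first 64 bytes of config space, which include this register, are normally readable without root. `lspci -vv` reports the same information on a bridge's "Secondary status:" line, where `<MAbort+` indicates Received Master Abort is set.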