diff mbox series

[1/1] PCI/portdrv: Allow DPC if the OS controls AER natively.

Message ID 20231223212235.34293-2-mattc@purestorage.com (mailing list archive)
State Superseded
Delegated to: Bjorn Helgaas
Headers show
Series [1/1] PCI/portdrv: Allow DPC if the OS controls AER natively. | expand

Commit Message

Matthew W Carlis Dec. 23, 2023, 9:22 p.m. UTC
This change ensures the kernel will use DPC on a supporting device if
the kernel will also control AER on the Root Ports & RCECs.

The rules around controlling DPC/AER are somewhat clear in PCIe/ACPI
specifications. It is recommended to always link control of both to the
same entity, being the OS or system firmware. The kernel wants to be
flexible by first having a default policy, but also by providing command
line parameters to enable us all to do what we want even if it might
violate the recommendations.

The following mentioned patch brought the kernels default behavior
more in line with the specification around AER, but changed its behavior
around DPC on PCIe Downstream Switch Ports; preventing the kernel from
controlling DPC on them unless using pcie_ports=dpc-native.
    * "PCI/portdrv: Allow AER service only for Root Ports & RCECs"
After this change the behavior around using DPC on PCIe switch ports
and Root Ports should be as it was before.

Signed-off-by: Matthew W Carlis <mattc@purestorage.com>
---
 drivers/pci/pcie/portdrv.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

Comments

kernel test robot Dec. 25, 2023, 5:53 p.m. UTC | #1
Hi Matthew,

kernel test robot noticed the following build errors:

[auto build test ERROR on pci/next]
[also build test ERROR on pci/for-linus linus/master v6.7-rc7 next-20231222]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Matthew-W-Carlis/PCI-portdrv-Allow-DPC-if-the-OS-controls-AER-natively/20231225-154046
base:   https://git.kernel.org/pub/scm/linux/kernel/git/pci/pci.git next
patch link:    https://lore.kernel.org/r/20231223212235.34293-2-mattc%40purestorage.com
patch subject: [PATCH 1/1] PCI/portdrv: Allow DPC if the OS controls AER natively.
config: arc-randconfig-002-20231225 (https://download.01.org/0day-ci/archive/20231226/202312260105.Gu2Z5jdw-lkp@intel.com/config)
compiler: arc-elf-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20231226/202312260105.Gu2Z5jdw-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202312260105.Gu2Z5jdw-lkp@intel.com/

All errors (new ones prefixed by >>):

   drivers/pci/pcie/portdrv.c: In function 'get_port_device_capability':
>> drivers/pci/pcie/portdrv.c:272:19: error: 'struct pci_dev' has no member named 'aer_cap'; did you mean 'ats_cap'?
     272 |             (dev->aer_cap && host->native_aer)))
         |                   ^~~~~~~
         |                   ats_cap


vim +272 drivers/pci/pcie/portdrv.c

   244	
   245		/* Root Ports and Root Complex Event Collectors may generate PMEs */
   246		if ((pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT ||
   247		     pci_pcie_type(dev) == PCI_EXP_TYPE_RC_EC) &&
   248		    (pcie_ports_native || host->native_pme)) {
   249			services |= PCIE_PORT_SERVICE_PME;
   250	
   251			/*
   252			 * Disable PME interrupt on this port in case it's been enabled
   253			 * by the BIOS (the PME service driver will enable it when
   254			 * necessary).
   255			 */
   256			pcie_pme_interrupt_enable(dev, false);
   257		}
   258	
   259		/*
   260		 * _OSC AER Control is required by the OS & requires OS to control AER,
   261		 * but _OSC DPC Control isn't required by the OS to control DPC; however
   262		 * it does require the OS to control DPC. _OSC DPC Control also requres
   263		 * _OSC EDR Control (Error Disconnect Recovery) (PCI Firmware - DPC ECN rev3.2)
   264		 * PCI_Express_Base 6.1, 6.2.11 Determination of DPC Control recommends
   265		 * platform fw or OS always link control of DPC to AER.
   266		 *
   267		 * With dpc-native, allow Linux to use DPC even if it doesn't have
   268		 * permission to use AER.
   269		 */
   270		if (pci_find_ext_capability(dev, PCI_EXT_CAP_ID_DPC) &&
   271		    pci_aer_available() && (pcie_ports_dpc_native ||
 > 272		    (dev->aer_cap && host->native_aer)))
   273			services |= PCIE_PORT_SERVICE_DPC;
   274	
   275		if (pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM ||
   276		    pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT) {
   277			u32 linkcap;
   278	
   279			pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &linkcap);
   280			if (linkcap & PCI_EXP_LNKCAP_LBNC)
   281				services |= PCIE_PORT_SERVICE_BWNOTIF;
   282		}
   283	
   284		return services;
   285	}
   286
kernel test robot Dec. 25, 2023, 8:36 p.m. UTC | #2
Hi Matthew,

kernel test robot noticed the following build errors:

[auto build test ERROR on pci/next]
[also build test ERROR on pci/for-linus linus/master v6.7-rc7 next-20231222]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Matthew-W-Carlis/PCI-portdrv-Allow-DPC-if-the-OS-controls-AER-natively/20231225-154046
base:   https://git.kernel.org/pub/scm/linux/kernel/git/pci/pci.git next
patch link:    https://lore.kernel.org/r/20231223212235.34293-2-mattc%40purestorage.com
patch subject: [PATCH 1/1] PCI/portdrv: Allow DPC if the OS controls AER natively.
config: arm-defconfig (https://download.01.org/0day-ci/archive/20231226/202312260431.ZXppQZ1I-lkp@intel.com/config)
compiler: clang version 14.0.6 (https://github.com/llvm/llvm-project.git f28c006a5895fc0e329fe15fead81e37457cb1d1)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20231226/202312260431.ZXppQZ1I-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202312260431.ZXppQZ1I-lkp@intel.com/

All errors (new ones prefixed by >>):

>> drivers/pci/pcie/portdrv.c:272:12: error: no member named 'aer_cap' in 'struct pci_dev'; did you mean 'acs_cap'?
               (dev->aer_cap && host->native_aer)))
                     ^~~~~~~
                     acs_cap
   include/linux/pci.h:519:7: note: 'acs_cap' declared here
           u16             acs_cap;        /* ACS Capability offset */
                           ^
   1 error generated.


vim +272 drivers/pci/pcie/portdrv.c

   244	
   245		/* Root Ports and Root Complex Event Collectors may generate PMEs */
   246		if ((pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT ||
   247		     pci_pcie_type(dev) == PCI_EXP_TYPE_RC_EC) &&
   248		    (pcie_ports_native || host->native_pme)) {
   249			services |= PCIE_PORT_SERVICE_PME;
   250	
   251			/*
   252			 * Disable PME interrupt on this port in case it's been enabled
   253			 * by the BIOS (the PME service driver will enable it when
   254			 * necessary).
   255			 */
   256			pcie_pme_interrupt_enable(dev, false);
   257		}
   258	
   259		/*
   260		 * _OSC AER Control is required by the OS & requires OS to control AER,
   261		 * but _OSC DPC Control isn't required by the OS to control DPC; however
   262		 * it does require the OS to control DPC. _OSC DPC Control also requres
   263		 * _OSC EDR Control (Error Disconnect Recovery) (PCI Firmware - DPC ECN rev3.2)
   264		 * PCI_Express_Base 6.1, 6.2.11 Determination of DPC Control recommends
   265		 * platform fw or OS always link control of DPC to AER.
   266		 *
   267		 * With dpc-native, allow Linux to use DPC even if it doesn't have
   268		 * permission to use AER.
   269		 */
   270		if (pci_find_ext_capability(dev, PCI_EXT_CAP_ID_DPC) &&
   271		    pci_aer_available() && (pcie_ports_dpc_native ||
 > 272		    (dev->aer_cap && host->native_aer)))
   273			services |= PCIE_PORT_SERVICE_DPC;
   274	
   275		if (pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM ||
   276		    pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT) {
   277			u32 linkcap;
   278	
   279			pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &linkcap);
   280			if (linkcap & PCI_EXP_LNKCAP_LBNC)
   281				services |= PCIE_PORT_SERVICE_BWNOTIF;
   282		}
   283	
   284		return services;
   285	}
   286
Matthew W Carlis Dec. 26, 2023, 12:02 a.m. UTC | #3
I guess we don't build aer_cap into pci_dev without CONFIG_PCIEAER...
Wonder if I should just drop the check for aer_cap since its likely
not necessary anyways.
Bjorn Helgaas Dec. 28, 2023, 9:23 p.m. UTC | #4
[+cc Keith, Lukas]

Hi Matthew, thanks for your work and the patch.

On Sat, Dec 23, 2023 at 02:22:35PM -0700, Matthew W Carlis wrote:
> This change ensures the kernel will use DPC on a supporting device if
> the kernel will also control AER on the Root Ports & RCECs.
> 
> The rules around controlling DPC/AER are somewhat clear in PCIe/ACPI
> specifications. It is recommended to always link control of both to the
> same entity, being the OS or system firmware. The kernel wants to be
> flexible by first having a default policy, but also by providing command
> line parameters to enable us all to do what we want even if it might
> violate the recommendations.
> 
> The following mentioned patch brought the kernels default behavior
> more in line with the specification around AER, but changed its behavior
> around DPC on PCIe Downstream Switch Ports; preventing the kernel from
> controlling DPC on them unless using pcie_ports=dpc-native.
>     * "PCI/portdrv: Allow AER service only for Root Ports & RCECs"
> After this change the behavior around using DPC on PCIe switch ports
> and Root Ports should be as it was before.
> 
> Signed-off-by: Matthew W Carlis <mattc@purestorage.com>
> ---
>  drivers/pci/pcie/portdrv.c | 11 +++++++++--
>  1 file changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/pci/pcie/portdrv.c b/drivers/pci/pcie/portdrv.c
> index 14a4b89a3b83..8e023aa97672 100644
> --- a/drivers/pci/pcie/portdrv.c
> +++ b/drivers/pci/pcie/portdrv.c
> @@ -257,12 +257,19 @@ static int get_port_device_capability(struct pci_dev *dev)
>  	}
>  
>  	/*
> +	 * _OSC AER Control is required by the OS & requires OS to control AER,
> +	 * but _OSC DPC Control isn't required by the OS to control DPC; however
> +	 * it does require the OS to control DPC. _OSC DPC Control also requres
> +	 * _OSC EDR Control (Error Disconnect Recovery) (PCI Firmware - DPC ECN rev3.2)
> +	 * PCI_Express_Base 6.1, 6.2.11 Determination of DPC Control recommends
> +	 * platform fw or OS always link control of DPC to AER.
> +	 *
>  	 * With dpc-native, allow Linux to use DPC even if it doesn't have
>  	 * permission to use AER.
>  	 */
>  	if (pci_find_ext_capability(dev, PCI_EXT_CAP_ID_DPC) &&
> -	    pci_aer_available() &&
> -	    (pcie_ports_dpc_native || (services & PCIE_PORT_SERVICE_AER)))
> +	    pci_aer_available() && (pcie_ports_dpc_native ||
> +	    (dev->aer_cap && host->native_aer)))
>  		services |= PCIE_PORT_SERVICE_DPC;

This is easier to read if we retain the original line breaks, i.e.,

  -     (pcie_ports_dpc_native || (services & PCIE_PORT_SERVICE_AER)))
  +     (pcie_ports_dpc_native || (dev->aer_cap && host->native_aer)))

Prior to d8d2b65a940b, we set PCIE_PORT_SERVICE_AER for a device
whenever it had an AER Capability.  If it had a DPC Capability, we
also set PCIE_PORT_SERVICE_DPC so DPC would work on it.

After d8d2b65a940b, we only set PCIE_PORT_SERVICE_AER for Root Ports
and RCECs because the AER driver only binds to those devices.  We no
longer set PCIE_PORT_SERVICE_DPC for Switch Downstream Ports because
they don't have PCIE_PORT_SERVICE_AER set.

The result is that you need "pcie_ports=dpc-native" to make DPC work
on those devices when you didn't need it before d8d2b65a940b.

That's a regression that we need to fix:
#regzbot introduced: d8d2b65a940b ("PCI/portdrv: Allow AER service only for Root Ports & RCECs")

_OSC directly supports negotiation of DPC ownership, and I think we
should pay attention to what it tell us.  We already request DPC
control and set native_dpc accordingly, but we don't use it here;
currently we only look at it in the unrelated pciehp_ist() path.

Can you try the patch below and see if it resolves the problem?

I don't think we need to complicate this by trying to enforce the
AER/DPC dependencies in the OS.  The firmware spec already requires
platforms to either retain ownership of both AER and DPC, or grant
ownership of both to the OS.

diff --git a/drivers/pci/pcie/portdrv.c b/drivers/pci/pcie/portdrv.c
index 14a4b89a3b83..423dadd6727e 100644
--- a/drivers/pci/pcie/portdrv.c
+++ b/drivers/pci/pcie/portdrv.c
@@ -262,7 +262,7 @@ static int get_port_device_capability(struct pci_dev *dev)
 	 */
 	if (pci_find_ext_capability(dev, PCI_EXT_CAP_ID_DPC) &&
 	    pci_aer_available() &&
-	    (pcie_ports_dpc_native || (services & PCIE_PORT_SERVICE_AER)))
+	    (pcie_ports_dpc_native || host->native_dpc))
 		services |= PCIE_PORT_SERVICE_DPC;
 
 	if (pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM ||
Kuppuswamy Sathyanarayanan Jan. 2, 2024, 3:41 p.m. UTC | #5
On 12/28/2023 1:23 PM, Bjorn Helgaas wrote:
> [+cc Keith, Lukas]
> 
> Hi Matthew, thanks for your work and the patch.
> 
> On Sat, Dec 23, 2023 at 02:22:35PM -0700, Matthew W Carlis wrote:
>> This change ensures the kernel will use DPC on a supporting device if
>> the kernel will also control AER on the Root Ports & RCECs.
>>
>> The rules around controlling DPC/AER are somewhat clear in PCIe/ACPI
>> specifications. It is recommended to always link control of both to the
>> same entity, being the OS or system firmware. The kernel wants to be
>> flexible by first having a default policy, but also by providing command
>> line parameters to enable us all to do what we want even if it might
>> violate the recommendations.
>>
>> The following mentioned patch brought the kernels default behavior
>> more in line with the specification around AER, but changed its behavior
>> around DPC on PCIe Downstream Switch Ports; preventing the kernel from
>> controlling DPC on them unless using pcie_ports=dpc-native.
>>     * "PCI/portdrv: Allow AER service only for Root Ports & RCECs"
>> After this change the behavior around using DPC on PCIe switch ports
>> and Root Ports should be as it was before.
>>
>> Signed-off-by: Matthew W Carlis <mattc@purestorage.com>
>> ---
>>  drivers/pci/pcie/portdrv.c | 11 +++++++++--
>>  1 file changed, 9 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/pci/pcie/portdrv.c b/drivers/pci/pcie/portdrv.c
>> index 14a4b89a3b83..8e023aa97672 100644
>> --- a/drivers/pci/pcie/portdrv.c
>> +++ b/drivers/pci/pcie/portdrv.c
>> @@ -257,12 +257,19 @@ static int get_port_device_capability(struct pci_dev *dev)
>>  	}
>>  
>>  	/*
>> +	 * _OSC AER Control is required by the OS & requires OS to control AER,
>> +	 * but _OSC DPC Control isn't required by the OS to control DPC; however
>> +	 * it does require the OS to control DPC. _OSC DPC Control also requres
>> +	 * _OSC EDR Control (Error Disconnect Recovery) (PCI Firmware - DPC ECN rev3.2)
>> +	 * PCI_Express_Base 6.1, 6.2.11 Determination of DPC Control recommends
>> +	 * platform fw or OS always link control of DPC to AER.
>> +	 *
>>  	 * With dpc-native, allow Linux to use DPC even if it doesn't have
>>  	 * permission to use AER.
>>  	 */
>>  	if (pci_find_ext_capability(dev, PCI_EXT_CAP_ID_DPC) &&
>> -	    pci_aer_available() &&
>> -	    (pcie_ports_dpc_native || (services & PCIE_PORT_SERVICE_AER)))
>> +	    pci_aer_available() && (pcie_ports_dpc_native ||
>> +	    (dev->aer_cap && host->native_aer)))
>>  		services |= PCIE_PORT_SERVICE_DPC;
> 
> This is easier to read if we retain the original line breaks, i.e.,
> 
>   -     (pcie_ports_dpc_native || (services & PCIE_PORT_SERVICE_AER)))
>   +     (pcie_ports_dpc_native || (dev->aer_cap && host->native_aer)))
> 
> Prior to d8d2b65a940b, we set PCIE_PORT_SERVICE_AER for a device
> whenever it had an AER Capability.  If it had a DPC Capability, we
> also set PCIE_PORT_SERVICE_DPC so DPC would work on it.
> 
> After d8d2b65a940b, we only set PCIE_PORT_SERVICE_AER for Root Ports
> and RCECs because the AER driver only binds to those devices.  We no
> longer set PCIE_PORT_SERVICE_DPC for Switch Downstream Ports because
> they don't have PCIE_PORT_SERVICE_AER set.
> 
> The result is that you need "pcie_ports=dpc-native" to make DPC work
> on those devices when you didn't need it before d8d2b65a940b.
> 
> That's a regression that we need to fix:
> #regzbot introduced: d8d2b65a940b ("PCI/portdrv: Allow AER service only for Root Ports & RCECs")
> 
> _OSC directly supports negotiation of DPC ownership, and I think we
> should pay attention to what it tell us.  We already request DPC
> control and set native_dpc accordingly, but we don't use it here;
> currently we only look at it in the unrelated pciehp_ist() path.
> 
> Can you try the patch below and see if it resolves the problem?
> 
> I don't think we need to complicate this by trying to enforce the
> AER/DPC dependencies in the OS.  The firmware spec already requires
> platforms to either retain ownership of both AER and DPC, or grant
> ownership of both to the OS.

Change looks fine to me. Once concern is, what if we are dealing with
a buggy firmware which give DPC native control, but retains AER? Do
you think it makes sense to have a sanity check to make sure this
does not happen?

> 
> diff --git a/drivers/pci/pcie/portdrv.c b/drivers/pci/pcie/portdrv.c
> index 14a4b89a3b83..423dadd6727e 100644
> --- a/drivers/pci/pcie/portdrv.c
> +++ b/drivers/pci/pcie/portdrv.c
> @@ -262,7 +262,7 @@ static int get_port_device_capability(struct pci_dev *dev)
>  	 */
>  	if (pci_find_ext_capability(dev, PCI_EXT_CAP_ID_DPC) &&
>  	    pci_aer_available() &&
> -	    (pcie_ports_dpc_native || (services & PCIE_PORT_SERVICE_AER)))
> +	    (pcie_ports_dpc_native || host->native_dpc))
>  		services |= PCIE_PORT_SERVICE_DPC;
>  
>  	if (pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM ||
diff mbox series

Patch

diff --git a/drivers/pci/pcie/portdrv.c b/drivers/pci/pcie/portdrv.c
index 14a4b89a3b83..8e023aa97672 100644
--- a/drivers/pci/pcie/portdrv.c
+++ b/drivers/pci/pcie/portdrv.c
@@ -257,12 +257,19 @@  static int get_port_device_capability(struct pci_dev *dev)
 	}
 
 	/*
+	 * _OSC AER Control is required by the OS & requires OS to control AER,
+	 * but _OSC DPC Control isn't required by the OS to control DPC; however
+	 * it does require the OS to control DPC. _OSC DPC Control also requres
+	 * _OSC EDR Control (Error Disconnect Recovery) (PCI Firmware - DPC ECN rev3.2)
+	 * PCI_Express_Base 6.1, 6.2.11 Determination of DPC Control recommends
+	 * platform fw or OS always link control of DPC to AER.
+	 *
 	 * With dpc-native, allow Linux to use DPC even if it doesn't have
 	 * permission to use AER.
 	 */
 	if (pci_find_ext_capability(dev, PCI_EXT_CAP_ID_DPC) &&
-	    pci_aer_available() &&
-	    (pcie_ports_dpc_native || (services & PCIE_PORT_SERVICE_AER)))
+	    pci_aer_available() && (pcie_ports_dpc_native ||
+	    (dev->aer_cap && host->native_aer)))
 		services |= PCIE_PORT_SERVICE_DPC;
 
 	if (pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM ||