Message ID | 20231223212235.34293-2-mattc@purestorage.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Bjorn Helgaas |
Headers | show |
Series | [1/1] PCI/portdrv: Allow DPC if the OS controls AER natively. | expand |
Hi Matthew, kernel test robot noticed the following build errors: [auto build test ERROR on pci/next] [also build test ERROR on pci/for-linus linus/master v6.7-rc7 next-20231222] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch#_base_tree_information] url: https://github.com/intel-lab-lkp/linux/commits/Matthew-W-Carlis/PCI-portdrv-Allow-DPC-if-the-OS-controls-AER-natively/20231225-154046 base: https://git.kernel.org/pub/scm/linux/kernel/git/pci/pci.git next patch link: https://lore.kernel.org/r/20231223212235.34293-2-mattc%40purestorage.com patch subject: [PATCH 1/1] PCI/portdrv: Allow DPC if the OS controls AER natively. config: arc-randconfig-002-20231225 (https://download.01.org/0day-ci/archive/20231226/202312260105.Gu2Z5jdw-lkp@intel.com/config) compiler: arc-elf-gcc (GCC) 13.2.0 reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20231226/202312260105.Gu2Z5jdw-lkp@intel.com/reproduce) If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot <lkp@intel.com> | Closes: https://lore.kernel.org/oe-kbuild-all/202312260105.Gu2Z5jdw-lkp@intel.com/ All errors (new ones prefixed by >>): drivers/pci/pcie/portdrv.c: In function 'get_port_device_capability': >> drivers/pci/pcie/portdrv.c:272:19: error: 'struct pci_dev' has no member named 'aer_cap'; did you mean 'ats_cap'? 272 | (dev->aer_cap && host->native_aer))) | ^~~~~~~ | ats_cap vim +272 drivers/pci/pcie/portdrv.c 244 245 /* Root Ports and Root Complex Event Collectors may generate PMEs */ 246 if ((pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT || 247 pci_pcie_type(dev) == PCI_EXP_TYPE_RC_EC) && 248 (pcie_ports_native || host->native_pme)) { 249 services |= PCIE_PORT_SERVICE_PME; 250 251 /* 252 * Disable PME interrupt on this port in case it's been enabled 253 * by the BIOS (the PME service driver will enable it when 254 * necessary). 255 */ 256 pcie_pme_interrupt_enable(dev, false); 257 } 258 259 /* 260 * _OSC AER Control is required by the OS & requires OS to control AER, 261 * but _OSC DPC Control isn't required by the OS to control DPC; however 262 * it does require the OS to control DPC. _OSC DPC Control also requres 263 * _OSC EDR Control (Error Disconnect Recovery) (PCI Firmware - DPC ECN rev3.2) 264 * PCI_Express_Base 6.1, 6.2.11 Determination of DPC Control recommends 265 * platform fw or OS always link control of DPC to AER. 266 * 267 * With dpc-native, allow Linux to use DPC even if it doesn't have 268 * permission to use AER. 269 */ 270 if (pci_find_ext_capability(dev, PCI_EXT_CAP_ID_DPC) && 271 pci_aer_available() && (pcie_ports_dpc_native || > 272 (dev->aer_cap && host->native_aer))) 273 services |= PCIE_PORT_SERVICE_DPC; 274 275 if (pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM || 276 pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT) { 277 u32 linkcap; 278 279 pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &linkcap); 280 if (linkcap & PCI_EXP_LNKCAP_LBNC) 281 services |= PCIE_PORT_SERVICE_BWNOTIF; 282 } 283 284 return services; 285 } 286
Hi Matthew, kernel test robot noticed the following build errors: [auto build test ERROR on pci/next] [also build test ERROR on pci/for-linus linus/master v6.7-rc7 next-20231222] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch#_base_tree_information] url: https://github.com/intel-lab-lkp/linux/commits/Matthew-W-Carlis/PCI-portdrv-Allow-DPC-if-the-OS-controls-AER-natively/20231225-154046 base: https://git.kernel.org/pub/scm/linux/kernel/git/pci/pci.git next patch link: https://lore.kernel.org/r/20231223212235.34293-2-mattc%40purestorage.com patch subject: [PATCH 1/1] PCI/portdrv: Allow DPC if the OS controls AER natively. config: arm-defconfig (https://download.01.org/0day-ci/archive/20231226/202312260431.ZXppQZ1I-lkp@intel.com/config) compiler: clang version 14.0.6 (https://github.com/llvm/llvm-project.git f28c006a5895fc0e329fe15fead81e37457cb1d1) reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20231226/202312260431.ZXppQZ1I-lkp@intel.com/reproduce) If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot <lkp@intel.com> | Closes: https://lore.kernel.org/oe-kbuild-all/202312260431.ZXppQZ1I-lkp@intel.com/ All errors (new ones prefixed by >>): >> drivers/pci/pcie/portdrv.c:272:12: error: no member named 'aer_cap' in 'struct pci_dev'; did you mean 'acs_cap'? (dev->aer_cap && host->native_aer))) ^~~~~~~ acs_cap include/linux/pci.h:519:7: note: 'acs_cap' declared here u16 acs_cap; /* ACS Capability offset */ ^ 1 error generated. vim +272 drivers/pci/pcie/portdrv.c 244 245 /* Root Ports and Root Complex Event Collectors may generate PMEs */ 246 if ((pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT || 247 pci_pcie_type(dev) == PCI_EXP_TYPE_RC_EC) && 248 (pcie_ports_native || host->native_pme)) { 249 services |= PCIE_PORT_SERVICE_PME; 250 251 /* 252 * Disable PME interrupt on this port in case it's been enabled 253 * by the BIOS (the PME service driver will enable it when 254 * necessary). 255 */ 256 pcie_pme_interrupt_enable(dev, false); 257 } 258 259 /* 260 * _OSC AER Control is required by the OS & requires OS to control AER, 261 * but _OSC DPC Control isn't required by the OS to control DPC; however 262 * it does require the OS to control DPC. _OSC DPC Control also requres 263 * _OSC EDR Control (Error Disconnect Recovery) (PCI Firmware - DPC ECN rev3.2) 264 * PCI_Express_Base 6.1, 6.2.11 Determination of DPC Control recommends 265 * platform fw or OS always link control of DPC to AER. 266 * 267 * With dpc-native, allow Linux to use DPC even if it doesn't have 268 * permission to use AER. 269 */ 270 if (pci_find_ext_capability(dev, PCI_EXT_CAP_ID_DPC) && 271 pci_aer_available() && (pcie_ports_dpc_native || > 272 (dev->aer_cap && host->native_aer))) 273 services |= PCIE_PORT_SERVICE_DPC; 274 275 if (pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM || 276 pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT) { 277 u32 linkcap; 278 279 pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &linkcap); 280 if (linkcap & PCI_EXP_LNKCAP_LBNC) 281 services |= PCIE_PORT_SERVICE_BWNOTIF; 282 } 283 284 return services; 285 } 286
I guess we don't build aer_cap into pci_dev without CONFIG_PCIEAER... Wonder if I should just drop the check for aer_cap since its likely not necessary anyways.
[+cc Keith, Lukas] Hi Matthew, thanks for your work and the patch. On Sat, Dec 23, 2023 at 02:22:35PM -0700, Matthew W Carlis wrote: > This change ensures the kernel will use DPC on a supporting device if > the kernel will also control AER on the Root Ports & RCECs. > > The rules around controlling DPC/AER are somewhat clear in PCIe/ACPI > specifications. It is recommended to always link control of both to the > same entity, being the OS or system firmware. The kernel wants to be > flexible by first having a default policy, but also by providing command > line parameters to enable us all to do what we want even if it might > violate the recommendations. > > The following mentioned patch brought the kernels default behavior > more in line with the specification around AER, but changed its behavior > around DPC on PCIe Downstream Switch Ports; preventing the kernel from > controlling DPC on them unless using pcie_ports=dpc-native. > * "PCI/portdrv: Allow AER service only for Root Ports & RCECs" > After this change the behavior around using DPC on PCIe switch ports > and Root Ports should be as it was before. > > Signed-off-by: Matthew W Carlis <mattc@purestorage.com> > --- > drivers/pci/pcie/portdrv.c | 11 +++++++++-- > 1 file changed, 9 insertions(+), 2 deletions(-) > > diff --git a/drivers/pci/pcie/portdrv.c b/drivers/pci/pcie/portdrv.c > index 14a4b89a3b83..8e023aa97672 100644 > --- a/drivers/pci/pcie/portdrv.c > +++ b/drivers/pci/pcie/portdrv.c > @@ -257,12 +257,19 @@ static int get_port_device_capability(struct pci_dev *dev) > } > > /* > + * _OSC AER Control is required by the OS & requires OS to control AER, > + * but _OSC DPC Control isn't required by the OS to control DPC; however > + * it does require the OS to control DPC. _OSC DPC Control also requres > + * _OSC EDR Control (Error Disconnect Recovery) (PCI Firmware - DPC ECN rev3.2) > + * PCI_Express_Base 6.1, 6.2.11 Determination of DPC Control recommends > + * platform fw or OS always link control of DPC to AER. > + * > * With dpc-native, allow Linux to use DPC even if it doesn't have > * permission to use AER. > */ > if (pci_find_ext_capability(dev, PCI_EXT_CAP_ID_DPC) && > - pci_aer_available() && > - (pcie_ports_dpc_native || (services & PCIE_PORT_SERVICE_AER))) > + pci_aer_available() && (pcie_ports_dpc_native || > + (dev->aer_cap && host->native_aer))) > services |= PCIE_PORT_SERVICE_DPC; This is easier to read if we retain the original line breaks, i.e., - (pcie_ports_dpc_native || (services & PCIE_PORT_SERVICE_AER))) + (pcie_ports_dpc_native || (dev->aer_cap && host->native_aer))) Prior to d8d2b65a940b, we set PCIE_PORT_SERVICE_AER for a device whenever it had an AER Capability. If it had a DPC Capability, we also set PCIE_PORT_SERVICE_DPC so DPC would work on it. After d8d2b65a940b, we only set PCIE_PORT_SERVICE_AER for Root Ports and RCECs because the AER driver only binds to those devices. We no longer set PCIE_PORT_SERVICE_DPC for Switch Downstream Ports because they don't have PCIE_PORT_SERVICE_AER set. The result is that you need "pcie_ports=dpc-native" to make DPC work on those devices when you didn't need it before d8d2b65a940b. That's a regression that we need to fix: #regzbot introduced: d8d2b65a940b ("PCI/portdrv: Allow AER service only for Root Ports & RCECs") _OSC directly supports negotiation of DPC ownership, and I think we should pay attention to what it tell us. We already request DPC control and set native_dpc accordingly, but we don't use it here; currently we only look at it in the unrelated pciehp_ist() path. Can you try the patch below and see if it resolves the problem? I don't think we need to complicate this by trying to enforce the AER/DPC dependencies in the OS. The firmware spec already requires platforms to either retain ownership of both AER and DPC, or grant ownership of both to the OS. diff --git a/drivers/pci/pcie/portdrv.c b/drivers/pci/pcie/portdrv.c index 14a4b89a3b83..423dadd6727e 100644 --- a/drivers/pci/pcie/portdrv.c +++ b/drivers/pci/pcie/portdrv.c @@ -262,7 +262,7 @@ static int get_port_device_capability(struct pci_dev *dev) */ if (pci_find_ext_capability(dev, PCI_EXT_CAP_ID_DPC) && pci_aer_available() && - (pcie_ports_dpc_native || (services & PCIE_PORT_SERVICE_AER))) + (pcie_ports_dpc_native || host->native_dpc)) services |= PCIE_PORT_SERVICE_DPC; if (pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM ||
On 12/28/2023 1:23 PM, Bjorn Helgaas wrote: > [+cc Keith, Lukas] > > Hi Matthew, thanks for your work and the patch. > > On Sat, Dec 23, 2023 at 02:22:35PM -0700, Matthew W Carlis wrote: >> This change ensures the kernel will use DPC on a supporting device if >> the kernel will also control AER on the Root Ports & RCECs. >> >> The rules around controlling DPC/AER are somewhat clear in PCIe/ACPI >> specifications. It is recommended to always link control of both to the >> same entity, being the OS or system firmware. The kernel wants to be >> flexible by first having a default policy, but also by providing command >> line parameters to enable us all to do what we want even if it might >> violate the recommendations. >> >> The following mentioned patch brought the kernels default behavior >> more in line with the specification around AER, but changed its behavior >> around DPC on PCIe Downstream Switch Ports; preventing the kernel from >> controlling DPC on them unless using pcie_ports=dpc-native. >> * "PCI/portdrv: Allow AER service only for Root Ports & RCECs" >> After this change the behavior around using DPC on PCIe switch ports >> and Root Ports should be as it was before. >> >> Signed-off-by: Matthew W Carlis <mattc@purestorage.com> >> --- >> drivers/pci/pcie/portdrv.c | 11 +++++++++-- >> 1 file changed, 9 insertions(+), 2 deletions(-) >> >> diff --git a/drivers/pci/pcie/portdrv.c b/drivers/pci/pcie/portdrv.c >> index 14a4b89a3b83..8e023aa97672 100644 >> --- a/drivers/pci/pcie/portdrv.c >> +++ b/drivers/pci/pcie/portdrv.c >> @@ -257,12 +257,19 @@ static int get_port_device_capability(struct pci_dev *dev) >> } >> >> /* >> + * _OSC AER Control is required by the OS & requires OS to control AER, >> + * but _OSC DPC Control isn't required by the OS to control DPC; however >> + * it does require the OS to control DPC. _OSC DPC Control also requres >> + * _OSC EDR Control (Error Disconnect Recovery) (PCI Firmware - DPC ECN rev3.2) >> + * PCI_Express_Base 6.1, 6.2.11 Determination of DPC Control recommends >> + * platform fw or OS always link control of DPC to AER. >> + * >> * With dpc-native, allow Linux to use DPC even if it doesn't have >> * permission to use AER. >> */ >> if (pci_find_ext_capability(dev, PCI_EXT_CAP_ID_DPC) && >> - pci_aer_available() && >> - (pcie_ports_dpc_native || (services & PCIE_PORT_SERVICE_AER))) >> + pci_aer_available() && (pcie_ports_dpc_native || >> + (dev->aer_cap && host->native_aer))) >> services |= PCIE_PORT_SERVICE_DPC; > > This is easier to read if we retain the original line breaks, i.e., > > - (pcie_ports_dpc_native || (services & PCIE_PORT_SERVICE_AER))) > + (pcie_ports_dpc_native || (dev->aer_cap && host->native_aer))) > > Prior to d8d2b65a940b, we set PCIE_PORT_SERVICE_AER for a device > whenever it had an AER Capability. If it had a DPC Capability, we > also set PCIE_PORT_SERVICE_DPC so DPC would work on it. > > After d8d2b65a940b, we only set PCIE_PORT_SERVICE_AER for Root Ports > and RCECs because the AER driver only binds to those devices. We no > longer set PCIE_PORT_SERVICE_DPC for Switch Downstream Ports because > they don't have PCIE_PORT_SERVICE_AER set. > > The result is that you need "pcie_ports=dpc-native" to make DPC work > on those devices when you didn't need it before d8d2b65a940b. > > That's a regression that we need to fix: > #regzbot introduced: d8d2b65a940b ("PCI/portdrv: Allow AER service only for Root Ports & RCECs") > > _OSC directly supports negotiation of DPC ownership, and I think we > should pay attention to what it tell us. We already request DPC > control and set native_dpc accordingly, but we don't use it here; > currently we only look at it in the unrelated pciehp_ist() path. > > Can you try the patch below and see if it resolves the problem? > > I don't think we need to complicate this by trying to enforce the > AER/DPC dependencies in the OS. The firmware spec already requires > platforms to either retain ownership of both AER and DPC, or grant > ownership of both to the OS. Change looks fine to me. Once concern is, what if we are dealing with a buggy firmware which give DPC native control, but retains AER? Do you think it makes sense to have a sanity check to make sure this does not happen? > > diff --git a/drivers/pci/pcie/portdrv.c b/drivers/pci/pcie/portdrv.c > index 14a4b89a3b83..423dadd6727e 100644 > --- a/drivers/pci/pcie/portdrv.c > +++ b/drivers/pci/pcie/portdrv.c > @@ -262,7 +262,7 @@ static int get_port_device_capability(struct pci_dev *dev) > */ > if (pci_find_ext_capability(dev, PCI_EXT_CAP_ID_DPC) && > pci_aer_available() && > - (pcie_ports_dpc_native || (services & PCIE_PORT_SERVICE_AER))) > + (pcie_ports_dpc_native || host->native_dpc)) > services |= PCIE_PORT_SERVICE_DPC; > > if (pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM ||
diff --git a/drivers/pci/pcie/portdrv.c b/drivers/pci/pcie/portdrv.c index 14a4b89a3b83..8e023aa97672 100644 --- a/drivers/pci/pcie/portdrv.c +++ b/drivers/pci/pcie/portdrv.c @@ -257,12 +257,19 @@ static int get_port_device_capability(struct pci_dev *dev) } /* + * _OSC AER Control is required by the OS & requires OS to control AER, + * but _OSC DPC Control isn't required by the OS to control DPC; however + * it does require the OS to control DPC. _OSC DPC Control also requres + * _OSC EDR Control (Error Disconnect Recovery) (PCI Firmware - DPC ECN rev3.2) + * PCI_Express_Base 6.1, 6.2.11 Determination of DPC Control recommends + * platform fw or OS always link control of DPC to AER. + * * With dpc-native, allow Linux to use DPC even if it doesn't have * permission to use AER. */ if (pci_find_ext_capability(dev, PCI_EXT_CAP_ID_DPC) && - pci_aer_available() && - (pcie_ports_dpc_native || (services & PCIE_PORT_SERVICE_AER))) + pci_aer_available() && (pcie_ports_dpc_native || + (dev->aer_cap && host->native_aer))) services |= PCIE_PORT_SERVICE_DPC; if (pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM ||
This change ensures the kernel will use DPC on a supporting device if the kernel will also control AER on the Root Ports & RCECs. The rules around controlling DPC/AER are somewhat clear in PCIe/ACPI specifications. It is recommended to always link control of both to the same entity, being the OS or system firmware. The kernel wants to be flexible by first having a default policy, but also by providing command line parameters to enable us all to do what we want even if it might violate the recommendations. The following mentioned patch brought the kernels default behavior more in line with the specification around AER, but changed its behavior around DPC on PCIe Downstream Switch Ports; preventing the kernel from controlling DPC on them unless using pcie_ports=dpc-native. * "PCI/portdrv: Allow AER service only for Root Ports & RCECs" After this change the behavior around using DPC on PCIe switch ports and Root Ports should be as it was before. Signed-off-by: Matthew W Carlis <mattc@purestorage.com> --- drivers/pci/pcie/portdrv.c | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-)