Message ID | 20221115031115.1666464-1-LeoLiu-oc@zhaoxin.com (mailing list archive) |
---|---|
Series | Parse the PCIe AER and set to relevant registers | expand |
[+cc Sathy, Ming, since they commented on the previous version]

On Tue, Nov 15, 2022 at 11:11:15AM +0800, LeoLiu-oc wrote:
> From: leoliu-oc <leoliu-oc@zhaoxin.com>
>
> According to the sec 18.3.2.4, 18.3.2.5 and 18.3.2.6 in ACPI r6.5, the
> register values from HEST PCI Express AER Structure should be written to
> relevant PCIe Device's AER Capabilities. So the purpose of the patch set
> is to extract register values from HEST PCI Express AER structures and
> program them into AER Capabilities. Refer to the ACPI Spec r6.5 for a more
> detailed description.

I wasn't involved in this part of the ACPI spec, and I don't
understand how this is intended to work.

I see that this series extracts AER mask, severity, and control
information from the ACPI HEST table and uses it to configure PCIe
devices as they are enumerated.

What I don't understand is how this relates to ownership of the AER
capability as negotiated by the _OSC method.  Firmware can configure
the AER capability itself, and if it retains control of the AER
capability, the OS can't write to it (with the exception of clearing
EDR error status), so this wouldn't be necessary.

If the OS owns the AER capability, I assume it gets to decide for
itself how to configure AER, no matter what the ACPI HEST says.

Maybe this is intended for the case where firmware retains AER
ownership but the OS uses native hotplug (pciehp), and this is a way
for the OS to configure new devices as the firmware expects?  But in
that case, we still have the problem that the OS can't write to the
AER capability to do this configuration.

Bjorn
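For readers following the ownership question above, here is a minimal C sketch of the gate Bjorn describes: HEST-supplied values are copied into a device's AER registers only when _OSC granted the OS control of AER. It models that reading of the rules, not the behavior of the posted series. All names (struct aer_regs, struct hest_aer_values, apply_hest_aer) are hypothetical and are not kernel APIs, and the registers are plain struct fields rather than config space.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a device's AER capability registers. */
struct aer_regs {
    uint32_t uncor_mask;     /* Uncorrectable Error Mask */
    uint32_t uncor_severity; /* Uncorrectable Error Severity */
    uint32_t cor_mask;       /* Correctable Error Mask */
    uint32_t adv_cap_ctrl;   /* Advanced Error Capabilities and Control */
};

/* Hypothetical: register values taken from a HEST PCI Express AER structure. */
struct hest_aer_values {
    uint32_t uncor_mask;
    uint32_t uncor_severity;
    uint32_t cor_mask;
    uint32_t adv_cap_ctrl;
};

/*
 * Copy the HEST values into the device registers, but only if the OS
 * was granted AER control by _OSC (an assumption of this sketch).
 * Returns true if the registers were written.
 */
bool apply_hest_aer(struct aer_regs *dev, const struct hest_aer_values *hest,
                    bool os_owns_aer)
{
    assert(dev != NULL && hest != NULL);

    if (!os_owns_aer)
        return false; /* firmware kept AER: leave the registers alone */

    dev->uncor_mask = hest->uncor_mask;
    dev->uncor_severity = hest->uncor_severity;
    dev->cor_mask = hest->cor_mask;
    dev->adv_cap_ctrl = hest->adv_cap_ctrl;
    return true;
}
```

Calling apply_hest_aer() with os_owns_aer set to false leaves the device untouched, which is the case Bjorn points out: under this reading, the OS cannot do the configuration at all when firmware retains AER.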
On 2023/4/8 7:18, Bjorn Helgaas wrote:
> [+cc Sathy, Ming, since they commented on the previous version]
>
> On Tue, Nov 15, 2022 at 11:11:15AM +0800, LeoLiu-oc wrote:
>> From: leoliu-oc <leoliu-oc@zhaoxin.com>
>>
>> According to the sec 18.3.2.4, 18.3.2.5 and 18.3.2.6 in ACPI r6.5, the
>> register values from HEST PCI Express AER Structure should be written to
>> relevant PCIe Device's AER Capabilities. So the purpose of the patch set
>> is to extract register values from HEST PCI Express AER structures and
>> program them into AER Capabilities. Refer to the ACPI Spec r6.5 for a more
>> detailed description.
>
> I wasn't involved in this part of the ACPI spec, and I don't
> understand how this is intended to work.
>
> I see that this series extracts AER mask, severity, and control
> information from the ACPI HEST table and uses it to configure PCIe
> devices as they are enumerated.
>
> What I don't understand is how this relates to ownership of the AER
> capability as negotiated by the _OSC method.  Firmware can configure
> the AER capability itself, and if it retains control of the AER
> capability, the OS can't write to it (with the exception of clearing
> EDR error status), so this wouldn't be necessary.

There is no relationship between the ownership of the AER related
register and the ownership of the AER capability in the OS or
Firmware. The processing here is to initialize the AER related
register, not the AER event. If Firmware is configured with AER
register, it will not be able to handle the runtime hot reset and link
retrain cases in addition to the hotplug case you mentioned below.

>
> If the OS owns the AER capability, I assume it gets to decide for
> itself how to configure AER, no matter what the ACPI HEST says.
>

What information does the OS use to decide how to configure AER? The
ACPI Spec has the following description: PCI Express (PCIe) root ports
may implement PCIe Advanced Error Reporting (AER) support. This
table(HEST) contains information platform firmware supplies to OSPM
for configuring AER support on a given root port. We understand that
HEST stands for user to express expectations.

In the current implementation, the OS already configures a PCIE device
based on _HPP/_HPX method when configuring a PCI device inserted into
a hot-plug slot or initial configuration of a PCI device at system
boot. HEST is just another way to express the desired configuration of
the user.

Yours sincerely,
Leoliu-oc

> Maybe this is intended for the case where firmware retains AER
> ownership but the OS uses native hotplug (pciehp), and this is a way
> for the OS to configure new devices as the firmware expects?  But in
> that case, we still have the problem that the OS can't write to the
> AER capability to do this configuration.
>
> Bjorn
On Wed, Apr 12, 2023 at 05:11:28PM +0800, LeoLiuoc wrote:
> On 2023/4/8 7:18, Bjorn Helgaas wrote:
> > On Tue, Nov 15, 2022 at 11:11:15AM +0800, LeoLiu-oc wrote:
> > > From: leoliu-oc <leoliu-oc@zhaoxin.com>
> > >
> > > According to the sec 18.3.2.4, 18.3.2.5 and 18.3.2.6 in ACPI r6.5, the
> > > register values from HEST PCI Express AER Structure should be written to
> > > relevant PCIe Device's AER Capabilities. So the purpose of the patch set
> > > is to extract register values from HEST PCI Express AER structures and
> > > program them into AER Capabilities. Refer to the ACPI Spec r6.5 for a more
> > > detailed description.
> >
> > I wasn't involved in this part of the ACPI spec, and I don't
> > understand how this is intended to work.
> >
> > I see that this series extracts AER mask, severity, and control
> > information from the ACPI HEST table and uses it to configure PCIe
> > devices as they are enumerated.
> >
> > What I don't understand is how this relates to ownership of the AER
> > capability as negotiated by the _OSC method.  Firmware can configure
> > the AER capability itself, and if it retains control of the AER
> > capability, the OS can't write to it (with the exception of clearing
> > EDR error status), so this wouldn't be necessary.
>
> There is no relationship between the ownership of the AER related
> register and the ownership of the AER capability in the OS or
> Firmware.

I don't understand this; can you say it another way?  "Ownership of
the AER related register" and "ownership of the AER capability" sound
exactly the same to me.

> The processing here is to initialize the AER related register, not
> the AER event. If Firmware is configured with AER register, it will
> not be able to handle the runtime hot reset and link retrain cases
> in addition to the hotplug case you mentioned below.
>
> > If the OS owns the AER capability, I assume it gets to decide for
> > itself how to configure AER, no matter what the ACPI HEST says.
>
> What information does the OS use to decide how to configure AER? The
> ACPI Spec has the following description: PCI Express (PCIe) root
> ports may implement PCIe Advanced Error Reporting (AER) support.
> This table(HEST) contains information platform firmware supplies to
> OSPM for configuring AER support on a given root port. We understand
> that HEST stands for user to express expectations.
>
> In the current implementation, the OS already configures a PCIE
> device based on _HPP/_HPX method when configuring a PCI device
> inserted into a hot-plug slot or initial configuration of a PCI
> device at system boot. HEST is just another way to express the
> desired configuration of the user.

Why was the HEST mechanism added if the functionality is equivalent
to the existing _HPP/_HPX?  There must be something that HEST supplies
that _HPP/_HPX did not.

I think we need some things in the commit log (and short comments in
the code) to help maintain this in the future:

  - What problem does this solve, e.g., is there some bug that happens
    because we lack this functionality?

  - How is this HEST mechanism related to _HPP/_HPX?  What are the
    differences?

  - How is this related to _OSC AER ownership?

I think we ignore _OSC ownership in the existing _HPP/_HPX code, but
that seems like a potential problem.  The PCI Firmware spec (r3.3, sec
4.5.1) is pretty clear:

  If control of this feature was requested and denied or was not
  requested, firmware returns this bit set to 0, and the operating
  system must not modify the Advanced Error Reporting Capability or
  the other error enable/status bits listed above.

> > Maybe this is intended for the case where firmware retains AER
> > ownership but the OS uses native hotplug (pciehp), and this is a way
> > for the OS to configure new devices as the firmware expects?  But in
> > that case, we still have the problem that the OS can't write to the
> > AER capability to do this configuration.
> >
> > Bjorn
On 4/12/23 2:11 AM, LeoLiuoc wrote:
>
>
> On 2023/4/8 7:18, Bjorn Helgaas wrote:
>> [+cc Sathy, Ming, since they commented on the previous version]
>>
>> On Tue, Nov 15, 2022 at 11:11:15AM +0800, LeoLiu-oc wrote:
>>> From: leoliu-oc <leoliu-oc@zhaoxin.com>
>>>
>>> According to the sec 18.3.2.4, 18.3.2.5 and 18.3.2.6 in ACPI r6.5, the
>>> register values from HEST PCI Express AER Structure should be written to
>>> relevant PCIe Device's AER Capabilities. So the purpose of the patch set
>>> is to extract register values from HEST PCI Express AER structures and
>>> program them into AER Capabilities. Refer to the ACPI Spec r6.5 for a more
>>> detailed description.
>>
>> I wasn't involved in this part of the ACPI spec, and I don't
>> understand how this is intended to work.
>>
>> I see that this series extracts AER mask, severity, and control
>> information from the ACPI HEST table and uses it to configure PCIe
>> devices as they are enumerated.
>>
>> What I don't understand is how this relates to ownership of the AER
>> capability as negotiated by the _OSC method.  Firmware can configure
>> the AER capability itself, and if it retains control of the AER
>> capability, the OS can't write to it (with the exception of clearing
>> EDR error status), so this wouldn't be necessary.
>
> There is no relationship between the ownership of the AER related
> register and the ownership of the AER capability in the OS or Firmware.
> The processing here is to initialize the AER related register, not the
> AER event. If Firmware is configured

No, the above statement is not correct. Let's assume that if the AER
feature is owned by firmware and OS arbitrarily configures the AER
registers, does it seem right? If firmware or OS owns a feature, after
_OSC negotiation, it assumed that other component will not touch the
relevant registers. There could be exceptions (like EDR), but it needs
to be documented in the spec.

> with AER register, it will not be able to handle the runtime hot reset
> and link retrain cases in addition to the hotplug case you mentioned
> below.

IIUC, here we are trying to use HEST table to configure AER registers.
Does HEST table override the _OSC based ownership? Can we assume if
HEST table exist, then irrespective who owns the feature (firmware or
OS), OS is allowed to configure the AER registers? Is there a spec
statement confirming the above assumption?

>
>>
>> If the OS owns the AER capability, I assume it gets to decide for
>> itself how to configure AER, no matter what the ACPI HEST says.
>>
>
> What information does the OS use to decide how to configure AER? The
> ACPI Spec has the following description: PCI Express (PCIe) root ports
> may implement PCIe Advanced Error Reporting (AER) support. This
> table(HEST) contains information platform firmware supplies to OSPM
> for configuring AER support on a given root port. We understand that
> HEST stands for user to express expectations.
>
> In the current implementation, the OS already configures a PCIE device
> based on _HPP/_HPX method when configuring a PCI device inserted into
> a hot-plug slot or initial configuration of a PCI device at system
> boot. HEST is just another way to express the desired configuration of
> the user.
>
> Yours sincerely,
> Leoliu-oc
>
>> Maybe this is intended for the case where firmware retains AER
>> ownership but the OS uses native hotplug (pciehp), and this is a way
>> for the OS to configure new devices as the firmware expects?  But in
>> that case, we still have the problem that the OS can't write to the
>> AER capability to do this configuration.
>>
>> Bjorn
>
On 2023/4/13 0:32, Bjorn Helgaas wrote:
> On Wed, Apr 12, 2023 at 05:11:28PM +0800, LeoLiuoc wrote:
>> On 2023/4/8 7:18, Bjorn Helgaas wrote:
>>> On Tue, Nov 15, 2022 at 11:11:15AM +0800, LeoLiu-oc wrote:
>>>> From: leoliu-oc <leoliu-oc@zhaoxin.com>
>>>>
>>>> According to the sec 18.3.2.4, 18.3.2.5 and 18.3.2.6 in ACPI r6.5, the
>>>> register values from HEST PCI Express AER Structure should be written to
>>>> relevant PCIe Device's AER Capabilities. So the purpose of the patch set
>>>> is to extract register values from HEST PCI Express AER structures and
>>>> program them into AER Capabilities. Refer to the ACPI Spec r6.5 for a more
>>>> detailed description.
>>>
>>> I wasn't involved in this part of the ACPI spec, and I don't
>>> understand how this is intended to work.
>>>
>>> I see that this series extracts AER mask, severity, and control
>>> information from the ACPI HEST table and uses it to configure PCIe
>>> devices as they are enumerated.
>>>
>>> What I don't understand is how this relates to ownership of the AER
>>> capability as negotiated by the _OSC method.  Firmware can configure
>>> the AER capability itself, and if it retains control of the AER
>>> capability, the OS can't write to it (with the exception of clearing
>>> EDR error status), so this wouldn't be necessary.
>>
>> There is no relationship between the ownership of the AER related
>> register and the ownership of the AER capability in the OS or
>> Firmware.
>
> I don't understand this; can you say it another way?  "Ownership of
> the AER related register" and "ownership of the AER capability" sound
> exactly the same to me.
>

What I mean is that writing the AER capability registers of the
relevant PCIe device from the HEST PCI Express AER structure has
nothing to do with ownership of AER.

I did not find a direct statement in ACPI Spec r6.5 that allows the OS
to write the HEST AER register values to the AER registers of the
corresponding device without AER control. However, I looked in the
ACPI Spec for a description of the relationship between writing the
AER registers through the _HPP/_HPX method and whether the OS requires
AER control. The relevant statements are as follows:

1. OSPM uses the information returned by _HPX to determine how ①to
   configure PCI Functions that are hot-plugged into the system, ②to
   configure Functions not configured by the platform firmware during
   initial system boot, ③and to configure Functions any time they lose
   configuration space settings (e.g. OSPM issues a Secondary Bus
   Reset/Function Level Reset or Downstream Port Containment is
   triggered).

2. _HPX may return multiple types or Record Settings (each setting in
   a single sub-package.) OSPM is responsible for detecting the type
   of Function and for applying the appropriate settings. OSPM is also
   responsible for detecting the device / port type of the PCI Express
   Function and applying the appropriate settings provided. For
   example, the Secondary Uncorrectable Error Severity and Secondary
   Uncorrectable Error Mask settings of Type 2 record are only
   applicable to PCI Express to PCI-X/PCI Bridge whose device / port
   type is 1000b. Similarly, AER settings are only applicable to hot
   plug PCI Express devices that support the optional AER capability.

3. Note: OSPM may override the settings provided by the _HPX object’s
   Type2 record (PCI Express Settings) or Type3 record (PCI Express
   Descriptor Settings) when OSPM has assumed native control of the
   corresponding feature. For example, if OSPM has assumed ownership
   of AER (via _OSC), OSPM may override AER related settings returned
   by _HPX.

This means that writing the AER register values via _HPX does not
require the OS to gain control of AER. Also, from the usage
description of _HPX, I think ownership of AER determines who decides
the configuration values of the AER registers, rather than who may
write those values. Even if the OS does not have control or ownership
of AER, it should still write the configuration values determined by
the firmware to the AER registers at the request of the firmware.

The HEST AER patch is an effective supplement to the _HPP/_HPX method
when the firmware does not support _HPP/_HPX. So I think the question
of whether the OS needs control of AER to write the information in the
HEST AER structure to the AER registers of the corresponding device is
the same as the question for the _HPX/_HPP method. Therefore, the
ownership of AER is not considered in this patch.

>> The processing here is to initialize the AER related register, not
>> the AER event. If Firmware is configured with AER register, it will
>> not be able to handle the runtime hot reset and link retrain cases
>> in addition to the hotplug case you mentioned below.
>>
>>> If the OS owns the AER capability, I assume it gets to decide for
>>> itself how to configure AER, no matter what the ACPI HEST says.
>>
>> What information does the OS use to decide how to configure AER? The
>> ACPI Spec has the following description: PCI Express (PCIe) root
>> ports may implement PCIe Advanced Error Reporting (AER) support.
>> This table(HEST) contains information platform firmware supplies to
>> OSPM for configuring AER support on a given root port. We understand
>> that HEST stands for user to express expectations.
>>
>> In the current implementation, the OS already configures a PCIE
>> device based on _HPP/_HPX method when configuring a PCI device
>> inserted into a hot-plug slot or initial configuration of a PCI
>> device at system boot. HEST is just another way to express the
>> desired configuration of the user.
>
> Why was the HEST mechanism added if the functionality is equivalent
> to the existing _HPP/_HPX?  There must be something that HEST supplies
> that _HPP/_HPX did not.
>
> I think we need some things in the commit log (and short comments in
> the code) to help maintain this in the future:
>
>   - What problem does this solve, e.g., is there some bug that happens
>     because we lack this functionality?
>
>   - How is this HEST mechanism related to _HPP/_HPX?  What are the
>     differences?
>
>   - How is this related to _OSC AER ownership?
>

Yes, I'll add explanations of these issues to the commit log in the
next version.

> I think we ignore _OSC ownership in the existing _HPP/_HPX code, but
> that seems like a potential problem.  The PCI Firmware spec (r3.3, sec
> 4.5.1) is pretty clear:
>
>   If control of this feature was requested and denied or was not
>   requested, firmware returns this bit set to 0, and the operating
>   system must not modify the Advanced Error Reporting Capability or
>   the other error enable/status bits listed above.
>

The PCI Firmware Spec is not very clear about the relationship between
configuring the AER registers and the ownership of AER. ACPI Spec v6.5
does specify the use of _HPP or _HPX: writing the AER registers
through the _HPP/_HPX method does not require the OS to acquire
control of AER.

Yours sincerely,
LeoLiu-oc

>>> Maybe this is intended for the case where firmware retains AER
>>> ownership but the OS uses native hotplug (pciehp), and this is a way
>>> for the OS to configure new devices as the firmware expects?  But in
>>> that case, we still have the problem that the OS can't write to the
>>> AER capability to do this configuration.
>>>
>>> Bjorn
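To make the _HPX comparison above concrete, the following C sketch shows how an _HPX-style setting could be applied to one AER register and re-applied after the register loses its contents (for example after a Secondary Bus Reset). It assumes, from the ACPI description of the _HPX Type 2 record rather than from anything stated in this thread, that each register setting is given as an AND mask plus an OR mask. The names struct reg_setting, apply_setting and reprogram_after_reset are hypothetical, not kernel functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical AND/OR pair for one 32-bit AER register (assumed layout). */
struct reg_setting {
    uint32_t and_mask; /* bits to keep from the current value */
    uint32_t or_mask;  /* bits to force on */
};

/* Compute the new register value from the current one. */
uint32_t apply_setting(uint32_t current, struct reg_setting s)
{
    return (current & s.and_mask) | s.or_mask;
}

/*
 * Re-apply the setting after a reset returned the register to its
 * default value; this mirrors the "lose configuration space settings"
 * case in the _HPX text quoted above.
 */
void reprogram_after_reset(uint32_t *reg, uint32_t reset_default,
                           struct reg_setting s)
{
    assert(reg != NULL);
    *reg = apply_setting(reset_default, s);
}
```

The HEST structures in the cover letter carry plain register values rather than mask pairs, so a HEST-based path would, under the same assumptions, write the value directly instead of combining it with the current register contents.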
On 2023/4/13 0:40, Sathyanarayanan Kuppuswamy wrote:
>
>
> On 4/12/23 2:11 AM, LeoLiuoc wrote:
>>
>>
>> On 2023/4/8 7:18, Bjorn Helgaas wrote:
>>> [+cc Sathy, Ming, since they commented on the previous version]
>>>
>>> On Tue, Nov 15, 2022 at 11:11:15AM +0800, LeoLiu-oc wrote:
>>>> From: leoliu-oc <leoliu-oc@zhaoxin.com>
>>>>
>>>> According to the sec 18.3.2.4, 18.3.2.5 and 18.3.2.6 in ACPI r6.5, the
>>>> register values from HEST PCI Express AER Structure should be written to
>>>> relevant PCIe Device's AER Capabilities. So the purpose of the patch set
>>>> is to extract register values from HEST PCI Express AER structures and
>>>> program them into AER Capabilities. Refer to the ACPI Spec r6.5 for a more
>>>> detailed description.
>>>
>>> I wasn't involved in this part of the ACPI spec, and I don't
>>> understand how this is intended to work.
>>>
>>> I see that this series extracts AER mask, severity, and control
>>> information from the ACPI HEST table and uses it to configure PCIe
>>> devices as they are enumerated.
>>>
>>> What I don't understand is how this relates to ownership of the AER
>>> capability as negotiated by the _OSC method.  Firmware can configure
>>> the AER capability itself, and if it retains control of the AER
>>> capability, the OS can't write to it (with the exception of clearing
>>> EDR error status), so this wouldn't be necessary.
>>
>> There is no relationship between the ownership of the AER related
>> register and the ownership of the AER capability in the OS or Firmware.
>> The processing here is to initialize the AER related register, not the
>> AER event. If Firmware is configured
>
> No, the above statement is not correct. Let's assume that if the AER
> feature is owned by firmware and OS arbitrarily configures the AER
> registers, does it seem right? If firmware or OS owns a feature, after
> _OSC negotiation, it assumed that other component will not touch the
> relevant registers. There could be exceptions (like EDR), but it needs
> to be documented in the spec.
>

I did not find a direct statement in ACPI Spec r6.5 that allows the OS
to write the HEST AER register values to the AER registers of the
corresponding device without AER control. However, I looked in the
ACPI Spec for a description of the relationship between writing the
AER registers through the _HPP/_HPX method and whether the OS requires
AER control. The relevant statements are as follows:

1. OSPM uses the information returned by _HPX to determine how ①to
   configure PCI Functions that are hot-plugged into the system, ②to
   configure Functions not configured by the platform firmware during
   initial system boot, ③and to configure Functions any time they lose
   configuration space settings (e.g. OSPM issues a Secondary Bus
   Reset/Function Level Reset or Downstream Port Containment is
   triggered).

2. _HPX may return multiple types or Record Settings (each setting in
   a single sub-package.) OSPM is responsible for detecting the type
   of Function and for applying the appropriate settings. OSPM is also
   responsible for detecting the device / port type of the PCI Express
   Function and applying the appropriate settings provided. For
   example, the Secondary Uncorrectable Error Severity and Secondary
   Uncorrectable Error Mask settings of Type 2 record are only
   applicable to PCI Express to PCI-X/PCI Bridge whose device / port
   type is 1000b. Similarly, AER settings are only applicable to hot
   plug PCI Express devices that support the optional AER capability.

3. Note: OSPM may override the settings provided by the _HPX object’s
   Type2 record (PCI Express Settings) or Type3 record (PCI Express
   Descriptor Settings) when OSPM has assumed native control of the
   corresponding feature. For example, if OSPM has assumed ownership
   of AER (via _OSC), OSPM may override AER related settings returned
   by _HPX.

This means that writing the AER register values via _HPX does not
require the OS to gain control of AER. Also, from the usage
description of _HPX, I think ownership of AER determines who decides
the configuration values of the AER registers, rather than who may
write those values. Even if the OS does not have control or ownership
of AER, it should still write the configuration values determined by
the firmware to the AER registers at the request of the firmware.

The HEST AER patch is an effective supplement to the _HPP/_HPX method
when the firmware does not support _HPP/_HPX. So I think the question
of whether the OS needs control of AER to write the information in the
HEST AER structure to the AER registers of the corresponding device is
the same as the question for the _HPX/_HPP method. Therefore, the
ownership of AER is not considered in this patch.

>> with AER register, it will not be able to handle the runtime hot reset
>> and link retrain cases in addition to the hotplug case you mentioned
>> below.
>
> IIUC, here we are trying to use HEST table to configure AER registers.
> Does HEST table override the _OSC based ownership? Can we assume if
> HEST table exist, then irrespective who owns the feature (firmware or
> OS), OS is allowed to configure the AER registers? Is there a spec
> statement confirming the above assumption?
>

I found no statement in ACPI Spec v6.5 that directly supports this
view. The HEST AER patch is an effective supplement to the _HPP/_HPX
method when the firmware does not support _HPP/_HPX, so I think the
question of whether the OS needs control of AER to write the
information in the HEST AER structure to the AER registers of the
corresponding device is the same as the question for the _HPX/_HPP
method. Since writing the AER register values via _HPX does not
require the OS to gain control of AER, the ownership of AER is not
considered in this patch.

Yours sincerely,
Leoliu-oc

>>
>>>
>>> If the OS owns the AER capability, I assume it gets to decide for
>>> itself how to configure AER, no matter what the ACPI HEST says.
>>>
>>
>> What information does the OS use to decide how to configure AER? The
>> ACPI Spec has the following description: PCI Express (PCIe) root ports
>> may implement PCIe Advanced Error Reporting (AER) support. This
>> table(HEST) contains information platform firmware supplies to OSPM
>> for configuring AER support on a given root port. We understand that
>> HEST stands for user to express expectations.
>>
>> In the current implementation, the OS already configures a PCIE device
>> based on _HPP/_HPX method when configuring a PCI device inserted into
>> a hot-plug slot or initial configuration of a PCI device at system
>> boot. HEST is just another way to express the desired configuration of
>> the user.
>>
>> Yours sincerely,
>> Leoliu-oc
>>
>>> Maybe this is intended for the case where firmware retains AER
>>> ownership but the OS uses native hotplug (pciehp), and this is a way
>>> for the OS to configure new devices as the firmware expects?  But in
>>> that case, we still have the problem that the OS can't write to the
>>> AER capability to do this configuration.
>>>
>>> Bjorn
>>
>
From: leoliu-oc <leoliu-oc@zhaoxin.com>

According to the sec 18.3.2.4, 18.3.2.5 and 18.3.2.6 in ACPI r6.5, the
register values from HEST PCI Express AER Structure should be written to
relevant PCIe Device's AER Capabilities. So the purpose of the patch set
is to extract register values from HEST PCI Express AER structures and
program them into AER Capabilities. Refer to the ACPI Spec r6.5 for a more
detailed description.

v2:
- Optimize the description of patches.
- Adjusted the code logic in function apei_hest_parse_aer.

leoliu-oc (5):
  ACPI/APEI: Add apei_hest_parse_aer()
  ACPI/APEI: Remove static from apei_hest_parse()
  ACPI/PCI: Add AER bits #defines for PCIe to PCI/PCI-X Bridge
  ACPI/PCI: Add pci_acpi_program_hest_aer_params()
  ACPI/PCI: Config PCIe devices's AER register

 drivers/acpi/apei/hest.c      | 117 +++++++++++++++++++++++++++++++++-
 drivers/pci/pci-acpi.c        |  92 ++++++++++++++++++++++++++
 drivers/pci/pci.h             |   5 ++
 drivers/pci/probe.c           |   1 +
 include/acpi/actbl1.h         |  69 ++++++++++++++++++++
 include/acpi/apei.h           |   9 +++
 include/uapi/linux/pci_regs.h |   5 ++
 7 files changed, 295 insertions(+), 3 deletions(-)
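As a rough illustration of the "extract" step in the cover letter, the C sketch below picks the HEST AER entry that applies to a given device. It is not the code from this series. It assumes that an entry either names a bus/device/function or carries a GLOBAL flag (taken here to be bit 1 of the flags field) meaning it applies to all devices, and it ignores the PCI segment and the device type (root port, endpoint, bridge) that real matching would also need. Preferring an exact match over a global entry is a design choice of this sketch. struct hest_aer_entry, struct pci_addr and find_hest_entry are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HEST_FLAG_GLOBAL 0x2 /* assumed: bit 1 of the flags field */

/* Hypothetical, simplified view of one HEST PCI Express AER entry. */
struct hest_aer_entry {
    uint8_t flags;
    uint8_t bus;
    uint8_t device;
    uint8_t function;
    uint32_t uncor_mask; /* one of the register values to program */
};

/* Hypothetical device address (segment omitted for brevity). */
struct pci_addr {
    uint8_t bus;
    uint8_t device;
    uint8_t function;
};

/*
 * Return the entry for this device: an exact bus/device/function match
 * if one exists, otherwise the first global entry, otherwise NULL.
 */
const struct hest_aer_entry *find_hest_entry(const struct hest_aer_entry *tbl,
                                             size_t n, struct pci_addr addr)
{
    const struct hest_aer_entry *global = NULL;

    assert(tbl != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        const struct hest_aer_entry *e = &tbl[i];

        if (e->flags & HEST_FLAG_GLOBAL) {
            if (global == NULL)
                global = e; /* remember the fallback entry */
            continue;
        }
        if (e->bus == addr.bus && e->device == addr.device &&
            e->function == addr.function)
            return e;
    }
    return global;
}
```

A caller would look up the entry while enumerating a device and, if one is found, hand its register values to whatever routine programs the AER capability.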