Message ID | 20230418210526.36514-2-Smita.KoralahalliChannabasappa@amd.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Bjorn Helgaas |
Headers | show |
Series | PCI: pciehp: Add support for native AER and DPC handling on async remove | expand |
On 4/18/23 2:05 PM, Smita Koralahalli wrote: > According to Section 6.7.6 of PCIe Base Specification [1], async removal > with DPC and EDR may be unexpected and may result in a Surprise Down error. > This error is just a side effect of hot remove. Most of the time, these > errors will be abstract to the OS as current systems rely on Firmware-First > model for AER and DPC, where the error handling (side effects of async > remove) and other necessary HW sequencing actions is taken care by the FW > and is abstract to the OS. However, FW-First model poses issues while > rolling out updates or fixing bugs as the servers need to be brought down > for firmware updates. > > Add support for async hot-plug with native AER and DPC/EDR. Here, OS is > responsible for handling async add and remove along with handling of AER > and DPC events which are generated as a side-effect of async remove. PCIe spec r6.0, sec 6.7.6 mentions that the async removal can be handled via DPC. So why treat it as a special case here? What do we gain with this patch other than preventing the error recovery process? > > The implementation is as follows: On an async remove a DPC is triggered > along with a Presence Detect State change. Determine it's an async remove > by checking for DPC Trigger Status in DPC Status Register and Surprise Down > Error Status in AER Uncorrected Error Status to be non-zero. If true, treat > the DPC event as a side-effect of async remove, clear the error status > registers and continue with hot-plug tear down routines. If not, follow the > existing routine to handle AER and DPC errors. > > Dmesg before: > > pcieport 0000:00:01.4: DPC: containment event, status:0x1f01 source:0x0000 > pcieport 0000:00:01.4: DPC: unmasked uncorrectable error detected > pcieport 0000:00:01.4: PCIe Bus Error: severity=Uncorrected (Fatal), type=Transaction Layer, (Receiver ID) > pcieport 0000:00:01.4: device [1022:14ab] error status/mask=00000020/04004000 > pcieport 0000:00:01.4: [ 5] SDES (First) > nvme nvme2: frozen state error detected, reset controller > pcieport 0000:00:01.4: DPC: Data Link Layer Link Active not set in 1000 msec > pcieport 0000:00:01.4: AER: subordinate device reset failed > pcieport 0000:00:01.4: AER: device recovery failed > pcieport 0000:00:01.4: pciehp: Slot(16): Link Down > nvme2n1: detected capacity change from 1953525168 to 0 > pci 0000:04:00.0: Removing from iommu group 49 > > Dmesg after: > > pcieport 0000:00:01.4: pciehp: Slot(16): Link Down > nvme1n1: detected capacity change from 1953525168 to 0 > pci 0000:04:00.0: Removing from iommu group 37 > > [1] PCI Express Base Specification Revision 6.0, Dec 16 2021. > https://members.pcisig.com/wg/PCI-SIG/document/16609 > > Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com> > --- > v2: > Indentation is taken care. (Bjorn) > Unrelevant dmesg logs are removed. (Bjorn) > Rephrased commit message, to be clear on native vs FW-First > handling. (Bjorn and Sathyanarayanan) > Prefix changed from pciehp_ to dpc_. (Lukas) > Clearing ARI and AtomicOp Requester are performed as a part of > (de-)enumeration in pciehp_unconfigure_device(). (Lukas) > Changed to clearing all optional capabilities in DEVCTL2. > OS-First -> native. (Sathyanarayanan) > > Please note that, I have provided explanation why I'm not setting the > Surprise Down bit in uncorrectable error mask register in AER. > https://lore.kernel.org/all/fba22d6b-c225-4b44-674b-2c62306135ed@amd.com/ > > Also, while testing I noticed PCI_STATUS and PCI_EXP_DEVSTA will be set > on an async remove and will not be cleared while the device is brought > down. I have included clearing them here in order to mask any kind of > appearance that there was an error and as well duplicating our BIOS > functionality. I can remove if its not necessary. > > On AMD systems we observe Presence Detect State change along with DPC > event on an async remove. Hence, the errors observed are benign on AMD > systems and the device will be brought down normally with PDSC. But the > errors logged might confuse users. > --- > drivers/pci/pcie/dpc.c | 50 ++++++++++++++++++++++++++++++++++++++++++ > 1 file changed, 50 insertions(+) > > diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c > index a5d7c69b764e..78559188b9ac 100644 > --- a/drivers/pci/pcie/dpc.c > +++ b/drivers/pci/pcie/dpc.c > @@ -293,10 +293,60 @@ void dpc_process_error(struct pci_dev *pdev) > } > } > > +static void pci_clear_surpdn_errors(struct pci_dev *pdev) > +{ > + u16 reg16; > + u32 reg32; > + > + pci_read_config_dword(pdev, pdev->dpc_cap + PCI_EXP_DPC_RP_PIO_STATUS, ®32); > + pci_write_config_dword(pdev, pdev->dpc_cap + PCI_EXP_DPC_RP_PIO_STATUS, reg32); > + > + pci_read_config_word(pdev, PCI_STATUS, ®16); > + pci_write_config_word(pdev, PCI_STATUS, reg16); > + > + pcie_capability_read_word(pdev, PCI_EXP_DEVSTA, ®16); > + pcie_capability_write_word(pdev, PCI_EXP_DEVSTA, reg16); > +} > + > +static void dpc_handle_surprise_removal(struct pci_dev *pdev) > +{ > + if (pdev->dpc_rp_extensions && dpc_wait_rp_inactive(pdev)) > + return; > + > + /* > + * According to Section 6.7.6 of the PCIe Base Spec 6.0, since async > + * removal might be unexpected, errors might be reported as a side > + * effect of the event and software should handle them as an expected > + * part of this event. > + */ > + pci_aer_raw_clear_status(pdev); > + pci_clear_surpdn_errors(pdev); > + > + pci_write_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_STATUS, > + PCI_EXP_DPC_STATUS_TRIGGER); > +} > + > +static bool dpc_is_surprise_removal(struct pci_dev *pdev) > +{ > + u16 status; > + > + pci_read_config_word(pdev, pdev->aer_cap + PCI_ERR_UNCOR_STATUS, &status); > + > + if (!(status & PCI_ERR_UNC_SURPDN)) > + return false; > + > + dpc_handle_surprise_removal(pdev); > + > + return true; > +} > + > static irqreturn_t dpc_handler(int irq, void *context) > { > struct pci_dev *pdev = context; > > + if (dpc_is_surprise_removal(pdev)) > + return IRQ_HANDLED; > + > dpc_process_error(pdev); > > /* We configure DPC so it only triggers on ERR_FATAL */
On Wed, May 10, 2023 at 02:42:13PM -0700, Smita Koralahalli wrote: > As far as I can see, async removal solely with DPC is not handled properly > in Linux. The dpc driver can react to a DPC event, attempt reset recovery. But it doesn't do de-enumeration or re-enumeration of subordinate devices. It also doesn't do slot handling (enable/disable Power Controller etc). That's only implemented in the hotplug driver. PCIe r6.0.1 contains appendix I.2 which basically suggests to "use DPC" for async hot-plug but that doesn't really seem to make sense. > On AMD systems, PDSC is triggered along with DPC on a async remove. And this > PDSC event (hotplug handler) will unconfigure and uninitialize the driver > and device. > This is one thing which I wanted clarity on as per my question in v1. > Whether all systems > trigger PDSC on a async remove along with DPC? In principle, yes. Actually the hotplug driver will see both a DLLSC *and* a PDC event and will react to whichever comes first. Experience has shown that the two events may occur in arbitrary order and with significant delays in-between. There are systems which erroneously hardwire Presence Detect to zero. The hotplug driver works even with those. It solely relies on the DLLSC event then, see commit 80696f991424 ("PCI: pciehp: Tolerate Presence Detect hardwired to zero"). > I feel there are two approaches going forward. Since, hotplug handler is > also > triggered with PDSC, rely on it to bring down the device and prevent calling > the > error_recovery process in dpc handler as its not a true error event. I have > taken this > approach. > > Or, don't call the hotplug handler at all and rely on DPC solely to bring > down the device > but here, there should be additional callbacks to unconfigure and > uninitialize the pcie > driver and currently I only see report_slot_reset() being called from > error_recovery() > and I don't think it unconfigures the driver/device. The latter approach doesn't really make sense to me because we'd have to duplicate all the slot handling and device de-/re-enumeration in the dpc driver. Let's try masking Surprise Down Errors first and see how that goes. Thanks, Lukas
On Tue, Apr 18, 2023 at 09:05:25PM +0000, Smita Koralahalli wrote: > According to Section 6.7.6 of PCIe Base Specification [1], async removal > with DPC and EDR may be unexpected and may result in a Surprise Down error. > This error is just a side effect of hot remove. Most of the time, these > errors will be abstract to the OS as current systems rely on Firmware-First > model for AER and DPC, where the error handling (side effects of async > remove) and other necessary HW sequencing actions is taken care by the FW > and is abstract to the OS. However, FW-First model poses issues while > rolling out updates or fixing bugs as the servers need to be brought down > for firmware updates. > > Add support for async hot-plug with native AER and DPC/EDR. Here, OS is > responsible for handling async add and remove along with handling of AER > and DPC events which are generated as a side-effect of async remove. I think you can probably leave out the details about Firmware-First. Pointing to 6.7.6 and the fact that Surprise Down Errors may occur as an expected side-effect of surprise removal is sufficient. They should be treated as such. You also want to point out what you're trying to achieve here, i.e. get rid of irritating log messages and a 1 sec delay while pciehp waits for DPC recovery. > Please note that, I have provided explanation why I'm not setting the > Surprise Down bit in uncorrectable error mask register in AER. > https://lore.kernel.org/all/fba22d6b-c225-4b44-674b-2c62306135ed@amd.com/ Add an explanation to the commit message that masking Surprise Down Errors was explored as an alternative approach, but scrapped due to the odd behavior that masking only avoids the interrupt, but still records an error per PCIe r6.0.1 sec 6.2.3.2.2. That stale error is going to be reported the next time some error other than Surprise Down is handled. > Also, while testing I noticed PCI_STATUS and PCI_EXP_DEVSTA will be set > on an async remove and will not be cleared while the device is brought > down. I have included clearing them here in order to mask any kind of > appearance that there was an error and as well duplicating our BIOS > functionality. I can remove if its not necessary. Which bits are set exactly? Can you constrain the register write to those bits? > +static void dpc_handle_surprise_removal(struct pci_dev *pdev) > +{ > + if (pdev->dpc_rp_extensions && dpc_wait_rp_inactive(pdev)) > + return; Emit an error message here? > + /* > + * According to Section 6.7.6 of the PCIe Base Spec 6.0, since async > + * removal might be unexpected, errors might be reported as a side > + * effect of the event and software should handle them as an expected > + * part of this event. > + */ I'd move that code comment to into dpc_handler() above the "if (dpc_is_surprise_removal(pdev))" check. > + pci_aer_raw_clear_status(pdev); > + pci_clear_surpdn_errors(pdev); > + > + pci_write_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_STATUS, > + PCI_EXP_DPC_STATUS_TRIGGER); > +} Do you need a "wake_up_all(&dpc_completed_waitqueue);" at the end of the function to wake up a pciehp handler waiting for DPC recovery? > +static bool dpc_is_surprise_removal(struct pci_dev *pdev) > +{ > + u16 status; > + > + pci_read_config_word(pdev, pdev->aer_cap + PCI_ERR_UNCOR_STATUS, &status); > + > + if (!(status & PCI_ERR_UNC_SURPDN)) > + return false; > + You need an additional check for pdev->is_hotplug_bridge here. And you need to read PCI_EXP_SLTCAP and check for PCI_EXP_SLTCAP_HPS. Return false if either of them isn't set. > + dpc_handle_surprise_removal(pdev); > + > + return true; > +} A function named "..._is_..." should not have any side effects. So move the dpc_handle_surprise_removal() call down into dpc_handler() before the "return IRQ_HANDLED;" statement. What about ports which support AER but not DPC? Don't you need to amend aer.c in that case? I suppose you don't have hardware without DPC to test that? Thanks, Lukas
Hi, On 5/16/2023 3:10 AM, Lukas Wunner wrote: > On Tue, Apr 18, 2023 at 09:05:25PM +0000, Smita Koralahalli wrote: >> According to Section 6.7.6 of PCIe Base Specification [1], async removal >> with DPC and EDR may be unexpected and may result in a Surprise Down error. >> This error is just a side effect of hot remove. Most of the time, these >> errors will be abstract to the OS as current systems rely on Firmware-First >> model for AER and DPC, where the error handling (side effects of async >> remove) and other necessary HW sequencing actions is taken care by the FW >> and is abstract to the OS. However, FW-First model poses issues while >> rolling out updates or fixing bugs as the servers need to be brought down >> for firmware updates. >> >> Add support for async hot-plug with native AER and DPC/EDR. Here, OS is >> responsible for handling async add and remove along with handling of AER >> and DPC events which are generated as a side-effect of async remove. > > I think you can probably leave out the details about Firmware-First. > Pointing to 6.7.6 and the fact that Surprise Down Errors may occur as > an expected side-effect of surprise removal is sufficient. They should > be treated as such. > > You also want to point out what you're trying to achieve here, i.e. get > rid of irritating log messages and a 1 sec delay while pciehp waits for > DPC recovery. Okay. > > >> Please note that, I have provided explanation why I'm not setting the >> Surprise Down bit in uncorrectable error mask register in AER. >> https://lore.kernel.org/all/fba22d6b-c225-4b44-674b-2c62306135ed@amd.com/ > > Add an explanation to the commit message that masking Surprise Down Errors > was explored as an alternative approach, but scrapped due to the odd > behavior that masking only avoids the interrupt, but still records an > error per PCIe r6.0.1 sec 6.2.3.2.2. That stale error is going to be > reported the next time some error other than Surprise Down is handled. > > Okay. >> Also, while testing I noticed PCI_STATUS and PCI_EXP_DEVSTA will be set >> on an async remove and will not be cleared while the device is brought >> down. I have included clearing them here in order to mask any kind of >> appearance that there was an error and as well duplicating our BIOS >> functionality. I can remove if its not necessary. > > Which bits are set exactly? Can you constrain the register write to > those bits? > Hmm, I was mostly trying to follow the similar approach done for AER. pci_aer_raw_clear_status(), where they clear status registers unconditionally. Also, thought might be better if we are dealing with legacy endpoints or so. I see these bits set in status registers: PCI_ERR_UNCOR_STATUS 0x20 (Surprise Down) PCI_EXP_DPC_RP_PIO_STATUS 0x10000 (Memory Request received URCompletion) PCI_EXP_DEVSTA 0x604 (fatal error detected) > >> +static void dpc_handle_surprise_removal(struct pci_dev *pdev) >> +{ >> + if (pdev->dpc_rp_extensions && dpc_wait_rp_inactive(pdev)) >> + return; > > Emit an error message here? Okay > > >> + /* >> + * According to Section 6.7.6 of the PCIe Base Spec 6.0, since async >> + * removal might be unexpected, errors might be reported as a side >> + * effect of the event and software should handle them as an expected >> + * part of this event. >> + */ > > I'd move that code comment to into dpc_handler() above the > "if (dpc_is_surprise_removal(pdev))" check. Okay. > > >> + pci_aer_raw_clear_status(pdev); >> + pci_clear_surpdn_errors(pdev); >> + >> + pci_write_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_STATUS, >> + PCI_EXP_DPC_STATUS_TRIGGER); >> +} > > Do you need a "wake_up_all(&dpc_completed_waitqueue);" at the end > of the function to wake up a pciehp handler waiting for DPC recovery? I don't think so. The pciehp handler is however getting invoked simultaneously due to PDSC or DLLSC state change right.. Let me know if I'm missing anything here. > > >> +static bool dpc_is_surprise_removal(struct pci_dev *pdev) >> +{ >> + u16 status; >> + >> + pci_read_config_word(pdev, pdev->aer_cap + PCI_ERR_UNCOR_STATUS, &status); >> + >> + if (!(status & PCI_ERR_UNC_SURPDN)) >> + return false; >> + > > You need an additional check for pdev->is_hotplug_bridge here. > > And you need to read PCI_EXP_SLTCAP and check for PCI_EXP_SLTCAP_HPS. > > Return false if either of them isn't set. Return false, if PCI_EXP_SLTCAP isn't set only correct? PCI_EXP_SLTCAP_HPS should be disabled if DPC is enabled. Implementation notes in 6.7.6 says that: "The Hot-Plug Surprise (HPS) mechanism, as indicated by the Hot-Plug Surprise bit in the Slot Capabilities Register being Set, is deprecated for use with async hot-plug. DPC is the recommended mechanism for supporting async hot-plug." Platform FW will disable the SLTCAP_HPS bit at boot time to enable async hotplug on AMD devices. Probably check if SLTCAP_HPS bit is set and return false? > > >> + dpc_handle_surprise_removal(pdev); >> + >> + return true; >> +} > > A function named "..._is_..." should not have any side effects. > So move the dpc_handle_surprise_removal() call down into dpc_handler() > before the "return IRQ_HANDLED;" statement. Okay. > > > What about ports which support AER but not DPC? Don't you need to > amend aer.c in that case? I suppose you don't have hardware without > DPC to test that? Yeah right. Also, if DPC isn't supported we leave HPS bit set and we won't see the DPC event at that time. Thanks, Smita > > Thanks, > > Lukas
On Mon, May 22, 2023 at 03:23:57PM -0700, Smita Koralahalli wrote: > On 5/16/2023 3:10 AM, Lukas Wunner wrote: > > On Tue, Apr 18, 2023 at 09:05:25PM +0000, Smita Koralahalli wrote: > > > Also, while testing I noticed PCI_STATUS and PCI_EXP_DEVSTA will be set > > > on an async remove and will not be cleared while the device is brought > > > down. I have included clearing them here in order to mask any kind of > > > appearance that there was an error and as well duplicating our BIOS > > > functionality. I can remove if its not necessary. > > > > Which bits are set exactly? Can you constrain the register write to > > those bits? > > Hmm, I was mostly trying to follow the similar approach done for AER. > pci_aer_raw_clear_status(), where they clear status registers > unconditionally. Also, thought might be better if we are dealing with legacy > endpoints or so. > > I see these bits set in status registers: > PCI_ERR_UNCOR_STATUS 0x20 (Surprise Down) > PCI_EXP_DPC_RP_PIO_STATUS 0x10000 (Memory Request received URCompletion) > PCI_EXP_DEVSTA 0x604 (fatal error detected) I'd recommend clearing only PCI_EXP_DEVSTA_FED in PCI_EXP_DEVSTA. As for PCI_EXP_DPC_RP_PIO_STATUS, PCIe r6.1 sec 2.9.3 says that during DPC, either UR or CA completions are returned depending on the DPC Completion Control bit in the DPC Control register. The kernel doesn't touch that bit, so it will contain whatever value the BIOS has set. It seems fine to me to just clear all bits in PCI_EXP_DPC_RP_PIO_STATUS, as you've done in your patch. However, the RP PIO Status register is present only in Root Ports that support RP Extensions for DPC, per PCIe r6.1 sec 7.9.14.6. So you need to constrain that to "if (pdev->dpc_rp_extensions)". > > > + pci_aer_raw_clear_status(pdev); > > > + pci_clear_surpdn_errors(pdev); > > > + > > > + pci_write_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_STATUS, > > > + PCI_EXP_DPC_STATUS_TRIGGER); > > > +} > > > > Do you need a "wake_up_all(&dpc_completed_waitqueue);" at the end > > of the function to wake up a pciehp handler waiting for DPC recovery? > > I don't think so. The pciehp handler is however getting invoked > simultaneously due to PDSC or DLLSC state change right.. Let me know if I'm > missing anything here. I think you need to follow the procedure in dpc_reset_link(). That function first waits for the link to go down, in accordance with PCIe r6.1 sec 6.2.11: if (!pcie_wait_for_link(pdev, false)) ... Note that the link should not come back up due to a newly hot-added device until DPC Trigger Status is cleared. The function then waits for the Root Port to quiesce: if (pdev->dpc_rp_extensions && dpc_wait_rp_inactive(pdev)) ... And only then does the function clear DPC Trigger Status. You definitely need to wake_up_all(&dpc_completed_waitqueue) because pciehp may be waiting for DPC Trigger Status to clear. And you need to "clear_bit(PCI_DPC_RECOVERED, &pdev->priv_flags)" before calling wake_up_all(). > > > +static bool dpc_is_surprise_removal(struct pci_dev *pdev) > > > +{ > > > + u16 status; > > > + > > > + pci_read_config_word(pdev, pdev->aer_cap + PCI_ERR_UNCOR_STATUS, &status); > > > + > > > + if (!(status & PCI_ERR_UNC_SURPDN)) > > > + return false; > > > + > > > > You need an additional check for pdev->is_hotplug_bridge here. > > > > And you need to read PCI_EXP_SLTCAP and check for PCI_EXP_SLTCAP_HPS. > > > > Return false if either of them isn't set. > > Return false, if PCI_EXP_SLTCAP isn't set only correct? PCI_EXP_SLTCAP_HPS > should be disabled if DPC is enabled. > > Implementation notes in 6.7.6 says that: > "The Hot-Plug Surprise (HPS) mechanism, as indicated by the Hot-Plug > Surprise bit in the Slot Capabilities Register being Set, is deprecated > for use with async hot-plug. DPC is the recommended mechanism for supporting > async hot-plug." > > Platform FW will disable the SLTCAP_HPS bit at boot time to enable async > hotplug on AMD devices. Huh, is PCI_EXP_SLTCAP_HPS not set on the hotplug port in question? If it's not set, why do you get Surprise Down Errors in the first place? How do you bring down the slot without surprise-removal capability? Via sysfs? > Probably check if SLTCAP_HPS bit is set and return false? Quite the opposite! If it's not set, return false. Thanks, Lukas
On 6/16/2023 10:31 AM, Lukas Wunner wrote: > On Mon, May 22, 2023 at 03:23:57PM -0700, Smita Koralahalli wrote: >> On 5/16/2023 3:10 AM, Lukas Wunner wrote: >>> On Tue, Apr 18, 2023 at 09:05:25PM +0000, Smita Koralahalli wrote: > > I'd recommend clearing only PCI_EXP_DEVSTA_FED in PCI_EXP_DEVSTA. > > As for PCI_EXP_DPC_RP_PIO_STATUS, PCIe r6.1 sec 2.9.3 says that > during DPC, either UR or CA completions are returned depending on > the DPC Completion Control bit in the DPC Control register. > The kernel doesn't touch that bit, so it will contain whatever value > the BIOS has set. It seems fine to me to just clear all bits in > PCI_EXP_DPC_RP_PIO_STATUS, as you've done in your patch. > > However, the RP PIO Status register is present only in Root Ports > that support RP Extensions for DPC, per PCIe r6.1 sec 7.9.14.6. > So you need to constrain that to "if (pdev->dpc_rp_extensions)". > Okay will make changes. > >>>> + pci_aer_raw_clear_status(pdev); >>>> + pci_clear_surpdn_errors(pdev); >>>> + >>>> + pci_write_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_STATUS, >>>> + PCI_EXP_DPC_STATUS_TRIGGER); >>>> +} >>> >>> Do you need a "wake_up_all(&dpc_completed_waitqueue);" at the end >>> of the function to wake up a pciehp handler waiting for DPC recovery? >> >> I don't think so. The pciehp handler is however getting invoked >> simultaneously due to PDSC or DLLSC state change right.. Let me know if I'm >> missing anything here. > > I think you need to follow the procedure in dpc_reset_link(). > > That function first waits for the link to go down, in accordance with > PCIe r6.1 sec 6.2.11: > > if (!pcie_wait_for_link(pdev, false)) > ... > > Note that the link should not come back up due to a newly hot-added > device until DPC Trigger Status is cleared. > > The function then waits for the Root Port to quiesce: > > if (pdev->dpc_rp_extensions && dpc_wait_rp_inactive(pdev)) > ... > > And only then does the function clear DPC Trigger Status. > > You definitely need to wake_up_all(&dpc_completed_waitqueue) because > pciehp may be waiting for DPC Trigger Status to clear. > > And you need to "clear_bit(PCI_DPC_RECOVERED, &pdev->priv_flags)" > before calling wake_up_all(). > > Okay. I did not consider the fact that pciehp handler "may" wait on DPC Trigger Status to be cleared. Because in my case both the handlers were invoked due to their respective bit changes and I did not come across the case where pciehp handler was waiting on DPC to complete. >>>> +static bool dpc_is_surprise_removal(struct pci_dev *pdev) >>>> +{ >>>> + u16 status; >>>> + >>>> + pci_read_config_word(pdev, pdev->aer_cap + PCI_ERR_UNCOR_STATUS, &status); >>>> + >>>> + if (!(status & PCI_ERR_UNC_SURPDN)) >>>> + return false; >>>> + >>> >>> You need an additional check for pdev->is_hotplug_bridge here. >>> >>> And you need to read PCI_EXP_SLTCAP and check for PCI_EXP_SLTCAP_HPS. >>> >>> Return false if either of them isn't set. >> >> Return false, if PCI_EXP_SLTCAP isn't set only correct? PCI_EXP_SLTCAP_HPS >> should be disabled if DPC is enabled. >> >> Implementation notes in 6.7.6 says that: >> "The Hot-Plug Surprise (HPS) mechanism, as indicated by the Hot-Plug >> Surprise bit in the Slot Capabilities Register being Set, is deprecated >> for use with async hot-plug. DPC is the recommended mechanism for supporting >> async hot-plug." >> >> Platform FW will disable the SLTCAP_HPS bit at boot time to enable async >> hotplug on AMD devices. > > Huh, is PCI_EXP_SLTCAP_HPS not set on the hotplug port in question? > > If it's not set, why do you get Surprise Down Errors in the first place? > > How do you bring down the slot without surprise-removal capability? > Via sysfs? > As per SPEC 6.7.6, "Either Downstream Port Containment (DPC) or the Hot-Plug Surprise (HPS) mechanism may be used to support async removal as part of an overall async hot-plug architecture". Also, the implementation notes below, it conveys that HPS is deprecated and DPC is recommended mechanism. More details can be found in Appendix I, I.1 Async Hot-Plug Initial Configuration: ... If DPC capability then, If HPS bit not Set, use DPC Else attempt to Clear HPS bit (§ Section 6.7.4.4 ) If successful, use DPC Else use HPS ... So, this is "likely" a new feature support patch where DPC supports async remove. HPS bit will be disabled by BIOS if DPC is chosen as recommended mechanism to handle async removal. I see the slot is being brought down by PDC or DLLSC event, which is triggered alongside DPC. pciehp_handle_presence_or_link_change() -> pciehp_disable_slot() -> __pciehp_disable_slot() -> remove_board().. But I want to clear one thing, are you implying that PDC or DLLSC shouldn't be triggered when HPS is disabled? Thanks, Smita > >> Probably check if SLTCAP_HPS bit is set and return false? > > Quite the opposite! If it's not set, return false. > > > Thanks, > > Lukas
On Fri, Jun 16, 2023 at 04:30:49PM -0700, Smita Koralahalli wrote: > On 6/16/2023 10:31 AM, Lukas Wunner wrote: > > On Mon, May 22, 2023 at 03:23:57PM -0700, Smita Koralahalli wrote: > > > On 5/16/2023 3:10 AM, Lukas Wunner wrote: > > > > On Tue, Apr 18, 2023 at 09:05:25PM +0000, Smita Koralahalli wrote: > > > > > +static bool dpc_is_surprise_removal(struct pci_dev *pdev) > > > > > +{ > > > > > + u16 status; > > > > > + > > > > > + pci_read_config_word(pdev, pdev->aer_cap + PCI_ERR_UNCOR_STATUS, &status); > > > > > + > > > > > + if (!(status & PCI_ERR_UNC_SURPDN)) > > > > > + return false; > > > > > + > > > > > > > > You need an additional check for pdev->is_hotplug_bridge here. > > > > > > > > And you need to read PCI_EXP_SLTCAP and check for PCI_EXP_SLTCAP_HPS. > > > > > > > > Return false if either of them isn't set. > > > > > > Return false, if PCI_EXP_SLTCAP isn't set only correct? PCI_EXP_SLTCAP_HPS > > > should be disabled if DPC is enabled. > > > > > > Implementation notes in 6.7.6 says that: > > > "The Hot-Plug Surprise (HPS) mechanism, as indicated by the Hot-Plug > > > Surprise bit in the Slot Capabilities Register being Set, is deprecated > > > for use with async hot-plug. DPC is the recommended mechanism for supporting > > > async hot-plug." > > > > > > Platform FW will disable the SLTCAP_HPS bit at boot time to enable async > > > hotplug on AMD devices. > > > > Huh, is PCI_EXP_SLTCAP_HPS not set on the hotplug port in question? > > > > If it's not set, why do you get Surprise Down Errors in the first place? > > > > How do you bring down the slot without surprise-removal capability? > > Via sysfs? > > As per SPEC 6.7.6, "Either Downstream Port Containment (DPC) or the Hot-Plug > Surprise (HPS) mechanism may be used to support async removal as part of an > overall async hot-plug architecture". > > Also, the implementation notes below, it conveys that HPS is deprecated and > DPC is recommended mechanism. More details can be found in Appendix I, I.1 > Async Hot-Plug Initial Configuration: > ... > If DPC capability then, > If HPS bit not Set, use DPC > Else attempt to Clear HPS bit (§ Section 6.7.4.4 ) > If successful, use DPC > Else use HPS > ... > > So, this is "likely" a new feature support patch where DPC supports async > remove. HPS bit will be disabled by BIOS if DPC is chosen as recommended > mechanism to handle async removal. > > I see the slot is being brought down by PDC or DLLSC event, which is > triggered alongside DPC. > > pciehp_handle_presence_or_link_change() -> pciehp_disable_slot() -> > __pciehp_disable_slot() -> remove_board().. > > But I want to clear one thing, are you implying that PDC or DLLSC shouldn't > be triggered when HPS is disabled? Sorry, please ignore my advice to check PCI_EXP_SLTCAP_HPS in dpc_is_surprise_removal(). Instead, only check for pdev->is_hotplug_bridge. The rationale being that Surprise Down errors are par for the course on hotplug ports, but they're an anomaly that must be reported on non-hotplug ports. PCI_EXP_SLTCAP_HPS has no bearing on pciehp's behavior. If there's a hotplug port and hotplug control was granted to the OS, pciehp will bind to the device. pciehp will react to any DLLSC and PDC event. I think the target audience for PCIe r6.1 sec 6.7.6 are hardware designers and the advice given there is to add DPC capability to hotplug ports. That's fine. It doesn't affect how Linux handles async removal. I think the wording in the spec to "use DPC" for async removal is confusing. We don't want to duplicate the code for device de-/re-enumeration and slot bringup / bringdown in the dpc driver. We just continue letting pciehp do that (and thus retain compatibility with older, non-DPC-capable hardware). But we wire up pciehp with the dpc driver to add DPC-awareness for hotplug. One part of the equation was to ignore link changes which occur as a side effect of a DPC event from which we've successfully recovered. That was added with commit a97396c6eb13 ("PCI: pciehp: Ignore Link Down/Up caused by DPC"). The other part of the equation (which you're adding here) is to ignore Surprise Down errors in the dpc driver which occur as a side effect of async removal. I think that makes us compliant with PCIe r6.1 sec 6.7.6, although we're interpreting "use DPC" (or async removal) somewhat liberally. Actually I'd prefer the term "pragmatically" instead of "liberally". ;) We don't want to duplicate code just for the sake of being word-by-word compliant with the spec. If it works in the real world, it's fine. :) Thanks, Lukas
diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c index a5d7c69b764e..78559188b9ac 100644 --- a/drivers/pci/pcie/dpc.c +++ b/drivers/pci/pcie/dpc.c @@ -293,10 +293,60 @@ void dpc_process_error(struct pci_dev *pdev) } } +static void pci_clear_surpdn_errors(struct pci_dev *pdev) +{ + u16 reg16; + u32 reg32; + + pci_read_config_dword(pdev, pdev->dpc_cap + PCI_EXP_DPC_RP_PIO_STATUS, ®32); + pci_write_config_dword(pdev, pdev->dpc_cap + PCI_EXP_DPC_RP_PIO_STATUS, reg32); + + pci_read_config_word(pdev, PCI_STATUS, ®16); + pci_write_config_word(pdev, PCI_STATUS, reg16); + + pcie_capability_read_word(pdev, PCI_EXP_DEVSTA, ®16); + pcie_capability_write_word(pdev, PCI_EXP_DEVSTA, reg16); +} + +static void dpc_handle_surprise_removal(struct pci_dev *pdev) +{ + if (pdev->dpc_rp_extensions && dpc_wait_rp_inactive(pdev)) + return; + + /* + * According to Section 6.7.6 of the PCIe Base Spec 6.0, since async + * removal might be unexpected, errors might be reported as a side + * effect of the event and software should handle them as an expected + * part of this event. + */ + pci_aer_raw_clear_status(pdev); + pci_clear_surpdn_errors(pdev); + + pci_write_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_STATUS, + PCI_EXP_DPC_STATUS_TRIGGER); +} + +static bool dpc_is_surprise_removal(struct pci_dev *pdev) +{ + u16 status; + + pci_read_config_word(pdev, pdev->aer_cap + PCI_ERR_UNCOR_STATUS, &status); + + if (!(status & PCI_ERR_UNC_SURPDN)) + return false; + + dpc_handle_surprise_removal(pdev); + + return true; +} + static irqreturn_t dpc_handler(int irq, void *context) { struct pci_dev *pdev = context; + if (dpc_is_surprise_removal(pdev)) + return IRQ_HANDLED; + dpc_process_error(pdev); /* We configure DPC so it only triggers on ERR_FATAL */
According to Section 6.7.6 of PCIe Base Specification [1], async removal with DPC and EDR may be unexpected and may result in a Surprise Down error. This error is just a side effect of hot remove. Most of the time, these errors will be abstract to the OS as current systems rely on Firmware-First model for AER and DPC, where the error handling (side effects of async remove) and other necessary HW sequencing actions is taken care by the FW and is abstract to the OS. However, FW-First model poses issues while rolling out updates or fixing bugs as the servers need to be brought down for firmware updates. Add support for async hot-plug with native AER and DPC/EDR. Here, OS is responsible for handling async add and remove along with handling of AER and DPC events which are generated as a side-effect of async remove. The implementation is as follows: On an async remove a DPC is triggered along with a Presence Detect State change. Determine it's an async remove by checking for DPC Trigger Status in DPC Status Register and Surprise Down Error Status in AER Uncorrected Error Status to be non-zero. If true, treat the DPC event as a side-effect of async remove, clear the error status registers and continue with hot-plug tear down routines. If not, follow the existing routine to handle AER and DPC errors. Dmesg before: pcieport 0000:00:01.4: DPC: containment event, status:0x1f01 source:0x0000 pcieport 0000:00:01.4: DPC: unmasked uncorrectable error detected pcieport 0000:00:01.4: PCIe Bus Error: severity=Uncorrected (Fatal), type=Transaction Layer, (Receiver ID) pcieport 0000:00:01.4: device [1022:14ab] error status/mask=00000020/04004000 pcieport 0000:00:01.4: [ 5] SDES (First) nvme nvme2: frozen state error detected, reset controller pcieport 0000:00:01.4: DPC: Data Link Layer Link Active not set in 1000 msec pcieport 0000:00:01.4: AER: subordinate device reset failed pcieport 0000:00:01.4: AER: device recovery failed pcieport 0000:00:01.4: pciehp: Slot(16): Link Down nvme2n1: detected capacity change from 1953525168 to 0 pci 0000:04:00.0: Removing from iommu group 49 Dmesg after: pcieport 0000:00:01.4: pciehp: Slot(16): Link Down nvme1n1: detected capacity change from 1953525168 to 0 pci 0000:04:00.0: Removing from iommu group 37 [1] PCI Express Base Specification Revision 6.0, Dec 16 2021. https://members.pcisig.com/wg/PCI-SIG/document/16609 Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com> --- v2: Indentation is taken care. (Bjorn) Unrelevant dmesg logs are removed. (Bjorn) Rephrased commit message, to be clear on native vs FW-First handling. (Bjorn and Sathyanarayanan) Prefix changed from pciehp_ to dpc_. (Lukas) Clearing ARI and AtomicOp Requester are performed as a part of (de-)enumeration in pciehp_unconfigure_device(). (Lukas) Changed to clearing all optional capabilities in DEVCTL2. OS-First -> native. (Sathyanarayanan) Please note that, I have provided explanation why I'm not setting the Surprise Down bit in uncorrectable error mask register in AER. https://lore.kernel.org/all/fba22d6b-c225-4b44-674b-2c62306135ed@amd.com/ Also, while testing I noticed PCI_STATUS and PCI_EXP_DEVSTA will be set on an async remove and will not be cleared while the device is brought down. I have included clearing them here in order to mask any kind of appearance that there was an error and as well duplicating our BIOS functionality. I can remove if its not necessary. On AMD systems we observe Presence Detect State change along with DPC event on an async remove. Hence, the errors observed are benign on AMD systems and the device will be brought down normally with PDSC. But the errors logged might confuse users. --- drivers/pci/pcie/dpc.c | 50 ++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 50 insertions(+)