[v3] PCI: Mask replay timer timeout of GL975x's rootport

Message ID 20240327024509.1071189-1-kai.heng.feng@canonical.com (mailing list archive)
State Accepted
Commit eeee3b5e6d0bf331befa57b4dcb079f827bcd829
Series [v3] PCI: Mask replay timer timeout of GL975x's rootport

Commit Message

Kai-Heng Feng March 27, 2024, 2:45 a.m. UTC
Any access to GL975x's config space, such as `lspci -vv` or
pci_save_state(), can still trigger the Replay Timer Timeout error even
after commit 015c9cbcf0ad ("mmc: sdhci-pci-gli: GL9750: Mask the replay
timer timeout of AER"), albeit less frequently.

The resulting AER interrupt can prevent the system from suspending or
flood the kernel log. So mask the Replay Timer Timeout in the port
upstream of GL975x to resolve the issue.

Cc: Victor Shih <victor.shih@genesyslogic.com.tw>
Cc: Ben Chuang <benchuanggli@gmail.com>
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
---
 drivers/pci/quirks.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

Comments

Bjorn Helgaas March 27, 2024, 3:23 p.m. UTC | #1
On Wed, Mar 27, 2024 at 10:45:09AM +0800, Kai-Heng Feng wrote:
> Any access to GL975x's config space, such as `lspci -vv` or
> pci_save_state(), can still trigger the Replay Timer Timeout error even
> after commit 015c9cbcf0ad ("mmc: sdhci-pci-gli: GL9750: Mask the replay
> timer timeout of AER"), albeit less frequently.
> 
> The resulting AER interrupt can prevent the system from suspending or
> flood the kernel log. So mask the Replay Timer Timeout in the port
> upstream of GL975x to resolve the issue.
> 
> Cc: Victor Shih <victor.shih@genesyslogic.com.tw>
> Cc: Ben Chuang <benchuanggli@gmail.com>
> Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>

Applied as below to pci/aer for v6.10, thanks!

commit eeee3b5e6d0b ("PCI: Mask Replay Timer Timeout errors for Genesys GL975x SD host controller")
Author: Kai-Heng Feng <kai.heng.feng@canonical.com>
Date:   Wed Mar 27 10:45:09 2024 +0800

    PCI: Mask Replay Timer Timeout errors for Genesys GL975x SD host controller
    
    Due to a hardware defect in GL975x, config accesses when ASPM is enabled
    frequently cause Replay Timer Timeouts in the Port leading to the device.
    
    These are Correctable Errors, so the Downstream Port logs it in its AER
    Correctable Error Status register and, when the error is not masked, sends
    an ERR_COR message upstream.  The message terminates at a Root Port, which
    may generate an AER interrupt so the OS can log it.
    
    The Correctable Error logging is an annoyance but not a major issue itself.
    But when the AER interrupt happens during suspend, it can prevent the
    system from suspending.
    
    015c9cbcf0ad ("mmc: sdhci-pci-gli: GL9750: Mask the replay timer timeout of
    AER") masked these errors in the GL975x itself.
    
    Mask these errors in the Port leading to GL975x as well.  Note that Replay
    Timer Timeouts will still be logged in the AER Correctable Error Status
    register, but they will not cause AER interrupts.
    
    Link: https://lore.kernel.org/r/20240327024509.1071189-1-kai.heng.feng@canonical.com
    Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
    [bhelgaas: commit log, update dmesg note]
    Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
    Cc: Victor Shih <victor.shih@genesyslogic.com.tw>
    Cc: Ben Chuang <benchuanggli@gmail.com>


diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index bf4833221816..5cb0f7fae3b8 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -6261,3 +6261,23 @@ static void pci_fixup_d3cold_delay_1sec(struct pci_dev *pdev)
 	pdev->d3cold_delay = 1000;
 }
 DECLARE_PCI_FIXUP_FINAL(0x5555, 0x0004, pci_fixup_d3cold_delay_1sec);
+
+#ifdef CONFIG_PCIEAER
+static void pci_mask_replay_timer_timeout(struct pci_dev *pdev)
+{
+	struct pci_dev *parent = pci_upstream_bridge(pdev);
+	u32 val;
+
+	if (!parent || !parent->aer_cap)
+		return;
+
+	pci_info(parent, "mask Replay Timer Timeout Correctable Errors due to %s hardware defect",
+		 pci_name(pdev));
+
+	pci_read_config_dword(parent, parent->aer_cap + PCI_ERR_COR_MASK, &val);
+	val |= PCI_ERR_COR_REP_TIMER;
+	pci_write_config_dword(parent, parent->aer_cap + PCI_ERR_COR_MASK, val);
+}
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_GLI, 0x9750, pci_mask_replay_timer_timeout);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_GLI, 0x9755, pci_mask_replay_timer_timeout);
+#endif
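
The commit log above notes that a masked Replay Timer Timeout is still
logged in the Correctable Error Status register but no longer produces an
ERR_COR message. A minimal userspace sketch of that behaviour follows. It is
not kernel code: `struct fake_port`, `mask_replay_timer_timeout()` and
`report_correctable()` are hypothetical names invented for illustration, and
the port is modelled as two plain integers rather than real config space.
The bit values are assumed to match include/uapi/linux/pci_regs.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Correctable error bits, assumed to match include/uapi/linux/pci_regs.h. */
#define PCI_ERR_COR_BAD_TLP	0x00000040u
#define PCI_ERR_COR_REP_TIMER	0x00001000u

/* Hypothetical stand-in for a port's AER correctable error registers. */
struct fake_port {
	uint32_t cor_status;	/* Correctable Error Status */
	uint32_t cor_mask;	/* Correctable Error Mask */
};

/* Same read-modify-write as the quirk, applied to the model. */
static void mask_replay_timer_timeout(struct fake_port *port)
{
	uint32_t val;

	assert(port != NULL);
	val = port->cor_mask;
	val |= PCI_ERR_COR_REP_TIMER;
	port->cor_mask = val;
}

/*
 * Record a correctable error. The status bit is always set; the return
 * value is 1 only when the error is unmasked, i.e. when the port would
 * send ERR_COR upstream.
 */
static int report_correctable(struct fake_port *port, uint32_t err)
{
	assert(port != NULL);
	port->cor_status |= err;
	return (err & ~port->cor_mask) != 0;
}
```

In this model, other correctable errors such as Bad TLP keep reporting
after the quirk runs, because only the Replay Timer Timeout bit is ORed into
the mask.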
Patch

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index ea476252280a..7ad7141e1c54 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -6223,3 +6223,22 @@  static void pci_fixup_d3cold_delay_1sec(struct pci_dev *pdev)
 	pdev->d3cold_delay = 1000;
 }
 DECLARE_PCI_FIXUP_FINAL(0x5555, 0x0004, pci_fixup_d3cold_delay_1sec);
+
+#ifdef CONFIG_PCIEAER
+static void pci_mask_replay_timer_timeout(struct pci_dev *pdev)
+{
+	struct pci_dev *parent = pci_upstream_bridge(pdev);
+	u32 val;
+
+	if (!parent || !parent->aer_cap)
+		return;
+
+	pci_info(pdev, "Mask AER due to hardware defect");
+
+	pci_read_config_dword(parent, parent->aer_cap + PCI_ERR_COR_MASK, &val);
+	val |= PCI_ERR_COR_REP_TIMER;
+	pci_write_config_dword(parent, parent->aer_cap + PCI_ERR_COR_MASK, val);
+}
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_GLI, 0x9750, pci_mask_replay_timer_timeout);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_GLI, 0x9755, pci_mask_replay_timer_timeout);
+#endif
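
One way to confirm that the quirk took effect is to inspect the parent
port's config space and test the Replay Timer Timeout bit in the AER
Correctable Error Mask register. The sketch below works on a raw
little-endian config-space buffer, for example a dump of the port's sysfs
`config` file (how the buffer is obtained is an assumption and is not shown).
The helper names `cfg_read32()`, `find_ext_cap()` and `replay_timer_masked()`
are hypothetical; the offsets and IDs are assumed to match
include/uapi/linux/pci_regs.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PCI_CFG_SPACE_SIZE	256	/* extended capabilities start here */
#define PCI_CFG_SPACE_EXP_SIZE	4096
#define PCI_EXT_CAP_ID_ERR	0x01	/* Advanced Error Reporting */
#define PCI_ERR_COR_MASK	0x14	/* offset within the AER capability */
#define PCI_ERR_COR_REP_TIMER	0x00001000u

/* Read a little-endian dword from a config-space dump. */
static uint32_t cfg_read32(const uint8_t *cfg, size_t off)
{
	return (uint32_t)cfg[off] | (uint32_t)cfg[off + 1] << 8 |
	       (uint32_t)cfg[off + 2] << 16 | (uint32_t)cfg[off + 3] << 24;
}

/* Walk the extended capability list; return the offset or 0 if absent. */
static size_t find_ext_cap(const uint8_t *cfg, size_t len, uint16_t cap_id)
{
	size_t pos = PCI_CFG_SPACE_SIZE;
	/* Bound the walk so a malformed list cannot loop forever. */
	int ttl = (PCI_CFG_SPACE_EXP_SIZE - PCI_CFG_SPACE_SIZE) / 8;

	assert(cfg != NULL);
	while (ttl-- > 0 && pos >= PCI_CFG_SPACE_SIZE && pos + 4 <= len) {
		uint32_t hdr = cfg_read32(cfg, pos);

		/* Simplification: treat an empty or all-ones header as the end. */
		if (hdr == 0 || hdr == 0xffffffffu)
			break;
		if ((hdr & 0xffff) == cap_id)
			return pos;
		pos = (hdr >> 20) & 0xffc;	/* next capability offset */
	}
	return 0;
}

/* 1 if Replay Timer Timeout is masked, 0 if not, -1 if AER is not found. */
static int replay_timer_masked(const uint8_t *cfg, size_t len)
{
	size_t aer = find_ext_cap(cfg, len, PCI_EXT_CAP_ID_ERR);

	if (!aer || aer + PCI_ERR_COR_MASK + 4 > len)
		return -1;
	return (cfg_read32(cfg, aer + PCI_ERR_COR_MASK) &
		PCI_ERR_COR_REP_TIMER) != 0;
}
```

A result of -1 corresponds to the `!parent->aer_cap` early return in the
patch: without an AER capability there is no mask register to check.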