diff mbox series

[v1,1/4] PCI: dwc: Add quirk to fix hang issue in L2 poll of suspend

Message ID 20250408065221.1941928-2-hongxing.zhu@nxp.com (mailing list archive)
State New
Delegated to: Krzysztof Wilczyński
Headers show
Series Add quirks to proceed PME handshake in DWC PM | expand

Commit Message

Richard Zhu April 8, 2025, 6:52 a.m. UTC
i.MX6QP PCIe is hang in L2 poll during suspend when one endpoint device is
connected, for example the Intel e1000e network card.

Refer to Figure5-1 Link Power Management State Flow Diagram of PCI
Express Base Spec Rev6.0. L0 can be transferred to LDn directly.

It's harmless to let dw_pcie_suspend_noirq() proceed suspend after the
PME_Turn_Off is sent out, whatever the ltssm state is in L2 or L3 on
some PME_Turn_Off handshake broken platforms.

To fix this issue, add one quirk to remove the L2 entry poll and let
dw_pcie_suspend_noirq() proceed directly after PME_Turn_Off is sent out.

Signed-off-by: Richard Zhu <hongxing.zhu@nxp.com>
---
 .../pci/controller/dwc/pcie-designware-host.c | 22 ++++++++++---------
 drivers/pci/controller/dwc/pcie-designware.h  |  4 ++++
 2 files changed, 16 insertions(+), 10 deletions(-)

Comments

Bjorn Helgaas April 8, 2025, 2:58 p.m. UTC | #1
On Tue, Apr 08, 2025 at 02:52:18PM +0800, Richard Zhu wrote:
> i.MX6QP PCIe is hang in L2 poll during suspend when one endpoint device is
> connected, for example the Intel e1000e network card.
> 
> Refer to Figure5-1 Link Power Management State Flow Diagram of PCI
> Express Base Spec Rev6.0. L0 can be transferred to LDn directly.

Please include the section number.  Section numbers are easy to find
because they're in the spec PDF contents, but figures are not.  E.g.,
"PCIe r6.0, sec 5.2, fig 5-1"

> It's harmless to let dw_pcie_suspend_noirq() proceed suspend after the
> PME_Turn_Off is sent out, whatever the ltssm state is in L2 or L3 on
> some PME_Turn_Off handshake broken platforms.

Maybe we don't need to poll for these LTSSM states on *any* platform,
and we could just remove the poll and timeout completely?

If not, we need to explain why it is safe to skip the poll on some
platforms.  "Skipping the poll avoids a hang" is not a sufficient
explanation.

s/ltssm/LTSSM/

> +++ b/drivers/pci/controller/dwc/pcie-designware-host.c
> @@ -947,7 +947,7 @@ int dw_pcie_suspend_noirq(struct dw_pcie *pci)
>  {
>  	u8 offset = dw_pcie_find_capability(pci, PCI_CAP_ID_EXP);
>  	u32 val;
> -	int ret;
> +	int ret = 0;
>  
>  	/*
>  	 * If L1SS is supported, then do not put the link into L2 as some
> @@ -964,15 +964,17 @@ int dw_pcie_suspend_noirq(struct dw_pcie *pci)
>  			return ret;
>  	}
>  
> -	ret = read_poll_timeout(dw_pcie_get_ltssm, val,
> -				val == DW_PCIE_LTSSM_L2_IDLE ||
> -				val <= DW_PCIE_LTSSM_DETECT_WAIT,
> -				PCIE_PME_TO_L2_TIMEOUT_US/10,
> -				PCIE_PME_TO_L2_TIMEOUT_US, false, pci);
> -	if (ret) {
> -		/* Only log message when LTSSM isn't in DETECT or POLL */
> -		dev_err(pci->dev, "Timeout waiting for L2 entry! LTSSM: 0x%x\n", val);
> -		return ret;
> +	if (!dwc_check_quirk(pci, QUIRK_NOL2POLL_IN_PM)) {
> +		ret = read_poll_timeout(dw_pcie_get_ltssm, val,
> +					val == DW_PCIE_LTSSM_L2_IDLE ||
> +					val <= DW_PCIE_LTSSM_DETECT_WAIT,
> +					PCIE_PME_TO_L2_TIMEOUT_US/10,
> +					PCIE_PME_TO_L2_TIMEOUT_US, false, pci);
> +		if (ret) {
> +			/* Only log message when LTSSM isn't in DETECT or POLL */
> +			dev_err(pci->dev, "Timeout waiting for L2 entry! LTSSM: 0x%x\n", val);
> +			return ret;
> +		}
>  	}
>  
>  	/*
> diff --git a/drivers/pci/controller/dwc/pcie-designware.h b/drivers/pci/controller/dwc/pcie-designware.h
> index 56aafdbcdaca..05fe654d7761 100644
> --- a/drivers/pci/controller/dwc/pcie-designware.h
> +++ b/drivers/pci/controller/dwc/pcie-designware.h
> @@ -282,6 +282,9 @@
>  /* Default eDMA LLP memory size */
>  #define DMA_LLP_MEM_SIZE		PAGE_SIZE
>  
> +#define QUIRK_NOL2POLL_IN_PM		BIT(0)
> +#define dwc_check_quirk(pci, val)	(pci->quirk_flag & val)

Maybe just my personal preference, but I don't like things named
"check" because that just means "look at"; it doesn't give any hint
about how to interpret the result of looking at it.

>  struct dw_pcie;
>  struct dw_pcie_rp;
>  struct dw_pcie_ep;
> @@ -491,6 +494,7 @@ struct dw_pcie {
>  	const struct dw_pcie_ops *ops;
>  	u32			version;
>  	u32			type;
> +	u32			quirk_flag;
>  	unsigned long		caps;
>  	int			num_lanes;
>  	int			max_link_speed;
> -- 
> 2.37.1
>
Richard Zhu April 9, 2025, 2:31 a.m. UTC | #2
> -----Original Message-----
> From: Bjorn Helgaas <helgaas@kernel.org>
> Sent: 2025年4月8日 22:59
> To: Hongxing Zhu <hongxing.zhu@nxp.com>
> Cc: jingoohan1@gmail.com; Frank Li <frank.li@nxp.com>;
> l.stach@pengutronix.de; lpieralisi@kernel.org; kw@linux.com;
> manivannan.sadhasivam@linaro.org; robh@kernel.org;
> bhelgaas@google.com; shawnguo@kernel.org; s.hauer@pengutronix.de;
> kernel@pengutronix.de; festevam@gmail.com; linux-pci@vger.kernel.org;
> linux-arm-kernel@lists.infradead.org; imx@lists.linux.dev;
> linux-kernel@vger.kernel.org
> Subject: Re: [PATCH v1 1/4] PCI: dwc: Add quirk to fix hang issue in L2 poll of
> suspend
> 
> On Tue, Apr 08, 2025 at 02:52:18PM +0800, Richard Zhu wrote:
> > i.MX6QP PCIe is hang in L2 poll during suspend when one endpoint
> > device is connected, for example the Intel e1000e network card.
> >
> > Refer to Figure5-1 Link Power Management State Flow Diagram of PCI
> > Express Base Spec Rev6.0. L0 can be transferred to LDn directly.
> 
> Please include the section number.  Section numbers are easy to find
> because they're in the spec PDF contents, but figures are not.  E.g., "PCIe
> r6.0, sec 5.2, fig 5-1"
> 
Okay, would add them later.

> > It's harmless to let dw_pcie_suspend_noirq() proceed suspend after the
> > PME_Turn_Off is sent out, whatever the ltssm state is in L2 or L3 on
> > some PME_Turn_Off handshake broken platforms.
> 
> Maybe we don't need to poll for these LTSSM states on *any* platform, and
> we could just remove the poll and timeout completely?
> 
Yes, I used to suggest remove the L2 poll and timeout in the following
 discussion.
https://lkml.org/lkml/2024/11/18/200
Hi Krishna:
Is it feasible to eliminate the L2 poll and timeout in this context?

> If not, we need to explain why it is safe to skip the poll on some platforms.
> "Skipping the poll avoids a hang" is not a sufficient explanation.
> 
> s/ltssm/LTSSM/
Okay.
> 
> > +++ b/drivers/pci/controller/dwc/pcie-designware-host.c
> > @@ -947,7 +947,7 @@ int dw_pcie_suspend_noirq(struct dw_pcie *pci)  {
> >  	u8 offset = dw_pcie_find_capability(pci, PCI_CAP_ID_EXP);
> >  	u32 val;
> > -	int ret;
> > +	int ret = 0;
> >
> >  	/*
> >  	 * If L1SS is supported, then do not put the link into L2 as some @@
> > -964,15 +964,17 @@ int dw_pcie_suspend_noirq(struct dw_pcie *pci)
> >  			return ret;
> >  	}
> >
> > -	ret = read_poll_timeout(dw_pcie_get_ltssm, val,
> > -				val == DW_PCIE_LTSSM_L2_IDLE ||
> > -				val <= DW_PCIE_LTSSM_DETECT_WAIT,
> > -				PCIE_PME_TO_L2_TIMEOUT_US/10,
> > -				PCIE_PME_TO_L2_TIMEOUT_US, false, pci);
> > -	if (ret) {
> > -		/* Only log message when LTSSM isn't in DETECT or POLL */
> > -		dev_err(pci->dev, "Timeout waiting for L2 entry! LTSSM: 0x%x\n",
> val);
> > -		return ret;
> > +	if (!dwc_check_quirk(pci, QUIRK_NOL2POLL_IN_PM)) {
> > +		ret = read_poll_timeout(dw_pcie_get_ltssm, val,
> > +					val == DW_PCIE_LTSSM_L2_IDLE ||
> > +					val <= DW_PCIE_LTSSM_DETECT_WAIT,
> > +					PCIE_PME_TO_L2_TIMEOUT_US/10,
> > +					PCIE_PME_TO_L2_TIMEOUT_US, false, pci);
> > +		if (ret) {
> > +			/* Only log message when LTSSM isn't in DETECT or POLL */
> > +			dev_err(pci->dev, "Timeout waiting for L2 entry! LTSSM:
> 0x%x\n", val);
> > +			return ret;
> > +		}
> >  	}
> >
> >  	/*
> > diff --git a/drivers/pci/controller/dwc/pcie-designware.h
> > b/drivers/pci/controller/dwc/pcie-designware.h
> > index 56aafdbcdaca..05fe654d7761 100644
> > --- a/drivers/pci/controller/dwc/pcie-designware.h
> > +++ b/drivers/pci/controller/dwc/pcie-designware.h
> > @@ -282,6 +282,9 @@
> >  /* Default eDMA LLP memory size */
> >  #define DMA_LLP_MEM_SIZE		PAGE_SIZE
> >
> > +#define QUIRK_NOL2POLL_IN_PM		BIT(0)
> > +#define dwc_check_quirk(pci, val)	(pci->quirk_flag & val)
> 
> Maybe just my personal preference, but I don't like things named "check"
> because that just means "look at"; it doesn't give any hint about how to
> interpret the result of looking at it.
> 
How about dwc_match_quirk(pci, val) (pci->quirk_flag & val)?

Best Regards
Richard Zhu
> >  struct dw_pcie;
> >  struct dw_pcie_rp;
> >  struct dw_pcie_ep;
> > @@ -491,6 +494,7 @@ struct dw_pcie {
> >  	const struct dw_pcie_ops *ops;
> >  	u32			version;
> >  	u32			type;
> > +	u32			quirk_flag;
> >  	unsigned long		caps;
> >  	int			num_lanes;
> >  	int			max_link_speed;
> > --
> > 2.37.1
> >
diff mbox series

Patch

diff --git a/drivers/pci/controller/dwc/pcie-designware-host.c b/drivers/pci/controller/dwc/pcie-designware-host.c
index ecc33f6789e3..0817df5b8a59 100644
--- a/drivers/pci/controller/dwc/pcie-designware-host.c
+++ b/drivers/pci/controller/dwc/pcie-designware-host.c
@@ -947,7 +947,7 @@  int dw_pcie_suspend_noirq(struct dw_pcie *pci)
 {
 	u8 offset = dw_pcie_find_capability(pci, PCI_CAP_ID_EXP);
 	u32 val;
-	int ret;
+	int ret = 0;
 
 	/*
 	 * If L1SS is supported, then do not put the link into L2 as some
@@ -964,15 +964,17 @@  int dw_pcie_suspend_noirq(struct dw_pcie *pci)
 			return ret;
 	}
 
-	ret = read_poll_timeout(dw_pcie_get_ltssm, val,
-				val == DW_PCIE_LTSSM_L2_IDLE ||
-				val <= DW_PCIE_LTSSM_DETECT_WAIT,
-				PCIE_PME_TO_L2_TIMEOUT_US/10,
-				PCIE_PME_TO_L2_TIMEOUT_US, false, pci);
-	if (ret) {
-		/* Only log message when LTSSM isn't in DETECT or POLL */
-		dev_err(pci->dev, "Timeout waiting for L2 entry! LTSSM: 0x%x\n", val);
-		return ret;
+	if (!dwc_check_quirk(pci, QUIRK_NOL2POLL_IN_PM)) {
+		ret = read_poll_timeout(dw_pcie_get_ltssm, val,
+					val == DW_PCIE_LTSSM_L2_IDLE ||
+					val <= DW_PCIE_LTSSM_DETECT_WAIT,
+					PCIE_PME_TO_L2_TIMEOUT_US/10,
+					PCIE_PME_TO_L2_TIMEOUT_US, false, pci);
+		if (ret) {
+			/* Only log message when LTSSM isn't in DETECT or POLL */
+			dev_err(pci->dev, "Timeout waiting for L2 entry! LTSSM: 0x%x\n", val);
+			return ret;
+		}
 	}
 
 	/*
diff --git a/drivers/pci/controller/dwc/pcie-designware.h b/drivers/pci/controller/dwc/pcie-designware.h
index 56aafdbcdaca..05fe654d7761 100644
--- a/drivers/pci/controller/dwc/pcie-designware.h
+++ b/drivers/pci/controller/dwc/pcie-designware.h
@@ -282,6 +282,9 @@ 
 /* Default eDMA LLP memory size */
 #define DMA_LLP_MEM_SIZE		PAGE_SIZE
 
+#define QUIRK_NOL2POLL_IN_PM		BIT(0)
+#define dwc_check_quirk(pci, val)	(pci->quirk_flag & val)
+
 struct dw_pcie;
 struct dw_pcie_rp;
 struct dw_pcie_ep;
@@ -491,6 +494,7 @@  struct dw_pcie {
 	const struct dw_pcie_ops *ops;
 	u32			version;
 	u32			type;
+	u32			quirk_flag;
 	unsigned long		caps;
 	int			num_lanes;
 	int			max_link_speed;