Message ID | 1721628913-1449-1-git-send-email-hongxing.zhu@nxp.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v2] PCI: dwc: Fix resume failure if no EP is connected at some platforms | expand |
Hello, > The dw_pcie_suspend_noirq() function currently returns success directly > if no endpoint (EP) device is connected. However, on some platforms, power > loss occurs during suspend, causing dw_resume() to do nothing in this case. > This results in a system halt because the DWC controller is not initialized > after power-on during resume. > > Change call to deinit() in suspend and init() at resume regardless of > whether there are EP device connections or not. It is not harmful to > perform deinit() and init() again for the no power-off case, and it keeps > the code simple and consistent in logic. Applied to controller/dwc, thank you! [01/01] PCI: dwc: Fix resume failure if no EP is connected at some platforms https://git.kernel.org/pci/pci/c/ec008c493c25 Krzysztof
On Mon, Jul 22, 2024 at 02:15:13PM +0800, Richard Zhu wrote: > The dw_pcie_suspend_noirq() function currently returns success directly > if no endpoint (EP) device is connected. However, on some platforms, power > loss occurs during suspend, causing dw_resume() to do nothing in this case. > This results in a system halt because the DWC controller is not initialized > after power-on during resume. dw_resume() doesn't exist. What function did you mean? System halt? In dw_pcie_resume_noirq()? What causes the halt? A NULL pointer dereference? A CPU hang because a read of some controller register never completes? Feels a little hand-wavy. Another comment below. > Change call to deinit() in suspend and init() at resume regardless of > whether there are EP device connections or not. It is not harmful to > perform deinit() and init() again for the no power-off case, and it keeps > the code simple and consistent in logic. > > Fixes: 4774faf854f5 ("PCI: dwc: Implement generic suspend/resume functionality") > Signed-off-by: Richard Zhu <hongxing.zhu@nxp.com> > Reviewed-by: Frank Li <Frank.Li@nxp.com> > --- > .../pci/controller/dwc/pcie-designware-host.c | 30 +++++++++---------- > 1 file changed, 15 insertions(+), 15 deletions(-) > > diff --git a/drivers/pci/controller/dwc/pcie-designware-host.c b/drivers/pci/controller/dwc/pcie-designware-host.c > index a0822d5371bc5..cb8c3c2bcc790 100644 > --- a/drivers/pci/controller/dwc/pcie-designware-host.c > +++ b/drivers/pci/controller/dwc/pcie-designware-host.c > @@ -933,23 +933,23 @@ int dw_pcie_suspend_noirq(struct dw_pcie *pci) > if (dw_pcie_readw_dbi(pci, offset + PCI_EXP_LNKCTL) & PCI_EXP_LNKCTL_ASPM_L1) > return 0; > > - if (dw_pcie_get_ltssm(pci) <= DW_PCIE_LTSSM_DETECT_ACT) > - return 0; > - > - if (pci->pp.ops->pme_turn_off) > - pci->pp.ops->pme_turn_off(&pci->pp); > - else > - ret = dw_pcie_pme_turn_off(pci); > + if (dw_pcie_get_ltssm(pci) > DW_PCIE_LTSSM_DETECT_ACT) { > + /* Only send out PME_TURN_OFF when PCIE link is up */ > + if (pci->pp.ops->pme_turn_off) > + pci->pp.ops->pme_turn_off(&pci->pp); > + else > + ret = dw_pcie_pme_turn_off(pci); This looks possibly racy since the link can go down at any point. > - if (ret) > - return ret; > + if (ret) > + return ret; > > - ret = read_poll_timeout(dw_pcie_get_ltssm, val, val == DW_PCIE_LTSSM_L2_IDLE, > - PCIE_PME_TO_L2_TIMEOUT_US/10, > - PCIE_PME_TO_L2_TIMEOUT_US, false, pci); > - if (ret) { > - dev_err(pci->dev, "Timeout waiting for L2 entry! LTSSM: 0x%x\n", val); > - return ret; > + ret = read_poll_timeout(dw_pcie_get_ltssm, val, val == DW_PCIE_LTSSM_L2_IDLE, > + PCIE_PME_TO_L2_TIMEOUT_US/10, > + PCIE_PME_TO_L2_TIMEOUT_US, false, pci); > + if (ret) { > + dev_err(pci->dev, "Timeout waiting for L2 entry! LTSSM: 0x%x\n", val); > + return ret; > + } > } > > if (pci->pp.ops->deinit) > -- > 2.37.1 >
On Mon, Jul 22, 2024 at 02:15:13PM +0800, Richard Zhu wrote: > The dw_pcie_suspend_noirq() function currently returns success directly > if no endpoint (EP) device is connected. However, on some platforms, power > loss occurs during suspend, causing dw_resume() to do nothing in this case. > This results in a system halt because the DWC controller is not initialized > after power-on during resume. > > Change call to deinit() in suspend and init() at resume regardless of > whether there are EP device connections or not. It is not harmful to > perform deinit() and init() again for the no power-off case, and it keeps > the code simple and consistent in logic. > ... > - ret = read_poll_timeout(dw_pcie_get_ltssm, val, val == DW_PCIE_LTSSM_L2_IDLE, > - PCIE_PME_TO_L2_TIMEOUT_US/10, > - PCIE_PME_TO_L2_TIMEOUT_US, false, pci); > - if (ret) { > - dev_err(pci->dev, "Timeout waiting for L2 entry! LTSSM: 0x%x\n", val); > - return ret; > + ret = read_poll_timeout(dw_pcie_get_ltssm, val, val == DW_PCIE_LTSSM_L2_IDLE, > + PCIE_PME_TO_L2_TIMEOUT_US/10, > + PCIE_PME_TO_L2_TIMEOUT_US, false, pci); > + if (ret) { > + dev_err(pci->dev, "Timeout waiting for L2 entry! LTSSM: 0x%x\n", val); > + return ret; > + } Not related to this patch, but what's the reason for waiting for the link to enter L2? There are a few other drivers that do this, but most don't. Is there something else the driver needs to do after the link is in L2? > } > > if (pci->pp.ops->deinit) > -- > 2.37.1 >
> -----Original Message----- > From: Bjorn Helgaas <helgaas@kernel.org> > Sent: 2024年11月6日 7:27 > To: Hongxing Zhu <hongxing.zhu@nxp.com> > Cc: kwilczynski@kernel.org; bhelgaas@google.com; > lorenzo.pieralisi@arm.com; Frank Li <frank.li@nxp.com>; mani@kernel.org; > linux-pci@vger.kernel.org; linux-arm-kernel@lists.infradead.org; > linux-kernel@vger.kernel.org; kernel@pengutronix.de; imx@lists.linux.dev > Subject: Re: [PATCH v2] PCI: dwc: Fix resume failure if no EP is connected at > some platforms > > On Mon, Jul 22, 2024 at 02:15:13PM +0800, Richard Zhu wrote: > > The dw_pcie_suspend_noirq() function currently returns success > > directly if no endpoint (EP) device is connected. However, on some > > platforms, power loss occurs during suspend, causing dw_resume() to do > nothing in this case. > > This results in a system halt because the DWC controller is not > > initialized after power-on during resume. > > dw_resume() doesn't exist. What function did you mean? Actually, it is dw_pcie_resume_noirq() > > System halt? In dw_pcie_resume_noirq()? What causes the halt? A > NULL pointer dereference? A CPU hang because a read of some controller > register never completes? Feels a little hand-wavy. When no endpoint(EP) device is connected. Power loss occurs during suspend, then the controllers isn't a ready stat anymore. Since dw_pcie_suspend_noirq() return directly with success, dw_pcie_resume_noirq() would assume that the controller is still ready, and wouldn't re-initialized the controller. At end, there would be a halt when driver accesses controller's registers. > > Another comment below. > > > Change call to deinit() in suspend and init() at resume regardless of > > whether there are EP device connections or not. It is not harmful to > > perform deinit() and init() again for the no power-off case, and it > > keeps the code simple and consistent in logic. > > > > Fixes: 4774faf854f5 ("PCI: dwc: Implement generic suspend/resume > > functionality") > > Signed-off-by: Richard Zhu <hongxing.zhu@nxp.com> > > Reviewed-by: Frank Li <Frank.Li@nxp.com> > > --- > > .../pci/controller/dwc/pcie-designware-host.c | 30 > > +++++++++---------- > > 1 file changed, 15 insertions(+), 15 deletions(-) > > > > diff --git a/drivers/pci/controller/dwc/pcie-designware-host.c > > b/drivers/pci/controller/dwc/pcie-designware-host.c > > index a0822d5371bc5..cb8c3c2bcc790 100644 > > --- a/drivers/pci/controller/dwc/pcie-designware-host.c > > +++ b/drivers/pci/controller/dwc/pcie-designware-host.c > > @@ -933,23 +933,23 @@ int dw_pcie_suspend_noirq(struct dw_pcie *pci) > > if (dw_pcie_readw_dbi(pci, offset + PCI_EXP_LNKCTL) & > PCI_EXP_LNKCTL_ASPM_L1) > > return 0; > > > > - if (dw_pcie_get_ltssm(pci) <= DW_PCIE_LTSSM_DETECT_ACT) > > - return 0; > > - > > - if (pci->pp.ops->pme_turn_off) > > - pci->pp.ops->pme_turn_off(&pci->pp); > > - else > > - ret = dw_pcie_pme_turn_off(pci); > > + if (dw_pcie_get_ltssm(pci) > DW_PCIE_LTSSM_DETECT_ACT) { > > + /* Only send out PME_TURN_OFF when PCIE link is up */ > > + if (pci->pp.ops->pme_turn_off) > > + pci->pp.ops->pme_turn_off(&pci->pp); > > + else > > + ret = dw_pcie_pme_turn_off(pci); > > This looks possibly racy since the link can go down at any point. > When link is down and without this commit changes, dw_pcie_suspend_noirq() return directly, and the PME_TURN_OFF wouldn't be kicked off. I change the behavior to issue the PME_TURN_OFF when link is up here. Best Regards Richard Zhu > > - if (ret) > > - return ret; > > + if (ret) > > + return ret; > > > > - ret = read_poll_timeout(dw_pcie_get_ltssm, val, val == > DW_PCIE_LTSSM_L2_IDLE, > > - PCIE_PME_TO_L2_TIMEOUT_US/10, > > - PCIE_PME_TO_L2_TIMEOUT_US, false, pci); > > - if (ret) { > > - dev_err(pci->dev, "Timeout waiting for L2 entry! LTSSM: 0x%x\n", > val); > > - return ret; > > + ret = read_poll_timeout(dw_pcie_get_ltssm, val, val == > DW_PCIE_LTSSM_L2_IDLE, > > + PCIE_PME_TO_L2_TIMEOUT_US/10, > > + PCIE_PME_TO_L2_TIMEOUT_US, false, pci); > > + if (ret) { > > + dev_err(pci->dev, "Timeout waiting for L2 entry! LTSSM: > 0x%x\n", val); > > + return ret; > > + } > > } > > > > if (pci->pp.ops->deinit) > > -- > > 2.37.1 > >
> -----Original Message----- > From: Bjorn Helgaas <helgaas@kernel.org> > Sent: 2024年11月6日 7:35 > To: Hongxing Zhu <hongxing.zhu@nxp.com> > Cc: kwilczynski@kernel.org; bhelgaas@google.com; > lorenzo.pieralisi@arm.com; Frank Li <frank.li@nxp.com>; mani@kernel.org; > linux-pci@vger.kernel.org; linux-arm-kernel@lists.infradead.org; > linux-kernel@vger.kernel.org; kernel@pengutronix.de; imx@lists.linux.dev > Subject: Re: [PATCH v2] PCI: dwc: Fix resume failure if no EP is connected at > some platforms > > On Mon, Jul 22, 2024 at 02:15:13PM +0800, Richard Zhu wrote: > > The dw_pcie_suspend_noirq() function currently returns success > > directly if no endpoint (EP) device is connected. However, on some > > platforms, power loss occurs during suspend, causing dw_resume() to do > nothing in this case. > > This results in a system halt because the DWC controller is not > > initialized after power-on during resume. > > > > Change call to deinit() in suspend and init() at resume regardless of > > whether there are EP device connections or not. It is not harmful to > > perform deinit() and init() again for the no power-off case, and it > > keeps the code simple and consistent in logic. > > ... > > > - ret = read_poll_timeout(dw_pcie_get_ltssm, val, val == > DW_PCIE_LTSSM_L2_IDLE, > > - PCIE_PME_TO_L2_TIMEOUT_US/10, > > - PCIE_PME_TO_L2_TIMEOUT_US, false, pci); > > - if (ret) { > > - dev_err(pci->dev, "Timeout waiting for L2 entry! LTSSM: 0x%x\n", > val); > > - return ret; > > + ret = read_poll_timeout(dw_pcie_get_ltssm, val, val == > DW_PCIE_LTSSM_L2_IDLE, > > + PCIE_PME_TO_L2_TIMEOUT_US/10, > > + PCIE_PME_TO_L2_TIMEOUT_US, false, pci); > > + if (ret) { > > + dev_err(pci->dev, "Timeout waiting for L2 entry! LTSSM: > 0x%x\n", val); > > + return ret; > > + } > > Not related to this patch, but what's the reason for waiting for the link to > enter L2? There are a few other drivers that do this, but most don't. Is > there something else the driver needs to do after the link is in L2? > Agree with you, I used to suggest Frank to remove the L2 polling too. Best Regards Richard Zhu > > } > > > > if (pci->pp.ops->deinit) > > -- > > 2.37.1 > >
Hello, > > Applied to controller/dwc, thank you! > > > > [01/01] PCI: dwc: Fix resume failure if no EP is connected at some platforms > > Hi Krzysztof: > Thanks for your pick up. > I combine this dwc bug fix with the other one. > Can you help to replace this commit by the following series? > https://lkml.org/lkml/2024/10/10/240 Sure thing. So, the following: - https://lore.kernel.org/linux-pci/1721628913-1449-1-git-send-email-hongxing.zhu@nxp.com has been replaced with the following: - https://lore.kernel.org/linux-pci/1728539269-1861-1-git-send-email-hongxing.zhu@nxp.com I took the entire series (consists of two patches). But let me know if you want me to drop the following, which is the second patch: PCI: dwc: Always stop link in the dw_pcie_suspend_noirq() https://git.kernel.org/pci/pci/c/f40d59f309db Also, have a look at the changes to see if everything looks correct. Thank you! Krzysztof
On Wed, Nov 06, 2024 at 01:59:41AM +0000, Hongxing Zhu wrote: > > -----Original Message----- > > From: Bjorn Helgaas <helgaas@kernel.org> > > Sent: 2024年11月6日 7:27 > > To: Hongxing Zhu <hongxing.zhu@nxp.com> > > Cc: kwilczynski@kernel.org; bhelgaas@google.com; > > lorenzo.pieralisi@arm.com; Frank Li <frank.li@nxp.com>; mani@kernel.org; > > linux-pci@vger.kernel.org; linux-arm-kernel@lists.infradead.org; > > linux-kernel@vger.kernel.org; kernel@pengutronix.de; imx@lists.linux.dev > > Subject: Re: [PATCH v2] PCI: dwc: Fix resume failure if no EP is connected at > > some platforms > > > > On Mon, Jul 22, 2024 at 02:15:13PM +0800, Richard Zhu wrote: > > > The dw_pcie_suspend_noirq() function currently returns success > > > directly if no endpoint (EP) device is connected. However, on some > > > platforms, power loss occurs during suspend, causing dw_resume() to do > > > nothing in this case. > > > This results in a system halt because the DWC controller is not > > > initialized after power-on during resume. > > > @@ -933,23 +933,23 @@ int dw_pcie_suspend_noirq(struct dw_pcie *pci) > > > if (dw_pcie_readw_dbi(pci, offset + PCI_EXP_LNKCTL) & > > PCI_EXP_LNKCTL_ASPM_L1) > > > return 0; > > > > > > - if (dw_pcie_get_ltssm(pci) <= DW_PCIE_LTSSM_DETECT_ACT) > > > - return 0; > > > - > > > - if (pci->pp.ops->pme_turn_off) > > > - pci->pp.ops->pme_turn_off(&pci->pp); > > > - else > > > - ret = dw_pcie_pme_turn_off(pci); > > > + if (dw_pcie_get_ltssm(pci) > DW_PCIE_LTSSM_DETECT_ACT) { > > > + /* Only send out PME_TURN_OFF when PCIE link is up */ > > > + if (pci->pp.ops->pme_turn_off) > > > + pci->pp.ops->pme_turn_off(&pci->pp); > > > + else > > > + ret = dw_pcie_pme_turn_off(pci); > > > > This looks possibly racy since the link can go down at any point. > > When link is down and without this commit changes, > dw_pcie_suspend_noirq() return directly, and the PME_TURN_OFF > wouldn't be kicked off. Right, that's the code change. > I change the behavior to issue the PME_TURN_OFF when link is up > here. But I don't think you responded to the race question. What happens here? if (dw_pcie_get_ltssm(pci) > DW_PCIE_LTSSM_DETECT_ACT) { --> link goes down here <-- pci->pp.ops->pme_turn_off(&pci->pp); You decide the LTSSM is active and the link is up. Then the link goes down. Then you send PME_Turn_off. Now what? If it's safe to try to send PME_Turn_off regardless of whether the link is up or down, there would be no need to test the LTSSM state. Bjorn
> -----Original Message----- > From: Krzysztof Wilczy��ski <kwilczynski@kernel.org> > Sent: 2024年11月6日 23:07 > To: Hongxing Zhu <hongxing.zhu@nxp.com> > Cc: bhelgaas@google.com; lorenzo.pieralisi@arm.com; Frank Li > <frank.li@nxp.com>; mani@kernel.org; linux-pci@vger.kernel.org; > linux-arm-kernel@lists.infradead.org; linux-kernel@vger.kernel.org; > kernel@pengutronix.de; imx@lists.linux.dev > Subject: Re: [PATCH v2] PCI: dwc: Fix resume failure if no EP is connected at > some platforms > > Hello, > > > > Applied to controller/dwc, thank you! > > > > > > [01/01] PCI: dwc: Fix resume failure if no EP is connected at some > > > platforms > > > > Hi Krzysztof: > > Thanks for your pick up. > > I combine this dwc bug fix with the other one. > > Can you help to replace this commit by the following series? > > https://lkml/ > > .org%2Flkml%2F2024%2F10%2F10%2F240&data=05%7C02%7Chongxing.zh > u%40nxp.c > > > om%7C172fa64841a2446056b008dcfe74b532%7C686ea1d3bc2b4c6fa92cd9 > 9c5c3016 > > > 35%7C0%7C0%7C638665024412056669%7CUnknown%7CTWFpbGZsb3d8ey > JFbXB0eU1hcG > > > kiOnRydWUsIlYiOiIwLjAuMDAwMCIsIlAiOiJXaW4zMiIsIkFOIjoiTWFpbCIsIldUIj > oy > > > fQ%3D%3D%7C0%7C%7C%7C&sdata=rGeK%2F70o1PIBMF%2FtzkQGssALklG > Cz8YDtK8efq > > R2EIc%3D&reserved=0 > > Sure thing. So, the following: > > - > https://lore.ke/ > rnel.org%2Flinux-pci%2F1721628913-1449-1-git-send-email-hongxing.zhu%4 > 0nxp.com&data=05%7C02%7Chongxing.zhu%40nxp.com%7C172fa64841a24 > 46056b008dcfe74b532%7C686ea1d3bc2b4c6fa92cd99c5c301635%7C0%7C0 > %7C638665024412095419%7CUnknown%7CTWFpbGZsb3d8eyJFbXB0eU1hc > GkiOnRydWUsIlYiOiIwLjAuMDAwMCIsIlAiOiJXaW4zMiIsIkFOIjoiTWFpbCIsIldU > IjoyfQ%3D%3D%7C0%7C%7C%7C&sdata=QLO7tl0zCYAjkejuh%2Fj635rAEvyix > x8BMkwuG6weW4Y%3D&reserved=0 > > has been replaced with the following: > > - > https://lore.ke/ > rnel.org%2Flinux-pci%2F1728539269-1861-1-git-send-email-hongxing.zhu%4 > 0nxp.com&data=05%7C02%7Chongxing.zhu%40nxp.com%7C172fa64841a24 > 46056b008dcfe74b532%7C686ea1d3bc2b4c6fa92cd99c5c301635%7C0%7C0 > %7C638665024412113023%7CUnknown%7CTWFpbGZsb3d8eyJFbXB0eU1hc > GkiOnRydWUsIlYiOiIwLjAuMDAwMCIsIlAiOiJXaW4zMiIsIkFOIjoiTWFpbCIsIldU > IjoyfQ%3D%3D%7C0%7C%7C%7C&sdata=grgmxy7UZZoC4ePUEzWYUyrZj7Gh > EdPwhJWdq5Tjrg4%3D&reserved=0 > > I took the entire series (consists of two patches). But let me know if you > want me to drop the following, which is the second patch: > > PCI: dwc: Always stop link in the dw_pcie_suspend_noirq() > > https://git.ker/ > nel.org%2Fpci%2Fpci%2Fc%2Ff40d59f309db&data=05%7C02%7Chongxing.zh > u%40nxp.com%7C172fa64841a2446056b008dcfe74b532%7C686ea1d3bc2b4 > c6fa92cd99c5c301635%7C0%7C0%7C638665024412129130%7CUnknown% > 7CTWFpbGZsb3d8eyJFbXB0eU1hcGkiOnRydWUsIlYiOiIwLjAuMDAwMCIsIlAiO > iJXaW4zMiIsIkFOIjoiTWFpbCIsIldUIjoyfQ%3D%3D%7C0%7C%7C%7C&sdata=j > aMe7XdDTDMn%2BpGg54S%2BOqruL%2FN4x%2BVLAgQaYsCuUrg%3D&rese > rved=0 > > Also, have a look at the changes to see if everything looks correct. Everything is fine. Thanks a lot for your kindly help to pick up the entire series. Best Regards Richard Zhu > > Thank you! > > Krzysztof
> -----Original Message----- > From: Bjorn Helgaas <helgaas@kernel.org> > Sent: 2024年11月7日 6:30 > To: Hongxing Zhu <hongxing.zhu@nxp.com> > Cc: kwilczynski@kernel.org; bhelgaas@google.com; > lorenzo.pieralisi@arm.com; Frank Li <frank.li@nxp.com>; mani@kernel.org; > linux-pci@vger.kernel.org; linux-arm-kernel@lists.infradead.org; > linux-kernel@vger.kernel.org; kernel@pengutronix.de; imx@lists.linux.dev > Subject: Re: [PATCH v2] PCI: dwc: Fix resume failure if no EP is connected at > some platforms > > On Wed, Nov 06, 2024 at 01:59:41AM +0000, Hongxing Zhu wrote: > > > -----Original Message----- > > > From: Bjorn Helgaas <helgaas@kernel.org> > > > Sent: 2024年11月6日 7:27 > > > To: Hongxing Zhu <hongxing.zhu@nxp.com> > > > Cc: kwilczynski@kernel.org; bhelgaas@google.com; > > > lorenzo.pieralisi@arm.com; Frank Li <frank.li@nxp.com>; > > > mani@kernel.org; linux-pci@vger.kernel.org; > > > linux-arm-kernel@lists.infradead.org; > > > linux-kernel@vger.kernel.org; kernel@pengutronix.de; > > > imx@lists.linux.dev > > > Subject: Re: [PATCH v2] PCI: dwc: Fix resume failure if no EP is > > > connected at some platforms > > > > > > On Mon, Jul 22, 2024 at 02:15:13PM +0800, Richard Zhu wrote: > > > > The dw_pcie_suspend_noirq() function currently returns success > > > > directly if no endpoint (EP) device is connected. However, on some > > > > platforms, power loss occurs during suspend, causing dw_resume() > > > > to do nothing in this case. > > > > This results in a system halt because the DWC controller is not > > > > initialized after power-on during resume. > > > > > @@ -933,23 +933,23 @@ int dw_pcie_suspend_noirq(struct dw_pcie > *pci) > > > > if (dw_pcie_readw_dbi(pci, offset + PCI_EXP_LNKCTL) & > > > PCI_EXP_LNKCTL_ASPM_L1) > > > > return 0; > > > > > > > > - if (dw_pcie_get_ltssm(pci) <= DW_PCIE_LTSSM_DETECT_ACT) > > > > - return 0; > > > > - > > > > - if (pci->pp.ops->pme_turn_off) > > > > - pci->pp.ops->pme_turn_off(&pci->pp); > > > > - else > > > > - ret = dw_pcie_pme_turn_off(pci); > > > > + if (dw_pcie_get_ltssm(pci) > DW_PCIE_LTSSM_DETECT_ACT) { > > > > + /* Only send out PME_TURN_OFF when PCIE link is up */ > > > > + if (pci->pp.ops->pme_turn_off) > > > > + pci->pp.ops->pme_turn_off(&pci->pp); > > > > + else > > > > + ret = dw_pcie_pme_turn_off(pci); > > > > > > This looks possibly racy since the link can go down at any point. > > > > When link is down and without this commit changes, > > dw_pcie_suspend_noirq() return directly, and the PME_TURN_OFF wouldn't > > be kicked off. > > Right, that's the code change. > > > I change the behavior to issue the PME_TURN_OFF when link is up here. > > But I don't think you responded to the race question. What happens here? > > if (dw_pcie_get_ltssm(pci) > DW_PCIE_LTSSM_DETECT_ACT) { > --> link goes down here <-- > pci->pp.ops->pme_turn_off(&pci->pp); > > You decide the LTSSM is active and the link is up. Then the link goes down. > Then you send PME_Turn_off. Now what? > > If it's safe to try to send PME_Turn_off regardless of whether the link is up or > down, there would be no need to test the LTSSM state. I made a local tests on i.MX95/i.MX8QM/i.MX8MP/i.MX8MM/i.MX8MQ, and confirm that it's safe to send PME_TURN_OFF although the link is down. You're right. It's no need to test LTSSM state here. So do the L2 poll listed above after PME_TURN_OFF is sent. Since the 6.13 merge window is almost closed. How about I prepare another Fix patch to do the following items for incoming 6.14? - Before sending PME_TURN_OFF, don't test the LTSSM stat. - Remove the L2 stat poll, after sending PME_TURN_OFF. Best Regards Richard Zhu > > Bjorn
Hello, > > But I don't think you responded to the race question. What happens here? > > > > if (dw_pcie_get_ltssm(pci) > DW_PCIE_LTSSM_DETECT_ACT) { > > --> link goes down here <-- > > pci->pp.ops->pme_turn_off(&pci->pp); > > > > You decide the LTSSM is active and the link is up. Then the link goes down. > > Then you send PME_Turn_off. Now what? > > > > If it's safe to try to send PME_Turn_off regardless of whether the link is up or > > down, there would be no need to test the LTSSM state. > I made a local tests on i.MX95/i.MX8QM/i.MX8MP/i.MX8MM/i.MX8MQ, and > confirm that it's safe to send PME_TURN_OFF although the link is down. > You're right. It's no need to test LTSSM state here. > So do the L2 poll listed above after PME_TURN_OFF is sent. > > Since the 6.13 merge window is almost closed. > How about I prepare another Fix patch to do the following items for incoming > 6.14? > - Before sending PME_TURN_OFF, don't test the LTSSM stat. > - Remove the L2 stat poll, after sending PME_TURN_OFF. If the changes aren't too involved, then I would rather fix it or drop the not needed code now, before we sent the Pull Request. So, feel free to sent a small patch against the current branch, or simply let me know how do you wish the current code to be changed, so I can do it against the current branch. Thank you! Krzysztof
Hello, [...] > > > If the changes aren't too involved, then I would rather fix it or drop > > > the not needed code now, before we sent the Pull Request. > > > > > > So, feel free to sent a small patch against the current branch, or > > > simply let me know how do you wish the current code to be changed, so > > > I can do it against the current branch. > > Thanks for your kindly reminder. > > This clean up small patch is on the way. > Here it is. > https://lkml.org/lkml/2024/11/7/409 Thank you! That said, there here have been some concerns raised following a review of the new patch, see: - https://lore.kernel.org/linux-pci/20241107084455.3623576-1-hongxing.zhu@nxp.com Hence, I wonder whether we should drop this patch and then focus on refinements to the new version, and perhaps, once its ready, then we will include it—this might have to be for the next release at this point, sadly. Thoughts? Krzysztof
On Fri, Nov 08, 2024 at 01:30:58AM +0900, Krzysztof Wilczy��ski wrote: > Hello, > > [...] > > > > If the changes aren't too involved, then I would rather fix it or drop > > > > the not needed code now, before we sent the Pull Request. > > > > > > > > So, feel free to sent a small patch against the current branch, or > > > > simply let me know how do you wish the current code to be changed, so > > > > I can do it against the current branch. > > > Thanks for your kindly reminder. > > > This clean up small patch is on the way. > > Here it is. > > https://lkml.org/lkml/2024/11/7/409 > > Thank you! > > That said, there here have been some concerns raised following a review of > the new patch, see: > > - https://lore.kernel.org/linux-pci/20241107084455.3623576-1-hongxing.zhu@nxp.com > > Hence, I wonder whether we should drop this patch and then focus on > refinements to the new version, and perhaps, once its ready, then we > will include it—this might have to be for the next release at this > point, sadly. I think we can continue discussion at refine patch. Add kept this patch because it really fix some important problem. Refine patch is just make it better. Frank > > Thoughts? > > Krzysztof
Hello, [...] > > > > > If the changes aren't too involved, then I would rather fix it or drop > > > > > the not needed code now, before we sent the Pull Request. > > > > > > > > > > So, feel free to sent a small patch against the current branch, or > > > > > simply let me know how do you wish the current code to be changed, so > > > > > I can do it against the current branch. > > > > Thanks for your kindly reminder. > > > > This clean up small patch is on the way. > > > Here it is. > > > https://lkml.org/lkml/2024/11/7/409 > > > > Thank you! > > > > That said, there here have been some concerns raised following a review of > > the new patch, see: > > > > - https://lore.kernel.org/linux-pci/20241107084455.3623576-1-hongxing.zhu@nxp.com > > > > Hence, I wonder whether we should drop this patch and then focus on > > refinements to the new version, and perhaps, once its ready, then we > > will include it—this might have to be for the next release at this > > point, sadly. > > I think we can continue discussion at refine patch. Add kept this patch > because it really fix some important problem. Refine patch is just make it > better. Sounds good! Thank you! Krzysztof
diff --git a/drivers/pci/controller/dwc/pcie-designware-host.c b/drivers/pci/controller/dwc/pcie-designware-host.c index a0822d5371bc5..cb8c3c2bcc790 100644 --- a/drivers/pci/controller/dwc/pcie-designware-host.c +++ b/drivers/pci/controller/dwc/pcie-designware-host.c @@ -933,23 +933,23 @@ int dw_pcie_suspend_noirq(struct dw_pcie *pci) if (dw_pcie_readw_dbi(pci, offset + PCI_EXP_LNKCTL) & PCI_EXP_LNKCTL_ASPM_L1) return 0; - if (dw_pcie_get_ltssm(pci) <= DW_PCIE_LTSSM_DETECT_ACT) - return 0; - - if (pci->pp.ops->pme_turn_off) - pci->pp.ops->pme_turn_off(&pci->pp); - else - ret = dw_pcie_pme_turn_off(pci); + if (dw_pcie_get_ltssm(pci) > DW_PCIE_LTSSM_DETECT_ACT) { + /* Only send out PME_TURN_OFF when PCIE link is up */ + if (pci->pp.ops->pme_turn_off) + pci->pp.ops->pme_turn_off(&pci->pp); + else + ret = dw_pcie_pme_turn_off(pci); - if (ret) - return ret; + if (ret) + return ret; - ret = read_poll_timeout(dw_pcie_get_ltssm, val, val == DW_PCIE_LTSSM_L2_IDLE, - PCIE_PME_TO_L2_TIMEOUT_US/10, - PCIE_PME_TO_L2_TIMEOUT_US, false, pci); - if (ret) { - dev_err(pci->dev, "Timeout waiting for L2 entry! LTSSM: 0x%x\n", val); - return ret; + ret = read_poll_timeout(dw_pcie_get_ltssm, val, val == DW_PCIE_LTSSM_L2_IDLE, + PCIE_PME_TO_L2_TIMEOUT_US/10, + PCIE_PME_TO_L2_TIMEOUT_US, false, pci); + if (ret) { + dev_err(pci->dev, "Timeout waiting for L2 entry! LTSSM: 0x%x\n", val); + return ret; + } } if (pci->pp.ops->deinit)