mbox series

[v1,0/3] PCI: imx6: reset link after suspend/resume

Message ID 20240819090428.17349-1-eichest@gmail.com (mailing list archive)
Headers show
Series PCI: imx6: reset link after suspend/resume | expand

Message

Stefan Eichenberger Aug. 19, 2024, 9:03 a.m. UTC
On the i.MX6Quad (not QuadPlus), the PCIe link does not work after a
suspend/resume cycle. Worse, the PCIe memory mapped I/O isn't accessible
at all, so the system freezes when a PCIe driver tries to access its I/O
space. The only way to get resume working again is to reset the PCIe
link, similar to what is done on devices that support suspend/resume.
Through trial and error, we found that something about the PCIe
reference clock does not work as expected after a resume. We could not
figure out if it is disabled (even though the registers still say it is
enabled), or if it is somehow unstable or has some hiccups. With the
workaround introduced in this patch series, we were able to fully resume
a Compex WLE900VX (ath10k) miniPCIe Wifi module and an Intel AX200 M.2
Wifi module. If there is a better way or other ideas on how to fix this
problem, please let us know. We are aware that resetting the link should
not be necessary, but we could not find a better solution. More
interestingly, even the SoCs that support suspend/resume according to
the i.MX erratas seem to reset the link on resume in
imx6_pcie_host_init, so we hope this might be a valid workaround.

Stefan Eichenberger (3):
  PCI: imx6: Add a function to deassert the reset gpio
  PCI: imx6: move the wait for clock stabilization to enable ref clk
  PCI: imx6: reset link on resume

 drivers/pci/controller/dwc/pci-imx6.c | 69 +++++++++++++++++++++++----
 1 file changed, 59 insertions(+), 10 deletions(-)

Comments

Frank Li Aug. 19, 2024, 2:30 p.m. UTC | #1
On Mon, Aug 19, 2024 at 11:03:16AM +0200, Stefan Eichenberger wrote:
> On the i.MX6Quad (not QuadPlus), the PCIe link does not work after a
> suspend/resume cycle. Worse, the PCIe memory mapped I/O isn't accessible
> at all, so the system freezes when a PCIe driver tries to access its I/O
> space. The only way to get resume working again is to reset the PCIe
> link, similar to what is done on devices that support suspend/resume.
> Through trial and error, we found that something about the PCIe
> reference clock does not work as expected after a resume. We could not
> figure out if it is disabled (even though the registers still say it is
> enabled), or if it is somehow unstable or has some hiccups. With the
> workaround introduced in this patch series, we were able to fully resume
> a Compex WLE900VX (ath10k) miniPCIe Wifi module and an Intel AX200 M.2
> Wifi module. If there is a better way or other ideas on how to fix this
> problem, please let us know. We are aware that resetting the link should
> not be necessary, but we could not find a better solution. More
> interestingly, even the SoCs that support suspend/resume according to
> the i.MX erratas seem to reset the link on resume in
> imx6_pcie_host_init, so we hope this might be a valid workaround.
>
> Stefan Eichenberger (3):
>   PCI: imx6: Add a function to deassert the reset gpio
>   PCI: imx6: move the wait for clock stabilization to enable ref clk
>   PCI: imx6: reset link on resume

Thanks you for your patch, but it may have conflict with
https://lore.kernel.org/linux-pci/Zr4XG6r+HnbIlu8S@lizhi-Precision-Tower-5810/T/#t


>
>  drivers/pci/controller/dwc/pci-imx6.c | 69 +++++++++++++++++++++++----
>  1 file changed, 59 insertions(+), 10 deletions(-)
>
> --
> 2.43.0
>
Stefan Eichenberger Aug. 20, 2024, 6:52 a.m. UTC | #2
On Mon, Aug 19, 2024 at 10:30:38AM -0400, Frank Li wrote:
> On Mon, Aug 19, 2024 at 11:03:16AM +0200, Stefan Eichenberger wrote:
> > On the i.MX6Quad (not QuadPlus), the PCIe link does not work after a
> > suspend/resume cycle. Worse, the PCIe memory mapped I/O isn't accessible
> > at all, so the system freezes when a PCIe driver tries to access its I/O
> > space. The only way to get resume working again is to reset the PCIe
> > link, similar to what is done on devices that support suspend/resume.
> > Through trial and error, we found that something about the PCIe
> > reference clock does not work as expected after a resume. We could not
> > figure out if it is disabled (even though the registers still say it is
> > enabled), or if it is somehow unstable or has some hiccups. With the
> > workaround introduced in this patch series, we were able to fully resume
> > a Compex WLE900VX (ath10k) miniPCIe Wifi module and an Intel AX200 M.2
> > Wifi module. If there is a better way or other ideas on how to fix this
> > problem, please let us know. We are aware that resetting the link should
> > not be necessary, but we could not find a better solution. More
> > interestingly, even the SoCs that support suspend/resume according to
> > the i.MX erratas seem to reset the link on resume in
> > imx6_pcie_host_init, so we hope this might be a valid workaround.
> >
> > Stefan Eichenberger (3):
> >   PCI: imx6: Add a function to deassert the reset gpio
> >   PCI: imx6: move the wait for clock stabilization to enable ref clk
> >   PCI: imx6: reset link on resume
> 
> Thanks you for your patch, but it may have conflict with
> https://lore.kernel.org/linux-pci/Zr4XG6r+HnbIlu8S@lizhi-Precision-Tower-5810/T/#t
> 

Thanks a lot for the hint. I will have a look at the series and see if I
can adapt my changes including your suggestions.

Regards,
Stefan
Stefan Eichenberger Aug. 21, 2024, 3:41 p.m. UTC | #3
Hi Frank,

On Tue, Aug 20, 2024 at 08:52:41AM +0200, Stefan Eichenberger wrote:
> On Mon, Aug 19, 2024 at 10:30:38AM -0400, Frank Li wrote:
> > On Mon, Aug 19, 2024 at 11:03:16AM +0200, Stefan Eichenberger wrote:
> > > On the i.MX6Quad (not QuadPlus), the PCIe link does not work after a
> > > suspend/resume cycle. Worse, the PCIe memory mapped I/O isn't accessible
> > > at all, so the system freezes when a PCIe driver tries to access its I/O
> > > space. The only way to get resume working again is to reset the PCIe
> > > link, similar to what is done on devices that support suspend/resume.
> > > Through trial and error, we found that something about the PCIe
> > > reference clock does not work as expected after a resume. We could not
> > > figure out if it is disabled (even though the registers still say it is
> > > enabled), or if it is somehow unstable or has some hiccups. With the
> > > workaround introduced in this patch series, we were able to fully resume
> > > a Compex WLE900VX (ath10k) miniPCIe Wifi module and an Intel AX200 M.2
> > > Wifi module. If there is a better way or other ideas on how to fix this
> > > problem, please let us know. We are aware that resetting the link should
> > > not be necessary, but we could not find a better solution. More
> > > interestingly, even the SoCs that support suspend/resume according to
> > > the i.MX erratas seem to reset the link on resume in
> > > imx6_pcie_host_init, so we hope this might be a valid workaround.
> > >
> > > Stefan Eichenberger (3):
> > >   PCI: imx6: Add a function to deassert the reset gpio
> > >   PCI: imx6: move the wait for clock stabilization to enable ref clk
> > >   PCI: imx6: reset link on resume
> > 
> > Thanks you for your patch, but it may have conflict with
> > https://lore.kernel.org/linux-pci/Zr4XG6r+HnbIlu8S@lizhi-Precision-Tower-5810/T/#t
> > 
> 
> Thanks a lot for the hint. I will have a look at the series and see if I
> can adapt my changes including your suggestions.

I did some more tests with your series applied. Everything works as
expected on an i.MX8M Plus. However, the i.MX6Quad PCIe link still does
not work after a resume. I could trace the issues back to the following
functions, besides that we could use the same suspend/resume function as
for the other i.MX SoCs.

In suspend the follwoing function is causing issues:
imx_pcie_stop_link(imx_pcie->pci);

In resume the following functions are causing issues:
imx_pcie->drvdata->init_phy(imx_pcie); // in imx_pcie_host_init
imx_pcie_start_link(imx_pcie->pci);

I think the second one makes sense because I could not stop the link I
should not start it again. But why is also the init_phy function
failing? Are they required to setup the link? 

The messages I get when the system resumes are:
[   50.176212] Enabling non-boot CPUs ...
[   50.194446] CPU1 is up
[   50.198087] CPU2 is up
[   50.201746] CPU3 is up
[   50.563710] imx6q-pcie 1ffc000.pcie: Read DBI address failed

After the last message the system hangs. It seems this happens because
the PCIe I/O mem is not accessible anymore.

Do you have an idea what could cause imx_pcie_stop_link to break the
link on resume? Without calling them the link is working fine after
resuming and the drivers can access the PCIe I/O mem.

Thanks,
Stefan
Frank Li Aug. 21, 2024, 4:51 p.m. UTC | #4
On Wed, Aug 21, 2024 at 05:41:00PM +0200, Stefan Eichenberger wrote:
> Hi Frank,
>
> On Tue, Aug 20, 2024 at 08:52:41AM +0200, Stefan Eichenberger wrote:
> > On Mon, Aug 19, 2024 at 10:30:38AM -0400, Frank Li wrote:
> > > On Mon, Aug 19, 2024 at 11:03:16AM +0200, Stefan Eichenberger wrote:
> > > > On the i.MX6Quad (not QuadPlus), the PCIe link does not work after a
> > > > suspend/resume cycle. Worse, the PCIe memory mapped I/O isn't accessible
> > > > at all, so the system freezes when a PCIe driver tries to access its I/O
> > > > space. The only way to get resume working again is to reset the PCIe
> > > > link, similar to what is done on devices that support suspend/resume.
> > > > Through trial and error, we found that something about the PCIe
> > > > reference clock does not work as expected after a resume. We could not
> > > > figure out if it is disabled (even though the registers still say it is
> > > > enabled), or if it is somehow unstable or has some hiccups. With the
> > > > workaround introduced in this patch series, we were able to fully resume
> > > > a Compex WLE900VX (ath10k) miniPCIe Wifi module and an Intel AX200 M.2
> > > > Wifi module. If there is a better way or other ideas on how to fix this
> > > > problem, please let us know. We are aware that resetting the link should
> > > > not be necessary, but we could not find a better solution. More
> > > > interestingly, even the SoCs that support suspend/resume according to
> > > > the i.MX erratas seem to reset the link on resume in
> > > > imx6_pcie_host_init, so we hope this might be a valid workaround.
> > > >
> > > > Stefan Eichenberger (3):
> > > >   PCI: imx6: Add a function to deassert the reset gpio
> > > >   PCI: imx6: move the wait for clock stabilization to enable ref clk
> > > >   PCI: imx6: reset link on resume
> > >
> > > Thanks you for your patch, but it may have conflict with
> > > https://lore.kernel.org/linux-pci/Zr4XG6r+HnbIlu8S@lizhi-Precision-Tower-5810/T/#t
> > >
> >
> > Thanks a lot for the hint. I will have a look at the series and see if I
> > can adapt my changes including your suggestions.
>
> I did some more tests with your series applied. Everything works as
> expected on an i.MX8M Plus. However, the i.MX6Quad PCIe link still does
> not work after a resume. I could trace the issues back to the following
> functions, besides that we could use the same suspend/resume function as
> for the other i.MX SoCs.

Upstream 6q pci don't support suspend/resume.

 [IMX6Q] = {
                .variant = IMX6Q,
                .flags = IMX_PCIE_FLAG_IMX_PHY |
                         IMX_PCIE_FLAG_IMX_SPEED_CHANGE,

If you add some code, can you post your patch(mark as RFC) then let me to
check.

Frank

>
> In suspend the follwoing function is causing issues:
> imx_pcie_stop_link(imx_pcie->pci);
>
> In resume the following functions are causing issues:
> imx_pcie->drvdata->init_phy(imx_pcie); // in imx_pcie_host_init
> imx_pcie_start_link(imx_pcie->pci);
>
> I think the second one makes sense because I could not stop the link I
> should not start it again. But why is also the init_phy function
> failing? Are they required to setup the link?
>
> The messages I get when the system resumes are:
> [   50.176212] Enabling non-boot CPUs ...
> [   50.194446] CPU1 is up
> [   50.198087] CPU2 is up
> [   50.201746] CPU3 is up
> [   50.563710] imx6q-pcie 1ffc000.pcie: Read DBI address failed
>
> After the last message the system hangs. It seems this happens because
> the PCIe I/O mem is not accessible anymore.
>
> Do you have an idea what could cause imx_pcie_stop_link to break the
> link on resume? Without calling them the link is working fine after
> resuming and the drivers can access the PCIe I/O mem.
>
> Thanks,
> Stefan
Stefan Eichenberger Aug. 22, 2024, 7:20 a.m. UTC | #5
On Wed, Aug 21, 2024 at 12:51:12PM -0400, Frank Li wrote:
> On Wed, Aug 21, 2024 at 05:41:00PM +0200, Stefan Eichenberger wrote:
> > Hi Frank,
> >
> > On Tue, Aug 20, 2024 at 08:52:41AM +0200, Stefan Eichenberger wrote:
> > > On Mon, Aug 19, 2024 at 10:30:38AM -0400, Frank Li wrote:
> > > > On Mon, Aug 19, 2024 at 11:03:16AM +0200, Stefan Eichenberger wrote:
> > > > > On the i.MX6Quad (not QuadPlus), the PCIe link does not work after a
> > > > > suspend/resume cycle. Worse, the PCIe memory mapped I/O isn't accessible
> > > > > at all, so the system freezes when a PCIe driver tries to access its I/O
> > > > > space. The only way to get resume working again is to reset the PCIe
> > > > > link, similar to what is done on devices that support suspend/resume.
> > > > > Through trial and error, we found that something about the PCIe
> > > > > reference clock does not work as expected after a resume. We could not
> > > > > figure out if it is disabled (even though the registers still say it is
> > > > > enabled), or if it is somehow unstable or has some hiccups. With the
> > > > > workaround introduced in this patch series, we were able to fully resume
> > > > > a Compex WLE900VX (ath10k) miniPCIe Wifi module and an Intel AX200 M.2
> > > > > Wifi module. If there is a better way or other ideas on how to fix this
> > > > > problem, please let us know. We are aware that resetting the link should
> > > > > not be necessary, but we could not find a better solution. More
> > > > > interestingly, even the SoCs that support suspend/resume according to
> > > > > the i.MX erratas seem to reset the link on resume in
> > > > > imx6_pcie_host_init, so we hope this might be a valid workaround.
> > > > >
> > > > > Stefan Eichenberger (3):
> > > > >   PCI: imx6: Add a function to deassert the reset gpio
> > > > >   PCI: imx6: move the wait for clock stabilization to enable ref clk
> > > > >   PCI: imx6: reset link on resume
> > > >
> > > > Thanks you for your patch, but it may have conflict with
> > > > https://lore.kernel.org/linux-pci/Zr4XG6r+HnbIlu8S@lizhi-Precision-Tower-5810/T/#t
> > > >
> > >
> > > Thanks a lot for the hint. I will have a look at the series and see if I
> > > can adapt my changes including your suggestions.
> >
> > I did some more tests with your series applied. Everything works as
> > expected on an i.MX8M Plus. However, the i.MX6Quad PCIe link still does
> > not work after a resume. I could trace the issues back to the following
> > functions, besides that we could use the same suspend/resume function as
> > for the other i.MX SoCs.
> 
> Upstream 6q pci don't support suspend/resume.
> 
>  [IMX6Q] = {
>                 .variant = IMX6Q,
>                 .flags = IMX_PCIE_FLAG_IMX_PHY |
>                          IMX_PCIE_FLAG_IMX_SPEED_CHANGE,
> 
> If you add some code, can you post your patch(mark as RFC) then let me to
> check.

Sure, I will wait until your series is merged and then try to come up
with an RFC for suspend/resume on i.MX6Quad. Maybe we could use the same
mechanism as for the downstream kernel where they just toggle the
IMX6Q_GPR1_PCIE_TEST_PD bit?
https://github.com/nxp-imx/linux-imx/commit/4e92355e1f79d225ea842511fcfd42b343b32995