diff mbox series

[01/11] PCI: qcom-ep: Disable resources unconditionally during PERST# assert

Message ID 20240314-pci-epf-rework-v1-1-6134e6c1d491@linaro.org (mailing list archive)
State Superseded
Headers show
Series PCI: endpoint: Make host reboot handling more robust | expand

Commit Message

Manivannan Sadhasivam March 14, 2024, 3:23 p.m. UTC
All EP specific resources are enabled during PERST# deassert. As a counter
operation, all resources should be disabled during PERST# assert. There is
no point in skipping that if the link was not enabled.

This will also result in enablement of the resources twice if PERST# got
deasserted again. So remove the check from qcom_pcie_perst_assert() and
disable all the resources unconditionally.

Fixes: f55fee56a631 ("PCI: qcom-ep: Add Qualcomm PCIe Endpoint controller driver")
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
---
 drivers/pci/controller/dwc/pcie-qcom-ep.c | 6 ------
 1 file changed, 6 deletions(-)

Comments

Niklas Cassel March 22, 2024, 4:08 p.m. UTC | #1
On Thu, Mar 14, 2024 at 08:53:40PM +0530, Manivannan Sadhasivam wrote:
> All EP specific resources are enabled during PERST# deassert. As a counter
> operation, all resources should be disabled during PERST# assert. There is
> no point in skipping that if the link was not enabled.
> 
> This will also result in enablement of the resources twice if PERST# got
> deasserted again. So remove the check from qcom_pcie_perst_assert() and
> disable all the resources unconditionally.
> 
> Fixes: f55fee56a631 ("PCI: qcom-ep: Add Qualcomm PCIe Endpoint controller driver")
> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
> ---
>  drivers/pci/controller/dwc/pcie-qcom-ep.c | 6 ------
>  1 file changed, 6 deletions(-)
> 
> diff --git a/drivers/pci/controller/dwc/pcie-qcom-ep.c b/drivers/pci/controller/dwc/pcie-qcom-ep.c
> index 2fb8c15e7a91..50b1635e3cbb 100644
> --- a/drivers/pci/controller/dwc/pcie-qcom-ep.c
> +++ b/drivers/pci/controller/dwc/pcie-qcom-ep.c
> @@ -500,12 +500,6 @@ static int qcom_pcie_perst_deassert(struct dw_pcie *pci)
>  static void qcom_pcie_perst_assert(struct dw_pcie *pci)
>  {
>  	struct qcom_pcie_ep *pcie_ep = to_pcie_ep(pci);
> -	struct device *dev = pci->dev;
> -
> -	if (pcie_ep->link_status == QCOM_PCIE_EP_LINK_DISABLED) {
> -		dev_dbg(dev, "Link is already disabled\n");
> -		return;
> -	}
>  
>  	dw_pcie_ep_cleanup(&pci->ep);
>  	qcom_pcie_disable_resources(pcie_ep);

Are you really sure that this is safe?

I think I remember seeing some splat in dmesg if some clks, or maybe it
was regulators, got disabled while already being disabled.

Perhaps you could test it by simply calling:
qcom_pcie_disable_resources();
twice here, and see if you see and splat in dmesg.


Kind regards,
Niklas
Manivannan Sadhasivam March 26, 2024, 7:44 a.m. UTC | #2
On Fri, Mar 22, 2024 at 05:08:22PM +0100, Niklas Cassel wrote:
> On Thu, Mar 14, 2024 at 08:53:40PM +0530, Manivannan Sadhasivam wrote:
> > All EP specific resources are enabled during PERST# deassert. As a counter
> > operation, all resources should be disabled during PERST# assert. There is
> > no point in skipping that if the link was not enabled.
> > 
> > This will also result in enablement of the resources twice if PERST# got
> > deasserted again. So remove the check from qcom_pcie_perst_assert() and
> > disable all the resources unconditionally.
> > 
> > Fixes: f55fee56a631 ("PCI: qcom-ep: Add Qualcomm PCIe Endpoint controller driver")
> > Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
> > ---
> >  drivers/pci/controller/dwc/pcie-qcom-ep.c | 6 ------
> >  1 file changed, 6 deletions(-)
> > 
> > diff --git a/drivers/pci/controller/dwc/pcie-qcom-ep.c b/drivers/pci/controller/dwc/pcie-qcom-ep.c
> > index 2fb8c15e7a91..50b1635e3cbb 100644
> > --- a/drivers/pci/controller/dwc/pcie-qcom-ep.c
> > +++ b/drivers/pci/controller/dwc/pcie-qcom-ep.c
> > @@ -500,12 +500,6 @@ static int qcom_pcie_perst_deassert(struct dw_pcie *pci)
> >  static void qcom_pcie_perst_assert(struct dw_pcie *pci)
> >  {
> >  	struct qcom_pcie_ep *pcie_ep = to_pcie_ep(pci);
> > -	struct device *dev = pci->dev;
> > -
> > -	if (pcie_ep->link_status == QCOM_PCIE_EP_LINK_DISABLED) {
> > -		dev_dbg(dev, "Link is already disabled\n");
> > -		return;
> > -	}
> >  
> >  	dw_pcie_ep_cleanup(&pci->ep);
> >  	qcom_pcie_disable_resources(pcie_ep);
> 
> Are you really sure that this is safe?
> 
> I think I remember seeing some splat in dmesg if some clks, or maybe it
> was regulators, got disabled while already being disabled.
> 
> Perhaps you could test it by simply calling:
> qcom_pcie_disable_resources();
> twice here, and see if you see and splat in dmesg.
> 

Calling the disable_resources() function twice will definitely result in the
splat. But here PERST# is level triggered, so I don't see how the EP can see
assert twice.

Am I missing something?

- Mani
Niklas Cassel March 26, 2024, 10:24 a.m. UTC | #3
On Tue, Mar 26, 2024 at 01:14:29PM +0530, Manivannan Sadhasivam wrote:
> On Fri, Mar 22, 2024 at 05:08:22PM +0100, Niklas Cassel wrote:
> > On Thu, Mar 14, 2024 at 08:53:40PM +0530, Manivannan Sadhasivam wrote:
> > > All EP specific resources are enabled during PERST# deassert. As a counter
> > > operation, all resources should be disabled during PERST# assert. There is
> > > no point in skipping that if the link was not enabled.
> > > 
> > > This will also result in enablement of the resources twice if PERST# got
> > > deasserted again. So remove the check from qcom_pcie_perst_assert() and
> > > disable all the resources unconditionally.
> > > 
> > > Fixes: f55fee56a631 ("PCI: qcom-ep: Add Qualcomm PCIe Endpoint controller driver")
> > > Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
> > > ---
> > >  drivers/pci/controller/dwc/pcie-qcom-ep.c | 6 ------
> > >  1 file changed, 6 deletions(-)
> > > 
> > > diff --git a/drivers/pci/controller/dwc/pcie-qcom-ep.c b/drivers/pci/controller/dwc/pcie-qcom-ep.c
> > > index 2fb8c15e7a91..50b1635e3cbb 100644
> > > --- a/drivers/pci/controller/dwc/pcie-qcom-ep.c
> > > +++ b/drivers/pci/controller/dwc/pcie-qcom-ep.c
> > > @@ -500,12 +500,6 @@ static int qcom_pcie_perst_deassert(struct dw_pcie *pci)
> > >  static void qcom_pcie_perst_assert(struct dw_pcie *pci)
> > >  {
> > >  	struct qcom_pcie_ep *pcie_ep = to_pcie_ep(pci);
> > > -	struct device *dev = pci->dev;
> > > -
> > > -	if (pcie_ep->link_status == QCOM_PCIE_EP_LINK_DISABLED) {
> > > -		dev_dbg(dev, "Link is already disabled\n");
> > > -		return;
> > > -	}
> > >  
> > >  	dw_pcie_ep_cleanup(&pci->ep);
> > >  	qcom_pcie_disable_resources(pcie_ep);
> > 
> > Are you really sure that this is safe?
> > 
> > I think I remember seeing some splat in dmesg if some clks, or maybe it
> > was regulators, got disabled while already being disabled.
> > 
> > Perhaps you could test it by simply calling:
> > qcom_pcie_disable_resources();
> > twice here, and see if you see and splat in dmesg.
> > 
> 
> Calling the disable_resources() function twice will definitely result in the
> splat. But here PERST# is level triggered, so I don't see how the EP can see
> assert twice.
> 
> Am I missing something?

I think I remember now, I was developing a driver using a .core_init_notifier,
but I followed the pcie-tegra model, which does not enable any resources in
probe() (it only gets them), so I got the splat because when PERST got
asserted, resources would get disabled even though they were already disabled.

pcie-qcom:
-gets resources in .probe()
-enables resources in .probe()
-sets no default state in .probe()

pcie-tegra:
-gets resources in .probe()
-enables resources in perst_deassert()
-sets default state to EP_STATE_DISABLED in probe()

So pcie-qcom does not seem to be following the same pattern like pcie-tegra,
because pcie-qcom actually does enable resources for the first time in
probe(), while tegra will enable resources for the first time in
perst_deassert().

Sorry for the noise.


Kind regards,
Niklas
Niklas Cassel March 26, 2024, 10:37 a.m. UTC | #4
On Thu, Mar 14, 2024 at 08:53:40PM +0530, Manivannan Sadhasivam wrote:
> All EP specific resources are enabled during PERST# deassert. As a counter
> operation, all resources should be disabled during PERST# assert. There is
> no point in skipping that if the link was not enabled.
> 
> This will also result in enablement of the resources twice if PERST# got
> deasserted again. So remove the check from qcom_pcie_perst_assert() and
> disable all the resources unconditionally.
> 
> Fixes: f55fee56a631 ("PCI: qcom-ep: Add Qualcomm PCIe Endpoint controller driver")
> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
> ---
>  drivers/pci/controller/dwc/pcie-qcom-ep.c | 6 ------
>  1 file changed, 6 deletions(-)
> 
> diff --git a/drivers/pci/controller/dwc/pcie-qcom-ep.c b/drivers/pci/controller/dwc/pcie-qcom-ep.c
> index 2fb8c15e7a91..50b1635e3cbb 100644
> --- a/drivers/pci/controller/dwc/pcie-qcom-ep.c
> +++ b/drivers/pci/controller/dwc/pcie-qcom-ep.c
> @@ -500,12 +500,6 @@ static int qcom_pcie_perst_deassert(struct dw_pcie *pci)
>  static void qcom_pcie_perst_assert(struct dw_pcie *pci)
>  {
>  	struct qcom_pcie_ep *pcie_ep = to_pcie_ep(pci);
> -	struct device *dev = pci->dev;
> -
> -	if (pcie_ep->link_status == QCOM_PCIE_EP_LINK_DISABLED) {
> -		dev_dbg(dev, "Link is already disabled\n");
> -		return;
> -	}
>  
>  	dw_pcie_ep_cleanup(&pci->ep);
>  	qcom_pcie_disable_resources(pcie_ep);
> 
> -- 
> 2.25.1
> 

It would be nice if you could do a similar change to pcie-tegra,
so that the drivers are in sync, as that is not strictly related:

Reviewed-by: Niklas Cassel <cassel@kernel.org>
Manivannan Sadhasivam March 26, 2024, 11:10 a.m. UTC | #5
On Tue, Mar 26, 2024 at 11:24:18AM +0100, Niklas Cassel wrote:
> On Tue, Mar 26, 2024 at 01:14:29PM +0530, Manivannan Sadhasivam wrote:
> > On Fri, Mar 22, 2024 at 05:08:22PM +0100, Niklas Cassel wrote:
> > > On Thu, Mar 14, 2024 at 08:53:40PM +0530, Manivannan Sadhasivam wrote:
> > > > All EP specific resources are enabled during PERST# deassert. As a counter
> > > > operation, all resources should be disabled during PERST# assert. There is
> > > > no point in skipping that if the link was not enabled.
> > > > 
> > > > This will also result in enablement of the resources twice if PERST# got
> > > > deasserted again. So remove the check from qcom_pcie_perst_assert() and
> > > > disable all the resources unconditionally.
> > > > 
> > > > Fixes: f55fee56a631 ("PCI: qcom-ep: Add Qualcomm PCIe Endpoint controller driver")
> > > > Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
> > > > ---
> > > >  drivers/pci/controller/dwc/pcie-qcom-ep.c | 6 ------
> > > >  1 file changed, 6 deletions(-)
> > > > 
> > > > diff --git a/drivers/pci/controller/dwc/pcie-qcom-ep.c b/drivers/pci/controller/dwc/pcie-qcom-ep.c
> > > > index 2fb8c15e7a91..50b1635e3cbb 100644
> > > > --- a/drivers/pci/controller/dwc/pcie-qcom-ep.c
> > > > +++ b/drivers/pci/controller/dwc/pcie-qcom-ep.c
> > > > @@ -500,12 +500,6 @@ static int qcom_pcie_perst_deassert(struct dw_pcie *pci)
> > > >  static void qcom_pcie_perst_assert(struct dw_pcie *pci)
> > > >  {
> > > >  	struct qcom_pcie_ep *pcie_ep = to_pcie_ep(pci);
> > > > -	struct device *dev = pci->dev;
> > > > -
> > > > -	if (pcie_ep->link_status == QCOM_PCIE_EP_LINK_DISABLED) {
> > > > -		dev_dbg(dev, "Link is already disabled\n");
> > > > -		return;
> > > > -	}
> > > >  
> > > >  	dw_pcie_ep_cleanup(&pci->ep);
> > > >  	qcom_pcie_disable_resources(pcie_ep);
> > > 
> > > Are you really sure that this is safe?
> > > 
> > > I think I remember seeing some splat in dmesg if some clks, or maybe it
> > > was regulators, got disabled while already being disabled.
> > > 
> > > Perhaps you could test it by simply calling:
> > > qcom_pcie_disable_resources();
> > > twice here, and see if you see and splat in dmesg.
> > > 
> > 
> > Calling the disable_resources() function twice will definitely result in the
> > splat. But here PERST# is level triggered, so I don't see how the EP can see
> > assert twice.
> > 
> > Am I missing something?
> 
> I think I remember now, I was developing a driver using a .core_init_notifier,
> but I followed the pcie-tegra model, which does not enable any resources in
> probe() (it only gets them), so I got the splat because when PERST got
> asserted, resources would get disabled even though they were already disabled.
> 
> pcie-qcom:
> -gets resources in .probe()
> -enables resources in .probe()
> -sets no default state in .probe()
> 
> pcie-tegra:
> -gets resources in .probe()
> -enables resources in perst_deassert()
> -sets default state to EP_STATE_DISABLED in probe()
> 
> So pcie-qcom does not seem to be following the same pattern like pcie-tegra,
> because pcie-qcom actually does enable resources for the first time in
> probe(), while tegra will enable resources for the first time in
> perst_deassert().
> 
> Sorry for the noise.
> 

I was planning to drop enable_resources() from Qcom driver once the DBI rework
series gets merged. Because, the resource enablement during probe is currently
done to avoid the crash that is bound to happen if registers are accessed during
probe.

But what your observation reveals is that it is possible to get PERST# assert
during the EP boot up itself which I was not accounting for. I always assumed
that the EP will receive PERST# deassert first. If that is not the case, then
this patch needs to be dropped.

- Mani
Niklas Cassel March 26, 2024, 1:47 p.m. UTC | #6
On Tue, Mar 26, 2024 at 04:40:21PM +0530, Manivannan Sadhasivam wrote:
> 
> I was planning to drop enable_resources() from Qcom driver once the DBI rework
> series gets merged. Because, the resource enablement during probe is currently
> done to avoid the crash that is bound to happen if registers are accessed during
> probe.
> 
> But what your observation reveals is that it is possible to get PERST# assert
> during the EP boot up itself which I was not accounting for. I always assumed
> that the EP will receive PERST# deassert first. If that is not the case, then
> this patch needs to be dropped.

From what I saw when having debug prints from my old email to you:
https://lore.kernel.org/linux-pci/Zalu%2F%2FdNi5BhZlBU@x1-carbon/


## RC side:
# reboot

## EP side
[  845.606810] pci: PERST asserted by host!
[  852.483985] pci: PERST de-asserted by host!
[  852.503041] pci: PERST asserted by host!
[  852.522375] pci: link up! (LTSSM_STATUS: 0x230011)
[  852.610318] pci: PERST de-asserted by host!



So in my case, I assume that the RC asserts PERST during a SoC reset.

This is obviously from the RC driver asserting PERST + sleep 100 ms +
PERST deassert:
[  852.503041] pci: PERST asserted by host!
[  852.610318] pci: PERST de-asserted by host!

The two before that:
[  852.483985] pci: PERST de-asserted by host!
[  852.503041] pci: PERST asserted by host!

appears to be because the RC I am using, incorrectly sets the PERST gpio as
ACTIVE HIGH:
https://github.com/torvalds/linux/blob/v6.9-rc1/arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts#L300

Well, at least they are bug compatible and sets the output to:
https://github.com/torvalds/linux/blob/v6.9-rc1/drivers/pci/controller/dwc/pcie-dw-rockchip.c#L170-L184

0 and the 1, which, since the DT binding is incorrect, will actually
do the right thing and assert and the deassert PERST.

The problem seems to be that the initial flags:
https://github.com/torvalds/linux/blob/v6.9-rc1/drivers/pci/controller/dwc/pcie-dw-rockchip.c#L242-L243
is: GPIOD_OUT_HIGH

which explains why I get the extra:
[  852.483985] pci: PERST de-asserted by host!
before
[  852.503041] pci: PERST asserted by host!

with basically no time between them..


I guess I should send a patch to set the initial value to
GPIOD_OUT_LOW, so that the RC driver does not trigger a
"spurious" PERST deassertion when requesting the IRQ.


So I think this patch should be fine if the RC is not buggy,
but as we can see, in reality there are at least one platform
in mainline that does manage to get this wrong.


Kind regards,
Niklas
Manivannan Sadhasivam March 26, 2024, 1:55 p.m. UTC | #7
On Tue, Mar 26, 2024 at 02:47:30PM +0100, Niklas Cassel wrote:
> On Tue, Mar 26, 2024 at 04:40:21PM +0530, Manivannan Sadhasivam wrote:
> > 
> > I was planning to drop enable_resources() from Qcom driver once the DBI rework
> > series gets merged. Because, the resource enablement during probe is currently
> > done to avoid the crash that is bound to happen if registers are accessed during
> > probe.
> > 
> > But what your observation reveals is that it is possible to get PERST# assert
> > during the EP boot up itself which I was not accounting for. I always assumed
> > that the EP will receive PERST# deassert first. If that is not the case, then
> > this patch needs to be dropped.
> 
> From what I saw when having debug prints from my old email to you:
> https://lore.kernel.org/linux-pci/Zalu%2F%2FdNi5BhZlBU@x1-carbon/
> 
> 
> ## RC side:
> # reboot
> 
> ## EP side
> [  845.606810] pci: PERST asserted by host!
> [  852.483985] pci: PERST de-asserted by host!
> [  852.503041] pci: PERST asserted by host!
> [  852.522375] pci: link up! (LTSSM_STATUS: 0x230011)
> [  852.610318] pci: PERST de-asserted by host!
> 
> 
> 
> So in my case, I assume that the RC asserts PERST during a SoC reset.
> 
> This is obviously from the RC driver asserting PERST + sleep 100 ms +
> PERST deassert:
> [  852.503041] pci: PERST asserted by host!
> [  852.610318] pci: PERST de-asserted by host!
> 
> The two before that:
> [  852.483985] pci: PERST de-asserted by host!
> [  852.503041] pci: PERST asserted by host!
> 
> appears to be because the RC I am using, incorrectly sets the PERST gpio as
> ACTIVE HIGH:
> https://github.com/torvalds/linux/blob/v6.9-rc1/arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts#L300
> 
> Well, at least they are bug compatible and sets the output to:
> https://github.com/torvalds/linux/blob/v6.9-rc1/drivers/pci/controller/dwc/pcie-dw-rockchip.c#L170-L184
> 
> 0 and the 1, which, since the DT binding is incorrect, will actually
> do the right thing and assert and the deassert PERST.
> 
> The problem seems to be that the initial flags:
> https://github.com/torvalds/linux/blob/v6.9-rc1/drivers/pci/controller/dwc/pcie-dw-rockchip.c#L242-L243
> is: GPIOD_OUT_HIGH
> 
> which explains why I get the extra:
> [  852.483985] pci: PERST de-asserted by host!
> before
> [  852.503041] pci: PERST asserted by host!
> 
> with basically no time between them..
> 
> 
> I guess I should send a patch to set the initial value to
> GPIOD_OUT_LOW, so that the RC driver does not trigger a
> "spurious" PERST deassertion when requesting the IRQ.
> 
> 
> So I think this patch should be fine if the RC is not buggy,
> but as we can see, in reality there are at least one platform
> in mainline that does manage to get this wrong.
> 

I've validated with x86 and Qcom RCs so far and didn't see this behavior. So I
guess I'll just keep the patch for now.

- Mani
diff mbox series

Patch

diff --git a/drivers/pci/controller/dwc/pcie-qcom-ep.c b/drivers/pci/controller/dwc/pcie-qcom-ep.c
index 2fb8c15e7a91..50b1635e3cbb 100644
--- a/drivers/pci/controller/dwc/pcie-qcom-ep.c
+++ b/drivers/pci/controller/dwc/pcie-qcom-ep.c
@@ -500,12 +500,6 @@  static int qcom_pcie_perst_deassert(struct dw_pcie *pci)
 static void qcom_pcie_perst_assert(struct dw_pcie *pci)
 {
 	struct qcom_pcie_ep *pcie_ep = to_pcie_ep(pci);
-	struct device *dev = pci->dev;
-
-	if (pcie_ep->link_status == QCOM_PCIE_EP_LINK_DISABLED) {
-		dev_dbg(dev, "Link is already disabled\n");
-		return;
-	}
 
 	dw_pcie_ep_cleanup(&pci->ep);
 	qcom_pcie_disable_resources(pcie_ep);