Message ID | 20240314-pci-epf-rework-v1-1-6134e6c1d491@linaro.org (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | PCI: endpoint: Make host reboot handling more robust | expand |
On Thu, Mar 14, 2024 at 08:53:40PM +0530, Manivannan Sadhasivam wrote: > All EP specific resources are enabled during PERST# deassert. As a counter > operation, all resources should be disabled during PERST# assert. There is > no point in skipping that if the link was not enabled. > > This will also result in enablement of the resources twice if PERST# got > deasserted again. So remove the check from qcom_pcie_perst_assert() and > disable all the resources unconditionally. > > Fixes: f55fee56a631 ("PCI: qcom-ep: Add Qualcomm PCIe Endpoint controller driver") > Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> > --- > drivers/pci/controller/dwc/pcie-qcom-ep.c | 6 ------ > 1 file changed, 6 deletions(-) > > diff --git a/drivers/pci/controller/dwc/pcie-qcom-ep.c b/drivers/pci/controller/dwc/pcie-qcom-ep.c > index 2fb8c15e7a91..50b1635e3cbb 100644 > --- a/drivers/pci/controller/dwc/pcie-qcom-ep.c > +++ b/drivers/pci/controller/dwc/pcie-qcom-ep.c > @@ -500,12 +500,6 @@ static int qcom_pcie_perst_deassert(struct dw_pcie *pci) > static void qcom_pcie_perst_assert(struct dw_pcie *pci) > { > struct qcom_pcie_ep *pcie_ep = to_pcie_ep(pci); > - struct device *dev = pci->dev; > - > - if (pcie_ep->link_status == QCOM_PCIE_EP_LINK_DISABLED) { > - dev_dbg(dev, "Link is already disabled\n"); > - return; > - } > > dw_pcie_ep_cleanup(&pci->ep); > qcom_pcie_disable_resources(pcie_ep); Are you really sure that this is safe? I think I remember seeing some splat in dmesg if some clks, or maybe it was regulators, got disabled while already being disabled. Perhaps you could test it by simply calling: qcom_pcie_disable_resources(); twice here, and see if you see and splat in dmesg. Kind regards, Niklas
On Fri, Mar 22, 2024 at 05:08:22PM +0100, Niklas Cassel wrote: > On Thu, Mar 14, 2024 at 08:53:40PM +0530, Manivannan Sadhasivam wrote: > > All EP specific resources are enabled during PERST# deassert. As a counter > > operation, all resources should be disabled during PERST# assert. There is > > no point in skipping that if the link was not enabled. > > > > This will also result in enablement of the resources twice if PERST# got > > deasserted again. So remove the check from qcom_pcie_perst_assert() and > > disable all the resources unconditionally. > > > > Fixes: f55fee56a631 ("PCI: qcom-ep: Add Qualcomm PCIe Endpoint controller driver") > > Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> > > --- > > drivers/pci/controller/dwc/pcie-qcom-ep.c | 6 ------ > > 1 file changed, 6 deletions(-) > > > > diff --git a/drivers/pci/controller/dwc/pcie-qcom-ep.c b/drivers/pci/controller/dwc/pcie-qcom-ep.c > > index 2fb8c15e7a91..50b1635e3cbb 100644 > > --- a/drivers/pci/controller/dwc/pcie-qcom-ep.c > > +++ b/drivers/pci/controller/dwc/pcie-qcom-ep.c > > @@ -500,12 +500,6 @@ static int qcom_pcie_perst_deassert(struct dw_pcie *pci) > > static void qcom_pcie_perst_assert(struct dw_pcie *pci) > > { > > struct qcom_pcie_ep *pcie_ep = to_pcie_ep(pci); > > - struct device *dev = pci->dev; > > - > > - if (pcie_ep->link_status == QCOM_PCIE_EP_LINK_DISABLED) { > > - dev_dbg(dev, "Link is already disabled\n"); > > - return; > > - } > > > > dw_pcie_ep_cleanup(&pci->ep); > > qcom_pcie_disable_resources(pcie_ep); > > Are you really sure that this is safe? > > I think I remember seeing some splat in dmesg if some clks, or maybe it > was regulators, got disabled while already being disabled. > > Perhaps you could test it by simply calling: > qcom_pcie_disable_resources(); > twice here, and see if you see and splat in dmesg. > Calling the disable_resources() function twice will definitely result in the splat. But here PERST# is level triggered, so I don't see how the EP can see assert twice. Am I missing something? - Mani
On Tue, Mar 26, 2024 at 01:14:29PM +0530, Manivannan Sadhasivam wrote: > On Fri, Mar 22, 2024 at 05:08:22PM +0100, Niklas Cassel wrote: > > On Thu, Mar 14, 2024 at 08:53:40PM +0530, Manivannan Sadhasivam wrote: > > > All EP specific resources are enabled during PERST# deassert. As a counter > > > operation, all resources should be disabled during PERST# assert. There is > > > no point in skipping that if the link was not enabled. > > > > > > This will also result in enablement of the resources twice if PERST# got > > > deasserted again. So remove the check from qcom_pcie_perst_assert() and > > > disable all the resources unconditionally. > > > > > > Fixes: f55fee56a631 ("PCI: qcom-ep: Add Qualcomm PCIe Endpoint controller driver") > > > Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> > > > --- > > > drivers/pci/controller/dwc/pcie-qcom-ep.c | 6 ------ > > > 1 file changed, 6 deletions(-) > > > > > > diff --git a/drivers/pci/controller/dwc/pcie-qcom-ep.c b/drivers/pci/controller/dwc/pcie-qcom-ep.c > > > index 2fb8c15e7a91..50b1635e3cbb 100644 > > > --- a/drivers/pci/controller/dwc/pcie-qcom-ep.c > > > +++ b/drivers/pci/controller/dwc/pcie-qcom-ep.c > > > @@ -500,12 +500,6 @@ static int qcom_pcie_perst_deassert(struct dw_pcie *pci) > > > static void qcom_pcie_perst_assert(struct dw_pcie *pci) > > > { > > > struct qcom_pcie_ep *pcie_ep = to_pcie_ep(pci); > > > - struct device *dev = pci->dev; > > > - > > > - if (pcie_ep->link_status == QCOM_PCIE_EP_LINK_DISABLED) { > > > - dev_dbg(dev, "Link is already disabled\n"); > > > - return; > > > - } > > > > > > dw_pcie_ep_cleanup(&pci->ep); > > > qcom_pcie_disable_resources(pcie_ep); > > > > Are you really sure that this is safe? > > > > I think I remember seeing some splat in dmesg if some clks, or maybe it > > was regulators, got disabled while already being disabled. > > > > Perhaps you could test it by simply calling: > > qcom_pcie_disable_resources(); > > twice here, and see if you see and splat in dmesg. > > > > Calling the disable_resources() function twice will definitely result in the > splat. But here PERST# is level triggered, so I don't see how the EP can see > assert twice. > > Am I missing something? I think I remember now, I was developing a driver using a .core_init_notifier, but I followed the pcie-tegra model, which does not enable any resources in probe() (it only gets them), so I got the splat because when PERST got asserted, resources would get disabled even though they were already disabled. pcie-qcom: -gets resources in .probe() -enables resources in .probe() -sets no default state in .probe() pcie-tegra: -gets resources in .probe() -enables resources in perst_deassert() -sets default state to EP_STATE_DISABLED in probe() So pcie-qcom does not seem to be following the same pattern like pcie-tegra, because pcie-qcom actually does enable resources for the first time in probe(), while tegra will enable resources for the first time in perst_deassert(). Sorry for the noise. Kind regards, Niklas
On Thu, Mar 14, 2024 at 08:53:40PM +0530, Manivannan Sadhasivam wrote: > All EP specific resources are enabled during PERST# deassert. As a counter > operation, all resources should be disabled during PERST# assert. There is > no point in skipping that if the link was not enabled. > > This will also result in enablement of the resources twice if PERST# got > deasserted again. So remove the check from qcom_pcie_perst_assert() and > disable all the resources unconditionally. > > Fixes: f55fee56a631 ("PCI: qcom-ep: Add Qualcomm PCIe Endpoint controller driver") > Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> > --- > drivers/pci/controller/dwc/pcie-qcom-ep.c | 6 ------ > 1 file changed, 6 deletions(-) > > diff --git a/drivers/pci/controller/dwc/pcie-qcom-ep.c b/drivers/pci/controller/dwc/pcie-qcom-ep.c > index 2fb8c15e7a91..50b1635e3cbb 100644 > --- a/drivers/pci/controller/dwc/pcie-qcom-ep.c > +++ b/drivers/pci/controller/dwc/pcie-qcom-ep.c > @@ -500,12 +500,6 @@ static int qcom_pcie_perst_deassert(struct dw_pcie *pci) > static void qcom_pcie_perst_assert(struct dw_pcie *pci) > { > struct qcom_pcie_ep *pcie_ep = to_pcie_ep(pci); > - struct device *dev = pci->dev; > - > - if (pcie_ep->link_status == QCOM_PCIE_EP_LINK_DISABLED) { > - dev_dbg(dev, "Link is already disabled\n"); > - return; > - } > > dw_pcie_ep_cleanup(&pci->ep); > qcom_pcie_disable_resources(pcie_ep); > > -- > 2.25.1 > It would be nice if you could do a similar change to pcie-tegra, so that the drivers are in sync, as that is not strictly related: Reviewed-by: Niklas Cassel <cassel@kernel.org>
On Tue, Mar 26, 2024 at 11:24:18AM +0100, Niklas Cassel wrote: > On Tue, Mar 26, 2024 at 01:14:29PM +0530, Manivannan Sadhasivam wrote: > > On Fri, Mar 22, 2024 at 05:08:22PM +0100, Niklas Cassel wrote: > > > On Thu, Mar 14, 2024 at 08:53:40PM +0530, Manivannan Sadhasivam wrote: > > > > All EP specific resources are enabled during PERST# deassert. As a counter > > > > operation, all resources should be disabled during PERST# assert. There is > > > > no point in skipping that if the link was not enabled. > > > > > > > > This will also result in enablement of the resources twice if PERST# got > > > > deasserted again. So remove the check from qcom_pcie_perst_assert() and > > > > disable all the resources unconditionally. > > > > > > > > Fixes: f55fee56a631 ("PCI: qcom-ep: Add Qualcomm PCIe Endpoint controller driver") > > > > Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> > > > > --- > > > > drivers/pci/controller/dwc/pcie-qcom-ep.c | 6 ------ > > > > 1 file changed, 6 deletions(-) > > > > > > > > diff --git a/drivers/pci/controller/dwc/pcie-qcom-ep.c b/drivers/pci/controller/dwc/pcie-qcom-ep.c > > > > index 2fb8c15e7a91..50b1635e3cbb 100644 > > > > --- a/drivers/pci/controller/dwc/pcie-qcom-ep.c > > > > +++ b/drivers/pci/controller/dwc/pcie-qcom-ep.c > > > > @@ -500,12 +500,6 @@ static int qcom_pcie_perst_deassert(struct dw_pcie *pci) > > > > static void qcom_pcie_perst_assert(struct dw_pcie *pci) > > > > { > > > > struct qcom_pcie_ep *pcie_ep = to_pcie_ep(pci); > > > > - struct device *dev = pci->dev; > > > > - > > > > - if (pcie_ep->link_status == QCOM_PCIE_EP_LINK_DISABLED) { > > > > - dev_dbg(dev, "Link is already disabled\n"); > > > > - return; > > > > - } > > > > > > > > dw_pcie_ep_cleanup(&pci->ep); > > > > qcom_pcie_disable_resources(pcie_ep); > > > > > > Are you really sure that this is safe? > > > > > > I think I remember seeing some splat in dmesg if some clks, or maybe it > > > was regulators, got disabled while already being disabled. > > > > > > Perhaps you could test it by simply calling: > > > qcom_pcie_disable_resources(); > > > twice here, and see if you see and splat in dmesg. > > > > > > > Calling the disable_resources() function twice will definitely result in the > > splat. But here PERST# is level triggered, so I don't see how the EP can see > > assert twice. > > > > Am I missing something? > > I think I remember now, I was developing a driver using a .core_init_notifier, > but I followed the pcie-tegra model, which does not enable any resources in > probe() (it only gets them), so I got the splat because when PERST got > asserted, resources would get disabled even though they were already disabled. > > pcie-qcom: > -gets resources in .probe() > -enables resources in .probe() > -sets no default state in .probe() > > pcie-tegra: > -gets resources in .probe() > -enables resources in perst_deassert() > -sets default state to EP_STATE_DISABLED in probe() > > So pcie-qcom does not seem to be following the same pattern like pcie-tegra, > because pcie-qcom actually does enable resources for the first time in > probe(), while tegra will enable resources for the first time in > perst_deassert(). > > Sorry for the noise. > I was planning to drop enable_resources() from Qcom driver once the DBI rework series gets merged. Because, the resource enablement during probe is currently done to avoid the crash that is bound to happen if registers are accessed during probe. But what your observation reveals is that it is possible to get PERST# assert during the EP boot up itself which I was not accounting for. I always assumed that the EP will receive PERST# deassert first. If that is not the case, then this patch needs to be dropped. - Mani
On Tue, Mar 26, 2024 at 04:40:21PM +0530, Manivannan Sadhasivam wrote: > > I was planning to drop enable_resources() from Qcom driver once the DBI rework > series gets merged. Because, the resource enablement during probe is currently > done to avoid the crash that is bound to happen if registers are accessed during > probe. > > But what your observation reveals is that it is possible to get PERST# assert > during the EP boot up itself which I was not accounting for. I always assumed > that the EP will receive PERST# deassert first. If that is not the case, then > this patch needs to be dropped. From what I saw when having debug prints from my old email to you: https://lore.kernel.org/linux-pci/Zalu%2F%2FdNi5BhZlBU@x1-carbon/ ## RC side: # reboot ## EP side [ 845.606810] pci: PERST asserted by host! [ 852.483985] pci: PERST de-asserted by host! [ 852.503041] pci: PERST asserted by host! [ 852.522375] pci: link up! (LTSSM_STATUS: 0x230011) [ 852.610318] pci: PERST de-asserted by host! So in my case, I assume that the RC asserts PERST during a SoC reset. This is obviously from the RC driver asserting PERST + sleep 100 ms + PERST deassert: [ 852.503041] pci: PERST asserted by host! [ 852.610318] pci: PERST de-asserted by host! The two before that: [ 852.483985] pci: PERST de-asserted by host! [ 852.503041] pci: PERST asserted by host! appears to be because the RC I am using, incorrectly sets the PERST gpio as ACTIVE HIGH: https://github.com/torvalds/linux/blob/v6.9-rc1/arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts#L300 Well, at least they are bug compatible and sets the output to: https://github.com/torvalds/linux/blob/v6.9-rc1/drivers/pci/controller/dwc/pcie-dw-rockchip.c#L170-L184 0 and the 1, which, since the DT binding is incorrect, will actually do the right thing and assert and the deassert PERST. The problem seems to be that the initial flags: https://github.com/torvalds/linux/blob/v6.9-rc1/drivers/pci/controller/dwc/pcie-dw-rockchip.c#L242-L243 is: GPIOD_OUT_HIGH which explains why I get the extra: [ 852.483985] pci: PERST de-asserted by host! before [ 852.503041] pci: PERST asserted by host! with basically no time between them.. I guess I should send a patch to set the initial value to GPIOD_OUT_LOW, so that the RC driver does not trigger a "spurious" PERST deassertion when requesting the IRQ. So I think this patch should be fine if the RC is not buggy, but as we can see, in reality there are at least one platform in mainline that does manage to get this wrong. Kind regards, Niklas
On Tue, Mar 26, 2024 at 02:47:30PM +0100, Niklas Cassel wrote: > On Tue, Mar 26, 2024 at 04:40:21PM +0530, Manivannan Sadhasivam wrote: > > > > I was planning to drop enable_resources() from Qcom driver once the DBI rework > > series gets merged. Because, the resource enablement during probe is currently > > done to avoid the crash that is bound to happen if registers are accessed during > > probe. > > > > But what your observation reveals is that it is possible to get PERST# assert > > during the EP boot up itself which I was not accounting for. I always assumed > > that the EP will receive PERST# deassert first. If that is not the case, then > > this patch needs to be dropped. > > From what I saw when having debug prints from my old email to you: > https://lore.kernel.org/linux-pci/Zalu%2F%2FdNi5BhZlBU@x1-carbon/ > > > ## RC side: > # reboot > > ## EP side > [ 845.606810] pci: PERST asserted by host! > [ 852.483985] pci: PERST de-asserted by host! > [ 852.503041] pci: PERST asserted by host! > [ 852.522375] pci: link up! (LTSSM_STATUS: 0x230011) > [ 852.610318] pci: PERST de-asserted by host! > > > > So in my case, I assume that the RC asserts PERST during a SoC reset. > > This is obviously from the RC driver asserting PERST + sleep 100 ms + > PERST deassert: > [ 852.503041] pci: PERST asserted by host! > [ 852.610318] pci: PERST de-asserted by host! > > The two before that: > [ 852.483985] pci: PERST de-asserted by host! > [ 852.503041] pci: PERST asserted by host! > > appears to be because the RC I am using, incorrectly sets the PERST gpio as > ACTIVE HIGH: > https://github.com/torvalds/linux/blob/v6.9-rc1/arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts#L300 > > Well, at least they are bug compatible and sets the output to: > https://github.com/torvalds/linux/blob/v6.9-rc1/drivers/pci/controller/dwc/pcie-dw-rockchip.c#L170-L184 > > 0 and the 1, which, since the DT binding is incorrect, will actually > do the right thing and assert and the deassert PERST. > > The problem seems to be that the initial flags: > https://github.com/torvalds/linux/blob/v6.9-rc1/drivers/pci/controller/dwc/pcie-dw-rockchip.c#L242-L243 > is: GPIOD_OUT_HIGH > > which explains why I get the extra: > [ 852.483985] pci: PERST de-asserted by host! > before > [ 852.503041] pci: PERST asserted by host! > > with basically no time between them.. > > > I guess I should send a patch to set the initial value to > GPIOD_OUT_LOW, so that the RC driver does not trigger a > "spurious" PERST deassertion when requesting the IRQ. > > > So I think this patch should be fine if the RC is not buggy, > but as we can see, in reality there are at least one platform > in mainline that does manage to get this wrong. > I've validated with x86 and Qcom RCs so far and didn't see this behavior. So I guess I'll just keep the patch for now. - Mani
diff --git a/drivers/pci/controller/dwc/pcie-qcom-ep.c b/drivers/pci/controller/dwc/pcie-qcom-ep.c index 2fb8c15e7a91..50b1635e3cbb 100644 --- a/drivers/pci/controller/dwc/pcie-qcom-ep.c +++ b/drivers/pci/controller/dwc/pcie-qcom-ep.c @@ -500,12 +500,6 @@ static int qcom_pcie_perst_deassert(struct dw_pcie *pci) static void qcom_pcie_perst_assert(struct dw_pcie *pci) { struct qcom_pcie_ep *pcie_ep = to_pcie_ep(pci); - struct device *dev = pci->dev; - - if (pcie_ep->link_status == QCOM_PCIE_EP_LINK_DISABLED) { - dev_dbg(dev, "Link is already disabled\n"); - return; - } dw_pcie_ep_cleanup(&pci->ep); qcom_pcie_disable_resources(pcie_ep);
All EP specific resources are enabled during PERST# deassert. As a counter operation, all resources should be disabled during PERST# assert. There is no point in skipping that if the link was not enabled. This will also result in enablement of the resources twice if PERST# got deasserted again. So remove the check from qcom_pcie_perst_assert() and disable all the resources unconditionally. Fixes: f55fee56a631 ("PCI: qcom-ep: Add Qualcomm PCIe Endpoint controller driver") Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> --- drivers/pci/controller/dwc/pcie-qcom-ep.c | 6 ------ 1 file changed, 6 deletions(-)