diff mbox series

[v2,1/2] PCI: dwc: Perform host_init() before registering msi

Message ID 20210823154958.305677-1-bjorn.andersson@linaro.org (mailing list archive)
State Not Applicable
Headers show
Series [v2,1/2] PCI: dwc: Perform host_init() before registering msi | expand

Commit Message

Bjorn Andersson Aug. 23, 2021, 3:49 p.m. UTC
On the Qualcomm sc8180x platform the bootloader does something related
to PCI that leaves a pending "msi" interrupt, which with the current
ordering often fires before init has a chance to enable the clocks that
are necessary for the interrupt handler to access the hardware.

Move the host_init() call before the registration of the "msi" interrupt
handler to ensure the host driver has a chance to enable the clocks.

The assignment of the bridge's ops and child_ops is moved along, because
at least the TI Keystone driver overwrites these in its host_init
callback.

Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
---

Changes since v1:
- New patch, instead of enabling resources in the qcom driver before jumping to
  dw_pcie_host_init(), per Rob Herring's suggestion.

 .../pci/controller/dwc/pcie-designware-host.c | 19 ++++++++++---------
 1 file changed, 10 insertions(+), 9 deletions(-)

Comments

Rob Herring Aug. 23, 2021, 5:46 p.m. UTC | #1
On Mon, Aug 23, 2021 at 08:49:57AM -0700, Bjorn Andersson wrote:
> On the Qualcomm sc8180x platform the bootloader does something related
> to PCI that leaves a pending "msi" interrupt, which with the current
> ordering often fires before init has a chance to enable the clocks that
> are necessary for the interrupt handler to access the hardware.
> 
> Move the host_init() call before the registration of the "msi" interrupt
> handler to ensure the host driver has a chance to enable the clocks.
> 
> The assignment of the bridge's ops and child_ops is moved along, because
> at least the TI Keystone driver overwrites these in its host_init
> callback.
> 
> Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
> ---
> 
> Changes since v1:
> - New patch, instead of enabling resources in the qcom driver before jumping to
>   dw_pcie_host_init(), per Rob Herring's suggestion.
> 
>  .../pci/controller/dwc/pcie-designware-host.c | 19 ++++++++++---------
>  1 file changed, 10 insertions(+), 9 deletions(-)

Reviewed-by: Rob Herring <robh@kernel.org>
Bjorn Helgaas Aug. 24, 2021, 7:05 p.m. UTC | #2
On Mon, Aug 23, 2021 at 08:49:57AM -0700, Bjorn Andersson wrote:
> On the Qualcomm sc8180x platform the bootloader does something related
> to PCI that leaves a pending "msi" interrupt, which with the current
> ordering often fires before init has a chance to enable the clocks that
> are necessary for the interrupt handler to access the hardware.
> 
> Move the host_init() call before the registration of the "msi" interrupt
> handler to ensure the host driver has a chance to enable the clocks.

Did you audit other drivers for similar issues?  If they do, we should
fix them all at once.

> The assignment of the bridge's ops and child_ops is moved along, because
> at least the TI Keystone driver overwrites these in its host_init
> callback.
> 
> Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
> ---
> 
> Changes since v1:
> - New patch, instead of enabling resources in the qcom driver before jumping to
>   dw_pcie_host_init(), per Rob Herring's suggestion.
> 
>  .../pci/controller/dwc/pcie-designware-host.c | 19 ++++++++++---------
>  1 file changed, 10 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/pci/controller/dwc/pcie-designware-host.c b/drivers/pci/controller/dwc/pcie-designware-host.c
> index d1d9b8344ec9..f4755f3a03be 100644
> --- a/drivers/pci/controller/dwc/pcie-designware-host.c
> +++ b/drivers/pci/controller/dwc/pcie-designware-host.c
> @@ -335,6 +335,16 @@ int dw_pcie_host_init(struct pcie_port *pp)
>  	if (pci->link_gen < 1)
>  		pci->link_gen = of_pci_get_max_link_speed(np);
>  
> +	/* Set default bus ops */
> +	bridge->ops = &dw_pcie_ops;
> +	bridge->child_ops = &dw_child_pcie_ops;
> +
> +	if (pp->ops->host_init) {
> +		ret = pp->ops->host_init(pp);
> +		if (ret)
> +			return ret;
> +	}
> +
>  	if (pci_msi_enabled()) {
>  		pp->has_msi_ctrl = !(pp->ops->msi_host_init ||
>  				     of_property_read_bool(np, "msi-parent") ||
> @@ -388,15 +398,6 @@ int dw_pcie_host_init(struct pcie_port *pp)
>  		}
>  	}
>  
> -	/* Set default bus ops */
> -	bridge->ops = &dw_pcie_ops;
> -	bridge->child_ops = &dw_child_pcie_ops;
> -
> -	if (pp->ops->host_init) {
> -		ret = pp->ops->host_init(pp);
> -		if (ret)
> -			goto err_free_msi;
> -	}
>  	dw_pcie_iatu_detect(pci);
>  
>  	dw_pcie_setup_rc(pp);
> -- 
> 2.29.2
>
Bjorn Andersson Aug. 24, 2021, 8:15 p.m. UTC | #3
On Tue 24 Aug 12:05 PDT 2021, Bjorn Helgaas wrote:

> On Mon, Aug 23, 2021 at 08:49:57AM -0700, Bjorn Andersson wrote:
> > On the Qualcomm sc8180x platform the bootloader does something related
> > to PCI that leaves a pending "msi" interrupt, which with the current
> > ordering often fires before init has a chance to enable the clocks that
> > are necessary for the interrupt handler to access the hardware.
> > 
> > Move the host_init() call before the registration of the "msi" interrupt
> > handler to ensure the host driver has a chance to enable the clocks.
> 
> Did you audit other drivers for similar issues?  If they do, we should
> fix them all at once.
> 

I only looked at the DesignWware drivers, in an attempt to find any
issues the proposed reordering.

The set of bugs causes by drivers registering interrupts before critical
resources tends to be rather visible and I don't know if there's much
value in speculatively "fixing" drivers.

E.g. a quick look through the drivers I see a similar pattern in
pci-tegra.c, but it's unlikely that they have the similar problem in
practice and I have no way to validate that a change to the order would
have a positive effect - or any side effects.


Or am I misunderstanding your request?

Regards,
Bjorn

> > The assignment of the bridge's ops and child_ops is moved along, because
> > at least the TI Keystone driver overwrites these in its host_init
> > callback.
> > 
> > Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
> > ---
> > 
> > Changes since v1:
> > - New patch, instead of enabling resources in the qcom driver before jumping to
> >   dw_pcie_host_init(), per Rob Herring's suggestion.
> > 
> >  .../pci/controller/dwc/pcie-designware-host.c | 19 ++++++++++---------
> >  1 file changed, 10 insertions(+), 9 deletions(-)
> > 
> > diff --git a/drivers/pci/controller/dwc/pcie-designware-host.c b/drivers/pci/controller/dwc/pcie-designware-host.c
> > index d1d9b8344ec9..f4755f3a03be 100644
> > --- a/drivers/pci/controller/dwc/pcie-designware-host.c
> > +++ b/drivers/pci/controller/dwc/pcie-designware-host.c
> > @@ -335,6 +335,16 @@ int dw_pcie_host_init(struct pcie_port *pp)
> >  	if (pci->link_gen < 1)
> >  		pci->link_gen = of_pci_get_max_link_speed(np);
> >  
> > +	/* Set default bus ops */
> > +	bridge->ops = &dw_pcie_ops;
> > +	bridge->child_ops = &dw_child_pcie_ops;
> > +
> > +	if (pp->ops->host_init) {
> > +		ret = pp->ops->host_init(pp);
> > +		if (ret)
> > +			return ret;
> > +	}
> > +
> >  	if (pci_msi_enabled()) {
> >  		pp->has_msi_ctrl = !(pp->ops->msi_host_init ||
> >  				     of_property_read_bool(np, "msi-parent") ||
> > @@ -388,15 +398,6 @@ int dw_pcie_host_init(struct pcie_port *pp)
> >  		}
> >  	}
> >  
> > -	/* Set default bus ops */
> > -	bridge->ops = &dw_pcie_ops;
> > -	bridge->child_ops = &dw_child_pcie_ops;
> > -
> > -	if (pp->ops->host_init) {
> > -		ret = pp->ops->host_init(pp);
> > -		if (ret)
> > -			goto err_free_msi;
> > -	}
> >  	dw_pcie_iatu_detect(pci);
> >  
> >  	dw_pcie_setup_rc(pp);
> > -- 
> > 2.29.2
> >
Bjorn Helgaas Aug. 24, 2021, 8:29 p.m. UTC | #4
On Tue, Aug 24, 2021 at 01:15:49PM -0700, Bjorn Andersson wrote:
> On Tue 24 Aug 12:05 PDT 2021, Bjorn Helgaas wrote:
> 
> > On Mon, Aug 23, 2021 at 08:49:57AM -0700, Bjorn Andersson wrote:
> > > On the Qualcomm sc8180x platform the bootloader does something related
> > > to PCI that leaves a pending "msi" interrupt, which with the current
> > > ordering often fires before init has a chance to enable the clocks that
> > > are necessary for the interrupt handler to access the hardware.
> > > 
> > > Move the host_init() call before the registration of the "msi" interrupt
> > > handler to ensure the host driver has a chance to enable the clocks.
> > 
> > Did you audit other drivers for similar issues?  If they do, we should
> > fix them all at once.
> 
> I only looked at the DesignWware drivers, in an attempt to find any
> issues the proposed reordering.
> 
> The set of bugs causes by drivers registering interrupts before critical
> resources tends to be rather visible and I don't know if there's much
> value in speculatively "fixing" drivers.
> 
> E.g. a quick look through the drivers I see a similar pattern in
> pci-tegra.c, but it's unlikely that they have the similar problem in
> practice and I have no way to validate that a change to the order would
> have a positive effect - or any side effects.
> 
> Or am I misunderstanding your request?

That is exactly my request.  I'm not sure if the potential issue you
noticed in pci-tegra.c is similar to the one I mentioned here:

  https://lore.kernel.org/linux-pci/20210624224040.GA3567297@bjorn-Precision-5520/

but I am actually in favor of speculatively fixing drivers even though
they're hard to test.  Code like this tends to get copied to other
places, and fixing several drivers sometimes exposes opportunities for
refactoring and sharing code.

Bjorn
Bjorn Andersson Aug. 24, 2021, 9:23 p.m. UTC | #5
On Tue 24 Aug 13:29 PDT 2021, Bjorn Helgaas wrote:

> On Tue, Aug 24, 2021 at 01:15:49PM -0700, Bjorn Andersson wrote:
> > On Tue 24 Aug 12:05 PDT 2021, Bjorn Helgaas wrote:
> > 
> > > On Mon, Aug 23, 2021 at 08:49:57AM -0700, Bjorn Andersson wrote:
> > > > On the Qualcomm sc8180x platform the bootloader does something related
> > > > to PCI that leaves a pending "msi" interrupt, which with the current
> > > > ordering often fires before init has a chance to enable the clocks that
> > > > are necessary for the interrupt handler to access the hardware.
> > > > 
> > > > Move the host_init() call before the registration of the "msi" interrupt
> > > > handler to ensure the host driver has a chance to enable the clocks.
> > > 
> > > Did you audit other drivers for similar issues?  If they do, we should
> > > fix them all at once.
> > 
> > I only looked at the DesignWware drivers, in an attempt to find any
> > issues the proposed reordering.
> > 
> > The set of bugs causes by drivers registering interrupts before critical
> > resources tends to be rather visible and I don't know if there's much
> > value in speculatively "fixing" drivers.
> > 
> > E.g. a quick look through the drivers I see a similar pattern in
> > pci-tegra.c, but it's unlikely that they have the similar problem in
> > practice and I have no way to validate that a change to the order would
> > have a positive effect - or any side effects.
> > 
> > Or am I misunderstanding your request?
> 
> That is exactly my request.

Okay.

> I'm not sure if the potential issue you
> noticed in pci-tegra.c is similar to the one I mentioned here:
> 
>   https://lore.kernel.org/linux-pci/20210624224040.GA3567297@bjorn-Precision-5520/
> 

As I still have the tegra driver open, I share your concern about the
use of potentially uninitialized variables.

The problem I was concerned about was however the same as in my patch
and the rockchip one, that if the tegra hardware isn't clocked the
pm_runtime_get_sync() (which would turn on power and clock) happens
after setting up the msi chain handler...

> but I am actually in favor of speculatively fixing drivers even though
> they're hard to test.  Code like this tends to get copied to other
> places, and fixing several drivers sometimes exposes opportunities for
> refactoring and sharing code.
> 

Looking through the other cases mentioned in your reply above certainly
gives a feeling that this problem has been inherited from driver to
driver...

I've added a ticket to my backlog to take a deeper look at this.

Regards,
Bjorn
Lorenzo Pieralisi Oct. 8, 2021, 5:48 p.m. UTC | #6
[+Vidya]

On Tue, Aug 24, 2021 at 02:23:14PM -0700, Bjorn Andersson wrote:
> On Tue 24 Aug 13:29 PDT 2021, Bjorn Helgaas wrote:
> 
> > On Tue, Aug 24, 2021 at 01:15:49PM -0700, Bjorn Andersson wrote:
> > > On Tue 24 Aug 12:05 PDT 2021, Bjorn Helgaas wrote:
> > > 
> > > > On Mon, Aug 23, 2021 at 08:49:57AM -0700, Bjorn Andersson wrote:
> > > > > On the Qualcomm sc8180x platform the bootloader does something related
> > > > > to PCI that leaves a pending "msi" interrupt, which with the current
> > > > > ordering often fires before init has a chance to enable the clocks that
> > > > > are necessary for the interrupt handler to access the hardware.
> > > > > 
> > > > > Move the host_init() call before the registration of the "msi" interrupt
> > > > > handler to ensure the host driver has a chance to enable the clocks.
> > > > 
> > > > Did you audit other drivers for similar issues?  If they do, we should
> > > > fix them all at once.
> > > 
> > > I only looked at the DesignWware drivers, in an attempt to find any
> > > issues the proposed reordering.
> > > 
> > > The set of bugs causes by drivers registering interrupts before critical
> > > resources tends to be rather visible and I don't know if there's much
> > > value in speculatively "fixing" drivers.
> > > 
> > > E.g. a quick look through the drivers I see a similar pattern in
> > > pci-tegra.c, but it's unlikely that they have the similar problem in
> > > practice and I have no way to validate that a change to the order would
> > > have a positive effect - or any side effects.
> > > 
> > > Or am I misunderstanding your request?
> > 
> > That is exactly my request.
> 
> Okay.
> 
> > I'm not sure if the potential issue you
> > noticed in pci-tegra.c is similar to the one I mentioned here:
> > 
> >   https://lore.kernel.org/linux-pci/20210624224040.GA3567297@bjorn-Precision-5520/
> > 
> 
> As I still have the tegra driver open, I share your concern about the
> use of potentially uninitialized variables.
> 
> The problem I was concerned about was however the same as in my patch
> and the rockchip one, that if the tegra hardware isn't clocked the
> pm_runtime_get_sync() (which would turn on power and clock) happens
> after setting up the msi chain handler...
> 
> > but I am actually in favor of speculatively fixing drivers even though
> > they're hard to test.  Code like this tends to get copied to other
> > places, and fixing several drivers sometimes exposes opportunities for
> > refactoring and sharing code.
> > 
> 
> Looking through the other cases mentioned in your reply above certainly
> gives a feeling that this problem has been inherited from driver to
> driver...
> 
> I've added a ticket to my backlog to take a deeper look at this.

Vidya, can you look into this please ? In the meantime I would merge
this series.

Thanks,
Lorenzo

> 
> Regards,
> Bjorn
Bjorn Andersson Oct. 8, 2021, 7:08 p.m. UTC | #7
On Fri 08 Oct 10:48 PDT 2021, Lorenzo Pieralisi wrote:

> [+Vidya]
> 
> On Tue, Aug 24, 2021 at 02:23:14PM -0700, Bjorn Andersson wrote:
> > On Tue 24 Aug 13:29 PDT 2021, Bjorn Helgaas wrote:
> > 
> > > On Tue, Aug 24, 2021 at 01:15:49PM -0700, Bjorn Andersson wrote:
> > > > On Tue 24 Aug 12:05 PDT 2021, Bjorn Helgaas wrote:
> > > > 
> > > > > On Mon, Aug 23, 2021 at 08:49:57AM -0700, Bjorn Andersson wrote:
> > > > > > On the Qualcomm sc8180x platform the bootloader does something related
> > > > > > to PCI that leaves a pending "msi" interrupt, which with the current
> > > > > > ordering often fires before init has a chance to enable the clocks that
> > > > > > are necessary for the interrupt handler to access the hardware.
> > > > > > 
> > > > > > Move the host_init() call before the registration of the "msi" interrupt
> > > > > > handler to ensure the host driver has a chance to enable the clocks.
> > > > > 
> > > > > Did you audit other drivers for similar issues?  If they do, we should
> > > > > fix them all at once.
> > > > 
> > > > I only looked at the DesignWware drivers, in an attempt to find any
> > > > issues the proposed reordering.
> > > > 
> > > > The set of bugs causes by drivers registering interrupts before critical
> > > > resources tends to be rather visible and I don't know if there's much
> > > > value in speculatively "fixing" drivers.
> > > > 
> > > > E.g. a quick look through the drivers I see a similar pattern in
> > > > pci-tegra.c, but it's unlikely that they have the similar problem in
> > > > practice and I have no way to validate that a change to the order would
> > > > have a positive effect - or any side effects.
> > > > 
> > > > Or am I misunderstanding your request?
> > > 
> > > That is exactly my request.
> > 
> > Okay.
> > 
> > > I'm not sure if the potential issue you
> > > noticed in pci-tegra.c is similar to the one I mentioned here:
> > > 
> > >   https://lore.kernel.org/linux-pci/20210624224040.GA3567297@bjorn-Precision-5520/
> > > 
> > 
> > As I still have the tegra driver open, I share your concern about the
> > use of potentially uninitialized variables.
> > 
> > The problem I was concerned about was however the same as in my patch
> > and the rockchip one, that if the tegra hardware isn't clocked the
> > pm_runtime_get_sync() (which would turn on power and clock) happens
> > after setting up the msi chain handler...
> > 
> > > but I am actually in favor of speculatively fixing drivers even though
> > > they're hard to test.  Code like this tends to get copied to other
> > > places, and fixing several drivers sometimes exposes opportunities for
> > > refactoring and sharing code.
> > > 
> > 
> > Looking through the other cases mentioned in your reply above certainly
> > gives a feeling that this problem has been inherited from driver to
> > driver...
> > 
> > I've added a ticket to my backlog to take a deeper look at this.
> 
> Vidya, can you look into this please ? In the meantime I would merge
> this series.
> 

I would greatly appreciate that, as it unlocks 5G connectivity and NVME
support on my laptop. I still intend to review the items Bjorn pointed
out, but I haven't gotten there yet.

Thanks,
Bjorn

> Thanks,
> Lorenzo
> 
> > 
> > Regards,
> > Bjorn
Lorenzo Pieralisi Oct. 12, 2021, 12:47 p.m. UTC | #8
On Mon, 23 Aug 2021 08:49:57 -0700, Bjorn Andersson wrote:
> On the Qualcomm sc8180x platform the bootloader does something related
> to PCI that leaves a pending "msi" interrupt, which with the current
> ordering often fires before init has a chance to enable the clocks that
> are necessary for the interrupt handler to access the hardware.
> 
> Move the host_init() call before the registration of the "msi" interrupt
> handler to ensure the host driver has a chance to enable the clocks.
> 
> [...]

Applied to pci/dwc, thanks!

[1/2] PCI: dwc: Perform host_init() before registering msi
      https://git.kernel.org/lpieralisi/pci/c/7e919677bb
[2/2] PCI: qcom: Add sc8180x compatible
      https://git.kernel.org/lpieralisi/pci/c/0e00fc858f

Thanks,
Lorenzo
diff mbox series

Patch

diff --git a/drivers/pci/controller/dwc/pcie-designware-host.c b/drivers/pci/controller/dwc/pcie-designware-host.c
index d1d9b8344ec9..f4755f3a03be 100644
--- a/drivers/pci/controller/dwc/pcie-designware-host.c
+++ b/drivers/pci/controller/dwc/pcie-designware-host.c
@@ -335,6 +335,16 @@  int dw_pcie_host_init(struct pcie_port *pp)
 	if (pci->link_gen < 1)
 		pci->link_gen = of_pci_get_max_link_speed(np);
 
+	/* Set default bus ops */
+	bridge->ops = &dw_pcie_ops;
+	bridge->child_ops = &dw_child_pcie_ops;
+
+	if (pp->ops->host_init) {
+		ret = pp->ops->host_init(pp);
+		if (ret)
+			return ret;
+	}
+
 	if (pci_msi_enabled()) {
 		pp->has_msi_ctrl = !(pp->ops->msi_host_init ||
 				     of_property_read_bool(np, "msi-parent") ||
@@ -388,15 +398,6 @@  int dw_pcie_host_init(struct pcie_port *pp)
 		}
 	}
 
-	/* Set default bus ops */
-	bridge->ops = &dw_pcie_ops;
-	bridge->child_ops = &dw_child_pcie_ops;
-
-	if (pp->ops->host_init) {
-		ret = pp->ops->host_init(pp);
-		if (ret)
-			goto err_free_msi;
-	}
 	dw_pcie_iatu_detect(pci);
 
 	dw_pcie_setup_rc(pp);