diff mbox series

[v1] PCI: brcmstb: Fix regression regarding missing PCIe linkup

Message ID 20220518194211.20143-1-jim2101024@gmail.com (mailing list archive)
State Superseded
Headers show
Series [v1] PCI: brcmstb: Fix regression regarding missing PCIe linkup | expand

Commit Message

Jim Quinlan May 18, 2022, 7:42 p.m. UTC
commit 93e41f3fca3d ("PCI: brcmstb: Add control of subdevice voltage regulators")

introduced a regression on the PCIe RPi4 Compute Module.  If the PCIe
endpoint node described in [2] was missing, no linkup would be attempted,
and subsequent accesses would cause a panic because this particular PCIe HW
causes a CPU abort on illegal accesses (instead of returning 0xffffffff).

We fix this by allowing the DT endpoint subnode to be missing.  This is
important for platforms like the CM4 which have a standard PCIe socket and
the endpoint device is unknown.

Please do not accept this commit until someone with a CM4 has tested
this solution; I have only emulated the problem and fix on different
platform.

Note that a bisection identified

commit 830aa6f29f07 ("PCI: brcmstb: Split brcm_pcie_setup() into two funcs")

as the first failing commit.  This commit is a regression, but is unrelated
and was fixed by a subsequent commit in the original patchset.

[1] https://bugzilla.kernel.org/show_bug.cgi?id=215925
[2] Documentation/devicetree/bindings/pci/brcm,stb-pcie.yaml

Fixes: 93e41f3fca3d ("PCI: brcmstb: Add control of subdevice voltage regulators")
Fixes: 830aa6f29f07 ("PCI: brcmstb: Split brcm_pcie_setup() into two funcs")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=215925
Signed-off-by: Jim Quinlan <jim2101024@gmail.com>
---
 drivers/pci/controller/pcie-brcmstb.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)


base-commit: ef1302160bfb19f804451d0e919266703501c875
prerequisite-patch-id: 23a425390a4226bd70bbff459148c80f5e28379c
prerequisite-patch-id: e3f2875124b46b2b1cf9ea28883bf0c864b79479
prerequisite-patch-id: 9cdd706ee2038c7b393c4d65ff76a1873df1ca03
prerequisite-patch-id: 332ac90be6e4e4110e27bdd1caaff212c129f547
prerequisite-patch-id: 32a74f87cbfe9e8d52c34a4edeee6d271925665a
prerequisite-patch-id: f57cdf7ec7080bb8c95782bc7c3ec672db8ec1ce
prerequisite-patch-id: 18dc9236aed47f708f5c854afd832f3c80be5ea7
prerequisite-patch-id: dd147c6854c4ca12a9a8bd4f5714968a59d60e4e

Comments

Bjorn Helgaas May 18, 2022, 10:18 p.m. UTC | #1
[+to Cyril]

Cyril, if you have a chance to test this and verify that it fixes the
regression, we may still be able to squeeze this into v5.18.

I can add the Reported-by and Tested-by tags myself.

On Wed, May 18, 2022 at 03:42:11PM -0400, Jim Quinlan wrote:
> commit 93e41f3fca3d ("PCI: brcmstb: Add control of subdevice voltage regulators")
> 
> introduced a regression on the PCIe RPi4 Compute Module.  If the PCIe
> endpoint node described in [2] was missing, no linkup would be attempted,
> and subsequent accesses would cause a panic because this particular PCIe HW
> causes a CPU abort on illegal accesses (instead of returning 0xffffffff).
> 
> We fix this by allowing the DT endpoint subnode to be missing.  This is
> important for platforms like the CM4 which have a standard PCIe socket and
> the endpoint device is unknown.
> 
> Please do not accept this commit until someone with a CM4 has tested
> this solution; I have only emulated the problem and fix on different
> platform.
> 
> Note that a bisection identified
> 
> commit 830aa6f29f07 ("PCI: brcmstb: Split brcm_pcie_setup() into two funcs")
> 
> as the first failing commit.  This commit is a regression, but is unrelated
> and was fixed by a subsequent commit in the original patchset.
> 
> [1] https://bugzilla.kernel.org/show_bug.cgi?id=215925
> [2] Documentation/devicetree/bindings/pci/brcm,stb-pcie.yaml
> 
> Fixes: 93e41f3fca3d ("PCI: brcmstb: Add control of subdevice voltage regulators")
> Fixes: 830aa6f29f07 ("PCI: brcmstb: Split brcm_pcie_setup() into two funcs")
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=215925
> Signed-off-by: Jim Quinlan <jim2101024@gmail.com>
> ---
>  drivers/pci/controller/pcie-brcmstb.c | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/pci/controller/pcie-brcmstb.c b/drivers/pci/controller/pcie-brcmstb.c
> index ba5c120816b2..adca74e235cb 100644
> --- a/drivers/pci/controller/pcie-brcmstb.c
> +++ b/drivers/pci/controller/pcie-brcmstb.c
> @@ -540,16 +540,18 @@ static int pci_subdev_regulators_add_bus(struct pci_bus *bus)
>  
>  static int brcm_pcie_add_bus(struct pci_bus *bus)
>  {
> -	struct device *dev = &bus->dev;
>  	struct brcm_pcie *pcie = (struct brcm_pcie *) bus->sysdata;
>  	int ret;
>  
> -	if (!dev->of_node || !bus->parent || !pci_is_root_bus(bus->parent))
> +	/* Only busno==1 requires us to linkup */
> +	if ((int)bus->number != 1)
>  		return 0;
>  
>  	ret = pci_subdev_regulators_add_bus(bus);
> -	if (ret)
> +	if (ret) {
> +		pcie->refusal_mode = true;
>  		return ret;
> +	}
>  
>  	/* Grab the regulators for suspend/resume */
>  	pcie->sr = bus->dev.driver_data;
> 
> base-commit: ef1302160bfb19f804451d0e919266703501c875
> prerequisite-patch-id: 23a425390a4226bd70bbff459148c80f5e28379c
> prerequisite-patch-id: e3f2875124b46b2b1cf9ea28883bf0c864b79479
> prerequisite-patch-id: 9cdd706ee2038c7b393c4d65ff76a1873df1ca03
> prerequisite-patch-id: 332ac90be6e4e4110e27bdd1caaff212c129f547
> prerequisite-patch-id: 32a74f87cbfe9e8d52c34a4edeee6d271925665a
> prerequisite-patch-id: f57cdf7ec7080bb8c95782bc7c3ec672db8ec1ce
> prerequisite-patch-id: 18dc9236aed47f708f5c854afd832f3c80be5ea7
> prerequisite-patch-id: dd147c6854c4ca12a9a8bd4f5714968a59d60e4e
> -- 
> 2.17.1
>
Cyril Brulebois May 19, 2022, 6:47 a.m. UTC | #2
Bjorn Helgaas <helgaas@kernel.org> (2022-05-18):
> Cyril, if you have a chance to test this and verify that it fixes the
> regression, we may still be able to squeeze this into v5.18.
> 
> I can add the Reported-by and Tested-by tags myself.

That looks good to me (some details below), and if that's no trouble
please use those:

  Reported-by: Cyril Brulebois <cyril@debamax.com>
  Tested-by: Cyril Brulebois <cyril@debamax.com>

> Jim Quinlan <jim2101024@gmail.com> (2022-05-18):
> > Please do not accept this commit until someone with a CM4 has tested
> > this solution; I have only emulated the problem and fix on different
> > platform.

Applying this patch on top of v5.18-rc7-48-gf993aed406ea, testing with a
CM4 mounted on a Compute Module 4 IO Board, the boot is fine again.

With an empty PCIe slot:

    root@rpi4-20220428:~# dmesg|grep -i pci
    [    0.158519] PCI: CLS 0 bytes, default 64
    [    3.374963] shpchp: Standard Hot Plug PCI Controller Driver version: 0.4
    [    3.375959] brcm-pcie fd500000.pcie: host bridge /scb/pcie@7d500000 ranges:
    [    3.375994] brcm-pcie fd500000.pcie:   No bus range found for /scb/pcie@7d500000, using [bus 00-ff]
    [    3.376042] brcm-pcie fd500000.pcie:      MEM 0x0600000000..0x0603ffffff -> 0x00f8000000
    [    3.376096] brcm-pcie fd500000.pcie:   IB MEM 0x0000000000..0x003fffffff -> 0x0400000000
    [    3.376837] brcm-pcie fd500000.pcie: PCI host bridge to bus 0000:00
    [    3.376864] pci_bus 0000:00: root bus resource [bus 00-ff]
    [    3.376886] pci_bus 0000:00: root bus resource [mem 0x600000000-0x603ffffff] (bus address [0xf8000000-0xfbffffff])
    [    3.376950] pci 0000:00:00.0: [14e4:2711] type 01 class 0x060400
    [    3.377057] pci 0000:00:00.0: PME# supported from D0 D3hot
    [    3.379455] pci 0000:00:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring
    [    3.698799] brcm-pcie fd500000.pcie: link down
    [    3.700816] pci_bus 0000:01: busn_res: [bus 01-ff] end is updated to 01
    [    3.700855] pci 0000:00:00.0: PCI bridge to [bus 01]
    [    3.701216] pcieport 0000:00:00.0: PME: Signaling with IRQ 51
    [    3.701621] pcieport 0000:00:00.0: AER: enabled with IRQ 51
    [    3.702134] pci_bus 0000:01: busn_res: [bus 01] is released
    [    3.702417] pci_bus 0000:00: busn_res: [bus 00-ff] is released

With a PCIe→quad-USB extension board, and keyboard + USB stick on it:

    root@rpi4-20220428:~# dmesg|grep -i pci|grep -v input:
    [    0.157680] PCI: CLS 0 bytes, default 64
    [    3.374070] shpchp: Standard Hot Plug PCI Controller Driver version: 0.4
    [    3.375080] brcm-pcie fd500000.pcie: host bridge /scb/pcie@7d500000 ranges:
    [    3.375116] brcm-pcie fd500000.pcie:   No bus range found for /scb/pcie@7d500000, using [bus 00-ff]
    [    3.375166] brcm-pcie fd500000.pcie:      MEM 0x0600000000..0x0603ffffff -> 0x00f8000000
    [    3.375211] brcm-pcie fd500000.pcie:   IB MEM 0x0000000000..0x003fffffff -> 0x0400000000
    [    3.375946] brcm-pcie fd500000.pcie: PCI host bridge to bus 0000:00
    [    3.375972] pci_bus 0000:00: root bus resource [bus 00-ff]
    [    3.375993] pci_bus 0000:00: root bus resource [mem 0x600000000-0x603ffffff] (bus address [0xf8000000-0xfbffffff])
    [    3.376057] pci 0000:00:00.0: [14e4:2711] type 01 class 0x060400
    [    3.376164] pci 0000:00:00.0: PME# supported from D0 D3hot
    [    3.428109] brcm-pcie fd500000.pcie: link up, 5.0 GT/s PCIe x1 (SSC)
    [    3.428175] pci 0000:01:00.0: [1106:3483] type 00 class 0x0c0330
    [    3.428219] pci 0000:01:00.0: reg 0x10: [mem 0x00000000-0x00000fff 64bit]
    [    3.428346] pci 0000:01:00.0: PME# supported from D0 D3cold
    [    3.430673] pci 0000:00:00.0: BAR 14: assigned [mem 0x600000000-0x6000fffff]
    [    3.430706] pci 0000:01:00.0: BAR 0: assigned [mem 0x600000000-0x600000fff 64bit]
    [    3.430742] pci 0000:00:00.0: PCI bridge to [bus 01]
    [    3.430761] pci 0000:00:00.0:   bridge window [mem 0x600000000-0x6000fffff]
    [    3.430976] pcieport 0000:00:00.0: enabling device (0000 -> 0002)
    [    3.431150] pcieport 0000:00:00.0: PME: Signaling with IRQ 51
    [    3.431552] pcieport 0000:00:00.0: AER: enabled with IRQ 51
    [    3.431765] pci 0000:01:00.0: enabling device (0000 -> 0002)

Both keyboard and storage on USB work fine.

> > commit 830aa6f29f07 ("PCI: brcmstb: Split brcm_pcie_setup() into two
> > funcs")
> > 
> > as the first failing commit.  This commit is a regression, but is
> > unrelated and was fixed by a subsequent commit in the original
> > patchset.

For completeness, since the original patchset was merged before
v5.17-rc1, I verified that the latest release from the linux-5.17.y
branch (i.e. v5.17.9) is also broken in the same way (not surprising
but I thought I'd check anyway):

    [    1.952374] Kernel panic - not syncing: Asynchronous SError Interrupt

Testing this patch on top of v5.17.9, it also fixes the boot there (with
or without the extension board on the PCIe slot), so it looks to me this
patch could get cc'ed to stable for inclusion into the linux-5.17.y
branch.


Please let me know if you need more testing/feedback.


Cheers,
Bjorn Helgaas May 19, 2022, 4:10 p.m. UTC | #3
[+to Rob for my naive DT questions]

On Wed, May 18, 2022 at 03:42:11PM -0400, Jim Quinlan wrote:
> commit 93e41f3fca3d ("PCI: brcmstb: Add control of subdevice voltage regulators")
> 
> introduced a regression on the PCIe RPi4 Compute Module.  If the PCIe
> endpoint node described in [2] was missing, no linkup would be attempted,
> and subsequent accesses would cause a panic because this particular PCIe HW
> causes a CPU abort on illegal accesses (instead of returning 0xffffffff).
> 
> We fix this by allowing the DT endpoint subnode to be missing.  This is
> important for platforms like the CM4 which have a standard PCIe socket and
> the endpoint device is unknown.

I assume you're referring specifically to making this optional in the
DT:

    /* PCIe endpoint */
    pci-ep@0,0 {
            assigned-addresses =
                <0x82010000 0x0 0xf8000000 0x6 0x00000000 0x0 0x2000>;
            reg = <0x0 0x0 0x0 0x0 0x0>;
            compatible = "pci14e4,1688";
    };

I don't really understand what's going on here, but I assume this
describes a [14e4:1688] device, which the PCI database says is a
NetXtreme BCM5761 10/100/1000BASE-T Ethernet
(https://pci-ids.ucw.cz/read/PC/14e4/1688)

Why do you *ever* need this stanza?  "git grep pci-ep
Documentation/devicetree/bindings/pci/" says no other DT has one.

If the link does come up, I assume normal PCI enumeration would
discover the [14e4:1688] or whatever device is plugged into a CM4
socket, and it would read and assign BARs as needed.  Why do we need
to describe any of this in the DT?

If the link doesn't come up, it looks like you set the "refusal_mode"
so subsequent config accesses fail gracefully instead of with a CPU
abort.

[Tangent: since you never clear "refusal_mode", I assume there's no
possibility of hot-adding a device.  A device must be put in the slot
before power-up, right?]

> [1] https://bugzilla.kernel.org/show_bug.cgi?id=215925
> [2] Documentation/devicetree/bindings/pci/brcm,stb-pcie.yaml
> 
> Fixes: 93e41f3fca3d ("PCI: brcmstb: Add control of subdevice voltage regulators")
> Fixes: 830aa6f29f07 ("PCI: brcmstb: Split brcm_pcie_setup() into two funcs")
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=215925
> Signed-off-by: Jim Quinlan <jim2101024@gmail.com>
> ---
>  drivers/pci/controller/pcie-brcmstb.c | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/pci/controller/pcie-brcmstb.c b/drivers/pci/controller/pcie-brcmstb.c
> index ba5c120816b2..adca74e235cb 100644
> --- a/drivers/pci/controller/pcie-brcmstb.c
> +++ b/drivers/pci/controller/pcie-brcmstb.c
> @@ -540,16 +540,18 @@ static int pci_subdev_regulators_add_bus(struct pci_bus *bus)
>  
>  static int brcm_pcie_add_bus(struct pci_bus *bus)
>  {
> -	struct device *dev = &bus->dev;
>  	struct brcm_pcie *pcie = (struct brcm_pcie *) bus->sysdata;
>  	int ret;
>  
> -	if (!dev->of_node || !bus->parent || !pci_is_root_bus(bus->parent))
> +	/* Only busno==1 requires us to linkup */
> +	if ((int)bus->number != 1)

It's a big leap from "DT endpoint is optional" to "bus->number == 1 if
DT endpoint is missing" (if that's even what it means).  Help me
connect the dots here.

I *guess* this is really saying "we only want to bring the link up for
RPs"?

And "bus->number == 1" assumes the RP is on bus 0, there's only one
RP, and that RP's secondary bus is 1?  So it's only in that case
(we're adding the secondary bus of the RP), that we need to manually
bring up the link?

>  		return 0;
>  
>  	ret = pci_subdev_regulators_add_bus(bus);
> -	if (ret)
> +	if (ret) {
> +		pcie->refusal_mode = true;

Is this related?  It doesn't *look* related to making the DT endpoint
optional.

>  		return ret;
> +	}
>  
>  	/* Grab the regulators for suspend/resume */
>  	pcie->sr = bus->dev.driver_data;
> 
> base-commit: ef1302160bfb19f804451d0e919266703501c875
> prerequisite-patch-id: 23a425390a4226bd70bbff459148c80f5e28379c
> prerequisite-patch-id: e3f2875124b46b2b1cf9ea28883bf0c864b79479
> prerequisite-patch-id: 9cdd706ee2038c7b393c4d65ff76a1873df1ca03
> prerequisite-patch-id: 332ac90be6e4e4110e27bdd1caaff212c129f547
> prerequisite-patch-id: 32a74f87cbfe9e8d52c34a4edeee6d271925665a
> prerequisite-patch-id: f57cdf7ec7080bb8c95782bc7c3ec672db8ec1ce
> prerequisite-patch-id: 18dc9236aed47f708f5c854afd832f3c80be5ea7
> prerequisite-patch-id: dd147c6854c4ca12a9a8bd4f5714968a59d60e4e
> -- 
> 2.17.1
>
Jim Quinlan May 19, 2022, 6:04 p.m. UTC | #4
On Thu, May 19, 2022 at 12:10 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> [+to Rob for my naive DT questions]
>
> On Wed, May 18, 2022 at 03:42:11PM -0400, Jim Quinlan wrote:
> > commit 93e41f3fca3d ("PCI: brcmstb: Add control of subdevice voltage regulators")
> >
> > introduced a regression on the PCIe RPi4 Compute Module.  If the PCIe
> > endpoint node described in [2] was missing, no linkup would be attempted,
> > and subsequent accesses would cause a panic because this particular PCIe HW
> > causes a CPU abort on illegal accesses (instead of returning 0xffffffff).
> >
> > We fix this by allowing the DT endpoint subnode to be missing.  This is
> > important for platforms like the CM4 which have a standard PCIe socket and
> > the endpoint device is unknown.
>
> I assume you're referring specifically to making this optional in the
> DT:
>
>     /* PCIe endpoint */
>     pci-ep@0,0 {
>             assigned-addresses =
>                 <0x82010000 0x0 0xf8000000 0x6 0x00000000 0x0 0x2000>;
>             reg = <0x0 0x0 0x0 0x0 0x0>;
>             compatible = "pci14e4,1688";
>     };
>
Actually, both that and the node that contains it, i.e. pci@0,0.

> I don't really understand what's going on here, but I assume this
> describes a [14e4:1688] device, which the PCI database says is a
> NetXtreme BCM5761 10/100/1000BASE-T Ethernet
> (https://pci-ids.ucw.cz/read/PC/14e4/1688)

Yes.  I use an assortment of PCIe endpoint devices for testing.
>
> Why do you *ever* need this stanza?  "git grep pci-ep
> Documentation/devicetree/bindings/pci/" says no other DT has one.

You'll find one in
"Documentation/devicetree/bindings/pci/nvidia,tegra-pcie.txt", line
~240, although this
is a board DTS example.  They use "pci@0,0" for endpoint 02:00.0,
whereas I find "pci-ep" to
be more descriptive.

Note that  the "pci-ep@0,0" node is in the "example" section of
brcm,stb-pcie.yaml; but nothing
says it is required.  I believe it was added it because a reviewer
asked me to, but if I remember
incorrectly,  it does illustrate that "pcie@0,0" is not the endpoint
device node as many would think.

Note that the regression occurred because "pci@0,0" was missing, not
"pci-ep@0,0" as I first thought.

>
> If the link does come up, I assume normal PCI enumeration would
> discover the [14e4:1688] or whatever device is plugged into a CM4
> socket, and it would read and assign BARs as needed.  Why do we need
> to describe any of this in the DT?
The only  reason one needs to describe this node is  when a regulator is
under the root port, in my case pci@0,0.  In the example this is

                            vpcie3v3-supply = <&vreg7>;

This was the entire reason behind the original patchset.
>
> If the link doesn't come up, it looks like you set the "refusal_mode"
> so subsequent config accesses fail gracefully instead of with a CPU
> abort.
Yes.
>
> [Tangent: since you never clear "refusal_mode", I assume there's no
> possibility of hot-adding a device.  A device must be put in the slot
> before power-up, right?]
Yes, we do not have the HW functionality to support hotplug.

>
> > [1] https://bugzilla.kernel.org/show_bug.cgi?id=215925
> > [2] Documentation/devicetree/bindings/pci/brcm,stb-pcie.yaml
> >
> > Fixes: 93e41f3fca3d ("PCI: brcmstb: Add control of subdevice voltage regulators")
> > Fixes: 830aa6f29f07 ("PCI: brcmstb: Split brcm_pcie_setup() into two funcs")
> > Link: https://bugzilla.kernel.org/show_bug.cgi?id=215925
> > Signed-off-by: Jim Quinlan <jim2101024@gmail.com>
> > ---
> >  drivers/pci/controller/pcie-brcmstb.c | 8 +++++---
> >  1 file changed, 5 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/pci/controller/pcie-brcmstb.c b/drivers/pci/controller/pcie-brcmstb.c
> > index ba5c120816b2..adca74e235cb 100644
> > --- a/drivers/pci/controller/pcie-brcmstb.c
> > +++ b/drivers/pci/controller/pcie-brcmstb.c
> > @@ -540,16 +540,18 @@ static int pci_subdev_regulators_add_bus(struct pci_bus *bus)
> >
> >  static int brcm_pcie_add_bus(struct pci_bus *bus)
> >  {
> > -     struct device *dev = &bus->dev;
> >       struct brcm_pcie *pcie = (struct brcm_pcie *) bus->sysdata;
> >       int ret;
> >
> > -     if (!dev->of_node || !bus->parent || !pci_is_root_bus(bus->parent))
> > +     /* Only busno==1 requires us to linkup */
> > +     if ((int)bus->number != 1)
>
> It's a big leap from "DT endpoint is optional" to "bus->number == 1 if
> DT endpoint is missing" (if that's even what it means).  Help me
> connect the dots here.
The brcm_pcie_add_bus() function returned immediately and skipped linkup
when (!dev->of_node). That clause was removed from that function, which
is the true fix for the regression,  but you can see thiscondition
is still tested in pci_subdev_regulators_add_bus().

I added the "busno != 1" as an added precaution,
as the brcmstb RC driver only cares about pcie linkup and turning on
regulators when busno==1.

If this regulator mechanism becomes a feature any RC driver may use --
as it was in
v8 of the original patch but was moved to pcie-brcamstb only to avoid conflicts
with Pali's upcoming RC functionality improvements -- I would probably consider
removing the busno==1 clause.

Regards and thanks,
Jim Quinlan
Broadcom S


>
> I *guess* this is really saying "we only want to bring the link up for
> RPs"?
>
> And "bus->number == 1" assumes the RP is on bus 0, there's only one
> RP, and that RP's secondary bus is 1?  So it's only in that case
> (we're adding the secondary bus of the RP), that we need to manually
> bring up the link?
>
> >               return 0;
> >
> >       ret = pci_subdev_regulators_add_bus(bus);
> > -     if (ret)
> > +     if (ret) {
> > +             pcie->refusal_mode = true;
>
> Is this related?  It doesn't *look* related to making the DT endpoint
> optional.
>
> >               return ret;
> > +     }
> >
> >       /* Grab the regulators for suspend/resume */
> >       pcie->sr = bus->dev.driver_data;
> >
> > base-commit: ef1302160bfb19f804451d0e919266703501c875
> > prerequisite-patch-id: 23a425390a4226bd70bbff459148c80f5e28379c
> > prerequisite-patch-id: e3f2875124b46b2b1cf9ea28883bf0c864b79479
> > prerequisite-patch-id: 9cdd706ee2038c7b393c4d65ff76a1873df1ca03
> > prerequisite-patch-id: 332ac90be6e4e4110e27bdd1caaff212c129f547
> > prerequisite-patch-id: 32a74f87cbfe9e8d52c34a4edeee6d271925665a
> > prerequisite-patch-id: f57cdf7ec7080bb8c95782bc7c3ec672db8ec1ce
> > prerequisite-patch-id: 18dc9236aed47f708f5c854afd832f3c80be5ea7
> > prerequisite-patch-id: dd147c6854c4ca12a9a8bd4f5714968a59d60e4e
> > --
> > 2.17.1
> >
Jim Quinlan May 19, 2022, 7:58 p.m. UTC | #5
On Thu, May 19, 2022 at 2:04 PM Jim Quinlan <jim2101024@gmail.com> wrote:
>
> On Thu, May 19, 2022 at 12:10 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> >
> > [+to Rob for my naive DT questions]
> >
> > On Wed, May 18, 2022 at 03:42:11PM -0400, Jim Quinlan wrote:
> > > commit 93e41f3fca3d ("PCI: brcmstb: Add control of subdevice voltage regulators")
> > >
> > > introduced a regression on the PCIe RPi4 Compute Module.  If the PCIe
> > > endpoint node described in [2] was missing, no linkup would be attempted,
> > > and subsequent accesses would cause a panic because this particular PCIe HW
> > > causes a CPU abort on illegal accesses (instead of returning 0xffffffff).
> > >
> > > We fix this by allowing the DT endpoint subnode to be missing.  This is
> > > important for platforms like the CM4 which havedev->dev.of_nodei a standard PCIe socket and
> > > the endpoint device is unknown.
> >
> > I assume you're referring specifically to making this optional in the
> > DT:
> >
> >     /* PCIe endpoint */
> >     pci-ep@0,0 {
> >             assigned-addresses =
> >                 <0x82010000 0x0 0xf8000000 0x6 0x00000000 0x0 0x2000>;
> >             reg = <0x0 0x0 0x0 0x0 0x0>;
> >             compatible = "pci14e4,1688";
> >     };
> >
> Actually, both that and the node that contains it, i.e. pci@0,0.
>
> > I don't really understand what's going on here, but I assume this
> > describes a [14e4:1688] device, which the PCI database says is a
> > NetXtreme BCM5761 10/100/1000BASE-T Ethernet
> > (https://pci-ids.ucw.cz/read/PC/14e4/1688)
>
> Yes.  I use an assortment of PCIe endpoint devices for testing.
> >
> > Why do you *ever* need this stanza?  "git grep pci-ep
> > Documentation/devicetree/bindings/pci/" says no other DT has one.
>
> You'll find one in
> "Documentation/devicetree/bindings/pci/nvidia,tegra-pcie.txt", line
> ~240, although this
> is a board DTS example.  They use "pci@0,0" for endpoint 02:00.0,
> whereas I find "pci-ep" to
> be more descriptive.
>
> Note that  the "pci-ep@0,0" node is in the "example" section of
> brcm,stb-pcie.yaml; but nothing
> says it is required.  I believe it was added it because a reviewer
> asked me to, but if I remember
> incorrectly,  it does illustrate that "pcie@0,0" is not the endpoint
> device node as many would think.
>
> Note that the regression occurred because "pci@0,0" was missing, not
> "pci-ep@0,0" as I first thought.
>
> >
> > If the link does come up, I assume normal PCI enumeration would
> > discover the [14e4:1688] or whatever device is plugged into a CM4
> > socket, and it would read and assign BARs as needed.  Why do we need
> > to describe any of this in the DT?
Hi Bjorn,

I was remiss in not mentioning our biggest actual use of specifying
this sub-subnode: to pass info to the endproint driver.  For example:

pcie@1000110000 {
        compatible = "brcm,bcm7211-pcie";
        /* ... */

        pci@0,0 {
                compatible = "pciclass,0604";
                /* ... */

                pci-ep@0,0 {
                        local-mac-address = [ 00 10 18 d0 3c 51 ];
                        reg = <0x10000 0x0 0x0 0x0 0x0>;
                };
        };
};

The PCIe endpoint driver can just invoke

        of_get_mac_address(dev->dev.of_node, &addr)


Regards,
Jim Quinlan
Broadcom STB


> The only  reason one needs to describe this node is  when a regulator is
> under the root port, in my case pci@0,0.  In the example this is
>
>                             vpcie3v3-supply = <&vreg7>;
>
> This was the entire reason behind the original patchset.
> >
> > If the link doesn't come up, it looks like you set the "refusal_mode"
> > so subsequent config accesses fail gracefully instead of with a CPU
> > abort.
> Yes.
> >
> > [Tangent: since you never clear "refusal_mode", I assume there's no
> > possibility of hot-adding a device.  A device must be put in the slot
> > before power-up, right?]
> Yes, we do not have the HW functionality to support hotplug.
>
> >
> > > [1] https://bugzilla.kernel.org/show_bug.cgi?id=215925
> > > [2] Documentation/devicetree/bindings/pci/brcm,stb-pcie.yaml
> > >
> > > Fixes: 93e41f3fca3d ("PCI: brcmstb: Add control of subdevice voltage regulators")
> > > Fixes: 830aa6f29f07 ("PCI: brcmstb: Split brcm_pcie_setup() into two funcs")
> > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=215925
> > > Signed-off-by: Jim Quinlan <jim2101024@gmail.com>
> > > ---
> > >  drivers/pci/controller/pcie-brcmstb.c | 8 +++++---
> > >  1 file changed, 5 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/drivers/pci/controller/pcie-brcmstb.c b/drivers/pci/controller/pcie-brcmstb.c
> > > index ba5c120816b2..adca74e235cb 100644
> > > --- a/drivers/pci/controller/pcie-brcmstb.c
> > > +++ b/drivers/pci/controller/pcie-brcmstb.c
> > > @@ -540,16 +540,18 @@ static int pci_subdev_regulators_add_bus(struct pci_bus *bus)
> > >
> > >  static int brcm_pcie_add_bus(struct pci_bus *bus)
> > >  {
> > > -     struct device *dev = &bus->dev;
> > >       struct brcm_pcie *pcie = (struct brcm_pcie *) bus->sysdata;
> > >       int ret;
> > >
> > > -     if (!dev->of_node || !bus->parent || !pci_is_root_bus(bus->parent))
> > > +     /* Only busno==1 requires us to linkup */
> > > +     if ((int)bus->number != 1)
> >
> > It's a big leap from "DT endpoint is optional" to "bus->number == 1 if
> > DT endpoint is missing" (if that's even what it means).  Help me
> > connect the dots here.
> The brcm_pcie_add_bus() function returned immediately and skipped linkup
> when (!dev->of_node). That clause was removed from that function, which
> is the true fix for the regression,  but you can see thiscondition
> is still tested in pci_subdev_regulators_add_bus().
>
> I added the "busno != 1" as an added precaution,
> as the brcmstb RC driver only cares about pcie linkup and turning on
> regulators when busno==1.
>
> If this regulator mechanism becomes a feature any RC driver may use --
> as it was in
> v8 of the original patch but was moved to pcie-brcamstb only to avoid conflicts
> with Pali's upcoming RC functionality improvements -- I would probably consider
> removing the busno==1 clause.
>
> Regards and thanks,
> Jim Quinlan
> Broadcom S
>
>
> >
> > I *guess* this is really saying "we only want to bring the link up for
> > RPs"?
> >
> > And "bus->number == 1" assumes the RP is on bus 0, there's only one
> > RP, and that RP's secondary bus is 1?  So it's only in that case
> > (we're adding the secondary bus of the RP), that we need to manually
> > bring up the link?
> >
> > >               return 0;
> > >
> > >       ret = pci_subdev_regulators_add_bus(bus);
> > > -     if (ret)
> > > +     if (ret) {
> > > +             pcie->refusal_mode = true;
> >
> > Is this related?  It doesn't *look* related to making the DT endpoint
> > optional.
> >
> > >               return ret;
> > > +     }
> > >
> > >       /* Grab the regulators for suspend/resume */
> > >       pcie->sr = bus->dev.driver_data;
> > >
> > > base-commit: ef1302160bfb19f804451d0e919266703501c875
> > > prerequisite-patch-id: 23a425390a4226bd70bbff459148c80f5e28379c
> > > prerequisite-patch-id: e3f2875124b46b2b1cf9ea28883bf0c864b79479
> > > prerequisite-patch-id: 9cdd706ee2038c7b393c4d65ff76a1873df1ca03
> > > prerequisite-patch-id: 332ac90be6e4e4110e27bdd1caaff212c129f547
> > > prerequisite-patch-id: 32a74f87cbfe9e8d52c34a4edeee6d271925665a
> > > prerequisite-patch-id: f57cdf7ec7080bb8c95782bc7c3ec672db8ec1ce
> > > prerequisite-patch-id: 18dc9236aed47f708f5c854afd832f3c80be5ea7
> > > prerequisite-patch-id: dd147c6854c4ca12a9a8bd4f5714968a59d60e4e
> > > --
> > > 2.17.1
> > >
Bjorn Helgaas May 21, 2022, 4:43 p.m. UTC | #6
[+cc Rafael, linux-pm because I think there are interesting power
management questions here]

On Wed, May 18, 2022 at 03:42:11PM -0400, Jim Quinlan wrote:
> commit 93e41f3fca3d ("PCI: brcmstb: Add control of subdevice voltage regulators")
> 
> introduced a regression on the PCIe RPi4 Compute Module.  If the PCIe
> endpoint node described in [2] was missing, no linkup would be attempted,
> and subsequent accesses would cause a panic because this particular PCIe HW
> causes a CPU abort on illegal accesses (instead of returning 0xffffffff).
> 
> We fix this by allowing the DT endpoint subnode to be missing.  This is
> important for platforms like the CM4 which have a standard PCIe socket and
> the endpoint device is unknown.

I think the problem here is that on the CM, we try to enumerate
devices that are not powered up, isn't it?  The commit log should say
something about that power situation and how the driver learns about
the power regulators instead of just pointing at an DT endpoint node.

I guess the intent of this patch is to turn on the power to downstream
devices before enumerating them?  What happens if we turn on the power
but don't find any downstream devices?  From looking at the code, I
assume we just leave the power on.  Maybe that's what you want, I
dunno.

I added Rafael because this seems vaguely similar to runtime power
management, and if we can integrate with that somehow, I'd sure like
to avoid building a parallel infrastructure for it.

The current path we're on is to move some of this code that's
currently in pcie-brcmstb.c to the PCIe portdrv [0].  I'm a little
hesitant about that because ACPI does just fine without it.  If we're
adding new DT functionality that could not be implemented via ACPI,
that's one thing.  But I'm not convinced this is that new.

That's a longer term question.  In the short term we need to fix the
regression.  More specifics about that below.

[0] https://lore.kernel.org/r/20211110221456.11977-6-jim2101024@gmail.com

> [1] https://bugzilla.kernel.org/show_bug.cgi?id=215925
> [2] Documentation/devicetree/bindings/pci/brcm,stb-pcie.yaml
> 
> Fixes: 93e41f3fca3d ("PCI: brcmstb: Add control of subdevice voltage regulators")
> Fixes: 830aa6f29f07 ("PCI: brcmstb: Split brcm_pcie_setup() into two funcs")
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=215925
> Signed-off-by: Jim Quinlan <jim2101024@gmail.com>
> ---
>  drivers/pci/controller/pcie-brcmstb.c | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/pci/controller/pcie-brcmstb.c b/drivers/pci/controller/pcie-brcmstb.c
> index ba5c120816b2..adca74e235cb 100644
> --- a/drivers/pci/controller/pcie-brcmstb.c
> +++ b/drivers/pci/controller/pcie-brcmstb.c
> @@ -540,16 +540,18 @@ static int pci_subdev_regulators_add_bus(struct pci_bus *bus)
>  
>  static int brcm_pcie_add_bus(struct pci_bus *bus)
>  {
> -	struct device *dev = &bus->dev;
>  	struct brcm_pcie *pcie = (struct brcm_pcie *) bus->sysdata;
>  	int ret;
>  
> -	if (!dev->of_node || !bus->parent || !pci_is_root_bus(bus->parent))
> +	/* Only busno==1 requires us to linkup */
> +	if ((int)bus->number != 1)
>  		return 0;
>  
>  	ret = pci_subdev_regulators_add_bus(bus);
> -	if (ret)
> +	if (ret) {
> +		pcie->refusal_mode = true;
>  		return ret;
> +	}
>  
>  	/* Grab the regulators for suspend/resume */
>  	pcie->sr = bus->dev.driver_data;

IIUC, this path:

  pci_alloc_child_bus
    brcm_pcie_add_bus                   # .add_bus method
      pci_subdev_regulators_add_bus     # in pcie-brcmstb.c for now
        alloc_subdev_regulators         # in pcie-brcmstb.c for now
        regulator_bulk_get
        regulator_bulk_enable
      brcm_pcie_linkup			# bring link up

is basically so we can leave power to downstream devices off, then
turn it on when we're ready to enumerate those downstream devices.

I think the brcmstb root bus is always bus 0, it only has a single
Root Port on the root bus, and it always leads to bus 1, so it sort of
makes sense that we only need to turn on power when we're about to
scan "bus->number == 1".

But this power management seems like a pattern that other controllers
will use.  Other controllers will have several Root Ports, so checking
the bus number won't work for them.  Instead of checking the bus
number, I think brcmstb should check more directly for a power
regulator.

Tangent 1: I think this means a downstream device goes from D3cold to
D0uninitialized?  Does this code account for the required delays
accesses to the device?  I see some in brcm_pcie_linkup(), but I don't
see anything that looks like Tpvperl (the time PERST# must remain
asserted after power becomes valid) or Tperst (when asserted, PERST#
must remain asserted at least this long) (both from PCIe r6.0, sec
6.6.1).

Tangent 2: "brcm_pcie_link_up()" makes sense -- it's the conventional
name for the simple boolean function that tells us whether the link is
up.  "brcm_pcie_linkup()", which *brings* the link up, is confusing
because it's too similar to "brcm_pcie_link_up()".  The conventional
name for this would be "brcm_pcie_start_link()".

Tangent 3: There are fewer than 20 forward function declarations in
drivers/pci/controller/, and 9 of them are in pcie-brcmstb.c.  It's a
lot easier to maintain all these drivers if they use a common style.
Generally speaking, Linux code orders function definitions to avoid
the need for forward declarations.

Bjorn
Jim Quinlan May 21, 2022, 6:51 p.m. UTC | #7
On Sat, May 21,
2CONFIG_INITRAMFS_SOURCE="/work3/jq921458/cpio/54-arm64-rootfs.cpio022
at 12:43 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> [+cc Rafael, linux-pm because I think there are interesting power
> management questions here]
>
> On Wed, May 18, 2022 at 03:42:11PM -0400, Jim Quinlan wrote:
> > commit 93e41f3fca3d ("PCI: brcmstb: Add control of subdevice voltage regulators")
> >
> > introduced a regression on the PCIe RPi4 Compute Module.  If the PCIe
> > endpoint node described in [2] was missing, no linkup would be attempted,
> > and subsequent accesses would cause a panic because this particular PCIe HW
> > causes a CPU abort on illegal accesses (instead of returning 0xffffffff).
> >
> > We fix this by allowing the DT endpoint subnode to be missing.  This is
> > important for platforms like the CM4 which have a standard PCIe socket and
> > the endpoint device is unknown.
>
> I think the problem here is that on the CM, we try to enumerate
> devices that are not powered up, isn't it?  The commit log should say
> something about that power situation and how the driver learns aboutCONFIG_INITRAMFS_SOURCE="/work3/jq921458/cpio/54-arm64-rootfs.cpio
> the power regulators instead of just pointing at an DT endpoint node.
Hi Bjorn,

This is incorrect.  The regression occurred because the code mistakenly
skips PCIe-linkup if the PCI portdrv DT node  does not exist. With our
RC HW, doing a config space access to bus 1  w/o first linking up results
in a  CPU abort.  This regression has nothing to do with EP power at all.

The RPi does not use the "PCIe regulator" feature of my original patchset.
It is currently used only by our STB and Cable Modem  products.

>
> I guess the intent of this patch is to turn on the power to downstream
> devices before enumerating them?
Are you referring to my original patchset or the one I just submitted?
 If the former,
yes.  If the latter, no.

>  What happens if we turn on the power
> but don't find any downstream devices?
They are turned off to conserve power.

> From looking at the code, I
> assume we just leave the power on.  Maybe that's what you want, I
> dunno.
For STB and Cable Modem products we do not leave the power on.  In
fact, our Cable
Modem group was the first to request this feature.   It appears that the RPi CM4
always keeps endpoint power on but I do not know for sure.

>
> I added Rafael because this seems vaguely similar to runtime power
> management, and if we can integrate with that somehow, I'd sure like
> to avoid building a parallel infrastructure for it.
>
> The current path we're on is to move some of this code that's
> currently in pcie-brcmstb.c to the PCIe portdrv [0].  I'm a little
> hesitant about that because ACPI does just fine without it.  If we're
> adding new DT functionality that could not be implemented via ACPI,
> that's one thing.  But I'm not convinced this is that new.
AFAICT, Broadcom STB and Cable Modem products do not have/use/want ACPI.
We are fine with keeping this "PCIe regulator" feature private to our driver and
giving you speedy and full support in maintaining it.

> That's a longer term question.  In the short term we need to fix the
> regression.  More specifics about that below.
>
> [0] https://lore.kernel.org/r/20211110221456.11977-6-jim2101024@gmail.com
>
> > [1] https://bugzilla.kernel.org/show_bug.cgi?id=215925
> > [2] Documentation/devicetree/bindings/pci/brcm,stb-pcie.yaml
> >
> > Fixes: 93e41f3fca3d ("PCI: brcmstb: Add control of subdevice voltage regulators")
> > Fixes: 830aa6f29f07 ("PCI: brcmstb: Split brcm_pcie_setup() into two funcs")
> > Link: https://bugzilla.kernel.org/show_bug.cgi?id=215925
> > Signed-off-by: Jim Quinlan <jim2101024@gmail.com>
> > ---
> >  drivers/pci/controller/pcie-brcmstb.c | 8 +++++---
> >  1 file changed, 5 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/pci/controller/pcie-brcmstb.c b/drivers/pci/controller/pcie-brcmstb.c
> > index ba5c120816b2..adca74e235cb 100644
> > --- a/drivers/pci/controller/pcie-brcmstb.c
> > +++ b/drivers/pci/controller/pcie-brcmstb.c
> > @@ -540,16 +540,18 @@ static int pci_subdev_regulators_add_bus(struct pci_bus *bus)
> >
> >  static int brcm_pcie_add_bus(struct pci_bus *bus)
> >  {
> > -     struct device *dev = &bus->dev;
> >       struct brcm_pcie *pcie = (struct brcm_pcie *) bus->sysdata;
> >       int ret;
> >
> > -     if (!dev->of_node || !bus->parent || !pci_is_root_bus(bus->parent))
> > +     /* Only busno==1 requires us to linkup */
> > +     if ((int)bus->number != 1)
> >               return 0;
> >
> >       ret = pci_subdev_regulators_add_bus(bus);
> > -     if (ret)
> > +     if (ret) {
> > +             pcie->refusal_mode = true;
> >               return ret;
> > +     }
> >
> >       /* Grab the regulators for suspend/resume */
> >       pcie->sr = bus->dev.driver_data;
>
> IIUC, this path:
>
>   pci_alloc_child_bus
>     brcm_pcie_add_bus                   # .add_bus method
>       pci_subdev_regulators_add_bus     # in pcie-brcmstb.c for now
>         alloc_subdev_regulators         # in pcie-brcmstb.c for now
>         regulator_bulk_get
>         regulator_bulk_enable
>       brcm_pcie_linkup                  # bring link up
>
> is basically so we can leave power to downstream devices off, then
> turn it on when we're ready to enumerate those downstream devices.
Yes  -- it is the "chicken-and-egg" problem.  Ideally, we would like
for the endpoint
driver to turn on its own regulators, but even to know which endpoint
driver to probe
we must turn on the regulator to establish linkup.

> I think the brcmstb root bus is always bus 0, it only has a single
> Root Port on the root bus, and it always leads to bus 1, so it sort of
> makes sense that we only need to turn on power when we're about to
> scan "bus->number == 1".
Correct.

>
> But this power management seems like a pattern that other controllers
> will use.  Other controllers will have several Root Ports, so checking
> the bus number won't work for them.  Instead of checking the bus
> number, I think brcmstb should check more directly for a power
> regulator.
I agree.  That is why I said that we should consider removing the "busno==1"
conditional if we want this feature for general use.  If you want,
I can submit a V2 that removes this conditional.

I'm guessing here but I think the Rockchip folks could use this "pcie
regulator"  feature.
They got regulator DT properties in their PCIe RC DT node upstreamed
but we were denied for trying the same approach.

>
> Tangent 1: I think this means a downstream device goes from D3cold to
> D0uninitialized?  Does this code account for the required delays
> accesses to the device?  I see some in brcm_pcie_linkup(), but I don't
> see anything that looks like Tpvperl (the time PERST# must remain
> asserted after power becomes valid) or Tperst (when asserted, PERST#
> must remain asserted at least this long) (both from PCIe r6.0, sec
> 6.6.1).
I have a series of patches coming up that address some of these concerns.
Can we please take this up then but allow us to escape "revert jail" first?
I promise I will copy your tangents and address all of them with the
future patchset.

>
> Tangent 2: "brcm_pcie_link_up()" makes sense -- it's the conventional
> name for the simple boolean function that tells us whether the link is
> up.  "brcm_pcie_linkup()", which *brings* the link up, is confusing
> because it's too similar to "brcm_pcie_link_up()".  The conventional
> name for this would be "brcm_pcie_start_link()".
I will fix this in the future patchset.

>
> Tangent 3: There are fewer than 20 forward function declarations in
> drivers/pci/controller/, and 9 of them are in pcie-brcmstb.c.  It's a
> lot easier to maintain all these drivers if they use a common style.
> Generally speaking, Linux code orders function definitions to avoid
> the need for forward declarations.
I will improve the situation in the future patchset

Regards,
Jim Quinlan
Broadcom STB
>
> Bjorn
Bjorn Helgaas May 23, 2022, 10:10 p.m. UTC | #8
On Sat, May 21, 2022 at 02:51:42PM -0400, Jim Quinlan wrote:
> On Sat, May 21,
> 2CONFIG_INITRAMFS_SOURCE="/work3/jq921458/cpio/54-arm64-rootfs.cpio022
> at 12:43 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > On Wed, May 18, 2022 at 03:42:11PM -0400, Jim Quinlan wrote:
> > > commit 93e41f3fca3d ("PCI: brcmstb: Add control of subdevice
> > > voltage regulators")
> > >
> > > introduced a regression on the PCIe RPi4 Compute Module.  If the
> > > PCIe endpoint node described in [2] was missing, no linkup would
> > > be attempted, and subsequent accesses would cause a panic
> > > because this particular PCIe HW causes a CPU abort on illegal
> > > accesses (instead of returning 0xffffffff).
> > >
> > > We fix this by allowing the DT endpoint subnode to be missing.
> > > This is important for platforms like the CM4 which have a
> > > standard PCIe socket and the endpoint device is unknown.
> >
> > I think the problem here is that on the CM, we try to enumerate
> > devices that are not powered up, isn't it?  The commit log should
> > say something about that power situation and how the driver learns
> > about the power regulators instead of just pointing at an DT
> > endpoint node.
> 
> This is incorrect.  The regression occurred because the code
> mistakenly skips PCIe-linkup if the PCI portdrv DT node does not
> exist. With our RC HW, doing a config space access to bus 1 w/o
> first linking up results in a CPU abort.  This regression has
> nothing to do with EP power at all.

OK, I think I'm starting to see, but I'm still missing some things.

67211aadcb4b ("PCI: brcmstb: Add mechanism to turn on subdev
regulators") added pci_subdev_regulators_add_bus() as an .add_bus()
method.  This is called by pci_alloc_child_bus(), and if the DT
describes any regulators for the bridge leading to the new child bus,
we turn them on.

Then 93e41f3fca3d ("PCI: brcmstb: Add control of subdevice voltage
regulators") added brcm_pcie_add_bus() and made *it* the .add_bus()
method.  It turns on the regulators and brings the link up, but it
skips both if there's no DT node for the bridge to the new bus.

I guess RPi4 CM has no DT node to describe regulators, so we skip both
turning them on *and* bringing the link up?

But above you say it's the *endpoint* node that doesn't exist.  The
existing code looks like it's checking for the *bridge* node
(bus->dev->of_node).  We haven't even enumerated the devices on the
child bus, so we don't know about them at this point.

What happens if there is a DT node for the bridge, but it doesn't
describe any regulators?  I assume regulator_bulk_get() will fail, and
it looks like that might still keep us from bringing the link up?

I would think that lack of regulator description in the DT would mean
that any regulators are always on and the OS doesn't need to do
anything.

> >  What happens if we turn on the power but don't find any
> >  downstream devices?
>
> They are turned off to conserve power.
> 
> > From looking at the code, I assume we just leave the power on.
> > Maybe that's what you want, I dunno.

> For STB and Cable Modem products we do not leave the power on.  In
> fact, our Cable Modem group was the first to request this feature.
> It appears that the RPi CM4 always keeps endpoint power on but I do
> not know for sure.

I'm confused.  Why can't we tell by looking at pcie-brcmstb.c?  All I
know is what's in pcie-brcmstb.c; I have no idea which things apply to
which products.

The only calls to regulator_bulk_disable() are in
pci_subdev_regulators_remove_bus(), brcm_pcie_suspend(), and
brcm_pcie_resume().  I don't think the fact that enumeration didn't
find any devices necessarily leads to any of those.  What am I
missing?  (This is really a tangent that isn't critical for fixing the
regression.)

> > I added Rafael because this seems vaguely similar to runtime power
> > management, and if we can integrate with that somehow, I'd sure like
> > to avoid building a parallel infrastructure for it.
> >
> > The current path we're on is to move some of this code that's
> > currently in pcie-brcmstb.c to the PCIe portdrv [0].  I'm a little
> > hesitant about that because ACPI does just fine without it.  If we're
> > adding new DT functionality that could not be implemented via ACPI,
> > that's one thing.  But I'm not convinced this is that new.
>
> AFAICT, Broadcom STB and Cable Modem products do not have/use/want
> ACPI.  We are fine with keeping this "PCIe regulator" feature
> private to our driver and giving you speedy and full support in
> maintaining it.

I don't mean that you should use ACPI, only that ACPI platforms can do
this sort of power control using the existing PCI core infrastructure,
and maybe there's a way for OF/DT platforms to hook into that same
infrastructure to minimize the driver-specific work.  E.g., maybe
there's a way to extend platform_pci_set_power_state() and similar to
manage these regulators.

> > [0] https://lore.kernel.org/r/20211110221456.11977-6-jim2101024@gmail.com
> >
> > > [1] https://bugzilla.kernel.org/show_bug.cgi?id=215925
> > > [2] Documentation/devicetree/bindings/pci/brcm,stb-pcie.yaml
> > >
> > > Fixes: 93e41f3fca3d ("PCI: brcmstb: Add control of subdevice voltage regulators")
> > > Fixes: 830aa6f29f07 ("PCI: brcmstb: Split brcm_pcie_setup() into two funcs")
> > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=215925
> > > Signed-off-by: Jim Quinlan <jim2101024@gmail.com>
> > > ---
> > >  drivers/pci/controller/pcie-brcmstb.c | 8 +++++---
> > >  1 file changed, 5 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/drivers/pci/controller/pcie-brcmstb.c b/drivers/pci/controller/pcie-brcmstb.c
> > > index ba5c120816b2..adca74e235cb 100644
> > > --- a/drivers/pci/controller/pcie-brcmstb.c
> > > +++ b/drivers/pci/controller/pcie-brcmstb.c
> > > @@ -540,16 +540,18 @@ static int pci_subdev_regulators_add_bus(struct pci_bus *bus)
> > >
> > >  static int brcm_pcie_add_bus(struct pci_bus *bus)
> > >  {
> > > -     struct device *dev = &bus->dev;
> > >       struct brcm_pcie *pcie = (struct brcm_pcie *) bus->sysdata;
> > >       int ret;
> > >
> > > -     if (!dev->of_node || !bus->parent || !pci_is_root_bus(bus->parent))
> > > +     /* Only busno==1 requires us to linkup */
> > > +     if ((int)bus->number != 1)
> > >               return 0;
> > >
> > >       ret = pci_subdev_regulators_add_bus(bus);
> > > -     if (ret)
> > > +     if (ret) {
> > > +             pcie->refusal_mode = true;
> > >               return ret;
> > > +     }
> > >
> > >       /* Grab the regulators for suspend/resume */
> > >       pcie->sr = bus->dev.driver_data;
> >
> > IIUC, this path:
> >
> >   pci_alloc_child_bus
> >     brcm_pcie_add_bus                   # .add_bus method
> >       pci_subdev_regulators_add_bus     # in pcie-brcmstb.c for now
> >         alloc_subdev_regulators         # in pcie-brcmstb.c for now
> >         regulator_bulk_get
> >         regulator_bulk_enable
> >       brcm_pcie_linkup                  # bring link up
> >
> > is basically so we can leave power to downstream devices off, then
> > turn it on when we're ready to enumerate those downstream devices.
>
> Yes  -- it is the "chicken-and-egg" problem.  Ideally, we would like
> for the endpoint driver to turn on its own regulators, but even to
> know which endpoint driver to probe we must turn on the regulator to
> establish linkup.

I don't think having an endpoint driver turn on power to its device is
the right goal.  As you said, if the power is off, we don't know
whether there's an endpoint or what it is, so the driver isn't in the
picture (I know sometimes endpoints are described in DT, and that
might be needed for non-discoverable properties, but I don't think
it's a good way to *enumerate* the device).

I don't know much about ACPI power management, but I kind of think it
turns on power to *everything* initially so we can enumerate all the
devices (Rafael or others, please correct me!)  After enumeration, we
can turn off devices we don't need, and the power management framework 
already supports turning devices on again when we use them.

> > I think the brcmstb root bus is always bus 0, it only has a single
> > Root Port on the root bus, and it always leads to bus 1, so it sort of
> > makes sense that we only need to turn on power when we're about to
> > scan "bus->number == 1".
>
> Correct.
> 
> > But this power management seems like a pattern that other
> > controllers will use.  Other controllers will have several Root
> > Ports, so checking the bus number won't work for them.  Instead of
> > checking the bus number, I think brcmstb should check more
> > directly for a power regulator.
>
> I agree.  That is why I said that we should consider removing the
> "busno==1" conditional if we want this feature for general use.  If
> you want, I can submit a V2 that removes this conditional.

If it's as easy as dropping a needlessly platform-dependent check, we
should absolutely do it.  Are you saying the patch would just become
this?

> I have a series of patches coming up that address some of these concerns.
> Can we please take this up then but allow us to escape "revert jail" first?

Of course.  That's why I labeled these "tangents"; they're just things
for future work that I noticed and didn't want to forget.

Bjorn
Jim Quinlan May 24, 2022, 4:54 p.m. UTC | #9
On Mon, May 23, 2022 at 6:10 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> On Sat, May 21, 2022 at 02:51:42PM -0400, Jim Quinlan wrote:
> > On Sat, May 21,
> > 2CONFIG_INITRAMFS_SOURCE="/work3/jq921458/cpio/54-arm64-rootfs.cpio022
> > at 12:43 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > On Wed, May 18, 2022 at 03:42:11PM -0400, Jim Quinlan wrote:
> > > > commit 93e41f3fca3d ("PCI: brcmstb: Add control of subdevice
> > > > voltage regulators")
> > > >
> > > > introduced a regression on the PCIe RPi4 Compute Module.  If the
> > > > PCIe endpoint node described in [2] was missing, no linkup would
> > > > be attempted, and subsequent accesses would cause a panic
> > > > because this particular PCIe HW causes a CPU abort on illegal
> > > > accesses (instead of returning 0xffffffff).
> > > >
> > > > We fix this by allowing the DT endpoint subnode to be missing.
> > > > This is important for platforms like the CM4 which have a
> > > > standard PCIe socket and the endpoint device is unknown.
> > >
> > > I think the problem here is that on the CM, we try to enumerate
> > > devices that are not powered up, isn't it?  The commit log should
> > > say something about that power situation and how the driver learns
> > > about the power regulators instead of just pointing at an DT
> > > endpoint node.
> >
> > This is incorrect.  The regression occurred because the code
> > mistakenly skips PCIe-linkup if the PCI portdrv DT node does not
> > exist. With our RC HW, doing a config space access to bus 1 w/o
> > first linking up results in a CPU abort.  This regression has
> > nothing to do with EP power at all.
>
> OK, I think I'm starting to see, but I'm still missing some things.
>
> 67211aadcb4b ("PCI: brcmstb: Add mechanism to turn on subdev
> regulators") added pci_subdev_regulators_add_bus() as an .add_bus()
> method.  This is called by pci_alloc_child_bus(), and if the DT
> describes any regulators for the bridge leading to the new child bus,
> we turn them on.
>
> Then 93e41f3fca3d ("PCI: brcmstb: Add control of subdevice voltage
> regulators") added brcm_pcie_add_bus() and made *it* the .add_bus()
> method.  It turns on the regulators and brings the link up, but it
> skips both if there's no DT node for the bridge to the new bus.

Hi Bjorn,

Yes, I meant it to skip the turning on of the regulators if the DT
node was missing
but I failed to notice that it would also skip the pcie linkup as well.  As you
may have guessed, all of my test systems have the PCIe root port
DT node.

>
> I guess RPi4 CM has no DT node to describe regulators, so we skip both
> turning them on *and* bringing the link up?

Yes. One repo did not have this node (Cyril/debina?), one did
(https://github.com/raspberrypi/firmware/tree/master/boot).
Of course there is nothing wrong with omitting the node; it should
have pcie linkup regardless.

>
> But above you say it's the *endpoint* node that doesn't exist.  The
> existing code looks like it's checking for the *bridge* node
> (bus->dev->of_node).  We haven't even enumerated the devices on the
> child bus, so we don't know about them at this point.
You are absolutely correct and I must change the commit message
to say the "root port DT node".   I'm sorry; this mistake likely did not
help you understand the fix. :-(

>
> What happens if there is a DT node for the bridge, but it doesn't
> describe any regulators?  I assume regulator_bulk_get() will fail, and
> it looks like that might still keep us from bringing the link up?
The regulator_bulk_get()  func does not fail if the regulators are not
present.  Instead it "gets"
a dummy device and issues a warning per missing regulator.
A version of my pullreq submitted code to prescan the DT node and call
regulator_bulk_get() with
only the names of the regulators present, but IIRC this was NAKd.
Hopefully I will not be swamped with RPi developers'  emails when they
think these warnings are an issue.

>
> I would think that lack of regulator description in the DT would mean
> that any regulators are always on and the OS doesn't need to do
> anything.pci_subdev_regulators_remove_bus
I agree.

>
> > >  What happens if we turn on the power but don't find any
> > >  downstream devices?
> >
> > They are turned off to conserve power.pci_subdev_regulators_remove_bus
> >
> > > From looking at the code, I assume we just leave the power on.
> > > Maybe that's what you want, I dunno.
>
> > For STB and Cable Modem products we do not leave the power on.  In
> > fact, our Cable Modem group was the first to request this feature.
> > It appears that the RPi CM4 always keeps endpoint power on but I do
> > not know for sure.
>
> I'm confused.  Why can't we tell by looking at pcie-brcmstb.c?  All I
> know is what's in pcie-brcmstb.c; I have no idea which things apply to
> which products.

I was just adding  background information but I see that I really
didn't answer your
question.   Allow me another chance:

When brcm_pcie_add_bus() is invoked, we will "get" and enable any
regulators that are present in the DT node.  If the busno==1, we will
will also attempt pcie-linkup.  If PCIe linkup fails, which can happen for
multiple reasons but most due to a  missing device, we turn
on "refusal" mode to prevent our unforgiving PCIe HW from causing an
abort on any subsequent PCIe config-space accesses.  Further,
a failed linkup will have brcm_pcie_probe() stopping and removing
the root bus, which in turn invokes  brcm_pcie_remove_bus() (actually
named pci_subdev_regulators_remove_bus() as it may someday find its
way into bus.c), which invokes regulator_bulk_disable() on any
regulators that were enabled by the probe.

>
> The only calls to regulator_bulk_disable() are in
> pci_subdev_regulators_remove_bus(), brcm_pcie_suspend(), and
> brcm_pcie_resume().  I don't think the fact that enumeration didn't
> find any devices necessarily leads to any of those.  What am I
> missing?  (This is really a tangent that isn't critical for fixing the
> regression.)
If there was no linkup during the probe, the probe follows this
path before it returns with error:

brcm_pcie_probe()
    brcm_pcie_remove()
        pci_stop_root_bus();
        pci_remove_root_bus();
            pci_remove_bus_device()
                pci_remove_bus_device()
                    pci_subdev_regulators_remove_bus()
                        regulator_bulk_disable()


>
> > > I added Rafael because this seems vaguely similar to runtime power
> > > management, and if we can integrate with that somehow, I'd sure like
> > > to avoid building a parallel infrastructure for it.
> > >
> > > The current path we're on is to move some of this code that's
> > > currently in pcie-brcmstb.c to the PCIe portdrv [0].  I'm a little
> > > hesitant about that because ACPI does just fine without it.  If we're
> > > adding new DT functionality that could not be implemented via ACPI,
> > > that's one thing.  But I'm not convinced this is that new.
> >
> > AFAICT, Broadcom STB and Cable Modem products do not have/use/want
> > ACPI.  We are fine with keeping this "PCIe regulator" feature
> > private to our driver and giving you speedy and full support in
> > maintaining it.
>
> I don't mean that you should use ACPI, only that ACPI platforms can do
> this sort of power control using the existing PCI core infrastructure,
> and maybe there's a way for OF/DT platforms to hook into that same
> infrastructure to minimize the driver-specific work.  E.g., maybe
> there's a way to extend platform_pci_set_power_state() and similar to
> manage these regulators.

Got it.

Unless you object, I plan on sending you a v2 of my regression fix which will
correct the commit message, change the "if (busno == 1)" conditional to
only guard the pcie linkup call, and add further comments.

I have noted and will also address your other concerns and suggestions
in a future
patchset as I think it is best that I get my hands on a CM4 board
before I submit
any more changes.

Kind Regards,
Jim Quinlan
Broadcom STB
>
> > > [0] https://lore.kernel.org/r/20211110221456.11977-6-jim2101024@gmail.com
> > >
> > > > [1] https://bugzilla.kernel.org/show_bug.cgi?id=215925
> > > > [2] Documentation/devicetree/bindings/pci/brcm,stb-pcie.yaml
> > > >
> > > > Fixes: 93e41f3fca3d ("PCI: brcmstb: Add control of subdevice voltage regulators")
> > > > Fixes: 830aa6f29f07 ("PCI: brcmstb: Split brcm_pcie_setup() into two funcs")
> > > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=215925
> > > > Signed-off-by: Jim Quinlan <jim2101024@gmail.com>
> > > > ---
> > > >  drivers/pci/controller/pcie-brcmstb.c | 8 +++++---
> > > >  1 file changed, 5 insertions(+), 3 deletions(-)
> > > >
> > > > diff --git a/drivers/pci/controller/pcie-brcmstb.c b/drivers/pci/controller/pcie-brcmstb.c
> > > > index ba5c120816b2..adca74e235cb 100644
> > > > --- a/drivers/pci/controller/pcie-brcmstb.c
> > > > +++ b/drivers/pci/controller/pcie-brcmstb.c
> > > > @@ -540,16 +540,18 @@ static int pci_subdev_regulators_add_bus(struct pci_bus *bus)
> > > >
> > > >  static int brcm_pcie_add_bus(struct pci_bus *bus)
> > > >  {
> > > > -     struct device *dev = &bus->dev;
> > > >       struct brcm_pcie *pcie = (struct brcm_pcie *) bus->sysdata;
> > > >       int ret;
> > > >
> > > > -     if (!dev->of_node || !bus->parent || !pci_is_root_bus(bus->parent))
> > > > +     /* Only busno==1 requires us to linkup */
> > > > +     if ((int)bus->number != 1)
> > > >               return 0;
> > > >
> > > >       ret = pci_subdev_regulators_add_bus(bus);
> > > > -     if (ret)
> > > > +     if (ret) {
> > > > +             pcie->refusal_mode = true;
> > > >               return ret;
> > > > +     }
> > > >
> > > >       /* Grab the regulators for suspend/resume */
> > > >       pcie->sr = bus->dev.driver_data;
> > >
> > > IIUC, this path:
> > >
> > >   pci_alloc_child_bus
> > >     brcm_pcie_add_bus                   # .add_bus method
> > >       pci_subdev_regulators_add_bus     # in pcie-brcmstb.c for now
> > >         alloc_subdev_regulators         # in pcie-brcmstb.c for now
> > >         regulator_bulk_get
> > >         regulator_bulk_enable
> > >       brcm_pcie_linkup                  # bring link up
> > >
> > > is basically so we can leave power to downstream devices off, then
> > > turn it on when we're ready to enumerate those downstream devices.
> >
> > Yes  -- it is the "chicken-and-egg" problem.  Ideally, we would like
> > for the endpoint driver to turn on its own regulators, but even to
> > know which endpoint driver to probe we must turn on the regulator to
> > establish linkup.
>
> I don't think having an endpoint driver turn on power to its device is
> the right goal.  As you said, if the power is off, we don't know
> whether there's an endpoint or what it is, so the driver isn't in the
> picture (I know sometimes endpoints are described in DT, and that
> might be needed for non-discoverable properties, but I don't think
> it's a good way to *enumerate* the device).
>
> I don't know much about ACPI power management, but I kind of think it
> turns on power to *everything* initially so we can enumerate all the
> devices (Rafael or others, please correct me!)  After enumeration, we
> can turn off devices we don't need, and the power management framework
> already supports turning devices on again when we use them.
>
> > > I think the brcmstb root bus is always bus 0, it only has a single
> > > Root Port on the root bus, and it always leads to bus 1, so it sort of
> > > makes sense that we only need to turn on power when we're about to
> > > scan "bus->number == 1".
> >
> > Correct.
> >
> > > But this power management seems like a pattern that other
> > > controllers will use.  Other controllers will have several Root
> > > Ports, so checking the bus number won't work for them.  Instead of
> > > checking the bus number, I think brcmstb should check more
> > > directly for a power regulator.
> >
> > I agree.  That is why I said that we should consider removing the
> > "busno==1" conditional if we want this feature for general use.  If
> > you want, I can submit a V2 that removes this conditional.
>
> If it's as easy as dropping a needlessly platform-dependent check, we
> should absolutely do it.  Are you saying the patch would just become
> this?
>
> > I have a series of patches coming up that address some of these concerns.
> > Can we please take this up then but allow us to escape "revert jail" first?
>
> Of course.  That's why I labeled these "tangents"; they're just things
> for future work that I noticed and didn't want to forget.
>
> Bjorn
Cyril Brulebois May 24, 2022, 11:56 p.m. UTC | #10
Hi Jim,

Jim Quinlan <jim2101024@gmail.com> (2022-05-24):
> Yes. One repo did not have this node (Cyril/debina?), one did
> (https://github.com/raspberrypi/firmware/tree/master/boot).
> Of course there is nothing wrong with omitting the node; it should
> have pcie linkup regardless.

I work/debug stuff on Debian systems, but Debian's just shipping what's
in mainline. Raspberry people maintain their own vendor DTBs.

> Unless you object, I plan on sending you a v2 of my regression fix
> which will correct the commit message, change the "if (busno == 1)"
> conditional to only guard the pcie linkup call, and add further
> comments.
> 
> I have noted and will also address your other concerns and suggestions
> in a future patchset as I think it is best that I get my hands on a
> CM4 board before I submit any more changes.

For the record, I'm still happy to be cc'ed so that I spend time testing
further patches, be it the short-term regression fix (for inclusion in
master, but also checking it fixes linux-5.17.y and now linux-5.18.y,
if stable maintainers would welcome the extra testing), or the future
patchset.

I can't guarantee you'll have an answer in a few hours like that
happened during the past few days (during which I prioritized testing
over anything else so as not to be a blocker in case it could be
squeezed into v5.18). But I'm still willing to allocate some time to
make sure the CM4 keeps working, even if you don't get your hands on
such systems right away.


Cheers,
Stefan Wahren May 25, 2022, 7:21 a.m. UTC | #11
Hi Jim,

Am 24.05.22 um 18:54 schrieb Jim Quinlan:
> On Mon, May 23, 2022 at 6:10 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>> On Sat, May 21, 2022 at 02:51:42PM -0400, Jim Quinlan wrote:
>>> On Sat, May 21,
>>> 2CONFIG_INITRAMFS_SOURCE="/work3/jq921458/cpio/54-arm64-rootfs.cpio022
>>> at 12:43 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>>>> On Wed, May 18, 2022 at 03:42:11PM -0400, Jim Quinlan wrote:
>>>>> commit 93e41f3fca3d ("PCI: brcmstb: Add control of subdevice
>>>>> voltage regulators")
>>>>>
>>>>> introduced a regression on the PCIe RPi4 Compute Module.  If the
>>>>> PCIe endpoint node described in [2] was missing, no linkup would
>>>>> be attempted, and subsequent accesses would cause a panic
>>>>> because this particular PCIe HW causes a CPU abort on illegal
>>>>> accesses (instead of returning 0xffffffff).
>>>>>
>>>>> We fix this by allowing the DT endpoint subnode to be missing.
>>>>> This is important for platforms like the CM4 which have a
>>>>> standard PCIe socket and the endpoint device is unknown.
>>>> I think the problem here is that on the CM, we try to enumerate
>>>> devices that are not powered up, isn't it?  The commit log should
>>>> say something about that power situation and how the driver learns
>>>> about the power regulators instead of just pointing at an DT
>>>> endpoint node.
>>> This is incorrect.  The regression occurred because the code
>>> mistakenly skips PCIe-linkup if the PCI portdrv DT node does not
>>> exist. With our RC HW, doing a config space access to bus 1 w/o
>>> first linking up results in a CPU abort.  This regression has
>>> nothing to do with EP power at all.
>> OK, I think I'm starting to see, but I'm still missing some things.
>>
>> 67211aadcb4b ("PCI: brcmstb: Add mechanism to turn on subdev
>> regulators") added pci_subdev_regulators_add_bus() as an .add_bus()
>> method.  This is called by pci_alloc_child_bus(), and if the DT
>> describes any regulators for the bridge leading to the new child bus,
>> we turn them on.
>>
>> Then 93e41f3fca3d ("PCI: brcmstb: Add control of subdevice voltage
>> regulators") added brcm_pcie_add_bus() and made *it* the .add_bus()
>> method.  It turns on the regulators and brings the link up, but it
>> skips both if there's no DT node for the bridge to the new bus.
> Hi Bjorn,
>
> Yes, I meant it to skip the turning on of the regulators if the DT
> node was missing
> but I failed to notice that it would also skip the pcie linkup as well.  As you
> may have guessed, all of my test systems have the PCIe root port
> DT node.
>
>> I guess RPi4 CM has no DT node to describe regulators, so we skip both
>> turning them on *and* bringing the link up?
> Yes. One repo did not have this node (Cyril/debina?), one did
> (https://github.com/raspberrypi/firmware/tree/master/boot).
> Of course there is nothing wrong with omitting the node; it should
> have pcie linkup regardless.
Please ignore the vendor tree, because you only have to care about 
mainline kernel and DT here.
>
>> But above you say it's the *endpoint* node that doesn't exist.  The
>> existing code looks like it's checking for the *bridge* node
>> (bus->dev->of_node).  We haven't even enumerated the devices on the
>> child bus, so we don't know about them at this point.
> You are absolutely correct and I must change the commit message
> to say the "root port DT node".   I'm sorry; this mistake likely did not
> help you understand the fix. :-(
>
>> What happens if there is a DT node for the bridge, but it doesn't
>> describe any regulators?  I assume regulator_bulk_get() will fail, and
>> it looks like that might still keep us from bringing the link up?
> The regulator_bulk_get()  func does not fail if the regulators are not
> present.  Instead it "gets"
> a dummy device and issues a warning per missing regulator.
> A version of my pullreq submitted code to prescan the DT node and call
> regulator_bulk_get() with
> only the names of the regulators present, but IIRC this was NAKd.
> Hopefully I will not be swamped with RPi developers'  emails when they
> think these warnings are an issue.

This won't be the first driver complaining about missing regulators and 
won't be the last one. So don't expect an email from me ;-)

Best regards
Jim Quinlan May 25, 2022, 5:13 p.m. UTC | #12
On Tue, May 24, 2022 at 7:56 PM Cyril Brulebois <kibi@debian.org> wrote:
>
> Hi Jim,
>
> Jim Quinlan <jim2101024@gmail.com> (2022-05-24):
> > Yes. One repo did not have this node (Cyril/debina?), one did
> > (https://github.com/raspberrypi/firmware/tree/master/boot).
> > Of course there is nothing wrong with omitting the node; it should
> > have pcie linkup regardless.
>
> I work/debug stuff on Debian systems, but Debian's just shipping what's
> in mainline. Raspberry people maintain their own vendor DTBs.
>
> > Unless you object, I plan on sending you a v2 of my regression fix
> > which will correct the commit message, change the "if (busno == 1)"
> > conditional to only guard the pcie linkup call, and add further
> > comments.
> >
> > I have noted and will also address your other concerns and suggestions
> > in a future patchset as I think it is best that I get my hands on a
> > CM4 board before I submit any more changes.
>
> For the record, I'm still happy to be cc'ed so that I spend time testing
> further patches, be it the short-term regression fix (for inclusion in
> master, but also checking it fixes linux-5.17.y and now linux-5.18.y,
> if stable maintainers would welcome the extra testing), or the future
> patchset.
>
> I can't guarantee you'll have an answer in a few hours like that
> happened during the past few days (during which I prioritized testing
> over anything else so as not to be a blocker in case it could be
> squeezed into v5.18). But I'm still willing to allocate some time to
> make sure the CM4 keeps working, even if you don't get your hands on
> such systems right away.

I really appreciate the help, thank you.  We have ordered a CM4 and I
will be testing on  it prior to sending pullreqs.

Regards,
Jim Quinlan
Broadcom ST


>
>
> Cheers,
> --
> Cyril Brulebois (kibi@debian.org)            <https://debamax.com/>
> D-I release manager -- Release team member -- Freelance Consultant
Jim Quinlan May 25, 2022, 5:24 p.m. UTC | #13
On Wed, May 25, 2022 at 3:21 AM Stefan Wahren <stefan.wahren@i2se.com> wrote:
>
> Hi Jim,
>
> Am 24.05.22 um 18:54 schrieb Jim Quinlan:
> > On Mon, May 23, 2022 at 6:10 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> >> On Sat, May 21, 2022 at 02:51:42PM -0400, Jim Quinlan wrote:
> >>> On Sat, May 21,
> >>> 2CONFIG_INITRAMFS_SOURCE="/work3/jq921458/cpio/54-arm64-rootfs.cpio022
> >>> at 12:43 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> >>>> On Wed, May 18, 2022 at 03:42:11PM -0400, Jim Quinlan wrote:
> >>>>> commit 93e41f3fca3d ("PCI: brcmstb: Add control of subdevice
> >>>>> voltage regulators")
> >>>>>
> >>>>> introduced a regression on the PCIe RPi4 Compute Module.  If the
> >>>>> PCIe endpoint node described in [2] was missing, no linkup would
> >>>>> be attempted, and subsequent accesses would cause a panic
> >>>>> because this particular PCIe HW causes a CPU abort on illegal
> >>>>> accesses (instead of returning 0xffffffff).
> >>>>>
> >>>>> We fix this by allowing the DT endpoint subnode to be missing.
> >>>>> This is important for platforms like the CM4 which have a
> >>>>> standard PCIe socket and the endpoint device is unknown.
> >>>> I think the problem here is that on the CM, we try to enumerate
> >>>> devices that are not powered up, isn't it?  The commit log should
> >>>> say something about that power situation and how the driver learns
> >>>> about the power regulators instead of just pointing at an DT
> >>>> endpoint node.
> >>> This is incorrect.  The regression occurred because the code
> >>> mistakenly skips PCIe-linkup if the PCI portdrv DT node does not
> >>> exist. With our RC HW, doing a config space access to bus 1 w/o
> >>> first linking up results in a CPU abort.  This regression has
> >>> nothing to do with EP power at all.
> >> OK, I think I'm starting to see, but I'm still missing some things.
> >>
> >> 67211aadcb4b ("PCI: brcmstb: Add mechanism to turn on subdev
> >> regulators") added pci_subdev_regulators_add_bus() as an .add_bus()
> >> method.  This is called by pci_alloc_child_bus(), and if the DT
> >> describes any regulators for the bridge leading to the new child bus,
> >> we turn them on.
> >>
> >> Then 93e41f3fca3d ("PCI: brcmstb: Add control of subdevice voltage
> >> regulators") added brcm_pcie_add_bus() and made *it* the .add_bus()
> >> method.  It turns on the regulators and brings the link up, but it
> >> skips both if there's no DT node for the bridge to the new bus.
> > Hi Bjorn,
> >
> > Yes, I meant it to skip the turning on of the regulators if the DT
> > node was missing
> > but I failed to notice that it would also skip the pcie linkup as well.  As you
> > may have guessed, all of my test systems have the PCIe root port
> > DT node.
> >
> >> I guess RPi4 CM has no DT node to describe regulators, so we skip both
> >> turning them on *and* bringing the link up?
> > Yes. One repo did not have this node (Cyril/debina?), one did
> > (https://github.com/raspberrypi/firmware/tree/master/boot).
> > Of course there is nothing wrong with omitting the node; it should
> > have pcie linkup regardless.
> Please ignore the vendor tree, because you only have to care about
> mainline kernel and DT here.
Okay, good to know.

> >
> >> But above you say it's the *endpoint* node that doesn't exist.  The
> >> existing code looks like it's checking for the *bridge* node
> >> (bus->dev->of_node).  We haven't even enumerated the devices on the
> >> child bus, so we don't know about them at this point.
> > You are absolutely correct and I must change the commit message
> > to say the "root port DT node".   I'm sorry; this mistake likely did not
> > help you understand the fix. :-(
> >
> >> What happens if there is a DT node for the bridge, but it doesn't
> >> describe any regulators?  I assume regulator_bulk_get() will fail, and
> >> it looks like that might still keep us from bringing the link up?
> > The regulator_bulk_get()  func does not fail if the regulators are not
> > present.  Instead it "gets"
> > a dummy device and issues a warning per missing regulator.
> > A version of my pullreq submitted code to prescan the DT node and call
> > regulator_bulk_get() with
> > only the names of the regulators present, but IIRC this was NAKd.
> > Hopefully I will not be swamped with RPi developers'  emails when they
> > think these warnings are an issue.
>
> This won't be the first driver complaining about missing regulators and
> won't be the last one. So don't expect an email from me ;-)
Perhaps I complain too much :-)

Cheers,
Jim Quinlan
Broadcom STB
>
> Best regards
>
Bjorn Helgaas May 25, 2022, 9:57 p.m. UTC | #14
On Tue, May 24, 2022 at 12:54:48PM -0400, Jim Quinlan wrote:
> On Mon, May 23, 2022 at 6:10 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > On Sat, May 21, 2022 at 02:51:42PM -0400, Jim Quinlan wrote:
> > > On Sat, May 21,
> > > 2CONFIG_INITRAMFS_SOURCE="/work3/jq921458/cpio/54-arm64-rootfs.cpio022
> > > at 12:43 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > > On Wed, May 18, 2022 at 03:42:11PM -0400, Jim Quinlan wrote:
> > > > > commit 93e41f3fca3d ("PCI: brcmstb: Add control of subdevice
> > > > > voltage regulators")
> > > > >
> > > > > introduced a regression on the PCIe RPi4 Compute Module.  If the
> > > > > PCIe endpoint node described in [2] was missing, no linkup would
> > > > > be attempted, and subsequent accesses would cause a panic
> > > > > because this particular PCIe HW causes a CPU abort on illegal
> > > > > accesses (instead of returning 0xffffffff).
> > > > >
> > > > > We fix this by allowing the DT endpoint subnode to be missing.
> > > > > This is important for platforms like the CM4 which have a
> > > > > standard PCIe socket and the endpoint device is unknown.

> > But above you say it's the *endpoint* node that doesn't exist.  The
> > existing code looks like it's checking for the *bridge* node
> > (bus->dev->of_node).  We haven't even enumerated the devices on the
> > child bus, so we don't know about them at this point.
>
> You are absolutely correct and I must change the commit message
> to say the "root port DT node".   I'm sorry; this mistake likely did not
> help you understand the fix. :-(

Great, that will help me out!  I think including the relevant DT
snippet would also make it more concrete and might conceivably be
helpful to somebody working around it on a kernel without the fix.

> > What happens if there is a DT node for the bridge, but it doesn't
> > describe any regulators?  I assume regulator_bulk_get() will fail, and
> > it looks like that might still keep us from bringing the link up?
>
> The regulator_bulk_get() func does not fail if the regulators are
> not present.  Instead it "gets" a dummy device and issues a warning
> per missing regulator.  A version of my pullreq submitted code to
> prescan the DT node and call regulator_bulk_get() with only the
> names of the regulators present, but IIRC this was NAKd.  Hopefully
> I will not be swamped with RPi developers' emails when they think
> these warnings are an issue.

Ah, I see, this is the IS_ERR (but not -ENODEV) NORMAL_GET case in
_regulator_get().  You might get some emails, but I guess it must be a
fairly common situation :)

> > > >  What happens if we turn on the power but don't find any
> > > >  downstream devices?
> > >
> > > They are turned off to conserve power.
> ...

> When brcm_pcie_add_bus() is invoked, we will "get" and enable any
> regulators that are present in the DT node.  If the busno==1, we will
> will also attempt pcie-linkup.  If PCIe linkup fails, which can happen for
> multiple reasons but most due to a  missing device, we turn
> on "refusal" mode to prevent our unforgiving PCIe HW from causing an
> abort on any subsequent PCIe config-space accesses.

> Further, a failed linkup will have brcm_pcie_probe() stopping and
> removing the root bus, which in turn invokes  brcm_pcie_remove_bus()
> (actually named pci_subdev_regulators_remove_bus() as it may someday
> find its way into bus.c), which invokes regulator_bulk_disable() on
> any regulators that were enabled by the probe.

Ah, thanks!  This is the detail I missed.  If pci_host_probe()
succeeds and the link is down, we call brcm_pcie_remove() (the
driver's .remove() method).  That's unusual and possibly unique among
native host bridge drivers.  I'm not sure that's the best pattern
here.  Most drivers can't do that because they expect multiple devices
on the root bus.  And the Root Port is still a functional device on
its own, even if its link is down.  Users likely expect to see it in
lspci and manipulate it via setpci.  It may have AER logs with clues
about why the link didn't come up.

Again something for future discussion, not for this regression.

> Unless you object, I plan on sending you a v2 of my regression fix
> which will correct the commit message, change the "if (busno == 1)"
> conditional to only guard the pcie linkup call, and add further
> comments.

I don't really *like* comparing "busno == 1" because the root bus
number is programmable on most devices.  It would be more obvious if
we could test for a Root Port directly.  But maybe it's the best we
can do for now.

Someday it seems like we should figure out how to make the PCI core
smart enough to turn on any regulators for devices below the Root Port
when we put the Root Port in D0.  But I don't know how to do that or
even whether it's feasible.

Bjorn
Rob Herring May 26, 2022, 7:25 p.m. UTC | #15
On Mon, May 23, 2022 at 05:10:36PM -0500, Bjorn Helgaas wrote:
> On Sat, May 21, 2022 at 02:51:42PM -0400, Jim Quinlan wrote:
> > On Sat, May 21,
> > 2CONFIG_INITRAMFS_SOURCE="/work3/jq921458/cpio/54-arm64-rootfs.cpio022
> > at 12:43 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > On Wed, May 18, 2022 at 03:42:11PM -0400, Jim Quinlan wrote:
> > > > commit 93e41f3fca3d ("PCI: brcmstb: Add control of subdevice
> > > > voltage regulators")
> > > >
> > > > introduced a regression on the PCIe RPi4 Compute Module.  If the
> > > > PCIe endpoint node described in [2] was missing, no linkup would
> > > > be attempted, and subsequent accesses would cause a panic
> > > > because this particular PCIe HW causes a CPU abort on illegal
> > > > accesses (instead of returning 0xffffffff).
> > > >
> > > > We fix this by allowing the DT endpoint subnode to be missing.
> > > > This is important for platforms like the CM4 which have a
> > > > standard PCIe socket and the endpoint device is unknown.
> > >
> > > I think the problem here is that on the CM, we try to enumerate
> > > devices that are not powered up, isn't it?  The commit log should
> > > say something about that power situation and how the driver learns
> > > about the power regulators instead of just pointing at an DT
> > > endpoint node.
> > 
> > This is incorrect.  The regression occurred because the code
> > mistakenly skips PCIe-linkup if the PCI portdrv DT node does not
> > exist. With our RC HW, doing a config space access to bus 1 w/o
> > first linking up results in a CPU abort.  This regression has
> > nothing to do with EP power at all.
> 
> OK, I think I'm starting to see, but I'm still missing some things.
> 
> 67211aadcb4b ("PCI: brcmstb: Add mechanism to turn on subdev
> regulators") added pci_subdev_regulators_add_bus() as an .add_bus()
> method.  This is called by pci_alloc_child_bus(), and if the DT
> describes any regulators for the bridge leading to the new child bus,
> we turn them on.
> 
> Then 93e41f3fca3d ("PCI: brcmstb: Add control of subdevice voltage
> regulators") added brcm_pcie_add_bus() and made *it* the .add_bus()
> method.  It turns on the regulators and brings the link up, but it
> skips both if there's no DT node for the bridge to the new bus.
> 
> I guess RPi4 CM has no DT node to describe regulators, so we skip both
> turning them on *and* bringing the link up?
> 
> But above you say it's the *endpoint* node that doesn't exist.  The
> existing code looks like it's checking for the *bridge* node
> (bus->dev->of_node).  We haven't even enumerated the devices on the
> child bus, so we don't know about them at this point.
> 
> What happens if there is a DT node for the bridge, but it doesn't
> describe any regulators?  I assume regulator_bulk_get() will fail, and
> it looks like that might still keep us from bringing the link up?
> 
> I would think that lack of regulator description in the DT would mean
> that any regulators are always on and the OS doesn't need to do
> anything.
> 
> > >  What happens if we turn on the power but don't find any
> > >  downstream devices?
> >
> > They are turned off to conserve power.
> > 
> > > From looking at the code, I assume we just leave the power on.
> > > Maybe that's what you want, I dunno.
> 
> > For STB and Cable Modem products we do not leave the power on.  In
> > fact, our Cable Modem group was the first to request this feature.
> > It appears that the RPi CM4 always keeps endpoint power on but I do
> > not know for sure.
> 
> I'm confused.  Why can't we tell by looking at pcie-brcmstb.c?  All I
> know is what's in pcie-brcmstb.c; I have no idea which things apply to
> which products.
> 
> The only calls to regulator_bulk_disable() are in
> pci_subdev_regulators_remove_bus(), brcm_pcie_suspend(), and
> brcm_pcie_resume().  I don't think the fact that enumeration didn't
> find any devices necessarily leads to any of those.  What am I
> missing?  (This is really a tangent that isn't critical for fixing the
> regression.)
> 
> > > I added Rafael because this seems vaguely similar to runtime power
> > > management, and if we can integrate with that somehow, I'd sure like
> > > to avoid building a parallel infrastructure for it.
> > >
> > > The current path we're on is to move some of this code that's
> > > currently in pcie-brcmstb.c to the PCIe portdrv [0].  I'm a little
> > > hesitant about that because ACPI does just fine without it.  If we're
> > > adding new DT functionality that could not be implemented via ACPI,
> > > that's one thing.  But I'm not convinced this is that new.
> >
> > AFAICT, Broadcom STB and Cable Modem products do not have/use/want
> > ACPI.  We are fine with keeping this "PCIe regulator" feature
> > private to our driver and giving you speedy and full support in
> > maintaining it.
> 
> I don't mean that you should use ACPI, only that ACPI platforms can do
> this sort of power control using the existing PCI core infrastructure,
> and maybe there's a way for OF/DT platforms to hook into that same
> infrastructure to minimize the driver-specific work.  E.g., maybe
> there's a way to extend platform_pci_set_power_state() and similar to
> manage these regulators.

The big difference is ACPI abstracts how to control power for a device. 
The OS just knows D0, D3, etc. states. For DT, there is no such 
abstraction. You need device specific code to do device specific power 
management.

> > > [0] https://lore.kernel.org/r/20211110221456.11977-6-jim2101024@gmail.com
> > >
> > > > [1] https://bugzilla.kernel.org/show_bug.cgi?id=215925
> > > > [2] Documentation/devicetree/bindings/pci/brcm,stb-pcie.yaml
> > > >
> > > > Fixes: 93e41f3fca3d ("PCI: brcmstb: Add control of subdevice voltage regulators")
> > > > Fixes: 830aa6f29f07 ("PCI: brcmstb: Split brcm_pcie_setup() into two funcs")
> > > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=215925
> > > > Signed-off-by: Jim Quinlan <jim2101024@gmail.com>
> > > > ---
> > > >  drivers/pci/controller/pcie-brcmstb.c | 8 +++++---
> > > >  1 file changed, 5 insertions(+), 3 deletions(-)
> > > >
> > > > diff --git a/drivers/pci/controller/pcie-brcmstb.c b/drivers/pci/controller/pcie-brcmstb.c
> > > > index ba5c120816b2..adca74e235cb 100644
> > > > --- a/drivers/pci/controller/pcie-brcmstb.c
> > > > +++ b/drivers/pci/controller/pcie-brcmstb.c
> > > > @@ -540,16 +540,18 @@ static int pci_subdev_regulators_add_bus(struct pci_bus *bus)
> > > >
> > > >  static int brcm_pcie_add_bus(struct pci_bus *bus)
> > > >  {
> > > > -     struct device *dev = &bus->dev;
> > > >       struct brcm_pcie *pcie = (struct brcm_pcie *) bus->sysdata;
> > > >       int ret;
> > > >
> > > > -     if (!dev->of_node || !bus->parent || !pci_is_root_bus(bus->parent))
> > > > +     /* Only busno==1 requires us to linkup */
> > > > +     if ((int)bus->number != 1)
> > > >               return 0;
> > > >
> > > >       ret = pci_subdev_regulators_add_bus(bus);
> > > > -     if (ret)
> > > > +     if (ret) {
> > > > +             pcie->refusal_mode = true;
> > > >               return ret;
> > > > +     }
> > > >
> > > >       /* Grab the regulators for suspend/resume */
> > > >       pcie->sr = bus->dev.driver_data;
> > >
> > > IIUC, this path:
> > >
> > >   pci_alloc_child_bus
> > >     brcm_pcie_add_bus                   # .add_bus method
> > >       pci_subdev_regulators_add_bus     # in pcie-brcmstb.c for now
> > >         alloc_subdev_regulators         # in pcie-brcmstb.c for now
> > >         regulator_bulk_get
> > >         regulator_bulk_enable
> > >       brcm_pcie_linkup                  # bring link up
> > >
> > > is basically so we can leave power to downstream devices off, then
> > > turn it on when we're ready to enumerate those downstream devices.
> >
> > Yes  -- it is the "chicken-and-egg" problem.  Ideally, we would like
> > for the endpoint driver to turn on its own regulators, but even to
> > know which endpoint driver to probe we must turn on the regulator to
> > establish linkup.
> 
> I don't think having an endpoint driver turn on power to its device is
> the right goal.  

DT requires device specific code to control a specific device. That 
belongs in the driver for that device.

> As you said, if the power is off, we don't know
> whether there's an endpoint or what it is, so the driver isn't in the
> picture (I know sometimes endpoints are described in DT, and that
> might be needed for non-discoverable properties, but I don't think
> it's a good way to *enumerate* the device).

Well, I don't think there is another way. People have tried to hack 
around this for years. There's the 'generic' bindings with never ending 
properties added one-by-one to try to handle new ctrl or timing needs 
for new device. Or there's the pseudo virtual platform device and driver 
to do power management on the side.

The only way generic control by the upstream (on the bus) device works 
is if there's a standard connector which has standard power supplies, 
reset, etc. which we know the timing for ('cause it's in the PCI spec).


> I don't know much about ACPI power management, but I kind of think it
> turns on power to *everything* initially so we can enumerate all the
> devices (Rafael or others, please correct me!)  After enumeration, we
> can turn off devices we don't need, and the power management framework 
> already supports turning devices on again when we use them.

That's basically requiring the firmware turn on everything before boot. 
Might work for some, but probably not everyone because they can't change 
firmware. There's also the issue that the kernel turns off unused clocks 
and regulators at late_initcall which is a problem if the PCI device's 
driver is a module.

Rob
Bjorn Helgaas May 26, 2022, 8:53 p.m. UTC | #16
On Thu, May 26, 2022 at 02:25:12PM -0500, Rob Herring wrote:
> On Mon, May 23, 2022 at 05:10:36PM -0500, Bjorn Helgaas wrote:
> > On Sat, May 21, 2022 at 02:51:42PM -0400, Jim Quinlan wrote:
> > > On Sat, May 21,
> > > 2CONFIG_INITRAMFS_SOURCE="/work3/jq921458/cpio/54-arm64-rootfs.cpio022
> > > at 12:43 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > > On Wed, May 18, 2022 at 03:42:11PM -0400, Jim Quinlan wrote:

> > > > I added Rafael because this seems vaguely similar to runtime power
> > > > management, and if we can integrate with that somehow, I'd sure like
> > > > to avoid building a parallel infrastructure for it.
> > > >
> > > > The current path we're on is to move some of this code that's
> > > > currently in pcie-brcmstb.c to the PCIe portdrv [0].  I'm a little
> > > > hesitant about that because ACPI does just fine without it.  If we're
> > > > adding new DT functionality that could not be implemented via ACPI,
> > > > that's one thing.  But I'm not convinced this is that new.
> > >
> > > AFAICT, Broadcom STB and Cable Modem products do not have/use/want
> > > ACPI.  We are fine with keeping this "PCIe regulator" feature
> > > private to our driver and giving you speedy and full support in
> > > maintaining it.
> > 
> > I don't mean that you should use ACPI, only that ACPI platforms can do
> > this sort of power control using the existing PCI core infrastructure,
> > and maybe there's a way for OF/DT platforms to hook into that same
> > infrastructure to minimize the driver-specific work.  E.g., maybe
> > there's a way to extend platform_pci_set_power_state() and similar to
> > manage these regulators.
> 
> The big difference is ACPI abstracts how to control power for a device. 
> The OS just knows D0, D3, etc. states. For DT, there is no such 
> abstraction. You need device specific code to do device specific power 
> management.

I'm thinking about the PCI side of the host controller, which should
live by the PCI rules.  There are device-specific ways to control
power, clocks, resets, etc on the PCI side, but drivers for PCI
devices (as opposed to drivers for the host controllers) can't really
call that code directly.

There are some exceptions, but generally speaking I don't think PCI
drivers that use generic power management need to use PCI_D0,
PCI_D3hot, etc directly.  Generic PM uses interfaces like
pci_pm_suspend() that keep most of the PCI details in the PCI core
instead of the endpoint driver, e.g., [3].

The PCI core has a bunch of interfaces:

  platform_pci_power_manageable()
  platform_pci_set_power_state()
  platform_pci_get_power_state()
  platform_pci_choose_state()

that currently mostly use ACPI.  So I'm wondering whether there's some
way to extend those platform_*() interfaces to call the native host
controller device-specific power control code via an ops structure.

Otherwise it feels like the native host controller drivers are in a
different world than the generic PM world, and we'll end up with every
host controller driver reimplementing things.

For example, how would we runtime suspend a Root Port and turn off
power for PCI devices below it?  Obviously that requires
device-specific code to control the power.  Do we have some common
interface to it, or do we have to trap config writes to PCI_PM_CTRL or
something?

[3] https://git.kernel.org/linus/cd97b7e0d780

> > > > [0] https://lore.kernel.org/r/20211110221456.11977-6-jim2101024@gmail.com

> > > > IIUC, this path:
> > > >
> > > >   pci_alloc_child_bus
> > > >     brcm_pcie_add_bus                   # .add_bus method
> > > >       pci_subdev_regulators_add_bus     # in pcie-brcmstb.c for now
> > > >         alloc_subdev_regulators         # in pcie-brcmstb.c for now
> > > >         regulator_bulk_get
> > > >         regulator_bulk_enable
> > > >       brcm_pcie_linkup                  # bring link up
> > > >
> > > > is basically so we can leave power to downstream devices off, then
> > > > turn it on when we're ready to enumerate those downstream devices.
> > >
> > > Yes  -- it is the "chicken-and-egg" problem.  Ideally, we would like
> > > for the endpoint driver to turn on its own regulators, but even to
> > > know which endpoint driver to probe we must turn on the regulator to
> > > establish linkup.
> > 
> > I don't think having an endpoint driver turn on power to its device is
> > the right goal.  
> 
> DT requires device specific code to control a specific device. That 
> belongs in the driver for that device.

I must be talking about something different than you are.  I see that
brcmstb has device-specific code to control the brcmstb device as well
as power for PCI devices downstream from that device.

When I read "endpoint driver" I think of a PCIe Endpoint device like a
NIC.  That's just a random PCI device, and I read "endpoint driver to
turn on its own regulators" as suggesting that the NIC driver (e1000,
etc) would turn on power to the NIC.  Is that the intent?

Bjorn
Francesco Dolcini May 27, 2022, 6:50 a.m. UTC | #17
On Wed, May 25, 2022 at 04:57:39PM -0500, Bjorn Helgaas wrote:
> On Tue, May 24, 2022 at 12:54:48PM -0400, Jim Quinlan wrote:
> > When brcm_pcie_add_bus() is invoked, we will "get" and enable any
> > regulators that are present in the DT node.  If the busno==1, we will
> > will also attempt pcie-linkup.  If PCIe linkup fails, which can happen for
> > multiple reasons but most due to a  missing device, we turn
> > on "refusal" mode to prevent our unforgiving PCIe HW from causing an
> > abort on any subsequent PCIe config-space accesses.
> 
> > Further, a failed linkup will have brcm_pcie_probe() stopping and
> > removing the root bus, which in turn invokes  brcm_pcie_remove_bus()
> > (actually named pci_subdev_regulators_remove_bus() as it may someday
> > find its way into bus.c), which invokes regulator_bulk_disable() on
> > any regulators that were enabled by the probe.
> 
> Ah, thanks!  This is the detail I missed.  If pci_host_probe()
> succeeds and the link is down, we call brcm_pcie_remove() (the
> driver's .remove() method).  That's unusual and possibly unique among
> native host bridge drivers.  I'm not sure that's the best pattern
> here.  Most drivers can't do that because they expect multiple devices
> on the root bus.  And the Root Port is still a functional device on
> its own, even if its link is down.  Users likely expect to see it in
> lspci and manipulate it via setpci.  It may have AER logs with clues
> about why the link didn't come up.
> 
> Again something for future discussion, not for this regression.

I experienced the same end result, root port not available unless the
link is up during probe, with the imx6 PCI driver and I'm also not
convinced this is the best decision.

I guess one of the reasons for this behavior is to save some power, but
it should be possible to just disable the PCIe root port in the
device tree to handle the use case in which PCIe port is not available
at all on the system.

Francesco
Bjorn Helgaas May 27, 2022, 11:27 p.m. UTC | #18
On Wed, May 25, 2022 at 04:57:39PM -0500, Bjorn Helgaas wrote:
> On Tue, May 24, 2022 at 12:54:48PM -0400, Jim Quinlan wrote:
> > On Mon, May 23, 2022 at 6:10 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > On Sat, May 21, 2022 at 02:51:42PM -0400, Jim Quinlan wrote:
> > > > On Sat, May 21,
> > > > 2CONFIG_INITRAMFS_SOURCE="/work3/jq921458/cpio/54-arm64-rootfs.cpio022
> > > > at 12:43 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > > > On Wed, May 18, 2022 at 03:42:11PM -0400, Jim Quinlan wrote:
> > > > > > commit 93e41f3fca3d ("PCI: brcmstb: Add control of subdevice
> > > > > > voltage regulators")
> > > > > >
> > > > > > introduced a regression on the PCIe RPi4 Compute Module.  If the
> > > > > > PCIe endpoint node described in [2] was missing, no linkup would
> > > > > > be attempted, and subsequent accesses would cause a panic
> > > > > > because this particular PCIe HW causes a CPU abort on illegal
> > > > > > accesses (instead of returning 0xffffffff).
> > > > > >
> > > > > > We fix this by allowing the DT endpoint subnode to be missing.
> > > > > > This is important for platforms like the CM4 which have a
> > > > > > standard PCIe socket and the endpoint device is unknown.
> 
> > > But above you say it's the *endpoint* node that doesn't exist.  The
> > > existing code looks like it's checking for the *bridge* node
> > > (bus->dev->of_node).  We haven't even enumerated the devices on the
> > > child bus, so we don't know about them at this point.
> >
> > You are absolutely correct and I must change the commit message
> > to say the "root port DT node".   I'm sorry; this mistake likely did not
> > help you understand the fix. :-(
> 
> Great, that will help me out!  I think including the relevant DT
> snippet would also make it more concrete and might conceivably be
> helpful to somebody working around it on a kernel without the fix.

Where are we at with this?  Linus just merged my pull request, and I'd
really like to get this resolved before -rc1 (expected June 5 or so),
which means I'd like to ask him to pull the fix early next week.

The alternative is to ask him to pull these reverts, which have
actually been in -next since May 11:

  4246970a3bcb ("Revert "PCI: brcmstb: Split brcm_pcie_setup() into two funcs"")
  f35b19f02e01 ("Revert "PCI: brcmstb: Add mechanism to turn on subdev regulators"")
  ae65b283d7a4 ("Revert "PCI: brcmstb: Add control of subdevice voltage regulators"")
  d938b26e9b14 ("Revert "PCI: brcmstb: Do not turn off WOL regulators on suspend"")

Bjorn
Jim Quinlan May 28, 2022, 12:19 a.m. UTC | #19
On Fri, May 27, 2022 at 7:28 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> On Wed, May 25, 2022 at 04:57:39PM -0500, Bjorn Helgaas wrote:
> > On Tue, May 24, 2022 at 12:54:48PM -0400, Jim Quinlan wrote:
> > > On Mon, May 23, 2022 at 6:10 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > > On Sat, May 21, 2022 at 02:51:42PM -0400, Jim Quinlan wrote:
> > > > > On Sat, May 21,
> > > > > 2CONFIG_INITRAMFS_SOURCE="/work3/jq921458/cpio/54-arm64-rootfs.cpio022
> > > > > at 12:43 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > > > > On Wed, May 18, 2022 at 03:42:11PM -0400, Jim Quinlan wrote:
> > > > > > > commit 93e41f3fca3d ("PCI: brcmstb: Add control of subdevice
> > > > > > > voltage regulators")
> > > > > > >
> > > > > > > introduced a regression on the PCIe RPi4 Compute Module.  If the
> > > > > > > PCIe endpoint node described in [2] was missing, no linkup would
> > > > > > > be attempted, and subsequent accesses would cause a panic
> > > > > > > because this particular PCIe HW causes a CPU abort on illegal
> > > > > > > accesses (instead of returning 0xffffffff).
> > > > > > >
> > > > > > > We fix this by allowing the DT endpoint subnode to be missing.
> > > > > > > This is important for platforms like the CM4 which have a
> > > > > > > standard PCIe socket and the endpoint device is unknown.
> >
> > > > But above you say it's the *endpoint* node that doesn't exist.  The
> > > > existing code looks like it's checking for the *bridge* node
> > > > (bus->dev->of_node).  We haven't even enumerated the devices on the
> > > > child bus, so we don't know about them at this point.
> > >
> > > You are absolutely correct and I must change the commit message
> > > to say the "root port DT node".   I'm sorry; this mistake likely did not
> > > help you understand the fix. :-(
> >
> > Great, that will help me out!  I think including the relevant DT
> > snippet would also make it more concrete and might conceivably be
> > helpful to somebody working around it on a kernel without the fix.
>
> Where are we at with this?  Linus just merged my pull request, and I'd
> really like to get this resolved before -rc1 (expected June 5 or so),
> which means I'd like to ask him to pull the fix early next week.
I was waiting to see where the email thread was going...
I'll send out the v2 regression fix in less than 24 hours.


Regards,
Jim Quinlan
Broadcom STB
>
> The alternative is to ask him to pull these reverts, which have
> actually been in -next since May 11:
>
>   4246970a3bcb ("Revert "PCI: brcmstb: Split brcm_pcie_setup() into two funcs"")
>   f35b19f02e01 ("Revert "PCI: brcmstb: Add mechanism to turn on subdev regulators"")
>   ae65b283d7a4 ("Revert "PCI: brcmstb: Add control of subdevice voltage regulators"")
>   d938b26e9b14 ("Revert "PCI: brcmstb: Do not turn off WOL regulators on suspend"")
>
> Bjorn
Bjorn Helgaas May 28, 2022, 1:59 a.m. UTC | #20
On Fri, May 27, 2022 at 08:19:16PM -0400, Jim Quinlan wrote:
> On Fri, May 27, 2022 at 7:28 PM Bjorn Helgaas <helgaas@kernel.org> wrote:

> > Where are we at with this?  Linus just merged my pull request, and I'd
> > really like to get this resolved before -rc1 (expected June 5 or so),
> > which means I'd like to ask him to pull the fix early next week.
>
> I was waiting to see where the email thread was going...

My fault for raising all the tangents :)

> I'll send out the v2 regression fix in less than 24 hours.

Great, thanks!
Rob Herring May 31, 2022, 7:46 p.m. UTC | #21
On Thu, May 26, 2022 at 03:53:55PM -0500, Bjorn Helgaas wrote:
> On Thu, May 26, 2022 at 02:25:12PM -0500, Rob Herring wrote:
> > On Mon, May 23, 2022 at 05:10:36PM -0500, Bjorn Helgaas wrote:
> > > On Sat, May 21, 2022 at 02:51:42PM -0400, Jim Quinlan wrote:
> > > > On Sat, May 21,
> > > > 2CONFIG_INITRAMFS_SOURCE="/work3/jq921458/cpio/54-arm64-rootfs.cpio022
> > > > at 12:43 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > > > On Wed, May 18, 2022 at 03:42:11PM -0400, Jim Quinlan wrote:
> 
> > > > > I added Rafael because this seems vaguely similar to runtime power
> > > > > management, and if we can integrate with that somehow, I'd sure like
> > > > > to avoid building a parallel infrastructure for it.
> > > > >
> > > > > The current path we're on is to move some of this code that's
> > > > > currently in pcie-brcmstb.c to the PCIe portdrv [0].  I'm a little
> > > > > hesitant about that because ACPI does just fine without it.  If we're
> > > > > adding new DT functionality that could not be implemented via ACPI,
> > > > > that's one thing.  But I'm not convinced this is that new.
> > > >
> > > > AFAICT, Broadcom STB and Cable Modem products do not have/use/want
> > > > ACPI.  We are fine with keeping this "PCIe regulator" feature
> > > > private to our driver and giving you speedy and full support in
> > > > maintaining it.
> > > 
> > > I don't mean that you should use ACPI, only that ACPI platforms can do
> > > this sort of power control using the existing PCI core infrastructure,
> > > and maybe there's a way for OF/DT platforms to hook into that same
> > > infrastructure to minimize the driver-specific work.  E.g., maybe
> > > there's a way to extend platform_pci_set_power_state() and similar to
> > > manage these regulators.
> > 
> > The big difference is ACPI abstracts how to control power for a device. 
> > The OS just knows D0, D3, etc. states. For DT, there is no such 
> > abstraction. You need device specific code to do device specific power 
> > management.
> 
> I'm thinking about the PCI side of the host controller, which should
> live by the PCI rules.  There are device-specific ways to control
> power, clocks, resets, etc on the PCI side, but drivers for PCI
> devices (as opposed to drivers for the host controllers) can't really
> call that code directly.

Yes, there are PCI specific ways to handle some of this when it is 
signals or power for standard PCI slots. But then it's also possible 
that you have a soldered down device that has extra or different 
interfaces.

When this Broadcom thread was reviewed originally, I was the one pushing 
this towards doing this in the portdrv. That seems like the more 
logical place at least to control the root port state even if we need 
host controller specific routines to do the work. It's all related to 
how do we separate out host bridge and root port operations. 

> There are some exceptions, but generally speaking I don't think PCI
> drivers that use generic power management need to use PCI_D0,
> PCI_D3hot, etc directly.  Generic PM uses interfaces like
> pci_pm_suspend() that keep most of the PCI details in the PCI core
> instead of the endpoint driver, e.g., [3].

Yeah, I think that's a different issue.


> The PCI core has a bunch of interfaces:
> 
>   platform_pci_power_manageable()
>   platform_pci_set_power_state()
>   platform_pci_get_power_state()
>   platform_pci_choose_state()
> 
> that currently mostly use ACPI.  So I'm wondering whether there's some
> way to extend those platform_*() interfaces to call the native host
> controller device-specific power control code via an ops structure.
> 
> Otherwise it feels like the native host controller drivers are in a
> different world than the generic PM world, and we'll end up with every
> host controller driver reimplementing things.
> 
> For example, how would we runtime suspend a Root Port and turn off
> power for PCI devices below it?  Obviously that requires
> device-specific code to control the power.  Do we have some common
> interface to it, or do we have to trap config writes to PCI_PM_CTRL or
> something?

Shrug. Honestly, the PCI specific power management stuff is not 
something I've studied. I'm a bit more fluent runtime PM.

Somewhat related to all this is this thread[4] where I've suggested that 
the right way to save power when there's no link (or child device 
really) is using runtime PM rather than just failing probe. We also 
don't need each host controller doing their own conformance test hacks 
either.


> [3] https://git.kernel.org/linus/cd97b7e0d780
> 
> > > > > [0] https://lore.kernel.org/r/20211110221456.11977-6-jim2101024@gmail.com
> 
> > > > > IIUC, this path:
> > > > >
> > > > >   pci_alloc_child_bus
> > > > >     brcm_pcie_add_bus                   # .add_bus method
> > > > >       pci_subdev_regulators_add_bus     # in pcie-brcmstb.c for now
> > > > >         alloc_subdev_regulators         # in pcie-brcmstb.c for now
> > > > >         regulator_bulk_get
> > > > >         regulator_bulk_enable
> > > > >       brcm_pcie_linkup                  # bring link up
> > > > >
> > > > > is basically so we can leave power to downstream devices off, then
> > > > > turn it on when we're ready to enumerate those downstream devices.
> > > >
> > > > Yes  -- it is the "chicken-and-egg" problem.  Ideally, we would like
> > > > for the endpoint driver to turn on its own regulators, but even to
> > > > know which endpoint driver to probe we must turn on the regulator to
> > > > establish linkup.
> > > 
> > > I don't think having an endpoint driver turn on power to its device is
> > > the right goal.  
> > 
> > DT requires device specific code to control a specific device. That 
> > belongs in the driver for that device.
> 
> I must be talking about something different than you are.  I see that
> brcmstb has device-specific code to control the brcmstb device as well
> as power for PCI devices downstream from that device.
> 
> When I read "endpoint driver" I think of a PCIe Endpoint device like a
> NIC.  That's just a random PCI device, and I read "endpoint driver to
> turn on its own regulators" as suggesting that the NIC driver (e1000,
> etc) would turn on power to the NIC.  Is that the intent?

Yes! A NIC as an add-in card doesn't need anything because it's just a 
standard PCI connector with standard power sequencing. But take that 
same NIC chip and solder it down on a board. Then the board designers 
start cost saving and remove components. For example, there's no need 
for standard PCI supply to chip supply regulators (e.g. 12V/3.3V to 
whatever the chip has). Who needs an EEPROM with a MAC address either.

I think there's roughly 2 cases we're dealing with. The platform 
specific ways to do power control on standard PCIe slots/connectors, and 
then non-standard connections that need downstream device specific ways 
to do power management (including powering on to be discovered). The 
line is blurred a bit because the latter case needs some of the former 
case (at least any in-band PCI power management). The problem I see all 
the time (not just PCI) is people trying to implement something 
generic/common rather than device specific which then makes its way into 
bindings. The only way something generic works is if there's a spec 
behind it. For PCI slots there is, but it is important we distinguish 
the 2 cases.

Rob

[4] https://lore.kernel.org/linux-pci/YksDJfterGl9uPjs@robh.at.kernel.org/
diff mbox series

Patch

diff --git a/drivers/pci/controller/pcie-brcmstb.c b/drivers/pci/controller/pcie-brcmstb.c
index ba5c120816b2..adca74e235cb 100644
--- a/drivers/pci/controller/pcie-brcmstb.c
+++ b/drivers/pci/controller/pcie-brcmstb.c
@@ -540,16 +540,18 @@  static int pci_subdev_regulators_add_bus(struct pci_bus *bus)
 
 static int brcm_pcie_add_bus(struct pci_bus *bus)
 {
-	struct device *dev = &bus->dev;
 	struct brcm_pcie *pcie = (struct brcm_pcie *) bus->sysdata;
 	int ret;
 
-	if (!dev->of_node || !bus->parent || !pci_is_root_bus(bus->parent))
+	/* Only busno==1 requires us to linkup */
+	if ((int)bus->number != 1)
 		return 0;
 
 	ret = pci_subdev_regulators_add_bus(bus);
-	if (ret)
+	if (ret) {
+		pcie->refusal_mode = true;
 		return ret;
+	}
 
 	/* Grab the regulators for suspend/resume */
 	pcie->sr = bus->dev.driver_data;