Message ID | 20220701162726.31346-1-jim2101024@gmail.com (mailing list archive) |
---|---|
Series | PCI: brcmstb: Re-submit reverted patchset |
On 7/1/22 09:27, Jim Quinlan wrote:
> A submission [1] was made to enable a PCIe root port to turn on regulators
> for downstream devices. It was accepted. Months later, a regression was
> discovered on an RPi CM4 [2]. The patchset was reverted [3] as the fix
> came too late in the release cycle. The regression in question is
> triggered only when the PCIe RC DT node has no root port subnode, which is
> a perfectly reasonable configuration.
>
> The original commits are now being resubmitted with some modifications to
> fix the regression. The modifications on the original commits are
> described below (the SHA is that of the original commit):
>
> [830aa6f29f07 PCI: brcmstb: Split brcm_pcie_setup() into two funcs]
> NOTE: In the originally submitted patchset, this commit introduced a
> regression that was corrected by a subsequent commit in the same
> patchset. Let's not do this again.
>
> @@ -1411,6 +1411,10 @@ static int brcm_pcie_probe(struct platform_device *pdev)
>  	if (ret)
>  		goto fail;
>
> +	ret = brcm_pcie_linkup(pcie);
> +	if (ret)
> +		goto fail;
>
>
> [67211aadcb4b PCI: brcmstb: Add mechanism to turn on subdev regulators]
> NOTE: Not related to the regression, the regulators must be freed whenever
> the PCIe tree is dismantled:
>
> @@ -507,6 +507,7 @@ static void pci_subdev_regulators_remove_bus(struct pci_bus *bus)
>
>  	if (regulator_bulk_disable(sr->num_supplies, sr->supplies))
>  		dev_err(dev, "failed to disable regulators for downstream device\n");
> +	regulator_bulk_free(sr->num_supplies, sr->supplies);
>  	dev->driver_data = NULL;
>
>
> [93e41f3fca3d PCI: brcmstb: Add control of subdevice voltage regulators]
> NOTE: If the PCIe RC DT node was missing a Root Port subnode, the PCIe
> link-up was skipped. This is the regression. Fix it by attempting
> link-up even if the Root Port DT subnode is missing.
>
> @@ -503,11 +503,10 @@ static int pci_subdev_regulators_add_bus(struct pci_bus *bus)
>
>  static int brcm_pcie_add_bus(struct pci_bus *bus)
>  {
> -	struct device *dev = &bus->dev;
>  	struct brcm_pcie *pcie = (struct brcm_pcie *) bus->sysdata;
>  	int ret;
>
> -	if (!dev->of_node || !bus->parent || !pci_is_root_bus(bus->parent))
> +	if (!bus->parent || !pci_is_root_bus(bus->parent))
>  		return 0;
>
>  	ret = pci_subdev_regulators_add_bus(bus);
>
> [1] https://lore.kernel.org/r/20220106160332.2143-1-jim2101024@gmail.com
> [2] https://bugzilla.kernel.org/show_bug.cgi?id=215925
> [3] https://lore.kernel.org/linux-pci/20220511201856.808690-1-helgaas@kernel.org/

On a Raspberry Pi 4B:

Tested-by: Florian Fainelli <f.fainelli@gmail.com>
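To illustrate the change to the early-return condition in brcm_pcie_add_bus(), here is a small user-space model. It is a sketch only, not driver code: struct mock_bus, mock_is_root_bus(), old_guard_skips() and new_guard_skips() are hypothetical stand-ins invented for this example, and the only behaviour modelled is the condition shown in the last hunk of the cover letter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two struct pci_bus fields the guard reads. */
struct mock_bus {
	const struct mock_bus *parent;	/* NULL for the root bus */
	const void *of_node;		/* NULL when the DT has no node for this bus */
};

/* Models pci_is_root_bus(): a root bus is one without a parent. */
bool mock_is_root_bus(const struct mock_bus *bus)
{
	return bus->parent == NULL;
}

/* Condition before the fix: true means add_bus returns 0 early. */
bool old_guard_skips(const struct mock_bus *bus)
{
	assert(bus != NULL);
	return !bus->of_node || !bus->parent || !mock_is_root_bus(bus->parent);
}

/* Condition after the fix: the of_node test is gone. */
bool new_guard_skips(const struct mock_bus *bus)
{
	assert(bus != NULL);
	return !bus->parent || !mock_is_root_bus(bus->parent);
}
```

With a bus that sits directly below the root bus and has no DT node, old_guard_skips() returns true while new_guard_skips() returns false, which is the case the cover letter describes as the regression. Both conditions still return early for the root bus itself and for buses further down the tree.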
Florian Fainelli <f.fainelli@gmail.com> (2022-07-01):
> On 7/1/22 09:27, Jim Quinlan wrote:
> > A submission [1] was made to enable a PCIe root port to turn on regulators
> > for downstream devices. It was accepted. Months later, a regression was
> > discovered on an RPi CM4 [2]. The patchset was reverted [3] as the fix
> > came too late in the release cycle. The regression in question is
> > triggered only when the PCIe RC DT node has no root port subnode, which is
> > a perfectly reasonable configuration.
> > ...
>
> On a Raspberry Pi 4B:
>
> Tested-by: Florian Fainelli <f.fainelli@gmail.com>

As it stands, CM4 support in master is less than ideal: the mmc issues
I've mentioned in some earlier discussion are making it very hard to
draw any definitive conclusions. Soft reboots or cold boots don't seem
to make a difference: the storage might not show up at all, leading to
getting dropped into an initramfs shell, or it might show up but further
accesses can be delayed so much that the system proceeds to booting but
very slowly, and it might even lead to getting dropped into some
emergency/maintenance mode.

This affects both the CM4 Lite variant (no internal storage = SD card in
the CM4 IO slot) and some CM4 non-Lite variant (with internal storage),
with messages like this one getting repeated:

[ 310.105020] mmc0: Timeout waiting for hardware cmd interrupt.
[ 310.110864] mmc0: sdhci: ============ SDHCI REGISTER DUMP ===========
[ 310.117390] mmc0: sdhci: Sys addr: 0x00000000 | Version: 0x00009902
[ 310.123918] mmc0: sdhci: Blk size: 0x00000000 | Blk cnt: 0x00000000
[ 310.130445] mmc0: sdhci: Argument: 0x000001aa | Trn mode: 0x00000000
[ 310.136971] mmc0: sdhci: Present: 0x01ff0001 | Host ctl: 0x00000001
[ 310.143496] mmc0: sdhci: Power: 0x0000000f | Blk gap: 0x00000000
[ 310.150021] mmc0: sdhci: Wake-up: 0x00000000 | Clock: 0x00007187
[ 310.156548] mmc0: sdhci: Timeout: 0x00000000 | Int stat: 0x00018000
[ 310.163074] mmc0: sdhci: Int enab: 0x00ff0003 | Sig enab: 0x00ff0003
[ 310.169600] mmc0: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000001
[ 310.176126] mmc0: sdhci: Caps: 0x00000000 | Caps_1: 0x00000000
[ 310.182652] mmc0: sdhci: Cmd: 0x0000081a | Max curr: 0x00000001
[ 310.189178] mmc0: sdhci: Resp[0]: 0x00000000 | Resp[1]: 0x00000000
[ 310.195704] mmc0: sdhci: Resp[2]: 0x00000000 | Resp[3]: 0x00000000
[ 310.202230] mmc0: sdhci: Host ctl2: 0x00000000
[ 310.206728] mmc0: sdhci: ============================================

That happens with current master (v5.19-rc5-56-ge35e5b6f695d2), with or
without this patchset.

That being said, I'm not able to reproduce the showstopper regression
that I reported against the initial patchset (booting was breaking in
the very first few seconds), so I suppose it's fine to propose the
following even if that's somewhat tainted by those mmc issues.

With Raspberry Pi CM4 (Lite and non-Lite), mounted on a CM4 IO Board:
 - with a PCIe to quad-USB board, USB storage and USB keyboard;
 - without anything in the PCIe slot.

Tested-by: Cyril Brulebois <cyril@debamax.com>


Cheers,
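As a reading aid for the register dump above, the following sketch decodes the Cmd and Int stat values. The bit definitions are assumed to match those in the kernel's drivers/mmc/host/sdhci.h; struct sdhci_cmd_info, sdhci_decode_cmd() and sdhci_int_is_cmd_timeout() are hypothetical names made up for this example and are not part of the sdhci driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit values assumed to match drivers/mmc/host/sdhci.h. */
#define SDHCI_CMD_RESP_MASK	0x03u		/* response type select */
#define SDHCI_CMD_CRC		0x08u		/* command CRC check enable */
#define SDHCI_CMD_INDEX		0x10u		/* command index check enable */
#define SDHCI_INT_ERROR		0x00008000u	/* error summary bit */
#define SDHCI_INT_TIMEOUT	0x00010000u	/* command timeout error */

/* Hypothetical container for the decoded command register fields. */
struct sdhci_cmd_info {
	unsigned int opcode;	/* command index, bits 13:8 */
	unsigned int resp_type;	/* 0 none, 1 136-bit, 2 48-bit, 3 48-bit with busy */
	bool crc_check;
	bool index_check;
};

/* Split the 16-bit command register value into its fields. */
void sdhci_decode_cmd(uint16_t cmd_reg, struct sdhci_cmd_info *out)
{
	assert(out != NULL);
	out->opcode = (cmd_reg >> 8) & 0x3fu;
	out->resp_type = cmd_reg & SDHCI_CMD_RESP_MASK;
	out->crc_check = (cmd_reg & SDHCI_CMD_CRC) != 0;
	out->index_check = (cmd_reg & SDHCI_CMD_INDEX) != 0;
}

/* True when both the error summary and command timeout bits are set. */
bool sdhci_int_is_cmd_timeout(uint32_t int_status)
{
	return (int_status & SDHCI_INT_ERROR) != 0 &&
	       (int_status & SDHCI_INT_TIMEOUT) != 0;
}
```

Applied to the dump, Cmd 0x0000081a decodes to opcode 8 with a 48-bit response and both CRC and index checks enabled, and Int stat 0x00018000 has the error summary and command timeout bits set, which agrees with the "Timeout waiting for hardware cmd interrupt" message. Opcode 8 with argument 0x000001aa looks like the SD SEND_IF_COND probe, but that reading is an inference from the values and is not stated in the report.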
On 7/5/22 13:55, Cyril Brulebois wrote:
> ...
> That happens with current master (v5.19-rc5-56-ge35e5b6f695d2), with or
> without this patchset.
>
> That being said, I'm not able to reproduce the showstopper regression
> that I reported against the initial patchset (booting was breaking in
> the very first few seconds), so I suppose it's fine to propose the
> following even if that's somewhat tainted by those mmc issues.

Any chance you can bisect the eMMC issues so we can investigate those
separately? Thanks!

> With Raspberry Pi CM4 (Lite and non-Lite), mounted on a CM4 IO Board:
>  - with a PCIe to quad-USB board, USB storage and USB keyboard;
>  - without anything in the PCIe slot.
>
> Tested-by: Cyril Brulebois <cyril@debamax.com>

Thanks!
Florian Fainelli <f.fainelli@gmail.com> (2022-07-05):
> On 7/5/22 13:55, Cyril Brulebois wrote:
> > That happens with current master (v5.19-rc5-56-ge35e5b6f695d2), with
> > or without this patchset.
> >
> > That being said, I'm not able to reproduce the showstopper
> > regression that I reported against the initial patchset (booting was
> > breaking in the very first few seconds), so I suppose it's fine to
> > propose the following even if that's somewhat tainted by those mmc
> > issues.
>
> Any chance you can bisect the eMMC issues so we can investigate those
> separately? Thanks!

Definitely. I wanted to make sure I wouldn't delay the reintroduction of
this patchset (feeling partly responsible for the revert that happened
in the first place), by providing some feedback regarding a possible
come-back of the regression, as soon as possible.

Now that this is out of the way, I'll try and find time to investigate
those MMC issues. Ditto for DRM, I seem to have completely lost the HDMI
output (that's less of an issue thanks to the serial console that has
been rather reliable to gather kernel logs).

I think I started encountering both issues very early in the devel
cycle (when we were still trying to find some follow-up commits to fix
the regression instead of going for the full-on revert), but I couldn't
afford spending time chasing multiple issues at once. I haven't checked
whether reports exist already for those issues, but that's my next step.


Cheers,
On Tue, Jul 5, 2022 at 5:28 PM Cyril Brulebois <kibi@debian.org> wrote:
>
> Florian Fainelli <f.fainelli@gmail.com> (2022-07-05):
> > On 7/5/22 13:55, Cyril Brulebois wrote:
> > > That happens with current master (v5.19-rc5-56-ge35e5b6f695d2), with
> > > or without this patchset.
> > >
> > > That being said, I'm not able to reproduce the showstopper
> > > regression that I reported against the initial patchset (booting was
> > > breaking in the very first few seconds), so I suppose it's fine to
> > > propose the following even if that's somewhat tainted by those mmc
> > > issues.
> >
> > Any chance you can bisect the eMMC issues so we can investigate those
> > separately? Thanks!

Cyril,

Before you go to the trouble of a bisection, can you just post (or
email me) the following:

    o complete boot log
    o output of "cat /proc/interrupts"
    o output of "for i in $(find /sys/devices/platform/ -type f -name state) ; do echo $i: $(cat $i) ; done"

Thanks,
Jim Quinlan
Broadcom STB
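The shell loop requested above can be approximated in C where running a one-liner is inconvenient. This is a minimal sketch under stated assumptions: dump_state_files() is a hypothetical helper written for this example, it prints only the first line of each file named "state" (the shell version prints the whole content), it skips empty or unreadable files, and it does not follow symlinks, matching `find -type f`.

```c
#define _POSIX_C_SOURCE 200809L	/* for lstat() under strict -std= modes */
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/*
 * Recursively print "<path>: <first line>" for every regular file named
 * "state" below root. Returns the number of files printed, or -1 if root
 * itself cannot be opened as a directory.
 */
int dump_state_files(const char *root, FILE *out)
{
	char path[4096];
	char line[256];
	struct dirent *de;
	struct stat st;
	int count = 0;
	DIR *dir;

	assert(root != NULL && out != NULL);
	dir = opendir(root);
	if (!dir)
		return -1;

	while ((de = readdir(dir)) != NULL) {
		int n;

		if (!strcmp(de->d_name, ".") || !strcmp(de->d_name, ".."))
			continue;
		n = snprintf(path, sizeof(path), "%s/%s", root, de->d_name);
		if (n < 0 || (size_t)n >= sizeof(path))
			continue;	/* path too long, skip it */
		if (lstat(path, &st) != 0)
			continue;	/* entry vanished or is unreadable */

		if (S_ISDIR(st.st_mode)) {
			n = dump_state_files(path, out);
			if (n > 0)
				count += n;
		} else if (S_ISREG(st.st_mode) && !strcmp(de->d_name, "state")) {
			FILE *f = fopen(path, "r");

			if (!f)
				continue;
			if (fgets(line, sizeof(line), f)) {
				line[strcspn(line, "\n")] = '\0';
				fprintf(out, "%s: %s\n", path, line);
				count++;
			}
			fclose(f);
		}
	}
	closedir(dir);
	return count;
}
```

Calling dump_state_files("/sys/devices/platform", stdout) from a small main() would produce output in the same "path: value" shape as the shell loop, one line per state file.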
On Fri, Jul 01, 2022 at 12:27:21PM -0400, Jim Quinlan wrote:
> A submission [1] was made to enable a PCIe root port to turn on regulators
> for downstream devices. It was accepted. Months later, a regression was
> discovered on an RPi CM4 [2]. The patchset was reverted [3] as the fix
> came too late in the release cycle. The regression in question is
> triggered only when the PCIe RC DT node has no root port subnode, which is
> a perfectly reasonable configuration.
> ...

> Jim Quinlan (4):
>   PCI: brcmstb: Split brcm_pcie_setup() into two funcs
>   PCI: brcmstb: Add mechanism to turn on subdev regulators
>   PCI: brcmstb: Add control of subdevice voltage regulators
>   PCI: brcmstb: Do not turn off WOL regulators on suspend
>
>  drivers/pci/controller/pcie-brcmstb.c | 257 +++++++++++++++++++++++---
>  1 file changed, 227 insertions(+), 30 deletions(-)

I'm assuming there's a v2 coming soonish? We should see -rc7 this
weekend and likely a final v5.19 release on July 24, so v5.20 material
should be tidied up by then.

Bjorn
On Fri, Jul 15, 2022 at 2:27 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> ...
> I'm assuming there's a v2 coming soonish? We should see -rc7 this
> weekend and likely a final v5.19 release on July 24, so v5.20 material
> should be tidied up by then.

Hi Bjorn,

Yes, it has been ready for a few days but I am bumping into unrelated
issues while trying to do suspend/resume tests with the latest upstream.
Hopefully I will send it out tonight or this weekend.

Regards,
Jim Quinlan
Broadcom STB