mbox series

[0/4] PCI/pwrctrl: Rework pwrctrl driver integration and add driver for PCI slot

Message ID 20241210-pci-pwrctrl-slot-v1-0-eae45e488040@linaro.org (mailing list archive)
Headers show
Series PCI/pwrctrl: Rework pwrctrl driver integration and add driver for PCI slot | expand

Message

Manivannan Sadhasivam via B4 Relay Dec. 10, 2024, 9:55 a.m. UTC
Hi,

This series reworks the PCI pwrctrl integration (again) by moving the creation
and removal of pwrctrl devices to pci_scan_device() and pci_destroy_dev() APIs.
This is based on the suggestion provided by Lukas Wunner [1][2]. With this
change, it is now possible to create pwrctrl devices for PCI bridges as well.
This is required to control the power state of the PCI slots in a system. Since
the PCI slots are not explicitly defined in devicetree, the agreement is to
define the supplies for PCI slots in PCI bridge nodes itself [3].

Based on this, a pwrctrl driver to control the supplies of PCI slots are also
added in patch 4. With this driver, it is now possible to control the voltage
regulators powering the PCI slots defined in PCI bridge nodes as below:

```
pcie@0 {
	compatible "pciclass,0604"
	...

	vpcie12v-supply = <&vpcie12v_reg>;
	vpcie3v3-supply = <&vpcie3v3_reg>;
	vpcie3v3aux-supply = <&vpcie3v3aux_reg>;
};
```

To make use of this driver, the PCI bridge DT node should also have the
compatible "pciclass,0604". But adding this compatible triggers the following
checkpatch warning:

WARNING: DT compatible string vendor "pciclass" appears un-documented --
check ./Documentation/devicetree/bindings/vendor-prefixes.yaml

For fixing it, I added patch 3. But due to some reason, checkpatch is not
picking the 'pciclass' vendor prefix alone, and requires adding the full
compatible 'pciclass,0604' in the vendor-prefixes list. Since my perl skills are
not great, I'm leaving it in the hands of Rob to fix the checkpatch script.

[1] https://lore.kernel.org/linux-pci/Z0yLDBMAsh0yKWf2@wunner.de
[2] https://lore.kernel.org/linux-pci/Z0xAdQ2ozspEnV5g@wunner.de
[3] https://github.com/devicetree-org/dt-schema/issues/145

Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
---
Manivannan Sadhasivam (4):
      PCI/pwrctrl: Move creation of pwrctrl devices to pci_scan_device()
      PCI/pwrctrl: Move pci_pwrctrl_unregister() to pci_destroy_dev()
      dt-bindings: vendor-prefixes: Document the 'pciclass' prefix
      PCI/pwrctrl: Add pwrctrl driver for PCI Slots

 .../devicetree/bindings/vendor-prefixes.yaml       |  2 +-
 drivers/pci/bus.c                                  | 43 ----------
 drivers/pci/probe.c                                | 34 ++++++++
 drivers/pci/pwrctrl/Kconfig                        | 11 +++
 drivers/pci/pwrctrl/Makefile                       |  3 +
 drivers/pci/pwrctrl/core.c                         |  2 +-
 drivers/pci/pwrctrl/slot.c                         | 93 ++++++++++++++++++++++
 drivers/pci/remove.c                               |  2 +-
 8 files changed, 144 insertions(+), 46 deletions(-)
---
base-commit: 40384c840ea1944d7c5a392e8975ed088ecf0b37
change-id: 20241210-pci-pwrctrl-slot-02c0ec63172f

Best regards,

Comments

Bartosz Golaszewski Dec. 10, 2024, 12:40 p.m. UTC | #1
On Tue, Dec 10, 2024 at 10:55 AM Manivannan Sadhasivam via B4 Relay
<devnull+manivannan.sadhasivam.linaro.org@kernel.org> wrote:
>
> Hi,
>
> This series reworks the PCI pwrctrl integration (again) by moving the creation
> and removal of pwrctrl devices to pci_scan_device() and pci_destroy_dev() APIs.
> This is based on the suggestion provided by Lukas Wunner [1][2]. With this
> change, it is now possible to create pwrctrl devices for PCI bridges as well.
> This is required to control the power state of the PCI slots in a system. Since
> the PCI slots are not explicitly defined in devicetree, the agreement is to
> define the supplies for PCI slots in PCI bridge nodes itself [3].
>
> Based on this, a pwrctrl driver to control the supplies of PCI slots are also
> added in patch 4. With this driver, it is now possible to control the voltage
> regulators powering the PCI slots defined in PCI bridge nodes as below:
>
> ```
> pcie@0 {
>         compatible "pciclass,0604"
>         ...
>
>         vpcie12v-supply = <&vpcie12v_reg>;
>         vpcie3v3-supply = <&vpcie3v3_reg>;
>         vpcie3v3aux-supply = <&vpcie3v3aux_reg>;
> };
> ```
>
> To make use of this driver, the PCI bridge DT node should also have the
> compatible "pciclass,0604". But adding this compatible triggers the following
> checkpatch warning:
>
> WARNING: DT compatible string vendor "pciclass" appears un-documented --
> check ./Documentation/devicetree/bindings/vendor-prefixes.yaml
>
> For fixing it, I added patch 3. But due to some reason, checkpatch is not
> picking the 'pciclass' vendor prefix alone, and requires adding the full
> compatible 'pciclass,0604' in the vendor-prefixes list. Since my perl skills are
> not great, I'm leaving it in the hands of Rob to fix the checkpatch script.
>
> [1] https://lore.kernel.org/linux-pci/Z0yLDBMAsh0yKWf2@wunner.de
> [2] https://lore.kernel.org/linux-pci/Z0xAdQ2ozspEnV5g@wunner.de
> [3] https://github.com/devicetree-org/dt-schema/issues/145
>
> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
> ---
> Manivannan Sadhasivam (4):
>       PCI/pwrctrl: Move creation of pwrctrl devices to pci_scan_device()
>       PCI/pwrctrl: Move pci_pwrctrl_unregister() to pci_destroy_dev()
>       dt-bindings: vendor-prefixes: Document the 'pciclass' prefix
>       PCI/pwrctrl: Add pwrctrl driver for PCI Slots
>
>  .../devicetree/bindings/vendor-prefixes.yaml       |  2 +-
>  drivers/pci/bus.c                                  | 43 ----------
>  drivers/pci/probe.c                                | 34 ++++++++
>  drivers/pci/pwrctrl/Kconfig                        | 11 +++
>  drivers/pci/pwrctrl/Makefile                       |  3 +
>  drivers/pci/pwrctrl/core.c                         |  2 +-
>  drivers/pci/pwrctrl/slot.c                         | 93 ++++++++++++++++++++++
>  drivers/pci/remove.c                               |  2 +-
>  8 files changed, 144 insertions(+), 46 deletions(-)
> ---
> base-commit: 40384c840ea1944d7c5a392e8975ed088ecf0b37
> change-id: 20241210-pci-pwrctrl-slot-02c0ec63172f
>
> Best regards,
> --
> Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
>
>

Tested-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Qiang Yu Dec. 11, 2024, 9:55 a.m. UTC | #2
On 12/10/2024 5:55 PM, Manivannan Sadhasivam via B4 Relay wrote:
> Hi,
>
> This series reworks the PCI pwrctrl integration (again) by moving the creation
> and removal of pwrctrl devices to pci_scan_device() and pci_destroy_dev() APIs.
> This is based on the suggestion provided by Lukas Wunner [1][2]. With this
> change, it is now possible to create pwrctrl devices for PCI bridges as well.
> This is required to control the power state of the PCI slots in a system. Since
> the PCI slots are not explicitly defined in devicetree, the agreement is to
> define the supplies for PCI slots in PCI bridge nodes itself [3].
>
> Based on this, a pwrctrl driver to control the supplies of PCI slots are also
> added in patch 4. With this driver, it is now possible to control the voltage
> regulators powering the PCI slots defined in PCI bridge nodes as below:
>
> ```
> pcie@0 {
> 	compatible "pciclass,0604"
> 	...
>
> 	vpcie12v-supply = <&vpcie12v_reg>;
> 	vpcie3v3-supply = <&vpcie3v3_reg>;
> 	vpcie3v3aux-supply = <&vpcie3v3aux_reg>;
> };
> ```
>
> To make use of this driver, the PCI bridge DT node should also have the
> compatible "pciclass,0604". But adding this compatible triggers the following
> checkpatch warning:
>
> WARNING: DT compatible string vendor "pciclass" appears un-documented --
> check ./Documentation/devicetree/bindings/vendor-prefixes.yaml
>
> For fixing it, I added patch 3. But due to some reason, checkpatch is not
> picking the 'pciclass' vendor prefix alone, and requires adding the full
> compatible 'pciclass,0604' in the vendor-prefixes list. Since my perl skills are
> not great, I'm leaving it in the hands of Rob to fix the checkpatch script.
>
> [1] https://lore.kernel.org/linux-pci/Z0yLDBMAsh0yKWf2@wunner.de
> [2] https://lore.kernel.org/linux-pci/Z0xAdQ2ozspEnV5g@wunner.de
> [3] https://github.com/devicetree-org/dt-schema/issues/145
>
> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
> ---
> Manivannan Sadhasivam (4):
>        PCI/pwrctrl: Move creation of pwrctrl devices to pci_scan_device()
>        PCI/pwrctrl: Move pci_pwrctrl_unregister() to pci_destroy_dev()
>        dt-bindings: vendor-prefixes: Document the 'pciclass' prefix
>        PCI/pwrctrl: Add pwrctrl driver for PCI Slots
>
>   .../devicetree/bindings/vendor-prefixes.yaml       |  2 +-
>   drivers/pci/bus.c                                  | 43 ----------
>   drivers/pci/probe.c                                | 34 ++++++++
>   drivers/pci/pwrctrl/Kconfig                        | 11 +++
>   drivers/pci/pwrctrl/Makefile                       |  3 +
>   drivers/pci/pwrctrl/core.c                         |  2 +-
>   drivers/pci/pwrctrl/slot.c                         | 93 ++++++++++++++++++++++
>   drivers/pci/remove.c                               |  2 +-
>   8 files changed, 144 insertions(+), 46 deletions(-)
> ---
> base-commit: 40384c840ea1944d7c5a392e8975ed088ecf0b37
> change-id: 20241210-pci-pwrctrl-slot-02c0ec63172f
>
> Best regards,

Hi Mani

PCIe3 is able to link up after applying your patch. Slot power is turned 
on correctly.
But see “NULL pointer dereference” when I try to remove device.

root@linaro-gnome:/sys/bus/pci/devices/0003:00:00.0# echo 1 > remove

[   38.752976] ------------[ cut here ]------------
[   38.757726] WARNING: CPU: 1 PID: 816 at drivers/regulator/core.c:5857 
regulator_unregister+0x13c/0x160
[   38.767288] Modules linked in: phy_qcom_qmp_combo aux_bridge 
drm_kms_helper drm nvme backlight pinctrl_sm8550_lpass_lpi 
pci_pwrctl_slot pci_pwrctrl_core nvme_core phy_qcom_edp 
phy_qcom_eusb2_repeater dispcc_x1e80100 pinctrl_lpass_lpi 
phy_qcom_snps_eusb2 lpasscc_sc8280xp typec gpucc_x1e80100 phy_qcom_qmp_pcie
[   38.795279] CPU: 1 UID: 0 PID: 816 Comm: bash Not tainted 
6.12.0-next-20241128-00005-g6178bf6ce3c2-dirty #50
[   38.805359] Hardware name: Qualcomm IDP, BIOS 
6.0.240607.BOOT.MXF.2.4-00348.1-HAMOA-1.67705.7 06/ 7/2024
[   38.815088] pstate: 61400005 (nZCv daif +PAN -UAO -TCO +DIT -SSBS 
BTYPE=--)
[   38.822239] pc : regulator_unregister+0x13c/0x160
[   38.827081] lr : regulator_unregister+0xc0/0x160
[   38.831829] sp : ffff800082ad39e0
[   38.835238] x29: ffff800082ad39e0 x28: ffff67874a4b1140 x27: 
0000000000000000
[   38.842563] x26: 0000000000000000 x25: 0000000000000000 x24: 
0000000000000000
[   38.849895] x23: 0000000000000001 x22: ffff800082ad3a88 x21: 
0000000000000000
[   38.857220] x20: ffffb72b1c7de2b0 x19: ffff678740f60000 x18: 
ffffffffffffffff
[   38.864545] x17: 2e30303030646231 x16: ffffb72b1a68d54c x15: 
3030303064424337
[   38.871867] x14: 000000000000000b x13: ffff6787402b6010 x12: 
0000000000000000
[   38.879200] x11: 0000000000000000 x10: ffffb72b1a689fe8 x9 : 
ffffb72b1a689bc8
[   38.886525] x8 : ffff678740e5b2c0 x7 : ffff800082ad3a88 x6 : 
ffff678740e5b2c0
[   38.893849] x5 : ffff678740e5b2c0 x4 : ffff678740e5b2c0 x3 : 
ffff678740e5b2c0
[   38.901172] x2 : ffff67874a4b1140 x1 : 0000000000000000 x0 : 
0000000000000007
[   38.908496] Call trace:
[   38.911019]  regulator_unregister+0x13c/0x160 (P)
[   38.915856]  regulator_unregister+0xc0/0x160 (L)
[   38.920600]  devm_rdev_release+0x14/0x20
[   38.924634]  devres_release_all+0xa0/0x100
[   38.928845]  device_unbind_cleanup+0x18/0x60
[   38.933239]  device_release_driver_internal+0x1ec/0x228
[   38.938606]  device_release_driver+0x18/0x24
[   38.942998]  bus_remove_device+0xcc/0x10c
[   38.947122]  device_del+0x14c/0x404
[   38.950705]  device_unregister+0x18/0x34
[   38.954738]  of_device_unregister+0x14/0x20
[   38.959041]  pci_remove_bus_device+0x110/0x1b0
[   38.963612]  pci_remove_bus_device+0x38/0x1b0
[   38.968094]  pci_stop_and_remove_bus_device_locked+0x28/0x3c
[   38.973907]  remove_store+0x94/0xa4
[   38.977494]  dev_attr_store+0x18/0x2c
[   38.981263]  sysfs_kf_write+0x44/0x54
[   38.985037]  kernfs_fop_write_iter+0x118/0x1a8
[   38.989607]  vfs_write+0x2b0/0x35c
[   38.993107]  ksys_write+0x68/0xfc
[   38.996519]  __arm64_sys_write+0x1c/0x28
[   39.000552]  invoke_syscall+0x48/0x110
[   39.004411]  el0_svc_common.constprop.0+0x40/0xe8
[   39.009244]  do_el0_svc+0x20/0x2c
[   39.012655]  el0_svc+0x30/0xd0
[   39.015805]  el0t_64_sync_handler+0x144/0x168
[   39.020283]  el0t_64_sync+0x198/0x19c
[   39.024052] ---[ end trace 0000000000000000 ]---
[   39.028897] Unable to handle kernel NULL pointer dereference at 
virtual address 00000000000000c0
[   39.037932] Mem abort info:
[   39.040823]   ESR = 0x0000000096000004
[   39.044691]   EC = 0x25: DABT (current EL), IL = 32 bits
[   39.050161]   SET = 0, FnV = 0
[   39.053316]   EA = 0, S1PTW = 0
[   39.056557]   FSC = 0x04: level 0 translation fault
[   39.061585] Data abort info:
[   39.064554]   ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
[   39.070190]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[   39.075395]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[   39.080861] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000889f93000
[   39.087484] [00000000000000c0] pgd=0000000000000000, p4d=0000000000000000
[   39.094467] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
[   39.100903] Modules linked in: phy_qcom_qmp_combo aux_bridge 
drm_kms_helper drm nvme backlight pinctrl_sm8550_lpass_lpi 
pci_pwrctl_slot pci_pwrctrl_core nvme_core phy_qcom_edp 
phy_qcom_eusb2_repeater dispcc_x1e80100 pinctrl_lpass_lpi 
phy_qcom_snps_eusb2 lpasscc_sc8280xp typec gpucc_x1e80100 phy_qcom_qmp_pcie
[   39.128882] CPU: 1 UID: 0 PID: 816 Comm: bash Tainted: G W          
6.12.0-next-20241128-00005-g6178bf6ce3c2-dirty #50
[   39.140480] Tainted: [W]=WARN
[   39.143540] Hardware name: Qualcomm IDP, BIOS 
6.0.240607.BOOT.MXF.2.4-00348.1-HAMOA-1.67705.7 06/ 7/2024
[   39.153273] pstate: 21400005 (nzCv daif +PAN -UAO -TCO +DIT -SSBS 
BTYPE=--)
[   39.160421] pc : pci_remove_bus_device+0x120/0x1b0
[   39.165341] lr : pci_remove_bus_device+0x110/0x1b0
[   39.170261] sp : ffff800082ad3c00
[   39.173676] x29: ffff800082ad3c00 x28: ffff67874a4b1140 x27: 
0000000000000000
[   39.181004] x26: 0000000000000000 x25: 0000000000000000 x24: 
ffff67874cb2caa0
[   39.188334] x23: ffff800082ad3d88 x22: 0000000000000000 x21: 
ffff6787458f00c8
[   39.195661] x20: ffff6787458f0000 x19: ffffb72b1c5e88d8 x18: 
ffffffffffffffff
[   39.202984] x17: 74616c703d4d4554 x16: 5359534255530079 x15: 
6d6d75642d676572
[   39.210311] x14: ffffb72b1ca01098 x13: 0000000000000040 x12: 
0000000000000228
[   39.217638] x11: 0000000000000000 x10: 0000000000000000 x9 : 
0000000000000000
[   39.224964] x8 : 0000000000000000 x7 : ffff678740d3ebc8 x6 : 
ffff678745584b40
[   39.232296] x5 : ffff6787411c64e0 x4 : ffff6787411c64e0 x3 : 
ffff678741256679
[   39.239626] x2 : ffff678740e5b048 x1 : 0000000000000008 x0 : 
00000000000000c0
[   39.246951] Call trace:
[   39.249469]  pci_remove_bus_device+0x120/0x1b0 (P)
[   39.254399]  pci_remove_bus_device+0x110/0x1b0 (L)
[   39.259326]  pci_remove_bus_device+0x38/0x1b0
[   39.263801]  pci_stop_and_remove_bus_device_locked+0x28/0x3c
[   39.269620]  remove_store+0x94/0xa4
[   39.273209]  dev_attr_store+0x18/0x2c
[   39.276972]  sysfs_kf_write+0x44/0x54
[   39.280735]  kernfs_fop_write_iter+0x118/0x1a8
[   39.285305]  vfs_write+0x2b0/0x35c
[   39.288808]  ksys_write+0x68/0xfc
[   39.292221]  __arm64_sys_write+0x1c/0x28
[   39.296258]  invoke_syscall+0x48/0x110
[   39.300119]  el0_svc_common.constprop.0+0x40/0xe8
[   39.304952]  do_el0_svc+0x20/0x2c
[   39.308365]  el0_svc+0x30/0xd0
[   39.311515]  el0t_64_sync_handler+0x144/0x168
[   39.316002]  el0t_64_sync+0x198/0x19c
[   39.319766] Code: f9414aa0 d503201f d2800101 91030000 (f821101f)
[   39.326029] ---[ end trace 0000000000000000 ]---

Thanks,
Qiang
Lukas Wunner Dec. 15, 2024, 5:32 p.m. UTC | #3
On Wed, Dec 11, 2024 at 05:55:48PM +0800, Qiang Yu wrote:
> PCIe3 is able to link up after applying your patch. Slot power is turned on
> correctly.
> But see "NULL pointer dereference" when I try to remove device.

There's a WARN splat occurring before the NULL pointer deref.
Was this happening before or is it new?  Probably makes sense
to debug that first before looking into the NULL pointer deref,
which could be a result of it.


> [   38.757726] WARNING: CPU: 1 PID: 816 at drivers/regulator/core.c:5857
> regulator_unregister+0x13c/0x160
> [   38.767288] Modules linked in: phy_qcom_qmp_combo aux_bridge
> drm_kms_helper drm nvme backlight pinctrl_sm8550_lpass_lpi pci_pwrctl_slot
> pci_pwrctrl_core nvme_core phy_qcom_edp phy_qcom_eusb2_repeater
> dispcc_x1e80100 pinctrl_lpass_lpi phy_qcom_snps_eusb2 lpasscc_sc8280xp typec
> gpucc_x1e80100 phy_qcom_qmp_pcie
> [   38.795279] CPU: 1 UID: 0 PID: 816 Comm: bash Not tainted
> 6.12.0-next-20241128-00005-g6178bf6ce3c2-dirty #50
> [   38.805359] Hardware name: Qualcomm IDP, BIOS
> 6.0.240607.BOOT.MXF.2.4-00348.1-HAMOA-1.67705.7 06/ 7/2024
> [   38.815088] pstate: 61400005 (nZCv daif +PAN -UAO -TCO +DIT -SSBS
> BTYPE=--)
> [   38.822239] pc : regulator_unregister+0x13c/0x160
> [   38.827081] lr : regulator_unregister+0xc0/0x160

The WARN splat seems to be caused by:

	WARN_ON(rdev->open_count);

So the regulator is unregistered although it's still in use.
Is there maybe a multifunction PCIe device in your system
so that multiple devices are using the same regulator?

Thanks,

Lukas
Manivannan Sadhasivam Dec. 16, 2024, 5:21 a.m. UTC | #4
On Sun, Dec 15, 2024 at 06:32:02PM +0100, Lukas Wunner wrote:
> On Wed, Dec 11, 2024 at 05:55:48PM +0800, Qiang Yu wrote:
> > PCIe3 is able to link up after applying your patch. Slot power is turned on
> > correctly.
> > But see "NULL pointer dereference" when I try to remove device.
> 
> There's a WARN splat occurring before the NULL pointer deref.
> Was this happening before or is it new?  Probably makes sense
> to debug that first before looking into the NULL pointer deref,
> which could be a result of it.
> 

Precisely.

> 
> > [   38.757726] WARNING: CPU: 1 PID: 816 at drivers/regulator/core.c:5857
> > regulator_unregister+0x13c/0x160
> > [   38.767288] Modules linked in: phy_qcom_qmp_combo aux_bridge
> > drm_kms_helper drm nvme backlight pinctrl_sm8550_lpass_lpi pci_pwrctl_slot
> > pci_pwrctrl_core nvme_core phy_qcom_edp phy_qcom_eusb2_repeater
> > dispcc_x1e80100 pinctrl_lpass_lpi phy_qcom_snps_eusb2 lpasscc_sc8280xp typec
> > gpucc_x1e80100 phy_qcom_qmp_pcie
> > [   38.795279] CPU: 1 UID: 0 PID: 816 Comm: bash Not tainted
> > 6.12.0-next-20241128-00005-g6178bf6ce3c2-dirty #50
> > [   38.805359] Hardware name: Qualcomm IDP, BIOS
> > 6.0.240607.BOOT.MXF.2.4-00348.1-HAMOA-1.67705.7 06/ 7/2024
> > [   38.815088] pstate: 61400005 (nZCv daif +PAN -UAO -TCO +DIT -SSBS
> > BTYPE=--)
> > [   38.822239] pc : regulator_unregister+0x13c/0x160
> > [   38.827081] lr : regulator_unregister+0xc0/0x160
> 
> The WARN splat seems to be caused by:
> 
> 	WARN_ON(rdev->open_count);
> 
> So the regulator is unregistered although it's still in use.
> Is there maybe a multifunction PCIe device in your system
> so that multiple devices are using the same regulator?
> 

Maybe the regulator is shared with other peripherals (not just PCIe) in the
system.

@Qiang: I referred your patch [1] that added the slot regulators, but they were
not used by any peripherals other than PCIe. Could you please post the list of
consumers of the 3 slot regulators?

FYI, I did test root port remove on RB5 board (with dummy fixed regulators) and
it worked fine.

- Mani

[1] https://lore.kernel.org/linux-pci/20240827063631.3932971-8-quic_qianyu@quicinc.com/
Manivannan Sadhasivam Dec. 16, 2024, 8:26 a.m. UTC | #5
On Mon, Dec 16, 2024 at 10:51:18AM +0530, Manivannan Sadhasivam wrote:
> On Sun, Dec 15, 2024 at 06:32:02PM +0100, Lukas Wunner wrote:
> > On Wed, Dec 11, 2024 at 05:55:48PM +0800, Qiang Yu wrote:
> > > PCIe3 is able to link up after applying your patch. Slot power is turned on
> > > correctly.
> > > But see "NULL pointer dereference" when I try to remove device.
> > 
> > There's a WARN splat occurring before the NULL pointer deref.
> > Was this happening before or is it new?  Probably makes sense
> > to debug that first before looking into the NULL pointer deref,
> > which could be a result of it.
> > 
> 
> Precisely.
> 
> > 
> > > [   38.757726] WARNING: CPU: 1 PID: 816 at drivers/regulator/core.c:5857
> > > regulator_unregister+0x13c/0x160
> > > [   38.767288] Modules linked in: phy_qcom_qmp_combo aux_bridge
> > > drm_kms_helper drm nvme backlight pinctrl_sm8550_lpass_lpi pci_pwrctl_slot
> > > pci_pwrctrl_core nvme_core phy_qcom_edp phy_qcom_eusb2_repeater
> > > dispcc_x1e80100 pinctrl_lpass_lpi phy_qcom_snps_eusb2 lpasscc_sc8280xp typec
> > > gpucc_x1e80100 phy_qcom_qmp_pcie
> > > [   38.795279] CPU: 1 UID: 0 PID: 816 Comm: bash Not tainted
> > > 6.12.0-next-20241128-00005-g6178bf6ce3c2-dirty #50
> > > [   38.805359] Hardware name: Qualcomm IDP, BIOS
> > > 6.0.240607.BOOT.MXF.2.4-00348.1-HAMOA-1.67705.7 06/ 7/2024
> > > [   38.815088] pstate: 61400005 (nZCv daif +PAN -UAO -TCO +DIT -SSBS
> > > BTYPE=--)
> > > [   38.822239] pc : regulator_unregister+0x13c/0x160
> > > [   38.827081] lr : regulator_unregister+0xc0/0x160
> > 
> > The WARN splat seems to be caused by:
> > 
> > 	WARN_ON(rdev->open_count);
> > 
> > So the regulator is unregistered although it's still in use.
> > Is there maybe a multifunction PCIe device in your system
> > so that multiple devices are using the same regulator?
> > 
> 
> Maybe the regulator is shared with other peripherals (not just PCIe) in the
> system.
> 
> @Qiang: I referred your patch [1] that added the slot regulators, but they were
> not used by any peripherals other than PCIe. Could you please post the list of
> consumers of the 3 slot regulators?
>

Just looked briefly into regulator_unregister() and I can see that it will get
called only when the regulator driver is unbound from the regulator device. Your
previous DT reference suggests that you were probably using fixed regulator for
all 3 slot regulators. In that case, this splat can occur when the regulator
driver is unbound (module unload?) with still one of the consumers holding
reference.

So somehow regulator_put() is never called for that consumer but the regulator
is removed. This looks like a bug somewhere.

- Mani