Message ID | 20240910180530.47194-1-sebastian.reichel@collabora.com (mailing list archive) |
---|---|
Series | Fix RK3588 GPU domain |
On Tue, 10 Sept 2024 at 20:05, Sebastian Reichel
<sebastian.reichel@collabora.com> wrote:
>
> Hi,
>
> I got a report, that the Linux kernel crashes on Rock 5B when the panthor
> driver is loaded late after booting. The crash starts with the following
> shortened error print:
>
> rockchip-pm-domain fd8d8000.power-management:power-controller: failed to set domain 'gpu', val=0
> rockchip-pm-domain fd8d8000.power-management:power-controller: failed to get ack on domain 'gpu', val=0xa9fff
> SError Interrupt on CPU4, code 0x00000000be000411 -- SError
>
> This series first does some cleanups in the Rockchip power domain
> driver and changes the driver, so that it no longer tries to continue
> when it fails to enable a domain. This gets rid of the SError interrupt
> and long backtraces. But the kernel still hangs when it fails to enable
> a power domain. I have not done further analysis to check if that can
> be avoided.
>
> Last but not least this provides a fix for the GPU power domain failing
> to get enabled - after some testing from my side it seems to require the
> GPU voltage supply to be enabled.
>
> I'm not really happy about the hack to get a regulator for a sub-node
> in the 5th patch, which I took over from the Mediatek driver. But to
> get things going and open a discussion around it I thought it would be
> best to send a first version as soon as possible.

That creates a circular dependency from the fw_devlink point of view.

I assume that isn't a problem and fw_devlink takes care of this, so
the GPU power domain still can probe?

Other than this, I think this looks okay to me.

Kind regards
Uffe

>
> Greetings,
>
> -- Sebastian
>
> Sebastian Reichel (6):
>   pmdomain: rockchip: forward rockchip_do_pmu_set_power_domain errors
>   pmdomain: rockchip: cleanup mutex handling in rockchip_pd_power
>   pmdomain: rockchip: reduce indention in rockchip_pd_power
>   dt-bindings: power: rockchip: add regulator support
>   pmdomain: rockchip: add regulator support
>   arm64: dts: rockchip: Add GPU power domain regulator dependency for
>     RK3588
>
>  .../power/rockchip,power-controller.yaml      |   3 +
>  .../boot/dts/rockchip/rk3588-armsom-sige7.dts |   4 +
>  arch/arm64/boot/dts/rockchip/rk3588-base.dtsi |   2 +-
>  .../boot/dts/rockchip/rk3588-coolpi-cm5.dtsi  |   4 +
>  .../rockchip/rk3588-friendlyelec-cm3588.dtsi  |   4 +
>  .../arm64/boot/dts/rockchip/rk3588-jaguar.dts |   4 +
>  .../boot/dts/rockchip/rk3588-ok3588-c.dts     |   4 +
>  .../boot/dts/rockchip/rk3588-rock-5-itx.dts   |   4 +
>  .../boot/dts/rockchip/rk3588-rock-5b.dts      |   4 +
>  .../arm64/boot/dts/rockchip/rk3588-tiger.dtsi |   4 +
>  .../boot/dts/rockchip/rk3588s-coolpi-4b.dts   |   4 +
>  .../dts/rockchip/rk3588s-khadas-edge2.dts     |   4 +
>  .../boot/dts/rockchip/rk3588s-orangepi-5.dts  |   4 +
>  drivers/pmdomain/rockchip/pm-domains.c        | 130 +++++++++++++-----
>  14 files changed, 144 insertions(+), 35 deletions(-)
>
> --
> 2.45.2
>
Hi, Sebastian, thanks for the patches.

I've tested it on a Rockchip 5b board and now I can reload the driver at
any time.

Tested-by: Adrian Larumbe <adrian.larumbe@collabora.com>

On 10.09.2024 19:57, Sebastian Reichel wrote:
> Hi,
>
> I got a report, that the Linux kernel crashes on Rock 5B when the panthor
> driver is loaded late after booting. The crash starts with the following
> shortened error print:
>
> rockchip-pm-domain fd8d8000.power-management:power-controller: failed to set domain 'gpu', val=0
> rockchip-pm-domain fd8d8000.power-management:power-controller: failed to get ack on domain 'gpu', val=0xa9fff
> SError Interrupt on CPU4, code 0x00000000be000411 -- SError
>
> This series first does some cleanups in the Rockchip power domain
> driver and changes the driver, so that it no longer tries to continue
> when it fails to enable a domain. This gets rid of the SError interrupt
> and long backtraces. But the kernel still hangs when it fails to enable
> a power domain. I have not done further analysis to check if that can
> be avoided.
>
> Last but not least this provides a fix for the GPU power domain failing
> to get enabled - after some testing from my side it seems to require the
> GPU voltage supply to be enabled.
>
> I'm not really happy about the hack to get a regulator for a sub-node
> in the 5th patch, which I took over from the Mediatek driver. But to
> get things going and open a discussion around it I thought it would be
> best to send a first version as soon as possible.
>
> Greetings,
>
> -- Sebastian
>
> Sebastian Reichel (6):
>   pmdomain: rockchip: forward rockchip_do_pmu_set_power_domain errors
>   pmdomain: rockchip: cleanup mutex handling in rockchip_pd_power
>   pmdomain: rockchip: reduce indention in rockchip_pd_power
>   dt-bindings: power: rockchip: add regulator support
>   pmdomain: rockchip: add regulator support
>   arm64: dts: rockchip: Add GPU power domain regulator dependency for
>     RK3588
>
>  .../power/rockchip,power-controller.yaml      |   3 +
>  .../boot/dts/rockchip/rk3588-armsom-sige7.dts |   4 +
>  arch/arm64/boot/dts/rockchip/rk3588-base.dtsi |   2 +-
>  .../boot/dts/rockchip/rk3588-coolpi-cm5.dtsi  |   4 +
>  .../rockchip/rk3588-friendlyelec-cm3588.dtsi  |   4 +
>  .../arm64/boot/dts/rockchip/rk3588-jaguar.dts |   4 +
>  .../boot/dts/rockchip/rk3588-ok3588-c.dts     |   4 +
>  .../boot/dts/rockchip/rk3588-rock-5-itx.dts   |   4 +
>  .../boot/dts/rockchip/rk3588-rock-5b.dts      |   4 +
>  .../arm64/boot/dts/rockchip/rk3588-tiger.dtsi |   4 +
>  .../boot/dts/rockchip/rk3588s-coolpi-4b.dts   |   4 +
>  .../dts/rockchip/rk3588s-khadas-edge2.dts     |   4 +
>  .../boot/dts/rockchip/rk3588s-orangepi-5.dts  |   4 +
>  drivers/pmdomain/rockchip/pm-domains.c        | 130 +++++++++++++-----
>  14 files changed, 144 insertions(+), 35 deletions(-)
>
> --
> 2.45.2

Adrian Larumbe
Hi,

On Fri, Sep 13, 2024 at 01:59:10PM GMT, Ulf Hansson wrote:
> On Tue, 10 Sept 2024 at 20:05, Sebastian Reichel
> <sebastian.reichel@collabora.com> wrote:
> > I got a report, that the Linux kernel crashes on Rock 5B when the panthor
> > driver is loaded late after booting. The crash starts with the following
> > shortened error print:
> >
> > rockchip-pm-domain fd8d8000.power-management:power-controller: failed to set domain 'gpu', val=0
> > rockchip-pm-domain fd8d8000.power-management:power-controller: failed to get ack on domain 'gpu', val=0xa9fff
> > SError Interrupt on CPU4, code 0x00000000be000411 -- SError
> >
> > This series first does some cleanups in the Rockchip power domain
> > driver and changes the driver, so that it no longer tries to continue
> > when it fails to enable a domain. This gets rid of the SError interrupt
> > and long backtraces. But the kernel still hangs when it fails to enable
> > a power domain. I have not done further analysis to check if that can
> > be avoided.
> >
> > Last but not least this provides a fix for the GPU power domain failing
> > to get enabled - after some testing from my side it seems to require the
> > GPU voltage supply to be enabled.
> >
> > I'm not really happy about the hack to get a regulator for a sub-node
> > in the 5th patch, which I took over from the Mediatek driver. But to
> > get things going and open a discussion around it I thought it would be
> > best to send a first version as soon as possible.
>
> That creates a circular dependency from the fw_devlink point of view.

Yes.

> I assume that isn't a problem and fw_devlink takes care of this, so
> the GPU power domain still can probe?

This has been tested on Radxa Rock 5B and RK3588 EVB1. It properly
probes the GPU power domain and fixes late probing of the GPU driver :)

> Other than this, I think this looks okay to me.

I will send a V2 with the minor things pointed out.

Greetings,

-- Sebastian
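
For readers following the thread without the patches at hand, the two behaviours described in the cover letter (enable the voltage supply before powering the domain, and forward a failure instead of continuing) can be illustrated roughly as below. This is a user-space sketch only, not the driver code: `struct mock_domain`, `mock_set_power()` and `mock_power_on()` are hypothetical names invented for the illustration, and the assumption that the domain refuses to power up without its supply is taken from the testing described above.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a power domain; not a kernel structure. */
struct mock_domain {
	const char *name;
	bool supply_on;    /* state of the external voltage supply */
	bool powered;      /* state of the power domain itself */
	bool needs_supply; /* assumption: no ack unless the supply is on */
};

/* Mimics a power request that fails when the supply is missing. */
static int mock_set_power(struct mock_domain *pd, bool on)
{
	if (on && pd->needs_supply && !pd->supply_on) {
		fprintf(stderr, "failed to set domain '%s'\n", pd->name);
		return -ETIMEDOUT;
	}

	pd->powered = on;
	return 0;
}

/*
 * Power-on path: switch the supply on first (if one is known), then
 * request the domain. Any error is returned to the caller so that
 * nothing touches a domain that never came up.
 */
static int mock_power_on(struct mock_domain *pd, bool have_regulator)
{
	int ret;

	if (have_regulator)
		pd->supply_on = true;

	ret = mock_set_power(pd, true);
	if (ret) {
		/* Undo the supply enable and stop here. */
		if (have_regulator)
			pd->supply_on = false;
		return ret;
	}

	return 0;
}
```

Calling `mock_power_on()` without a regulator on a domain that needs one returns the error and leaves the domain off, which corresponds to the "no longer tries to continue" part; with a regulator the same call succeeds.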