Message ID | d3e9dc4201d38894b09f3198368428153a3af1a4.1728555461.git.dsimic@manjaro.org (mailing list archive) |
---|---|
State | New, archived |
Series | arm64: dts: rockchip: Prevent thermal runaways in RK3308 SoC dtsi |
On 2024-10-10 12:19, Dragan Simic wrote:
> Until the TSADC, thermal zones, thermal trips and cooling maps are defined
> in the RK3308 SoC dtsi, none of the CPU OPPs except the slowest one may be
> enabled under any circumstances. Allowing the DVFS to scale the CPU cores
> up without even just the critical CPU thermal trip in place can rather easily
> result in thermal runaways and damaged SoCs, which is bad.
>
> Thus, leave only the lowest available CPU OPP enabled for now.
>
> Fixes: 6913c45239fd ("arm64: dts: rockchip: Add core dts for RK3308 SOC")
> Cc: stable@vger.kernel.org
> Signed-off-by: Dragan Simic <dsimic@manjaro.org>

As a note, I'll hopefully get back with the proper implementation of
the thermal configuration for the RK3308, but not before the 6.14
merge window. In the meantime, let's stick to having only the lowest
CPU OPP in place, as changed in this patch.

> ---
>  arch/arm64/boot/dts/rockchip/rk3308.dtsi | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/arch/arm64/boot/dts/rockchip/rk3308.dtsi b/arch/arm64/boot/dts/rockchip/rk3308.dtsi
> index 31c25de2d689..a7698e1f6b9e 100644
> --- a/arch/arm64/boot/dts/rockchip/rk3308.dtsi
> +++ b/arch/arm64/boot/dts/rockchip/rk3308.dtsi
> @@ -120,16 +120,19 @@ opp-600000000 {
>  			opp-hz = /bits/ 64 <600000000>;
>  			opp-microvolt = <950000 950000 1340000>;
>  			clock-latency-ns = <40000>;
> +			status = "disabled";
>  		};
>  		opp-816000000 {
>  			opp-hz = /bits/ 64 <816000000>;
>  			opp-microvolt = <1025000 1025000 1340000>;
>  			clock-latency-ns = <40000>;
> +			status = "disabled";
>  		};
>  		opp-1008000000 {
>  			opp-hz = /bits/ 64 <1008000000>;
>  			opp-microvolt = <1125000 1125000 1340000>;
>  			clock-latency-ns = <40000>;
> +			status = "disabled";
>  		};
>  	};
On Thu 2024-10-10 @ 12:19:41 PM, Dragan Simic wrote:
> Until the TSADC, thermal zones, thermal trips and cooling maps are defined
> in the RK3308 SoC dtsi, none of the CPU OPPs except the slowest one may be
> enabled under any circumstances. Allowing the DVFS to scale the CPU cores
> up without even just the critical CPU thermal trip in place can rather easily
> result in thermal runaways and damaged SoCs, which is bad.
>
> Thus, leave only the lowest available CPU OPP enabled for now.
>

It builds, it runs, it's been running on one of my rock-pi-s boards for ~3h
now. I can read my spi, i2c, and w1 sensors, so no issues for me.

	# cat /sys/bus/cpu/devices/cpu*/cpufreq/stats/time_in_state
	408000 1168942
	408000 1168942
	408000 1168942
	408000 1168942

Tested-by: Trevor Woerner <twoerner@gmail.com>

> Fixes: 6913c45239fd ("arm64: dts: rockchip: Add core dts for RK3308 SOC")
> Cc: stable@vger.kernel.org
> Signed-off-by: Dragan Simic <dsimic@manjaro.org>
> ---
>  arch/arm64/boot/dts/rockchip/rk3308.dtsi | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/arch/arm64/boot/dts/rockchip/rk3308.dtsi b/arch/arm64/boot/dts/rockchip/rk3308.dtsi
> index 31c25de2d689..a7698e1f6b9e 100644
> --- a/arch/arm64/boot/dts/rockchip/rk3308.dtsi
> +++ b/arch/arm64/boot/dts/rockchip/rk3308.dtsi
> @@ -120,16 +120,19 @@ opp-600000000 {
>  			opp-hz = /bits/ 64 <600000000>;
>  			opp-microvolt = <950000 950000 1340000>;
>  			clock-latency-ns = <40000>;
> +			status = "disabled";
>  		};
>  		opp-816000000 {
>  			opp-hz = /bits/ 64 <816000000>;
>  			opp-microvolt = <1025000 1025000 1340000>;
>  			clock-latency-ns = <40000>;
> +			status = "disabled";
>  		};
>  		opp-1008000000 {
>  			opp-hz = /bits/ 64 <1008000000>;
>  			opp-microvolt = <1125000 1125000 1340000>;
>  			clock-latency-ns = <40000>;
> +			status = "disabled";
>  		};
>  	};
On Thu, 10 Oct 2024 12:19:41 +0200, Dragan Simic wrote:
> Until the TSADC, thermal zones, thermal trips and cooling maps are defined
> in the RK3308 SoC dtsi, none of the CPU OPPs except the slowest one may be
> enabled under any circumstances. Allowing the DVFS to scale the CPU cores
> up without even just the critical CPU thermal trip in place can rather easily
> result in thermal runaways and damaged SoCs, which is bad.
>
> Thus, leave only the lowest available CPU OPP enabled for now.
>
> [...]

Applied, thanks!

[1/1] arm64: dts: rockchip: Prevent thermal runaways in RK3308 SoC dtsi
      commit: 864f1a5b390278a4a8d4a6d7425c7022477c6c9f

Best regards,
Hi Dragan,

On 2024-10-10 12:19, Dragan Simic wrote:
> Until the TSADC, thermal zones, thermal trips and cooling maps are defined
> in the RK3308 SoC dtsi, none of the CPU OPPs except the slowest one may be
> enabled under any circumstances. Allowing the DVFS to scale the CPU cores
> up without even just the critical CPU thermal trip in place can rather easily
> result in thermal runaways and damaged SoCs, which is bad.
>
> Thus, leave only the lowest available CPU OPP enabled for now.

This feels like a very aggressive limitation, to only allow the
opp-suspend rate, that is not even used under normal load.

I let my Rock Pi S board with a RK3308B variant run "stress -c 8" for
around 10 hours and the reported temp only reached around 50-55 deg c,
ambient temp around 20 deg c and board lying flat on a table without
any enclosure or heat sink.

This was running with performance as scaling_governor and cpu running
the 1008000 opp.

Most RK3308 variant datasheets list 1.3 GHz as max rate for CPU,
the K-variant lists 1.2 GHz, and the -S-variants seem to have both
reduced voltage and max rate.

The OPPs for this SoC already limit max rate to 1 GHz, which is more than
likely good enough to not reach the max temperature of 115-125 deg c as
rated in datasheets and vendor DTs.

Adding the tsadc and trips (same/similar as px30) will probably allow us
to add/use the "missing" 1.2 and 1.3 GHz OPPs.

Regards,
Jonas

>
> Fixes: 6913c45239fd ("arm64: dts: rockchip: Add core dts for RK3308 SOC")
> Cc: stable@vger.kernel.org
> Signed-off-by: Dragan Simic <dsimic@manjaro.org>
> ---
>  arch/arm64/boot/dts/rockchip/rk3308.dtsi | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/arch/arm64/boot/dts/rockchip/rk3308.dtsi b/arch/arm64/boot/dts/rockchip/rk3308.dtsi
> index 31c25de2d689..a7698e1f6b9e 100644
> --- a/arch/arm64/boot/dts/rockchip/rk3308.dtsi
> +++ b/arch/arm64/boot/dts/rockchip/rk3308.dtsi
> @@ -120,16 +120,19 @@ opp-600000000 {
>  			opp-hz = /bits/ 64 <600000000>;
>  			opp-microvolt = <950000 950000 1340000>;
>  			clock-latency-ns = <40000>;
> +			status = "disabled";
>  		};
>  		opp-816000000 {
>  			opp-hz = /bits/ 64 <816000000>;
>  			opp-microvolt = <1025000 1025000 1340000>;
>  			clock-latency-ns = <40000>;
> +			status = "disabled";
>  		};
>  		opp-1008000000 {
>  			opp-hz = /bits/ 64 <1008000000>;
>  			opp-microvolt = <1125000 1125000 1340000>;
>  			clock-latency-ns = <40000>;
> +			status = "disabled";
>  		};
>  	};
Hello Jonas,

On 2024-10-11 10:52, Jonas Karlman wrote:
> On 2024-10-10 12:19, Dragan Simic wrote:
>> Until the TSADC, thermal zones, thermal trips and cooling maps are defined
>> in the RK3308 SoC dtsi, none of the CPU OPPs except the slowest one may be
>> enabled under any circumstances. Allowing the DVFS to scale the CPU cores
>> up without even just the critical CPU thermal trip in place can rather easily
>> result in thermal runaways and damaged SoCs, which is bad.
>>
>> Thus, leave only the lowest available CPU OPP enabled for now.
>
> This feels like a very aggressive limitation, to only allow the
> opp-suspend rate, that is not even used under normal load.
>
> I let my Rock Pi S board with a RK3308B variant run "stress -c 8" for
> around 10 hours and the reported temp only reached around 50-55 deg c,
> ambient temp around 20 deg c and board lying flat on a table without
> any enclosure or heat sink.
>
> This was running with performance as scaling_governor and cpu running
> the 1008000 opp.

Thanks for testing all that! That's a very low CPU temperature under
stress testing indeed. Maybe the cooling gets worse and the CPU
temperature goes higher if the board is installed into some small
enclosure with no natural or forced airflow?

> Most RK3308 variant datasheets list 1.3 GHz as max rate for CPU,
> the K-variant lists 1.2 GHz, and the -S-variants seem to have both
> reduced voltage and max rate.
>
> The OPPs for this SoC already limit max rate to 1 GHz, which is more than
> likely good enough to not reach the max temperature of 115-125 deg c as
> rated in datasheets and vendor DTs.
>
> Adding the tsadc and trips (same/similar as px30) will probably allow us
> to add/use the "missing" 1.2 and 1.3 GHz OPPs.

With these insights, I agree that the patch might have been a bit
too extreme, but it also promotes good practices when it comes to
upstreaming. The general rule is not to add CPU or GPU OPPs with
no proper thermal configuration already in place.

The patch has already been merged, and as I already noted, [1] I'll
try to implement, test and submit the proper thermal configuration
ASAP. It's up to Heiko to decide whether to drop this patch or not.

[1] https://lore.kernel.org/linux-rockchip/df92710498f66bcb4580cb2cd1573fb2@manjaro.org/

>> Fixes: 6913c45239fd ("arm64: dts: rockchip: Add core dts for RK3308 SOC")
>> Cc: stable@vger.kernel.org
>> Signed-off-by: Dragan Simic <dsimic@manjaro.org>
>> ---
>>  arch/arm64/boot/dts/rockchip/rk3308.dtsi | 3 +++
>>  1 file changed, 3 insertions(+)
>>
>> diff --git a/arch/arm64/boot/dts/rockchip/rk3308.dtsi b/arch/arm64/boot/dts/rockchip/rk3308.dtsi
>> index 31c25de2d689..a7698e1f6b9e 100644
>> --- a/arch/arm64/boot/dts/rockchip/rk3308.dtsi
>> +++ b/arch/arm64/boot/dts/rockchip/rk3308.dtsi
>> @@ -120,16 +120,19 @@ opp-600000000 {
>>  			opp-hz = /bits/ 64 <600000000>;
>>  			opp-microvolt = <950000 950000 1340000>;
>>  			clock-latency-ns = <40000>;
>> +			status = "disabled";
>>  		};
>>  		opp-816000000 {
>>  			opp-hz = /bits/ 64 <816000000>;
>>  			opp-microvolt = <1025000 1025000 1340000>;
>>  			clock-latency-ns = <40000>;
>> +			status = "disabled";
>>  		};
>>  		opp-1008000000 {
>>  			opp-hz = /bits/ 64 <1008000000>;
>>  			opp-microvolt = <1125000 1125000 1340000>;
>>  			clock-latency-ns = <40000>;
>> +			status = "disabled";
>>  		};
>>  	};
On Friday, 11 October 2024, 11:04:38 CEST, Dragan Simic wrote:
> Hello Jonas,
>
> On 2024-10-11 10:52, Jonas Karlman wrote:
> > On 2024-10-10 12:19, Dragan Simic wrote:
> >> Until the TSADC, thermal zones, thermal trips and cooling maps are defined
> >> in the RK3308 SoC dtsi, none of the CPU OPPs except the slowest one may be
> >> enabled under any circumstances. Allowing the DVFS to scale the CPU cores
> >> up without even just the critical CPU thermal trip in place can rather easily
> >> result in thermal runaways and damaged SoCs, which is bad.
> >>
> >> Thus, leave only the lowest available CPU OPP enabled for now.
> >
> > This feels like a very aggressive limitation, to only allow the
> > opp-suspend rate, that is not even used under normal load.
> >
> > I let my Rock Pi S board with a RK3308B variant run "stress -c 8" for
> > around 10 hours and the reported temp only reached around 50-55 deg c,
> > ambient temp around 20 deg c and board lying flat on a table without
> > any enclosure or heat sink.
> >
> > This was running with performance as scaling_governor and cpu running
> > the 1008000 opp.
>
> Thanks for testing all that! That's a very low CPU temperature under
> stress testing indeed. Maybe the cooling gets worse and the CPU
> temperature goes higher if the board is installed into some small
> enclosure with no natural or forced airflow?
>
> > Most RK3308 variant datasheets list 1.3 GHz as max rate for CPU,
> > the K-variant lists 1.2 GHz, and the -S-variants seem to have both
> > reduced voltage and max rate.
> >
> > The OPPs for this SoC already limit max rate to 1 GHz, which is more than
> > likely good enough to not reach the max temperature of 115-125 deg c as
> > rated in datasheets and vendor DTs.
> >
> > Adding the tsadc and trips (same/similar as px30) will probably allow us
> > to add/use the "missing" 1.2 and 1.3 GHz OPPs.
>
> With these insights, I agree that the patch might have been a bit
> too extreme, but it also promotes good practices when it comes to
> upstreaming. The general rule is not to add CPU or GPU OPPs with
> no proper thermal configuration already in place.
>
> The patch has already been merged, and as I already noted, [1] I'll
> try to implement, test and submit the proper thermal configuration
> ASAP. It's up to Heiko to decide whether to drop this patch or not.

Hmm, interesting question ;-) .

Dropping the patch is of course still possible and so far we haven't
actually seen anyone with real-world problems.

And with Jonas' stress test, it does look like nobody will in the
(hopefully short) time till we have thermal management.

@Dragan, if you're in favor of that I'll drop the patch.

Heiko

> [1] https://lore.kernel.org/linux-rockchip/df92710498f66bcb4580cb2cd1573fb2@manjaro.org/
>
> >> Fixes: 6913c45239fd ("arm64: dts: rockchip: Add core dts for RK3308 SOC")
> >> Cc: stable@vger.kernel.org
> >> Signed-off-by: Dragan Simic <dsimic@manjaro.org>
> >> ---
> >>  arch/arm64/boot/dts/rockchip/rk3308.dtsi | 3 +++
> >>  1 file changed, 3 insertions(+)
> >>
> >> diff --git a/arch/arm64/boot/dts/rockchip/rk3308.dtsi b/arch/arm64/boot/dts/rockchip/rk3308.dtsi
> >> index 31c25de2d689..a7698e1f6b9e 100644
> >> --- a/arch/arm64/boot/dts/rockchip/rk3308.dtsi
> >> +++ b/arch/arm64/boot/dts/rockchip/rk3308.dtsi
> >> @@ -120,16 +120,19 @@ opp-600000000 {
> >>  			opp-hz = /bits/ 64 <600000000>;
> >>  			opp-microvolt = <950000 950000 1340000>;
> >>  			clock-latency-ns = <40000>;
> >> +			status = "disabled";
> >>  		};
> >>  		opp-816000000 {
> >>  			opp-hz = /bits/ 64 <816000000>;
> >>  			opp-microvolt = <1025000 1025000 1340000>;
> >>  			clock-latency-ns = <40000>;
> >> +			status = "disabled";
> >>  		};
> >>  		opp-1008000000 {
> >>  			opp-hz = /bits/ 64 <1008000000>;
> >>  			opp-microvolt = <1125000 1125000 1340000>;
> >>  			clock-latency-ns = <40000>;
> >> +			status = "disabled";
> >>  		};
> >>  	};
Hello Heiko,

On 2024-10-11 11:56, Heiko Stübner wrote:
> On Friday, 11 October 2024, 11:04:38 CEST, Dragan Simic wrote:
>> On 2024-10-11 10:52, Jonas Karlman wrote:
>> > On 2024-10-10 12:19, Dragan Simic wrote:
>> >> Until the TSADC, thermal zones, thermal trips and cooling maps are defined
>> >> in the RK3308 SoC dtsi, none of the CPU OPPs except the slowest one may be
>> >> enabled under any circumstances. Allowing the DVFS to scale the CPU cores
>> >> up without even just the critical CPU thermal trip in place can rather easily
>> >> result in thermal runaways and damaged SoCs, which is bad.
>> >>
>> >> Thus, leave only the lowest available CPU OPP enabled for now.
>> >
>> > This feels like a very aggressive limitation, to only allow the
>> > opp-suspend rate, that is not even used under normal load.
>> >
>> > I let my Rock Pi S board with a RK3308B variant run "stress -c 8" for
>> > around 10 hours and the reported temp only reached around 50-55 deg c,
>> > ambient temp around 20 deg c and board lying flat on a table without
>> > any enclosure or heat sink.
>> >
>> > This was running with performance as scaling_governor and cpu running
>> > the 1008000 opp.
>>
>> Thanks for testing all that! That's a very low CPU temperature under
>> stress testing indeed. Maybe the cooling gets worse and the CPU
>> temperature goes higher if the board is installed into some small
>> enclosure with no natural or forced airflow?
>>
>> > Most RK3308 variant datasheets list 1.3 GHz as max rate for CPU,
>> > the K-variant lists 1.2 GHz, and the -S-variants seem to have both
>> > reduced voltage and max rate.
>> >
>> > The OPPs for this SoC already limit max rate to 1 GHz, which is more than
>> > likely good enough to not reach the max temperature of 115-125 deg c as
>> > rated in datasheets and vendor DTs.
>> >
>> > Adding the tsadc and trips (same/similar as px30) will probably allow us
>> > to add/use the "missing" 1.2 and 1.3 GHz OPPs.
>>
>> With these insights, I agree that the patch might have been a bit
>> too extreme, but it also promotes good practices when it comes to
>> upstreaming. The general rule is not to add CPU or GPU OPPs with
>> no proper thermal configuration already in place.
>>
>> The patch has already been merged, and as I already noted, [1] I'll
>> try to implement, test and submit the proper thermal configuration
>> ASAP. It's up to Heiko to decide whether to drop this patch or not.
>
> Hmm, interesting question ;-) .
>
> Dropping the patch is of course still possible and so far we haven't
> actually seen anyone with real-world problems.
>
> And with Jonas' stress test, it does look like nobody will in the
> (hopefully short) time till we have thermal management.
>
> @Dragan, if you're in favor of that I'll drop the patch.

I hope I'll have the proper RK3308 thermal configuration available
for the 6.14 merge window, also with higher OPPs in place. Knowing
that we've seemingly had no RK3308 SoCs releasing the magic smoke
(yet? :), I think we can keep the status quo for a couple of months
or so, so let's drop this patch.

Thanks again to Jonas for all the stress testing!

>> [1] https://lore.kernel.org/linux-rockchip/df92710498f66bcb4580cb2cd1573fb2@manjaro.org/
>>
>> >> Fixes: 6913c45239fd ("arm64: dts: rockchip: Add core dts for RK3308 SOC")
>> >> Cc: stable@vger.kernel.org
>> >> Signed-off-by: Dragan Simic <dsimic@manjaro.org>
>> >> ---
>> >>  arch/arm64/boot/dts/rockchip/rk3308.dtsi | 3 +++
>> >>  1 file changed, 3 insertions(+)
>> >>
>> >> diff --git a/arch/arm64/boot/dts/rockchip/rk3308.dtsi b/arch/arm64/boot/dts/rockchip/rk3308.dtsi
>> >> index 31c25de2d689..a7698e1f6b9e 100644
>> >> --- a/arch/arm64/boot/dts/rockchip/rk3308.dtsi
>> >> +++ b/arch/arm64/boot/dts/rockchip/rk3308.dtsi
>> >> @@ -120,16 +120,19 @@ opp-600000000 {
>> >>  			opp-hz = /bits/ 64 <600000000>;
>> >>  			opp-microvolt = <950000 950000 1340000>;
>> >>  			clock-latency-ns = <40000>;
>> >> +			status = "disabled";
>> >>  		};
>> >>  		opp-816000000 {
>> >>  			opp-hz = /bits/ 64 <816000000>;
>> >>  			opp-microvolt = <1025000 1025000 1340000>;
>> >>  			clock-latency-ns = <40000>;
>> >> +			status = "disabled";
>> >>  		};
>> >>  		opp-1008000000 {
>> >>  			opp-hz = /bits/ 64 <1008000000>;
>> >>  			opp-microvolt = <1125000 1125000 1340000>;
>> >>  			clock-latency-ns = <40000>;
>> >> +			status = "disabled";
>> >>  		};
>> >>  	};
On Thursday, 10 October 2024, 22:27:44 CEST, Heiko Stuebner wrote:
> On Thu, 10 Oct 2024 12:19:41 +0200, Dragan Simic wrote:
> > Until the TSADC, thermal zones, thermal trips and cooling maps are defined
> > in the RK3308 SoC dtsi, none of the CPU OPPs except the slowest one may be
> > enabled under any circumstances. Allowing the DVFS to scale the CPU cores
> > up without even just the critical CPU thermal trip in place can rather easily
> > result in thermal runaways and damaged SoCs, which is bad.
> >
> > Thus, leave only the lowest available CPU OPP enabled for now.
> >
> > [...]
>
> Applied, thanks!
>
> [1/1] arm64: dts: rockchip: Prevent thermal runaways in RK3308 SoC dtsi
>       commit: 864f1a5b390278a4a8d4a6d7425c7022477c6c9f

As discussed in the other replies, I've dropped the patch again.

Heiko
diff --git a/arch/arm64/boot/dts/rockchip/rk3308.dtsi b/arch/arm64/boot/dts/rockchip/rk3308.dtsi
index 31c25de2d689..a7698e1f6b9e 100644
--- a/arch/arm64/boot/dts/rockchip/rk3308.dtsi
+++ b/arch/arm64/boot/dts/rockchip/rk3308.dtsi
@@ -120,16 +120,19 @@ opp-600000000 {
 			opp-hz = /bits/ 64 <600000000>;
 			opp-microvolt = <950000 950000 1340000>;
 			clock-latency-ns = <40000>;
+			status = "disabled";
 		};
 		opp-816000000 {
 			opp-hz = /bits/ 64 <816000000>;
 			opp-microvolt = <1025000 1025000 1340000>;
 			clock-latency-ns = <40000>;
+			status = "disabled";
 		};
 		opp-1008000000 {
 			opp-hz = /bits/ 64 <1008000000>;
 			opp-microvolt = <1125000 1125000 1340000>;
 			clock-latency-ns = <40000>;
+			status = "disabled";
 		};
 	};
Until the TSADC, thermal zones, thermal trips and cooling maps are defined
in the RK3308 SoC dtsi, none of the CPU OPPs except the slowest one may be
enabled under any circumstances. Allowing the DVFS to scale the CPU cores
up without even just the critical CPU thermal trip in place can rather easily
result in thermal runaways and damaged SoCs, which is bad.

Thus, leave only the lowest available CPU OPP enabled for now.

Fixes: 6913c45239fd ("arm64: dts: rockchip: Add core dts for RK3308 SOC")
Cc: stable@vger.kernel.org
Signed-off-by: Dragan Simic <dsimic@manjaro.org>
---
 arch/arm64/boot/dts/rockchip/rk3308.dtsi | 3 +++
 1 file changed, 3 insertions(+)
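
For readers unfamiliar with the pieces the commit message says are missing (thermal zones, thermal trips, cooling maps), the fragment below is a rough sketch of how they usually fit together under the generic thermal-zones devicetree binding. It is not taken from rk3308.dtsi or from any patch in this thread. The `tsadc` label is hypothetical (no TSADC node is defined for the RK3308 here), the `cpu0`..`cpu3` labels and their `#cooling-cells = <2>` property are assumed to exist, the polling delays and the 85 deg c passive trip are placeholders, and the 115 deg c critical trip only echoes the lower bound of the datasheet range quoted in the discussion.

```dts
/* Illustrative sketch only; labels and values are assumptions, see above. */
#include <dt-bindings/thermal/thermal.h>

/ {
	thermal-zones {
		cpu_thermal: cpu-thermal {
			polling-delay-passive = <100>;	/* ms, while throttling */
			polling-delay = <1000>;		/* ms, otherwise */
			thermal-sensors = <&tsadc 0>;	/* hypothetical TSADC channel */

			trips {
				/* Placeholder: start throttling the CPU here */
				cpu_alert: cpu-alert {
					temperature = <85000>;	/* millidegrees Celsius */
					hysteresis = <2000>;
					type = "passive";
				};

				/* The "critical CPU thermal trip": shut down here */
				cpu_crit: cpu-crit {
					temperature = <115000>;
					hysteresis = <2000>;
					type = "critical";
				};
			};

			cooling-maps {
				/* Tie the passive trip to CPU frequency scaling */
				map0 {
					trip = <&cpu_alert>;
					cooling-device =
						<&cpu0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
						<&cpu1 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
						<&cpu2 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
						<&cpu3 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
				};
			};
		};
	};
};
```

With something along these lines in place, the passive trip lets the kernel step the CPU down through its OPPs before the critical trip is reached, which is the safety net whose absence motivated disabling the higher OPPs.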