Message ID | tencent_2256A7C02F7849F1D89390E488704E826D06@qq.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | [v4] PM: EM: Fix potential division-by-zero error in em_compute_costs() | expand |
On 4/17/25 02:07, Yaxiong Tian wrote: > From: Yaxiong Tian <tianyaxiong@kylinos.cn> > > When the device is of a non-CPU type, table[i].performance won't be > initialized in the previous em_init_performance(), resulting in division > by zero when calculating costs in em_compute_costs(). > > Since the 'cost' algorithm is only used for EAS energy efficiency > calculations and is currently not utilized by other device drivers, we > should add the _is_cpu_device(dev) check to prevent this division-by-zero > issue. > > Fixes: 1b600da51073 ("PM: EM: Optimize em_cpu_energy() and remove division") > Signed-off-by: Yaxiong Tian <tianyaxiong@kylinos.cn> > --- > kernel/power/energy_model.c | 4 ++++ > 1 file changed, 4 insertions(+) > > diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c > index d9b7e2b38c7a..41606247c277 100644 > --- a/kernel/power/energy_model.c > +++ b/kernel/power/energy_model.c > @@ -233,6 +233,10 @@ static int em_compute_costs(struct device *dev, struct em_perf_state *table, > unsigned long prev_cost = ULONG_MAX; > int i, ret; > > + /* This is needed only for CPUs and EAS skip other devices */ > + if (!_is_cpu_device(dev)) > + return 0; > + > /* Compute the cost of each performance state. */ > for (i = nr_states - 1; i >= 0; i--) { > unsigned long power_res, cost; Please stop for a while. I have to check what happened that you faced the issue in the first place. I have been testing the GPU EMs and there was no issues... Let me debug that today.
在 2025/4/17 13:57, Lukasz Luba 写道: > > > On 4/17/25 02:07, Yaxiong Tian wrote: >> From: Yaxiong Tian <tianyaxiong@kylinos.cn> >> >> When the device is of a non-CPU type, table[i].performance won't be >> initialized in the previous em_init_performance(), resulting in division >> by zero when calculating costs in em_compute_costs(). >> >> Since the 'cost' algorithm is only used for EAS energy efficiency >> calculations and is currently not utilized by other device drivers, we >> should add the _is_cpu_device(dev) check to prevent this division-by-zero >> issue. >> >> Fixes: 1b600da51073 ("PM: EM: Optimize em_cpu_energy() and remove >> division") >> Signed-off-by: Yaxiong Tian <tianyaxiong@kylinos.cn> >> --- >> kernel/power/energy_model.c | 4 ++++ >> 1 file changed, 4 insertions(+) >> >> diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c >> index d9b7e2b38c7a..41606247c277 100644 >> --- a/kernel/power/energy_model.c >> +++ b/kernel/power/energy_model.c >> @@ -233,6 +233,10 @@ static int em_compute_costs(struct device *dev, >> struct em_perf_state *table, >> unsigned long prev_cost = ULONG_MAX; >> int i, ret; >> + /* This is needed only for CPUs and EAS skip other devices */ >> + if (!_is_cpu_device(dev)) >> + return 0; >> + >> /* Compute the cost of each performance state. */ >> for (i = nr_states - 1; i >= 0; i--) { >> unsigned long power_res, cost; > > > Please stop for a while. I have to check what happened that you > faced the issue in the first place. I have been testing the GPU > EMs and there was no issues... > > Let me debug that today. Of course. Since I don't have actual hardware, I can only logically deduce that this issue might exist.
On 4/17/25 08:43, Yaxiong Tian wrote: > > > 在 2025/4/17 13:57, Lukasz Luba 写道: >> >> >> On 4/17/25 02:07, Yaxiong Tian wrote: >>> From: Yaxiong Tian <tianyaxiong@kylinos.cn> >>> >>> When the device is of a non-CPU type, table[i].performance won't be >>> initialized in the previous em_init_performance(), resulting in division >>> by zero when calculating costs in em_compute_costs(). >>> >>> Since the 'cost' algorithm is only used for EAS energy efficiency >>> calculations and is currently not utilized by other device drivers, we >>> should add the _is_cpu_device(dev) check to prevent this >>> division-by-zero >>> issue. >>> >>> Fixes: 1b600da51073 ("PM: EM: Optimize em_cpu_energy() and remove >>> division") >>> Signed-off-by: Yaxiong Tian <tianyaxiong@kylinos.cn> >>> --- >>> kernel/power/energy_model.c | 4 ++++ >>> 1 file changed, 4 insertions(+) >>> >>> diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c >>> index d9b7e2b38c7a..41606247c277 100644 >>> --- a/kernel/power/energy_model.c >>> +++ b/kernel/power/energy_model.c >>> @@ -233,6 +233,10 @@ static int em_compute_costs(struct device *dev, >>> struct em_perf_state *table, >>> unsigned long prev_cost = ULONG_MAX; >>> int i, ret; >>> + /* This is needed only for CPUs and EAS skip other devices */ >>> + if (!_is_cpu_device(dev)) >>> + return 0; >>> + >>> /* Compute the cost of each performance state. */ >>> for (i = nr_states - 1; i >= 0; i--) { >>> unsigned long power_res, cost; >> >> >> Please stop for a while. I have to check what happened that you >> faced the issue in the first place. I have been testing the GPU >> EMs and there was no issues... >> >> Let me debug that today. > > Of course. Since I don't have actual hardware, I can only logically > deduce that this issue might exist. > > I have run with the GPU EM registered in the boot: ------------------------------------------------------- [ 2.753333] panfrost ff9a0000.gpu: EM: created perf domain [ 2.759863] panfrost ff9a0000.gpu: mali-t860 id 0x860 major 0x2 minor 0x0 status 0x0 [ 2.768530] panfrost ff9a0000.gpu: features: 00000000,00000407, issues: 00000000,24040400 [ 2.777678] panfrost ff9a0000.gpu: Features: L2:0x07120206 Shader:0x00000000 Tiler:0x00000809 Mem:0x1 MMU:0x00002830 AS:0xff JS:0x7 [ 2.780746] mmc_host mmc2: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0) [ 2.790905] panfrost ff9a0000.gpu: shader_present=0xf l2_present=0x1 root@arm:~# cat /sys/kernel/debug/energy_model/ff9a0000.gpu/flags 0x1 root@arm:~# grep . /sys/kernel/debug/energy_model/ff9a0000.gpu/ps*/* /sys/kernel/debug/energy_model/ff9a0000.gpu/ps:200000/cost:0 /sys/kernel/debug/energy_model/ff9a0000.gpu/ps:200000/frequency:200000 /sys/kernel/debug/energy_model/ff9a0000.gpu/ps:200000/inefficient:1 /sys/kernel/debug/energy_model/ff9a0000.gpu/ps:200000/performance:0 /sys/kernel/debug/energy_model/ff9a0000.gpu/ps:200000/power:404250 /sys/kernel/debug/energy_model/ff9a0000.gpu/ps:300000/cost:0 /sys/kernel/debug/energy_model/ff9a0000.gpu/ps:300000/frequency:300000 /sys/kernel/debug/energy_model/ff9a0000.gpu/ps:300000/inefficient:1 /sys/kernel/debug/energy_model/ff9a0000.gpu/ps:300000/performance:0 /sys/kernel/debug/energy_model/ff9a0000.gpu/ps:300000/power:606375 /sys/kernel/debug/energy_model/ff9a0000.gpu/ps:400000/cost:0 /sys/kernel/debug/energy_model/ff9a0000.gpu/ps:400000/frequency:400000 /sys/kernel/debug/energy_model/ff9a0000.gpu/ps:400000/inefficient:1 /sys/kernel/debug/energy_model/ff9a0000.gpu/ps:400000/performance:0 /sys/kernel/debug/energy_model/ff9a0000.gpu/ps:400000/power:808500 /sys/kernel/debug/energy_model/ff9a0000.gpu/ps:600000/cost:0 /sys/kernel/debug/energy_model/ff9a0000.gpu/ps:600000/frequency:600000 /sys/kernel/debug/energy_model/ff9a0000.gpu/ps:600000/inefficient:0 /sys/kernel/debug/energy_model/ff9a0000.gpu/ps:600000/performance:0 /sys/kernel/debug/energy_model/ff9a0000.gpu/ps:600000/power:1505790 -------------------------------------------------------- The EM for the GPU is not modified during the boot like the CPUs' EM are, thus this code is not triggered. Although, the API is open and in theory the GPU EM can be modified at runtime as well and it will reach that em_compute_costs() issue with 'performance' field having value 0. So this v4 patch would be needed in this case. Please re-send this v4 patch as a completely new message. Thanks for looking at that code path and the fix for potential issue. You can also add my: Reviewed-by: Lukasz Luba <lukasz.luba@arm.com> Regrds, Lukasz
在 2025/4/17 21:27, Lukasz Luba 写道: > > > On 4/17/25 08:43, Yaxiong Tian wrote: >> >> >> 在 2025/4/17 13:57, Lukasz Luba 写道: >>> >>> >>> On 4/17/25 02:07, Yaxiong Tian wrote: >>>> From: Yaxiong Tian <tianyaxiong@kylinos.cn> >>>> >>>> When the device is of a non-CPU type, table[i].performance won't be >>>> initialized in the previous em_init_performance(), resulting in >>>> division >>>> by zero when calculating costs in em_compute_costs(). >>>> >>>> Since the 'cost' algorithm is only used for EAS energy efficiency >>>> calculations and is currently not utilized by other device drivers, we >>>> should add the _is_cpu_device(dev) check to prevent this >>>> division-by-zero >>>> issue. >>>> >>>> Fixes: 1b600da51073 ("PM: EM: Optimize em_cpu_energy() and remove >>>> division") >>>> Signed-off-by: Yaxiong Tian <tianyaxiong@kylinos.cn> >>>> --- >>>> kernel/power/energy_model.c | 4 ++++ >>>> 1 file changed, 4 insertions(+) >>>> >>>> diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c >>>> index d9b7e2b38c7a..41606247c277 100644 >>>> --- a/kernel/power/energy_model.c >>>> +++ b/kernel/power/energy_model.c >>>> @@ -233,6 +233,10 @@ static int em_compute_costs(struct device *dev, >>>> struct em_perf_state *table, >>>> unsigned long prev_cost = ULONG_MAX; >>>> int i, ret; >>>> + /* This is needed only for CPUs and EAS skip other devices */ >>>> + if (!_is_cpu_device(dev)) >>>> + return 0; >>>> + >>>> /* Compute the cost of each performance state. */ >>>> for (i = nr_states - 1; i >= 0; i--) { >>>> unsigned long power_res, cost; >>> >>> >>> Please stop for a while. I have to check what happened that you >>> faced the issue in the first place. I have been testing the GPU >>> EMs and there was no issues... >>> >>> Let me debug that today. >> >> Of course. Since I don't have actual hardware, I can only logically >> deduce that this issue might exist. >> >> > > I have run with the GPU EM registered in the boot: > > ------------------------------------------------------- > [ 2.753333] panfrost ff9a0000.gpu: EM: created perf domain > [ 2.759863] panfrost ff9a0000.gpu: mali-t860 id 0x860 major 0x2 minor > 0x0 status 0x0 > [ 2.768530] panfrost ff9a0000.gpu: features: 00000000,00000407, > issues: 00000000,24040400 > [ 2.777678] panfrost ff9a0000.gpu: Features: L2:0x07120206 > Shader:0x00000000 Tiler:0x00000809 Mem:0x1 MMU:0x00002830 AS:0xff JS:0x7 > [ 2.780746] mmc_host mmc2: Bus speed (slot 0) = 148500000Hz (slot req > 150000000Hz, actual 148500000HZ div = 0) > [ 2.790905] panfrost ff9a0000.gpu: shader_present=0xf l2_present=0x1 > > root@arm:~# cat /sys/kernel/debug/energy_model/ff9a0000.gpu/flags > 0x1 > root@arm:~# grep . /sys/kernel/debug/energy_model/ff9a0000.gpu/ps*/* > /sys/kernel/debug/energy_model/ff9a0000.gpu/ps:200000/cost:0 > /sys/kernel/debug/energy_model/ff9a0000.gpu/ps:200000/frequency:200000 > /sys/kernel/debug/energy_model/ff9a0000.gpu/ps:200000/inefficient:1 > /sys/kernel/debug/energy_model/ff9a0000.gpu/ps:200000/performance:0 > /sys/kernel/debug/energy_model/ff9a0000.gpu/ps:200000/power:404250 > /sys/kernel/debug/energy_model/ff9a0000.gpu/ps:300000/cost:0 > /sys/kernel/debug/energy_model/ff9a0000.gpu/ps:300000/frequency:300000 > /sys/kernel/debug/energy_model/ff9a0000.gpu/ps:300000/inefficient:1 > /sys/kernel/debug/energy_model/ff9a0000.gpu/ps:300000/performance:0 > /sys/kernel/debug/energy_model/ff9a0000.gpu/ps:300000/power:606375 > /sys/kernel/debug/energy_model/ff9a0000.gpu/ps:400000/cost:0 > /sys/kernel/debug/energy_model/ff9a0000.gpu/ps:400000/frequency:400000 > /sys/kernel/debug/energy_model/ff9a0000.gpu/ps:400000/inefficient:1 > /sys/kernel/debug/energy_model/ff9a0000.gpu/ps:400000/performance:0 > /sys/kernel/debug/energy_model/ff9a0000.gpu/ps:400000/power:808500 > /sys/kernel/debug/energy_model/ff9a0000.gpu/ps:600000/cost:0 > /sys/kernel/debug/energy_model/ff9a0000.gpu/ps:600000/frequency:600000 > /sys/kernel/debug/energy_model/ff9a0000.gpu/ps:600000/inefficient:0 > /sys/kernel/debug/energy_model/ff9a0000.gpu/ps:600000/performance:0 > /sys/kernel/debug/energy_model/ff9a0000.gpu/ps:600000/power:1505790 > > -------------------------------------------------------- > > The EM for the GPU is not modified during the boot like the CPUs' > EM are, thus this code is not triggered. Although, the API is > open and in theory the GPU EM can be modified at runtime > as well and it will reach that em_compute_costs() issue > with 'performance' field having value 0. > > So this v4 patch would be needed in this case. > > Please re-send this v4 patch as a completely new message. > > Thanks for looking at that code path and the fix for potential > issue. > > You can also add my: > > Reviewed-by: Lukasz Luba <lukasz.luba@arm.com> > > Regrds, > Lukasz Got it - patch resent with Reviewed-by. https://lore.kernel.org/all/tencent_7F99ED4767C1AF7889D0D8AD50F34859CE06@qq.com/
diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c index d9b7e2b38c7a..41606247c277 100644 --- a/kernel/power/energy_model.c +++ b/kernel/power/energy_model.c @@ -233,6 +233,10 @@ static int em_compute_costs(struct device *dev, struct em_perf_state *table, unsigned long prev_cost = ULONG_MAX; int i, ret; + /* This is needed only for CPUs and EAS skip other devices */ + if (!_is_cpu_device(dev)) + return 0; + /* Compute the cost of each performance state. */ for (i = nr_states - 1; i >= 0; i--) { unsigned long power_res, cost;