Message ID | b47cb1f1-6ca1-7674-903f-724be43a57a0@arm.com (mailing list archive) |
---|---|
State | Not Applicable, archived |
Headers | show |
Hi Dietmar, On Wednesday 06 Jun 2018 at 15:12:15 (+0200), Dietmar Eggemann wrote: > > +static void fd_update_cs_table(struct em_cs_table *cs_table, int cpu) > > +{ > > + unsigned long cmax = arch_scale_cpu_capacity(NULL, cpu); > > + int max_cap_state = cs_table->nr_cap_states - 1; > > + unsigned long fmax = cs_table->state[max_cap_state].frequency; > > + int i; > > + > > + for (i = 0; i < cs_table->nr_cap_states; i++) > > + cs_table->state[i].capacity = cmax * > > + cs_table->state[i].frequency / fmax; > > +} > > This has issues on a 32bit system. cs_table->state[i].capacity (unsigned > long) overflows with the frequency values stored in Hz. Ah, thank you very much for pointing this out ! I haven't tried on a 32bit machine yet, my bad. I'll fix that for v4. > > Maybe something like this to cure it: > > diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c > index 6ad53f1cf7e6..c13b3eb8bf35 100644 > --- a/kernel/power/energy_model.c > +++ b/kernel/power/energy_model.c > @@ -144,9 +144,11 @@ static void fd_update_cs_table(struct em_cs_table *cs_table, int cpu) > unsigned long fmax = cs_table->state[max_cap_state].frequency; > int i; > > - for (i = 0; i < cs_table->nr_cap_states; i++) > - cs_table->state[i].capacity = cmax * > - cs_table->state[i].frequency / fmax; > + for (i = 0; i < cs_table->nr_cap_states; i++) { > + u64 val = (u64)cmax * cs_table->state[i].frequency; > + do_div(val, fmax); > + cs_table->state[i].capacity = (unsigned long)val; > + } > } Hmmm yes, that should work. > > This brings me to another question. Let's say there are multiple users of > the Energy Model in the system. Shouldn't the units of frequency and power > not standardized, maybe Mhz and mW? > The task scheduler doesn't care since it is only interested in power diffs > but other user might do. So the good thing about specifying units is that we can probably assume ranges on the values. If the power is in mW, assuming that we're talking about a single CPU, it'll probably fit in 16 bits. 65W/core should be a reasonable upper-bound ? But there are also vendors who might not be happy with disclosing absolute values ... These are sometimes considered sensitive and only relative numbers are discussed publicly. Now, you can also argue that we already have units specified in IPA for ex, and that it doesn't really matter if a driver "lies" about the real value, as long as the ratios are correct. And I guess that anyone can do measurement on the hardware and get those values anyway. So specifying a unit (mW) for the power is probably a good idea. For the frequency, I would rather keep it consistent with what CPUFreq manages. But that should be specified somewhere, I agree. What do you think ? Thanks ! Quentin
On 06/06/18 15:37, Quentin Perret wrote: > Hi Dietmar, > > On Wednesday 06 Jun 2018 at 15:12:15 (+0200), Dietmar Eggemann wrote: > > > +static void fd_update_cs_table(struct em_cs_table *cs_table, int cpu) > > > +{ > > > + unsigned long cmax = arch_scale_cpu_capacity(NULL, cpu); > > > + int max_cap_state = cs_table->nr_cap_states - 1; > > > + unsigned long fmax = cs_table->state[max_cap_state].frequency; > > > + int i; > > > + > > > + for (i = 0; i < cs_table->nr_cap_states; i++) > > > + cs_table->state[i].capacity = cmax * > > > + cs_table->state[i].frequency / fmax; > > > +} > > > > This has issues on a 32bit system. cs_table->state[i].capacity (unsigned > > long) overflows with the frequency values stored in Hz. > > Ah, thank you very much for pointing this out ! I haven't tried on a 32bit > machine yet, my bad. I'll fix that for v4. > > > > > Maybe something like this to cure it: > > > > diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c > > index 6ad53f1cf7e6..c13b3eb8bf35 100644 > > --- a/kernel/power/energy_model.c > > +++ b/kernel/power/energy_model.c > > @@ -144,9 +144,11 @@ static void fd_update_cs_table(struct em_cs_table *cs_table, int cpu) > > unsigned long fmax = cs_table->state[max_cap_state].frequency; > > int i; > > > > - for (i = 0; i < cs_table->nr_cap_states; i++) > > - cs_table->state[i].capacity = cmax * > > - cs_table->state[i].frequency / fmax; > > + for (i = 0; i < cs_table->nr_cap_states; i++) { > > + u64 val = (u64)cmax * cs_table->state[i].frequency; > > + do_div(val, fmax); > > + cs_table->state[i].capacity = (unsigned long)val; > > + } > > } > > Hmmm yes, that should work. > > > > > This brings me to another question. Let's say there are multiple users of > > the Energy Model in the system. Shouldn't the units of frequency and power > > not standardized, maybe Mhz and mW? > > The task scheduler doesn't care since it is only interested in power diffs > > but other user might do. > > So the good thing about specifying units is that we can probably assume > ranges on the values. If the power is in mW, assuming that we're talking > about a single CPU, it'll probably fit in 16 bits. 65W/core should be > a reasonable upper-bound ? > But there are also vendors who might not be happy with disclosing absolute > values ... These are sometimes considered sensitive and only relative > numbers are discussed publicly. Now, you can also argue that we already > have units specified in IPA for ex, and that it doesn't really matter if > a driver "lies" about the real value, as long as the ratios are correct. > And I guess that anyone can do measurement on the hardware and get those > values anyway. So specifying a unit (mW) for the power is probably a > good idea. Mmm, I remember we fought quite a bit while getting capacity-dmpis-mhz binding accepted, and one of the musts was that the values were going to be normalized. So, normalized power values again maybe? Best, - Juri
On Wednesday 06 Jun 2018 at 17:20:00 (+0200), Juri Lelli wrote: > > > This brings me to another question. Let's say there are multiple users of > > > the Energy Model in the system. Shouldn't the units of frequency and power > > > not standardized, maybe Mhz and mW? > > > The task scheduler doesn't care since it is only interested in power diffs > > > but other user might do. > > > > So the good thing about specifying units is that we can probably assume > > ranges on the values. If the power is in mW, assuming that we're talking > > about a single CPU, it'll probably fit in 16 bits. 65W/core should be > > a reasonable upper-bound ? > > But there are also vendors who might not be happy with disclosing absolute > > values ... These are sometimes considered sensitive and only relative > > numbers are discussed publicly. Now, you can also argue that we already > > have units specified in IPA for ex, and that it doesn't really matter if > > a driver "lies" about the real value, as long as the ratios are correct. > > And I guess that anyone can do measurement on the hardware and get those > > values anyway. So specifying a unit (mW) for the power is probably a > > good idea. > > Mmm, I remember we fought quite a bit while getting capacity-dmpis-mhz > binding accepted, and one of the musts was that the values were going to > be normalized. So, normalized power values again maybe? Hmmm, that's a very good point ... There should be no problems on the scheduler side -- we're only interested in correct ratios. But I'm not sure on the thermal side ... I will double check that. Javi, Viresh, Eduardo: any thoughts about this ? Thanks ! Quentin
On Wednesday 06 Jun 2018 at 16:29:50 (+0100), Quentin Perret wrote: > On Wednesday 06 Jun 2018 at 17:20:00 (+0200), Juri Lelli wrote: > > > > This brings me to another question. Let's say there are multiple users of > > > > the Energy Model in the system. Shouldn't the units of frequency and power > > > > not standardized, maybe Mhz and mW? > > > > The task scheduler doesn't care since it is only interested in power diffs > > > > but other user might do. > > > > > > So the good thing about specifying units is that we can probably assume > > > ranges on the values. If the power is in mW, assuming that we're talking > > > about a single CPU, it'll probably fit in 16 bits. 65W/core should be > > > a reasonable upper-bound ? > > > But there are also vendors who might not be happy with disclosing absolute > > > values ... These are sometimes considered sensitive and only relative > > > numbers are discussed publicly. Now, you can also argue that we already > > > have units specified in IPA for ex, and that it doesn't really matter if > > > a driver "lies" about the real value, as long as the ratios are correct. > > > And I guess that anyone can do measurement on the hardware and get those > > > values anyway. So specifying a unit (mW) for the power is probably a > > > good idea. > > > > Mmm, I remember we fought quite a bit while getting capacity-dmpis-mhz > > binding accepted, and one of the musts was that the values were going to > > be normalized. So, normalized power values again maybe? > > Hmmm, that's a very good point ... There should be no problems on the > scheduler side -- we're only interested in correct ratios. But I'm not > sure on the thermal side ... I will double check that. So, IPA needs to compare the power of the CPUs with the power of other things (e.g. GPUs). So we can't normalize the power of the CPUs without normalizing in the same scale the power of the other devices. I see two possibilities: 1) we don't normalize the CPU power values, we specify them in mW, and we document (and maybe throw a warning if we see an issue at runtime) the max range of values. The max expected power for a single core could be 65K for ex (16bits). And based on that we can verify overflow and precision issues in the algorithms, and we keep it easy to compare the CPU power numbers with other devices. 2) we normalize the power values, but that means that the EM framework has to manage not only CPUs, but also other types of devices, and normalized their power values as well. That's required to keep the scale consistent across all of them, and keep comparisons doable. But if we do this, we still have to keep a normalized and a "raw" version of the power for all devices. And the "raw" power must still be in the same unit across all devices, otherwise the re-scaling is broken. The main benefit of doing this is that the range of acceptable "raw" power values can be larger, probably 32bits, and that the precision of the normalized range is arbitrary. I feel like 2) involves a lot of complexity, and not so many benefits, so I'd be happy to go with 1). Unless I forgot something ? Thanks, Quentin
On 06/06/2018 06:26 PM, Quentin Perret wrote: > On Wednesday 06 Jun 2018 at 16:29:50 (+0100), Quentin Perret wrote: >> On Wednesday 06 Jun 2018 at 17:20:00 (+0200), Juri Lelli wrote: >>>>> This brings me to another question. Let's say there are multiple users of >>>>> the Energy Model in the system. Shouldn't the units of frequency and power >>>>> not standardized, maybe Mhz and mW? >>>>> The task scheduler doesn't care since it is only interested in power diffs >>>>> but other user might do. >>>> >>>> So the good thing about specifying units is that we can probably assume >>>> ranges on the values. If the power is in mW, assuming that we're talking >>>> about a single CPU, it'll probably fit in 16 bits. 65W/core should be >>>> a reasonable upper-bound ? >>>> But there are also vendors who might not be happy with disclosing absolute >>>> values ... These are sometimes considered sensitive and only relative >>>> numbers are discussed publicly. Now, you can also argue that we already >>>> have units specified in IPA for ex, and that it doesn't really matter if >>>> a driver "lies" about the real value, as long as the ratios are correct. >>>> And I guess that anyone can do measurement on the hardware and get those >>>> values anyway. So specifying a unit (mW) for the power is probably a >>>> good idea. >>> >>> Mmm, I remember we fought quite a bit while getting capacity-dmpis-mhz >>> binding accepted, and one of the musts was that the values were going to >>> be normalized. So, normalized power values again maybe? >> >> Hmmm, that's a very good point ... There should be no problems on the >> scheduler side -- we're only interested in correct ratios. But I'm not >> sure on the thermal side ... I will double check that. > > So, IPA needs to compare the power of the CPUs with the power of other > things (e.g. GPUs). So we can't normalize the power of the CPUs without > normalizing in the same scale the power of the other devices. I see two > possibilities: > > 1) we don't normalize the CPU power values, we specify them in mW, and > we document (and maybe throw a warning if we see an issue at runtime) > the max range of values. The max expected power for a single core > could be 65K for ex (16bits). And based on that we can verify > overflow and precision issues in the algorithms, and we keep it easy > to compare the CPU power numbers with other devices. I would say we need 1). 32bit values with units and proper documentation of the possible ranges. [...]
On Wed, Jun 06, 2018 at 05:26:47PM +0100, Quentin Perret wrote: > On Wednesday 06 Jun 2018 at 16:29:50 (+0100), Quentin Perret wrote: > > On Wednesday 06 Jun 2018 at 17:20:00 (+0200), Juri Lelli wrote: > > > > > This brings me to another question. Let's say there are multiple users of > > > > > the Energy Model in the system. Shouldn't the units of frequency and power > > > > > not standardized, maybe Mhz and mW? > > > > > The task scheduler doesn't care since it is only interested in power diffs > > > > > but other user might do. > > > > > > > > So the good thing about specifying units is that we can probably assume > > > > ranges on the values. If the power is in mW, assuming that we're talking > > > > about a single CPU, it'll probably fit in 16 bits. 65W/core should be > > > > a reasonable upper-bound ? > > > > But there are also vendors who might not be happy with disclosing absolute > > > > values ... These are sometimes considered sensitive and only relative > > > > numbers are discussed publicly. Now, you can also argue that we already > > > > have units specified in IPA for ex, and that it doesn't really matter if > > > > a driver "lies" about the real value, as long as the ratios are correct. > > > > And I guess that anyone can do measurement on the hardware and get those > > > > values anyway. So specifying a unit (mW) for the power is probably a > > > > good idea. > > > > > > Mmm, I remember we fought quite a bit while getting capacity-dmpis-mhz > > > binding accepted, and one of the musts was that the values were going to > > > be normalized. So, normalized power values again maybe? > > > > Hmmm, that's a very good point ... There should be no problems on the > > scheduler side -- we're only interested in correct ratios. But I'm not > > sure on the thermal side ... I will double check that. > > So, IPA needs to compare the power of the CPUs with the power of other > things (e.g. GPUs). So we can't normalize the power of the CPUs without > normalizing in the same scale the power of the other devices. I see two > possibilities: > > 1) we don't normalize the CPU power values, we specify them in mW, and > we document (and maybe throw a warning if we see an issue at runtime) > the max range of values. The max expected power for a single core > could be 65K for ex (16bits). And based on that we can verify > overflow and precision issues in the algorithms, and we keep it easy > to compare the CPU power numbers with other devices. > > 2) we normalize the power values, but that means that the EM framework > has to manage not only CPUs, but also other types of devices, and > normalized their power values as well. That's required to keep the > scale consistent across all of them, and keep comparisons doable. > But if we do this, we still have to keep a normalized and a "raw" > version of the power for all devices. And the "raw" power must still > be in the same unit across all devices, otherwise the re-scaling is > broken. The main benefit of doing this is that the range of > acceptable "raw" power values can be larger, probably 32bits, and > that the precision of the normalized range is arbitrary. > > I feel like 2) involves a lot of complexity, and not so many benefits, > so I'd be happy to go with 1). Unless I forgot something ? From the thermal point of view, the power values don't need to have any given unit, as long as the values are comparable to each other. Do we need to normalize anything in the kernel though? Can't we just assume that whatever the platform is telling us is correct? Quentin mentioned it earlier: sometimes absolute values are considered sensitive and we only get ones that are correct relative to the rest of the system. (This reminds me that the units in Documentation/thermal/cpu-cooling-api.txt are wrong and I need to fix it :-X ) Cheers, Javi
Hi Javi, On Friday 08 Jun 2018 at 14:39:42 (+0100), Javi Merino wrote: > On Wed, Jun 06, 2018 at 05:26:47PM +0100, Quentin Perret wrote: > > On Wednesday 06 Jun 2018 at 16:29:50 (+0100), Quentin Perret wrote: > > > On Wednesday 06 Jun 2018 at 17:20:00 (+0200), Juri Lelli wrote: > > > > > > This brings me to another question. Let's say there are multiple users of > > > > > > the Energy Model in the system. Shouldn't the units of frequency and power > > > > > > not standardized, maybe Mhz and mW? > > > > > > The task scheduler doesn't care since it is only interested in power diffs > > > > > > but other user might do. > > > > > > > > > > So the good thing about specifying units is that we can probably assume > > > > > ranges on the values. If the power is in mW, assuming that we're talking > > > > > about a single CPU, it'll probably fit in 16 bits. 65W/core should be > > > > > a reasonable upper-bound ? > > > > > But there are also vendors who might not be happy with disclosing absolute > > > > > values ... These are sometimes considered sensitive and only relative > > > > > numbers are discussed publicly. Now, you can also argue that we already > > > > > have units specified in IPA for ex, and that it doesn't really matter if > > > > > a driver "lies" about the real value, as long as the ratios are correct. > > > > > And I guess that anyone can do measurement on the hardware and get those > > > > > values anyway. So specifying a unit (mW) for the power is probably a > > > > > good idea. > > > > > > > > Mmm, I remember we fought quite a bit while getting capacity-dmpis-mhz > > > > binding accepted, and one of the musts was that the values were going to > > > > be normalized. So, normalized power values again maybe? > > > > > > Hmmm, that's a very good point ... There should be no problems on the > > > scheduler side -- we're only interested in correct ratios. But I'm not > > > sure on the thermal side ... I will double check that. > > > > So, IPA needs to compare the power of the CPUs with the power of other > > things (e.g. GPUs). So we can't normalize the power of the CPUs without > > normalizing in the same scale the power of the other devices. I see two > > possibilities: > > > > 1) we don't normalize the CPU power values, we specify them in mW, and > > we document (and maybe throw a warning if we see an issue at runtime) > > the max range of values. The max expected power for a single core > > could be 65K for ex (16bits). And based on that we can verify > > overflow and precision issues in the algorithms, and we keep it easy > > to compare the CPU power numbers with other devices. > > > > 2) we normalize the power values, but that means that the EM framework > > has to manage not only CPUs, but also other types of devices, and > > normalized their power values as well. That's required to keep the > > scale consistent across all of them, and keep comparisons doable. > > But if we do this, we still have to keep a normalized and a "raw" > > version of the power for all devices. And the "raw" power must still > > be in the same unit across all devices, otherwise the re-scaling is > > broken. The main benefit of doing this is that the range of > > acceptable "raw" power values can be larger, probably 32bits, and > > that the precision of the normalized range is arbitrary. > > > > I feel like 2) involves a lot of complexity, and not so many benefits, > > so I'd be happy to go with 1). Unless I forgot something ? > > From the thermal point of view, the power values don't need to have > any given unit, as long as the values are comparable to each other. OK, thanks for confirming that :-) > Do we need to normalize anything in the kernel though? Can't we just > assume that whatever the platform is telling us is correct? Quentin > mentioned it earlier: sometimes absolute values are considered > sensitive and we only get ones that are correct relative to the rest > of the system. I'm happy to specify the units as mW and let the drivers lie about the true values. At least that helps them lie coherently if another subsystem requires power in uW for example. > > (This reminds me that the units in > Documentation/thermal/cpu-cooling-api.txt are wrong and I need to fix > it :-X ) ;-) Thanks, Quentin
On Fri, Jun 08, 2018 at 04:47:39PM +0100, Quentin Perret wrote: > Hi Javi, > > On Friday 08 Jun 2018 at 14:39:42 (+0100), Javi Merino wrote: > > On Wed, Jun 06, 2018 at 05:26:47PM +0100, Quentin Perret wrote: > > > On Wednesday 06 Jun 2018 at 16:29:50 (+0100), Quentin Perret wrote: > > > > On Wednesday 06 Jun 2018 at 17:20:00 (+0200), Juri Lelli wrote: > > > > > > > This brings me to another question. Let's say there are multiple users of > > > > > > > the Energy Model in the system. Shouldn't the units of frequency and power > > > > > > > not standardized, maybe Mhz and mW? > > > > > > > The task scheduler doesn't care since it is only interested in power diffs > > > > > > > but other user might do. > > > > > > > > > > > > So the good thing about specifying units is that we can probably assume > > > > > > ranges on the values. If the power is in mW, assuming that we're talking > > > > > > about a single CPU, it'll probably fit in 16 bits. 65W/core should be > > > > > > a reasonable upper-bound ? > > > > > > But there are also vendors who might not be happy with disclosing absolute > > > > > > values ... These are sometimes considered sensitive and only relative > > > > > > numbers are discussed publicly. Now, you can also argue that we already > > > > > > have units specified in IPA for ex, and that it doesn't really matter if > > > > > > a driver "lies" about the real value, as long as the ratios are correct. > > > > > > And I guess that anyone can do measurement on the hardware and get those > > > > > > values anyway. So specifying a unit (mW) for the power is probably a > > > > > > good idea. > > > > > > > > > > Mmm, I remember we fought quite a bit while getting capacity-dmpis-mhz > > > > > binding accepted, and one of the musts was that the values were going to > > > > > be normalized. So, normalized power values again maybe? > > > > > > > > Hmmm, that's a very good point ... There should be no problems on the > > > > scheduler side -- we're only interested in correct ratios. But I'm not > > > > sure on the thermal side ... I will double check that. > > > > > > So, IPA needs to compare the power of the CPUs with the power of other > > > things (e.g. GPUs). So we can't normalize the power of the CPUs without > > > normalizing in the same scale the power of the other devices. I see two > > > possibilities: > > > > > > 1) we don't normalize the CPU power values, we specify them in mW, and > > > we document (and maybe throw a warning if we see an issue at runtime) > > > the max range of values. The max expected power for a single core > > > could be 65K for ex (16bits). And based on that we can verify > > > overflow and precision issues in the algorithms, and we keep it easy > > > to compare the CPU power numbers with other devices. > > > > > > 2) we normalize the power values, but that means that the EM framework > > > has to manage not only CPUs, but also other types of devices, and > > > normalized their power values as well. That's required to keep the > > > scale consistent across all of them, and keep comparisons doable. > > > But if we do this, we still have to keep a normalized and a "raw" > > > version of the power for all devices. And the "raw" power must still > > > be in the same unit across all devices, otherwise the re-scaling is > > > broken. The main benefit of doing this is that the range of > > > acceptable "raw" power values can be larger, probably 32bits, and > > > that the precision of the normalized range is arbitrary. > > > > > > I feel like 2) involves a lot of complexity, and not so many benefits, > > > so I'd be happy to go with 1). Unless I forgot something ? > > > > From the thermal point of view, the power values don't need to have > > any given unit, as long as the values are comparable to each other. > > OK, thanks for confirming that :-) > > > Do we need to normalize anything in the kernel though? Can't we just > > assume that whatever the platform is telling us is correct? Quentin > > mentioned it earlier: sometimes absolute values are considered > > sensitive and we only get ones that are correct relative to the rest > > of the system. > > I'm happy to specify the units as mW and let the drivers lie about the > true values. At least that helps them lie coherently if another > subsystem requires power in uW for example. I think this is a good option. Cheers, Javi
diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c index 6ad53f1cf7e6..c13b3eb8bf35 100644 --- a/kernel/power/energy_model.c +++ b/kernel/power/energy_model.c @@ -144,9 +144,11 @@ static void fd_update_cs_table(struct em_cs_table *cs_table, int cpu) unsigned long fmax = cs_table->state[max_cap_state].frequency; int i; - for (i = 0; i < cs_table->nr_cap_states; i++) - cs_table->state[i].capacity = cmax * - cs_table->state[i].frequency / fmax; + for (i = 0; i < cs_table->nr_cap_states; i++) { + u64 val = (u64)cmax * cs_table->state[i].frequency; + do_div(val, fmax); + cs_table->state[i].capacity = (unsigned long)val; + } } This brings me to another question. Let's say there are multiple users of