Message ID | 20180906131513.24862-1-prarit@redhat.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Shuah Khan |
Headers | show |
Series | cpupower: Add better frequency-info failure messaging | expand |
Hi Prarit, Thanks for the patch. Comments below: On 09/06/2018 07:15 AM, Prarit Bhargava wrote: > If a Dell system has "OS DBPM(Performance Per Watt OS)" set in BIOS, > cpupower frequency-info outputs > > current CPU frequency: Unable to call hardware > current CPU frequency: 2.50 GHz (asserted by call to kernel) > > to indicate that cpupower cannot read the frequency from hardware and > is using the kernel's cpu frequency estimate. This output is confusing > to end users who wonder why the hardware is not responding. > > Update the cpupower frequency-info output message to only output an > error if both the hardware and kernel calls fail. > > Signed-off-by: Prarit Bhargava <prarit@redhat.com> > Cc: Thomas Renninger <trenn@suse.com> > Cc: Shuah Khan <shuah@kernel.org> > Cc: Stafford Horne <shorne@gmail.com> > --- > tools/power/cpupower/utils/cpufreq-info.c | 32 +++++++++++++++++-------------- > 1 file changed, 18 insertions(+), 14 deletions(-) > > diff --git a/tools/power/cpupower/utils/cpufreq-info.c b/tools/power/cpupower/utils/cpufreq-info.c > index df43cd45d810..d7dcd78de544 100644 > --- a/tools/power/cpupower/utils/cpufreq-info.c > +++ b/tools/power/cpupower/utils/cpufreq-info.c > @@ -246,17 +246,19 @@ static int get_boost_mode(unsigned int cpu) > > /* --freq / -f */ > > -static int get_freq_kernel(unsigned int cpu, unsigned int human) > +static int get_freq_kernel(unsigned int cpu, unsigned int human, int fail_msg) > { > unsigned long freq = cpufreq_get_freq_kernel(cpu); > - printf(_(" current CPU frequency: ")); > + > if (!freq) { > - printf(_(" Unable to call to kernel\n")); > + if (fail_msg) > + printf(_(" current CPU frequency: Unable to call to kernel\n")); What is the reason for this to be printed conditionally? > return -EINVAL; > } > - if (human) { > + printf(_(" current CPU frequency: ")); > + if (human) > print_speed(freq); > - } else > + else > printf("%lu", freq); > printf(_(" (asserted by call to kernel)\n")); > return 0; > @@ -265,17 +267,19 @@ static int get_freq_kernel(unsigned int cpu, unsigned int human) > > /* --hwfreq / -w */ > > -static int get_freq_hardware(unsigned int cpu, unsigned int human) > +static int get_freq_hardware(unsigned int cpu, unsigned int human, int fail_msg) > { > unsigned long freq = cpufreq_get_freq_hardware(cpu); > - printf(_(" current CPU frequency: ")); > + > if (!freq) { > - printf("Unable to call hardware\n"); > + if (fail_msg) > + printf(_(" current CPU frequency: Unable to call hardware\n")); What is the reason for this to be printed conditionally? > return -EINVAL; > } > - if (human) { > + printf(_(" current CPU frequency: ")); > + if (human) > print_speed(freq); > - } else > + else > printf("%lu", freq); > printf(_(" (asserted by call to hardware)\n")); > return 0; > @@ -475,8 +479,8 @@ static void debug_output_one(unsigned int cpu) > > get_available_governors(cpu); > get_policy(cpu); > - if (get_freq_hardware(cpu, 1) < 0) > - get_freq_kernel(cpu, 1); > + if (get_freq_hardware(cpu, 1, 0) && get_freq_kernel(cpu, 1, 0)) > + printf(_(" current CPU frequency: Unable to call hardware or kernel\n")); This will be confusing to the user. It says hardware or kernel instead of being specific about which one failed. Also looks like this changes the logic to call get_freq_kernel() even if get_freq_hardware() fails. > get_boost_mode(cpu); > } > > @@ -627,10 +631,10 @@ int cmd_freq_info(int argc, char **argv) > ret = get_hardware_limits(cpu, human); > break; > case 'w': > - ret = get_freq_hardware(cpu, human); > + ret = get_freq_hardware(cpu, human, 1); I think adding extra flag can be avoided. > break; > case 'f': > - ret = get_freq_kernel(cpu, human); > + ret = get_freq_kernel(cpu, human, 1); I think adding extra flag can be avoided. > break; > case 's': > ret = get_freq_stats(cpu, human); > thanks, -- Shuah
On 09/06/2018 02:34 PM, Shuah Khan wrote: > Hi Prarit, > > Thanks for the patch. Comments below: > > On 09/06/2018 07:15 AM, Prarit Bhargava wrote: >> If a Dell system has "OS DBPM(Performance Per Watt OS)" set in BIOS, >> cpupower frequency-info outputs >> >> current CPU frequency: Unable to call hardware >> current CPU frequency: 2.50 GHz (asserted by call to kernel) >> >> to indicate that cpupower cannot read the frequency from hardware and >> is using the kernel's cpu frequency estimate. This output is confusing >> to end users who wonder why the hardware is not responding. >> >> Update the cpupower frequency-info output message to only output an >> error if both the hardware and kernel calls fail. >> >> Signed-off-by: Prarit Bhargava <prarit@redhat.com> >> Cc: Thomas Renninger <trenn@suse.com> >> Cc: Shuah Khan <shuah@kernel.org> >> Cc: Stafford Horne <shorne@gmail.com> >> --- >> tools/power/cpupower/utils/cpufreq-info.c | 32 +++++++++++++++++-------------- >> 1 file changed, 18 insertions(+), 14 deletions(-) >> >> diff --git a/tools/power/cpupower/utils/cpufreq-info.c b/tools/power/cpupower/utils/cpufreq-info.c >> index df43cd45d810..d7dcd78de544 100644 >> --- a/tools/power/cpupower/utils/cpufreq-info.c >> +++ b/tools/power/cpupower/utils/cpufreq-info.c >> @@ -246,17 +246,19 @@ static int get_boost_mode(unsigned int cpu) >> >> /* --freq / -f */ >> >> -static int get_freq_kernel(unsigned int cpu, unsigned int human) >> +static int get_freq_kernel(unsigned int cpu, unsigned int human, int fail_msg) >> { >> unsigned long freq = cpufreq_get_freq_kernel(cpu); >> - printf(_(" current CPU frequency: ")); >> + >> if (!freq) { >> - printf(_(" Unable to call to kernel\n")); >> + if (fail_msg) >> + printf(_(" current CPU frequency: Unable to call to kernel\n")); > > What is the reason for this to be printed conditionally? > See below. >> return -EINVAL; >> } >> - if (human) { >> + printf(_(" current CPU frequency: ")); >> + if (human) >> print_speed(freq); >> - } else >> + else >> printf("%lu", freq); >> printf(_(" (asserted by call to kernel)\n")); >> return 0; >> @@ -265,17 +267,19 @@ static int get_freq_kernel(unsigned int cpu, unsigned int human) >> >> /* --hwfreq / -w */ >> >> -static int get_freq_hardware(unsigned int cpu, unsigned int human) >> +static int get_freq_hardware(unsigned int cpu, unsigned int human, int fail_msg) >> { >> unsigned long freq = cpufreq_get_freq_hardware(cpu); >> - printf(_(" current CPU frequency: ")); >> + >> if (!freq) { >> - printf("Unable to call hardware\n"); >> + if (fail_msg) >> + printf(_(" current CPU frequency: Unable to call hardware\n")); > > What is the reason for this to be printed conditionally? > See below. >> return -EINVAL; >> } >> - if (human) { >> + printf(_(" current CPU frequency: ")); >> + if (human) >> print_speed(freq); >> - } else >> + else >> printf("%lu", freq); >> printf(_(" (asserted by call to hardware)\n")); >> return 0; >> @@ -475,8 +479,8 @@ static void debug_output_one(unsigned int cpu) >> >> get_available_governors(cpu); >> get_policy(cpu); >> - if (get_freq_hardware(cpu, 1) < 0) >> - get_freq_kernel(cpu, 1); >> + if (get_freq_hardware(cpu, 1, 0) && get_freq_kernel(cpu, 1, 0)) >> + printf(_(" current CPU frequency: Unable to call hardware or kernel\n")); > > This will be confusing to the user. It says hardware or kernel instead of being > specific about which one failed. "or" might be a bad choice. Perhaps "and" is a better word there. > > Also looks like this changes the logic to call get_freq_kernel() even if > get_freq_hardware() fails. This is what we have right now: If get_freq_hardware() fails and get_freq_kernel() succeeds: current CPU frequency: Unable to call hardware current CPU frequency: 2.50 GHz (asserted by call to kernel) If 'w' is passed as a parameter and get_freq_hardware() fails current CPU frequency: Unable to call hardware If 'w' is passed as a parameter and get_freq_hardware() succeeds current CPU frequency: 2.50 GHz (asserted by call to hardware) If 'f' is passed as a parameter and get_freq_kernel() fails current CPU frequency: Unable to call kernel If 'f' is passed as a parameter and get_freq_kernel() succeeds current CPU frequency: Unable to call kernel With my change I only want to modify the output when both are called so that one error message is output. current CPU frequency: Unable to call hardware or (and?) kernel ie) if get_freq_hardware() fails in the "combined" case no error output from get_freq_hardware() should be output and the same for get_freq_kernel(), o/w you'll see current CPU frequency: Unable to call hardware current CPU frequency: Unable to call hardware and kernel when both fail. We should not change 'w' and 'f' output. This means in those cases we need the failure messages (ie fail_msg = 1) in those calls, so the fail_msg parameter is needed. If you have a better idea I'm all for it. I feel like all that code could potentially be rewritten into a single function with unified error messages. P. > >> get_boost_mode(cpu); >> } >> >> @@ -627,10 +631,10 @@ int cmd_freq_info(int argc, char **argv) >> ret = get_hardware_limits(cpu, human); >> break; >> case 'w': >> - ret = get_freq_hardware(cpu, human); >> + ret = get_freq_hardware(cpu, human, 1); > > I think adding extra flag can be avoided. > >> break; >> case 'f': >> - ret = get_freq_kernel(cpu, human); >> + ret = get_freq_kernel(cpu, human, 1); > > I think adding extra flag can be avoided. > >> break; >> case 's': >> ret = get_freq_stats(cpu, human); >> > > thanks, > -- Shuah >
On 09/06/2018 12:58 PM, Prarit Bhargava wrote: > > > On 09/06/2018 02:34 PM, Shuah Khan wrote: >> Hi Prarit, >> >> Thanks for the patch. Comments below: >> >> On 09/06/2018 07:15 AM, Prarit Bhargava wrote: >>> If a Dell system has "OS DBPM(Performance Per Watt OS)" set in BIOS, >>> cpupower frequency-info outputs >>> >>> current CPU frequency: Unable to call hardware >>> current CPU frequency: 2.50 GHz (asserted by call to kernel) >>> >>> to indicate that cpupower cannot read the frequency from hardware and >>> is using the kernel's cpu frequency estimate. This output is confusing >>> to end users who wonder why the hardware is not responding. >>> >>> Update the cpupower frequency-info output message to only output an >>> error if both the hardware and kernel calls fail. >>> >>> Signed-off-by: Prarit Bhargava <prarit@redhat.com> >>> Cc: Thomas Renninger <trenn@suse.com> >>> Cc: Shuah Khan <shuah@kernel.org> >>> Cc: Stafford Horne <shorne@gmail.com> >>> --- >>> tools/power/cpupower/utils/cpufreq-info.c | 32 +++++++++++++++++-------------- >>> 1 file changed, 18 insertions(+), 14 deletions(-) >>> >>> diff --git a/tools/power/cpupower/utils/cpufreq-info.c b/tools/power/cpupower/utils/cpufreq-info.c >>> index df43cd45d810..d7dcd78de544 100644 >>> --- a/tools/power/cpupower/utils/cpufreq-info.c >>> +++ b/tools/power/cpupower/utils/cpufreq-info.c >>> @@ -246,17 +246,19 @@ static int get_boost_mode(unsigned int cpu) >>> >>> /* --freq / -f */ >>> >>> -static int get_freq_kernel(unsigned int cpu, unsigned int human) >>> +static int get_freq_kernel(unsigned int cpu, unsigned int human, int fail_msg) >>> { >>> unsigned long freq = cpufreq_get_freq_kernel(cpu); >>> - printf(_(" current CPU frequency: ")); >>> + >>> if (!freq) { >>> - printf(_(" Unable to call to kernel\n")); >>> + if (fail_msg) >>> + printf(_(" current CPU frequency: Unable to call to kernel\n")); >> >> What is the reason for this to be printed conditionally? >> > > See below. > >>> return -EINVAL; >>> } >>> - if (human) { >>> + printf(_(" current CPU frequency: ")); >>> + if (human) >>> print_speed(freq); >>> - } else >>> + else >>> printf("%lu", freq); >>> printf(_(" (asserted by call to kernel)\n")); >>> return 0; >>> @@ -265,17 +267,19 @@ static int get_freq_kernel(unsigned int cpu, unsigned int human) >>> >>> /* --hwfreq / -w */ >>> >>> -static int get_freq_hardware(unsigned int cpu, unsigned int human) >>> +static int get_freq_hardware(unsigned int cpu, unsigned int human, int fail_msg) >>> { >>> unsigned long freq = cpufreq_get_freq_hardware(cpu); >>> - printf(_(" current CPU frequency: ")); >>> + >>> if (!freq) { >>> - printf("Unable to call hardware\n"); >>> + if (fail_msg) >>> + printf(_(" current CPU frequency: Unable to call hardware\n")); >> >> What is the reason for this to be printed conditionally? >> > > See below. > >>> return -EINVAL; >>> } >>> - if (human) { >>> + printf(_(" current CPU frequency: ")); >>> + if (human) >>> print_speed(freq); >>> - } else >>> + else >>> printf("%lu", freq); >>> printf(_(" (asserted by call to hardware)\n")); >>> return 0; >>> @@ -475,8 +479,8 @@ static void debug_output_one(unsigned int cpu) >>> >>> get_available_governors(cpu); >>> get_policy(cpu); >>> - if (get_freq_hardware(cpu, 1) < 0) >>> - get_freq_kernel(cpu, 1); >>> + if (get_freq_hardware(cpu, 1, 0) && get_freq_kernel(cpu, 1, 0)) >>> + printf(_(" current CPU frequency: Unable to call hardware or kernel\n")); >> >> This will be confusing to the user. It says hardware or kernel instead of being >> specific about which one failed. > > "or" might be a bad choice. Perhaps "and" is a better word there. Please change the message. > >> >> Also looks like this changes the logic to call get_freq_kernel() even if >> get_freq_hardware() fails. > > This is what we have right now: If get_freq_hardware() fails and > get_freq_kernel() succeeds: > > current CPU frequency: Unable to call hardware > current CPU frequency: 2.50 GHz (asserted by call to kernel) > > If 'w' is passed as a parameter and get_freq_hardware() fails > > current CPU frequency: Unable to call hardware > > If 'w' is passed as a parameter and get_freq_hardware() succeeds > > current CPU frequency: 2.50 GHz (asserted by call to hardware) > > If 'f' is passed as a parameter and get_freq_kernel() fails > > current CPU frequency: Unable to call kernel > > If 'f' is passed as a parameter and get_freq_kernel() succeeds > > current CPU frequency: Unable to call kernel > > With my change I only want to modify the output when both are called so > that one error message is output. > > current CPU frequency: Unable to call hardware or (and?) kernel > > ie) if get_freq_hardware() fails in the "combined" case no error output > from get_freq_hardware() should be output and the same for get_freq_kernel(), > o/w you'll see > > current CPU frequency: Unable to call hardware > current CPU frequency: Unable to call hardware and kernel > > when both fail. > > We should not change 'w' and 'f' output. This means in those cases > we need the failure messages (ie fail_msg = 1) in those calls, so the > fail_msg parameter is needed. Okay that makes sense. > > If you have a better idea I'm all for it. I feel like all that code could > potentially be rewritten into a single function with unified error messages. > > P. I think we want to get this in first and then do the code restructuring later. Please feel free to take it on if you have the time. Send me v2 with the change to the message and I will get this in. thanks, -- Shuah
On Thursday, September 6, 2018 8:58:47 PM CEST Prarit Bhargava wrote: > On 09/06/2018 02:34 PM, Shuah Khan wrote: > > Hi Prarit, > > ... > With my change I only want to modify the output when both are called so > that one error message is output. > > current CPU frequency: Unable to call hardware or (and?) kernel > > ie) if get_freq_hardware() fails in the "combined" case no error output > from get_freq_hardware() should be output and the same for > get_freq_kernel(), o/w you'll see > > current CPU frequency: Unable to call hardware > current CPU frequency: Unable to call hardware and kernel > > when both fail. Why do you want/need this? This is all pretty clear right now: There are 2 methods to get frequency which are both called if not a specific one is chosen via parameter. The result of both is show in a separate line. If it cannot be retrieved, an error/info line is shown that it couldn't be fetched, of course for each methods. People start to script with this tool. Modifying output is not a good idea. current CPU frequency: Unable to call hardware current CPU frequency: Unable to call to kernel is IMO even better than: current CPU frequency: Unable to call hardware and kernel The real culprit is the kernel interface(s). The message itself is not that true anymore nowadays. With pstate driver iirc it's all retrieved via mperf/aperf calls. Thomas
On 09/07/2018 10:29 AM, Thomas Renninger wrote: > On Thursday, September 6, 2018 8:58:47 PM CEST Prarit Bhargava wrote: >> On 09/06/2018 02:34 PM, Shuah Khan wrote: >>> Hi Prarit, >>> > ... >> With my change I only want to modify the output when both are called so >> that one error message is output. >> >> current CPU frequency: Unable to call hardware or (and?) kernel >> >> ie) if get_freq_hardware() fails in the "combined" case no error output >> from get_freq_hardware() should be output and the same for >> get_freq_kernel(), o/w you'll see >> >> current CPU frequency: Unable to call hardware >> current CPU frequency: Unable to call hardware and kernel >> >> when both fail. See my previous response. You'll only see current CPU frequency: Unable to call hardware and kernel for both cases. > > Why do you want/need this? As stated, it is very confusing. Was there an error? Is there something the user needs to fix? Why is it saying there is an error when the frequency was returned by the kernel? ... etc. > This is all pretty clear right now: No, it really isn't. > There are 2 methods to get frequency which are both called if not > a specific one is chosen via parameter. > > The result of both is show in a separate line. > If it cannot be retrieved, an error/info line is shown that it couldn't be > fetched, of course for each methods. > > People start to script with this tool. > Modifying output is not a good idea. How so? Other userspace apps change the output all the time. See (more widely used) turbostat in the kernel. > > current CPU frequency: Unable to call hardware > current CPU frequency: Unable to call to kernel > > is IMO even better than: > current CPU frequency: Unable to call hardware and kernel That's fine. I can change it to two lines if that's what the concern is. > > The real culprit is the kernel interface(s) > The message itself is not that true anymore nowadays. > With pstate driver iirc it's all retrieved via mperf/aperf calls. > I'm not disagreeing with that but it is what we have ATM in cpupower. P. > Thomas > >
diff --git a/tools/power/cpupower/utils/cpufreq-info.c b/tools/power/cpupower/utils/cpufreq-info.c index df43cd45d810..d7dcd78de544 100644 --- a/tools/power/cpupower/utils/cpufreq-info.c +++ b/tools/power/cpupower/utils/cpufreq-info.c @@ -246,17 +246,19 @@ static int get_boost_mode(unsigned int cpu) /* --freq / -f */ -static int get_freq_kernel(unsigned int cpu, unsigned int human) +static int get_freq_kernel(unsigned int cpu, unsigned int human, int fail_msg) { unsigned long freq = cpufreq_get_freq_kernel(cpu); - printf(_(" current CPU frequency: ")); + if (!freq) { - printf(_(" Unable to call to kernel\n")); + if (fail_msg) + printf(_(" current CPU frequency: Unable to call to kernel\n")); return -EINVAL; } - if (human) { + printf(_(" current CPU frequency: ")); + if (human) print_speed(freq); - } else + else printf("%lu", freq); printf(_(" (asserted by call to kernel)\n")); return 0; @@ -265,17 +267,19 @@ static int get_freq_kernel(unsigned int cpu, unsigned int human) /* --hwfreq / -w */ -static int get_freq_hardware(unsigned int cpu, unsigned int human) +static int get_freq_hardware(unsigned int cpu, unsigned int human, int fail_msg) { unsigned long freq = cpufreq_get_freq_hardware(cpu); - printf(_(" current CPU frequency: ")); + if (!freq) { - printf("Unable to call hardware\n"); + if (fail_msg) + printf(_(" current CPU frequency: Unable to call hardware\n")); return -EINVAL; } - if (human) { + printf(_(" current CPU frequency: ")); + if (human) print_speed(freq); - } else + else printf("%lu", freq); printf(_(" (asserted by call to hardware)\n")); return 0; @@ -475,8 +479,8 @@ static void debug_output_one(unsigned int cpu) get_available_governors(cpu); get_policy(cpu); - if (get_freq_hardware(cpu, 1) < 0) - get_freq_kernel(cpu, 1); + if (get_freq_hardware(cpu, 1, 0) && get_freq_kernel(cpu, 1, 0)) + printf(_(" current CPU frequency: Unable to call hardware or kernel\n")); get_boost_mode(cpu); } @@ -627,10 +631,10 @@ int cmd_freq_info(int argc, char **argv) ret = get_hardware_limits(cpu, human); break; case 'w': - ret = get_freq_hardware(cpu, human); + ret = get_freq_hardware(cpu, human, 1); break; case 'f': - ret = get_freq_kernel(cpu, human); + ret = get_freq_kernel(cpu, human, 1); break; case 's': ret = get_freq_stats(cpu, human);
If a Dell system has "OS DBPM(Performance Per Watt OS)" set in BIOS, cpupower frequency-info outputs current CPU frequency: Unable to call hardware current CPU frequency: 2.50 GHz (asserted by call to kernel) to indicate that cpupower cannot read the frequency from hardware and is using the kernel's cpu frequency estimate. This output is confusing to end users who wonder why the hardware is not responding. Update the cpupower frequency-info output message to only output an error if both the hardware and kernel calls fail. Signed-off-by: Prarit Bhargava <prarit@redhat.com> Cc: Thomas Renninger <trenn@suse.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Stafford Horne <shorne@gmail.com> --- tools/power/cpupower/utils/cpufreq-info.c | 32 +++++++++++++++++-------------- 1 file changed, 18 insertions(+), 14 deletions(-)