cpufreq: intel_pstate: EPB with performance governor
diff mbox series

Message ID 000701d656be$c48083e0$4d818ba0$@net
State Not Applicable, archived
Headers show
Series
  • cpufreq: intel_pstate: EPB with performance governor
Related show

Commit Message

Doug Smythies July 10, 2020, 1:33 p.m. UTC
Hi Srinivas and/or Rafael,

Can you please confirm or deny that an older
commit:

commit 8442885fca09b2d26375b9fe507759879a6f661e
cpufreq: intel_pstate: Set EPP/EPB to 0 in performance mode

has been superseded by:

arch/x86/kernel/cpu/intel_epb.c

and that now there is no way to have some default EPB (say 6) for
governors other than performance, while still getting an EPB of 0
for the performance governor.

... Doug

Additional notes:
Both my test computers have EPB as 0 upon startup,
But I also tried this:


which did get rid of this message:
kernel: [    0.102158] ENERGY_PERF_BIAS: Set to 'normal', was 'performance'

Comments

Rafael J. Wysocki July 15, 2020, 4:46 p.m. UTC | #1
On Fri, Jul 10, 2020 at 3:34 PM Doug Smythies <dsmythies@telus.net> wrote:
>
> Hi Srinivas and/or Rafael,
>
> Can you please confirm or deny that an older
> commit:
>
> commit 8442885fca09b2d26375b9fe507759879a6f661e
> cpufreq: intel_pstate: Set EPP/EPB to 0 in performance mode
>
> has been superseded by:
>
> arch/x86/kernel/cpu/intel_epb.c

No, it hasn't.

However, intel_pstate only touches the EPB if EPP is not supported,
which should become a non-issue after this patch posted by me
yesterday: https://patchwork.kernel.org/patch/11663315/

If EPP is supported, intel_pstate will use it and it will never look
at the EPB even.

> and that now there is no way to have some default EPB (say 6) for
> governors other than performance, while still getting an EPB of 0
> for the performance governor.

If EPP is supported, what happens to the EPB is completely orthogonal
to cpufreq etc.

So it is possible to have the EPB different from 0, but it should be
the same for all governors unless changed via energy_perf_bias.

If EPP is not supported, though, then without the patch mentioned
above, intel_pstate may fiddle with the EPB.

> Additional notes:
> Both my test computers have EPB as 0 upon startup,

That is before intel_epb_init() runs, because it will change the EPB
to "normal" (6).

> But I also tried this:
>
> diff --git a/arch/x86/kernel/cpu/intel_epb.c b/arch/x86/kernel/cpu/intel_epb.c
> index f4dd73396f28..b536e381cd56 100644
> --- a/arch/x86/kernel/cpu/intel_epb.c
> +++ b/arch/x86/kernel/cpu/intel_epb.c
> @@ -74,7 +74,8 @@ static int intel_epb_save(void)
>
>  static void intel_epb_restore(void)
>  {
> -       u64 val = this_cpu_read(saved_epb);
> +//     u64 val = this_cpu_read(saved_epb);
> +       u64 val = 6;
>         u64 epb;
>
>         rdmsrl(MSR_IA32_ENERGY_PERF_BIAS, epb);
>
> which did get rid of this message:
> kernel: [    0.102158] ENERGY_PERF_BIAS: Set to 'normal', was 'performance'

Which is exactly what happens when this message is printed.

Instead of commenting out the line of code above, which is not a
correct thing to do in general, you can simply set the EPB to 0 via
energy_perf_bias for all CPUs and it should stick.

Thanks!
Doug Smythies July 15, 2020, 10:43 p.m. UTC | #2
Hi Rafael,

Thank you for your reply.

On 2020.07.15 09:47 Rafael J. Wysocki wrote:
> On Fri, Jul 10, 2020 at 3:34 PM Doug Smythies <dsmythies@telus.net> wrote:
> >
> > Hi Srinivas and/or Rafael,
> >
> > Can you please confirm or deny that an older
> > commit:
> >
> > commit 8442885fca09b2d26375b9fe507759879a6f661e
> > cpufreq: intel_pstate: Set EPP/EPB to 0 in performance mode
> >
> > has been superseded by:
> >
> > arch/x86/kernel/cpu/intel_epb.c
> 
> No, it hasn't.
> 
> However, intel_pstate only touches the EPB if EPP is not supported,
> which should become a non-issue after this patch posted by me
> yesterday: https://patchwork.kernel.org/patch/11663315/
> 
> If EPP is supported, intel_pstate will use it and it will never look
> at the EPB even.
> 
> > and that now there is no way to have some default EPB (say 6) for
> > governors other than performance, while still getting an EPB of 0
> > for the performance governor.
> 
> If EPP is supported, what happens to the EPB is completely orthogonal
> to cpufreq etc.

Yes, I know.
I am talking about when HWP is disabled.

And I do not understand your reference to that patch from yesterday,
as it seems unrelated to me.

> 
> So it is possible to have the EPB different from 0, but it should be
> the same for all governors unless changed via energy_perf_bias.

And I am saying that contradicts the earlier referenced patch.
We want EPB set to 0 for performance mode, otherwise I challenge the
name "performance" governor.

Yes, EPB can be whatever for the other governors.

> 
> If EPP is not supported, though, then without the patch mentioned
> above, intel_pstate may fiddle with the EPB.
> 
> > Additional notes:
> > Both my test computers have EPB as 0 upon startup,
> 
> That is before intel_epb_init() runs, because it will change the EPB
> to "normal" (6).

Yes, I know.

> 
> > But I also tried this:
> >
> > diff --git a/arch/x86/kernel/cpu/intel_epb.c b/arch/x86/kernel/cpu/intel_epb.c
> > index f4dd73396f28..b536e381cd56 100644
> > --- a/arch/x86/kernel/cpu/intel_epb.c
> > +++ b/arch/x86/kernel/cpu/intel_epb.c
> > @@ -74,7 +74,8 @@ static int intel_epb_save(void)
> >
> >  static void intel_epb_restore(void)
> >  {
> > -       u64 val = this_cpu_read(saved_epb);
> > +//     u64 val = this_cpu_read(saved_epb);
> > +       u64 val = 6;
> >         u64 epb;
> >
> >         rdmsrl(MSR_IA32_ENERGY_PERF_BIAS, epb);
> >
> > which did get rid of this message:
> > kernel: [    0.102158] ENERGY_PERF_BIAS: Set to 'normal', was 'performance'
> 
> Which is exactly what happens when this message is printed.
> 
> Instead of commenting out the line of code above, which is not a
> correct thing to do in general,

Yes, of course. That was just a test, and the only way I could think of
to prove that the system started with it as 0.

> you can simply set the EPB to 0 via
> energy_perf_bias for all CPUs and it should stick.

And I am saying I should not have to do that, or even know about it,
when I want to use the performance governor.
But yes, I expect the driver to remember the default, or otherwise set,
value of EPB for all the other governors.

... Doug
Rafael J. Wysocki July 16, 2020, noon UTC | #3
Hi Doug,

On Thu, Jul 16, 2020 at 12:44 AM Doug Smythies <dsmythies@telus.net> wrote:
>
> Hi Rafael,
>
> Thank you for your reply.
>
> On 2020.07.15 09:47 Rafael J. Wysocki wrote:
> > On Fri, Jul 10, 2020 at 3:34 PM Doug Smythies <dsmythies@telus.net> wrote:
> > >
> > > Hi Srinivas and/or Rafael,
> > >
> > > Can you please confirm or deny that an older
> > > commit:
> > >
> > > commit 8442885fca09b2d26375b9fe507759879a6f661e
> > > cpufreq: intel_pstate: Set EPP/EPB to 0 in performance mode
> > >
> > > has been superseded by:
> > >
> > > arch/x86/kernel/cpu/intel_epb.c
> >
> > No, it hasn't.
> >
> > However, intel_pstate only touches the EPB if EPP is not supported,
> > which should become a non-issue after this patch posted by me
> > yesterday: https://patchwork.kernel.org/patch/11663315/
> >
> > If EPP is supported, intel_pstate will use it and it will never look
> > at the EPB even.
> >
> > > and that now there is no way to have some default EPB (say 6) for
> > > governors other than performance, while still getting an EPB of 0
> > > for the performance governor.
> >
> > If EPP is supported, what happens to the EPB is completely orthogonal
> > to cpufreq etc.
>
> Yes, I know.
> I am talking about when HWP is disabled.

I see.

> And I do not understand your reference to that patch from yesterday,
> as it seems unrelated to me.

If you are referring to when HWP is disabled, then it is not related indeed.

> > So it is possible to have the EPB different from 0, but it should be
> > the same for all governors unless changed via energy_perf_bias.
>
> And I am saying that contradicts the earlier referenced patch.
> We want EPB set to 0 for performance mode,

The "performance" governor in cpufreq (or the active-mode
"performance" scaling algorithm in intel_pstate) covers the CPU
performance scaling only, while the EPB potentially affects the other
system components too.  Thus driving the EPB from the CPU performance
scaling subsystem is not the right approach IMO.

> otherwise I challenge the name "performance" governor.

In that case "performance" means the maximum CPU capacity at the given
EPB level.

> Yes, EPB can be whatever for the other governors.

CPU performance scaling governors and the EPB should not be directly
related to each other.  The EPB is system-wide, which generally means
more than just CPUs (the uncore and memory may be affected by it in
principle, for example).

> >
> > If EPP is not supported, though, then without the patch mentioned
> > above, intel_pstate may fiddle with the EPB.
> >
> > > Additional notes:
> > > Both my test computers have EPB as 0 upon startup,
> >
> > That is before intel_epb_init() runs, because it will change the EPB
> > to "normal" (6).
>
> Yes, I know.
>
> >
> > > But I also tried this:
> > >
> > > diff --git a/arch/x86/kernel/cpu/intel_epb.c b/arch/x86/kernel/cpu/intel_epb.c
> > > index f4dd73396f28..b536e381cd56 100644
> > > --- a/arch/x86/kernel/cpu/intel_epb.c
> > > +++ b/arch/x86/kernel/cpu/intel_epb.c
> > > @@ -74,7 +74,8 @@ static int intel_epb_save(void)
> > >
> > >  static void intel_epb_restore(void)
> > >  {
> > > -       u64 val = this_cpu_read(saved_epb);
> > > +//     u64 val = this_cpu_read(saved_epb);
> > > +       u64 val = 6;
> > >         u64 epb;
> > >
> > >         rdmsrl(MSR_IA32_ENERGY_PERF_BIAS, epb);
> > >
> > > which did get rid of this message:
> > > kernel: [    0.102158] ENERGY_PERF_BIAS: Set to 'normal', was 'performance'
> >
> > Which is exactly what happens when this message is printed.
> >
> > Instead of commenting out the line of code above, which is not a
> > correct thing to do in general,
>
> Yes, of course. That was just a test, and the only way I could think of
> to prove that the system started with it as 0.
>
> > you can simply set the EPB to 0 via
> > energy_perf_bias for all CPUs and it should stick.
>
> And I am saying I should not have to do that, or even know about it,
> when I want to use the performance governor.

Again, cpufreq governors are on top of the EPB.

> But yes, I expect the driver to remember the default, or otherwise set,
> value of EPB for all the other governors.

We clearly don't agree here.

Also in the passive mode of intel_pstate, when the regular cpufreq
"performance" governor is in use, it's all about setting the frequency
to the max alone through min = max without touching any other knobs
which need to be adjusted separately.  That's how it's been always
working and changing it now may confuse the users who have learned to
rely on this behavior.

Thanks!
Doug Smythies July 17, 2020, 9:22 p.m. UTC | #4
Hi Rafael,

Thank you for your reply.
I'll give it up after this, I promise.

On 2020.07.16 05:00 Of Rafael J. Wysocki
> On Thu, Jul 16, 2020 at 12:44 AM Doug Smythies <dsmythies@telus.net> wrote:
> > On 2020.07.15 09:47 Rafael J. Wysocki wrote:
> > > On Fri, Jul 10, 2020 at 3:34 PM Doug Smythies <dsmythies@telus.net> wrote:
...
> > > you can simply set the EPB to 0 via
> > > energy_perf_bias for all CPUs and it should stick.
> >
> > And I am saying I should not have to do that, or even know about it,
> > when I want to use the performance governor.
> 
> Again, cpufreq governors are on top of the EPB.
> 
> > But yes, I expect the driver to remember the default, or otherwise set,
> > value of EPB for all the other governors.
> 
> We clearly don't agree here.

Agreed. (That we disagree.)

It is done with EPP in active mode with HWP between performance
and powersave governors, so I struggle with treating the EPB
case differently.
 
> Also in the passive mode of intel_pstate, when the regular cpufreq
> "performance" governor is in use, it's all about setting the frequency
> to the max alone through min = max without touching any other knobs
> which need to be adjusted separately.  That's how it's been always
> working and changing it now may confuse the users who have learned to
> rely on this behavior.

But the behaviour is inconsistent anyhow.

How can we possibly claim that this:

doug@s18:~$ sudo ~/turbostat --quiet --show Busy%,Bzy_MHz,PkgTmp,PkgWatt,GFXWatt,IRQ --interval 6
Busy%   Bzy_MHz IRQ     PkgTmp  PkgWatt GFXWatt
18.12   3700    25782   38      13.94   0.00
0.55    3701    3000    38      13.94   0.00
0.01    3701    49
19.29   3700    5529
35.97   3700    6051
26.99   3700    5177
25.92   3700    5976
Busy%   Bzy_MHz IRQ     PkgTmp  PkgWatt GFXWatt
18.12   3700    27042   40      14.15   0.00
0.55    3701    2978    40      14.15   0.00
0.01    3701    22
30.01   3700    6042
28.09   3700    6044
29.18   3700    6046
20.91   3700    5910
Busy%   Bzy_MHz IRQ     PkgTmp  PkgWatt GFXWatt
18.13   3700    27195   40      14.06   0.00
0.55    3701    2983    40      14.06   0.00
0.01    3701    20
27.64   3700    6039
20.31   3700    6043
36.12   3700    6056
24.18   3700    6054

is "performance" mode?
There is plenty enough load on 4 of the CPUs.
In performance mode I would expect 4.6 GHz.

You can see the request for pstate 46,
But only pstate 37 is granted:

root@s18:/home/doug# /home/doug/c/msr-decoder
8.) 0x198: IA32_PERF_STATUS     : CPU 0-5 :  37 :  37 :  37 :  37 :  37 :  37 :
B.) 0x770: IA32_PM_ENABLE: 0 : HWP disable
9.) 0x199: IA32_PERF_CTL        : CPU 0-5 :  46 :  46 :  46 :  46 :  46 :  46 :
C.) 0x1B0: IA32_ENERGY_PERF_BIAS: CPU 0-5 :   6 :   6 :   6 :   6 :   6 :   6 :
1.) 0x19C: IA32_THERM_STATUS: 883E0000
2.) 0x1AA: MSR_MISC_PWR_MGMT: 401CC0 EIST enabled Coordination enabled OOB Bit 8 reset OOB Bit 18 reset
3.) 0x1B1: IA32_PACKAGE_THERM_STATUS: 883C0000
4.) 0x64F: MSR_CORE_PERF_LIMIT_REASONS: 0
A.) 0x1FC: MSR_POWER_CTL: 3C005D : C1E disable : EEO disable : RHO disable

... Doug
Rafael J. Wysocki July 19, 2020, 11:22 a.m. UTC | #5
On Fri, Jul 17, 2020 at 11:23 PM Doug Smythies <dsmythies@telus.net> wrote:
>
> Hi Rafael,
>
> Thank you for your reply.
> I'll give it up after this, I promise.
>
> On 2020.07.16 05:00 Of Rafael J. Wysocki
> > On Thu, Jul 16, 2020 at 12:44 AM Doug Smythies <dsmythies@telus.net> wrote:
> > > On 2020.07.15 09:47 Rafael J. Wysocki wrote:
> > > > On Fri, Jul 10, 2020 at 3:34 PM Doug Smythies <dsmythies@telus.net> wrote:
> ...
> > > > you can simply set the EPB to 0 via
> > > > energy_perf_bias for all CPUs and it should stick.
> > >
> > > And I am saying I should not have to do that, or even know about it,
> > > when I want to use the performance governor.
> >
> > Again, cpufreq governors are on top of the EPB.
> >
> > > But yes, I expect the driver to remember the default, or otherwise set,
> > > value of EPB for all the other governors.
> >
> > We clearly don't agree here.
>
> Agreed. (That we disagree.)
>
> It is done with EPP in active mode with HWP between performance
> and powersave governors, so I struggle with treating the EPB
> case differently.

First, it is a different scope.  The EPP is the processor only (and
specifically the CPU whose HWP request is updated) and the EPB is the
whole SoC (at least).

Second, the EPB is still there on HWP systems, so the EPP is not a
replacement for it.  It is a separate control on top of it.

Hence, whatever is done to the EPP has no bearing on what should be
done to the EPB.  Moreover, because the other cpufreq drivers don't
update the EPB, there is no reason for intel_pstate to do that.

> > Also in the passive mode of intel_pstate, when the regular cpufreq
> > "performance" governor is in use, it's all about setting the frequency
> > to the max alone through min = max without touching any other knobs
> > which need to be adjusted separately.  That's how it's been always
> > working and changing it now may confuse the users who have learned to
> > rely on this behavior.
>
> But the behaviour is inconsistent anyhow.
>
> How can we possibly claim that this:
>
> doug@s18:~$ sudo ~/turbostat --quiet --show Busy%,Bzy_MHz,PkgTmp,PkgWatt,GFXWatt,IRQ --interval 6
> Busy%   Bzy_MHz IRQ     PkgTmp  PkgWatt GFXWatt
> 18.12   3700    25782   38      13.94   0.00
> 0.55    3701    3000    38      13.94   0.00
> 0.01    3701    49
> 19.29   3700    5529
> 35.97   3700    6051
> 26.99   3700    5177
> 25.92   3700    5976
> Busy%   Bzy_MHz IRQ     PkgTmp  PkgWatt GFXWatt
> 18.12   3700    27042   40      14.15   0.00
> 0.55    3701    2978    40      14.15   0.00
> 0.01    3701    22
> 30.01   3700    6042
> 28.09   3700    6044
> 29.18   3700    6046
> 20.91   3700    5910
> Busy%   Bzy_MHz IRQ     PkgTmp  PkgWatt GFXWatt
> 18.13   3700    27195   40      14.06   0.00
> 0.55    3701    2983    40      14.06   0.00
> 0.01    3701    20
> 27.64   3700    6039
> 20.31   3700    6043
> 36.12   3700    6056
> 24.18   3700    6054
>
> is "performance" mode?
> There is plenty enough load on 4 of the CPUs.
> In performance mode I would expect 4.6 GHz.
>
> You can see the request for pstate 46,
> But only pstate 37 is granted:

Well, that's how it goes, on this particular system.

What a "regular" cpufreq driver can do (and which also applies to
intel_pstate in the passive mode) is to ask for the frequency selected
by the governor.  Anything else is beyond the scope of its use.
Anything else would be inconsistent and so it may be confusing.

> root@s18:/home/doug# /home/doug/c/msr-decoder
> 8.) 0x198: IA32_PERF_STATUS     : CPU 0-5 :  37 :  37 :  37 :  37 :  37 :  37 :
> B.) 0x770: IA32_PM_ENABLE: 0 : HWP disable
> 9.) 0x199: IA32_PERF_CTL        : CPU 0-5 :  46 :  46 :  46 :  46 :  46 :  46 :
> C.) 0x1B0: IA32_ENERGY_PERF_BIAS: CPU 0-5 :   6 :   6 :   6 :   6 :   6 :   6 :
> 1.) 0x19C: IA32_THERM_STATUS: 883E0000
> 2.) 0x1AA: MSR_MISC_PWR_MGMT: 401CC0 EIST enabled Coordination enabled OOB Bit 8 reset OOB Bit 18 reset
> 3.) 0x1B1: IA32_PACKAGE_THERM_STATUS: 883C0000
> 4.) 0x64F: MSR_CORE_PERF_LIMIT_REASONS: 0
> A.) 0x1FC: MSR_POWER_CTL: 3C005D : C1E disable : EEO disable : RHO disable

As a general rule, on Linux you'll never see EPB==0 after a fresh boot
and so it needs to be adjusted via sysfs.

If you get better results with EPB==0, set it to 0, but that may not
match everybody's needs (which is why it is never 0 after a fresh boot
in the first place).

Patch
diff mbox series

diff --git a/arch/x86/kernel/cpu/intel_epb.c b/arch/x86/kernel/cpu/intel_epb.c
index f4dd73396f28..b536e381cd56 100644
--- a/arch/x86/kernel/cpu/intel_epb.c
+++ b/arch/x86/kernel/cpu/intel_epb.c
@@ -74,7 +74,8 @@  static int intel_epb_save(void)

 static void intel_epb_restore(void)
 {
-       u64 val = this_cpu_read(saved_epb);
+//     u64 val = this_cpu_read(saved_epb);
+       u64 val = 6;
        u64 epb;

        rdmsrl(MSR_IA32_ENERGY_PERF_BIAS, epb);