Message ID | 000701d656be$c48083e0$4d818ba0$@net (mailing list archive) |
---|---|
State | Not Applicable, archived |
Headers | show |
Series | cpufreq: intel_pstate: EPB with performance governor | expand |
On Fri, Jul 10, 2020 at 3:34 PM Doug Smythies <dsmythies@telus.net> wrote: > > Hi Srinivas and/or Rafael, > > Can you please confirm or deny that an older > commit: > > commit 8442885fca09b2d26375b9fe507759879a6f661e > cpufreq: intel_pstate: Set EPP/EPB to 0 in performance mode > > has been superseded by: > > arch/x86/kernel/cpu/intel_epb.c No, it hasn't. However, intel_pstate only touches the EPB if EPP is not supported, which should become a non-issue after this patch posted by me yesterday: https://patchwork.kernel.org/patch/11663315/ If EPP is supported, intel_pstate will use it and it will never look at the EPB even. > and that now there is no way to have some default EPB (say 6) for > governors other than performance, while still getting an EPB of 0 > for the performance governor. If EPP is supported, what happens to the EPB is completely orthogonal to cpufreq etc. So it is possible to have the EPB different from 0, but it should be the same for all governors unless changed via energy_perf_bias. If EPP is not supported, though, then without the patch mentioned above, intel_pstate may fiddle with the EPB. > Additional notes: > Both my test computers have EPB as 0 upon startup, That is before intel_epb_init() runs, because it will change the EPB to "normal" (6). > But I also tried this: > > diff --git a/arch/x86/kernel/cpu/intel_epb.c b/arch/x86/kernel/cpu/intel_epb.c > index f4dd73396f28..b536e381cd56 100644 > --- a/arch/x86/kernel/cpu/intel_epb.c > +++ b/arch/x86/kernel/cpu/intel_epb.c > @@ -74,7 +74,8 @@ static int intel_epb_save(void) > > static void intel_epb_restore(void) > { > - u64 val = this_cpu_read(saved_epb); > +// u64 val = this_cpu_read(saved_epb); > + u64 val = 6; > u64 epb; > > rdmsrl(MSR_IA32_ENERGY_PERF_BIAS, epb); > > which did get rid of this message: > kernel: [ 0.102158] ENERGY_PERF_BIAS: Set to 'normal', was 'performance' Which is exactly what happens when this message is printed. Instead of commenting out the line of code above, which is not a correct thing to do in general, you can simply set the EPB to 0 via energy_perf_bias for all CPUs and it should stick. Thanks!
Hi Rafael, Thank you for your reply. On 2020.07.15 09:47 Rafael J. Wysocki wrote: > On Fri, Jul 10, 2020 at 3:34 PM Doug Smythies <dsmythies@telus.net> wrote: > > > > Hi Srinivas and/or Rafael, > > > > Can you please confirm or deny that an older > > commit: > > > > commit 8442885fca09b2d26375b9fe507759879a6f661e > > cpufreq: intel_pstate: Set EPP/EPB to 0 in performance mode > > > > has been superseded by: > > > > arch/x86/kernel/cpu/intel_epb.c > > No, it hasn't. > > However, intel_pstate only touches the EPB if EPP is not supported, > which should become a non-issue after this patch posted by me > yesterday: https://patchwork.kernel.org/patch/11663315/ > > If EPP is supported, intel_pstate will use it and it will never look > at the EPB even. > > > and that now there is no way to have some default EPB (say 6) for > > governors other than performance, while still getting an EPB of 0 > > for the performance governor. > > If EPP is supported, what happens to the EPB is completely orthogonal > to cpufreq etc. Yes, I know. I am talking about when HWP is disabled. And I do not understand your reference to that patch from yesterday, as it seems unrelated to me. > > So it is possible to have the EPB different from 0, but it should be > the same for all governors unless changed via energy_perf_bias. And I am saying that contradicts the earlier referenced patch. We want EPB set to 0 for performance mode, otherwise I challenge the name "performance" governor. Yes, EPB can be whatever for the other governors. > > If EPP is not supported, though, then without the patch mentioned > above, intel_pstate may fiddle with the EPB. > > > Additional notes: > > Both my test computers have EPB as 0 upon startup, > > That is before intel_epb_init() runs, because it will change the EPB > to "normal" (6). Yes, I know. > > > But I also tried this: > > > > diff --git a/arch/x86/kernel/cpu/intel_epb.c b/arch/x86/kernel/cpu/intel_epb.c > > index f4dd73396f28..b536e381cd56 100644 > > --- a/arch/x86/kernel/cpu/intel_epb.c > > +++ b/arch/x86/kernel/cpu/intel_epb.c > > @@ -74,7 +74,8 @@ static int intel_epb_save(void) > > > > static void intel_epb_restore(void) > > { > > - u64 val = this_cpu_read(saved_epb); > > +// u64 val = this_cpu_read(saved_epb); > > + u64 val = 6; > > u64 epb; > > > > rdmsrl(MSR_IA32_ENERGY_PERF_BIAS, epb); > > > > which did get rid of this message: > > kernel: [ 0.102158] ENERGY_PERF_BIAS: Set to 'normal', was 'performance' > > Which is exactly what happens when this message is printed. > > Instead of commenting out the line of code above, which is not a > correct thing to do in general, Yes, of course. That was just a test, and the only way I could think of to prove that the system started with it as 0. > you can simply set the EPB to 0 via > energy_perf_bias for all CPUs and it should stick. And I am saying I should not have to do that, or even know about it, when I want to use the performance governor. But yes, I expect the driver to remember the default, or otherwise set, value of EPB for all the other governors. ... Doug
Hi Doug, On Thu, Jul 16, 2020 at 12:44 AM Doug Smythies <dsmythies@telus.net> wrote: > > Hi Rafael, > > Thank you for your reply. > > On 2020.07.15 09:47 Rafael J. Wysocki wrote: > > On Fri, Jul 10, 2020 at 3:34 PM Doug Smythies <dsmythies@telus.net> wrote: > > > > > > Hi Srinivas and/or Rafael, > > > > > > Can you please confirm or deny that an older > > > commit: > > > > > > commit 8442885fca09b2d26375b9fe507759879a6f661e > > > cpufreq: intel_pstate: Set EPP/EPB to 0 in performance mode > > > > > > has been superseded by: > > > > > > arch/x86/kernel/cpu/intel_epb.c > > > > No, it hasn't. > > > > However, intel_pstate only touches the EPB if EPP is not supported, > > which should become a non-issue after this patch posted by me > > yesterday: https://patchwork.kernel.org/patch/11663315/ > > > > If EPP is supported, intel_pstate will use it and it will never look > > at the EPB even. > > > > > and that now there is no way to have some default EPB (say 6) for > > > governors other than performance, while still getting an EPB of 0 > > > for the performance governor. > > > > If EPP is supported, what happens to the EPB is completely orthogonal > > to cpufreq etc. > > Yes, I know. > I am talking about when HWP is disabled. I see. > And I do not understand your reference to that patch from yesterday, > as it seems unrelated to me. If you are referring to when HWP is disabled, then it is not related indeed. > > So it is possible to have the EPB different from 0, but it should be > > the same for all governors unless changed via energy_perf_bias. > > And I am saying that contradicts the earlier referenced patch. > We want EPB set to 0 for performance mode, The "performance" governor in cpufreq (or the active-mode "performance" scaling algorithm in intel_pstate) covers the CPU performance scaling only, while the EPB potentially affects the other system components too. Thus driving the EPB from the CPU performance scaling subsystem is not the right approach IMO. > otherwise I challenge the name "performance" governor. In that case "performance" means the maximum CPU capacity at the given EPB level. > Yes, EPB can be whatever for the other governors. CPU performance scaling governors and the EPB should not be directly related to each other. The EPB is system-wide, which generally means more than just CPUs (the uncore and memory may be affected by it in principle, for example). > > > > If EPP is not supported, though, then without the patch mentioned > > above, intel_pstate may fiddle with the EPB. > > > > > Additional notes: > > > Both my test computers have EPB as 0 upon startup, > > > > That is before intel_epb_init() runs, because it will change the EPB > > to "normal" (6). > > Yes, I know. > > > > > > But I also tried this: > > > > > > diff --git a/arch/x86/kernel/cpu/intel_epb.c b/arch/x86/kernel/cpu/intel_epb.c > > > index f4dd73396f28..b536e381cd56 100644 > > > --- a/arch/x86/kernel/cpu/intel_epb.c > > > +++ b/arch/x86/kernel/cpu/intel_epb.c > > > @@ -74,7 +74,8 @@ static int intel_epb_save(void) > > > > > > static void intel_epb_restore(void) > > > { > > > - u64 val = this_cpu_read(saved_epb); > > > +// u64 val = this_cpu_read(saved_epb); > > > + u64 val = 6; > > > u64 epb; > > > > > > rdmsrl(MSR_IA32_ENERGY_PERF_BIAS, epb); > > > > > > which did get rid of this message: > > > kernel: [ 0.102158] ENERGY_PERF_BIAS: Set to 'normal', was 'performance' > > > > Which is exactly what happens when this message is printed. > > > > Instead of commenting out the line of code above, which is not a > > correct thing to do in general, > > Yes, of course. That was just a test, and the only way I could think of > to prove that the system started with it as 0. > > > you can simply set the EPB to 0 via > > energy_perf_bias for all CPUs and it should stick. > > And I am saying I should not have to do that, or even know about it, > when I want to use the performance governor. Again, cpufreq governors are on top of the EPB. > But yes, I expect the driver to remember the default, or otherwise set, > value of EPB for all the other governors. We clearly don't agree here. Also in the passive mode of intel_pstate, when the regular cpufreq "performance" governor is in use, it's all about setting the frequency to the max alone through min = max without touching any other knobs which need to be adjusted separately. That's how it's been always working and changing it now may confuse the users who have learned to rely on this behavior. Thanks!
Hi Rafael, Thank you for your reply. I'll give it up after this, I promise. On 2020.07.16 05:00 Of Rafael J. Wysocki > On Thu, Jul 16, 2020 at 12:44 AM Doug Smythies <dsmythies@telus.net> wrote: > > On 2020.07.15 09:47 Rafael J. Wysocki wrote: > > > On Fri, Jul 10, 2020 at 3:34 PM Doug Smythies <dsmythies@telus.net> wrote: ... > > > you can simply set the EPB to 0 via > > > energy_perf_bias for all CPUs and it should stick. > > > > And I am saying I should not have to do that, or even know about it, > > when I want to use the performance governor. > > Again, cpufreq governors are on top of the EPB. > > > But yes, I expect the driver to remember the default, or otherwise set, > > value of EPB for all the other governors. > > We clearly don't agree here. Agreed. (That we disagree.) It is done with EPP in active mode with HWP between performance and powersave governors, so I struggle with treating the EPB case differently. > Also in the passive mode of intel_pstate, when the regular cpufreq > "performance" governor is in use, it's all about setting the frequency > to the max alone through min = max without touching any other knobs > which need to be adjusted separately. That's how it's been always > working and changing it now may confuse the users who have learned to > rely on this behavior. But the behaviour is inconsistent anyhow. How can we possibly claim that this: doug@s18:~$ sudo ~/turbostat --quiet --show Busy%,Bzy_MHz,PkgTmp,PkgWatt,GFXWatt,IRQ --interval 6 Busy% Bzy_MHz IRQ PkgTmp PkgWatt GFXWatt 18.12 3700 25782 38 13.94 0.00 0.55 3701 3000 38 13.94 0.00 0.01 3701 49 19.29 3700 5529 35.97 3700 6051 26.99 3700 5177 25.92 3700 5976 Busy% Bzy_MHz IRQ PkgTmp PkgWatt GFXWatt 18.12 3700 27042 40 14.15 0.00 0.55 3701 2978 40 14.15 0.00 0.01 3701 22 30.01 3700 6042 28.09 3700 6044 29.18 3700 6046 20.91 3700 5910 Busy% Bzy_MHz IRQ PkgTmp PkgWatt GFXWatt 18.13 3700 27195 40 14.06 0.00 0.55 3701 2983 40 14.06 0.00 0.01 3701 20 27.64 3700 6039 20.31 3700 6043 36.12 3700 6056 24.18 3700 6054 is "performance" mode? There is plenty enough load on 4 of the CPUs. In performance mode I would expect 4.6 GHz. You can see the request for pstate 46, But only pstate 37 is granted: root@s18:/home/doug# /home/doug/c/msr-decoder 8.) 0x198: IA32_PERF_STATUS : CPU 0-5 : 37 : 37 : 37 : 37 : 37 : 37 : B.) 0x770: IA32_PM_ENABLE: 0 : HWP disable 9.) 0x199: IA32_PERF_CTL : CPU 0-5 : 46 : 46 : 46 : 46 : 46 : 46 : C.) 0x1B0: IA32_ENERGY_PERF_BIAS: CPU 0-5 : 6 : 6 : 6 : 6 : 6 : 6 : 1.) 0x19C: IA32_THERM_STATUS: 883E0000 2.) 0x1AA: MSR_MISC_PWR_MGMT: 401CC0 EIST enabled Coordination enabled OOB Bit 8 reset OOB Bit 18 reset 3.) 0x1B1: IA32_PACKAGE_THERM_STATUS: 883C0000 4.) 0x64F: MSR_CORE_PERF_LIMIT_REASONS: 0 A.) 0x1FC: MSR_POWER_CTL: 3C005D : C1E disable : EEO disable : RHO disable ... Doug
On Fri, Jul 17, 2020 at 11:23 PM Doug Smythies <dsmythies@telus.net> wrote: > > Hi Rafael, > > Thank you for your reply. > I'll give it up after this, I promise. > > On 2020.07.16 05:00 Of Rafael J. Wysocki > > On Thu, Jul 16, 2020 at 12:44 AM Doug Smythies <dsmythies@telus.net> wrote: > > > On 2020.07.15 09:47 Rafael J. Wysocki wrote: > > > > On Fri, Jul 10, 2020 at 3:34 PM Doug Smythies <dsmythies@telus.net> wrote: > ... > > > > you can simply set the EPB to 0 via > > > > energy_perf_bias for all CPUs and it should stick. > > > > > > And I am saying I should not have to do that, or even know about it, > > > when I want to use the performance governor. > > > > Again, cpufreq governors are on top of the EPB. > > > > > But yes, I expect the driver to remember the default, or otherwise set, > > > value of EPB for all the other governors. > > > > We clearly don't agree here. > > Agreed. (That we disagree.) > > It is done with EPP in active mode with HWP between performance > and powersave governors, so I struggle with treating the EPB > case differently. First, it is a different scope. The EPP is the processor only (and specifically the CPU whose HWP request is updated) and the EPB is the whole SoC (at least). Second, the EPB is still there on HWP systems, so the EPP is not a replacement for it. It is a separate control on top of it. Hence, whatever is done to the EPP has no bearing on what should be done to the EPB. Moreover, because the other cpufreq drivers don't update the EPB, there is no reason for intel_pstate to do that. > > Also in the passive mode of intel_pstate, when the regular cpufreq > > "performance" governor is in use, it's all about setting the frequency > > to the max alone through min = max without touching any other knobs > > which need to be adjusted separately. That's how it's been always > > working and changing it now may confuse the users who have learned to > > rely on this behavior. > > But the behaviour is inconsistent anyhow. > > How can we possibly claim that this: > > doug@s18:~$ sudo ~/turbostat --quiet --show Busy%,Bzy_MHz,PkgTmp,PkgWatt,GFXWatt,IRQ --interval 6 > Busy% Bzy_MHz IRQ PkgTmp PkgWatt GFXWatt > 18.12 3700 25782 38 13.94 0.00 > 0.55 3701 3000 38 13.94 0.00 > 0.01 3701 49 > 19.29 3700 5529 > 35.97 3700 6051 > 26.99 3700 5177 > 25.92 3700 5976 > Busy% Bzy_MHz IRQ PkgTmp PkgWatt GFXWatt > 18.12 3700 27042 40 14.15 0.00 > 0.55 3701 2978 40 14.15 0.00 > 0.01 3701 22 > 30.01 3700 6042 > 28.09 3700 6044 > 29.18 3700 6046 > 20.91 3700 5910 > Busy% Bzy_MHz IRQ PkgTmp PkgWatt GFXWatt > 18.13 3700 27195 40 14.06 0.00 > 0.55 3701 2983 40 14.06 0.00 > 0.01 3701 20 > 27.64 3700 6039 > 20.31 3700 6043 > 36.12 3700 6056 > 24.18 3700 6054 > > is "performance" mode? > There is plenty enough load on 4 of the CPUs. > In performance mode I would expect 4.6 GHz. > > You can see the request for pstate 46, > But only pstate 37 is granted: Well, that's how it goes, on this particular system. What a "regular" cpufreq driver can do (and which also applies to intel_pstate in the passive mode) is to ask for the frequency selected by the governor. Anything else is beyond the scope of its use. Anything else would be inconsistent and so it may be confusing. > root@s18:/home/doug# /home/doug/c/msr-decoder > 8.) 0x198: IA32_PERF_STATUS : CPU 0-5 : 37 : 37 : 37 : 37 : 37 : 37 : > B.) 0x770: IA32_PM_ENABLE: 0 : HWP disable > 9.) 0x199: IA32_PERF_CTL : CPU 0-5 : 46 : 46 : 46 : 46 : 46 : 46 : > C.) 0x1B0: IA32_ENERGY_PERF_BIAS: CPU 0-5 : 6 : 6 : 6 : 6 : 6 : 6 : > 1.) 0x19C: IA32_THERM_STATUS: 883E0000 > 2.) 0x1AA: MSR_MISC_PWR_MGMT: 401CC0 EIST enabled Coordination enabled OOB Bit 8 reset OOB Bit 18 reset > 3.) 0x1B1: IA32_PACKAGE_THERM_STATUS: 883C0000 > 4.) 0x64F: MSR_CORE_PERF_LIMIT_REASONS: 0 > A.) 0x1FC: MSR_POWER_CTL: 3C005D : C1E disable : EEO disable : RHO disable As a general rule, on Linux you'll never see EPB==0 after a fresh boot and so it needs to be adjusted via sysfs. If you get better results with EPB==0, set it to 0, but that may not match everybody's needs (which is why it is never 0 after a fresh boot in the first place).
diff --git a/arch/x86/kernel/cpu/intel_epb.c b/arch/x86/kernel/cpu/intel_epb.c index f4dd73396f28..b536e381cd56 100644 --- a/arch/x86/kernel/cpu/intel_epb.c +++ b/arch/x86/kernel/cpu/intel_epb.c @@ -74,7 +74,8 @@ static int intel_epb_save(void) static void intel_epb_restore(void) { - u64 val = this_cpu_read(saved_epb); +// u64 val = this_cpu_read(saved_epb); + u64 val = 6; u64 epb; rdmsrl(MSR_IA32_ENERGY_PERF_BIAS, epb);