Message ID | 1605148.8jT99SsvVP@aspire.rjw.lan (mailing list archive) |
---|---|
State | Accepted, archived |
Delegated to: | Rafael Wysocki |
Headers | show |
Series | PM / arch: x86: MSR_IA32_ENERGY_PERF_BIAS handling fixes and sysfs i/f | expand |
On 3/21/19 11:18 PM, Rafael J. Wysocki wrote: > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > > The current handling of MSR_IA32_ENERGY_PERF_BIAS in the kernel is > problematic, because it may cause changes made by user space to that > MSR (with the help of the x86_energy_perf_policy tool, for example) > to be lost every time a CPU goes offline and then back online as well > as during system-wide power management transitions into sleep states > and back into the working state. > > The first problem is that if the current EPB value for a CPU going > online is 0 ('performance'), the kernel will change it to 6 ('normal') > regardless of whether or not this is the first bring-up of that CPU. > That also happens during system-wide resume from sleep states > (including, but not limited to, hibernation). However, the EPB may > have been adjusted by user space this way and the kernel should not > blindly override that setting. > > The second problem is that if the platform firmware resets the EPB > values for any CPUs during system-wide resume from a sleep state, > the kernel will not restore their previous EPB values that may > have been set by user space before the preceding system-wide > suspend transition. Again, that behavior may at least be confusing > from the user space perspective. > > In order to address these issues, rework the handling of > MSR_IA32_ENERGY_PERF_BIAS so that the EPB value is saved on CPU > offline and restored on CPU online as well as (for the boot CPU) > during the syscore stages of system-wide suspend and resume > transitions, respectively. > > However, retain the policy by which the EPB is set to 6 ('normal') > on the first bring-up of each CPU if its initial value is 0, based > on the observation that 0 may mean 'not initialized' just as well as > 'performance' in that case. > > While at it, move the MSR_IA32_ENERGY_PERF_BIAS handling code into > a separate file and document it in Documentation/admin-guide. > > Fixes: abe48b108247 (x86, intel, power: Initialize MSR_IA32_ENERGY_PERF_BIAS) > Fixes: b51ef52df71c (x86/cpu: Restore MSR_IA32_ENERGY_PERF_BIAS after resume) > Reported-by: Thomas Renninger <trenn@suse.de> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > --- > Reviewed-by: Hannes Reinecke <hare@suse.com> Cheers, Hannes
On Thu, Mar 21, 2019 at 11:18:01PM +0100, Rafael J. Wysocki wrote: > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > > The current handling of MSR_IA32_ENERGY_PERF_BIAS in the kernel is > problematic, because it may cause changes made by user space to that > MSR (with the help of the x86_energy_perf_policy tool, for example) One more reason to control MSR accesses from userspace. I'm working on a series to even completely forbid accesses to some MSRs over /dev/msr so I think accessing MSR_IA32_ENERGY_PERF_BIAS solely over the new interface in patch 2 would be much better. So, you're carrying those and you'd like to have an ACK from me? Btw, a couple of nitpicks below. > Index: linux-pm/arch/x86/kernel/cpu/intel_epb.c > =================================================================== > --- /dev/null > +++ linux-pm/arch/x86/kernel/cpu/intel_epb.c > @@ -0,0 +1,131 @@ > +// SPDX-License-Identifier: GPL-2.0 ... > +static DEFINE_PER_CPU(u8, saved_epb); > + > +#define EPB_MASK 0x0fULL > +#define EPB_SAVED 0x10ULL > + > +static int intel_epb_save(void) I'd drop that "intel_epb_" prefix from those static functions, but your call... > +{ > + u64 epb; > + > + rdmsrl(MSR_IA32_ENERGY_PERF_BIAS, epb); > + /* > + * Ensure that saved_epb will always be nonzero after this write even if > + * the EPB value read from the MSR is 0. > + */ > + this_cpu_write(saved_epb, (epb & EPB_MASK) | EPB_SAVED); > + > + return 0; > +} ... > Index: linux-pm/Documentation/admin-guide/pm/intel_epb.rst > =================================================================== > --- /dev/null > +++ linux-pm/Documentation/admin-guide/pm/intel_epb.rst WARNING: Missing or malformed SPDX-License-Identifier tag in line 1 #345: FILE: Documentation/admin-guide/pm/intel_epb.rst:1: +====================================== > @@ -0,0 +1,6 @@ > +====================================== > +Intel Performance and Energy Bias Hint > +====================================== > + > +.. kernel-doc:: arch/x86/kernel/cpu/intel_epb.c > + :doc: overview > Index: linux-pm/Documentation/admin-guide/pm/working-state.rst > =================================================================== > --- linux-pm.orig/Documentation/admin-guide/pm/working-state.rst > +++ linux-pm/Documentation/admin-guide/pm/working-state.rst > @@ -8,3 +8,4 @@ Working-State Power Management > cpuidle > cpufreq > intel_pstate > + intel_epb >
On Fri, 22 Mar 2019, Borislav Petkov wrote: > On Thu, Mar 21, 2019 at 11:18:01PM +0100, Rafael J. Wysocki wrote: > > Index: linux-pm/Documentation/admin-guide/pm/intel_epb.rst > > =================================================================== > > --- /dev/null > > +++ linux-pm/Documentation/admin-guide/pm/intel_epb.rst > > WARNING: Missing or malformed SPDX-License-Identifier tag in line 1 > #345: FILE: Documentation/admin-guide/pm/intel_epb.rst:1: > +====================================== We have no proper decision/recommendation about documentation licensing. That's being worked on. Thanks, tglx
On Fri, Mar 22, 2019 at 03:31:54PM +0100, Thomas Gleixner wrote: > We have no proper decision/recommendation about documentation > licensing. That's being worked on. Ok, I'm ignoring this checkpatch warning from now on. Thx.
On Fri, 22 Mar 2019, Borislav Petkov wrote: > On Fri, Mar 22, 2019 at 03:31:54PM +0100, Thomas Gleixner wrote: > > We have no proper decision/recommendation about documentation > > licensing. That's being worked on. > > Ok, I'm ignoring this checkpatch warning from now on. Only for Documentation/* files please.
Thanks Rafael for your quick look at and all the time you spend for this! A /sys userspace knob will certainly not be enough for us. You'll need a tool installed fixing this. powertop on laptops or tuned on servers or a well hidden bootup quirk or whatsoever. The patch I sent with this part: + if (acpi_gbl_FADT.preferred_profile == PM_PERFORMANCE_SERVER || + acpi_gbl_FADT.preferred_profile == PM_ENTERPRISE_SERVER) + return; and not touching the EBP value then should at least match most of our users and OEMs who want a "performance" setting out of the box and set this on purpose. Even nicer would be compile option to not touch this at all. On Thursday, March 21, 2019 11:18:01 PM CET Rafael J. Wysocki wrote: > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com> ... > + * > + * Second, on many systems the initial EPB value coming from the platform > + * firmware is 0 ('performance') and at least on some of them that is > because + * the platform firmware does not initialize EPB Why does the CPU not initialize this value to 6? That would allow OEMs/BIOS to also suggest an init value for the system. We should try to get microcode people or whoever is in charge to initialize this value "properly" if Intel thinks 6 is the correct init value. > at all with the > assumption that + * the OS will do that anyway. That sometimes is > problematic, as it may cause + * the system battery to drain too fast, for > example, so it is better to adjust + * it on CPU bring-up and if the > initial EPB value for a given CPU is 0, the + * kernel changes it to 6 > ('normal'). I have an idea to let the kernel more decide about such policies. It's a nice example that it makes sense to let the kernel set default values. But not unconditionally, according to what the system is intended to do. I wanted to do this for quite some time.., I hopefully find the time and motivation now. Thanks Rafael. Sorry for the somewhat rude sounding previous mail, that was not on purpose. You helped me quite a lot in the past and you obviously still do. Thomas
On Fri, Mar 22, 2019 at 05:27:43PM +0100, Thomas Renninger wrote: > Thanks Rafael for your quick look at and all the time you spend for this! > > A /sys userspace knob will certainly not be enough for us. > You'll need a tool installed fixing this. We won't pick up the second patch converting to the sysfs interface of course - only the first one. The second patch is beginning the slow phasing out of the direct /dev/msr access.
On Fri, 2019-03-22 at 17:12 +0100, Thomas Gleixner wrote: > On Fri, 22 Mar 2019, Borislav Petkov wrote: > > > On Fri, Mar 22, 2019 at 03:31:54PM +0100, Thomas Gleixner wrote: > > > We have no proper decision/recommendation about documentation > > > licensing. That's being worked on. > > > > Ok, I'm ignoring this checkpatch warning from now on. > > Only for Documentation/* files please. Perhaps --- scripts/checkpatch.pl | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl index d0001fd1112d..00c1457fe79b 100755 --- a/scripts/checkpatch.pl +++ b/scripts/checkpatch.pl @@ -3058,7 +3058,8 @@ sub process { "Improper SPDX comment style for '$realfile', please use '$comment' instead\n" . $herecurr); } - if ($comment !~ /^$/ && + if ($realfile !~ m@^Documentation/@ && + $comment !~ /^$/ && $rawline !~ m@^\+\Q$comment\E SPDX-License-Identifier: @) { WARN("SPDX_LICENSE_TAG", "Missing or malformed SPDX-License-Identifier tag in line $checklicenseline\n" . $herecurr);
On Fri, Mar 22, 2019 at 3:28 PM Borislav Petkov <bp@alien8.de> wrote: > > On Thu, Mar 21, 2019 at 11:18:01PM +0100, Rafael J. Wysocki wrote: > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > > > > The current handling of MSR_IA32_ENERGY_PERF_BIAS in the kernel is > > problematic, because it may cause changes made by user space to that > > MSR (with the help of the x86_energy_perf_policy tool, for example) > > One more reason to control MSR accesses from userspace. I'm working on > a series to even completely forbid accesses to some MSRs over /dev/msr > so I think accessing MSR_IA32_ENERGY_PERF_BIAS solely over the new > interface in patch 2 would be much better. > > So, you're carrying those and you'd like to have an ACK from me? > > Btw, a couple of nitpicks below. > > > Index: linux-pm/arch/x86/kernel/cpu/intel_epb.c > > =================================================================== > > --- /dev/null > > +++ linux-pm/arch/x86/kernel/cpu/intel_epb.c > > @@ -0,0 +1,131 @@ > > +// SPDX-License-Identifier: GPL-2.0 > > ... > > > +static DEFINE_PER_CPU(u8, saved_epb); > > + > > +#define EPB_MASK 0x0fULL > > +#define EPB_SAVED 0x10ULL > > + > > +static int intel_epb_save(void) > > I'd drop that "intel_epb_" prefix from those static functions, but your > call... They help indexing tools (elixir.bootlin.com and similar) a bit, so I'd rather retain them. > > +{ > > + u64 epb; > > + > > + rdmsrl(MSR_IA32_ENERGY_PERF_BIAS, epb); > > + /* > > + * Ensure that saved_epb will always be nonzero after this write even if > > + * the EPB value read from the MSR is 0. > > + */ > > + this_cpu_write(saved_epb, (epb & EPB_MASK) | EPB_SAVED); > > + > > + return 0; > > +} > > ... > > > Index: linux-pm/Documentation/admin-guide/pm/intel_epb.rst > > =================================================================== > > --- /dev/null > > +++ linux-pm/Documentation/admin-guide/pm/intel_epb.rst > > WARNING: Missing or malformed SPDX-License-Identifier tag in line 1 > #345: FILE: Documentation/admin-guide/pm/intel_epb.rst:1: Well, this is documentation, so ...
On Thu, Mar 21, 2019 at 11:18:01PM +0100, Rafael J. Wysocki wrote: > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > > The current handling of MSR_IA32_ENERGY_PERF_BIAS in the kernel is > problematic, because it may cause changes made by user space to that > MSR (with the help of the x86_energy_perf_policy tool, for example) > to be lost every time a CPU goes offline and then back online as well > as during system-wide power management transitions into sleep states > and back into the working state. > > The first problem is that if the current EPB value for a CPU going > online is 0 ('performance'), the kernel will change it to 6 ('normal') > regardless of whether or not this is the first bring-up of that CPU. > That also happens during system-wide resume from sleep states > (including, but not limited to, hibernation). However, the EPB may > have been adjusted by user space this way and the kernel should not > blindly override that setting. > > The second problem is that if the platform firmware resets the EPB > values for any CPUs during system-wide resume from a sleep state, > the kernel will not restore their previous EPB values that may > have been set by user space before the preceding system-wide > suspend transition. Again, that behavior may at least be confusing > from the user space perspective. > > In order to address these issues, rework the handling of > MSR_IA32_ENERGY_PERF_BIAS so that the EPB value is saved on CPU > offline and restored on CPU online as well as (for the boot CPU) > during the syscore stages of system-wide suspend and resume > transitions, respectively. > > However, retain the policy by which the EPB is set to 6 ('normal') > on the first bring-up of each CPU if its initial value is 0, based > on the observation that 0 may mean 'not initialized' just as well as > 'performance' in that case. > > While at it, move the MSR_IA32_ENERGY_PERF_BIAS handling code into > a separate file and document it in Documentation/admin-guide. > > Fixes: abe48b108247 (x86, intel, power: Initialize MSR_IA32_ENERGY_PERF_BIAS) > Fixes: b51ef52df71c (x86/cpu: Restore MSR_IA32_ENERGY_PERF_BIAS after resume) > Reported-by: Thomas Renninger <trenn@suse.de> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Borislav Petkov <bp@suse.de>
Index: linux-pm/include/linux/cpuhotplug.h =================================================================== --- linux-pm.orig/include/linux/cpuhotplug.h +++ linux-pm/include/linux/cpuhotplug.h @@ -147,6 +147,7 @@ enum cpuhp_state { CPUHP_AP_X86_VDSO_VMA_ONLINE, CPUHP_AP_IRQ_AFFINITY_ONLINE, CPUHP_AP_ARM_MVEBU_SYNC_CLOCKS, + CPUHP_AP_X86_INTEL_EPB_ONLINE, CPUHP_AP_PERF_ONLINE, CPUHP_AP_PERF_X86_ONLINE, CPUHP_AP_PERF_X86_UNCORE_ONLINE, Index: linux-pm/arch/x86/kernel/cpu/intel.c =================================================================== --- linux-pm.orig/arch/x86/kernel/cpu/intel.c +++ linux-pm/arch/x86/kernel/cpu/intel.c @@ -596,36 +596,6 @@ detect_keyid_bits: c->x86_phys_bits -= keyid_bits; } -static void init_intel_energy_perf(struct cpuinfo_x86 *c) -{ - u64 epb; - - /* - * Initialize MSR_IA32_ENERGY_PERF_BIAS if not already initialized. - * (x86_energy_perf_policy(8) is available to change it at run-time.) - */ - if (!cpu_has(c, X86_FEATURE_EPB)) - return; - - rdmsrl(MSR_IA32_ENERGY_PERF_BIAS, epb); - if ((epb & 0xF) != ENERGY_PERF_BIAS_PERFORMANCE) - return; - - pr_warn_once("ENERGY_PERF_BIAS: Set to 'normal', was 'performance'\n"); - pr_warn_once("ENERGY_PERF_BIAS: View and update with x86_energy_perf_policy(8)\n"); - epb = (epb & ~0xF) | ENERGY_PERF_BIAS_NORMAL; - wrmsrl(MSR_IA32_ENERGY_PERF_BIAS, epb); -} - -static void intel_bsp_resume(struct cpuinfo_x86 *c) -{ - /* - * MSR_IA32_ENERGY_PERF_BIAS is lost across suspend/resume, - * so reinitialize it properly like during bootup: - */ - init_intel_energy_perf(c); -} - static void init_cpuid_fault(struct cpuinfo_x86 *c) { u64 msr; @@ -763,8 +733,6 @@ static void init_intel(struct cpuinfo_x8 if (cpu_has(c, X86_FEATURE_TME)) detect_tme(c); - init_intel_energy_perf(c); - init_intel_misc_features(c); } @@ -1023,9 +991,7 @@ static const struct cpu_dev intel_cpu_de .c_detect_tlb = intel_detect_tlb, .c_early_init = early_init_intel, .c_init = init_intel, - .c_bsp_resume = intel_bsp_resume, .c_x86_vendor = X86_VENDOR_INTEL, }; cpu_dev_register(intel_cpu_dev); - Index: linux-pm/arch/x86/kernel/cpu/common.c =================================================================== --- linux-pm.orig/arch/x86/kernel/cpu/common.c +++ linux-pm/arch/x86/kernel/cpu/common.c @@ -1864,23 +1864,6 @@ void cpu_init(void) } #endif -static void bsp_resume(void) -{ - if (this_cpu->c_bsp_resume) - this_cpu->c_bsp_resume(&boot_cpu_data); -} - -static struct syscore_ops cpu_syscore_ops = { - .resume = bsp_resume, -}; - -static int __init init_cpu_syscore(void) -{ - register_syscore_ops(&cpu_syscore_ops); - return 0; -} -core_initcall(init_cpu_syscore); - /* * The microcode loader calls this upon late microcode load to recheck features, * only when microcode has been updated. Caller holds microcode_mutex and CPU Index: linux-pm/arch/x86/kernel/cpu/cpu.h =================================================================== --- linux-pm.orig/arch/x86/kernel/cpu/cpu.h +++ linux-pm/arch/x86/kernel/cpu/cpu.h @@ -14,7 +14,6 @@ struct cpu_dev { void (*c_init)(struct cpuinfo_x86 *); void (*c_identify)(struct cpuinfo_x86 *); void (*c_detect_tlb)(struct cpuinfo_x86 *); - void (*c_bsp_resume)(struct cpuinfo_x86 *); int c_x86_vendor; #ifdef CONFIG_X86_32 /* Optional vendor specific routine to obtain the cache size. */ Index: linux-pm/arch/x86/kernel/cpu/Makefile =================================================================== --- linux-pm.orig/arch/x86/kernel/cpu/Makefile +++ linux-pm/arch/x86/kernel/cpu/Makefile @@ -28,7 +28,7 @@ obj-y += cpuid-deps.o obj-$(CONFIG_PROC_FS) += proc.o obj-$(CONFIG_X86_FEATURE_NAMES) += capflags.o powerflags.o -obj-$(CONFIG_CPU_SUP_INTEL) += intel.o intel_pconfig.o +obj-$(CONFIG_CPU_SUP_INTEL) += intel.o intel_pconfig.o intel_epb.o obj-$(CONFIG_CPU_SUP_AMD) += amd.o obj-$(CONFIG_CPU_SUP_HYGON) += hygon.o obj-$(CONFIG_CPU_SUP_CYRIX_32) += cyrix.o Index: linux-pm/arch/x86/kernel/cpu/intel_epb.c =================================================================== --- /dev/null +++ linux-pm/arch/x86/kernel/cpu/intel_epb.c @@ -0,0 +1,131 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Intel Performance and Energy Bias Hint support. + * + * Copyright (C) 2019 Intel Corporation + * + * Author: + * Rafael J. Wysocki <rafael.j.wysocki@intel.com> + */ + +#include <linux/cpuhotplug.h> +#include <linux/kernel.h> +#include <linux/syscore_ops.h> + +#include <asm/cpufeature.h> +#include <asm/msr.h> + +/** + * DOC: overview + * + * The Performance and Energy Bias Hint (EPB) allows software to specify its + * preference with respect to the power-performance tradeoffs present in the + * processor. Generally, the EPB is expected to be set by user space through + * the generic MSR interface (with the help of the x86_energy_perf_policy tool), + * but there are two reasons for the kernel to touch it. + * + * First, there are systems where the platform firmware resets the EPB during + * system-wide transitions from sleep states back into the working state + * effectively causing the previous EPB updates by user space to be lost. + * Thus the kernel needs to save the current EPB values for all CPUs during + * system-wide transitions to sleep states and restore them on the way back to + * the working state. That can be achieved by saving EPB for secondary CPUs + * when they are taken offline during transitions into system sleep states and + * for the boot CPU in a syscore suspend operation, so that it can be restored + * for the boot CPU in a syscore resume operation and for the other CPUs when + * they are brought back online. However, CPUs that are already offline when + * a system-wide PM transition is started are not taken offline again, but their + * EPB values may still be reset by the platform firmware during the transition, + * so in fact it is necessary to save the EPB of any CPU taken offline and to + * restore it when the given CPU goes back online at all times. + * + * Second, on many systems the initial EPB value coming from the platform + * firmware is 0 ('performance') and at least on some of them that is because + * the platform firmware does not initialize EPB at all with the assumption that + * the OS will do that anyway. That sometimes is problematic, as it may cause + * the system battery to drain too fast, for example, so it is better to adjust + * it on CPU bring-up and if the initial EPB value for a given CPU is 0, the + * kernel changes it to 6 ('normal'). + */ + +static DEFINE_PER_CPU(u8, saved_epb); + +#define EPB_MASK 0x0fULL +#define EPB_SAVED 0x10ULL + +static int intel_epb_save(void) +{ + u64 epb; + + rdmsrl(MSR_IA32_ENERGY_PERF_BIAS, epb); + /* + * Ensure that saved_epb will always be nonzero after this write even if + * the EPB value read from the MSR is 0. + */ + this_cpu_write(saved_epb, (epb & EPB_MASK) | EPB_SAVED); + + return 0; +} + +static void intel_epb_restore(void) +{ + u64 val = this_cpu_read(saved_epb); + u64 epb; + + rdmsrl(MSR_IA32_ENERGY_PERF_BIAS, epb); + if (val) { + val &= EPB_MASK; + } else { + /* + * Because intel_epb_save() has not run for the current CPU yet, + * it is going online for the first time, so if its EPB value is + * 0 ('performance') at this point, assume that it has not been + * initialized by the platform firmware and set it to 6 + * ('normal'). + */ + val = epb & EPB_MASK; + if (val == ENERGY_PERF_BIAS_PERFORMANCE) { + val = ENERGY_PERF_BIAS_NORMAL; + pr_warn_once("ENERGY_PERF_BIAS: Set to 'normal', was 'performance'\n"); + } + } + wrmsrl(MSR_IA32_ENERGY_PERF_BIAS, (epb & ~EPB_MASK) | val); +} + +static struct syscore_ops intel_epb_syscore_ops = { + .suspend = intel_epb_save, + .resume = intel_epb_restore, +}; + +static int intel_epb_online(unsigned int cpu) +{ + intel_epb_restore(); + return 0; +} + +static int intel_epb_offline(unsigned int cpu) +{ + return intel_epb_save(); +} + +static __init int intel_epb_init(void) +{ + int ret; + + if (!boot_cpu_has(X86_FEATURE_EPB)) + return -ENODEV; + + ret = cpuhp_setup_state(CPUHP_AP_X86_INTEL_EPB_ONLINE, + "x86/intel/epb:online", intel_epb_online, + intel_epb_offline); + if (ret < 0) + goto err_out_online; + + register_syscore_ops(&intel_epb_syscore_ops); + return 0; + +err_out_online: + cpuhp_remove_state(CPUHP_AP_X86_INTEL_EPB_ONLINE); + return ret; +} +subsys_initcall(intel_epb_init); Index: linux-pm/Documentation/admin-guide/pm/intel_epb.rst =================================================================== --- /dev/null +++ linux-pm/Documentation/admin-guide/pm/intel_epb.rst @@ -0,0 +1,6 @@ +====================================== +Intel Performance and Energy Bias Hint +====================================== + +.. kernel-doc:: arch/x86/kernel/cpu/intel_epb.c + :doc: overview Index: linux-pm/Documentation/admin-guide/pm/working-state.rst =================================================================== --- linux-pm.orig/Documentation/admin-guide/pm/working-state.rst +++ linux-pm/Documentation/admin-guide/pm/working-state.rst @@ -8,3 +8,4 @@ Working-State Power Management cpuidle cpufreq intel_pstate + intel_epb