Message ID | baff264285f6e585df757d58b17788feabc68918.1387403066.git.len.brown@intel.com (mailing list archive) |
---|---|
State | Not Applicable, archived |
Headers | show |
* Len Brown <lenb@kernel.org> wrote: > From: Len Brown <len.brown@intel.com> > > Linux 3.10 changed the timing of how thread_info->flags is touched: > > x86: Use generic idle loop > (7d1a941731fabf27e5fb6edbebb79fe856edb4e5) > > This caused Intel NHM-EX and WSM-EX servers to experience a large number > of immediate MONITOR/MWAIT break wakeups, which caused cpuidle to demote > from deep C-states to shallow C-states, which caused these platforms > to experience a significant increase in idle power. > > Note that this issue was already present before the commit above, > however, it wasn't seen often enough to be noticed in power measurements. > > Here we extend an errata workaround from the Core2 EX "Dunnington" > to extend to NHM-EX and WSM-EX, to prevent these immediate > returns from MWAIT, reducing idle power on these platforms. > > While only acpi_idle ran on Dunnington, intel_idle > may also run on these two newer systems. > As of today, there are no other models that are known > to need this tweak. > > ref: https://lkml.org/lkml/2013/12/7/22 > Signed-off-by: Len Brown <len.brown@intel.com> > Cc: <stable@vger.kernel.org> # 3.12.x, 3.11.x, 3.10.x > --- > arch/x86/kernel/cpu/intel.c | 3 ++- > drivers/idle/intel_idle.c | 3 +++ > 2 files changed, 5 insertions(+), 1 deletion(-) > > diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c > index dc1ec0d..ea04b34 100644 > --- a/arch/x86/kernel/cpu/intel.c > +++ b/arch/x86/kernel/cpu/intel.c > @@ -387,7 +387,8 @@ static void init_intel(struct cpuinfo_x86 *c) > set_cpu_cap(c, X86_FEATURE_PEBS); > } > > - if (c->x86 == 6 && c->x86_model == 29 && cpu_has_clflush) > + if (c->x86 == 6 && cpu_has_clflush && > + (c->x86_model == 29 || c->x86_model == 46 || c->x86_model == 47)) > set_cpu_cap(c, X86_FEATURE_CLFLUSH_MONITOR); > > #ifdef CONFIG_X86_64 > diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c > index 92d1206..f80b700 100644 > --- a/drivers/idle/intel_idle.c > +++ b/drivers/idle/intel_idle.c > @@ -377,6 +377,9 @@ static int intel_idle(struct cpuidle_device *dev, > > if (!current_set_polling_and_test()) { > > + if (this_cpu_has(X86_FEATURE_CLFLUSH_MONITOR)) > + clflush((void *)¤t_thread_info()->flags); > + > __monitor((void *)¤t_thread_info()->flags, 0, 0); I don't think either of these casts to '(void *)' is needed, both the clflush() and __monitor() will take pointers. Looks good to me otherwise - except that maybe the best way to represent this quirk would be for the CLFLUSH+MONITOR sequence to be a single 'instruction' which is patched in dynamically during bootup, using our usual alternatives framework. On non-affected CPUs a NOP would remain in place of the CLFLUSH, eliminating the branch above. So the whole thing could be thought of as a slightly more complex 'monitor' instruction - not exposing the quirk details to actual usage sites. Thanks, Ingo -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
... or just use static_cpu_has() maybe? Ingo Molnar <mingo@kernel.org> wrote: > >* Len Brown <lenb@kernel.org> wrote: > >> From: Len Brown <len.brown@intel.com> >> >> Linux 3.10 changed the timing of how thread_info->flags is touched: >> >> x86: Use generic idle loop >> (7d1a941731fabf27e5fb6edbebb79fe856edb4e5) >> >> This caused Intel NHM-EX and WSM-EX servers to experience a large >number >> of immediate MONITOR/MWAIT break wakeups, which caused cpuidle to >demote >> from deep C-states to shallow C-states, which caused these platforms >> to experience a significant increase in idle power. >> >> Note that this issue was already present before the commit above, >> however, it wasn't seen often enough to be noticed in power >measurements. >> >> Here we extend an errata workaround from the Core2 EX "Dunnington" >> to extend to NHM-EX and WSM-EX, to prevent these immediate >> returns from MWAIT, reducing idle power on these platforms. >> >> While only acpi_idle ran on Dunnington, intel_idle >> may also run on these two newer systems. >> As of today, there are no other models that are known >> to need this tweak. >> >> ref: https://lkml.org/lkml/2013/12/7/22 >> Signed-off-by: Len Brown <len.brown@intel.com> >> Cc: <stable@vger.kernel.org> # 3.12.x, 3.11.x, 3.10.x >> --- >> arch/x86/kernel/cpu/intel.c | 3 ++- >> drivers/idle/intel_idle.c | 3 +++ >> 2 files changed, 5 insertions(+), 1 deletion(-) >> >> diff --git a/arch/x86/kernel/cpu/intel.c >b/arch/x86/kernel/cpu/intel.c >> index dc1ec0d..ea04b34 100644 >> --- a/arch/x86/kernel/cpu/intel.c >> +++ b/arch/x86/kernel/cpu/intel.c >> @@ -387,7 +387,8 @@ static void init_intel(struct cpuinfo_x86 *c) >> set_cpu_cap(c, X86_FEATURE_PEBS); >> } >> >> - if (c->x86 == 6 && c->x86_model == 29 && cpu_has_clflush) >> + if (c->x86 == 6 && cpu_has_clflush && >> + (c->x86_model == 29 || c->x86_model == 46 || c->x86_model == >47)) >> set_cpu_cap(c, X86_FEATURE_CLFLUSH_MONITOR); >> >> #ifdef CONFIG_X86_64 >> diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c >> index 92d1206..f80b700 100644 >> --- a/drivers/idle/intel_idle.c >> +++ b/drivers/idle/intel_idle.c >> @@ -377,6 +377,9 @@ static int intel_idle(struct cpuidle_device *dev, >> >> if (!current_set_polling_and_test()) { >> >> + if (this_cpu_has(X86_FEATURE_CLFLUSH_MONITOR)) >> + clflush((void *)¤t_thread_info()->flags); >> + >> __monitor((void *)¤t_thread_info()->flags, 0, 0); > >I don't think either of these casts to '(void *)' is needed, both the >clflush() and __monitor() will take pointers. > >Looks good to me otherwise - except that maybe the best way to >represent this quirk would be for the CLFLUSH+MONITOR sequence to be a >single 'instruction' which is patched in dynamically during bootup, >using our usual alternatives framework. > >On non-affected CPUs a NOP would remain in place of the CLFLUSH, >eliminating the branch above. > >So the whole thing could be thought of as a slightly more complex >'monitor' instruction - not exposing the quirk details to actual usage >sites. > >Thanks, > > Ingo
On Thu, Dec 19, 2013 at 06:40:41AM -0800, H. Peter Anvin wrote:
> ... or just use static_cpu_has() maybe?
Right, if we can get the compiler to generate the shortest CLFLUSH of 3
bytes with register indirect addressing for the operand, then stomping
over those 3 bytes with the alternatives would be the fastest solution.
If, of course, three NOPs which get discarded in the front end are
cheaper than the near JMP from static_cpu_has.
On 12/19/2013 04:22 AM, Ingo Molnar wrote: >> diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c >> index 92d1206..f80b700 100644 >> --- a/drivers/idle/intel_idle.c >> +++ b/drivers/idle/intel_idle.c >> @@ -377,6 +377,9 @@ static int intel_idle(struct cpuidle_device *dev, >> >> if (!current_set_polling_and_test()) { >> >> + if (this_cpu_has(X86_FEATURE_CLFLUSH_MONITOR)) >> + clflush((void *)¤t_thread_info()->flags); >> + >> __monitor((void *)¤t_thread_info()->flags, 0, 0); > > I don't think either of these casts to '(void *)' is needed, both the > clflush() and __monitor() will take pointers. __monitor() currently doesn't, which is idiotic. -hpa -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
* H. Peter Anvin <hpa@zytor.com> wrote: > On 12/19/2013 04:22 AM, Ingo Molnar wrote: > >> diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c > >> index 92d1206..f80b700 100644 > >> --- a/drivers/idle/intel_idle.c > >> +++ b/drivers/idle/intel_idle.c > >> @@ -377,6 +377,9 @@ static int intel_idle(struct cpuidle_device *dev, > >> > >> if (!current_set_polling_and_test()) { > >> > >> + if (this_cpu_has(X86_FEATURE_CLFLUSH_MONITOR)) > >> + clflush((void *)¤t_thread_info()->flags); > >> + > >> __monitor((void *)¤t_thread_info()->flags, 0, 0); > > > > I don't think either of these casts to '(void *)' is needed, both the > > clflush() and __monitor() will take pointers. > > __monitor() currently doesn't, which is idiotic. Hm, __monitor() seems to take a void *: arch/x86/include/asm/processor.h:static inline void __monitor(const void *eax, unsigned long ecx, So writing: if (this_cpu_has(X86_FEATURE_CLFLUSH_MONITOR)) clflush(¤t_thread_info()->flags); __monitor(¤t_thread_info()->flags, 0, 0); ought to work just fine. Thanks, Ingo -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 12/19/2013 08:02 AM, Ingo Molnar wrote: >> >> __monitor() currently doesn't, which is idiotic. > > Hm, __monitor() seems to take a void *: > > arch/x86/include/asm/processor.h:static inline void __monitor(const void *eax, unsigned long ecx, > > So writing: > > if (this_cpu_has(X86_FEATURE_CLFLUSH_MONITOR)) > clflush(¤t_thread_info()->flags); > > __monitor(¤t_thread_info()->flags, 0, 0); > Yes, brain failure on my part. -hpa -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c index dc1ec0d..ea04b34 100644 --- a/arch/x86/kernel/cpu/intel.c +++ b/arch/x86/kernel/cpu/intel.c @@ -387,7 +387,8 @@ static void init_intel(struct cpuinfo_x86 *c) set_cpu_cap(c, X86_FEATURE_PEBS); } - if (c->x86 == 6 && c->x86_model == 29 && cpu_has_clflush) + if (c->x86 == 6 && cpu_has_clflush && + (c->x86_model == 29 || c->x86_model == 46 || c->x86_model == 47)) set_cpu_cap(c, X86_FEATURE_CLFLUSH_MONITOR); #ifdef CONFIG_X86_64 diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c index 92d1206..f80b700 100644 --- a/drivers/idle/intel_idle.c +++ b/drivers/idle/intel_idle.c @@ -377,6 +377,9 @@ static int intel_idle(struct cpuidle_device *dev, if (!current_set_polling_and_test()) { + if (this_cpu_has(X86_FEATURE_CLFLUSH_MONITOR)) + clflush((void *)¤t_thread_info()->flags); + __monitor((void *)¤t_thread_info()->flags, 0, 0); smp_mb(); if (!need_resched())