x86/cpufeature: Remove __pure attribute to _static_cpu_has()
diff mbox series

Message ID 20190307151036.GD26566@zn.tnic
State New
Headers show
Series
  • x86/cpufeature: Remove __pure attribute to _static_cpu_has()
Related show

Commit Message

Borislav Petkov March 7, 2019, 3:10 p.m. UTC
On Mon, Feb 11, 2019 at 12:32:41PM -0800, Nadav Amit wrote:
> BTW: the “__pure” attribute is useless when “__always_inline” is used.
> Unless it is intended to be some sort of comment, of course.

---
From: Borislav Petkov <bp@suse.de>
Date: Thu, 7 Mar 2019 15:54:51 +0100

__pure is used to make gcc do Common Subexpression Elimination (CSE)
and thus save subsequent invocations of a function which does a complex
computation (without side effects). As a simple example:

  bool a = _static_cpu_has(x);
  bool b = _static_cpu_has(x);

gets turned into

  bool a = _static_cpu_has(x);
  bool b = a;

However, gcc doesn't do CSE with asm()s when those get inlined - like it
is done with _static_cpu_has() - because, for example, the t_yes/t_no
labels are different for each inlined function body and thus cannot be
detected as equivalent anymore for the CSE heuristic to hit.

However, this all is beside the point because best it should be avoided
to have more than one call to _static_cpu_has(X) in the same function
due to the fact that each such call is an alternatives patch site and it
is simply pointless.

Therefore, drop the __pure attribute as it is not doing anything.

Reported-by: Nadav Amit <nadav.amit@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: x86@kernel.org
---
 arch/x86/include/asm/cpufeature.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

H. Peter Anvin March 7, 2019, 4:43 p.m. UTC | #1
On March 7, 2019 7:10:36 AM PST, Borislav Petkov <bp@alien8.de> wrote:
>On Mon, Feb 11, 2019 at 12:32:41PM -0800, Nadav Amit wrote:
>> BTW: the “__pure” attribute is useless when “__always_inline” is
>used.
>> Unless it is intended to be some sort of comment, of course.
>
>---
>From: Borislav Petkov <bp@suse.de>
>Date: Thu, 7 Mar 2019 15:54:51 +0100
>
>__pure is used to make gcc do Common Subexpression Elimination (CSE)
>and thus save subsequent invocations of a function which does a complex
>computation (without side effects). As a simple example:
>
>  bool a = _static_cpu_has(x);
>  bool b = _static_cpu_has(x);
>
>gets turned into
>
>  bool a = _static_cpu_has(x);
>  bool b = a;
>
>However, gcc doesn't do CSE with asm()s when those get inlined - like
>it
>is done with _static_cpu_has() - because, for example, the t_yes/t_no
>labels are different for each inlined function body and thus cannot be
>detected as equivalent anymore for the CSE heuristic to hit.
>
>However, this all is beside the point because best it should be avoided
>to have more than one call to _static_cpu_has(X) in the same function
>due to the fact that each such call is an alternatives patch site and
>it
>is simply pointless.
>
>Therefore, drop the __pure attribute as it is not doing anything.
>
>Reported-by: Nadav Amit <nadav.amit@gmail.com>
>Signed-off-by: Borislav Petkov <bp@suse.de>
>Cc: Peter Zijlstra <peterz@infradead.org>
>Cc: x86@kernel.org
>---
> arch/x86/include/asm/cpufeature.h | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
>diff --git a/arch/x86/include/asm/cpufeature.h
>b/arch/x86/include/asm/cpufeature.h
>index e25d11ad7a88..6d6d5cc4302b 100644
>--- a/arch/x86/include/asm/cpufeature.h
>+++ b/arch/x86/include/asm/cpufeature.h
>@@ -162,7 +162,7 @@ extern void clear_cpu_cap(struct cpuinfo_x86 *c,
>unsigned int bit);
>* majority of cases and you should stick to using it as it is generally
>  * only two instructions: a RIP-relative MOV and a TEST.
>  */
>-static __always_inline __pure bool _static_cpu_has(u16 bit)
>+static __always_inline bool _static_cpu_has(u16 bit)
> {
> 	asm_volatile_goto("1: jmp 6f\n"
> 		 "2:\n"

Uhm... (a) it is correct, even if the compiler doesn't use it now, it allows the compiler to CSE it in the future; (b) it is documentation; (c) there is an actual bug here: the "volatile" implies a side effect, which in reality is not present, inhibiting CSE.

So the correct fix is to remove "volatile", not remove "__pure".
Borislav Petkov March 7, 2019, 5:02 p.m. UTC | #2
Lemme preface this by saying that I've talked to gcc guys before doing
this.

On Thu, Mar 07, 2019 at 08:43:50AM -0800, hpa@zytor.com wrote:
> Uhm... (a) it is correct, even if the compiler doesn't use it now, it
> allows the compiler to CSE it in the future;

Well, the compiler won't CSE asm blocks due to the difference in the
labels, for example, so the heuristic won't detect them as equivalent
blocks.

Also, compiler guys said that they might consider inlining pure
functions later, in the IPA stage but that's future stuff.

This is how I understood it, at least.

> (b) it is documentation;

That could be a comment instead. Otherwise we will wonder again why this
is marked pure.

> (c) there is an actual bug here: the "volatile" implies a side effect,
> which in reality is not present, inhibiting CSE.
>
> So the correct fix is to remove "volatile", not remove "__pure".

There's not really a volatile there:

/*
 * GCC 'asm goto' miscompiles certain code sequences:
 *
 *   http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58670
 *
 * Work it around via a compiler barrier quirk suggested by Jakub Jelinek.
 *
 * (asm goto is automatically volatile - the naming reflects this.)
 */
#define asm_volatile_goto(x...) do { asm goto(x); asm (""); } while (0)

Patch
diff mbox series

diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index e25d11ad7a88..6d6d5cc4302b 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -162,7 +162,7 @@  extern void clear_cpu_cap(struct cpuinfo_x86 *c, unsigned int bit);
  * majority of cases and you should stick to using it as it is generally
  * only two instructions: a RIP-relative MOV and a TEST.
  */
-static __always_inline __pure bool _static_cpu_has(u16 bit)
+static __always_inline bool _static_cpu_has(u16 bit)
 {
 	asm_volatile_goto("1: jmp 6f\n"
 		 "2:\n"