[3/6] x86: Add support for the clwb instruction

Message ID	1415731396-19364-4-git-send-email-ross.zwisler@linux.intel.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <dri-devel-bounces@lists.freedesktop.org> From: Ross Zwisler <ross.zwisler@linux.intel.com> To: linux-kernel@vger.kernel.org Subject: [PATCH 3/6] x86: Add support for the clwb instruction Date: Tue, 11 Nov 2014 11:43:13 -0700 Message-Id: <1415731396-19364-4-git-send-email-ross.zwisler@linux.intel.com> In-Reply-To: <1415731396-19364-1-git-send-email-ross.zwisler@linux.intel.com> References: <1415731396-19364-1-git-send-email-ross.zwisler@linux.intel.com> Cc: x86@kernel.org, dri-devel@lists.freedesktop.org, Thomas Gleixner <tglx@linutronix.de>, Ross Zwisler <ross.zwisler@linux.intel.com>, H Peter Anvin <h.peter.anvin@intel.com>, Ingo Molnar <mingo@kernel.org> Precedence: list MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: base64 Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" <dri-devel-bounces@lists.freedesktop.org>

Ross Zwisler Nov. 11, 2014, 6:43 p.m. UTC

Add support for the new clwb instruction.  This instruction was
announced in the document "Intel Architecture Instruction Set Extensions
Programming Reference" with reference number 319433-022.

https://software.intel.com/sites/default/files/managed/0d/53/319433-022.pdf

Here are some things of note:

 - As with the clflushopt patches before this, I'm assuming that the addressing
   mode generated by the original clflush instruction will match the new
   clflush instruction with the 0x66 prefix for clflushopt, and for the
   xsaveopt instruction with the 0x66 prefix for clwb.  For all the test cases
   that I've come up with and for the new clwb code generated by this patch
   series, this has proven to be true on my test machine.

 - According to the SDM, xsaveopt has a form where it has a REX.W prefix.  I
   believe that this prefix will not be generated by gcc in x86_64 kernel code.
   Based on this, I don't believe I need to account for this extra prefix when
   dealing with the assembly language created for clwb.  Please correct me if
   I'm wrong.

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: H Peter Anvin <h.peter.anvin@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: David Airlie <airlied@linux.ie>
Cc: dri-devel@lists.freedesktop.org
Cc: x86@kernel.org
---
 arch/x86/include/asm/cpufeature.h    |  1 +
 arch/x86/include/asm/special_insns.h | 10 ++++++++++
 2 files changed, 11 insertions(+)

Ross Zwisler Nov. 11, 2014, 7:40 p.m. UTC | #1

On Tue, 2014-11-11 at 20:19 +0100, Borislav Petkov wrote:
> On Tue, Nov 11, 2014 at 08:12:39PM +0100, Borislav Petkov wrote:
> > > +			 ".byte 0x66; xsaveopt %P0",
> > 
> > Huh, XSAVEOPT?!? Shouldn't that be CLWB??
> 
> Bah, the same opcodes, only 0x66 prefix makes it into CLWB. Could use a
> comment I guess.

Yep, it's weird, I know.  :)  I'll add a comment.

Thanks,
- Ross

Ross Zwisler Nov. 11, 2014, 7:48 p.m. UTC | #2

On Tue, 2014-11-11 at 20:12 +0100, Borislav Petkov wrote:
> On Tue, Nov 11, 2014 at 11:43:13AM -0700, Ross Zwisler wrote:
> > Add support for the new clwb instruction.  This instruction was
> > announced in the document "Intel Architecture Instruction Set Extensions
> > Programming Reference" with reference number 319433-022.
> > 
> > https://software.intel.com/sites/default/files/managed/0d/53/319433-022.pdf
> > 
> > Here are some things of note:
> > 
> >  - As with the clflushopt patches before this, I'm assuming that the addressing
> >    mode generated by the original clflush instruction will match the new
> >    clflush instruction with the 0x66 prefix for clflushopt, and for the
> >    xsaveopt instruction with the 0x66 prefix for clwb.  For all the test cases
> >    that I've come up with and for the new clwb code generated by this patch
> >    series, this has proven to be true on my test machine.
> > 
> >  - According to the SDM, xsaveopt has a form where it has a REX.W prefix.  I
> >    believe that this prefix will not be generated by gcc in x86_64 kernel code.
> >    Based on this, I don't believe I need to account for this extra prefix when
> >    dealing with the assembly language created for clwb.  Please correct me if
> >    I'm wrong.
> > 
> > Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
> > Cc: H Peter Anvin <h.peter.anvin@intel.com>
> > Cc: Ingo Molnar <mingo@kernel.org>
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: David Airlie <airlied@linux.ie>
> > Cc: dri-devel@lists.freedesktop.org
> > Cc: x86@kernel.org
> > ---
> >  arch/x86/include/asm/cpufeature.h    |  1 +
> >  arch/x86/include/asm/special_insns.h | 10 ++++++++++
> >  2 files changed, 11 insertions(+)
> > 
> > diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
> > index b3e6b89..fbbed34 100644
> > --- a/arch/x86/include/asm/cpufeature.h
> > +++ b/arch/x86/include/asm/cpufeature.h
> > @@ -227,6 +227,7 @@
> >  #define X86_FEATURE_SMAP	( 9*32+20) /* Supervisor Mode Access Prevention */
> >  #define X86_FEATURE_PCOMMIT	( 9*32+22) /* PCOMMIT instruction */
> >  #define X86_FEATURE_CLFLUSHOPT	( 9*32+23) /* CLFLUSHOPT instruction */
> > +#define X86_FEATURE_CLWB	( 9*32+24) /* CLWB instruction */
> >  #define X86_FEATURE_AVX512PF	( 9*32+26) /* AVX-512 Prefetch */
> >  #define X86_FEATURE_AVX512ER	( 9*32+27) /* AVX-512 Exponential and Reciprocal */
> >  #define X86_FEATURE_AVX512CD	( 9*32+28) /* AVX-512 Conflict Detection */
> > diff --git a/arch/x86/include/asm/special_insns.h b/arch/x86/include/asm/special_insns.h
> > index 1709a2e..a328460 100644
> > --- a/arch/x86/include/asm/special_insns.h
> > +++ b/arch/x86/include/asm/special_insns.h
> > @@ -199,6 +199,16 @@ static inline void clflushopt(volatile void *__p)
> >  		       "+m" (*(volatile char __force *)__p));
> >  }
> >  
> > +static inline void clwb(volatile void *__p)
> > +{
> > +	alternative_io_2(".byte " __stringify(NOP_DS_PREFIX) "; clflush %P0",
> 
> Any particular reason for using 0x3e as a prefix to have the insns be
> the same size or is it simply because CLFLUSH can stomach it?
> 
> :-)

Essentially we need one additional byte at the beginning of the clflush so
that we can flip it into a clflushopt by changing that byte into a 0x66
prefix.  Two options are to either insert a 1 byte ASM_NOP1, or to add a 1
byte NOP_DS_PREFIX.  Both have no functional effect with the plain clflush,
but I've been told that executing a clflush + prefix should be faster than
executing a clflush + NOP.

Ross Zwisler Nov. 11, 2014, 7:54 p.m. UTC | #3

On Tue, 2014-11-11 at 20:46 +0100, Borislav Petkov wrote:
> On Tue, Nov 11, 2014 at 12:40:00PM -0700, Ross Zwisler wrote:
> > Yep, it's weird, I know.  :)
> 
> But sure, saving opcode space, makes sense to me.
> 
> Btw, I'd still be interested about this:
> 
> > +static inline void clwb(volatile void *__p)
> > +{
> > +     alternative_io_2(".byte " __stringify(NOP_DS_PREFIX) "; clflush %P0",
> 
> Any particular reason for using 0x3e as a prefix to have the insns be
> the same size or is it simply because CLFLUSH can stomach it?

Ah, sorry, I was still responding to your first mail.  :)  Response
copied here to save searching:

        Essentially we need one additional byte at the beginning of the
        clflush so that we can flip it into a clflushopt by changing that byte
        into a 0x66 prefix.  Two options are to either insert a 1 byte
        ASM_NOP1, or to add a 1 byte NOP_DS_PREFIX.  Both have no functional
        effect with the plain clflush, but I've been told that executing a
        clflush + prefix should be faster than executing a clflush + NOP.

I agree, this is useful info - I'll add it to the patch comments for v2.

Thank you for the feedback.

- Ross

Borislav Petkov Nov. 12, 2014, 12:39 p.m. UTC | #4

On Tue, Nov 11, 2014 at 11:43:13AM -0700, Ross Zwisler wrote:
> +static inline void clwb(volatile void *__p)
> +{
> +	alternative_io_2(".byte " __stringify(NOP_DS_PREFIX) "; clflush %P0",
> +			 ".byte 0x66; clflush %P0",
> +			 X86_FEATURE_CLFLUSHOPT,
> +			 ".byte 0x66; xsaveopt %P0",

Btw, I'm afraid you're going to have to spell out those new instruction
mnemonics as they aren't supported by older compilers:

  CC      drivers/gpu/drm/drm_cache.o
{standard input}: Assembler messages:
{standard input}:63: Error: no such instruction: `xsaveopt (%rax)'
{standard input}:249: Error: no such instruction: `xsaveopt (%rdi)'
{standard input}:283: Error: no such instruction: `xsaveopt -1(%rdx)'
make[1]: *** [drivers/gpu/drm/drm_cache.o] Error 1
make: *** [drivers/gpu/drm/drm_cache.o] Error 2
[boris@etch:14:35:40:k:14)->  gcc --version
gcc (GCC) 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)

This is on a very old guest I have which has this gcc in it and the
kernel supports everything gcc >= 3.2.

H. Peter Anvin Nov. 12, 2014, 1:38 p.m. UTC | #5

No, it doesn't.  x86 requires 3.4+ at a minimum.

Borislav Petkov Nov. 12, 2014, 2:12 p.m. UTC | #6

On Wed, Nov 12, 2014 at 01:38:45PM +0000, Anvin, H Peter wrote:
> No, it doesn't.  x86 requires 3.4+ at a minimum.

The only test I see is:

#if GCC_VERSION < 30200
# error Sorry, your compiler is too old - please upgrade it.
#endif

And even if we do require 3.4, the build fails with 4.1+ so...

Ross Zwisler Nov. 13, 2014, 5:47 p.m. UTC | #7

On Wed, 2014-11-12 at 15:12 +0100, Borislav Petkov wrote:
> On Wed, Nov 12, 2014 at 01:38:45PM +0000, Anvin, H Peter wrote:
> > No, it doesn't.  x86 requires 3.4+ at a minimum.
> 
> The only test I see is:
> 
> #if GCC_VERSION < 30200
> # error Sorry, your compiler is too old - please upgrade it.
> #endif
> 
> And even if we do require 3.4, the build fails with 4.1+ so...

Ah, dang, you're right.  Okay, I'll figure out how to do this without
using xsaveopt.

Thank you for pointing this out.

- Ross

[3/6] x86: Add support for the clwb instruction

Commit Message

Comments

Patch