diff mbox series

[v2,6/6] x86: limit amount of INT3 in IND_THUNK_*

Message ID 4d66eb4d-4044-8b48-d7cc-354a236e6b26@suse.com (mailing list archive)
State Superseded
Headers show
Series x86: some assembler macro rework | expand

Commit Message

Jan Beulich Sept. 28, 2020, 12:32 p.m. UTC
There's no point having every replacement variant to also specify the
INT3 - just have it once in the base macro. When patching, NOPs will get
inserted, which are fine to speculate through (until reaching the INT3).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
I also wonder whether the LFENCE in IND_THUNK_RETPOLINE couldn't be
replaced by INT3 as well. Of course the effect will be marginal, as the
size of the thunk will still be 16 bytes when including tail padding
resulting from alignment.
---
v2: New.

Comments

Roger Pau Monné Oct. 8, 2020, 4:35 p.m. UTC | #1
On Mon, Sep 28, 2020 at 02:32:24PM +0200, Jan Beulich wrote:
> There's no point having every replacement variant to also specify the
> INT3 - just have it once in the base macro. When patching, NOPs will get
> inserted, which are fine to speculate through (until reaching the INT3).
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Acked-by: Roger Pau Monné <roger.pau@citrix.com>

> ---
> I also wonder whether the LFENCE in IND_THUNK_RETPOLINE couldn't be
> replaced by INT3 as well. Of course the effect will be marginal, as the
> size of the thunk will still be 16 bytes when including tail padding
> resulting from alignment.

I think Andrew is the best one to have an opinion on this.

Thanks, Roger.
Andrew Cooper Oct. 13, 2020, 12:58 p.m. UTC | #2
On 28/09/2020 13:32, Jan Beulich wrote:
> There's no point having every replacement variant to also specify the
> INT3 - just have it once in the base macro. When patching, NOPs will get
> inserted, which are fine to speculate through (until reaching the INT3).
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> I also wonder whether the LFENCE in IND_THUNK_RETPOLINE couldn't be
> replaced by INT3 as well. Of course the effect will be marginal, as the
> size of the thunk will still be 16 bytes when including tail padding
> resulting from alignment.

There are surprising performance implications from the choice of
speculation blocker.  RSB filling in particular had a benefit (up to 6%
iirc) from unrolling the loop.

Any differences here are likely to be marginal, whereas for inline
retpoline, the code volume reduction might easily be the winning factor.

> ---
> v2: New.
>
> --- a/xen/arch/x86/indirect-thunk.S
> +++ b/xen/arch/x86/indirect-thunk.S
> @@ -11,6 +11,8 @@
>  
>  #include <asm/asm_defns.h>
>  
> +.purgem ret

This needs a comment.

~Andrew
diff mbox series

Patch

--- a/xen/arch/x86/indirect-thunk.S
+++ b/xen/arch/x86/indirect-thunk.S
@@ -11,6 +11,8 @@ 
 
 #include <asm/asm_defns.h>
 
+.purgem ret
+
 .macro IND_THUNK_RETPOLINE reg:req
         call 2f
 1:
@@ -24,12 +26,10 @@ 
 .macro IND_THUNK_LFENCE reg:req
         lfence
         jmp *%\reg
-        int3 /* Halt straight-line speculation */
 .endm
 
 .macro IND_THUNK_JMP reg:req
         jmp *%\reg
-        int3 /* Halt straight-line speculation */
 .endm
 
 /*
@@ -44,6 +44,8 @@  ENTRY(__x86_indirect_thunk_\reg)
         __stringify(IND_THUNK_LFENCE \reg), X86_FEATURE_IND_THUNK_LFENCE, \
         __stringify(IND_THUNK_JMP \reg),    X86_FEATURE_IND_THUNK_JMP
 
+        int3 /* Halt straight-line speculation */
+
         .size __x86_indirect_thunk_\reg, . - __x86_indirect_thunk_\reg
         .type __x86_indirect_thunk_\reg, @function
 .endm