diff mbox series

[1/3] x86/spec-ctrl: Consistently halt speculation using int3

Message ID 20220718205009.3557-2-andrew.cooper3@citrix.com (mailing list archive)
State New, archived
Headers show
Series XSA-407 followon fixes | expand

Commit Message

Andrew Cooper July 18, 2022, 8:50 p.m. UTC
The RSB stuffing loop and retpoline thunks date from the very beginning, when
halting speculation was a brand new field.

These days, we've largely settled on int3 for halting speculation in
non-architectural paths.  It's a single byte, and is fully serialising - a
requirement for delivering #BP if it were to execute.

Update the thunks.  Mostly for consistency across the codebase, but it does
shrink every entrypath in Xen by 6 bytes which is a marginal win.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Wei Liu <wl@xen.org>
---
 xen/arch/x86/include/asm/spec_ctrl_asm.h | 11 +++--------
 xen/arch/x86/indirect-thunk.S            |  6 ++----
 2 files changed, 5 insertions(+), 12 deletions(-)

Comments

Jan Beulich July 19, 2022, 6:13 a.m. UTC | #1
On 18.07.2022 22:50, Andrew Cooper wrote:
> The RSB stuffing loop and retpoline thunks date from the very beginning, when
> halting speculation was a brand new field.
> 
> These days, we've largely settled on int3 for halting speculation in
> non-architectural paths.  It's a single byte, and is fully serialising - a
> requirement for delivering #BP if it were to execute.
> 
> Update the thunks.  Mostly for consistency across the codebase, but it does
> shrink every entrypath in Xen by 6 bytes which is a marginal win.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>
diff mbox series

Patch

diff --git a/xen/arch/x86/include/asm/spec_ctrl_asm.h b/xen/arch/x86/include/asm/spec_ctrl_asm.h
index 9eb4ad9ab71d..fab27ff5532b 100644
--- a/xen/arch/x86/include/asm/spec_ctrl_asm.h
+++ b/xen/arch/x86/include/asm/spec_ctrl_asm.h
@@ -126,9 +126,8 @@ 
  * change. Based on Google's performance numbers, the loop is unrolled to 16
  * iterations and two calls per iteration.
  *
- * The call filling the RSB needs a nonzero displacement.  A nop would do, but
- * we use "1: pause; lfence; jmp 1b" to safely contains any ret-based
- * speculation, even if the loop is speculatively executed prematurely.
+ * The call filling the RSB needs a nonzero displacement, and int3 halts
+ * speculation.
  *
  * %rsp is preserved by using an extra GPR because a) we've got plenty spare,
  * b) the two movs are shorter to encode than `add $32*8, %rsp`, and c) can be
@@ -141,11 +140,7 @@ 
 
     .irp n, 1, 2                    /* Unrolled twice. */
     call .L\@_insert_rsb_entry_\n   /* Create an RSB entry. */
-
-.L\@_capture_speculation_\n:
-    pause
-    lfence
-    jmp .L\@_capture_speculation_\n /* Capture rogue speculation. */
+    int3                            /* Halt rogue speculation. */
 
 .L\@_insert_rsb_entry_\n:
     .endr
diff --git a/xen/arch/x86/indirect-thunk.S b/xen/arch/x86/indirect-thunk.S
index 7cc22da0ef93..de6aef606832 100644
--- a/xen/arch/x86/indirect-thunk.S
+++ b/xen/arch/x86/indirect-thunk.S
@@ -12,11 +12,9 @@ 
 #include <asm/asm_defns.h>
 
 .macro IND_THUNK_RETPOLINE reg:req
-        call 2f
+        call 1f
+        int3
 1:
-        lfence
-        jmp 1b
-2:
         mov %\reg, (%rsp)
         ret
 .endm