
[v4,1/4] x86/spec-ctrl: add logic to issue IBPB on exit to guest

Message ID 31da79de-bd6b-af95-793a-c16516992bc7@suse.com (mailing list archive)
State New, archived
Series x86/spec-ctrl: IBPB improvements

Commit Message

Jan Beulich Feb. 14, 2023, 4:10 p.m. UTC
In order to be able to defer the context switch IBPB to the last
possible point, add logic to the exit-to-guest paths to issue the
barrier there, including the "IBPB doesn't flush the RSB/RAS"
workaround. Since alternatives, for now at least, can't nest, emit JMP
to skip past both constructs where both are needed. This may be more
efficient anyway, as the sequence of NOPs is pretty long.

As with all other conditional blocks on exit-to-guest paths, no
Spectre-v1 protections are necessary as execution will imminently be
hitting a serialising event.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
I have to admit that I'm not really certain about the placement of the
IBPB wrt the MSR_SPEC_CTRL writes. For now I've simply used "opposite of
entry".

Since we're going to run out of SCF_* bits soon and since the new flag
is meaningful only in struct cpu_info's spec_ctrl_flags, we could choose
to widen that field to 16 bits right away and then use bit 8 (or higher)
for the purpose here.
---
v4: Alter parts of the description. Re-word a comment. Rename flag and
    feature identifiers.
v3: New.

Comments

Roger Pau Monné Dec. 18, 2023, 12:11 p.m. UTC | #1
Hello,

I'm not as expert as Andrew in all the speculation stuff, but I will
try to provide some feedback.

On Tue, Feb 14, 2023 at 05:10:42PM +0100, Jan Beulich wrote:
> In order to be able to defer the context switch IBPB to the last
> possible point, add logic to the exit-to-guest paths to issue the
> barrier there, including the "IBPB doesn't flush the RSB/RAS"
> workaround. Since alternatives, for now at least, can't nest, emit JMP
> to skip past both constructs where both are needed. This may be more
> efficient anyway, as the sequence of NOPs is pretty long.

Could you elaborate on the reason why deferring the IBPB to the exit
to guest path is helpful?  So far it just seems to make the logic more
complex without any justification (at least in the changelog).

> 
> As with all other conditional blocks on exit-to-guest paths, no
> Spectre-v1 protections are necessary as execution will imminently be
> hitting a serialising event.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> I have to admit that I'm not really certain about the placement of the
> IBPB wrt the MSR_SPEC_CTRL writes. For now I've simply used "opposite of
> entry".

Maybe it would be easier to just add the MSR_PRED_CMD PRED_CMD_IBPB write
to the vmcs MSR load list?

It's a one-time write AFAICT, as you would only want to do this at
context switch.
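
For illustration, a minimal sketch of that suggestion, assuming Xen's
vmx_add_msr() interface (the function name queue_ibpb_on_vmentry and the
choice of the guest load-only list are assumptions, not anything from this
series):

    #include <xen/sched.h>          /* struct vcpu */
    #include <asm/hvm/vmx/vmcs.h>   /* vmx_add_msr(), VMX_MSR_GUEST_LOADONLY */
    #include <asm/msr-index.h>      /* MSR_PRED_CMD, PRED_CMD_IBPB */

    /* Sketch: have subsequent VM entries write PRED_CMD_IBPB (i.e. issue an
     * IBPB) via the VMCS guest MSR load list. */
    static int queue_ibpb_on_vmentry(struct vcpu *v)
    {
        return vmx_add_msr(v, MSR_PRED_CMD, PRED_CMD_IBPB,
                           VMX_MSR_GUEST_LOADONLY);
    }

Note the entry would stay on the list, and hence fire on every VM entry,
until it is removed again.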

Thanks, Roger.
Jan Beulich Dec. 18, 2023, 1:46 p.m. UTC | #2
On 18.12.2023 13:11, Roger Pau Monné wrote:
> Hello,
> 
> I'm not as expert as Andrew in all the speculation stuff, but I will
> try to provide some feedback.
> 
> On Tue, Feb 14, 2023 at 05:10:42PM +0100, Jan Beulich wrote:
>> In order to be able to defer the context switch IBPB to the last
>> possible point, add logic to the exit-to-guest paths to issue the
>> barrier there, including the "IBPB doesn't flush the RSB/RAS"
>> workaround. Since alternatives, for now at least, can't nest, emit JMP
>> to skip past both constructs where both are needed. This may be more
>> efficient anyway, as the sequence of NOPs is pretty long.
> 
> Could you elaborate on the reason why deferring the IBPB to the exit
> to guest path is helpful?  So far it just seems to make the logic more
> complex without any justification (at least in the changelog).

I've added "(to leave behind as little as possible)" ahead of the 1st
comma - is that sufficient, do you think?

>> ---
>> I have to admit that I'm not really certain about the placement of the
>> IBPB wrt the MSR_SPEC_CTRL writes. For now I've simply used "opposite of
>> entry".
> 
> Maybe it would be easier to just add the MSR_PRED_CMD PRED_CMD_IBPB write
> to the vmcs MSR load list?
> 
> It's a one-time write AFAICT, as you would only want to do this at
> context switch.

That would be a back and forth of adding and removing the MSR to/from
that list then, which I'm not convinced is helpful. With these special
MSRs I would further be uncertain as to their effect when used via one
of these lists.
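
Roughly, the back and forth being described (same assumed interface as the
sketch above; toggle_vmentry_ibpb is a made-up name, and vmx_del_msr() is
assumed alongside vmx_add_msr()):

    /* Sketch of the churn: arm at context switch in, disarm again afterwards,
     * as otherwise every later VM entry would keep issuing the barrier. */
    static int toggle_vmentry_ibpb(struct vcpu *v, bool arm)
    {
        if ( arm )
            return vmx_add_msr(v, MSR_PRED_CMD, PRED_CMD_IBPB,
                               VMX_MSR_GUEST_LOADONLY);

        return vmx_del_msr(v, MSR_PRED_CMD, VMX_MSR_GUEST_LOADONLY);
    }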

Jan
Jan Beulich Dec. 18, 2023, 1:50 p.m. UTC | #3
On 18.12.2023 14:46, Jan Beulich wrote:
> On 18.12.2023 13:11, Roger Pau Monné wrote:
>> Hello,
>>
>> I'm not as expert as Andrew in all the speculation stuff, but I will
>> try to provide some feedback.
>>
>> On Tue, Feb 14, 2023 at 05:10:42PM +0100, Jan Beulich wrote:
>>> In order to be able to defer the context switch IBPB to the last
>>> possible point, add logic to the exit-to-guest paths to issue the
>>> barrier there, including the "IBPB doesn't flush the RSB/RAS"
>>> workaround. Since alternatives, for now at least, can't nest, emit JMP
>>> to skip past both constructs where both are needed. This may be more
>>> efficient anyway, as the sequence of NOPs is pretty long.
>>
>> Could you elaborate on the reason why deferring the IBPB to the exit
>> to guest path is helpful?  So far it just seems to make the logic more
>> complex without any justification (at least in the changelog).
> 
> I've added "(to leave behind as little as possible)" ahead of the 1st
> comma - is that sufficient, do you think?

Actually, the next patch supplies better context, i.e. it is more / also
about avoiding clobbering Xen's own predictions.

Jan
Jan Beulich Dec. 18, 2023, 1:54 p.m. UTC | #4
On 18.12.2023 14:46, Jan Beulich wrote:
> On 18.12.2023 13:11, Roger Pau Monné wrote:
>> On Tue, Feb 14, 2023 at 05:10:42PM +0100, Jan Beulich wrote:
>>> ---
>>> I have to admit that I'm not really certain about the placement of the
>>> IBPB wrt the MSR_SPEC_CTRL writes. For now I've simply used "opposite of
>>> entry".
>>
>> Maybe it would be easier to just add the MSR_PRED_CMD PRED_CMD_IBPB write
>> to the vmcs MSR load list?
>>
>> It's a one-time write AFAICT, as you would only want to do this at
>> context switch.
> 
> That would be a back and forth of adding and removing the MSR to/from
> that list then, which I'm not convinced is helpful. With these special
> MSRs I would further be uncertain as to their effect when used via one
> of these lists.

Plus (as only a secondary argument, but still) it would make PV and HVM
mechanisms entirely different.

Jan
Roger Pau Monné Dec. 18, 2023, 3:40 p.m. UTC | #5
On Mon, Dec 18, 2023 at 02:46:37PM +0100, Jan Beulich wrote:
> On 18.12.2023 13:11, Roger Pau Monné wrote:
> > Hello,
> > 
> > I'm not as expert as Andrew in all the speculation stuff, but I will
> > try to provide some feedback.
> > 
> > On Tue, Feb 14, 2023 at 05:10:42PM +0100, Jan Beulich wrote:
> >> In order to be able to defer the context switch IBPB to the last
> >> possible point, add logic to the exit-to-guest paths to issue the
> >> barrier there, including the "IBPB doesn't flush the RSB/RAS"
> >> workaround. Since alternatives, for now at least, can't nest, emit JMP
> >> to skip past both constructs where both are needed. This may be more
> >> efficient anyway, as the sequence of NOPs is pretty long.
> > 
> > Could you elaborate on the reason why deferring the IBPB to the exit
> > to guest path is helpful?  So far it just seems to make the logic more
> > complex without any justification (at least in the changelog).
> 
> I've added "(to leave behind as little as possible)" ahead of the 1st
> comma - is that sufficient, do you think?

Please bear with me, but I'm still uncertain.

Even if IBRS is not enabled, and such indirect branch predictions are
at the same predictor mode, how would that be of any use to a guest?
My understanding was that the attacker is the one that has to control
the indirect branch predictor contents in order to attack the
hypervisor or other guests.

> >> ---
> >> I have to admit that I'm not really certain about the placement of the
> >> IBPB wrt the MSR_SPEC_CTRL writes. For now I've simply used "opposite of
> >> entry".
> > 
> > Maybe it would be easier to just add the MSR_PRED_CMD PRED_CMD_IBPB write
> > to the vmcs MSR load list?
> > 
> > It's a one-time write AFAICT, as you would only want to do this at
> > context switch.
> 
> That would be a back and forth of adding and removing the MSR to/from
> that list then, which I'm not convinced is helpful. With these special
> MSRs I would further be uncertain as to their effect when used via one
> of these lists.

Hm, we do seem to already use MSR_PRED_CMD with such lists, so I would
assume they work just fine.

Anyway, it was mostly a recommendation towards clarity, because I think
the return to guest context assembly is getting rather convoluted, and
it's IMO critical for it to be easy to follow.

Thanks, Roger.
Roger Pau Monné Dec. 18, 2023, 3:43 p.m. UTC | #6
On Mon, Dec 18, 2023 at 02:50:27PM +0100, Jan Beulich wrote:
> On 18.12.2023 14:46, Jan Beulich wrote:
> > On 18.12.2023 13:11, Roger Pau Monné wrote:
> >> Hello,
> >>
> >> I'm not as expert as Andrew in all the speculation stuff, but I will
> >> try to provide some feedback.
> >>
> >> On Tue, Feb 14, 2023 at 05:10:42PM +0100, Jan Beulich wrote:
> >>> In order to be able to defer the context switch IBPB to the last
> >>> possible point, add logic to the exit-to-guest paths to issue the
> >>> barrier there, including the "IBPB doesn't flush the RSB/RAS"
> >>> workaround. Since alternatives, for now at least, can't nest, emit JMP
> >>> to skip past both constructs where both are needed. This may be more
> >>> efficient anyway, as the sequence of NOPs is pretty long.
> >>
> >> Could you elaborate on the reason why deferring the IBPB to the exit
> >> to guest path is helpful?  So far it just seems to make the logic more
> >> complex without any justification (at least in the changelog).
> > 
> > I've added "(to leave behind as little as possible)" ahead of the 1st
> > comma - is that sufficient, do you think?
> 
> > Actually, the next patch supplies better context, i.e. it is more / also
> > about avoiding clobbering Xen's own predictions.

Right, that I got from the next patch, but given context switch is already
quite a heavy operation, does avoiding the cleaning of the branch
predictor make that much of a difference?

IMO it needs good justification given it's a change that makes the
logic harder to follow, so if it turns out there's no difference I
would rather leave the IBPB at context switch.

Thanks, Roger.
Jan Beulich Dec. 18, 2023, 4 p.m. UTC | #7
On 18.12.2023 16:40, Roger Pau Monné wrote:
> On Mon, Dec 18, 2023 at 02:46:37PM +0100, Jan Beulich wrote:
>> On 18.12.2023 13:11, Roger Pau Monné wrote:
>>> Hello,
>>>
>>> I'm not as expert as Andrew in all the speculation stuff, but I will
>>> try to provide some feedback.
>>>
>>> On Tue, Feb 14, 2023 at 05:10:42PM +0100, Jan Beulich wrote:
>>>> In order to be able to defer the context switch IBPB to the last
>>>> possible point, add logic to the exit-to-guest paths to issue the
>>>> barrier there, including the "IBPB doesn't flush the RSB/RAS"
>>>> workaround. Since alternatives, for now at least, can't nest, emit JMP
>>>> to skip past both constructs where both are needed. This may be more
>>>> efficient anyway, as the sequence of NOPs is pretty long.
>>>
>>> Could you elaborate on the reason why deferring the IBPB to the exit
>>> to guest path is helpful?  So far it just seems to make the logic more
>>> complex without any justification (at least in the changelog).
>>
>> I've added "(to leave behind as little as possible)" ahead of the 1st
>> comma - is that sufficient, do you think?
> 
> Please bear with me, but I'm still uncertain.
> 
> Even if IBRS is not enabled, and such indirect branch predictions are
> at the same predictor mode, how would that be of any use to a guest?
> My understanding was that the attacker is the one that has to control
> the indirect branch predictor contents in order to attack the
> hypervisor or other guests.

Right; see my later reply.

>>>> ---
>>>> I have to admit that I'm not really certain about the placement of the
>>>> IBPB wrt the MSR_SPEC_CTRL writes. For now I've simply used "opposite of
>>>> entry".
>>>
>>> Maybe it would be easier to just add the MSR_PRED_CMD PRED_CMD_IBPB write
>>> to the vmcs MSR load list?
>>>
>>> It's a one-time write AFAICT, as you would only want to do this at
>>> context switch.
>>
>> That would be a back and forth of adding and removing the MSR to/from
>> that list then, which I'm not convinced is helpful. With these special
>> MSRs I would further be uncertain as to their effect when used via one
>> of these lists.
> 
> Hm, we do seem to already use MSR_PRED_CMD with such lists, so I would
> assume they work just fine.

Ah, yes, I forgot about that. Still, it would be a back and forth if we
wanted the MSR on the list only after a context switch, but not anymore
for later VM entries. Also, IIRC these lists are VMX-only?

Jan

> Anyway, it was mostly a recommendation towards clarity, because I think
> the return to guest context assembly is getting rather convoluted, and
> it's IMO critical for it to be easy to follow.
> 
> Thanks, Roger.
Jan Beulich Dec. 18, 2023, 4:02 p.m. UTC | #8
On 18.12.2023 16:43, Roger Pau Monné wrote:
> On Mon, Dec 18, 2023 at 02:50:27PM +0100, Jan Beulich wrote:
>> On 18.12.2023 14:46, Jan Beulich wrote:
>>> On 18.12.2023 13:11, Roger Pau Monné wrote:
>>>> Hello,
>>>>
>>>> I'm not as expert as Andrew in all the speculation stuff, but I will
>>>> try to provide some feedback.
>>>>
>>>> On Tue, Feb 14, 2023 at 05:10:42PM +0100, Jan Beulich wrote:
>>>>> In order to be able to defer the context switch IBPB to the last
>>>>> possible point, add logic to the exit-to-guest paths to issue the
>>>>> barrier there, including the "IBPB doesn't flush the RSB/RAS"
>>>>> workaround. Since alternatives, for now at least, can't nest, emit JMP
>>>>> to skip past both constructs where both are needed. This may be more
>>>>> efficient anyway, as the sequence of NOPs is pretty long.
>>>>
>>>> Could you elaborate on the reason why deferring the IBPB to the exit
>>>> to guest path is helpful?  So far it just seems to make the logic more
>>>> complex without any justification (at least in the changelog).
>>>
>>> I've added "(to leave behind as little as possible)" ahead of the 1st
>>> comma - is that sufficient, do you think?
>>
>> Actually, the next patch supplies better context, i.e. it is more / also
>> about avoiding clobbering Xen's own predictions.
> 
> Right, that I got from the next patch, but given context switch is already
> quite a heavy operation, does avoiding the cleaning of the branch
> predictor make that much of a difference?
> 
> IMO it needs good justification given it's a change that makes the
> logic harder to follow, so if it turns out there's no difference I
> would rather leave the IBPB at context switch.

As per another reply, I guess we want to discuss this with Andrew, so
it may be a good thing to remember to bring up at the next x86 meeting
we're going to have.

Jan

Patch

--- a/xen/arch/x86/hvm/svm/entry.S
+++ b/xen/arch/x86/hvm/svm/entry.S
@@ -75,6 +75,12 @@  __UNLIKELY_END(nsvm_hap)
         .endm
         ALTERNATIVE "", svm_vmentry_spec_ctrl, X86_FEATURE_SC_MSR_HVM
 
+        ALTERNATIVE "jmp 2f", __stringify(DO_SPEC_CTRL_EXIT_IBPB disp=(2f-1f)), \
+                    X86_FEATURE_NEW_PRED_CTXT_HVM
+1:
+        ALTERNATIVE "", DO_OVERWRITE_RSB, X86_BUG_IBPB_NO_RET
+2:
+
         pop  %r15
         pop  %r14
         pop  %r13
--- a/xen/arch/x86/hvm/vmx/entry.S
+++ b/xen/arch/x86/hvm/vmx/entry.S
@@ -86,7 +86,8 @@  UNLIKELY_END(realmode)
         jz .Lvmx_vmentry_restart
 
         /* WARNING! `ret`, `call *`, `jmp *` not safe beyond this point. */
-        /* SPEC_CTRL_EXIT_TO_VMX   Req: %rsp=regs/cpuinfo              Clob:    */
+        /* SPEC_CTRL_EXIT_TO_VMX   Req: %rsp=regs/cpuinfo              Clob: acd */
+        ALTERNATIVE "", DO_SPEC_CTRL_EXIT_IBPB, X86_FEATURE_NEW_PRED_CTXT_HVM
         DO_SPEC_CTRL_COND_VERW
 
         mov  VCPU_hvm_guest_cr2(%rbx),%rax
--- a/xen/arch/x86/include/asm/cpufeatures.h
+++ b/xen/arch/x86/include/asm/cpufeatures.h
@@ -39,8 +39,10 @@  XEN_CPUFEATURE(XEN_LBR,           X86_SY
 XEN_CPUFEATURE(SC_VERW_IDLE,      X86_SYNTH(25)) /* VERW used by Xen for idle */
 XEN_CPUFEATURE(XEN_SHSTK,         X86_SYNTH(26)) /* Xen uses CET Shadow Stacks */
 XEN_CPUFEATURE(XEN_IBT,           X86_SYNTH(27)) /* Xen uses CET Indirect Branch Tracking */
-XEN_CPUFEATURE(IBPB_ENTRY_PV,     X86_SYNTH(28)) /* MSR_PRED_CMD used by Xen for PV */
-XEN_CPUFEATURE(IBPB_ENTRY_HVM,    X86_SYNTH(29)) /* MSR_PRED_CMD used by Xen for HVM */
+XEN_CPUFEATURE(IBPB_ENTRY_PV,     X86_SYNTH(28)) /* MSR_PRED_CMD used by Xen when entered from PV */
+XEN_CPUFEATURE(IBPB_ENTRY_HVM,    X86_SYNTH(29)) /* MSR_PRED_CMD used by Xen when entered from HVM */
+XEN_CPUFEATURE(NEW_PRED_CTXT_PV,  X86_SYNTH(30)) /* issue prediction barrier when exiting to PV */
+XEN_CPUFEATURE(NEW_PRED_CTXT_HVM, X86_SYNTH(31)) /* issue prediction barrier when exiting to HVM */
 
 /* Bug words follow the synthetic words. */
 #define X86_NR_BUG 1
--- a/xen/arch/x86/include/asm/current.h
+++ b/xen/arch/x86/include/asm/current.h
@@ -55,9 +55,13 @@  struct cpu_info {
 
     /* See asm/spec_ctrl_asm.h for usage. */
     unsigned int shadow_spec_ctrl;
+    /*
+     * spec_ctrl_flags is accessed as a 32-bit entity in certain cases. Place
+     * it accordingly.
+     */
+    uint8_t      spec_ctrl_flags;
     uint8_t      xen_spec_ctrl;
     uint8_t      last_spec_ctrl;
-    uint8_t      spec_ctrl_flags;
 
     /*
      * The following field controls copying of the L4 page table of 64-bit
--- a/xen/arch/x86/include/asm/spec_ctrl.h
+++ b/xen/arch/x86/include/asm/spec_ctrl.h
@@ -36,6 +36,8 @@ 
 #define SCF_verw       (1 << 3)
 #define SCF_ist_ibpb   (1 << 4)
 #define SCF_entry_ibpb (1 << 5)
+#define SCF_new_pred_ctxt_bit 6
+#define SCF_new_pred_ctxt (1 << SCF_new_pred_ctxt_bit)
 
 /*
  * The IST paths (NMI/#MC) can interrupt any arbitrary context.  Some
--- a/xen/arch/x86/include/asm/spec_ctrl_asm.h
+++ b/xen/arch/x86/include/asm/spec_ctrl_asm.h
@@ -117,6 +117,27 @@ 
 .L\@_done:
 .endm
 
+.macro DO_SPEC_CTRL_EXIT_IBPB disp=0
+/*
+ * Requires %rsp=regs
+ * Clobbers %rax, %rcx, %rdx
+ *
+ * Conditionally issue IBPB if SCF_new_pred_ctxt is active.  The macro
+ * invocation may be followed by X86_BUG_IBPB_NO_RET workaround code.  The
+ * "disp" argument is to allow invocation sites to pass in the extra amount
+ * of code which needs skipping in case no action is necessary.
+ *
+ * The flag is a "one-shot" indicator, so it is being cleared at the same time.
+ */
+    btrl    $SCF_new_pred_ctxt_bit, CPUINFO_spec_ctrl_flags(%rsp)
+    jnc     .L\@_skip + (\disp)
+    mov     $MSR_PRED_CMD, %ecx
+    mov     $PRED_CMD_IBPB, %eax
+    xor     %edx, %edx
+    wrmsr
+.L\@_skip:
+.endm
+
 .macro DO_OVERWRITE_RSB tmp=rax
 /*
  * Requires nothing
@@ -272,6 +293,14 @@ 
 #define SPEC_CTRL_EXIT_TO_PV                                            \
     ALTERNATIVE "",                                                     \
         DO_SPEC_CTRL_EXIT_TO_GUEST, X86_FEATURE_SC_MSR_PV;              \
+    ALTERNATIVE __stringify(jmp PASTE(.Lscexitpv_done, __LINE__)),      \
+        __stringify(DO_SPEC_CTRL_EXIT_IBPB                              \
+                    disp=(PASTE(.Lscexitpv_done, __LINE__) -            \
+                          PASTE(.Lscexitpv_rsb, __LINE__))),            \
+        X86_FEATURE_NEW_PRED_CTXT_PV;                                   \
+PASTE(.Lscexitpv_rsb, __LINE__):                                        \
+    ALTERNATIVE "", DO_OVERWRITE_RSB, X86_BUG_IBPB_NO_RET;              \
+PASTE(.Lscexitpv_done, __LINE__):                                       \
     DO_SPEC_CTRL_COND_VERW
 
 /*
--- a/xen/arch/x86/x86_64/compat/entry.S
+++ b/xen/arch/x86/x86_64/compat/entry.S
@@ -8,6 +8,7 @@ 
 #include <asm/page.h>
 #include <asm/processor.h>
 #include <asm/desc.h>
+#include <xen/lib.h>
 #include <public/xen.h>
 #include <irq_vectors.h>
 
@@ -156,7 +157,7 @@  ENTRY(compat_restore_all_guest)
         mov VCPUMSR_spec_ctrl_raw(%rax), %eax
 
         /* WARNING! `ret`, `call *`, `jmp *` not safe beyond this point. */
-        SPEC_CTRL_EXIT_TO_PV    /* Req: a=spec_ctrl %rsp=regs/cpuinfo, Clob: cd */
+        SPEC_CTRL_EXIT_TO_PV    /* Req: a=spec_ctrl %rsp=regs/cpuinfo, Clob: acd */
 
         RESTORE_ALL adj=8 compat=1
 .Lft0:  iretq
--- a/xen/arch/x86/x86_64/entry.S
+++ b/xen/arch/x86/x86_64/entry.S
@@ -9,6 +9,7 @@ 
 #include <asm/asm_defns.h>
 #include <asm/page.h>
 #include <asm/processor.h>
+#include <xen/lib.h>
 #include <public/xen.h>
 #include <irq_vectors.h>
 
@@ -187,7 +188,7 @@  restore_all_guest:
         mov   %r15d, %eax
 
         /* WARNING! `ret`, `call *`, `jmp *` not safe beyond this point. */
-        SPEC_CTRL_EXIT_TO_PV    /* Req: a=spec_ctrl %rsp=regs/cpuinfo, Clob: cd */
+        SPEC_CTRL_EXIT_TO_PV    /* Req: a=spec_ctrl %rsp=regs/cpuinfo, Clob: acd */
 
         RESTORE_ALL
         testw $TRAP_syscall,4(%rsp)