mbox series

[0/3] x86: Initial pieces for guest CET support

Message ID 20210426175421.30497-1-andrew.cooper3@citrix.com (mailing list archive)
Headers show
Series x86: Initial pieces for guest CET support | expand

Message

Andrew Cooper April 26, 2021, 5:54 p.m. UTC
Some initial pieces for guest support.  Everything will currently malfunction
for VMs which explicitly opt in to CET_SS/IBT.

Still TODO as a minimum:
 * Teach the pagewalk logic about shadow stack accesses and errors.
 * Emulator support for the new instructions.  WRUSS is an irritating corner
   case, requiring a change to how we express pagewalk inputs, as
   user/supervisor is no longer dependent on CPL.
 * Context switching of U/S_CET state.  Recommended way is with XSAVES, except
   the S_CET has broken sematics - it ends up as a mix of host and guest
   state, and isn't safe to XRSTOR without editing what the CPU wrote out.

The above ought to suffice for getting some XTF testing in place.  For general
guest support:
 * In-guest XSAVES support.  Windows is the only OS to support CET at the time
   of writing, and it cross-checks for XSAVES.  Linux expected to perform the
   same cross-check in due course.

Stretch features (not for initial support):
  * Adding EPT/NPT Supervisor Shadow Stack protections into mem_access, so
    introspection can block aliasing attacks.

Andrew Cooper (3):
  x86/hvm: Introduce experimental guest CET support
  x86/svm: Enumeration for CET
  x86/VT-x: Enumeration for CET

 xen/arch/x86/hvm/hvm.c                      | 18 ++++++++++++++++--
 xen/arch/x86/hvm/svm/svm.c                  |  1 +
 xen/arch/x86/hvm/svm/svmdebug.c             |  2 ++
 xen/arch/x86/hvm/vmx/vmcs.c                 |  6 ++++++
 xen/include/asm-x86/hvm/svm/svm.h           |  2 ++
 xen/include/asm-x86/hvm/svm/vmcb.h          | 10 ++++++++--
 xen/include/asm-x86/hvm/vmx/vmcs.h          | 11 ++++++++++-
 xen/include/public/arch-x86/cpufeatureset.h |  4 ++--
 8 files changed, 47 insertions(+), 7 deletions(-)

Comments

Jan Beulich April 27, 2021, 6:46 a.m. UTC | #1
On 26.04.2021 19:54, Andrew Cooper wrote:
> Some initial pieces for guest support.  Everything will currently malfunction
> for VMs which explicitly opt in to CET_SS/IBT.
> 
> Still TODO as a minimum:
>  * Teach the pagewalk logic about shadow stack accesses and errors.
>  * Emulator support for the new instructions.  WRUSS is an irritating corner
>    case, requiring a change to how we express pagewalk inputs, as
>    user/supervisor is no longer dependent on CPL.

I can put this on my todo list, considering that I'm the one to play
with the emulator the most. Just let me know if you would prefer to
do it yourself. (Otherwise my next item there after AMX is now
complete would have been KeyLocker insns.)

>  * Context switching of U/S_CET state.  Recommended way is with XSAVES, except
>    the S_CET has broken sematics - it ends up as a mix of host and guest
>    state, and isn't safe to XRSTOR without editing what the CPU wrote out.

Hmm, I wasn't aware of quirks here - would you mind going into more
detail?

> The above ought to suffice for getting some XTF testing in place.  For general
> guest support:
>  * In-guest XSAVES support.  Windows is the only OS to support CET at the time
>    of writing, and it cross-checks for XSAVES.  Linux expected to perform the
>    same cross-check in due course.

What specifically do you mean here? The XSAVES CPU feature is marked
'S', so ought to be visible to Windows. Hence I guess you mean support
for the respective XSS bits?

Jan
Andrew Cooper April 27, 2021, 10:13 a.m. UTC | #2
On 27/04/2021 07:46, Jan Beulich wrote:
> On 26.04.2021 19:54, Andrew Cooper wrote:
>> Some initial pieces for guest support.  Everything will currently malfunction
>> for VMs which explicitly opt in to CET_SS/IBT.
>>
>> Still TODO as a minimum:
>>  * Teach the pagewalk logic about shadow stack accesses and errors.
>>  * Emulator support for the new instructions.  WRUSS is an irritating corner
>>    case, requiring a change to how we express pagewalk inputs, as
>>    user/supervisor is no longer dependent on CPL.
> I can put this on my todo list, considering that I'm the one to play
> with the emulator the most. Just let me know if you would prefer to
> do it yourself. (Otherwise my next item there after AMX is now
> complete would have been KeyLocker insns.)

My plan was actually to start with WRSS and do the pagewalk side of
things, including refreshing my comprehensive pagewalk XTF test.  This
will block work on all the other shadow stack instructions.

There are 3 emulator complexities for shadow stack instructions.  SSP
itself as a register, WRUSS no longer being CPL-based for
user/supervisor, and the fact that RSTORSSP in particular uses an atomic
block which microcode can express, but can't be encoded at an ISA
level.  I've got no idea what to do about this last problem, because we
can't map the two guest frames and re-issue the instruction - the
aliasing check on the tokens forces us to map the two frames in their
correct linear addresses.


IBT is quite different.  There are only two new instructions,
ENDBR{32,64} but tweaks to lots of other instructions (all indirect
control transfers), as well as 0x3e for the notrack prefix, and the
legacy code page bitmap.  The tracker bits themselves are somewhat
irritating to access in the {U,S}_CET MSRs.

Furthermore, I don't have access to hardware with CET-IBT (the SDP seems
to have let out some blue smoke), so any support here would be speculative.


Other bits I'd forgotten from the first set of bullet points.
* {RD,WR}MSR support for all the MSRs, including finally breaking into
the non-trivial work of context-dependent state.
* Changes to Task Switch.

>>  * Context switching of U/S_CET state.  Recommended way is with XSAVES, except
>>    the S_CET has broken sematics - it ends up as a mix of host and guest
>>    state, and isn't safe to XRSTOR without editing what the CPU wrote out.
> Hmm, I wasn't aware of quirks here - would you mind going into more
> detail?

I'll double check my notes.

>
>> The above ought to suffice for getting some XTF testing in place.  For general
>> guest support:
>>  * In-guest XSAVES support.  Windows is the only OS to support CET at the time
>>    of writing, and it cross-checks for XSAVES.  Linux expected to perform the
>>    same cross-check in due course.
> What specifically do you mean here? The XSAVES CPU feature is marked
> 'S', so ought to be visible to Windows. Hence I guess you mean support
> for the respective XSS bits?

We really shouldn't have exposed XSAVES without XSS bits, but yes - what
matters is that the {U,S}_CET XSTATE components are usable in the guest.

~Andrew
Andrew Cooper April 28, 2021, 12:25 p.m. UTC | #3
On 27/04/2021 11:13, Andrew Cooper wrote:
> There are 3 emulator complexities for shadow stack instructions.  SSP
> itself as a register, WRUSS no longer being CPL-based for
> user/supervisor, and the fact that RSTORSSP in particular uses an atomic
> block which microcode can express, but can't be encoded at an ISA
> level.  I've got no idea what to do about this last problem, because we
> can't map the two guest frames and re-issue the instruction - the
> aliasing check on the tokens forces us to map the two frames in their
> correct linear addresses.

Actually, RSTORSSP isn't too difficult.  I'd mis-read the pseudocode.

The atomic block is a check&edit of the token on the remote stack (not
both stacks, as I'd mistakenly thought).  The purpose is to prevent two
concurrent RSTORSSP's moving two threads onto the same shadow stack.

Without microcode superpowers, the best we can do this with a read,
check, cmpxchg() loop.

The common case will be no conflict, as stack switching will be well
formed (outside of debugging).  Any conflict here from real code is
going to yield #GP/#CP on one of the threads participating, so in the
case of a conflict in the emulator, a likely consequence of the 2nd
iteration is going to be a hard failure.

That said, malicious cases within the guest, or from foreign mappings,
can cause the cmpxchg() loop to take an unbounded time, so after 3
retries or so, we need to escalate to vcpu_pause_all_except_self(), and
or the ARM stop_machine() big hammer.

I'm tempted to just throw #GP back after 3 retries.  Its potentially
non-architectural behaviour, but won't occur in non-malicious
circumstances, and all fallback mechanisms have system-wide implications
that we oughtn't to be bowing to in a malicious circumstance.

~Andrew
Jan Beulich April 28, 2021, 1:03 p.m. UTC | #4
On 28.04.2021 14:25, Andrew Cooper wrote:
> On 27/04/2021 11:13, Andrew Cooper wrote:
>> There are 3 emulator complexities for shadow stack instructions.  SSP
>> itself as a register, WRUSS no longer being CPL-based for
>> user/supervisor, and the fact that RSTORSSP in particular uses an atomic
>> block which microcode can express, but can't be encoded at an ISA
>> level.  I've got no idea what to do about this last problem, because we
>> can't map the two guest frames and re-issue the instruction - the
>> aliasing check on the tokens forces us to map the two frames in their
>> correct linear addresses.
> 
> Actually, RSTORSSP isn't too difficult.  I'd mis-read the pseudocode.
> 
> The atomic block is a check&edit of the token on the remote stack (not
> both stacks, as I'd mistakenly thought).  The purpose is to prevent two
> concurrent RSTORSSP's moving two threads onto the same shadow stack.
> 
> Without microcode superpowers, the best we can do this with a read,
> check, cmpxchg() loop.
> 
> The common case will be no conflict, as stack switching will be well
> formed (outside of debugging).  Any conflict here from real code is
> going to yield #GP/#CP on one of the threads participating, so in the
> case of a conflict in the emulator, a likely consequence of the 2nd
> iteration is going to be a hard failure.
> 
> That said, malicious cases within the guest, or from foreign mappings,
> can cause the cmpxchg() loop to take an unbounded time, so after 3
> retries or so, we need to escalate to vcpu_pause_all_except_self(), and
> or the ARM stop_machine() big hammer.
> 
> I'm tempted to just throw #GP back after 3 retries.  Its potentially
> non-architectural behaviour, but won't occur in non-malicious
> circumstances, and all fallback mechanisms have system-wide implications
> that we oughtn't to be bowing to in a malicious circumstance.

I agree.

Jan