
[00/19] Refresh queued CET virtualization series

Message ID 20220616084643.19564-1-weijiang.yang@intel.com (mailing list archive)

Yang, Weijiang June 16, 2022, 8:46 a.m. UTC
The purpose of this patch series is to refresh the queued CET KVM
patches[1] with the latest dependent CET native patches, with the goal
that the whole series can be merged ahead of the CET native
series[2][3].

The patchset has been tested on Skylake (non-CET) and Sapphire Rapids
(CET-capable) platforms; no breakage was found in basic KVM functions
or in the KVM unit-tests/selftests.

----------------------------------------------------------------------
The motivations are:
1) Customers are interested in developing CET-related applications and
especially want to set up CET development environments in VMs, but are
burdened by non-trivial native and KVM patch rebasing work. Merging
this series early would save them considerable effort.

2) The kernel and KVM have evolved significantly since the series was
queued; the KVM patches need fixing up to adapt them to the recent
mainline code.

3) The CET native patch series has been refactored significantly per
maintainers' review, and some of its patches can be reused by the KVM
enabling patches.

4) PeterZ’s supervisor IBT patch series was merged in 5.18; an
additional KVM patch is required to support it in the guest kernel.

----------------------------------------------------------------------
Guest CET states in KVM:
CET user-mode states (MSR_IA32_U_CET, MSR_IA32_PL3_SSP) rely on
XSAVES/XRSTORS and the CET user bit of MSR_IA32_XSS to be saved/restored
when a thread/process context switch happens. In the virtualization
world, after VM-exit and before the vCPU thread exits to user mode, the
guest user-mode states are swapped into the guest FPU area and the host
user-mode states are loaded; vice versa on VM-entry. See
kvm_load_guest_fpu() and kvm_put_guest_fpu() for details. With this
design, the guest's XSAVE-supported CET states are retained while the
vCPU thread stays in ring-0 VMX root mode, and the transient guest
states are not expected to impact the host side.

Moreover, the VMCS includes new fields for CET states: GUEST_S_CET,
GUEST_SSP and GUEST_INTR_SSP_TABLE for the guest, and HOST_S_CET,
HOST_SSP and HOST_INTR_SSP_TABLE for the host. When the corresponding
load guest/host state bits are set in the VMCS, the guest/host MSRs are
swapped at VM-exit/VM-entry, so these guest/host CET states are strictly
isolated. All CET supervisor states map to one of these fields. With
the new fields, the current guest supervisor IBT enabling does not
depend on XSAVES/XRSTORS or the CET supervisor bit of MSR_IA32_XSS.

---------------------------------------------------------------------
Impact to existing kernel/KVM:
To minimize the impact on existing kernel/KVM code, most of the KVM
patch code can be bypassed at runtime. Unchecking "CONFIG_X86_KERNEL_IBT"
and "CONFIG_X86_SHADOW_STACK" in Kconfig before the kernel build removes
the CET features from KVM. If neither is enabled, KVM clears the related
feature bits as well as the CET user bit in supported_xss, which makes
CET-related checks fail at the earliest gate. Since most of the patch
code runs on non-hot paths of KVM, it is expected to introduce little
impact to the existing code.
On legacy platforms the CET feature is simply not available, so the
execution flow is the same as on a CET-capable platform with the
features disabled at build time.

One known downside of an early merge is that the per-thread FPU area
expands by 16 bytes due to enabling the XFEATURE_MASK_CET_USER bit on
CET-capable platforms.

Although the native SHSTK and IBT patch series have been split apart, we
don't want to do the same for the KVM patches, since supervisor IBT has
already been merged and customers want the full user-mode features in
the guest.

We'd like to get your comments on the practice and patches, thanks!

Patch 1-5: Dependent CET native patches.
Patch 6-7: KVM XSS Supporting patches from kvm/queue.
Patch 8-18: Enabling patches for CET user mode.
Patch 19:  Enabling patch for supervisor IBT.

Change logs:
1. Removed XFEATURE_MASK_CET_KERNEL, MSR_IA32_PL{0,1,2}_SSP and
   MSR_IA32_INT_SSP_TAB related code since supervisor SHSTK design is
   still open.
2. Added support for guest kernel supervisor IBT.
3. Refactored some of the previous helpers due to changes 1) and 2).
4. Refactored the control logic between the XSS CET user bit and user-mode
   SHSTK/IBT, making supervisor IBT support independent of the XSS user bit.
5. Rebased the patch series onto kvm/queue:
   8baacf67c76c ("KVM: SEV-ES: reuse advance_sev_es_emulated_ins for OUT too")

[1]: https://git.kernel.org/pub/scm/virt/kvm/kvm.git/log/?h=intel
[2]: SHSTK: https://lore.kernel.org/all/20220130211838.8382-1-rick.p.edgecombe@intel.com/
[3]: old IBT: https://lore.kernel.org/all/20210830182221.3535-1-yu-cheng.yu@intel.com/

Rick Edgecombe (1):
  x86/fpu: Add helper for modifying xstate

Sean Christopherson (2):
  KVM: x86: Report XSS as an MSR to be saved if there are supported
    features
  KVM: x86: Load guest fpu state when accessing MSRs managed by XSAVES

Yang Weijiang (12):
  KVM: x86: Refresh CPUID on writes to MSR_IA32_XSS
  KVM: x86: Add #CP support in guest exception classification.
  KVM: VMX: Introduce CET VMCS fields and flags
  KVM: x86: Add fault checks for CR4.CET
  KVM: VMX: Emulate reads and writes to CET MSRs
  KVM: VMX: Add a synthetic MSR to allow userspace VMM to access
    GUEST_SSP
  KVM: x86: Report CET MSRs as to-be-saved if CET is supported
  KVM: x86: Save/Restore GUEST_SSP to/from SMM state save area
  KVM: x86: Enable CET virtualization for VMX and advertise CET to
    userspace
  KVM: VMX: Pass through CET MSRs to the guest when supported
  KVM: nVMX: Enable CET support for nested VMX
  KVM: x86: Enable supervisor IBT support for guest

Yu-cheng Yu (4):
  x86/cet/shstk: Add Kconfig option for Shadow Stack
  x86/cpufeatures: Add CPU feature flags for shadow stacks
  x86/cpufeatures: Enable CET CR4 bit for shadow stack
  x86/fpu/xstate: Introduce CET MSR and XSAVES supervisor states

 arch/x86/Kconfig                         |  17 +++
 arch/x86/Kconfig.assembler               |   1 +
 arch/x86/include/asm/cpu.h               |   2 +-
 arch/x86/include/asm/cpufeatures.h       |   1 +
 arch/x86/include/asm/disabled-features.h |   8 +-
 arch/x86/include/asm/fpu/api.h           |   7 +-
 arch/x86/include/asm/fpu/types.h         |  14 ++-
 arch/x86/include/asm/fpu/xstate.h        |   6 +-
 arch/x86/include/asm/kvm_host.h          |   3 +-
 arch/x86/include/asm/vmx.h               |   8 ++
 arch/x86/include/uapi/asm/kvm.h          |   1 +
 arch/x86/include/uapi/asm/kvm_para.h     |   1 +
 arch/x86/kernel/cpu/common.c             |  14 +--
 arch/x86/kernel/cpu/cpuid-deps.c         |   1 +
 arch/x86/kernel/fpu/core.c               |  19 ++++
 arch/x86/kernel/fpu/xstate.c             |  93 ++++++++--------
 arch/x86/kernel/machine_kexec_64.c       |   2 +-
 arch/x86/kvm/cpuid.c                     |  21 +++-
 arch/x86/kvm/cpuid.h                     |   5 +
 arch/x86/kvm/emulate.c                   |  11 ++
 arch/x86/kvm/vmx/capabilities.h          |   4 +
 arch/x86/kvm/vmx/nested.c                |  19 +++-
 arch/x86/kvm/vmx/vmcs12.c                |   6 +
 arch/x86/kvm/vmx/vmcs12.h                |  14 ++-
 arch/x86/kvm/vmx/vmx.c                   | 134 ++++++++++++++++++++++-
 arch/x86/kvm/x86.c                       |  95 ++++++++++++++--
 arch/x86/kvm/x86.h                       |  47 +++++++-
 27 files changed, 468 insertions(+), 86 deletions(-)


base-commit: 8baacf67c76c560fed954ac972b63e6e59a6fba0

Comments

Christoph Hellwig June 16, 2022, 9:10 a.m. UTC | #1
On Thu, Jun 16, 2022 at 04:46:24AM -0400, Yang Weijiang wrote:
> The purpose of this patch series is to refresh the queued CET KVM
> patches[1] with the latest dependent CET native patches, pursuing
> the result that whole series could be merged ahead of CET native
> series[2] [3].

It might be helpful to explain what CET is here..
Peter Zijlstra June 16, 2022, 10:12 a.m. UTC | #2
On Thu, Jun 16, 2022 at 04:46:24AM -0400, Yang Weijiang wrote:

> To minimize the impact to exiting kernel/KVM code, most of KVM patch
> code can be bypassed during runtime.Uncheck "CONFIG_X86_KERNEL_IBT"
> and "CONFIG_X86_SHADOW_STACK" in Kconfig before kernel build to get
> rid of CET featrures in KVM. If both of them are not enabled, KVM
> clears related feature bits as well as CET user bit in supported_xss,
> this makes CET related checks stop at the first points. Since most of
> the patch code runs on the none-hot path of KVM, it's expected to
> introduce little impact to existing code.

Do I understand this right in that a host without X86_KERNEL_IBT cannot
run a guest with X86_KERNEL_IBT on? That seems unfortunate, since that
was exactly what I did while developing the X86_KERNEL_IBT patches.

I'm thinking that if the hardware supports it, KVM should expose it,
irrespective of the host kernel using it.
Paolo Bonzini June 16, 2022, 10:21 a.m. UTC | #3
On 6/16/22 12:12, Peter Zijlstra wrote:
> Do I understand this right in that a host without X86_KERNEL_IBT cannot
> run a guest with X86_KERNEL_IBT on? That seems unfortunate, since that
> was exactly what I did while developing the X86_KERNEL_IBT patches.
> 
> I'm thinking that if the hardware supports it, KVM should expose it,
> irrespective of the host kernel using it.

For IBT in particular, I think all processor state is only loaded and 
stored at vmentry/vmexit (does not need XSAVES), so it should be feasible.

Paolo
Peter Zijlstra June 16, 2022, 11:25 a.m. UTC | #4
On Thu, Jun 16, 2022 at 02:10:50AM -0700, Christoph Hellwig wrote:
> On Thu, Jun 16, 2022 at 04:46:24AM -0400, Yang Weijiang wrote:
> > The purpose of this patch series is to refresh the queued CET KVM
> > patches[1] with the latest dependent CET native patches, pursuing
> > the result that whole series could be merged ahead of CET native
> > series[2] [3].
> 
> It might be helpful to explain what CET is here..

Central European Time ofc :-)

I think it stands for Control-flow Enforcement Technology or something
along those lines, but this being Intel it loves to obfuscate stuff and
make it impossible to understand what's being said to increase the
buzzword bong hits.

Its a mostly pointless umbrella term for IBT (Indirect Branch Tracking)
and SHSTK (SHadow STacK), the first of which covers forward edge control
flow and the second covers backward edge control flow.
Peter Zijlstra June 16, 2022, 2:18 p.m. UTC | #5
On Thu, Jun 16, 2022 at 12:21:20PM +0200, Paolo Bonzini wrote:
> On 6/16/22 12:12, Peter Zijlstra wrote:
> > Do I understand this right in that a host without X86_KERNEL_IBT cannot
> > run a guest with X86_KERNEL_IBT on? That seems unfortunate, since that
> > was exactly what I did while developing the X86_KERNEL_IBT patches.
> > 
> > I'm thinking that if the hardware supports it, KVM should expose it,
> > irrespective of the host kernel using it.
> 
> For IBT in particular, I think all processor state is only loaded and stored
> at vmentry/vmexit (does not need XSAVES), so it should be feasible.

That would be the S_CET stuff, yeah, that's VMCS managed. The U_CET
stuff is all XSAVE though.

But funny thing, CPUID doesn't enumerate {U,S}_CET separately. It *does*
enumerate IBT and SS separately, but for each IBT/SS you have to
implement both U and S.

That was a problem with the first series, which only implemented support
for U_CET while advertising IBT and SS (very much including S_CET), and
still is a problem with this series because S_SS is missing while
advertised.
Yang, Weijiang June 16, 2022, 3:06 p.m. UTC | #6
On 6/16/2022 10:18 PM, Peter Zijlstra wrote:
> On Thu, Jun 16, 2022 at 12:21:20PM +0200, Paolo Bonzini wrote:
>> On 6/16/22 12:12, Peter Zijlstra wrote:
>>> Do I understand this right in that a host without X86_KERNEL_IBT cannot
>>> run a guest with X86_KERNEL_IBT on? That seems unfortunate, since that
>>> was exactly what I did while developing the X86_KERNEL_IBT patches.
>>>
>>> I'm thinking that if the hardware supports it, KVM should expose it,
>>> irrespective of the host kernel using it.
>> For IBT in particular, I think all processor state is only loaded and stored
>> at vmentry/vmexit (does not need XSAVES), so it should be feasible.
> That would be the S_CET stuff, yeah, that's VMCS managed. The U_CET
> stuff is all XSAVE though.

Thank you Peter and Paolo!

In this version, I referenced the host kernel settings when exposing
X86_KERNEL_IBT to the guest.

The reason is that _IF_ the host, for whatever reason, disabled the IBT
feature, exposing the feature blindly to the guest could be risky, e.g.,
hitting some issue the host wants to mitigate.

The actual implementation depends on the agreement we reach :-)

>
> But funny thing, CPUID doesn't enumerate {U,S}_CET separately. It *does*
> enumerate IBT and SS separately, but for each IBT/SS you have to
> implement both U and S.

Exactly, the CPUID enumeration could be a pain point for the KVM
solution. It makes {U,S}_CET feature control harder for the guest.

>
> That was a problem with the first series, which only implemented support
> for U_CET while advertising IBT and SS (very much including S_CET), and
> still is a problem with this series because S_SS is missing while
> advertised.

KVM has a problem advertising S_SS alone to the guest when U_CET (both
SS and IBT) is not available to the guest. I would like to hear the
community's voice on how to make the feature controls straightforward
and reasonable. The existing CPUID enumeration cannot advertise {U,S}_SS
and {U,S}_IBT well.

>
Paolo Bonzini June 16, 2022, 3:28 p.m. UTC | #7
On 6/16/22 16:18, Peter Zijlstra wrote:
> On Thu, Jun 16, 2022 at 12:21:20PM +0200, Paolo Bonzini wrote:
>> On 6/16/22 12:12, Peter Zijlstra wrote:
>>> Do I understand this right in that a host without X86_KERNEL_IBT cannot
>>> run a guest with X86_KERNEL_IBT on? That seems unfortunate, since that
>>> was exactly what I did while developing the X86_KERNEL_IBT patches.
>>>
>>> I'm thinking that if the hardware supports it, KVM should expose it,
>>> irrespective of the host kernel using it.
>>
>> For IBT in particular, I think all processor state is only loaded and stored
>> at vmentry/vmexit (does not need XSAVES), so it should be feasible.
> 
> That would be the S_CET stuff, yeah, that's VMCS managed. The U_CET
> stuff is all XSAVE though.

What matters is whether XFEATURE_MASK_USER_SUPPORTED includes 
XFEATURE_CET_USER.  If you build with !X86_KERNEL_IBT, KVM can still 
rely on the FPU state for U_CET state, and S_CET is saved/restored via 
the VMCS independent of X86_KERNEL_IBT.

Paolo

> But funny thing, CPUID doesn't enumerate {U,S}_CET separately. It *does*
> enumerate IBT and SS separately, but for each IBT/SS you have to
> implement both U and S.
> 
> That was a problem with the first series, which only implemented support
> for U_CET while advertising IBT and SS (very much including S_CET), and
> still is a problem with this series because S_SS is missing while
> advertised.
Yang, Weijiang June 18, 2022, 6:43 a.m. UTC | #8
On 6/16/2022 11:28 PM, Paolo Bonzini wrote:
> On 6/16/22 16:18, Peter Zijlstra wrote:
>> On Thu, Jun 16, 2022 at 12:21:20PM +0200, Paolo Bonzini wrote:
>>> On 6/16/22 12:12, Peter Zijlstra wrote:
>>>> Do I understand this right in that a host without X86_KERNEL_IBT 
>>>> cannot
>>>> run a guest with X86_KERNEL_IBT on? That seems unfortunate, since that
>>>> was exactly what I did while developing the X86_KERNEL_IBT patches.
>>>>
>>>> I'm thinking that if the hardware supports it, KVM should expose it,
>>>> irrespective of the host kernel using it.
>>>
>>> For IBT in particular, I think all processor state is only loaded 
>>> and stored
>>> at vmentry/vmexit (does not need XSAVES), so it should be feasible.
>>
>> That would be the S_CET stuff, yeah, that's VMCS managed. The U_CET
>> stuff is all XSAVE though.
>
> What matters is whether XFEATURE_MASK_USER_SUPPORTED includes 
> XFEATURE_CET_USER. 

Small correction: XFEATURE_CET_USER belongs to
XFEATURE_MASK_SUPERVISOR_SUPPORTED; the name is misleading.


> If you build with !X86_KERNEL_IBT, KVM can still rely on the FPU state 
> for U_CET state, and S_CET is saved/restored via the VMCS independent 
> of X86_KERNEL_IBT.

A fundamental question is: should KVM always honor host CET enablement
before exposing the feature to the guest? I.e., check X86_KERNEL_IBT and
X86_SHADOW_STACK.


>
> Paolo
>
>> But funny thing, CPUID doesn't enumerate {U,S}_CET separately. It *does*
>> enumerate IBT and SS separately, but for each IBT/SS you have to
>> implement both U and S.
>>
>> That was a problem with the first series, which only implemented support
>> for U_CET while advertising IBT and SS (very much including S_CET), and
>> still is a problem with this series because S_SS is missing while
>> advertised.
>
Sean Christopherson July 14, 2022, 7:36 p.m. UTC | #9
On Sat, Jun 18, 2022, Yang, Weijiang wrote:
> 
> On 6/16/2022 11:28 PM, Paolo Bonzini wrote:
> > If you build with !X86_KERNEL_IBT, KVM can still rely on the FPU state
> > for U_CET state, and S_CET is saved/restored via the VMCS independent of
> > X86_KERNEL_IBT.
> 
> A fundamental question is, should KVM always honor host CET enablement
> before expose the feature to guest? i.e., check X86_KERNEL_IBT and
> X86_SHADOW_STACK.

If there is a legitimate use case to NOT require host enablement and it's 100%
safe to do so (without requiring hacks to the core kernel), then there's no hard
requirement that says KVM can't virtualize a feature that's not used by the host.

It's definitely uncommon; unless I'm forgetting features, LA57 is the only feature
that KVM fully virtualizes (as opposed to emulates in software) without requiring
host enablement.  Ah, and good ol' MPX, which is probably the best prior art since
it shares the same XSAVE+VMCS for user+supervisor state management.  So more than
one, but still not very many.

But, requiring host "support" is the de facto standard largely because features
tend to fall into one of three categories:

  1. The feature is always available, i.e. doesn't have a software enable/disable
     flag.

  2. The feature isn't explicitly disabled in cpufeatures / x86_capability even
     if it's not used by the host.  E.g. MONITOR/MWAIT comes to mind where the
     host can be configured to not use MWAIT for idle, but it's still reported
     as supported (and for that case, KVM does have to explicitly guard against
     X86_BUG_MONITOR).

  3. Require some amount of host support, e.g. exposing XSAVE without the kernel
     knowing how to save/restore all that state wouldn't end well.

In other words, virtualizing a feature if it's disabled in the host is allowed,
but it's rare because there just aren't many features where doing so is possible
_and_ necessary.
Yang, Weijiang July 15, 2022, 3:04 p.m. UTC | #10
On 7/15/2022 3:36 AM, Sean Christopherson wrote:
> On Sat, Jun 18, 2022, Yang, Weijiang wrote:
>> On 6/16/2022 11:28 PM, Paolo Bonzini wrote:
>>> If you build with !X86_KERNEL_IBT, KVM can still rely on the FPU state
>>> for U_CET state, and S_CET is saved/restored via the VMCS independent of
>>> X86_KERNEL_IBT.
>> A fundamental question is, should KVM always honor host CET enablement
>> before expose the feature to guest? i.e., check X86_KERNEL_IBT and
>> X86_SHADOW_STACK.
> If there is a legitimate use case to NOT require host enablement and it's 100%
> safe to do so (within requiring hacks to the core kernel), then there's no hard
> requirement that says KVM can't virtualize a feature that's not used by the host.

Yeah, CET definitely can be virtualized without considering host usage,
but to make things easier it still relies on some kind of host-side
support, e.g., XSAVES.

>
> It's definitely uncommon; unless I'm forgetting features, LA57 is the only feature
> that KVM fully virtualizes (as opposed to emulates in software) without requiring
> host enablement.  Ah, and good ol' MPX, which is probably the best prior are since
> it shares the same XSAVE+VMCS for user+supervisor state management.  So more than
> one, but still not very many.

Speaking of MPX, is it really active in recent kernels? I can find only
a little code on the native side but more code in KVM.

>
> But, requiring host "support" is the de facto standard largely because features
> tend to fall into one of three categories:
>
>    1. The feature is always available, i.e. doesn't have a software enable/disable
>       flag.
>
>    2. The feature isn't explicitly disabled in cpufeatures / x86_capability even
>       if it's not used by the host.  E.g. MONITOR/MWAIT comes to mind where the
>       host can be configured to not use MWAIT for idle, but it's still reported
>       as supported (and for that case, KVM does have to explicitly guard against
>       X86_BUG_MONITOR).
>
>    3. Require some amount of host support, e.g. exposing XSAVE without the kernel
>       knowing how to save/restore all that state wouldn't end well.

CET may fall into one of the three, or a combination of them :-),
depending on the complexity of the implementation.

>
> In other words, virtualizing a feature if it's disabled in the host is allowed,
> but it's rare because there just aren't many features where doing so is possible
> _and_ necessary.

I'm thinking of tweaking the patches to construct a safe yet flexible
solution based on a bunch of MSRs/CPUID bits/VMCS fields/XSAVES elements
plus a few host-side constraints.

Thanks for the enlightenment!
Sean Christopherson July 15, 2022, 3:58 p.m. UTC | #11
On Fri, Jul 15, 2022, Yang, Weijiang wrote:
> 
> On 7/15/2022 3:36 AM, Sean Christopherson wrote:
> > It's definitely uncommon; unless I'm forgetting features, LA57 is the only feature
> > that KVM fully virtualizes (as opposed to emulates in software) without requiring
> > host enablement.  Ah, and good ol' MPX, which is probably the best prior are since
> > it shares the same XSAVE+VMCS for user+supervisor state management.  So more than
> > one, but still not very many.
> 
> Speaking of MPX, is it really active in recent kernel? I can find little
> piece of code at native side, instead, more code in KVM.

Nope, native MPX support was ripped out a year or two ago.  The kernel provides
just enough save+restore support so that KVM can continue to virtualize MPX.