mbox series

[v3,0/3] amd/msr: implement MSR_VIRT_SPEC_CTRL for HVM guests

Message ID 20220331092717.9023-1-roger.pau@citrix.com (mailing list archive)
Headers show
Series amd/msr: implement MSR_VIRT_SPEC_CTRL for HVM guests | expand

Message

Roger Pau Monné March 31, 2022, 9:27 a.m. UTC
Hello,

The following series implements support for MSR_VIRT_SPEC_CTRL
(VIRT_SSBD) on different AMD CPU families.

Note that the support is added backwards, starting with the newer CPUs
that support MSR_SPEC_CTRL and moving to the older ones either using
MSR_VIRT_SPEC_CTRL or the SSBD bit in LS_CFG.

Xen is still free to use it's own SSBD setting, as the selection is
context switched on vm{entry,exit}.

On Zen2 and later, SPEC_CTRL.SSBD exists and should be used in
preference to VIRT_SPEC_CTRL.SSBD.  However, for migration
compatibility, Xen offers VIRT_SSBD to guests (in the max CPUID policy,
not default) implemented in terms of SPEC_CTRL.SSBD.

On Fam15h thru Zen1, Xen exposes VIRT_SSBD to guests by default to
abstract away the model and/or hypervisor specific differences in
MSR_LS_CFG/MSR_VIRT_SPEC_CTRL.

Note that if the hardware itself does offer VIRT_SSBD (ie: very likely
when running virtualized on < Zen2 hardware) and not AMD_SSBD Xen will
allow untrapped access to MSR_VIRT_SPEC_CTRL for HVM guests.

So the implementation of VIRT_SSBD exposed to HVM guests will use one of
the following underlying mechanisms, in the preference order listed
below:

 * SPEC_CTRL.SSBD. (patch 1)
 * VIRT_SPEC_CTRL.SSBD (untrapped). (patch 2).
 * Non-architectural way using LS_CFG. (patch 3)

This has survived a XenRT basic set of tests on AMD machines.

Roger Pau Monne (3):
  amd/msr: implement VIRT_SPEC_CTRL for HVM guests on top of SPEC_CTRL
  amd/msr: allow passthrough of VIRT_SPEC_CTRL for HVM guests
  amd/msr: implement VIRT_SPEC_CTRL for HVM guests using legacy SSBD

 xen/arch/x86/cpu/amd.c                      | 116 +++++++++++++++++---
 xen/arch/x86/cpuid.c                        |  28 +++++
 xen/arch/x86/hvm/hvm.c                      |   1 +
 xen/arch/x86/hvm/svm/entry.S                |   6 +
 xen/arch/x86/hvm/svm/svm.c                  |  50 +++++++++
 xen/arch/x86/include/asm/amd.h              |   4 +
 xen/arch/x86/include/asm/cpufeatures.h      |   1 +
 xen/arch/x86/include/asm/msr.h              |  14 +++
 xen/arch/x86/msr.c                          |  26 +++++
 xen/arch/x86/spec_ctrl.c                    |  12 +-
 xen/include/public/arch-x86/cpufeatureset.h |   2 +-
 11 files changed, 241 insertions(+), 19 deletions(-)

Comments

Andrew Cooper April 22, 2022, 6:49 p.m. UTC | #1
On 31/03/2022 10:27, Roger Pau Monne wrote:
> Hello,
>
> The following series implements support for MSR_VIRT_SPEC_CTRL
> (VIRT_SSBD) on different AMD CPU families.
>
> Note that the support is added backwards, starting with the newer CPUs
> that support MSR_SPEC_CTRL and moving to the older ones either using
> MSR_VIRT_SPEC_CTRL or the SSBD bit in LS_CFG.
>
> Xen is still free to use it's own SSBD setting, as the selection is
> context switched on vm{entry,exit}.
>
> On Zen2 and later, SPEC_CTRL.SSBD exists and should be used in
> preference to VIRT_SPEC_CTRL.SSBD.  However, for migration
> compatibility, Xen offers VIRT_SSBD to guests (in the max CPUID policy,
> not default) implemented in terms of SPEC_CTRL.SSBD.
>
> On Fam15h thru Zen1, Xen exposes VIRT_SSBD to guests by default to
> abstract away the model and/or hypervisor specific differences in
> MSR_LS_CFG/MSR_VIRT_SPEC_CTRL.
>
> Note that if the hardware itself does offer VIRT_SSBD (ie: very likely
> when running virtualized on < Zen2 hardware) and not AMD_SSBD Xen will
> allow untrapped access to MSR_VIRT_SPEC_CTRL for HVM guests.
>
> So the implementation of VIRT_SSBD exposed to HVM guests will use one of
> the following underlying mechanisms, in the preference order listed
> below:
>
>  * SPEC_CTRL.SSBD. (patch 1)
>  * VIRT_SPEC_CTRL.SSBD (untrapped). (patch 2).
>  * Non-architectural way using LS_CFG. (patch 3)
>
> This has survived a XenRT basic set of tests on AMD machines.

Sorry for the mixed feedback, but some is applicable across multiple
patches.

First, it is important to know why MSR_VIRT_SPEC_CTRL exists, because
that informs what is, and is not, sensible to do with it.

It exists to be a FMS-invariant abstraction of the DE_CFG interface,
emulated by the hypervisor.  At the time, we experimented with emulating
MSR_SPEC_CTRL directly, but the results were unusable slow (legacy IBRS
causing a vmexit on every syscall/interrupt entry&exit) so
MSR_VIRT_SPEC_CTRL is also an explicit statement that it is an expensive
operation and shouldn't be used frequently.

In practice, this means "only for very very important processes, and not
to be used more frequently than process context switch".  Also, there is
no hardware which implements MSR_VIRT_SPEC_CTRL, nor will there be.

Patch 2 has added an extra two vmexits around each vmexit, in an effort
to let L2 vmexit to L0 rather than L1 for what is likely to be 0 times
in an L1 timeslice.  It's not a credible optimisation, for something
which isn't a production usecase.  Yes - nested virt does exist, and is
useful for dev, but noone runs a fully fat server virt hypervisor at L1
in production if they care in the slightest about performance.  Either
way, patch 2 is premature optimisation with a massive complexity cost.

Furthermore, writes to LS_CFG are also incredibly expensive, even if
you're not changing any bits.  The AMD recommended algorithm
specifically avoids rewriting it with the same value as before.

Another thing is that Xen shouldn't touch LS_CFG like this if there is
any hint of a hypervisor on the system.  If there is a hypervisor and it
doesn't offer VIRT_SPEC_CTRL, trying to play with LS_CFG isn't going to
make the situation any better.

As to the CPUID bit handling, on consideration of the whole series, it
wants to be "!" only.  ! is there to indicate "something complicated is
going on with this bit", and life is too short to try and get the
derivation logic right with both implicit and explicit conditions. 
Leave it without an s/S (so no auto propagation from the host policy),
and set it in the max policy for LS_CFG || VIRT_SPEC_CTRL || SPEC_CTRL,
and set it in the default policy for LS_CFG || VIRT_SPEC_CTRL, which
will be far clearer to follow.

For `struct ssbd_core`, the name isn't great.  It's more
ssbd_ls_cfg/state.  Also, each array element wants 64 byte alignment,
because that's the only way to avoid atomic cacheline pingpong from the
spinlocks.  Also, the accessors need to be raw, because GIF=0 context is
weird and working around checklock with irqsave variants is not a clever
move.  It is not safe to printk()/bug/etc from GIF=0 context, so logic
needs to be kept to an absolute bare minimum.

~Andrew
Roger Pau Monné April 25, 2022, 11:28 a.m. UTC | #2
On Fri, Apr 22, 2022 at 06:49:57PM +0000, Andrew Cooper wrote:
> On 31/03/2022 10:27, Roger Pau Monne wrote:
> > Hello,
> >
> > The following series implements support for MSR_VIRT_SPEC_CTRL
> > (VIRT_SSBD) on different AMD CPU families.
> >
> > Note that the support is added backwards, starting with the newer CPUs
> > that support MSR_SPEC_CTRL and moving to the older ones either using
> > MSR_VIRT_SPEC_CTRL or the SSBD bit in LS_CFG.
> >
> > Xen is still free to use it's own SSBD setting, as the selection is
> > context switched on vm{entry,exit}.
> >
> > On Zen2 and later, SPEC_CTRL.SSBD exists and should be used in
> > preference to VIRT_SPEC_CTRL.SSBD.  However, for migration
> > compatibility, Xen offers VIRT_SSBD to guests (in the max CPUID policy,
> > not default) implemented in terms of SPEC_CTRL.SSBD.
> >
> > On Fam15h thru Zen1, Xen exposes VIRT_SSBD to guests by default to
> > abstract away the model and/or hypervisor specific differences in
> > MSR_LS_CFG/MSR_VIRT_SPEC_CTRL.
> >
> > Note that if the hardware itself does offer VIRT_SSBD (ie: very likely
> > when running virtualized on < Zen2 hardware) and not AMD_SSBD Xen will
> > allow untrapped access to MSR_VIRT_SPEC_CTRL for HVM guests.
> >
> > So the implementation of VIRT_SSBD exposed to HVM guests will use one of
> > the following underlying mechanisms, in the preference order listed
> > below:
> >
> >  * SPEC_CTRL.SSBD. (patch 1)
> >  * VIRT_SPEC_CTRL.SSBD (untrapped). (patch 2).
> >  * Non-architectural way using LS_CFG. (patch 3)
> >
> > This has survived a XenRT basic set of tests on AMD machines.
> 
> Sorry for the mixed feedback, but some is applicable across multiple
> patches.
> 
> First, it is important to know why MSR_VIRT_SPEC_CTRL exists, because
> that informs what is, and is not, sensible to do with it.
> 
> It exists to be a FMS-invariant abstraction of the DE_CFG interface,
> emulated by the hypervisor.  At the time, we experimented with emulating
> MSR_SPEC_CTRL directly, but the results were unusable slow (legacy IBRS
> causing a vmexit on every syscall/interrupt entry&exit) so
> MSR_VIRT_SPEC_CTRL is also an explicit statement that it is an expensive
> operation and shouldn't be used frequently.
> 
> In practice, this means "only for very very important processes, and not
> to be used more frequently than process context switch".  Also, there is
> no hardware which implements MSR_VIRT_SPEC_CTRL, nor will there be.
> 
> Patch 2 has added an extra two vmexits around each vmexit, in an effort

No, it adds just one extra unconditional vmexit, the wrmsr is
conditional to the value in the guest and Xen differing, and it's not
unconditionally executed.

> to let L2 vmexit to L0 rather than L1 for what is likely to be 0 times
> in an L1 timeslice.  It's not a credible optimisation, for something
> which isn't a production usecase.  Yes - nested virt does exist, and is
> useful for dev, but noone runs a fully fat server virt hypervisor at L1
> in production if they care in the slightest about performance.  Either
> way, patch 2 is premature optimisation with a massive complexity cost.

OK, so your recommendation is to trap writes to VIRT_SPEC_CTRL so the
unconditional rdmsr on vmexit is avoided at the cost of taking a
vmexit on all writes to VIRT_SPEC_CTRL, that's indeed fine.

> 
> Furthermore, writes to LS_CFG are also incredibly expensive, even if
> you're not changing any bits.  The AMD recommended algorithm
> specifically avoids rewriting it with the same value as before.

Writes to LS_CFG will only happen if the guest and Xen selection of
SSBD differ, as the guest value is cached.

> Another thing is that Xen shouldn't touch LS_CFG like this if there is
> any hint of a hypervisor on the system.  If there is a hypervisor and it
> doesn't offer VIRT_SPEC_CTRL, trying to play with LS_CFG isn't going to
> make the situation any better.

I would assume that no hypervisor will offer LS_CFG support for
setting SSBD if the guest shouldn't use it, and hence set_legacy_ssbd
will already return false.

> As to the CPUID bit handling, on consideration of the whole series, it
> wants to be "!" only.  ! is there to indicate "something complicated is
> going on with this bit", and life is too short to try and get the
> derivation logic right with both implicit and explicit conditions. 
> Leave it without an s/S (so no auto propagation from the host policy),
> and set it in the max policy for LS_CFG || VIRT_SPEC_CTRL || SPEC_CTRL,
> and set it in the default policy for LS_CFG || VIRT_SPEC_CTRL, which
> will be far clearer to follow.

OK, as both Jan and you agree that using just '!' is fine I can work
with that.

> For `struct ssbd_core`, the name isn't great.  It's more
> ssbd_ls_cfg/state.  Also, each array element wants 64 byte alignment,
> because that's the only way to avoid atomic cacheline pingpong from the
> spinlocks.  Also, the accessors need to be raw, because GIF=0 context is
> weird and working around checklock with irqsave variants is not a clever
> move.  It is not safe to printk()/bug/etc from GIF=0 context, so logic
> needs to be kept to an absolute bare minimum.

Sorry I'm a bit dense today, but I don't seem to be able to find any
raw accessors for our spinlock implementation. _spin_lock_cb will
unconditionally call check_lock and there doesn't seem to be a way to
bypass the checks if lock debugging is enabled.  Am I missing
something?

Thanks, Roger.