diff mbox series

[v2,for-4.15] x86/msr: introduce an option for HVM relaxed rdmsr behavior

Message ID 20210304144755.35891-1-roger.pau@citrix.com (mailing list archive)
State New
Headers show
Series [v2,for-4.15] x86/msr: introduce an option for HVM relaxed rdmsr behavior | expand

Commit Message

Roger Pau Monné March 4, 2021, 2:47 p.m. UTC
Introduce an option to allow selecting a less strict behaviour for
rdmsr accesses targeting a MSR not explicitly handled by Xen. Since
commit 84e848fd7a162f669 accesses to MSRs not explicitly handled by
Xen result in the injection of a #GP to the guest. This is a behavior
change since previously a #GP was only injected if accessing the MSR
on the real hardware will also trigger a #GP.

This commit attempts to offer a fallback option similar to the
previous behavior. Note however that the value of the underlying MSR
is never leaked to the guest, as the newly introduced option only
changes whether a #GP is injected or not.

Long term the plan is to properly handle all the MSRs, so the option
introduced here should be considered a temporary resort for OSes that
don't work properly with the new MSR policy. Any OS that requires this
option to be enabled should be reported to
xen-devel@lists.xenproject.org.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Changes since v1:
 - Only apply the option to HVM guests.
 - Only apply the special handling to MSR reads.
 - Sanitize the newly introduced flags field.
 - Print a warning message when the option is used.
---
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
Boris, could you please test with Solaris to see if this fixes the
issue?

I wonder whether we need to to enable this option by default for
guests being migrated from previous Xen versions? Maybe that's not
required as the option is helpful mostly for early boot I would
assume, afterwards an OS should already have the #GP handler setup
when accessing MSRs.

From a release PoV the biggest risk would be breaking some of the
existing MSR functionality. I think that's a necessary risk in order
to offer such fallback option, or else we might discover after the
release that guests that worked on Xen 4.14 don't work anymore in Xen
4.15.
---
 docs/man/xl.cfg.5.pod.in          | 17 +++++++++++++++++
 tools/include/libxl.h             |  8 ++++++++
 tools/libs/light/libxl_types.idl  |  2 ++
 tools/libs/light/libxl_x86.c      |  4 ++++
 tools/xl/xl_parse.c               |  7 +++++++
 xen/arch/x86/domain.c             | 10 ++++++++++
 xen/arch/x86/hvm/svm/svm.c        |  6 ++++++
 xen/arch/x86/hvm/vmx/vmx.c        |  7 +++++++
 xen/include/asm-x86/hvm/domain.h  |  3 +++
 xen/include/public/arch-x86/xen.h |  8 ++++++++
 10 files changed, 72 insertions(+)

Comments

Ian Jackson March 4, 2021, 2:59 p.m. UTC | #1
Roger Pau Monne writes ("[PATCH v2 for-4.15] x86/msr: introduce an option for HVM relaxed rdmsr behavior"):
> Introduce an option to allow selecting a less strict behaviour for
> rdmsr accesses targeting a MSR not explicitly handled by Xen. Since
> commit 84e848fd7a162f669 accesses to MSRs not explicitly handled by
> Xen result in the injection of a #GP to the guest. This is a behavior
> change since previously a #GP was only injected if accessing the MSR
> on the real hardware will also trigger a #GP.
...
> I wonder whether we need to to enable this option by default for
> guests being migrated from previous Xen versions? Maybe that's not
> required as the option is helpful mostly for early boot I would
> assume, afterwards an OS should already have the #GP handler setup
> when accessing MSRs.

I think it's almost as bad to have guests which can be migrated in,
but which then cannot reboot.

Historically we have taken the view that new Xen must support old
guests, even if that means being bug-compatible.  So I am strongly in
favour of avoiding such a usability regression.

Which I think means enabling this option by default ?

> >From a release PoV the biggest risk would be breaking some of the
> existing MSR functionality. I think that's a necessary risk in order
> to offer such fallback option, or else we might discover after the
> release that guests that worked on Xen 4.14 don't work anymore in Xen
> 4.15.

Yes.

Release-Acked-by: Ian Jackson <iwj@xenproject.org>

Ian.
Roger Pau Monné March 4, 2021, 3:13 p.m. UTC | #2
On Thu, Mar 04, 2021 at 02:59:38PM +0000, Ian Jackson wrote:
> Roger Pau Monne writes ("[PATCH v2 for-4.15] x86/msr: introduce an option for HVM relaxed rdmsr behavior"):
> > Introduce an option to allow selecting a less strict behaviour for
> > rdmsr accesses targeting a MSR not explicitly handled by Xen. Since
> > commit 84e848fd7a162f669 accesses to MSRs not explicitly handled by
> > Xen result in the injection of a #GP to the guest. This is a behavior
> > change since previously a #GP was only injected if accessing the MSR
> > on the real hardware will also trigger a #GP.
> ...
> > I wonder whether we need to to enable this option by default for
> > guests being migrated from previous Xen versions? Maybe that's not
> > required as the option is helpful mostly for early boot I would
> > assume, afterwards an OS should already have the #GP handler setup
> > when accessing MSRs.
> 
> I think it's almost as bad to have guests which can be migrated in,
> but which then cannot reboot.

Ups, yes, right.

> Historically we have taken the view that new Xen must support old
> guests, even if that means being bug-compatible.  So I am strongly in
> favour of avoiding such a usability regression.

I'm not a xl/libxl expert, but couldn't we set the option in a
persistent way for migrated-in guests?

IIRC at domain creation libxl knows whether it's a restore or a fresh
domain, and hence we could set the option there?

The part I'm not sure is about how to make it persistent.

Thanks, Roger.
Ian Jackson March 4, 2021, 3:20 p.m. UTC | #3
Roger Pau Monné writes ("Re: [PATCH v2 for-4.15] x86/msr: introduce an option for HVM relaxed rdmsr behavior"):
> On Thu, Mar 04, 2021 at 02:59:38PM +0000, Ian Jackson wrote:
> > I think it's almost as bad to have guests which can be migrated in,
> > but which then cannot reboot.
> 
> Ups, yes, right.
> 
> > Historically we have taken the view that new Xen must support old
> > guests, even if that means being bug-compatible.  So I am strongly in
> > favour of avoiding such a usability regression.
> 
> I'm not a xl/libxl expert, but couldn't we set the option in a
> persistent way for migrated-in guests?
> 
> IIRC at domain creation libxl knows whether it's a restore or a fresh
> domain, and hence we could set the option there?
> 
> The part I'm not sure is about how to make it persistent.

The guest could be stopped with xl shutdown and then recrated with xl
create, from the config file.  I don't think we want to break that use
case here either.

Ian.
Roger Pau Monné March 4, 2021, 4:55 p.m. UTC | #4
On Thu, Mar 04, 2021 at 03:20:34PM +0000, Ian Jackson wrote:
> Roger Pau Monné writes ("Re: [PATCH v2 for-4.15] x86/msr: introduce an option for HVM relaxed rdmsr behavior"):
> > On Thu, Mar 04, 2021 at 02:59:38PM +0000, Ian Jackson wrote:
> > > I think it's almost as bad to have guests which can be migrated in,
> > > but which then cannot reboot.
> > 
> > Ups, yes, right.
> > 
> > > Historically we have taken the view that new Xen must support old
> > > guests, even if that means being bug-compatible.  So I am strongly in
> > > favour of avoiding such a usability regression.
> > 
> > I'm not a xl/libxl expert, but couldn't we set the option in a
> > persistent way for migrated-in guests?
> > 
> > IIRC at domain creation libxl knows whether it's a restore or a fresh
> > domain, and hence we could set the option there?
> > 
> > The part I'm not sure is about how to make it persistent.
> 
> The guest could be stopped with xl shutdown and then recrated with xl
> create, from the config file.  I don't think we want to break that use
> case here either.

So my original approach was to actually risk breaking creation from
config file and require the user to set the rdmsr_relaxed option, and
report the problem upstream. I think ideally we would like to get to a
point where we could drop the rdmsr_relaxed option, but maybe that's
too optimistic.

We have done quite a lot of testing of this new policy, but obviously
it's not possible to test all possible guest OSes. Forcing the new
policy by default might be too risky, so indeed falling back to
enabling this by default could be the only solution.

The main downside of enabling by default is that then we have to
resign to always having this kind of quirky behavior for MSR
accesses as the default.

Thanks, Roger.
Ian Jackson March 4, 2021, 5:13 p.m. UTC | #5
Roger Pau Monné writes ("Re: [PATCH v2 for-4.15] x86/msr: introduce an option for HVM relaxed rdmsr behavior"):
> On Thu, Mar 04, 2021 at 03:20:34PM +0000, Ian Jackson wrote:
> > The guest could be stopped with xl shutdown and then recrated with xl
> > create, from the config file.  I don't think we want to break that use
> > case here either.
> 
> So my original approach was to actually risk breaking creation from
> config file and require the user to set the rdmsr_relaxed option, and
> report the problem upstream. I think ideally we would like to get to a
> point where we could drop the rdmsr_relaxed option, but maybe that's
> too optimistic.

Isn't there some way we can move in this direction without the first
thing that users experience being their guests not being able to be
created ?

Maybe we could print a warning on the console or something ?

> We have done quite a lot of testing of this new policy, but obviously
> it's not possible to test all possible guest OSes. Forcing the new
> policy by default might be too risky, so indeed falling back to
> enabling this by default could be the only solution.
> 
> The main downside of enabling by default is that then we have to
> resign to always having this kind of quirky behavior for MSR
> accesses as the default.

What would stop us changing the default later, when we had a better
idea of the set of RDMSRs that need to be special-cased ?

Ian.
Roger Pau Monné March 4, 2021, 5:43 p.m. UTC | #6
On Thu, Mar 04, 2021 at 05:13:15PM +0000, Ian Jackson wrote:
> Roger Pau Monné writes ("Re: [PATCH v2 for-4.15] x86/msr: introduce an option for HVM relaxed rdmsr behavior"):
> > On Thu, Mar 04, 2021 at 03:20:34PM +0000, Ian Jackson wrote:
> > > The guest could be stopped with xl shutdown and then recrated with xl
> > > create, from the config file.  I don't think we want to break that use
> > > case here either.
> > 
> > So my original approach was to actually risk breaking creation from
> > config file and require the user to set the rdmsr_relaxed option, and
> > report the problem upstream. I think ideally we would like to get to a
> > point where we could drop the rdmsr_relaxed option, but maybe that's
> > too optimistic.
> 
> Isn't there some way we can move in this direction without the first
> thing that users experience being their guests not being able to be
> created ?
> 
> Maybe we could print a warning on the console or something ?

I (sadly) fear unless you get a guest crash no-one will ever look at
the logs and notice those messages.

> > We have done quite a lot of testing of this new policy, but obviously
> > it's not possible to test all possible guest OSes. Forcing the new
> > policy by default might be too risky, so indeed falling back to
> > enabling this by default could be the only solution.
> > 
> > The main downside of enabling by default is that then we have to
> > resign to always having this kind of quirky behavior for MSR
> > accesses as the default.
> 
> What would stop us changing the default later, when we had a better
> idea of the set of RDMSRs that need to be special-cased ?

We could. From the Citrix side I'm afraid there's not much more
testing that we can do however. We tested the new policy against all
possible guests known by XenRT, but obviously that's not every
possible OS.

One option we could go for is making this behavior depend on Kconfig:
enable strict MSR policy for debug builds and fallback to the
'relaxed' one for non-debug builds. That might get us some more data,
but again I fear most people out there will run non-debug builds
anyway.

Thanks, Roger.
Ian Jackson March 4, 2021, 6:21 p.m. UTC | #7
Roger Pau Monné writes ("Re: [PATCH v2 for-4.15] x86/msr: introduce an option for HVM relaxed rdmsr behavior"):
> One option we could go for is making this behavior depend on Kconfig:
> enable strict MSR policy for debug builds and fallback to the
> 'relaxed' one for non-debug builds. That might get us some more data,
> but again I fear most people out there will run non-debug builds
> anyway.

Hmmm.

Well, anyway, my R-A for this patch stands.

I wanted to explore our options, but I won't try to insist on a change
to the default configuration if you experts are against it.

Ian.
Boris Ostrovsky March 4, 2021, 11:09 p.m. UTC | #8
On 3/4/21 9:47 AM, Roger Pau Monne wrote:
> Introduce an option to allow selecting a less strict behaviour for
> rdmsr accesses targeting a MSR not explicitly handled by Xen. Since
> commit 84e848fd7a162f669 accesses to MSRs not explicitly handled by
> Xen result in the injection of a #GP to the guest. This is a behavior
> change since previously a #GP was only injected if accessing the MSR
> on the real hardware will also trigger a #GP.
>
> This commit attempts to offer a fallback option similar to the
> previous behavior. Note however that the value of the underlying MSR
> is never leaked to the guest, as the newly introduced option only
> changes whether a #GP is injected or not.
>
> Long term the plan is to properly handle all the MSRs, so the option
> introduced here should be considered a temporary resort for OSes that
> don't work properly with the new MSR policy. Any OS that requires this
> option to be enabled should be reported to
> xen-devel@lists.xenproject.org.
>
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> ---
> Changes since v1:
>  - Only apply the option to HVM guests.
>  - Only apply the special handling to MSR reads.
>  - Sanitize the newly introduced flags field.
>  - Print a warning message when the option is used.
> ---
> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> ---
> Boris, could you please test with Solaris to see if this fixes the
> issue?


Yes, still works. (It worked especially well after I noticed new option name ;-))


-boris
Andrew Cooper March 4, 2021, 11:28 p.m. UTC | #9
On 04/03/2021 23:09, Boris Ostrovsky wrote:
> On 3/4/21 9:47 AM, Roger Pau Monne wrote:
>> Introduce an option to allow selecting a less strict behaviour for
>> rdmsr accesses targeting a MSR not explicitly handled by Xen. Since
>> commit 84e848fd7a162f669 accesses to MSRs not explicitly handled by
>> Xen result in the injection of a #GP to the guest. This is a behavior
>> change since previously a #GP was only injected if accessing the MSR
>> on the real hardware will also trigger a #GP.
>>
>> This commit attempts to offer a fallback option similar to the
>> previous behavior. Note however that the value of the underlying MSR
>> is never leaked to the guest, as the newly introduced option only
>> changes whether a #GP is injected or not.
>>
>> Long term the plan is to properly handle all the MSRs, so the option
>> introduced here should be considered a temporary resort for OSes that
>> don't work properly with the new MSR policy. Any OS that requires this
>> option to be enabled should be reported to
>> xen-devel@lists.xenproject.org.
>>
>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
>> ---
>> Changes since v1:
>>  - Only apply the option to HVM guests.
>>  - Only apply the special handling to MSR reads.
>>  - Sanitize the newly introduced flags field.
>>  - Print a warning message when the option is used.
>> ---
>> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
>> ---
>> Boris, could you please test with Solaris to see if this fixes the
>> issue?
>
> Yes, still works. (It worked especially well after I noticed new option name ;-))

I'm afraid I want to break and rework how this bugfix happens.  Solaris
is still broken on all older branches and this isn't a suitable fix to
backport.

~Andrew
Andrew Cooper March 5, 2021, 12:06 a.m. UTC | #10
On 04/03/2021 14:47, Roger Pau Monne wrote:
> Introduce an option to allow selecting a less strict behaviour for
> rdmsr accesses targeting a MSR not explicitly handled by Xen. Since
> commit 84e848fd7a162f669 accesses to MSRs not explicitly handled by
> Xen result in the injection of a #GP to the guest. This is a behavior
> change since previously a #GP was only injected if accessing the MSR
> on the real hardware will also trigger a #GP.
>
> This commit attempts to offer a fallback option similar to the
> previous behavior. Note however that the value of the underlying MSR
> is never leaked to the guest, as the newly introduced option only
> changes whether a #GP is injected or not.
>
> Long term the plan is to properly handle all the MSRs, so the option
> introduced here should be considered a temporary resort for OSes that
> don't work properly with the new MSR policy. Any OS that requires this
> option to be enabled should be reported to
> xen-devel@lists.xenproject.org.
>
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> ---
> Changes since v1:
>  - Only apply the option to HVM guests.
>  - Only apply the special handling to MSR reads.
>  - Sanitize the newly introduced flags field.
>  - Print a warning message when the option is used.
> ---
> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> ---
> Boris, could you please test with Solaris to see if this fixes the
> issue?
>
> I wonder whether we need to to enable this option by default for
> guests being migrated from previous Xen versions? Maybe that's not
> required as the option is helpful mostly for early boot I would
> assume, afterwards an OS should already have the #GP handler setup
> when accessing MSRs.

We know when building a domain whether it is a migrate or not, but don't
recall any version information existing at an appropriate point in the
migration stream to do this easily.

We can buffer the stream forward and peek at the libxc domain header,
which does have the source hypervisor version, but that is going to be
very invasive to implement.

>
> From a release PoV the biggest risk would be breaking some of the
> existing MSR functionality. I think that's a necessary risk in order
> to offer such fallback option, or else we might discover after the
> release that guests that worked on Xen 4.14 don't work anymore in Xen
> 4.15.

Much as I'd prefer not to have this, I agree with the sentiment that we
should have an "emergency undo" which people can use, and carry it for
at least a short while.

However, to be useful for the purpose of unbreaking VMs, it must change
both the read and write behaviour, because both are potential
compatibility concerns (without reintroducing the information leak).

> ---
>  docs/man/xl.cfg.5.pod.in          | 17 +++++++++++++++++
>  tools/include/libxl.h             |  8 ++++++++
>  tools/libs/light/libxl_types.idl  |  2 ++
>  tools/libs/light/libxl_x86.c      |  4 ++++
>  tools/xl/xl_parse.c               |  7 +++++++
>  xen/arch/x86/domain.c             | 10 ++++++++++
>  xen/arch/x86/hvm/svm/svm.c        |  6 ++++++
>  xen/arch/x86/hvm/vmx/vmx.c        |  7 +++++++
>  xen/include/asm-x86/hvm/domain.h  |  3 +++
>  xen/include/public/arch-x86/xen.h |  8 ++++++++

This needs changes to the Ocaml bindings as well.  I guess I'll add that
to the todo list.

~Andrew
Roger Pau Monné March 5, 2021, 8:26 a.m. UTC | #11
On Thu, Mar 04, 2021 at 11:28:16PM +0000, Andrew Cooper wrote:
> On 04/03/2021 23:09, Boris Ostrovsky wrote:
> > On 3/4/21 9:47 AM, Roger Pau Monne wrote:
> >> Introduce an option to allow selecting a less strict behaviour for
> >> rdmsr accesses targeting a MSR not explicitly handled by Xen. Since
> >> commit 84e848fd7a162f669 accesses to MSRs not explicitly handled by
> >> Xen result in the injection of a #GP to the guest. This is a behavior
> >> change since previously a #GP was only injected if accessing the MSR
> >> on the real hardware will also trigger a #GP.
> >>
> >> This commit attempts to offer a fallback option similar to the
> >> previous behavior. Note however that the value of the underlying MSR
> >> is never leaked to the guest, as the newly introduced option only
> >> changes whether a #GP is injected or not.
> >>
> >> Long term the plan is to properly handle all the MSRs, so the option
> >> introduced here should be considered a temporary resort for OSes that
> >> don't work properly with the new MSR policy. Any OS that requires this
> >> option to be enabled should be reported to
> >> xen-devel@lists.xenproject.org.
> >>
> >> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> >> ---
> >> Changes since v1:
> >>  - Only apply the option to HVM guests.
> >>  - Only apply the special handling to MSR reads.
> >>  - Sanitize the newly introduced flags field.
> >>  - Print a warning message when the option is used.
> >> ---
> >> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> >> ---
> >> Boris, could you please test with Solaris to see if this fixes the
> >> issue?
> >
> > Yes, still works. (It worked especially well after I noticed new option name ;-))
> 
> I'm afraid I want to break and rework how this bugfix happens.  Solaris
> is still broken on all older branches and this isn't a suitable fix to
> backport.

Right, I think that's the reactive part of the fixing that we spoke
with Jan, but I think we would still need something similar to
rdmsr_relaxed.

There's at least one other MSR which Jan identified that we also want
to handle MSR_K7_HWCR.

Thanks, Roger.
Roger Pau Monné March 5, 2021, 9:15 a.m. UTC | #12
On Fri, Mar 05, 2021 at 12:06:19AM +0000, Andrew Cooper wrote:
> On 04/03/2021 14:47, Roger Pau Monne wrote:
> > Introduce an option to allow selecting a less strict behaviour for
> > rdmsr accesses targeting a MSR not explicitly handled by Xen. Since
> > commit 84e848fd7a162f669 accesses to MSRs not explicitly handled by
> > Xen result in the injection of a #GP to the guest. This is a behavior
> > change since previously a #GP was only injected if accessing the MSR
> > on the real hardware will also trigger a #GP.
> >
> > This commit attempts to offer a fallback option similar to the
> > previous behavior. Note however that the value of the underlying MSR
> > is never leaked to the guest, as the newly introduced option only
> > changes whether a #GP is injected or not.
> >
> > Long term the plan is to properly handle all the MSRs, so the option
> > introduced here should be considered a temporary resort for OSes that
> > don't work properly with the new MSR policy. Any OS that requires this
> > option to be enabled should be reported to
> > xen-devel@lists.xenproject.org.
> >
> > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> > ---
> > Changes since v1:
> >  - Only apply the option to HVM guests.
> >  - Only apply the special handling to MSR reads.
> >  - Sanitize the newly introduced flags field.
> >  - Print a warning message when the option is used.
> > ---
> > Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> > ---
> > Boris, could you please test with Solaris to see if this fixes the
> > issue?
> >
> > I wonder whether we need to to enable this option by default for
> > guests being migrated from previous Xen versions? Maybe that's not
> > required as the option is helpful mostly for early boot I would
> > assume, afterwards an OS should already have the #GP handler setup
> > when accessing MSRs.
> 
> We know when building a domain whether it is a migrate or not, but don't
> recall any version information existing at an appropriate point in the
> migration stream to do this easily.
> 
> We can buffer the stream forward and peek at the libxc domain header,
> which does have the source hypervisor version, but that is going to be
> very invasive to implement.

I need to look at this more closely to have an opinion. Instead of
figuring out if the source version is older maybe we could add
something to the 4.15 stream in a suitable position to detect whether
the source is new enough?

> >
> > From a release PoV the biggest risk would be breaking some of the
> > existing MSR functionality. I think that's a necessary risk in order
> > to offer such fallback option, or else we might discover after the
> > release that guests that worked on Xen 4.14 don't work anymore in Xen
> > 4.15.
> 
> Much as I'd prefer not to have this, I agree with the sentiment that we
> should have an "emergency undo" which people can use, and carry it for
> at least a short while.
> 
> However, to be useful for the purpose of unbreaking VMs, it must change
> both the read and write behaviour, because both are potential
> compatibility concerns (without reintroducing the information leak).

I think I was confused here and assumed the previous behavior would
check the written value to match the current underlying value before
injecting a #GP. That's not the case.

I can expand this patch to include the write side, I just thought
having the rad side only would be enough to cover for the unhandled
MSRs accesses.

> 
> > ---
> >  docs/man/xl.cfg.5.pod.in          | 17 +++++++++++++++++
> >  tools/include/libxl.h             |  8 ++++++++
> >  tools/libs/light/libxl_types.idl  |  2 ++
> >  tools/libs/light/libxl_x86.c      |  4 ++++
> >  tools/xl/xl_parse.c               |  7 +++++++
> >  xen/arch/x86/domain.c             | 10 ++++++++++
> >  xen/arch/x86/hvm/svm/svm.c        |  6 ++++++
> >  xen/arch/x86/hvm/vmx/vmx.c        |  7 +++++++
> >  xen/include/asm-x86/hvm/domain.h  |  3 +++
> >  xen/include/public/arch-x86/xen.h |  8 ++++++++
> 
> This needs changes to the Ocaml bindings as well.  I guess I'll add that
> to the todo list.

Hm, would be better if someone else did those, as I don't know
anything about Ocaml. I can try however in the next version, maybe
it's trivial to add.

Thanks, Roger.
Jan Beulich March 5, 2021, 10:26 a.m. UTC | #13
On 04.03.2021 18:43, Roger Pau Monné wrote:
> One option we could go for is making this behavior depend on Kconfig:
> enable strict MSR policy for debug builds and fallback to the
> 'relaxed' one for non-debug builds. That might get us some more data,
> but again I fear most people out there will run non-debug builds
> anyway.

Plus of course we'd almost never test the "relaxed" code path(s).

Jan
Jan Beulich March 5, 2021, 10:35 a.m. UTC | #14
On 05.03.2021 10:15, Roger Pau Monné wrote:
> On Fri, Mar 05, 2021 at 12:06:19AM +0000, Andrew Cooper wrote:
>> On 04/03/2021 14:47, Roger Pau Monne wrote:
>>> From a release PoV the biggest risk would be breaking some of the
>>> existing MSR functionality. I think that's a necessary risk in order
>>> to offer such fallback option, or else we might discover after the
>>> release that guests that worked on Xen 4.14 don't work anymore in Xen
>>> 4.15.
>>
>> Much as I'd prefer not to have this, I agree with the sentiment that we
>> should have an "emergency undo" which people can use, and carry it for
>> at least a short while.
>>
>> However, to be useful for the purpose of unbreaking VMs, it must change
>> both the read and write behaviour, because both are potential
>> compatibility concerns (without reintroducing the information leak).
> 
> I think I was confused here and assumed the previous behavior would
> check the written value to match the current underlying value before
> injecting a #GP. That's not the case.
> 
> I can expand this patch to include the write side, I just thought
> having the rad side only would be enough to cover for the unhandled
> MSRs accesses.

Both when seeing this patch's title and when ripping the write part
out of my patch I meant to indicate the same - dealing with just
reads may not be enough. Arguably people could be told to first try
with just relaxing rdmsr handling, but ones anxious to get their
VMs back into production use may ignore such an advice and use the
bigger hammer right away.

Jan
Jan Beulich March 5, 2021, 10:56 a.m. UTC | #15
On 04.03.2021 15:47, Roger Pau Monne wrote:
> Introduce an option to allow selecting a less strict behaviour for
> rdmsr accesses targeting a MSR not explicitly handled by Xen. Since
> commit 84e848fd7a162f669 accesses to MSRs not explicitly handled by
> Xen result in the injection of a #GP to the guest. This is a behavior
> change since previously a #GP was only injected if accessing the MSR
> on the real hardware will also trigger a #GP.
> 
> This commit attempts to offer a fallback option similar to the
> previous behavior. Note however that the value of the underlying MSR
> is never leaked to the guest, as the newly introduced option only
> changes whether a #GP is injected or not.
> 
> Long term the plan is to properly handle all the MSRs, so the option
> introduced here should be considered a temporary resort for OSes that
> don't work properly with the new MSR policy. Any OS that requires this
> option to be enabled should be reported to
> xen-devel@lists.xenproject.org.

While the title says this is limited to HVM guests, I have to admit
that I fail to see why this is, and hence I would have hoped for
some clarification in the description. In particular I don't think
my "guest in early boot" workaround, of which I posted v2 earlier
today, can be assumed to be enough in the longer run. Recall that
it relaxes behavior only when the guest didn't install a handler
for #GP yet - this means it wouldn't help with any unguarded RDMSR
the guest might issue later, with a handler already installed.

> --- a/xen/arch/x86/domain.c
> +++ b/xen/arch/x86/domain.c
> @@ -760,6 +760,13 @@ int arch_domain_create(struct domain *d,
>                 d->domain_id);
>      }
>  
> +    if ( config->arch.domain_flags & ~XEN_X86_RDMSR_RELAXED )
> +    {
> +        printk(XENLOG_G_ERR "d%d: Invalid arch domain flags: %#x\n",
> +               d->domain_id, config->arch.domain_flags);
> +        return -EINVAL;
> +    }

This would look to better go into arch_sanitise_domain_config().
And if the flag remains HVM-only, that aspect should then also be
checked (i.e. the flag being set would then also need rejecting
for PV guests).

> --- a/xen/arch/x86/hvm/svm/svm.c
> +++ b/xen/arch/x86/hvm/svm/svm.c
> @@ -1795,6 +1795,7 @@ static int svm_msr_read_intercept(unsigned int msr, uint64_t *msr_content)
>      const struct domain *d = v->domain;
>      struct vmcb_struct *vmcb = v->arch.hvm.svm.vmcb;
>      const struct nestedsvm *nsvm = &vcpu_nestedsvm(v);
> +    uint64_t tmp;
>  
>      switch ( msr )
>      {
> @@ -1965,6 +1966,11 @@ static int svm_msr_read_intercept(unsigned int msr, uint64_t *msr_content)
>          break;
>  
>      default:
> +        if ( d->arch.hvm.rdmsr_relaxed && !rdmsr_safe(msr, tmp) )
> +        {
> +            *msr_content = 0;
> +            break;
> +        }

You don't really need "tmp" here, do you? You could as well read
into *msr_content, as you're zapping the value afterwards anyway.

> --- a/xen/include/asm-x86/hvm/domain.h
> +++ b/xen/include/asm-x86/hvm/domain.h
> @@ -122,6 +122,9 @@ struct hvm_domain {
>  
>      bool_t                 is_s3_suspended;
>  
> +    /* Don't unconditionally inject #GP for unhandled MSRs reads. */
> +    bool rdmsr_relaxed;

If, again, this is to remain HVM-only, then you insertion wants
to honor the blank padding other field decls use. I'd also like
to ask for your insertion to be moved up a few lines, to after
"is_in_uc_mode". I have a patch queued already to also move
"is_s3_suspended" into that hole; it's from November last year,
so it looks like I simply forgot to post it.

Jan
Jan Beulich March 5, 2021, 11:06 a.m. UTC | #16
On 05.03.2021 11:56, Jan Beulich wrote:
> On 04.03.2021 15:47, Roger Pau Monne wrote:
>> --- a/xen/arch/x86/hvm/svm/svm.c
>> +++ b/xen/arch/x86/hvm/svm/svm.c
>> @@ -1795,6 +1795,7 @@ static int svm_msr_read_intercept(unsigned int msr, uint64_t *msr_content)
>>      const struct domain *d = v->domain;
>>      struct vmcb_struct *vmcb = v->arch.hvm.svm.vmcb;
>>      const struct nestedsvm *nsvm = &vcpu_nestedsvm(v);
>> +    uint64_t tmp;
>>  
>>      switch ( msr )
>>      {
>> @@ -1965,6 +1966,11 @@ static int svm_msr_read_intercept(unsigned int msr, uint64_t *msr_content)
>>          break;
>>  
>>      default:
>> +        if ( d->arch.hvm.rdmsr_relaxed && !rdmsr_safe(msr, tmp) )
>> +        {
>> +            *msr_content = 0;
>> +            break;
>> +        }
> 
> You don't really need "tmp" here, do you? You could as well read
> into *msr_content, as you're zapping the value afterwards anyway.

Actually, while perhaps indeed not strictly needed, it allows the
compiler to produce better code, as it'll be able to recognize
the value doesn't need writing to memory on any path. I guess I'll
change the logic in my related patch along these lines then.

Jan
Roger Pau Monné March 8, 2021, 2:30 p.m. UTC | #17
On Fri, Mar 05, 2021 at 11:56:33AM +0100, Jan Beulich wrote:
> On 04.03.2021 15:47, Roger Pau Monne wrote:
> > --- a/xen/arch/x86/hvm/svm/svm.c
> > +++ b/xen/arch/x86/hvm/svm/svm.c
> > @@ -1795,6 +1795,7 @@ static int svm_msr_read_intercept(unsigned int msr, uint64_t *msr_content)
> >      const struct domain *d = v->domain;
> >      struct vmcb_struct *vmcb = v->arch.hvm.svm.vmcb;
> >      const struct nestedsvm *nsvm = &vcpu_nestedsvm(v);
> > +    uint64_t tmp;
> >  
> >      switch ( msr )
> >      {
> > @@ -1965,6 +1966,11 @@ static int svm_msr_read_intercept(unsigned int msr, uint64_t *msr_content)
> >          break;
> >  
> >      default:
> > +        if ( d->arch.hvm.rdmsr_relaxed && !rdmsr_safe(msr, tmp) )
> > +        {
> > +            *msr_content = 0;
> > +            break;
> > +        }
> 
> You don't really need "tmp" here, do you? You could as well read
> into *msr_content, as you're zapping the value afterwards anyway.

I also thought about doing this, but felt unease. I fear the code
might be changed in the future and maybe msr_content is not zapped
anymore, thus leaking the content. I feel it's safer to use a
temporary variable that will never be returned to the guest. Maybe
I'm just too paranoid.

Thanks, Roger.
diff mbox series

Patch

diff --git a/docs/man/xl.cfg.5.pod.in b/docs/man/xl.cfg.5.pod.in
index 040374dcd6..62178b9829 100644
--- a/docs/man/xl.cfg.5.pod.in
+++ b/docs/man/xl.cfg.5.pod.in
@@ -2861,6 +2861,23 @@  No MCA capabilities in above list are enabled.
 
 =back
 
+=item B<rdmsr_relaxed=BOOLEAN>
+
+Select whether to use a relaxed behavior for read accesses to MSRs not
+explicitly handled by Xen instead of injecting a #GP to the guest.  Such access
+mode will force Xen to replicate the behaviour from the hardware it's currently
+running on in order to decide whether a #GP is injected to the guest.  Note
+that the guest is never allowed to read the value of unhandled MSRs, this
+option only changes whether a #GP might be injected or not.
+
+This option is only relevant for HVM guests, and will be removed in future
+releases once we are certain the default MSR access policy has been properly
+tested by a wide variety of guests.  If you need to use this option please send
+a bug report to xen-devel@lists.xenproject.org with the details of the guests
+you are running that require it.
+
+=back
+
 =back
 
 =head1 SEE ALSO
diff --git a/tools/include/libxl.h b/tools/include/libxl.h
index a7b673e89d..1cc40a2d67 100644
--- a/tools/include/libxl.h
+++ b/tools/include/libxl.h
@@ -495,6 +495,14 @@ 
  */
 #define LIBXL_HAVE_VMTRACE_BUF_KB 1
 
+/*
+ * LIBXL_HAVE_RDMSR_RELAXED indicates the toolstack has support for switching
+ * the rdmsr handling in the hypervisor to relaxed mode, where #GP is only
+ * injected to guests for unhandled MSRs if accessing the MSR on the physical
+ * hardware also triggers a #GP.
+ */
+#define LIBXL_HAVE_RDMSR_RELAXED 1
+
 /*
  * libxl ABI compatibility
  *
diff --git a/tools/libs/light/libxl_types.idl b/tools/libs/light/libxl_types.idl
index 5b85a7419f..03b0c80146 100644
--- a/tools/libs/light/libxl_types.idl
+++ b/tools/libs/light/libxl_types.idl
@@ -644,6 +644,8 @@  libxl_domain_build_info = Struct("domain_build_info",[
     ("arch_arm", Struct(None, [("gic_version", libxl_gic_version),
                                ("vuart", libxl_vuart_type),
                               ])),
+    ("arch_x86", Struct(None, [("rdmsr_relaxed", libxl_defbool),
+                              ])),
     # Alternate p2m is not bound to any architecture or guest type, as it is
     # supported by x86 HVM and ARM support is planned.
     ("altp2m", libxl_altp2m_mode),
diff --git a/tools/libs/light/libxl_x86.c b/tools/libs/light/libxl_x86.c
index 58187ed760..c9cff44088 100644
--- a/tools/libs/light/libxl_x86.c
+++ b/tools/libs/light/libxl_x86.c
@@ -5,9 +5,12 @@  int libxl__arch_domain_prepare_config(libxl__gc *gc,
                                       libxl_domain_config *d_config,
                                       struct xen_domctl_createdomain *config)
 {
+    config->arch.domain_flags = 0;
     switch(d_config->c_info.type) {
     case LIBXL_DOMAIN_TYPE_HVM:
         config->arch.emulation_flags = (XEN_X86_EMU_ALL & ~XEN_X86_EMU_VPCI);
+        if (libxl_defbool_val(d_config->b_info.arch_x86.rdmsr_relaxed))
+            config->arch.domain_flags |= XEN_X86_RDMSR_RELAXED;
         break;
     case LIBXL_DOMAIN_TYPE_PVH:
         config->arch.emulation_flags = XEN_X86_EMU_LAPIC;
@@ -809,6 +812,7 @@  void libxl__arch_domain_build_info_setdefault(libxl__gc *gc,
                                               libxl_domain_build_info *b_info)
 {
     libxl_defbool_setdefault(&b_info->acpi, true);
+    libxl_defbool_setdefault(&b_info->arch_x86.rdmsr_relaxed, false);
 }
 
 int libxl__arch_passthrough_mode_setdefault(libxl__gc *gc,
diff --git a/tools/xl/xl_parse.c b/tools/xl/xl_parse.c
index 1893cfc086..9f52c7e914 100644
--- a/tools/xl/xl_parse.c
+++ b/tools/xl/xl_parse.c
@@ -2636,6 +2636,13 @@  skip_usbdev:
         xlu_cfg_replace_string (config, "spice_streaming_video",
                                 &b_info->u.hvm.spice.streaming_video, 0);
         xlu_cfg_get_defbool(config, "nographic", &b_info->u.hvm.nographic, 0);
+        if (!xlu_cfg_get_defbool(config, "rdmsr_relaxed",
+                                 &b_info->arch_x86.rdmsr_relaxed, 0))
+            fprintf(stderr,
+                    "WARNING: rdmsr_relaxed will be removed in future versions.\n"
+                    "If it fixes an issue you are having please report to "
+                    "xen-devel@lists.xenproject.org.\n");
+
         if (!xlu_cfg_get_long(config, "gfx_passthru", &l, 1)) {
             libxl_defbool_set(&b_info->u.hvm.gfx_passthru, l);
         } else if (!xlu_cfg_get_string(config, "gfx_passthru", &buf, 0)) {
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 5e3c94d3fa..c06b17d338 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -760,6 +760,13 @@  int arch_domain_create(struct domain *d,
                d->domain_id);
     }
 
+    if ( config->arch.domain_flags & ~XEN_X86_RDMSR_RELAXED )
+    {
+        printk(XENLOG_G_ERR "d%d: Invalid arch domain flags: %#x\n",
+               d->domain_id, config->arch.domain_flags);
+        return -EINVAL;
+    }
+
     emflags = config->arch.emulation_flags;
 
     if ( is_hardware_domain(d) && is_pv_domain(d) )
@@ -824,6 +831,9 @@  int arch_domain_create(struct domain *d,
     {
         if ( (rc = hvm_domain_initialise(d)) != 0 )
             goto fail;
+
+        d->arch.hvm.rdmsr_relaxed = config->arch.domain_flags &
+                                    XEN_X86_RDMSR_RELAXED;
     }
     else if ( is_pv_domain(d) )
     {
diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
index b819897a4a..d036809bd3 100644
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -1795,6 +1795,7 @@  static int svm_msr_read_intercept(unsigned int msr, uint64_t *msr_content)
     const struct domain *d = v->domain;
     struct vmcb_struct *vmcb = v->arch.hvm.svm.vmcb;
     const struct nestedsvm *nsvm = &vcpu_nestedsvm(v);
+    uint64_t tmp;
 
     switch ( msr )
     {
@@ -1965,6 +1966,11 @@  static int svm_msr_read_intercept(unsigned int msr, uint64_t *msr_content)
         break;
 
     default:
+        if ( d->arch.hvm.rdmsr_relaxed && !rdmsr_safe(msr, tmp) )
+        {
+            *msr_content = 0;
+            break;
+        }
         gdprintk(XENLOG_WARNING, "RDMSR 0x%08x unimplemented\n", msr);
         goto gpf;
     }
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index bfea1b0f8a..883e43a0bb 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -3123,6 +3123,7 @@  static int is_last_branch_msr(u32 ecx)
 static int vmx_msr_read_intercept(unsigned int msr, uint64_t *msr_content)
 {
     struct vcpu *curr = current;
+    uint64_t tmp;
 
     HVM_DBG_LOG(DBG_LEVEL_MSR, "ecx=%#x", msr);
 
@@ -3204,6 +3205,12 @@  static int vmx_msr_read_intercept(unsigned int msr, uint64_t *msr_content)
             break;
         }
 
+        if ( curr->domain->arch.hvm.rdmsr_relaxed && !rdmsr_safe(msr, tmp) )
+        {
+            *msr_content = 0;
+            break;
+        }
+
         gdprintk(XENLOG_WARNING, "RDMSR 0x%08x unimplemented\n", msr);
         goto gp_fault;
     }
diff --git a/xen/include/asm-x86/hvm/domain.h b/xen/include/asm-x86/hvm/domain.h
index 7b60e9125f..fdc1b36513 100644
--- a/xen/include/asm-x86/hvm/domain.h
+++ b/xen/include/asm-x86/hvm/domain.h
@@ -122,6 +122,9 @@  struct hvm_domain {
 
     bool_t                 is_s3_suspended;
 
+    /* Don't unconditionally inject #GP for unhandled MSRs reads. */
+    bool rdmsr_relaxed;
+
     /*
      * TSC value that VCPUs use to calculate their tsc_offset value.
      * Used during initialization and save/restore.
diff --git a/xen/include/public/arch-x86/xen.h b/xen/include/public/arch-x86/xen.h
index 629cb2ba40..fbf91bf3b9 100644
--- a/xen/include/public/arch-x86/xen.h
+++ b/xen/include/public/arch-x86/xen.h
@@ -304,6 +304,14 @@  struct xen_arch_domainconfig {
                                      XEN_X86_EMU_PIT | XEN_X86_EMU_USE_PIRQ |\
                                      XEN_X86_EMU_VPCI)
     uint32_t emulation_flags;
+
+/*
+ * HVM only: select whether to use a relaxed behavior for read accesses to MSRs
+ * not explicitly handled by Xen instead of injecting a #GP to the guest. Note
+ * this option doesn't allow the guest to read the hardware value.
+ */
+#define XEN_X86_RDMSR_RELAXED       (1u << 0)
+    uint32_t domain_flags;
 };
 
 /* Location of online VCPU bitmap. */