Message ID | 20210301162357.76527-1-roger.pau@citrix.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [RFC,for-4.15] x86/msr: introduce an option for legacy MSR behavior selection | expand |
Roger Pau Monne writes ("[PATCH RFC for-4.15] x86/msr: introduce an option for legacy MSR behavior selection"): > Introduce an option to allow selecting the legacy behavior for > accesses to MSRs not explicitly handled. Since commit > 84e848fd7a162f669 and 322ec7c89f6640e accesses to MSRs not explicitly > handled by Xen result in the injection of a #GP to the guest. This is > a behavior change since previously a #GP was only injected if > accessing the MSR on the real hardware will also trigger a #GP. > > This seems to be problematic for some guests, so introduce an option > to fallback to this legacy behavior. The main difference between what > was previously done is that the hardware MSR value is not leaked to > the guests on reads. > > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> > --- > Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> > --- > Note that this option is not made available to dom0. I'm not sure > whether it makes sense to do so, since anyone updating Xen to such > newer version will also likely pair it with a newish kernel that > doesn't require such workarounds. > > RFC because there's still some debate as to how we should solve the > MSR issue, this is one possible way, but IMO we need to make a > decision soon-ish because of the release timeline. > > Boris, could you please test with Solaris to see if this fixes the > issue? So AIUI this patch is to make it possible for Xen 4.15 to behave like Xen 4.14, thus avoiding a regression for these troublesome guests. Have we diffed the result of this against 4.14 and if not would it be a sensible thing to do ? Ian.
On Mon, Mar 01, 2021 at 05:16:34PM +0000, Ian Jackson wrote: > Roger Pau Monne writes ("[PATCH RFC for-4.15] x86/msr: introduce an option for legacy MSR behavior selection"): > > Introduce an option to allow selecting the legacy behavior for > > accesses to MSRs not explicitly handled. Since commit > > 84e848fd7a162f669 and 322ec7c89f6640e accesses to MSRs not explicitly > > handled by Xen result in the injection of a #GP to the guest. This is > > a behavior change since previously a #GP was only injected if > > accessing the MSR on the real hardware will also trigger a #GP. > > > > This seems to be problematic for some guests, so introduce an option > > to fallback to this legacy behavior. The main difference between what > > was previously done is that the hardware MSR value is not leaked to > > the guests on reads. > > > > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> > > --- > > Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> > > --- > > Note that this option is not made available to dom0. I'm not sure > > whether it makes sense to do so, since anyone updating Xen to such > > newer version will also likely pair it with a newish kernel that > > doesn't require such workarounds. > > > > RFC because there's still some debate as to how we should solve the > > MSR issue, this is one possible way, but IMO we need to make a > > decision soon-ish because of the release timeline. > > > > Boris, could you please test with Solaris to see if this fixes the > > issue? > > So AIUI this patch is to make it possible for Xen 4.15 to behave like > Xen 4.14, thus avoiding a regression for these troublesome guests. Yes, sorry I haven't provided a release executive summary, as I wasn't sure this would be acceptable in it's current form. Can do if there's consensus this is an acceptable fix. > Have we diffed the result of this against 4.14 and if not would it be > a sensible thing to do ? I think there will likely bee too much noise, we have changed the MSR handling a bit from 4.14, so it's likely a diff to 4.14 is not going to be helpful as the context will have too many changes (albeit I haven't tried the exercise myself). Thanks, Roger.
Roger Pau Monné writes ("Re: [PATCH RFC for-4.15] x86/msr: introduce an option for legacy MSR behavior selection"): > On Mon, Mar 01, 2021 at 05:16:34PM +0000, Ian Jackson wrote: > > So AIUI this patch is to make it possible for Xen 4.15 to behave like > > Xen 4.14, thus avoiding a regression for these troublesome guests. > > Yes, sorry I haven't provided a release executive summary, as I wasn't > sure this would be acceptable in it's current form. Can do if there's > consensus this is an acceptable fix. Thanks for the information. I am in favour of fixing this issue. I have it on my 4.15 blockers list. Thanks, Ian.
On 3/1/21 11:23 AM, Roger Pau Monne wrote: > Introduce an option to allow selecting the legacy behavior for > accesses to MSRs not explicitly handled. Since commit > 84e848fd7a162f669 and 322ec7c89f6640e accesses to MSRs not explicitly > handled by Xen result in the injection of a #GP to the guest. This is > a behavior change since previously a #GP was only injected if > accessing the MSR on the real hardware will also trigger a #GP. > > This seems to be problematic for some guests, so introduce an option > to fallback to this legacy behavior. The main difference between what > was previously done is that the hardware MSR value is not leaked to > the guests on reads. > > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> > --- > Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> > --- > Note that this option is not made available to dom0. I'm not sure > whether it makes sense to do so, since anyone updating Xen to such > newer version will also likely pair it with a newish kernel that > doesn't require such workarounds. > > RFC because there's still some debate as to how we should solve the > MSR issue, this is one possible way, but IMO we need to make a > decision soon-ish because of the release timeline. > > Boris, could you please test with Solaris to see if this fixes the > issue? Yes, it does. Thanks. -boris
On 01.03.2021 17:23, Roger Pau Monne wrote: > Introduce an option to allow selecting the legacy behavior for > accesses to MSRs not explicitly handled. Since commit > 84e848fd7a162f669 and 322ec7c89f6640e accesses to MSRs not explicitly > handled by Xen result in the injection of a #GP to the guest. This is > a behavior change since previously a #GP was only injected if > accessing the MSR on the real hardware will also trigger a #GP. > > This seems to be problematic for some guests, so introduce an option > to fallback to this legacy behavior. The main difference between what > was previously done is that the hardware MSR value is not leaked to > the guests on reads. Looking at the WRMSR behavior for PV, what you introduce isn't matching 4.14 behavior: If rdmsr_safe() failed, all that effected was the issuing of a log message. The behavior you propose is better, no question, but it shouldn't be described as matching legacy behavior then. Somewhat related to this I wonder whether MSR reads and writes wouldn't better be controllable independently. It seems quite likely that a kernel may have an issue only with reads. Additionally I wonder whether it is a good idea to let these events go silently. > Note that this option is not made available to dom0. I'm not sure > whether it makes sense to do so, since anyone updating Xen to such > newer version will also likely pair it with a newish kernel that > doesn't require such workarounds. No, that's not an option imo. It was a Dom0 where I ran into the problem causing me to submit "x86/PV: conditionally avoid raising #GP for early guest MSR accesses". While I could upgrade that system, I have reasons for not doing so. And while I could put a more modern kernel on there, I'd prefer if the distro kernel continued to work. (That's leaving aside that for unrelated reasons building and using my own, newer kernel there is quite a bit more difficult than on all other of my test systems.) > RFC because there's still some debate as to how we should solve the > MSR issue, this is one possible way, but IMO we need to make a > decision soon-ish because of the release timeline. Generally I think it would be far better from a user pov if things worked out of the box, at least in cases where it can be made work. Arguably for Solaris this isn't going to be possible (leaving aside the non-option of fully returning back to original behavior), but for the old-Linux-PV case I still think my proposed change is better in this regard. I realize Andrew said (on irc) that this is making the behavior even weaker. I take a different perspective: Considering a guest will install exception handlers at some point, I continue to fail to see what good it will do to try to inject a #GP(0) when we know the guest will die because of not having a handler registered just yet. It is a kernel flaw, yes, but they ended up living with it merely because of our odd prior behavior, so we can't put all the blame on them. This isn't to say that I'm against propagating exceptions where there's no alternative to delivering one. Also I'm certainly open to further tighten the condition of when to zap such an exception (e.g. only as long as there's no handler _and_ the guest is still in UP mode, assuming of course this would still address the observed issue). > --- a/docs/man/xl.cfg.5.pod.in > +++ b/docs/man/xl.cfg.5.pod.in > @@ -2861,6 +2861,17 @@ No MCA capabilities in above list are enabled. > > =back > > +=item B<msr_legacy_handling=BOOLEAN> > + > +Select whether to use the legacy behaviour for accesses to MSRs not explicitly > +handled by Xen instead of injecting a #GP to the guest. Such legacy access > +mode will force Xen to replicate the behaviour from the hardware it's currently > +running on in order to decide whether a #GP is injected to the guest. Note > +that the guest is never allowed access to unhandled MSRs, this option only > +changes whether a #GP might be injected or not. This description is appropriate for reads, but not for writes: Whether a write would fault can only be known by trying a write, yet for safety reasons we can't risk doing more than a read. An option might be to make writes fault is the to be written value differs from that which the probing read has returned (i.e. the same condition which originally had caused a log message to appear in 4.14 for PV guests). > --- a/tools/libs/light/libxl_types.idl > +++ b/tools/libs/light/libxl_types.idl > @@ -644,6 +644,8 @@ libxl_domain_build_info = Struct("domain_build_info",[ > ("arch_arm", Struct(None, [("gic_version", libxl_gic_version), > ("vuart", libxl_vuart_type), > ])), > + ("arch_x86", Struct(None, [("msr_legacy_handling", libxl_defbool), > + ])), Seeing this I'm wondering whether the entire set of arch_* shouldn't be within a union. But afaics this would have further implications on code elsewhere, so surely wouldn't want doing right now. > --- a/xen/arch/x86/domain.c > +++ b/xen/arch/x86/domain.c > @@ -852,6 +852,9 @@ int arch_domain_create(struct domain *d, > > domain_cpu_policy_changed(d); > > + d->arch.msr_legacy_handling = config->arch.domain_flags & > + XEN_X86_LEGACY_MSR_HANDLING; Somewhere you'd also need to refuse processing requests with any of the other 31 bits set. Jan
On Tue, Mar 02, 2021 at 12:16:12PM +0100, Jan Beulich wrote: > On 01.03.2021 17:23, Roger Pau Monne wrote: > > Introduce an option to allow selecting the legacy behavior for > > accesses to MSRs not explicitly handled. Since commit > > 84e848fd7a162f669 and 322ec7c89f6640e accesses to MSRs not explicitly > > handled by Xen result in the injection of a #GP to the guest. This is > > a behavior change since previously a #GP was only injected if > > accessing the MSR on the real hardware will also trigger a #GP. > > > > This seems to be problematic for some guests, so introduce an option > > to fallback to this legacy behavior. The main difference between what > > was previously done is that the hardware MSR value is not leaked to > > the guests on reads. > > Looking at the WRMSR behavior for PV, what you introduce isn't > matching 4.14 behavior: If rdmsr_safe() failed, all that effected > was the issuing of a log message. The behavior you propose is > better, no question, but it shouldn't be described as matching > legacy behavior then. > > Somewhat related to this I wonder whether MSR reads and writes > wouldn't better be controllable independently. It seems quite > likely that a kernel may have an issue only with reads. > > Additionally I wonder whether it is a good idea to let these > events go silently. > > > Note that this option is not made available to dom0. I'm not sure > > whether it makes sense to do so, since anyone updating Xen to such > > newer version will also likely pair it with a newish kernel that > > doesn't require such workarounds. > > No, that's not an option imo. It was a Dom0 where I ran into the > problem causing me to submit "x86/PV: conditionally avoid raising > #GP for early guest MSR accesses". While I could upgrade that > system, I have reasons for not doing so. And while I could put a > more modern kernel on there, I'd prefer if the distro kernel > continued to work. (That's leaving aside that for unrelated > reasons building and using my own, newer kernel there is quite a > bit more difficult than on all other of my test systems.) > > > RFC because there's still some debate as to how we should solve the > > MSR issue, this is one possible way, but IMO we need to make a > > decision soon-ish because of the release timeline. > > Generally I think it would be far better from a user pov if > things worked out of the box, at least in cases where it can be > made work. Arguably for Solaris this isn't going to be possible > (leaving aside the non-option of fully returning back to original > behavior), but for the old-Linux-PV case I still think my proposed > change is better in this regard. I realize Andrew said (on irc) > that this is making the behavior even weaker. I take a different > perspective: Considering a guest will install exception handlers > at some point, I continue to fail to see what good it will do to > try to inject a #GP(0) when we know the guest will die because of > not having a handler registered just yet. It is a kernel flaw, > yes, but they ended up living with it merely because of our odd > prior behavior, so we can't put all the blame on them. My concern with this would mostly be with newer guests, in the sense that people could come to rely on this behavior of not injecting a #GP if the handler is not setup, which I think we should try to avoid. One option would be to introduce a feature that could be used in the elfnotes to signal that the kernel doesn't require such workarounds for MSR #GP handling, maybe: /* * Signal that the kernel doesn't require suppressing the #GP on * unknown MSR accesses if the handler is not setup. New kernels * should always have this set. */ #define XENFEAT_msr_early_gp 16 We could try to backport this on the Linux kernel as far as we can that we know it's safe to do so. If this is not acceptable then I guess your solution is fine. Arguably PV has it's own (weird) architecture, in which it might be safe to say that no #GP will be injected as a result of a MSR access unless the handler is setup. Ideally we should document this behaviour somewhere. Maybe we could add a rdmsr_safe to your path akin to what's done here? > > This isn't to say that I'm against propagating exceptions where > there's no alternative to delivering one. Also I'm certainly open > to further tighten the condition of when to zap such an exception > (e.g. only as long as there's no handler _and_ the guest is still > in UP mode, assuming of course this would still address the > observed issue). > > > --- a/docs/man/xl.cfg.5.pod.in > > +++ b/docs/man/xl.cfg.5.pod.in > > @@ -2861,6 +2861,17 @@ No MCA capabilities in above list are enabled. > > > > =back > > > > +=item B<msr_legacy_handling=BOOLEAN> > > + > > +Select whether to use the legacy behaviour for accesses to MSRs not explicitly > > +handled by Xen instead of injecting a #GP to the guest. Such legacy access > > +mode will force Xen to replicate the behaviour from the hardware it's currently > > +running on in order to decide whether a #GP is injected to the guest. Note > > +that the guest is never allowed access to unhandled MSRs, this option only > > +changes whether a #GP might be injected or not. > > This description is appropriate for reads, but not for writes: > Whether a write would fault can only be known by trying a write, > yet for safety reasons we can't risk doing more than a read. An > option might be to make writes fault is the to be written value > differs from that which the probing read has returned (i.e. the > same condition which originally had caused a log message to appear > in 4.14 for PV guests). Read values for unhandled MSRs will always be 0 with the proposed code, so we would inject a #GP to the guest for any write attempt to unhandled MSRs with a value different than 0. Maybe we should just inject a #GP to the guest for any attempts to write to unhandled MSRs? > > --- a/tools/libs/light/libxl_types.idl > > +++ b/tools/libs/light/libxl_types.idl > > @@ -644,6 +644,8 @@ libxl_domain_build_info = Struct("domain_build_info",[ > > ("arch_arm", Struct(None, [("gic_version", libxl_gic_version), > > ("vuart", libxl_vuart_type), > > ])), > > + ("arch_x86", Struct(None, [("msr_legacy_handling", libxl_defbool), > > + ])), > > Seeing this I'm wondering whether the entire set of arch_* > shouldn't be within a union. But afaics this would have further > implications on code elsewhere, so surely wouldn't want doing > right now. Right, I thought the same but I'm not sure we can change this now, as it's part of the API. Adding new fields is fine, but adding a union field with arch_arm would change the structure as we would need to add a 'u' (or equivalent) field here. > > --- a/xen/arch/x86/domain.c > > +++ b/xen/arch/x86/domain.c > > @@ -852,6 +852,9 @@ int arch_domain_create(struct domain *d, > > > > domain_cpu_policy_changed(d); > > > > + d->arch.msr_legacy_handling = config->arch.domain_flags & > > + XEN_X86_LEGACY_MSR_HANDLING; > > Somewhere you'd also need to refuse processing requests with any > of the other 31 bits set. Yes, I need to sanitize the flags. Thanks, Roger.
On 02.03.2021 16:00, Roger Pau Monné wrote: > On Tue, Mar 02, 2021 at 12:16:12PM +0100, Jan Beulich wrote: >> On 01.03.2021 17:23, Roger Pau Monne wrote: >>> RFC because there's still some debate as to how we should solve the >>> MSR issue, this is one possible way, but IMO we need to make a >>> decision soon-ish because of the release timeline. >> >> Generally I think it would be far better from a user pov if >> things worked out of the box, at least in cases where it can be >> made work. Arguably for Solaris this isn't going to be possible >> (leaving aside the non-option of fully returning back to original >> behavior), but for the old-Linux-PV case I still think my proposed >> change is better in this regard. I realize Andrew said (on irc) >> that this is making the behavior even weaker. I take a different >> perspective: Considering a guest will install exception handlers >> at some point, I continue to fail to see what good it will do to >> try to inject a #GP(0) when we know the guest will die because of >> not having a handler registered just yet. It is a kernel flaw, >> yes, but they ended up living with it merely because of our odd >> prior behavior, so we can't put all the blame on them. > > My concern with this would mostly be with newer guests, in the sense > that people could come to rely on this behavior of not injecting a > #GP if the handler is not setup, which I think we should try to avoid. > > One option would be to introduce a feature that could be used in the > elfnotes to signal that the kernel doesn't require such workarounds > for MSR #GP handling, maybe: > > /* > * Signal that the kernel doesn't require suppressing the #GP on > * unknown MSR accesses if the handler is not setup. New kernels > * should always have this set. > */ > #define XENFEAT_msr_early_gp 16 > > We could try to backport this on the Linux kernel as far as we can > that we know it's safe to do so. I too did consider something like this. While I'm not opposed, it effectively requires well-behaved consumers who haven't been well- behaved in the past. > If this is not acceptable then I guess your solution is fine. Arguably > PV has it's own (weird) architecture, in which it might be safe to say > that no #GP will be injected as a result of a MSR access unless the > handler is setup. Ideally we should document this behaviour somewhere. > > Maybe we could add a rdmsr_safe to your path akin to what's done > here? Probably. Would need trying out on the affected older version, just like ... >>> --- a/docs/man/xl.cfg.5.pod.in >>> +++ b/docs/man/xl.cfg.5.pod.in >>> @@ -2861,6 +2861,17 @@ No MCA capabilities in above list are enabled. >>> >>> =back >>> >>> +=item B<msr_legacy_handling=BOOLEAN> >>> + >>> +Select whether to use the legacy behaviour for accesses to MSRs not explicitly >>> +handled by Xen instead of injecting a #GP to the guest. Such legacy access >>> +mode will force Xen to replicate the behaviour from the hardware it's currently >>> +running on in order to decide whether a #GP is injected to the guest. Note >>> +that the guest is never allowed access to unhandled MSRs, this option only >>> +changes whether a #GP might be injected or not. >> >> This description is appropriate for reads, but not for writes: >> Whether a write would fault can only be known by trying a write, >> yet for safety reasons we can't risk doing more than a read. An >> option might be to make writes fault is the to be written value >> differs from that which the probing read has returned (i.e. the >> same condition which originally had caused a log message to appear >> in 4.14 for PV guests). > > Read values for unhandled MSRs will always be 0 with the proposed > code, so we would inject a #GP to the guest for any write attempt to > unhandled MSRs with a value different than 0. > > Maybe we should just inject a #GP to the guest for any attempts to > write to unhandled MSRs? ... doing this would need to. Iirc I did add the write side of the handling in my patch just for symmetry. I'd prefer handling to be symmetric, but I can see why we may not want it to be so. Jan
On Tue, Mar 02, 2021 at 04:18:59PM +0100, Jan Beulich wrote: > On 02.03.2021 16:00, Roger Pau Monné wrote: > > On Tue, Mar 02, 2021 at 12:16:12PM +0100, Jan Beulich wrote: > >> On 01.03.2021 17:23, Roger Pau Monne wrote: > >>> RFC because there's still some debate as to how we should solve the > >>> MSR issue, this is one possible way, but IMO we need to make a > >>> decision soon-ish because of the release timeline. > >> > >> Generally I think it would be far better from a user pov if > >> things worked out of the box, at least in cases where it can be > >> made work. Arguably for Solaris this isn't going to be possible > >> (leaving aside the non-option of fully returning back to original > >> behavior), but for the old-Linux-PV case I still think my proposed > >> change is better in this regard. I realize Andrew said (on irc) > >> that this is making the behavior even weaker. I take a different > >> perspective: Considering a guest will install exception handlers > >> at some point, I continue to fail to see what good it will do to > >> try to inject a #GP(0) when we know the guest will die because of > >> not having a handler registered just yet. It is a kernel flaw, > >> yes, but they ended up living with it merely because of our odd > >> prior behavior, so we can't put all the blame on them. > > > > My concern with this would mostly be with newer guests, in the sense > > that people could come to rely on this behavior of not injecting a > > #GP if the handler is not setup, which I think we should try to avoid. > > > > One option would be to introduce a feature that could be used in the > > elfnotes to signal that the kernel doesn't require such workarounds > > for MSR #GP handling, maybe: > > > > /* > > * Signal that the kernel doesn't require suppressing the #GP on > > * unknown MSR accesses if the handler is not setup. New kernels > > * should always have this set. > > */ > > #define XENFEAT_msr_early_gp 16 > > > > We could try to backport this on the Linux kernel as far as we can > > that we know it's safe to do so. > > I too did consider something like this. While I'm not opposed, it > effectively requires well-behaved consumers who haven't been well- > behaved in the past. > > > If this is not acceptable then I guess your solution is fine. Arguably > > PV has it's own (weird) architecture, in which it might be safe to say > > that no #GP will be injected as a result of a MSR access unless the > > handler is setup. Ideally we should document this behaviour somewhere. > > > > Maybe we could add a rdmsr_safe to your path akin to what's done > > here? > > Probably. Would need trying out on the affected older version, > just like ... > > >>> --- a/docs/man/xl.cfg.5.pod.in > >>> +++ b/docs/man/xl.cfg.5.pod.in > >>> @@ -2861,6 +2861,17 @@ No MCA capabilities in above list are enabled. > >>> > >>> =back > >>> > >>> +=item B<msr_legacy_handling=BOOLEAN> > >>> + > >>> +Select whether to use the legacy behaviour for accesses to MSRs not explicitly > >>> +handled by Xen instead of injecting a #GP to the guest. Such legacy access > >>> +mode will force Xen to replicate the behaviour from the hardware it's currently > >>> +running on in order to decide whether a #GP is injected to the guest. Note > >>> +that the guest is never allowed access to unhandled MSRs, this option only > >>> +changes whether a #GP might be injected or not. > >> > >> This description is appropriate for reads, but not for writes: > >> Whether a write would fault can only be known by trying a write, > >> yet for safety reasons we can't risk doing more than a read. An > >> option might be to make writes fault is the to be written value > >> differs from that which the probing read has returned (i.e. the > >> same condition which originally had caused a log message to appear > >> in 4.14 for PV guests). > > > > Read values for unhandled MSRs will always be 0 with the proposed > > code, so we would inject a #GP to the guest for any write attempt to > > unhandled MSRs with a value different than 0. > > > > Maybe we should just inject a #GP to the guest for any attempts to > > write to unhandled MSRs? > > ... doing this would need to. Iirc I did add the write side of the > handling in my patch just for symmetry. I'd prefer handling to be > symmetric, but I can see why we may not want it to be so. Kind of in the wrong context, but I will reply here because it's the last message related to the MSR stuff. Jan: would you be fine with modifying your path to not change the behaviour for writes (ie: always inject #GP to the guest for unhandled accesses), and then add a rdmsr_safe to the read path in order to decide whether to inject a #GP to the guest even if the #GP handler is not setup? I can modify the option introduced on this patch to always inject #GP on unhandled writes and only inject #GP on reads if the physical MSR access on the hardware also triggers a #GP. HVM guests with broken behavior will require having the option enabled in order to work, but I think that's likely OK as long term we expect all HVM guests to be well behaved. My main worry with this approach is that we end up requiring half of the common HVM guests OSes to have the 'legacy MSR handling' option enabled in order to work. I think it's unlikely for this to happen, as we are only aware of Solaris requiring it at the moment. It also raises the question whether we will allow any more exceptions to the MSR policy, like we did for Windows and OpenBSD in: http://xenbits.xen.org/gitweb/?p=xen.git;a=commit;h=ca88a43e660c75796656a544e54a648c60d26ef0 http://xenbits.xen.org/gitweb/?p=xen.git;a=commit;h=4175fd3ccd17face664036fa98e9329aa317f7b6 Or if we are just going to require those guests to enable the legacy MSR handling instead. Thanks, Roger.
On 03.03.2021 16:38, Roger Pau Monné wrote: > On Tue, Mar 02, 2021 at 04:18:59PM +0100, Jan Beulich wrote: >> On 02.03.2021 16:00, Roger Pau Monné wrote: >>> On Tue, Mar 02, 2021 at 12:16:12PM +0100, Jan Beulich wrote: >>>> On 01.03.2021 17:23, Roger Pau Monne wrote: >>>>> RFC because there's still some debate as to how we should solve the >>>>> MSR issue, this is one possible way, but IMO we need to make a >>>>> decision soon-ish because of the release timeline. >>>> >>>> Generally I think it would be far better from a user pov if >>>> things worked out of the box, at least in cases where it can be >>>> made work. Arguably for Solaris this isn't going to be possible >>>> (leaving aside the non-option of fully returning back to original >>>> behavior), but for the old-Linux-PV case I still think my proposed >>>> change is better in this regard. I realize Andrew said (on irc) >>>> that this is making the behavior even weaker. I take a different >>>> perspective: Considering a guest will install exception handlers >>>> at some point, I continue to fail to see what good it will do to >>>> try to inject a #GP(0) when we know the guest will die because of >>>> not having a handler registered just yet. It is a kernel flaw, >>>> yes, but they ended up living with it merely because of our odd >>>> prior behavior, so we can't put all the blame on them. >>> >>> My concern with this would mostly be with newer guests, in the sense >>> that people could come to rely on this behavior of not injecting a >>> #GP if the handler is not setup, which I think we should try to avoid. >>> >>> One option would be to introduce a feature that could be used in the >>> elfnotes to signal that the kernel doesn't require such workarounds >>> for MSR #GP handling, maybe: >>> >>> /* >>> * Signal that the kernel doesn't require suppressing the #GP on >>> * unknown MSR accesses if the handler is not setup. New kernels >>> * should always have this set. >>> */ >>> #define XENFEAT_msr_early_gp 16 >>> >>> We could try to backport this on the Linux kernel as far as we can >>> that we know it's safe to do so. >> >> I too did consider something like this. While I'm not opposed, it >> effectively requires well-behaved consumers who haven't been well- >> behaved in the past. >> >>> If this is not acceptable then I guess your solution is fine. Arguably >>> PV has it's own (weird) architecture, in which it might be safe to say >>> that no #GP will be injected as a result of a MSR access unless the >>> handler is setup. Ideally we should document this behaviour somewhere. >>> >>> Maybe we could add a rdmsr_safe to your path akin to what's done >>> here? >> >> Probably. Would need trying out on the affected older version, >> just like ... >> >>>>> --- a/docs/man/xl.cfg.5.pod.in >>>>> +++ b/docs/man/xl.cfg.5.pod.in >>>>> @@ -2861,6 +2861,17 @@ No MCA capabilities in above list are enabled. >>>>> >>>>> =back >>>>> >>>>> +=item B<msr_legacy_handling=BOOLEAN> >>>>> + >>>>> +Select whether to use the legacy behaviour for accesses to MSRs not explicitly >>>>> +handled by Xen instead of injecting a #GP to the guest. Such legacy access >>>>> +mode will force Xen to replicate the behaviour from the hardware it's currently >>>>> +running on in order to decide whether a #GP is injected to the guest. Note >>>>> +that the guest is never allowed access to unhandled MSRs, this option only >>>>> +changes whether a #GP might be injected or not. >>>> >>>> This description is appropriate for reads, but not for writes: >>>> Whether a write would fault can only be known by trying a write, >>>> yet for safety reasons we can't risk doing more than a read. An >>>> option might be to make writes fault is the to be written value >>>> differs from that which the probing read has returned (i.e. the >>>> same condition which originally had caused a log message to appear >>>> in 4.14 for PV guests). >>> >>> Read values for unhandled MSRs will always be 0 with the proposed >>> code, so we would inject a #GP to the guest for any write attempt to >>> unhandled MSRs with a value different than 0. >>> >>> Maybe we should just inject a #GP to the guest for any attempts to >>> write to unhandled MSRs? >> >> ... doing this would need to. Iirc I did add the write side of the >> handling in my patch just for symmetry. I'd prefer handling to be >> symmetric, but I can see why we may not want it to be so. > > Kind of in the wrong context, but I will reply here because it's the > last message related to the MSR stuff. > > Jan: would you be fine with modifying your path to not change the > behaviour for writes (ie: always inject #GP to the guest for unhandled > accesses), and then add a rdmsr_safe to the read path in order to > decide whether to inject a #GP to the guest even if the #GP handler is > not setup? I don't mind as long as it ends up making things work (I have no reason to believe it won't). I'll give that a try. > I can modify the option introduced on this patch to always inject #GP > on unhandled writes and only inject #GP on reads if the physical MSR > access on the hardware also triggers a #GP. HVM guests with broken > behavior will require having the option enabled in order to work, > but I think that's likely OK as long term we expect all HVM guests to > be well behaved. > > My main worry with this approach is that we end up requiring half of > the common HVM guests OSes to have the 'legacy MSR handling' option > enabled in order to work. I think it's unlikely for this to happen, as > we are only aware of Solaris requiring it at the moment. > > It also raises the question whether we will allow any more exceptions > to the MSR policy, like we did for Windows and OpenBSD in: > > http://xenbits.xen.org/gitweb/?p=xen.git;a=commit;h=ca88a43e660c75796656a544e54a648c60d26ef0 > http://xenbits.xen.org/gitweb/?p=xen.git;a=commit;h=4175fd3ccd17face664036fa98e9329aa317f7b6 > > Or if we are just going to require those guests to enable the legacy > MSR handling instead. It is my understanding that Andrew's view is that adding such special cases is the only acceptable thing. As voiced before I don't share this view, as it means we're going to be entirely reactive to people reporting issues, when I think we should be pro-active as far as is reasonable. Independent of any pro-active measures there'll still be enough reasons for reactive changes - for example I assume Linux would now issue the log message from if (cpu_has(c, X86_FEATURE_CONSTANT_TSC)) { if (c->x86 > 0x10 || (c->x86 == 0x10 && c->x86_model >= 0x2)) { u64 val; rdmsrl(MSR_K7_HWCR, val); if (!(val & BIT(24))) pr_warn(FW_BUG "TSC doesn't count with P0 frequency!\n"); } } since we surface a zero value right now (but I didn't verify this in practice yet). Jan
On Thu, Mar 04, 2021 at 09:48:25AM +0100, Jan Beulich wrote: > On 03.03.2021 16:38, Roger Pau Monné wrote: > > On Tue, Mar 02, 2021 at 04:18:59PM +0100, Jan Beulich wrote: > >> On 02.03.2021 16:00, Roger Pau Monné wrote: > >>> On Tue, Mar 02, 2021 at 12:16:12PM +0100, Jan Beulich wrote: > >>>> On 01.03.2021 17:23, Roger Pau Monne wrote: > >>>>> RFC because there's still some debate as to how we should solve the > >>>>> MSR issue, this is one possible way, but IMO we need to make a > >>>>> decision soon-ish because of the release timeline. > >>>> > >>>> Generally I think it would be far better from a user pov if > >>>> things worked out of the box, at least in cases where it can be > >>>> made work. Arguably for Solaris this isn't going to be possible > >>>> (leaving aside the non-option of fully returning back to original > >>>> behavior), but for the old-Linux-PV case I still think my proposed > >>>> change is better in this regard. I realize Andrew said (on irc) > >>>> that this is making the behavior even weaker. I take a different > >>>> perspective: Considering a guest will install exception handlers > >>>> at some point, I continue to fail to see what good it will do to > >>>> try to inject a #GP(0) when we know the guest will die because of > >>>> not having a handler registered just yet. It is a kernel flaw, > >>>> yes, but they ended up living with it merely because of our odd > >>>> prior behavior, so we can't put all the blame on them. > >>> > >>> My concern with this would mostly be with newer guests, in the sense > >>> that people could come to rely on this behavior of not injecting a > >>> #GP if the handler is not setup, which I think we should try to avoid. > >>> > >>> One option would be to introduce a feature that could be used in the > >>> elfnotes to signal that the kernel doesn't require such workarounds > >>> for MSR #GP handling, maybe: > >>> > >>> /* > >>> * Signal that the kernel doesn't require suppressing the #GP on > >>> * unknown MSR accesses if the handler is not setup. New kernels > >>> * should always have this set. > >>> */ > >>> #define XENFEAT_msr_early_gp 16 > >>> > >>> We could try to backport this on the Linux kernel as far as we can > >>> that we know it's safe to do so. > >> > >> I too did consider something like this. While I'm not opposed, it > >> effectively requires well-behaved consumers who haven't been well- > >> behaved in the past. > >> > >>> If this is not acceptable then I guess your solution is fine. Arguably > >>> PV has it's own (weird) architecture, in which it might be safe to say > >>> that no #GP will be injected as a result of a MSR access unless the > >>> handler is setup. Ideally we should document this behaviour somewhere. > >>> > >>> Maybe we could add a rdmsr_safe to your path akin to what's done > >>> here? > >> > >> Probably. Would need trying out on the affected older version, > >> just like ... > >> > >>>>> --- a/docs/man/xl.cfg.5.pod.in > >>>>> +++ b/docs/man/xl.cfg.5.pod.in > >>>>> @@ -2861,6 +2861,17 @@ No MCA capabilities in above list are enabled. > >>>>> > >>>>> =back > >>>>> > >>>>> +=item B<msr_legacy_handling=BOOLEAN> > >>>>> + > >>>>> +Select whether to use the legacy behaviour for accesses to MSRs not explicitly > >>>>> +handled by Xen instead of injecting a #GP to the guest. Such legacy access > >>>>> +mode will force Xen to replicate the behaviour from the hardware it's currently > >>>>> +running on in order to decide whether a #GP is injected to the guest. Note > >>>>> +that the guest is never allowed access to unhandled MSRs, this option only > >>>>> +changes whether a #GP might be injected or not. > >>>> > >>>> This description is appropriate for reads, but not for writes: > >>>> Whether a write would fault can only be known by trying a write, > >>>> yet for safety reasons we can't risk doing more than a read. An > >>>> option might be to make writes fault is the to be written value > >>>> differs from that which the probing read has returned (i.e. the > >>>> same condition which originally had caused a log message to appear > >>>> in 4.14 for PV guests). > >>> > >>> Read values for unhandled MSRs will always be 0 with the proposed > >>> code, so we would inject a #GP to the guest for any write attempt to > >>> unhandled MSRs with a value different than 0. > >>> > >>> Maybe we should just inject a #GP to the guest for any attempts to > >>> write to unhandled MSRs? > >> > >> ... doing this would need to. Iirc I did add the write side of the > >> handling in my patch just for symmetry. I'd prefer handling to be > >> symmetric, but I can see why we may not want it to be so. > > > > Kind of in the wrong context, but I will reply here because it's the > > last message related to the MSR stuff. > > > > Jan: would you be fine with modifying your path to not change the > > behaviour for writes (ie: always inject #GP to the guest for unhandled > > accesses), and then add a rdmsr_safe to the read path in order to > > decide whether to inject a #GP to the guest even if the #GP handler is > > not setup? > > I don't mind as long as it ends up making things work (I have no > reason to believe it won't). I'll give that a try. Thanks. I think this behavior would be a fine solution for PV guests because: - It's unlikely we will see any new ports to x86 PV mode, and hence we mostly need to care about the existing ones which we already fixed in new versions, so this behavior is not to be propagated. - The Xen PV architecture is already different from the x86 architecture, and hence we can slightly bend the rules to our liking. - It's not feasible for us to figure out what MSRs older Linux versions tried to access without having a #GP handler setup. Not sure whether this also affects NetBSD. I would like to suggest to introduce a comment to document this behaviour for x86 PV in the public headers, but I'm afraid I cannot find a suitable location. > > I can modify the option introduced on this patch to always inject #GP > > on unhandled writes and only inject #GP on reads if the physical MSR > > access on the hardware also triggers a #GP. HVM guests with broken > > behavior will require having the option enabled in order to work, > > but I think that's likely OK as long term we expect all HVM guests to > > be well behaved. > > > > My main worry with this approach is that we end up requiring half of > > the common HVM guests OSes to have the 'legacy MSR handling' option > > enabled in order to work. I think it's unlikely for this to happen, as > > we are only aware of Solaris requiring it at the moment. > > > > It also raises the question whether we will allow any more exceptions > > to the MSR policy, like we did for Windows and OpenBSD in: > > > > http://xenbits.xen.org/gitweb/?p=xen.git;a=commit;h=ca88a43e660c75796656a544e54a648c60d26ef0 > > http://xenbits.xen.org/gitweb/?p=xen.git;a=commit;h=4175fd3ccd17face664036fa98e9329aa317f7b6 > > > > Or if we are just going to require those guests to enable the legacy > > MSR handling instead. > > It is my understanding that Andrew's view is that adding such special > cases is the only acceptable thing. As voiced before I don't share > this view, as it means we're going to be entirely reactive to people > reporting issues, when I think we should be pro-active as far as is > reasonable. Independent of any pro-active measures there'll still be > enough reasons for reactive changes - for example I assume Linux > would now issue the log message from > > if (cpu_has(c, X86_FEATURE_CONSTANT_TSC)) { > > if (c->x86 > 0x10 || > (c->x86 == 0x10 && c->x86_model >= 0x2)) { > u64 val; > > rdmsrl(MSR_K7_HWCR, val); > if (!(val & BIT(24))) > pr_warn(FW_BUG "TSC doesn't count with P0 frequency!\n"); > } > } > > since we surface a zero value right now (but I didn't verify this in > practice yet). I think we inject a #GP to the guest if it tries to access MSR_K7_HWCR? As I don't see this MSR handled explicitly in svm_msr_read_intercept. So Linux would complain from the unchecked MSR access and the MSR value not having the bit set. This one seems like a fine candidate to implement in svm_msr_read_intercept, because Xen needs to return a specific value for this MSR. Regarding the global approach to fixing the fallout from the MSR policy change, I don't see why we couldn't do a mix between pro-active and reactive. I think we must have an option to fallback to something similar to the old policy for HVM guests so that users have a way to get their guests running after an update without requiring a code change. That doesn't mean we can't reactively add the missing MSRs as we go discovering them. I would even print a warning when using such fallback legacy MSR handling option that you need to report the issue to xen-devel because the option might be removed in future releases. Does the above seem like a sensible plan? Roger.
On 04.03.2021 11:05, Roger Pau Monné wrote: > On Thu, Mar 04, 2021 at 09:48:25AM +0100, Jan Beulich wrote: >> On 03.03.2021 16:38, Roger Pau Monné wrote: >>> It also raises the question whether we will allow any more exceptions >>> to the MSR policy, like we did for Windows and OpenBSD in: >>> >>> http://xenbits.xen.org/gitweb/?p=xen.git;a=commit;h=ca88a43e660c75796656a544e54a648c60d26ef0 >>> http://xenbits.xen.org/gitweb/?p=xen.git;a=commit;h=4175fd3ccd17face664036fa98e9329aa317f7b6 >>> >>> Or if we are just going to require those guests to enable the legacy >>> MSR handling instead. >> >> It is my understanding that Andrew's view is that adding such special >> cases is the only acceptable thing. As voiced before I don't share >> this view, as it means we're going to be entirely reactive to people >> reporting issues, when I think we should be pro-active as far as is >> reasonable. Independent of any pro-active measures there'll still be >> enough reasons for reactive changes - for example I assume Linux >> would now issue the log message from >> >> if (cpu_has(c, X86_FEATURE_CONSTANT_TSC)) { >> >> if (c->x86 > 0x10 || >> (c->x86 == 0x10 && c->x86_model >= 0x2)) { >> u64 val; >> >> rdmsrl(MSR_K7_HWCR, val); >> if (!(val & BIT(24))) >> pr_warn(FW_BUG "TSC doesn't count with P0 frequency!\n"); >> } >> } >> >> since we surface a zero value right now (but I didn't verify this in >> practice yet). > > I think we inject a #GP to the guest if it tries to access > MSR_K7_HWCR? As I don't see this MSR handled explicitly in > svm_msr_read_intercept. So Linux would complain from the unchecked MSR > access and the MSR value not having the bit set. Right - my description was of the behavior with my workaround already in place. > This one seems like a fine candidate to implement in > svm_msr_read_intercept, because Xen needs to return a specific value > for this MSR. > > Regarding the global approach to fixing the fallout from the MSR > policy change, I don't see why we couldn't do a mix between pro-active > and reactive. > > I think we must have an option to fallback to something similar to the > old policy for HVM guests so that users have a way to get their guests > running after an update without requiring a code change. > > That doesn't mean we can't reactively add the missing MSRs as we go > discovering them. I would even print a warning when using such > fallback legacy MSR handling option that you need to report the issue > to xen-devel because the option might be removed in future releases. > > Does the above seem like a sensible plan? I think so, yes. I wonder what Andrew thinks, though. Jan
Jan Beulich writes ("Re: [PATCH RFC for-4.15] x86/msr: introduce an option for legacy MSR behavior selection"): > On 04.03.2021 11:05, Roger Pau Monné wrote: ... > > This one seems like a fine candidate to implement in > > svm_msr_read_intercept, because Xen needs to return a specific value > > for this MSR. > > > > Regarding the global approach to fixing the fallout from the MSR > > policy change, I don't see why we couldn't do a mix between pro-active > > and reactive. > > > > I think we must have an option to fallback to something similar to the > > old policy for HVM guests so that users have a way to get their guests > > running after an update without requiring a code change. > > > > That doesn't mean we can't reactively add the missing MSRs as we go > > discovering them. I would even print a warning when using such > > fallback legacy MSR handling option that you need to report the issue > > to xen-devel because the option might be removed in future releases. > > > > Does the above seem like a sensible plan? > > I think so, yes. I wonder what Andrew thinks, though. FTR I am on board with this plan. I would like to see quick progress on this issue as it seems like one of the major risks in the release. Thanks, Ian.
On 01/03/2021 19:33, Boris Ostrovsky wrote: > On 3/1/21 11:23 AM, Roger Pau Monne wrote: >> Introduce an option to allow selecting the legacy behavior for >> accesses to MSRs not explicitly handled. Since commit >> 84e848fd7a162f669 and 322ec7c89f6640e accesses to MSRs not explicitly >> handled by Xen result in the injection of a #GP to the guest. This is >> a behavior change since previously a #GP was only injected if >> accessing the MSR on the real hardware will also trigger a #GP. >> >> This seems to be problematic for some guests, so introduce an option >> to fallback to this legacy behavior. The main difference between what >> was previously done is that the hardware MSR value is not leaked to >> the guests on reads. >> >> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> >> --- >> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> >> --- >> Note that this option is not made available to dom0. I'm not sure >> whether it makes sense to do so, since anyone updating Xen to such >> newer version will also likely pair it with a newish kernel that >> doesn't require such workarounds. >> >> RFC because there's still some debate as to how we should solve the >> MSR issue, this is one possible way, but IMO we need to make a >> decision soon-ish because of the release timeline. >> >> Boris, could you please test with Solaris to see if this fixes the >> issue? > > Yes, it does. Thanks. Really? This doesn't stop #GP being raised for RAPL_POWER_LIMIT AFAICT. Or am I mistaken about how the bug manifested? ~Andrew
On 04/03/2021 14:02, Andrew Cooper wrote: > On 01/03/2021 19:33, Boris Ostrovsky wrote: >> On 3/1/21 11:23 AM, Roger Pau Monne wrote: >>> Introduce an option to allow selecting the legacy behavior for >>> accesses to MSRs not explicitly handled. Since commit >>> 84e848fd7a162f669 and 322ec7c89f6640e accesses to MSRs not explicitly >>> handled by Xen result in the injection of a #GP to the guest. This is >>> a behavior change since previously a #GP was only injected if >>> accessing the MSR on the real hardware will also trigger a #GP. >>> >>> This seems to be problematic for some guests, so introduce an option >>> to fallback to this legacy behavior. The main difference between what >>> was previously done is that the hardware MSR value is not leaked to >>> the guests on reads. >>> >>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> >>> --- >>> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> >>> --- >>> Note that this option is not made available to dom0. I'm not sure >>> whether it makes sense to do so, since anyone updating Xen to such >>> newer version will also likely pair it with a newish kernel that >>> doesn't require such workarounds. >>> >>> RFC because there's still some debate as to how we should solve the >>> MSR issue, this is one possible way, but IMO we need to make a >>> decision soon-ish because of the release timeline. >>> >>> Boris, could you please test with Solaris to see if this fixes the >>> issue? >> Yes, it does. Thanks. > Really? This doesn't stop #GP being raised for RAPL_POWER_LIMIT > AFAICT. Or am I mistaken about how the bug manifested? Actually never mind. I've figured out why. I have another bugfix to submit. ~Andrew
diff --git a/docs/man/xl.cfg.5.pod.in b/docs/man/xl.cfg.5.pod.in index 040374dcd6..78dadcdfdd 100644 --- a/docs/man/xl.cfg.5.pod.in +++ b/docs/man/xl.cfg.5.pod.in @@ -2861,6 +2861,17 @@ No MCA capabilities in above list are enabled. =back +=item B<msr_legacy_handling=BOOLEAN> + +Select whether to use the legacy behaviour for accesses to MSRs not explicitly +handled by Xen instead of injecting a #GP to the guest. Such legacy access +mode will force Xen to replicate the behaviour from the hardware it's currently +running on in order to decide whether a #GP is injected to the guest. Note +that the guest is never allowed access to unhandled MSRs, this option only +changes whether a #GP might be injected or not. + +=back + =back =head1 SEE ALSO diff --git a/tools/include/libxl.h b/tools/include/libxl.h index a7b673e89d..3bf6aded97 100644 --- a/tools/include/libxl.h +++ b/tools/include/libxl.h @@ -495,6 +495,14 @@ */ #define LIBXL_HAVE_VMTRACE_BUF_KB 1 +/* + * LIBXL_HAVE_MSR_LEGACY_HANDLING indicates the toolstack has support for + * switching the MSR handling in the hypervisor to legacy mode, where #GP is + * only injected to guests for unhandled MSRs if accessing the MSR on the + * physical hardware also triggers a #GP. + */ +#define LIBXL_HAVE_MSR_LEGACY_HANDLING 1 + /* * libxl ABI compatibility * diff --git a/tools/libs/light/libxl_types.idl b/tools/libs/light/libxl_types.idl index 5b85a7419f..5bb12bc70d 100644 --- a/tools/libs/light/libxl_types.idl +++ b/tools/libs/light/libxl_types.idl @@ -644,6 +644,8 @@ libxl_domain_build_info = Struct("domain_build_info",[ ("arch_arm", Struct(None, [("gic_version", libxl_gic_version), ("vuart", libxl_vuart_type), ])), + ("arch_x86", Struct(None, [("msr_legacy_handling", libxl_defbool), + ])), # Alternate p2m is not bound to any architecture or guest type, as it is # supported by x86 HVM and ARM support is planned. ("altp2m", libxl_altp2m_mode), diff --git a/tools/libs/light/libxl_x86.c b/tools/libs/light/libxl_x86.c index 58187ed760..4b2b5d69a6 100644 --- a/tools/libs/light/libxl_x86.c +++ b/tools/libs/light/libxl_x86.c @@ -19,6 +19,10 @@ int libxl__arch_domain_prepare_config(libxl__gc *gc, abort(); } + config->arch.domain_flags = 0; + if (libxl_defbool_val(d_config->b_info.arch_x86.msr_legacy_handling)) + config->arch.domain_flags |= XEN_X86_LEGACY_MSR_HANDLING; + return 0; } @@ -809,6 +813,7 @@ void libxl__arch_domain_build_info_setdefault(libxl__gc *gc, libxl_domain_build_info *b_info) { libxl_defbool_setdefault(&b_info->acpi, true); + libxl_defbool_setdefault(&b_info->arch_x86.msr_legacy_handling, false); } int libxl__arch_passthrough_mode_setdefault(libxl__gc *gc, diff --git a/tools/xl/xl_parse.c b/tools/xl/xl_parse.c index 1893cfc086..fc79b6cc83 100644 --- a/tools/xl/xl_parse.c +++ b/tools/xl/xl_parse.c @@ -2741,6 +2741,9 @@ skip_usbdev: xlu_cfg_get_defbool(config, "xend_suspend_evtchn_compat", &c_info->xend_suspend_evtchn_compat, 0); + xlu_cfg_get_defbool(config, "msr_legacy_handling", + &b_info->arch_x86.msr_legacy_handling, 0); + xlu_cfg_destroy(config); } diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c index 6c7ee25f3b..28805d50b8 100644 --- a/xen/arch/x86/domain.c +++ b/xen/arch/x86/domain.c @@ -852,6 +852,9 @@ int arch_domain_create(struct domain *d, domain_cpu_policy_changed(d); + d->arch.msr_legacy_handling = config->arch.domain_flags & + XEN_X86_LEGACY_MSR_HANDLING; + return 0; fail: diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c index b819897a4a..b535c5927a 100644 --- a/xen/arch/x86/hvm/svm/svm.c +++ b/xen/arch/x86/hvm/svm/svm.c @@ -1795,6 +1795,7 @@ static int svm_msr_read_intercept(unsigned int msr, uint64_t *msr_content) const struct domain *d = v->domain; struct vmcb_struct *vmcb = v->arch.hvm.svm.vmcb; const struct nestedsvm *nsvm = &vcpu_nestedsvm(v); + uint64_t tmp; switch ( msr ) { @@ -1965,6 +1966,11 @@ static int svm_msr_read_intercept(unsigned int msr, uint64_t *msr_content) break; default: + if ( d->arch.msr_legacy_handling && !rdmsr_safe(msr, tmp) ) + { + *msr_content = 0; + break; + } gdprintk(XENLOG_WARNING, "RDMSR 0x%08x unimplemented\n", msr); goto gpf; } @@ -2151,6 +2157,9 @@ static int svm_msr_write_intercept(unsigned int msr, uint64_t msr_content) break; default: + if ( d->arch.msr_legacy_handling && !rdmsr_safe(msr, msr_content) ) + break; + gdprintk(XENLOG_WARNING, "WRMSR 0x%08x val 0x%016"PRIx64" unimplemented\n", msr, msr_content); diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c index faba95d057..7707ae8cbc 100644 --- a/xen/arch/x86/hvm/vmx/vmx.c +++ b/xen/arch/x86/hvm/vmx/vmx.c @@ -3114,6 +3114,7 @@ static int is_last_branch_msr(u32 ecx) static int vmx_msr_read_intercept(unsigned int msr, uint64_t *msr_content) { struct vcpu *curr = current; + uint64_t tmp; HVM_DBG_LOG(DBG_LEVEL_MSR, "ecx=%#x", msr); @@ -3195,6 +3196,12 @@ static int vmx_msr_read_intercept(unsigned int msr, uint64_t *msr_content) break; } + if ( curr->domain->arch.msr_legacy_handling && !rdmsr_safe(msr, tmp) ) + { + *msr_content = 0; + break; + } + gdprintk(XENLOG_WARNING, "RDMSR 0x%08x unimplemented\n", msr); goto gp_fault; } @@ -3497,6 +3504,10 @@ static int vmx_msr_write_intercept(unsigned int msr, uint64_t msr_content) is_last_branch_msr(msr) ) break; + if ( v->domain->arch.msr_legacy_handling && + !rdmsr_safe(msr, msr_content) ) + break; + gdprintk(XENLOG_WARNING, "WRMSR 0x%08x val 0x%016"PRIx64" unimplemented\n", msr, msr_content); diff --git a/xen/arch/x86/pv/emul-priv-op.c b/xen/arch/x86/pv/emul-priv-op.c index e5a22b9347..ecf98c4a05 100644 --- a/xen/arch/x86/pv/emul-priv-op.c +++ b/xen/arch/x86/pv/emul-priv-op.c @@ -875,6 +875,7 @@ static int read_msr(unsigned int reg, uint64_t *val, const struct domain *currd = curr->domain; const struct cpuid_policy *cp = currd->arch.cpuid; bool vpmu_msr = false; + uint64_t tmp; int ret; if ( (ret = guest_rdmsr(curr, reg, val)) != X86EMUL_UNHANDLEABLE ) @@ -986,6 +987,12 @@ static int read_msr(unsigned int reg, uint64_t *val, } /* fall through */ default: + if ( currd->arch.msr_legacy_handling && !rdmsr_safe(reg, tmp) ) + { + *val = 0; + return X86EMUL_OKAY; + } + gdprintk(XENLOG_WARNING, "RDMSR 0x%08x unimplemented\n", reg); break; @@ -1148,6 +1155,9 @@ static int write_msr(unsigned int reg, uint64_t val, } /* fall through */ default: + if ( currd->arch.msr_legacy_handling && !rdmsr_safe(reg, val) ) + return X86EMUL_OKAY; + gdprintk(XENLOG_WARNING, "WRMSR 0x%08x val 0x%016"PRIx64" unimplemented\n", reg, val); diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h index 3900d7b48b..213557fc2c 100644 --- a/xen/include/asm-x86/domain.h +++ b/xen/include/asm-x86/domain.h @@ -437,6 +437,9 @@ struct arch_domain /* Mem_access emulation control */ bool_t mem_access_emulate_each_rep; + /* Legacy handling of MSR accesses. */ + bool msr_legacy_handling; + /* Emulated devices enabled bitmap. */ uint32_t emulation_flags; } __cacheline_aligned; diff --git a/xen/include/public/arch-x86/xen.h b/xen/include/public/arch-x86/xen.h index 629cb2ba40..0a6b40bb89 100644 --- a/xen/include/public/arch-x86/xen.h +++ b/xen/include/public/arch-x86/xen.h @@ -304,6 +304,9 @@ struct xen_arch_domainconfig { XEN_X86_EMU_PIT | XEN_X86_EMU_USE_PIRQ |\ XEN_X86_EMU_VPCI) uint32_t emulation_flags; + +#define XEN_X86_LEGACY_MSR_HANDLING (1u << 0) + uint32_t domain_flags; }; /* Location of online VCPU bitmap. */
Introduce an option to allow selecting the legacy behavior for accesses to MSRs not explicitly handled. Since commit 84e848fd7a162f669 and 322ec7c89f6640e accesses to MSRs not explicitly handled by Xen result in the injection of a #GP to the guest. This is a behavior change since previously a #GP was only injected if accessing the MSR on the real hardware will also trigger a #GP. This seems to be problematic for some guests, so introduce an option to fallback to this legacy behavior. The main difference between what was previously done is that the hardware MSR value is not leaked to the guests on reads. Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> --- Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> --- Note that this option is not made available to dom0. I'm not sure whether it makes sense to do so, since anyone updating Xen to such newer version will also likely pair it with a newish kernel that doesn't require such workarounds. RFC because there's still some debate as to how we should solve the MSR issue, this is one possible way, but IMO we need to make a decision soon-ish because of the release timeline. Boris, could you please test with Solaris to see if this fixes the issue? --- docs/man/xl.cfg.5.pod.in | 11 +++++++++++ tools/include/libxl.h | 8 ++++++++ tools/libs/light/libxl_types.idl | 2 ++ tools/libs/light/libxl_x86.c | 5 +++++ tools/xl/xl_parse.c | 3 +++ xen/arch/x86/domain.c | 3 +++ xen/arch/x86/hvm/svm/svm.c | 9 +++++++++ xen/arch/x86/hvm/vmx/vmx.c | 11 +++++++++++ xen/arch/x86/pv/emul-priv-op.c | 10 ++++++++++ xen/include/asm-x86/domain.h | 3 +++ xen/include/public/arch-x86/xen.h | 3 +++ 11 files changed, 68 insertions(+)