diff mbox series

[v2,1/2,4.15] x86/PV: conditionally avoid raising #GP for early guest MSR reads

Message ID d794bbee-a5e5-6632-3d1f-acd8164b7171@suse.com (mailing list archive)
State Superseded
Headers show
Series x86: guest MSR access handling tweaks | expand

Commit Message

Jan Beulich March 5, 2021, 9:50 a.m. UTC
Prior to 4.15 Linux, when running in PV mode, did not install a #GP
handler early enough to cover for example the rdmsrl_safe() of
MSR_K8_TSEG_ADDR in bsp_init_amd() (not to speak of the unguarded read
of MSR_K7_HWCR later in the same function). The respective change
(42b3a4cb5609 "x86/xen: Support early interrupts in xen pv guests") was
backported to 4.14, but no further - presumably since it wasn't really
easy because of other dependencies.

Therefore, to prevent our change in the handling of guest MSR accesses
to render PV Linux 4.13 and older unusable on at least AMD systems, make
the raising of #GP on this paths conditional upon the guest having
installed a handler, provided of course the MSR can be read in the first
place (we would have raised #GP in that case even before). Producing
zero for reads isn't necessarily correct and may trip code trying to
detect presence of MSRs early, but since such detection logic won't work
without a #GP handler anyway, this ought to be a fair workaround.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v2: Probe MSR read. Exclude hypervisor range. Avoid issuing two log
    messages (in debug builds). Don't alter WRMSR behavior.
---
I'm not convinced we can get away without also making the WRMSR path
somewhat more permissive again, e.g. tolerating attempts to set bits
which are already set. But of course this would require keeping in sync
for which MSRs we "fake" reads, as then a kernel attempt to set a bit
may also appear as an attempt to clear others (because of the zero value
that we gave it for the read).

Comments

Roger Pau Monné March 8, 2021, 8:56 a.m. UTC | #1
On Fri, Mar 05, 2021 at 10:50:34AM +0100, Jan Beulich wrote:
> Prior to 4.15 Linux, when running in PV mode, did not install a #GP
> handler early enough to cover for example the rdmsrl_safe() of
> MSR_K8_TSEG_ADDR in bsp_init_amd() (not to speak of the unguarded read
> of MSR_K7_HWCR later in the same function). The respective change
> (42b3a4cb5609 "x86/xen: Support early interrupts in xen pv guests") was
> backported to 4.14, but no further - presumably since it wasn't really
> easy because of other dependencies.
> 
> Therefore, to prevent our change in the handling of guest MSR accesses
> to render PV Linux 4.13 and older unusable on at least AMD systems, make
> the raising of #GP on this paths conditional upon the guest having
> installed a handler, provided of course the MSR can be read in the first
> place (we would have raised #GP in that case even before). Producing
> zero for reads isn't necessarily correct and may trip code trying to
> detect presence of MSRs early, but since such detection logic won't work
> without a #GP handler anyway, this ought to be a fair workaround.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> v2: Probe MSR read. Exclude hypervisor range. Avoid issuing two log
>     messages (in debug builds). Don't alter WRMSR behavior.
> ---
> I'm not convinced we can get away without also making the WRMSR path
> somewhat more permissive again, e.g. tolerating attempts to set bits
> which are already set. But of course this would require keeping in sync
> for which MSRs we "fake" reads, as then a kernel attempt to set a bit
> may also appear as an attempt to clear others (because of the zero value
> that we gave it for the read).

The above approach seems dangerous, as it could allow a guest to
figure out the value of the underlying MSR by probing whether values
trigger a #GP?

I think we want to do something similar to what we do on HVM in 4.14
and previous versions: ignore writes as long as the rdmsr to the
target MSR succeeds, regardless of the value.

> 
> --- a/xen/arch/x86/pv/emul-priv-op.c
> +++ b/xen/arch/x86/pv/emul-priv-op.c
> @@ -874,7 +874,7 @@ static int read_msr(unsigned int reg, ui
>      struct vcpu *curr = current;
>      const struct domain *currd = curr->domain;
>      const struct cpuid_policy *cp = currd->arch.cpuid;
> -    bool vpmu_msr = false;
> +    bool vpmu_msr = false, warn = false;
>      int ret;
>  
>      if ( (ret = guest_rdmsr(curr, reg, val)) != X86EMUL_UNHANDLEABLE )
> @@ -882,7 +882,7 @@ static int read_msr(unsigned int reg, ui
>          if ( ret == X86EMUL_EXCEPTION )
>              x86_emul_hw_exception(TRAP_gp_fault, 0, ctxt);
>  
> -        return ret;
> +        goto done;
>      }
>  
>      switch ( reg )
> @@ -986,7 +986,7 @@ static int read_msr(unsigned int reg, ui
>          }
>          /* fall through */
>      default:
> -        gdprintk(XENLOG_WARNING, "RDMSR 0x%08x unimplemented\n", reg);
> +        warn = true;
>          break;
>  
>      normal:
> @@ -995,7 +995,19 @@ static int read_msr(unsigned int reg, ui
>          return X86EMUL_OKAY;
>      }
>  
> -    return X86EMUL_UNHANDLEABLE;
> + done:

Won't this handling be better placed in the 'default' switch case
above?

> +    if ( ret != X86EMUL_OKAY && !curr->arch.pv.trap_ctxt[X86_EXC_GP].address &&
> +         (reg >> 16) != 0x4000 && !rdmsr_safe(reg, *val) )

We didn't used to care about explicitly blocking the reserved MSR
range, do we really wan to do it now?

I'm not sure I see an issue with that, but given that we are trying to
bring back something similar to the previous behavior, I would avoid
adding new conditions.

Thanks, Roger.
Jan Beulich March 8, 2021, 9:33 a.m. UTC | #2
On 08.03.2021 09:56, Roger Pau Monné wrote:
> On Fri, Mar 05, 2021 at 10:50:34AM +0100, Jan Beulich wrote:
>> Prior to 4.15 Linux, when running in PV mode, did not install a #GP
>> handler early enough to cover for example the rdmsrl_safe() of
>> MSR_K8_TSEG_ADDR in bsp_init_amd() (not to speak of the unguarded read
>> of MSR_K7_HWCR later in the same function). The respective change
>> (42b3a4cb5609 "x86/xen: Support early interrupts in xen pv guests") was
>> backported to 4.14, but no further - presumably since it wasn't really
>> easy because of other dependencies.
>>
>> Therefore, to prevent our change in the handling of guest MSR accesses
>> to render PV Linux 4.13 and older unusable on at least AMD systems, make
>> the raising of #GP on this paths conditional upon the guest having
>> installed a handler, provided of course the MSR can be read in the first
>> place (we would have raised #GP in that case even before). Producing
>> zero for reads isn't necessarily correct and may trip code trying to
>> detect presence of MSRs early, but since such detection logic won't work
>> without a #GP handler anyway, this ought to be a fair workaround.
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>> ---
>> v2: Probe MSR read. Exclude hypervisor range. Avoid issuing two log
>>     messages (in debug builds). Don't alter WRMSR behavior.
>> ---
>> I'm not convinced we can get away without also making the WRMSR path
>> somewhat more permissive again, e.g. tolerating attempts to set bits
>> which are already set. But of course this would require keeping in sync
>> for which MSRs we "fake" reads, as then a kernel attempt to set a bit
>> may also appear as an attempt to clear others (because of the zero value
>> that we gave it for the read).
> 
> The above approach seems dangerous, as it could allow a guest to
> figure out the value of the underlying MSR by probing whether values
> trigger a #GP?

Perhaps, yes. But what do you do? There's potentially a huge value
range to probe ...

> I think we want to do something similar to what we do on HVM in 4.14
> and previous versions: ignore writes as long as the rdmsr to the
> target MSR succeeds, regardless of the value.

Which, as said elsewhere, has its own downsides - writable MSRs don't
need to also be readable. See e.g. AMD's proposed PARTIAL_{FS,GS}_LOAD
MSRs.

>> --- a/xen/arch/x86/pv/emul-priv-op.c
>> +++ b/xen/arch/x86/pv/emul-priv-op.c
>> @@ -874,7 +874,7 @@ static int read_msr(unsigned int reg, ui
>>      struct vcpu *curr = current;
>>      const struct domain *currd = curr->domain;
>>      const struct cpuid_policy *cp = currd->arch.cpuid;
>> -    bool vpmu_msr = false;
>> +    bool vpmu_msr = false, warn = false;
>>      int ret;
>>  
>>      if ( (ret = guest_rdmsr(curr, reg, val)) != X86EMUL_UNHANDLEABLE )
>> @@ -882,7 +882,7 @@ static int read_msr(unsigned int reg, ui
>>          if ( ret == X86EMUL_EXCEPTION )
>>              x86_emul_hw_exception(TRAP_gp_fault, 0, ctxt);
>>  
>> -        return ret;
>> +        goto done;
>>      }
>>  
>>      switch ( reg )
>> @@ -986,7 +986,7 @@ static int read_msr(unsigned int reg, ui
>>          }
>>          /* fall through */
>>      default:
>> -        gdprintk(XENLOG_WARNING, "RDMSR 0x%08x unimplemented\n", reg);
>> +        warn = true;
>>          break;
>>  
>>      normal:
>> @@ -995,7 +995,19 @@ static int read_msr(unsigned int reg, ui
>>          return X86EMUL_OKAY;
>>      }
>>  
>> -    return X86EMUL_UNHANDLEABLE;
>> + done:
> 
> Won't this handling be better placed in the 'default' switch case
> above?

No - see the "goto done" added near the top of the function.

>> +    if ( ret != X86EMUL_OKAY && !curr->arch.pv.trap_ctxt[X86_EXC_GP].address &&
>> +         (reg >> 16) != 0x4000 && !rdmsr_safe(reg, *val) )
> 
> We didn't used to care about explicitly blocking the reserved MSR
> range, do we really wan to do it now?
> 
> I'm not sure I see an issue with that, but given that we are trying to
> bring back something similar to the previous behavior, I would avoid
> adding new conditions.

What I'm particularly trying to avoid here is to allow
information from an underlying hypervisor to "shine through",
even if it's only information as to whether a certain MSR is
readable. It should be solely our own selection which MSRs in
this range a guest is able to (appear to) read.

Plus of course by avoiding the rdmsr_safe() in this case we
also avoid the unnecessary (debug only) log message associated
with such attempted reads.

Jan
Roger Pau Monné March 8, 2021, 12:11 p.m. UTC | #3
On Mon, Mar 08, 2021 at 10:33:12AM +0100, Jan Beulich wrote:
> On 08.03.2021 09:56, Roger Pau Monné wrote:
> > On Fri, Mar 05, 2021 at 10:50:34AM +0100, Jan Beulich wrote:
> >> Prior to 4.15 Linux, when running in PV mode, did not install a #GP
> >> handler early enough to cover for example the rdmsrl_safe() of
> >> MSR_K8_TSEG_ADDR in bsp_init_amd() (not to speak of the unguarded read
> >> of MSR_K7_HWCR later in the same function). The respective change
> >> (42b3a4cb5609 "x86/xen: Support early interrupts in xen pv guests") was
> >> backported to 4.14, but no further - presumably since it wasn't really
> >> easy because of other dependencies.
> >>
> >> Therefore, to prevent our change in the handling of guest MSR accesses
> >> to render PV Linux 4.13 and older unusable on at least AMD systems, make
> >> the raising of #GP on this paths conditional upon the guest having
> >> installed a handler, provided of course the MSR can be read in the first
> >> place (we would have raised #GP in that case even before). Producing
> >> zero for reads isn't necessarily correct and may trip code trying to
> >> detect presence of MSRs early, but since such detection logic won't work
> >> without a #GP handler anyway, this ought to be a fair workaround.
> >>
> >> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> >> ---
> >> v2: Probe MSR read. Exclude hypervisor range. Avoid issuing two log
> >>     messages (in debug builds). Don't alter WRMSR behavior.
> >> ---
> >> I'm not convinced we can get away without also making the WRMSR path
> >> somewhat more permissive again, e.g. tolerating attempts to set bits
> >> which are already set. But of course this would require keeping in sync
> >> for which MSRs we "fake" reads, as then a kernel attempt to set a bit
> >> may also appear as an attempt to clear others (because of the zero value
> >> that we gave it for the read).
> > 
> > The above approach seems dangerous, as it could allow a guest to
> > figure out the value of the underlying MSR by probing whether values
> > trigger a #GP?
> 
> Perhaps, yes. But what do you do? There's potentially a huge value
> range to probe ...
> 
> > I think we want to do something similar to what we do on HVM in 4.14
> > and previous versions: ignore writes as long as the rdmsr to the
> > target MSR succeeds, regardless of the value.
> 
> Which, as said elsewhere, has its own downsides - writable MSRs don't
> need to also be readable. See e.g. AMD's proposed PARTIAL_{FS,GS}_LOAD
> MSRs.

Yes, but it's IMO the lesser of two evils, I think we should avoid any
kind of behavior that depends on the underlying MSR value, just to be
on the safe side.

> >> --- a/xen/arch/x86/pv/emul-priv-op.c
> >> +++ b/xen/arch/x86/pv/emul-priv-op.c
> >> @@ -874,7 +874,7 @@ static int read_msr(unsigned int reg, ui
> >>      struct vcpu *curr = current;
> >>      const struct domain *currd = curr->domain;
> >>      const struct cpuid_policy *cp = currd->arch.cpuid;
> >> -    bool vpmu_msr = false;
> >> +    bool vpmu_msr = false, warn = false;
> >>      int ret;
> >>  
> >>      if ( (ret = guest_rdmsr(curr, reg, val)) != X86EMUL_UNHANDLEABLE )
> >> @@ -882,7 +882,7 @@ static int read_msr(unsigned int reg, ui
> >>          if ( ret == X86EMUL_EXCEPTION )
> >>              x86_emul_hw_exception(TRAP_gp_fault, 0, ctxt);
> >>  
> >> -        return ret;
> >> +        goto done;
> >>      }
> >>  
> >>      switch ( reg )
> >> @@ -986,7 +986,7 @@ static int read_msr(unsigned int reg, ui
> >>          }
> >>          /* fall through */
> >>      default:
> >> -        gdprintk(XENLOG_WARNING, "RDMSR 0x%08x unimplemented\n", reg);
> >> +        warn = true;
> >>          break;
> >>  
> >>      normal:
> >> @@ -995,7 +995,19 @@ static int read_msr(unsigned int reg, ui
> >>          return X86EMUL_OKAY;
> >>      }
> >>  
> >> -    return X86EMUL_UNHANDLEABLE;
> >> + done:
> > 
> > Won't this handling be better placed in the 'default' switch case
> > above?
> 
> No - see the "goto done" added near the top of the function.

Yes, I'm not sure of that. If guest_rdmsr returns anything different
than X86EMUL_UNHANDLEABLE it means it has handled the MSR in some way,
and hence we shouldn't check whether the #GP handler is set or not.

This is not the behavior of older Xen versions, so I'm unsure whether
we should introduce a policy that's even less strict than the previous
one in regard to whether a #GP is injected or not.

I know injecting a #GP when the handler is not set is not helpful for
the guest, but we should limit the workaround to kind of restoring the
previous behavior for making buggy guests happy, not expanding it
anymore.

> >> +    if ( ret != X86EMUL_OKAY && !curr->arch.pv.trap_ctxt[X86_EXC_GP].address &&
> >> +         (reg >> 16) != 0x4000 && !rdmsr_safe(reg, *val) )
> > 
> > We didn't used to care about explicitly blocking the reserved MSR
> > range, do we really wan to do it now?
> > 
> > I'm not sure I see an issue with that, but given that we are trying to
> > bring back something similar to the previous behavior, I would avoid
> > adding new conditions.
> 
> What I'm particularly trying to avoid here is to allow
> information from an underlying hypervisor to "shine through",
> even if it's only information as to whether a certain MSR is
> readable. It should be solely our own selection which MSRs in
> this range a guest is able to (appear to) read.
> 
> Plus of course by avoiding the rdmsr_safe() in this case we
> also avoid the unnecessary (debug only) log message associated
> with such attempted reads.

OK, I think that's fine.

Thanks, Roger.
Jan Beulich March 8, 2021, 1:49 p.m. UTC | #4
On 08.03.2021 13:11, Roger Pau Monné wrote:
> On Mon, Mar 08, 2021 at 10:33:12AM +0100, Jan Beulich wrote:
>> On 08.03.2021 09:56, Roger Pau Monné wrote:
>>> On Fri, Mar 05, 2021 at 10:50:34AM +0100, Jan Beulich wrote:
>>>> --- a/xen/arch/x86/pv/emul-priv-op.c
>>>> +++ b/xen/arch/x86/pv/emul-priv-op.c
>>>> @@ -874,7 +874,7 @@ static int read_msr(unsigned int reg, ui
>>>>      struct vcpu *curr = current;
>>>>      const struct domain *currd = curr->domain;
>>>>      const struct cpuid_policy *cp = currd->arch.cpuid;
>>>> -    bool vpmu_msr = false;
>>>> +    bool vpmu_msr = false, warn = false;
>>>>      int ret;
>>>>  
>>>>      if ( (ret = guest_rdmsr(curr, reg, val)) != X86EMUL_UNHANDLEABLE )
>>>> @@ -882,7 +882,7 @@ static int read_msr(unsigned int reg, ui
>>>>          if ( ret == X86EMUL_EXCEPTION )
>>>>              x86_emul_hw_exception(TRAP_gp_fault, 0, ctxt);
>>>>  
>>>> -        return ret;
>>>> +        goto done;
>>>>      }
>>>>  
>>>>      switch ( reg )
>>>> @@ -986,7 +986,7 @@ static int read_msr(unsigned int reg, ui
>>>>          }
>>>>          /* fall through */
>>>>      default:
>>>> -        gdprintk(XENLOG_WARNING, "RDMSR 0x%08x unimplemented\n", reg);
>>>> +        warn = true;
>>>>          break;
>>>>  
>>>>      normal:
>>>> @@ -995,7 +995,19 @@ static int read_msr(unsigned int reg, ui
>>>>          return X86EMUL_OKAY;
>>>>      }
>>>>  
>>>> -    return X86EMUL_UNHANDLEABLE;
>>>> + done:
>>>
>>> Won't this handling be better placed in the 'default' switch case
>>> above?
>>
>> No - see the "goto done" added near the top of the function.
> 
> Yes, I'm not sure of that. If guest_rdmsr returns anything different
> than X86EMUL_UNHANDLEABLE it means it has handled the MSR in some way,
> and hence we shouldn't check whether the #GP handler is set or not.
> 
> This is not the behavior of older Xen versions, so I'm unsure whether
> we should introduce a policy that's even less strict than the previous
> one in regard to whether a #GP is injected or not.
> 
> I know injecting a #GP when the handler is not set is not helpful for
> the guest, but we should limit the workaround to kind of restoring the
> previous behavior for making buggy guests happy, not expanding it
> anymore.

Yet then we risk breaking guests with any subsequent addition to
guest_rdmsr() which, under whatever extra conditions, wants to
raise #GP.

Jan
Roger Pau Monné March 9, 2021, 10:17 a.m. UTC | #5
On Mon, Mar 08, 2021 at 02:49:19PM +0100, Jan Beulich wrote:
> On 08.03.2021 13:11, Roger Pau Monné wrote:
> > On Mon, Mar 08, 2021 at 10:33:12AM +0100, Jan Beulich wrote:
> >> On 08.03.2021 09:56, Roger Pau Monné wrote:
> >>> On Fri, Mar 05, 2021 at 10:50:34AM +0100, Jan Beulich wrote:
> >>>> --- a/xen/arch/x86/pv/emul-priv-op.c
> >>>> +++ b/xen/arch/x86/pv/emul-priv-op.c
> >>>> @@ -874,7 +874,7 @@ static int read_msr(unsigned int reg, ui
> >>>>      struct vcpu *curr = current;
> >>>>      const struct domain *currd = curr->domain;
> >>>>      const struct cpuid_policy *cp = currd->arch.cpuid;
> >>>> -    bool vpmu_msr = false;
> >>>> +    bool vpmu_msr = false, warn = false;
> >>>>      int ret;
> >>>>  
> >>>>      if ( (ret = guest_rdmsr(curr, reg, val)) != X86EMUL_UNHANDLEABLE )
> >>>> @@ -882,7 +882,7 @@ static int read_msr(unsigned int reg, ui
> >>>>          if ( ret == X86EMUL_EXCEPTION )
> >>>>              x86_emul_hw_exception(TRAP_gp_fault, 0, ctxt);
> >>>>  
> >>>> -        return ret;
> >>>> +        goto done;
> >>>>      }
> >>>>  
> >>>>      switch ( reg )
> >>>> @@ -986,7 +986,7 @@ static int read_msr(unsigned int reg, ui
> >>>>          }
> >>>>          /* fall through */
> >>>>      default:
> >>>> -        gdprintk(XENLOG_WARNING, "RDMSR 0x%08x unimplemented\n", reg);
> >>>> +        warn = true;
> >>>>          break;
> >>>>  
> >>>>      normal:
> >>>> @@ -995,7 +995,19 @@ static int read_msr(unsigned int reg, ui
> >>>>          return X86EMUL_OKAY;
> >>>>      }
> >>>>  
> >>>> -    return X86EMUL_UNHANDLEABLE;
> >>>> + done:
> >>>
> >>> Won't this handling be better placed in the 'default' switch case
> >>> above?
> >>
> >> No - see the "goto done" added near the top of the function.
> > 
> > Yes, I'm not sure of that. If guest_rdmsr returns anything different
> > than X86EMUL_UNHANDLEABLE it means it has handled the MSR in some way,
> > and hence we shouldn't check whether the #GP handler is set or not.
> > 
> > This is not the behavior of older Xen versions, so I'm unsure whether
> > we should introduce a policy that's even less strict than the previous
> > one in regard to whether a #GP is injected or not.
> > 
> > I know injecting a #GP when the handler is not set is not helpful for
> > the guest, but we should limit the workaround to kind of restoring the
> > previous behavior for making buggy guests happy, not expanding it
> > anymore.
> 
> Yet then we risk breaking guests with any subsequent addition to
> guest_rdmsr() which, under whatever extra conditions, wants to
> raise #GP.

But it's always been like that AFAICT? Additions to guest_{rd/wr}msr
preventing taking the default path in the {read/write}_msr PV
handlers.

If #GP signaled by guest_{rd/wr}msr are no longer injected when the guest
#GP handler is not set we might as well drop the rdmsr_safe check and
just don't try to inject any #GP at all from MSR accesses unless the
handler is setup?

I feel this is likely going too far. I agree we should attempt to
restore something compatible with the previous behavior for unhandled
MSRs, but also not injecting the #GPs signaled by guest_{rd/wr}msr
seems to go beyond that.

Thanks, Roger.
Jan Beulich March 9, 2021, 11:16 a.m. UTC | #6
On 09.03.2021 11:17, Roger Pau Monné wrote:
> On Mon, Mar 08, 2021 at 02:49:19PM +0100, Jan Beulich wrote:
>> On 08.03.2021 13:11, Roger Pau Monné wrote:
>>> On Mon, Mar 08, 2021 at 10:33:12AM +0100, Jan Beulich wrote:
>>>> On 08.03.2021 09:56, Roger Pau Monné wrote:
>>>>> On Fri, Mar 05, 2021 at 10:50:34AM +0100, Jan Beulich wrote:
>>>>>> --- a/xen/arch/x86/pv/emul-priv-op.c
>>>>>> +++ b/xen/arch/x86/pv/emul-priv-op.c
>>>>>> @@ -874,7 +874,7 @@ static int read_msr(unsigned int reg, ui
>>>>>>      struct vcpu *curr = current;
>>>>>>      const struct domain *currd = curr->domain;
>>>>>>      const struct cpuid_policy *cp = currd->arch.cpuid;
>>>>>> -    bool vpmu_msr = false;
>>>>>> +    bool vpmu_msr = false, warn = false;
>>>>>>      int ret;
>>>>>>  
>>>>>>      if ( (ret = guest_rdmsr(curr, reg, val)) != X86EMUL_UNHANDLEABLE )
>>>>>> @@ -882,7 +882,7 @@ static int read_msr(unsigned int reg, ui
>>>>>>          if ( ret == X86EMUL_EXCEPTION )
>>>>>>              x86_emul_hw_exception(TRAP_gp_fault, 0, ctxt);
>>>>>>  
>>>>>> -        return ret;
>>>>>> +        goto done;
>>>>>>      }
>>>>>>  
>>>>>>      switch ( reg )
>>>>>> @@ -986,7 +986,7 @@ static int read_msr(unsigned int reg, ui
>>>>>>          }
>>>>>>          /* fall through */
>>>>>>      default:
>>>>>> -        gdprintk(XENLOG_WARNING, "RDMSR 0x%08x unimplemented\n", reg);
>>>>>> +        warn = true;
>>>>>>          break;
>>>>>>  
>>>>>>      normal:
>>>>>> @@ -995,7 +995,19 @@ static int read_msr(unsigned int reg, ui
>>>>>>          return X86EMUL_OKAY;
>>>>>>      }
>>>>>>  
>>>>>> -    return X86EMUL_UNHANDLEABLE;
>>>>>> + done:
>>>>>
>>>>> Won't this handling be better placed in the 'default' switch case
>>>>> above?
>>>>
>>>> No - see the "goto done" added near the top of the function.
>>>
>>> Yes, I'm not sure of that. If guest_rdmsr returns anything different
>>> than X86EMUL_UNHANDLEABLE it means it has handled the MSR in some way,
>>> and hence we shouldn't check whether the #GP handler is set or not.
>>>
>>> This is not the behavior of older Xen versions, so I'm unsure whether
>>> we should introduce a policy that's even less strict than the previous
>>> one in regard to whether a #GP is injected or not.
>>>
>>> I know injecting a #GP when the handler is not set is not helpful for
>>> the guest, but we should limit the workaround to kind of restoring the
>>> previous behavior for making buggy guests happy, not expanding it
>>> anymore.
>>
>> Yet then we risk breaking guests with any subsequent addition to
>> guest_rdmsr() which, under whatever extra conditions, wants to
>> raise #GP.
> 
> But it's always been like that AFAICT? Additions to guest_{rd/wr}msr
> preventing taking the default path in the {read/write}_msr PV
> handlers.

Yes, but the impact so far and the impact going forward are different.

> If #GP signaled by guest_{rd/wr}msr are no longer injected when the guest
> #GP handler is not set we might as well drop the rdmsr_safe check and
> just don't try to inject any #GP at all from MSR accesses unless the
> handler is setup?

Well, that's what I had initially. You asked me to change to what I
have now.

> I feel this is likely going too far. I agree we should attempt to
> restore something compatible with the previous behavior for unhandled
> MSRs, but also not injecting the #GPs signaled by guest_{rd/wr}msr
> seems to go beyond that.

I understand this is a downside. Yet as said - the downside of _not_
doing so is that every further raising of #GP will risk breaking a
random guest kernel variant.

Jan
Roger Pau Monné March 9, 2021, 1:37 p.m. UTC | #7
On Tue, Mar 09, 2021 at 12:16:49PM +0100, Jan Beulich wrote:
> On 09.03.2021 11:17, Roger Pau Monné wrote:
> > On Mon, Mar 08, 2021 at 02:49:19PM +0100, Jan Beulich wrote:
> >> On 08.03.2021 13:11, Roger Pau Monné wrote:
> >>> On Mon, Mar 08, 2021 at 10:33:12AM +0100, Jan Beulich wrote:
> >>>> On 08.03.2021 09:56, Roger Pau Monné wrote:
> >>>>> On Fri, Mar 05, 2021 at 10:50:34AM +0100, Jan Beulich wrote:
> >>>>>> --- a/xen/arch/x86/pv/emul-priv-op.c
> >>>>>> +++ b/xen/arch/x86/pv/emul-priv-op.c
> >>>>>> @@ -874,7 +874,7 @@ static int read_msr(unsigned int reg, ui
> >>>>>>      struct vcpu *curr = current;
> >>>>>>      const struct domain *currd = curr->domain;
> >>>>>>      const struct cpuid_policy *cp = currd->arch.cpuid;
> >>>>>> -    bool vpmu_msr = false;
> >>>>>> +    bool vpmu_msr = false, warn = false;
> >>>>>>      int ret;
> >>>>>>  
> >>>>>>      if ( (ret = guest_rdmsr(curr, reg, val)) != X86EMUL_UNHANDLEABLE )
> >>>>>> @@ -882,7 +882,7 @@ static int read_msr(unsigned int reg, ui
> >>>>>>          if ( ret == X86EMUL_EXCEPTION )
> >>>>>>              x86_emul_hw_exception(TRAP_gp_fault, 0, ctxt);
> >>>>>>  
> >>>>>> -        return ret;
> >>>>>> +        goto done;
> >>>>>>      }
> >>>>>>  
> >>>>>>      switch ( reg )
> >>>>>> @@ -986,7 +986,7 @@ static int read_msr(unsigned int reg, ui
> >>>>>>          }
> >>>>>>          /* fall through */
> >>>>>>      default:
> >>>>>> -        gdprintk(XENLOG_WARNING, "RDMSR 0x%08x unimplemented\n", reg);
> >>>>>> +        warn = true;
> >>>>>>          break;
> >>>>>>  
> >>>>>>      normal:
> >>>>>> @@ -995,7 +995,19 @@ static int read_msr(unsigned int reg, ui
> >>>>>>          return X86EMUL_OKAY;
> >>>>>>      }
> >>>>>>  
> >>>>>> -    return X86EMUL_UNHANDLEABLE;
> >>>>>> + done:
> >>>>>
> >>>>> Won't this handling be better placed in the 'default' switch case
> >>>>> above?
> >>>>
> >>>> No - see the "goto done" added near the top of the function.
> >>>
> >>> Yes, I'm not sure of that. If guest_rdmsr returns anything different
> >>> than X86EMUL_UNHANDLEABLE it means it has handled the MSR in some way,
> >>> and hence we shouldn't check whether the #GP handler is set or not.
> >>>
> >>> This is not the behavior of older Xen versions, so I'm unsure whether
> >>> we should introduce a policy that's even less strict than the previous
> >>> one in regard to whether a #GP is injected or not.
> >>>
> >>> I know injecting a #GP when the handler is not set is not helpful for
> >>> the guest, but we should limit the workaround to kind of restoring the
> >>> previous behavior for making buggy guests happy, not expanding it
> >>> anymore.
> >>
> >> Yet then we risk breaking guests with any subsequent addition to
> >> guest_rdmsr() which, under whatever extra conditions, wants to
> >> raise #GP.
> > 
> > But it's always been like that AFAICT? Additions to guest_{rd/wr}msr
> > preventing taking the default path in the {read/write}_msr PV
> > handlers.
> 
> Yes, but the impact so far and the impact going forward are different.

OK, I assume this is because we plan to handle more MSRs in
guest_{rd/wr}msr?

In which case those newly added handlers are not likely to inject a
#GP?

> > If #GP signaled by guest_{rd/wr}msr are no longer injected when the guest
> > #GP handler is not set we might as well drop the rdmsr_safe check and
> > just don't try to inject any #GP at all from MSR accesses unless the
> > handler is setup?
> 
> Well, that's what I had initially. You asked me to change to what I
> have now.
> 
> > I feel this is likely going too far. I agree we should attempt to
> > restore something compatible with the previous behavior for unhandled
> > MSRs, but also not injecting the #GPs signaled by guest_{rd/wr}msr
> > seems to go beyond that.
> 
> I understand this is a downside. Yet as said - the downside of _not_
> doing so is that every further raising of #GP will risk breaking a
> random guest kernel variant.

Right. So given this awkward position Xen is in, we should maybe make
the lack of #GP injection as a result of an MSR access when no handler
is set formally part of the ABI and written down somewhere?

It's not ideal, but at the end of day PV is 'our' own architecture,
and given that this workaround will be enabled by default, and that we
won't be able to turn it off we should have it written down as part of
the ABI.

If you agree with this I'm fine with not injecting a #GP at all unless
the handler is set for PV, like you proposed in your first patch. IMO
it's not ideal, but it's better if it's a consistent behavior and
clearly written down in the public headers (likely next to the
hypercall used to setup the #GP handler).

I know this can be seen as broken behavior from an x86 perspective,
but again PV is already different from x86.

Thanks, Roger.
Jan Beulich March 9, 2021, 2:50 p.m. UTC | #8
On 09.03.2021 14:37, Roger Pau Monné wrote:
> On Tue, Mar 09, 2021 at 12:16:49PM +0100, Jan Beulich wrote:
>> On 09.03.2021 11:17, Roger Pau Monné wrote:
>>> On Mon, Mar 08, 2021 at 02:49:19PM +0100, Jan Beulich wrote:
>>>> On 08.03.2021 13:11, Roger Pau Monné wrote:
>>>>> On Mon, Mar 08, 2021 at 10:33:12AM +0100, Jan Beulich wrote:
>>>>>> On 08.03.2021 09:56, Roger Pau Monné wrote:
>>>>>>> On Fri, Mar 05, 2021 at 10:50:34AM +0100, Jan Beulich wrote:
>>>>>>>> --- a/xen/arch/x86/pv/emul-priv-op.c
>>>>>>>> +++ b/xen/arch/x86/pv/emul-priv-op.c
>>>>>>>> @@ -874,7 +874,7 @@ static int read_msr(unsigned int reg, ui
>>>>>>>>      struct vcpu *curr = current;
>>>>>>>>      const struct domain *currd = curr->domain;
>>>>>>>>      const struct cpuid_policy *cp = currd->arch.cpuid;
>>>>>>>> -    bool vpmu_msr = false;
>>>>>>>> +    bool vpmu_msr = false, warn = false;
>>>>>>>>      int ret;
>>>>>>>>  
>>>>>>>>      if ( (ret = guest_rdmsr(curr, reg, val)) != X86EMUL_UNHANDLEABLE )
>>>>>>>> @@ -882,7 +882,7 @@ static int read_msr(unsigned int reg, ui
>>>>>>>>          if ( ret == X86EMUL_EXCEPTION )
>>>>>>>>              x86_emul_hw_exception(TRAP_gp_fault, 0, ctxt);
>>>>>>>>  
>>>>>>>> -        return ret;
>>>>>>>> +        goto done;
>>>>>>>>      }
>>>>>>>>  
>>>>>>>>      switch ( reg )
>>>>>>>> @@ -986,7 +986,7 @@ static int read_msr(unsigned int reg, ui
>>>>>>>>          }
>>>>>>>>          /* fall through */
>>>>>>>>      default:
>>>>>>>> -        gdprintk(XENLOG_WARNING, "RDMSR 0x%08x unimplemented\n", reg);
>>>>>>>> +        warn = true;
>>>>>>>>          break;
>>>>>>>>  
>>>>>>>>      normal:
>>>>>>>> @@ -995,7 +995,19 @@ static int read_msr(unsigned int reg, ui
>>>>>>>>          return X86EMUL_OKAY;
>>>>>>>>      }
>>>>>>>>  
>>>>>>>> -    return X86EMUL_UNHANDLEABLE;
>>>>>>>> + done:
>>>>>>>
>>>>>>> Won't this handling be better placed in the 'default' switch case
>>>>>>> above?
>>>>>>
>>>>>> No - see the "goto done" added near the top of the function.
>>>>>
>>>>> Yes, I'm not sure of that. If guest_rdmsr returns anything different
>>>>> than X86EMUL_UNHANDLEABLE it means it has handled the MSR in some way,
>>>>> and hence we shouldn't check whether the #GP handler is set or not.
>>>>>
>>>>> This is not the behavior of older Xen versions, so I'm unsure whether
>>>>> we should introduce a policy that's even less strict than the previous
>>>>> one in regard to whether a #GP is injected or not.
>>>>>
>>>>> I know injecting a #GP when the handler is not set is not helpful for
>>>>> the guest, but we should limit the workaround to kind of restoring the
>>>>> previous behavior for making buggy guests happy, not expanding it
>>>>> anymore.
>>>>
>>>> Yet then we risk breaking guests with any subsequent addition to
>>>> guest_rdmsr() which, under whatever extra conditions, wants to
>>>> raise #GP.
>>>
>>> But it's always been like that AFAICT? Additions to guest_{rd/wr}msr
>>> preventing taking the default path in the {read/write}_msr PV
>>> handlers.
>>
>> Yes, but the impact so far and the impact going forward are different.
> 
> OK, I assume this is because we plan to handle more MSRs in
> guest_{rd/wr}msr?

Yes.

> In which case those newly added handlers are not likely to inject a
> #GP?

Kind of the opposite - because it's not impossible that some
addition there may want to raise #GP.

>>> If #GP signaled by guest_{rd/wr}msr are no longer injected when the guest
>>> #GP handler is not set we might as well drop the rdmsr_safe check and
>>> just don't try to inject any #GP at all from MSR accesses unless the
>>> handler is setup?
>>
>> Well, that's what I had initially. You asked me to change to what I
>> have now.
>>
>>> I feel this is likely going too far. I agree we should attempt to
>>> restore something compatible with the previous behavior for unhandled
>>> MSRs, but also not injecting the #GPs signaled by guest_{rd/wr}msr
>>> seems to go beyond that.
>>
>> I understand this is a downside. Yet as said - the downside of _not_
>> doing so is that every further raising of #GP will risk breaking a
>> random guest kernel variant.
> 
> Right. So given this awkward position Xen is in, we should maybe make
> the lack of #GP injection as a result of an MSR access when no handler
> is set formally part of the ABI and written down somewhere?
> 
> It's not ideal, but at the end of day PV is 'our' own architecture,
> and given that this workaround will be enabled by default, and that we
> won't be able to turn it off we should have it written down as part of
> the ABI.
> 
> If you agree with this I'm fine with not injecting a #GP at all unless
> the handler is set for PV, like you proposed in your first patch. IMO
> it's not ideal, but it's better if it's a consistent behavior and
> clearly written down in the public headers (likely next to the
> hypercall used to setup the #GP handler).
> 
> I know this can be seen as broken behavior from an x86 perspective,
> but again PV is already different from x86.

I'm certainly not opposed to spelling this out somewhere; iirc you
said the other day that you couldn't spot a good place. I can't think
of a good place either. Furthermore before we spell out anything we
(which specifically includes Andrew) need to settle on the precise
behavior we want. I did suggest earlier that I could see us tighten
the condition, and there are many possible variations. For example we
could record whether a #GP handler was ever installed, so we wouldn't
return back to the relaxed behavior in case a guest zapped its handler
again. But for behavior like this the immediate question is going to
be what effect migration (or saving/restoring) of the guest ought to
have.

Jan
Roger Pau Monné March 9, 2021, 3:19 p.m. UTC | #9
On Tue, Mar 09, 2021 at 03:50:59PM +0100, Jan Beulich wrote:
> On 09.03.2021 14:37, Roger Pau Monné wrote:
> > On Tue, Mar 09, 2021 at 12:16:49PM +0100, Jan Beulich wrote:
> >> On 09.03.2021 11:17, Roger Pau Monné wrote:
> >>> On Mon, Mar 08, 2021 at 02:49:19PM +0100, Jan Beulich wrote:
> >>>> On 08.03.2021 13:11, Roger Pau Monné wrote:
> >>>>> On Mon, Mar 08, 2021 at 10:33:12AM +0100, Jan Beulich wrote:
> >>>>>> On 08.03.2021 09:56, Roger Pau Monné wrote:
> >>>>>>> On Fri, Mar 05, 2021 at 10:50:34AM +0100, Jan Beulich wrote:
> >>>>>>>> --- a/xen/arch/x86/pv/emul-priv-op.c
> >>>>>>>> +++ b/xen/arch/x86/pv/emul-priv-op.c
> >>>>>>>> @@ -874,7 +874,7 @@ static int read_msr(unsigned int reg, ui
> >>>>>>>>      struct vcpu *curr = current;
> >>>>>>>>      const struct domain *currd = curr->domain;
> >>>>>>>>      const struct cpuid_policy *cp = currd->arch.cpuid;
> >>>>>>>> -    bool vpmu_msr = false;
> >>>>>>>> +    bool vpmu_msr = false, warn = false;
> >>>>>>>>      int ret;
> >>>>>>>>  
> >>>>>>>>      if ( (ret = guest_rdmsr(curr, reg, val)) != X86EMUL_UNHANDLEABLE )
> >>>>>>>> @@ -882,7 +882,7 @@ static int read_msr(unsigned int reg, ui
> >>>>>>>>          if ( ret == X86EMUL_EXCEPTION )
> >>>>>>>>              x86_emul_hw_exception(TRAP_gp_fault, 0, ctxt);
> >>>>>>>>  
> >>>>>>>> -        return ret;
> >>>>>>>> +        goto done;
> >>>>>>>>      }
> >>>>>>>>  
> >>>>>>>>      switch ( reg )
> >>>>>>>> @@ -986,7 +986,7 @@ static int read_msr(unsigned int reg, ui
> >>>>>>>>          }
> >>>>>>>>          /* fall through */
> >>>>>>>>      default:
> >>>>>>>> -        gdprintk(XENLOG_WARNING, "RDMSR 0x%08x unimplemented\n", reg);
> >>>>>>>> +        warn = true;
> >>>>>>>>          break;
> >>>>>>>>  
> >>>>>>>>      normal:
> >>>>>>>> @@ -995,7 +995,19 @@ static int read_msr(unsigned int reg, ui
> >>>>>>>>          return X86EMUL_OKAY;
> >>>>>>>>      }
> >>>>>>>>  
> >>>>>>>> -    return X86EMUL_UNHANDLEABLE;
> >>>>>>>> + done:
> >>>>>>>
> >>>>>>> Won't this handling be better placed in the 'default' switch case
> >>>>>>> above?
> >>>>>>
> >>>>>> No - see the "goto done" added near the top of the function.
> >>>>>
> >>>>> Yes, I'm not sure of that. If guest_rdmsr returns anything different
> >>>>> than X86EMUL_UNHANDLEABLE it means it has handled the MSR in some way,
> >>>>> and hence we shouldn't check whether the #GP handler is set or not.
> >>>>>
> >>>>> This is not the behavior of older Xen versions, so I'm unsure whether
> >>>>> we should introduce a policy that's even less strict than the previous
> >>>>> one in regard to whether a #GP is injected or not.
> >>>>>
> >>>>> I know injecting a #GP when the handler is not set is not helpful for
> >>>>> the guest, but we should limit the workaround to kind of restoring the
> >>>>> previous behavior for making buggy guests happy, not expanding it
> >>>>> anymore.
> >>>>
> >>>> Yet then we risk breaking guests with any subsequent addition to
> >>>> guest_rdmsr() which, under whatever extra conditions, wants to
> >>>> raise #GP.
> >>>
> >>> But it's always been like that AFAICT? Additions to guest_{rd/wr}msr
> >>> preventing taking the default path in the {read/write}_msr PV
> >>> handlers.
> >>
> >> Yes, but the impact so far and the impact going forward are different.
> > 
> > OK, I assume this is because we plan to handle more MSRs in
> > guest_{rd/wr}msr?
> 
> Yes.
> 
> > In which case those newly added handlers are not likely to inject a
> > #GP?
> 
> Kind of the opposite - because it's not impossible that some
> addition there may want to raise #GP.
> 
> >>> If #GP signaled by guest_{rd/wr}msr are no longer injected when the guest
> >>> #GP handler is not set we might as well drop the rdmsr_safe check and
> >>> just don't try to inject any #GP at all from MSR accesses unless the
> >>> handler is setup?
> >>
> >> Well, that's what I had initially. You asked me to change to what I
> >> have now.
> >>
> >>> I feel this is likely going too far. I agree we should attempt to
> >>> restore something compatible with the previous behavior for unhandled
> >>> MSRs, but also not injecting the #GPs signaled by guest_{rd/wr}msr
> >>> seems to go beyond that.
> >>
> >> I understand this is a downside. Yet as said - the downside of _not_
> >> doing so is that every further raising of #GP will risk breaking a
> >> random guest kernel variant.
> > 
> > Right. So given this awkward position Xen is in, we should maybe make
> > the lack of #GP injection as a result of an MSR access when no handler
> > is set formally part of the ABI and written down somewhere?
> > 
> > It's not ideal, but at the end of day PV is 'our' own architecture,
> > and given that this workaround will be enabled by default, and that we
> > won't be able to turn it off we should have it written down as part of
> > the ABI.
> > 
> > If you agree with this I'm fine with not injecting a #GP at all unless
> > the handler is set for PV, like you proposed in your first patch. IMO
> > it's not ideal, but it's better if it's a consistent behavior and
> > clearly written down in the public headers (likely next to the
> > hypercall used to setup the #GP handler).
> > 
> > I know this can be seen as broken behavior from an x86 perspective,
> > but again PV is already different from x86.
> 
> I'm certainly not opposed to spelling this out somewhere; iirc you
> said the other day that you couldn't spot a good place. I can't think
> of a good place either.

After looking some more, I think placing such comment next to
HYPERVISOR_set_trap_table (in arch-x86/xen.h) would be fine.

> Furthermore before we spell out anything we
> (which specifically includes Andrew) need to settle on the precise
> behavior we want. I did suggest earlier that I could see us tighten
> the condition, and there are many possible variations. For example we
> could record whether a #GP handler was ever installed, so we wouldn't
> return back to the relaxed behavior in case a guest zapped its handler
> again. But for behavior like this the immediate question is going to
> be what effect migration (or saving/restoring) of the guest ought to
> have.

Replying to the save/restore part: this is covered by my patch. Any
restore (or incoming live migration) from a source that doesn't have
msr_relaxed support will get that option enabled by default, so that
guests migrated from previous Xen versions don't see a change in MSR
access behavior. That applies to both PV and HVM guests (unless I have
messed things up in my patch).

Thanks, Roger.
Jan Beulich March 9, 2021, 4:07 p.m. UTC | #10
On 09.03.2021 16:19, Roger Pau Monné wrote:
> On Tue, Mar 09, 2021 at 03:50:59PM +0100, Jan Beulich wrote:
>> On 09.03.2021 14:37, Roger Pau Monné wrote:
>>> Right. So given this awkward position Xen is in, we should maybe make
>>> the lack of #GP injection as a result of an MSR access when no handler
>>> is set formally part of the ABI and written down somewhere?
>>>
>>> It's not ideal, but at the end of day PV is 'our' own architecture,
>>> and given that this workaround will be enabled by default, and that we
>>> won't be able to turn it off we should have it written down as part of
>>> the ABI.
>>>
>>> If you agree with this I'm fine with not injecting a #GP at all unless
>>> the handler is set for PV, like you proposed in your first patch. IMO
>>> it's not ideal, but it's better if it's a consistent behavior and
>>> clearly written down in the public headers (likely next to the
>>> hypercall used to setup the #GP handler).
>>>
>>> I know this can be seen as broken behavior from an x86 perspective,
>>> but again PV is already different from x86.
>>
>> I'm certainly not opposed to spelling this out somewhere; iirc you
>> said the other day that you couldn't spot a good place. I can't think
>> of a good place either.
> 
> After looking some more, I think placing such comment next to
> HYPERVISOR_set_trap_table (in arch-x86/xen.h) would be fine.
> 
>> Furthermore before we spell out anything we
>> (which specifically includes Andrew) need to settle on the precise
>> behavior we want. I did suggest earlier that I could see us tighten
>> the condition, and there are many possible variations. For example we
>> could record whether a #GP handler was ever installed, so we wouldn't
>> return back to the relaxed behavior in case a guest zapped its handler
>> again. But for behavior like this the immediate question is going to
>> be what effect migration (or saving/restoring) of the guest ought to
>> have.
> 
> Replying to the save/restore part: this is covered by my patch. Any
> restore (or incoming live migration) from a source that doesn't have
> msr_relaxed support will get that option enabled by default, so that
> guests migrated from previous Xen versions don't see a change in MSR
> access behavior. That applies to both PV and HVM guests (unless I have
> messed things up in my patch).

Well, yes, that's for your changes. But here the question is about
mine (and remember we didn't settle on the precise condition(s) yet,
so the migration aspect may not be relevant in the end).

Jan
diff mbox series

Patch

--- a/xen/arch/x86/pv/emul-priv-op.c
+++ b/xen/arch/x86/pv/emul-priv-op.c
@@ -874,7 +874,7 @@  static int read_msr(unsigned int reg, ui
     struct vcpu *curr = current;
     const struct domain *currd = curr->domain;
     const struct cpuid_policy *cp = currd->arch.cpuid;
-    bool vpmu_msr = false;
+    bool vpmu_msr = false, warn = false;
     int ret;
 
     if ( (ret = guest_rdmsr(curr, reg, val)) != X86EMUL_UNHANDLEABLE )
@@ -882,7 +882,7 @@  static int read_msr(unsigned int reg, ui
         if ( ret == X86EMUL_EXCEPTION )
             x86_emul_hw_exception(TRAP_gp_fault, 0, ctxt);
 
-        return ret;
+        goto done;
     }
 
     switch ( reg )
@@ -986,7 +986,7 @@  static int read_msr(unsigned int reg, ui
         }
         /* fall through */
     default:
-        gdprintk(XENLOG_WARNING, "RDMSR 0x%08x unimplemented\n", reg);
+        warn = true;
         break;
 
     normal:
@@ -995,7 +995,19 @@  static int read_msr(unsigned int reg, ui
         return X86EMUL_OKAY;
     }
 
-    return X86EMUL_UNHANDLEABLE;
+ done:
+    if ( ret != X86EMUL_OKAY && !curr->arch.pv.trap_ctxt[X86_EXC_GP].address &&
+         (reg >> 16) != 0x4000 && !rdmsr_safe(reg, *val) )
+    {
+        gprintk(XENLOG_WARNING, "faking RDMSR 0x%08x\n", reg);
+        *val = 0;
+        x86_emul_reset_event(ctxt);
+        ret = X86EMUL_OKAY;
+    }
+    else if ( warn )
+        gdprintk(XENLOG_WARNING, "RDMSR 0x%08x unimplemented\n", reg);
+
+    return ret;
 }
 
 static int write_msr(unsigned int reg, uint64_t val,