diff mbox series

[v2,2/2,4.15] x86/AMD: expose HWCR.TscFreqSel to guests

Message ID c91b190a-aaa1-d3b8-10eb-d8da7ad1f834@suse.com (mailing list archive)
State Superseded
Headers show
Series x86: guest MSR access handling tweaks | expand

Commit Message

Jan Beulich March 5, 2021, 9:50 a.m. UTC
Linux has been warning ("firmware bug") about this bit being clear for a
long time. While writable in older hardware it has been readonly on more
than just most recent hardware. For simplicitly report it always set (if
anything we may want to log the issue ourselves if it turns out to be
clear on older hardware).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v2: New.
---
There are likely more bits worthwhile to expose, but for about every one
of them there would be the risk of a lengthy discussion, as there are
clear downsides to exposing such information, the more that it would be
tbd whether the hardware values should be surfaced, and if so what
should happen when the guest gets migrated.

The main risk with making the read not fault here is that guests might
imply they can also write this MSR then.

Comments

Roger Pau Monné March 8, 2021, 11:37 a.m. UTC | #1
On Fri, Mar 05, 2021 at 10:50:54AM +0100, Jan Beulich wrote:
> Linux has been warning ("firmware bug") about this bit being clear for a
> long time. While writable in older hardware it has been readonly on more
> than just most recent hardware. For simplicitly report it always set (if
> anything we may want to log the issue ourselves if it turns out to be
> clear on older hardware).
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

One question below.

> ---
> v2: New.
> ---
> There are likely more bits worthwhile to expose, but for about every one
> of them there would be the risk of a lengthy discussion, as there are
> clear downsides to exposing such information, the more that it would be
> tbd whether the hardware values should be surfaced, and if so what
> should happen when the guest gets migrated.
> 
> The main risk with making the read not fault here is that guests might
> imply they can also write this MSR then.
> 
> --- a/xen/arch/x86/msr.c
> +++ b/xen/arch/x86/msr.c
> @@ -315,6 +315,12 @@ int guest_rdmsr(struct vcpu *v, uint32_t
>          *val = msrs->tsc_aux;
>          break;
>  
> +    case MSR_K8_HWCR:
> +        if ( !(cp->x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON)) )
> +            goto gp_fault;
> +        *val = K8_HWCR_TSC_FREQ_SEL;

I've been only able to find information about this MSR up to family
10h, but I think in theory Xen might also run on family 0Fh, do you
know if the MSR is present there, and the bit has the same meaning?

> +        break;
> +
>      case MSR_AMD64_DE_CFG:
>          if ( !(cp->x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON)) )
>              goto gp_fault;
> --- a/xen/include/asm-x86/msr-index.h
> +++ b/xen/include/asm-x86/msr-index.h
> @@ -287,6 +287,8 @@
>  
>  #define MSR_K7_HWCR			0xc0010015

We could likely drop the K7 define here, as Xen won't be able to run
on K7 hardware anymore AFAICT.

Thanks, Roger.
Jan Beulich March 8, 2021, 11:47 a.m. UTC | #2
On 08.03.2021 12:37, Roger Pau Monné wrote:
> On Fri, Mar 05, 2021 at 10:50:54AM +0100, Jan Beulich wrote:
>> Linux has been warning ("firmware bug") about this bit being clear for a
>> long time. While writable in older hardware it has been readonly on more
>> than just most recent hardware. For simplicitly report it always set (if
>> anything we may want to log the issue ourselves if it turns out to be
>> clear on older hardware).
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> 
> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks.

> One question below.
> 
>> ---
>> v2: New.
>> ---
>> There are likely more bits worthwhile to expose, but for about every one
>> of them there would be the risk of a lengthy discussion, as there are
>> clear downsides to exposing such information, the more that it would be
>> tbd whether the hardware values should be surfaced, and if so what
>> should happen when the guest gets migrated.
>>
>> The main risk with making the read not fault here is that guests might
>> imply they can also write this MSR then.
>>
>> --- a/xen/arch/x86/msr.c
>> +++ b/xen/arch/x86/msr.c
>> @@ -315,6 +315,12 @@ int guest_rdmsr(struct vcpu *v, uint32_t
>>          *val = msrs->tsc_aux;
>>          break;
>>  
>> +    case MSR_K8_HWCR:
>> +        if ( !(cp->x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON)) )
>> +            goto gp_fault;
>> +        *val = K8_HWCR_TSC_FREQ_SEL;
> 
> I've been only able to find information about this MSR up to family
> 10h, but I think in theory Xen might also run on family 0Fh, do you
> know if the MSR is present there, and the bit has the same meaning?

From its name (and its K7 alternative name) it's clear the register
had been there at that point. And indeed the bit has a different
meaning there (its the bottom bit of a 6-bit START_FID field if the
BKDG I'm looking at can be trusted. Since I don't think it matters
much whether we expose a value of 0x00 or a value of 0x01 there,
and since we likely don't want to make #GP raising dependent upon
family when we don't _really_ need to, I would want to propose that
the value used is good enough uniformly.

>> --- a/xen/include/asm-x86/msr-index.h
>> +++ b/xen/include/asm-x86/msr-index.h
>> @@ -287,6 +287,8 @@
>>  
>>  #define MSR_K7_HWCR			0xc0010015
> 
> We could likely drop the K7 define here, as Xen won't be able to run
> on K7 hardware anymore AFAICT.

Indeed, but not at this point in time.

Jan
Roger Pau Monné March 8, 2021, 12:29 p.m. UTC | #3
On Mon, Mar 08, 2021 at 12:47:44PM +0100, Jan Beulich wrote:
> On 08.03.2021 12:37, Roger Pau Monné wrote:
> > On Fri, Mar 05, 2021 at 10:50:54AM +0100, Jan Beulich wrote:
> >> Linux has been warning ("firmware bug") about this bit being clear for a
> >> long time. While writable in older hardware it has been readonly on more
> >> than just most recent hardware. For simplicitly report it always set (if
> >> anything we may want to log the issue ourselves if it turns out to be
> >> clear on older hardware).
> >>
> >> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> > 
> > Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
> 
> Thanks.
> 
> > One question below.
> > 
> >> ---
> >> v2: New.
> >> ---
> >> There are likely more bits worthwhile to expose, but for about every one
> >> of them there would be the risk of a lengthy discussion, as there are
> >> clear downsides to exposing such information, the more that it would be
> >> tbd whether the hardware values should be surfaced, and if so what
> >> should happen when the guest gets migrated.
> >>
> >> The main risk with making the read not fault here is that guests might
> >> imply they can also write this MSR then.
> >>
> >> --- a/xen/arch/x86/msr.c
> >> +++ b/xen/arch/x86/msr.c
> >> @@ -315,6 +315,12 @@ int guest_rdmsr(struct vcpu *v, uint32_t
> >>          *val = msrs->tsc_aux;
> >>          break;
> >>  
> >> +    case MSR_K8_HWCR:
> >> +        if ( !(cp->x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON)) )
> >> +            goto gp_fault;
> >> +        *val = K8_HWCR_TSC_FREQ_SEL;
> > 
> > I've been only able to find information about this MSR up to family
> > 10h, but I think in theory Xen might also run on family 0Fh, do you
> > know if the MSR is present there, and the bit has the same meaning?
> 
> From its name (and its K7 alternative name) it's clear the register
> had been there at that point. And indeed the bit has a different
> meaning there (its the bottom bit of a 6-bit START_FID field if the
> BKDG I'm looking at can be trusted.

OK, I cannot seem to find the BKDG for family 0Fh. The oldest BKDG I
can find is for Family 10h [0].

> Since I don't think it matters
> much whether we expose a value of 0x00 or a value of 0x01 there,
> and since we likely don't want to make #GP raising dependent upon
> family when we don't _really_ need to, I would want to propose that
> the value used is good enough uniformly.

I would be fine with setting it to 0 if Fam < 10h if you think that's
acceptable. I think the chances of someone running Xen >= 4.15 on such
old hardware are quite dim.

Thanks, Roger.

[0] https://developer.amd.com/resources/developer-guides-manuals/
Andrew Cooper March 8, 2021, 12:41 p.m. UTC | #4
On 05/03/2021 09:50, Jan Beulich wrote:
> Linux has been warning ("firmware bug") about this bit being clear for a
> long time. While writable in older hardware it has been readonly on more
> than just most recent hardware. For simplicitly report it always set (if
> anything we may want to log the issue ourselves if it turns out to be
> clear on older hardware).
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

I realise Linux is complaining, but simply setting the bit isn't a fix.

This needs corresponding updates in the ACPI tables, as well as Pstate
MSRs, or Linux will derive a false relationship between the TSC rate and
wallclock.

~Andrew
Roger Pau Monné March 8, 2021, 1:23 p.m. UTC | #5
On Mon, Mar 08, 2021 at 12:41:26PM +0000, Andrew Cooper wrote:
> On 05/03/2021 09:50, Jan Beulich wrote:
> > Linux has been warning ("firmware bug") about this bit being clear for a
> > long time. While writable in older hardware it has been readonly on more
> > than just most recent hardware. For simplicitly report it always set (if
> > anything we may want to log the issue ourselves if it turns out to be
> > clear on older hardware).
> >
> > Signed-off-by: Jan Beulich <jbeulich@suse.com>
> 
> I realise Linux is complaining, but simply setting the bit isn't a fix.
> 
> This needs corresponding updates in the ACPI tables, as well as Pstate
> MSRs, or Linux will derive a false relationship between the TSC rate and
> wallclock.

Is there any description of those relations?

I don't seem to find any other MSR referencing the TscFreqSel bit in
HWCR on the AMD Open-Source Register Reference, but I might be looking
at the wrong place.

Thanks, Roger.
Jan Beulich March 8, 2021, 1:24 p.m. UTC | #6
On 08.03.2021 13:41, Andrew Cooper wrote:
> On 05/03/2021 09:50, Jan Beulich wrote:
>> Linux has been warning ("firmware bug") about this bit being clear for a
>> long time. While writable in older hardware it has been readonly on more
>> than just most recent hardware. For simplicitly report it always set (if
>> anything we may want to log the issue ourselves if it turns out to be
>> clear on older hardware).
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> 
> I realise Linux is complaining, but simply setting the bit isn't a fix.
> 
> This needs corresponding updates in the ACPI tables, as well as Pstate
> MSRs, or Linux will derive a false relationship between the TSC rate and
> wallclock.

I guess I don't follow: AMD's doc is very clear: BIOSes ought to set the
bit. It not being set is more likely a mistake than an indication of
other pieces (MSRs, ACPI tables) reflecting this unintended state. Plus
isn't what you say true also if Linux sees the bit wrongly clear (which
would be the case prior to this patch)? Are you suggesting we should
revert behavior here all the way to letting the hardware bit shine
through again (for Dom0; for DomU neither other MSRs nor ACPI tables
are possibly aware of this bit's state)?

Jan
Jan Beulich March 8, 2021, 1:42 p.m. UTC | #7
On 08.03.2021 13:29, Roger Pau Monné wrote:
> On Mon, Mar 08, 2021 at 12:47:44PM +0100, Jan Beulich wrote:
>> On 08.03.2021 12:37, Roger Pau Monné wrote:
>>> On Fri, Mar 05, 2021 at 10:50:54AM +0100, Jan Beulich wrote:
>>>> --- a/xen/arch/x86/msr.c
>>>> +++ b/xen/arch/x86/msr.c
>>>> @@ -315,6 +315,12 @@ int guest_rdmsr(struct vcpu *v, uint32_t
>>>>          *val = msrs->tsc_aux;
>>>>          break;
>>>>  
>>>> +    case MSR_K8_HWCR:
>>>> +        if ( !(cp->x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON)) )
>>>> +            goto gp_fault;
>>>> +        *val = K8_HWCR_TSC_FREQ_SEL;
>>>
>>> I've been only able to find information about this MSR up to family
>>> 10h, but I think in theory Xen might also run on family 0Fh, do you
>>> know if the MSR is present there, and the bit has the same meaning?
>>
>> From its name (and its K7 alternative name) it's clear the register
>> had been there at that point. And indeed the bit has a different
>> meaning there (its the bottom bit of a 6-bit START_FID field if the
>> BKDG I'm looking at can be trusted.
> 
> OK, I cannot seem to find the BKDG for family 0Fh. The oldest BKDG I
> can find is for Family 10h [0].
> 
>> Since I don't think it matters
>> much whether we expose a value of 0x00 or a value of 0x01 there,
>> and since we likely don't want to make #GP raising dependent upon
>> family when we don't _really_ need to, I would want to propose that
>> the value used is good enough uniformly.
> 
> I would be fine with setting it to 0 if Fam < 10h if you think that's
> acceptable. I think the chances of someone running Xen >= 4.15 on such
> old hardware are quite dim.

Would you mind explaining how returning 0 in this case would be
better? No hard-coded value will ever be guaranteed to reflect the
truth. See my reply to Andrew - if anything we'd need to let the
hardware field shine through, and in _that_ case I of course I
agree that we then should treat Fam0F specially.

I will admit though that as per the BKDG I'm looking at only even
values are defined for the field. Reporting 1 here therefore may
do good (keep OSes from trying to use any of this P-state stuff)
or bad (confuse OSes).

Jan
diff mbox series

Patch

--- a/xen/arch/x86/msr.c
+++ b/xen/arch/x86/msr.c
@@ -315,6 +315,12 @@  int guest_rdmsr(struct vcpu *v, uint32_t
         *val = msrs->tsc_aux;
         break;
 
+    case MSR_K8_HWCR:
+        if ( !(cp->x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON)) )
+            goto gp_fault;
+        *val = K8_HWCR_TSC_FREQ_SEL;
+        break;
+
     case MSR_AMD64_DE_CFG:
         if ( !(cp->x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON)) )
             goto gp_fault;
--- a/xen/include/asm-x86/msr-index.h
+++ b/xen/include/asm-x86/msr-index.h
@@ -287,6 +287,8 @@ 
 
 #define MSR_K7_HWCR			0xc0010015
 #define MSR_K8_HWCR			0xc0010015
+#define K8_HWCR_TSC_FREQ_SEL		(1ULL << 24)
+
 #define MSR_K7_FID_VID_CTL		0xc0010041
 #define MSR_K7_FID_VID_STATUS		0xc0010042
 #define MSR_K8_PSTATE_LIMIT		0xc0010061