[XEN] intel/msr: Fix handling of MSR_RAPL_POWER_UNIT

Message ID 0ac778dbcc7ab383447abe672225ff77b0d4802e.1736793323.git.teddy.astie@vates.tech (mailing list archive)
State New

Commit Message

Teddy Astie Jan. 13, 2025, 6:42 p.m. UTC
Solaris 11.4 tries to access this MSR on some Intel platforms without properly
setting up a proper #GP handler, which leads to a immediate crash.

Emulate the access of this MSR by giving it a legal value (all values set to
default, as defined by Intel SDM "RAPL Interfaces").

Fixes: 84e848fd7a1 ('x86/hvm: disallow access to unknown MSRs')
Signed-off-by: Teddy Astie <teddy.astie@vates.tech>
---
Does it have a risk of negatively affecting other operating systems expecting
this MSR read to fail ?
---
 xen/arch/x86/include/asm/msr-index.h |  2 ++
 xen/arch/x86/msr.c                   | 16 ++++++++++++++++
 2 files changed, 18 insertions(+)

Comments

Roger Pau Monné Jan. 14, 2025, 9:32 a.m. UTC | #1
On Mon, Jan 13, 2025 at 06:42:44PM +0000, Teddy Astie wrote:
> Solaris 11.4 tries to access this MSR on some Intel platforms without properly
> setting up a proper #GP handler, which leads to a immediate crash.
> 
> Emulate the access of this MSR by giving it a legal value (all values set to
> default, as defined by Intel SDM "RAPL Interfaces").
> 
> Fixes: 84e848fd7a1 ('x86/hvm: disallow access to unknown MSRs')

Hm, 

> Signed-off-by: Teddy Astie <teddy.astie@vates.tech>
> ---
> Does it have a risk of negatively affecting other operating systems expecting
> this MSR read to fail ?
> ---
>  xen/arch/x86/include/asm/msr-index.h |  2 ++
>  xen/arch/x86/msr.c                   | 16 ++++++++++++++++
>  2 files changed, 18 insertions(+)
> 
> diff --git a/xen/arch/x86/include/asm/msr-index.h b/xen/arch/x86/include/asm/msr-index.h
> index 9cdb5b2625..2adcdf344f 100644
> --- a/xen/arch/x86/include/asm/msr-index.h
> +++ b/xen/arch/x86/include/asm/msr-index.h
> @@ -144,6 +144,8 @@
>  #define MSR_RTIT_ADDR_A(n)                 (0x00000580 + (n) * 2)
>  #define MSR_RTIT_ADDR_B(n)                 (0x00000581 + (n) * 2)
>  
> +#define MSR_RAPL_POWER_UNIT                 0x00000606
> +
>  #define MSR_U_CET                           0x000006a0
>  #define MSR_S_CET                           0x000006a2
>  #define  CET_SHSTK_EN                       (_AC(1, ULL) <<  0)
> diff --git a/xen/arch/x86/msr.c b/xen/arch/x86/msr.c
> index 289cf10b78..b14d42dacf 100644
> --- a/xen/arch/x86/msr.c
> +++ b/xen/arch/x86/msr.c
> @@ -169,6 +169,22 @@ int guest_rdmsr(struct vcpu *v, uint32_t msr, uint64_t *val)
>          if ( likely(!is_cpufreq_controller(d)) || rdmsr_safe(msr, *val) == 0 )
>              break;
>          goto gp_fault;
> +    

Trailing spaces in the added newline.

> +        /*
> +         * Solaris 11.4 DomU tries to use read this MSR without setting up a
> +         * proper #GP handler leading to a crash. Emulate this MSR by giving a
> +         * legal value.
> +         */

The comment should be after (inside) the case statement IMO (but not a
strong opinion).  Could you also raise a bug with Solaris and put a
link to the bug report here, so that we have a reference to it?

> +    case MSR_RAPL_POWER_UNIT:
> +        if ( !(cp->x86_vendor & (X86_VENDOR_INTEL | X86_VENDOR_CENTAUR)) )

Has Centaur ever released a CPU with RAPL?

> +            goto gp_fault;
> +
> +        /*
> +         * Return a legal register content with all default values defined in
> +         * Intel Architecture Software Developer Manual 16.10.1 RAPL Interfaces
> +         */
> +        *val = 0x0000A1003;

The SPR Specification defines the default as 000A0E03h:

* SDM:

Energy Status Units (bits 12:8): Energy related information (in
Joules) is based on the multiplier, 1/2^ESU; where ESU is an unsigned
integer represented by bits 12:8. Default value is 10000b, indicating
energy status unit is in 15.3 micro-Joules increment.

* SPR:

Energy Units (ENERGY_UNIT):
Energy Units used for power control registers.
The actual unit value is calculated by 1 J / Power(2,ENERGY_UNIT).
The default value of 14 corresponds to Ux.14 number.

Note that KVM just returns all 0s [0], so we might consider doing the
same, as otherwise that could lead OSes to poke at further RAPL
related MSRs if the returned value from MSR_RAPL_POWER_UNIT looks
plausible.

[0] https://elixir.bootlin.com/linux/v6.12.6/source/arch/x86/kvm/x86.c#L4236

Thanks.
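
Editorial sketch, not part of the thread: the two candidate defaults discussed
in comment #1 (0x000A1003 from the patch, 000A0E03h from the SPR specification)
can be compared by decoding the unit fields. This assumes the SDM layout of
MSR_RAPL_POWER_UNIT (power units in bits 3:0, energy status units in bits 12:8,
time units in bits 19:16); the struct and function names are hypothetical and
do not exist in Xen.

```c
#include <stdint.h>

/* Exponents held in MSR_RAPL_POWER_UNIT; each unit is 1 / 2^exp. */
struct rapl_units {
    unsigned int power_exp;   /* bits 3:0,   watts   */
    unsigned int energy_exp;  /* bits 12:8,  joules  */
    unsigned int time_exp;    /* bits 19:16, seconds */
};

/* Split a raw register value into its three exponent fields. */
static struct rapl_units decode_rapl_power_unit(uint64_t val)
{
    struct rapl_units u;

    u.power_exp  = (unsigned int)(val & 0xf);
    u.energy_exp = (unsigned int)((val >> 8) & 0x1f);
    u.time_exp   = (unsigned int)((val >> 16) & 0xf);

    return u;
}

/* Energy unit in micro-Joules: 1 J / 2^ESU, scaled by 1e6. */
static double energy_unit_microjoules(const struct rapl_units *u)
{
    return 1e6 / (double)(1ULL << u->energy_exp);
}
```

Decoding 0x000A1003 gives an energy exponent of 16 (about 15.3 micro-Joules,
matching the SDM text), while 0x000A0E03 gives 14 (about 61 micro-Joules,
matching the SPR text). The power and time fields are identical in both values.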
Andrew Cooper Jan. 14, 2025, 9:41 a.m. UTC | #2
On 13/01/2025 6:42 pm, Teddy Astie wrote:
> Solaris 11.4 tries

Is it only Solaris 11.4, or is that simply the one repro you had?

Have you reported a bug?

>  to access this MSR on some Intel platforms without properly
> setting up a proper #GP handler, which leads to a immediate crash.

Minor grammar note.  Either "without a proper #GP handler" or "without
properly setting up a #GP handler", but having two proper(ly)'s in there
is less than ideal.

> Emulate the access of this MSR by giving it a legal value (all values set to
> default, as defined by Intel SDM "RAPL Interfaces").
>
> Fixes: 84e848fd7a1 ('x86/hvm: disallow access to unknown MSRs')
> Signed-off-by: Teddy Astie <teddy.astie@vates.tech>
> ---
> Does it have a risk of negatively affecting other operating systems expecting
> this MSR read to fail?

It's Complicated.

RAPL is a non-architectural feature (on Intel; AMD did it properly).  It
does not have a CPUID bit to announce the presence of the MSRs. 
Therefore OSes use a mixture of model numbers and {wr,rd}msr_safe() to
probe.

I expect this will change the behaviour of Linux.

~Andrew
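
Editorial sketch, not part of the thread: a small user-space model of why the
choice between injecting #GP, returning all 0s (as KVM does), and returning
plausible defaults matters to a guest that probes with a rdmsr_safe()-style
read. Everything here is hypothetical, including the policy names and both
probe functions; it does not describe how Linux, Solaris, or Xen are actually
implemented.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical hypervisor policies for a guest read of MSR 0x606. */
enum rapl_policy {
    RAPL_INJECT_GP,      /* the read faults with #GP */
    RAPL_RETURN_ZERO,    /* the read succeeds and yields 0 */
    RAPL_RETURN_DEFAULT, /* the read succeeds and yields 0x000A1003 */
};

/* Model of a rdmsr_safe()-style helper: 0 on success, -1 on fault. */
static int model_rdmsr_safe(enum rapl_policy policy, uint64_t *val)
{
    switch (policy) {
    case RAPL_RETURN_ZERO:
        *val = 0;
        return 0;
    case RAPL_RETURN_DEFAULT:
        *val = 0x000A1003;
        return 0;
    default:
        return -1; /* RAPL_INJECT_GP: *val is left untouched */
    }
}

/* A probe that treats any non-faulting read as "RAPL present". */
static bool probe_by_fault(enum rapl_policy policy)
{
    uint64_t val;

    return model_rdmsr_safe(policy, &val) == 0;
}

/* A stricter probe that also requires a non-zero unit register. */
static bool probe_by_value(enum rapl_policy policy)
{
    uint64_t val;

    return model_rdmsr_safe(policy, &val) == 0 && val != 0;
}
```

In this model, a fault-only probe reports RAPL as present under both
non-faulting policies, whereas the value-checking probe is only satisfied by
the plausible default. That is the concern raised in comment #1: a
plausible-looking value may encourage a guest to go on to read further RAPL
MSRs.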
Roger Pau Monné Jan. 14, 2025, 10:38 a.m. UTC | #3
On Tue, Jan 14, 2025 at 10:32:25AM +0100, Roger Pau Monné wrote:
> On Mon, Jan 13, 2025 at 06:42:44PM +0000, Teddy Astie wrote:
> > Solaris 11.4 tries to access this MSR on some Intel platforms without properly
> > setting up a proper #GP handler, which leads to a immediate crash.
> > 
> > Emulate the access of this MSR by giving it a legal value (all values set to
> > default, as defined by Intel SDM "RAPL Interfaces").
> > 
> > Fixes: 84e848fd7a1 ('x86/hvm: disallow access to unknown MSRs')

Nit: I think we usually use 12 hex character hashes, the above one is
11 characters long.

> 
> Hm, 

Seems like I've sent this too early.

I wanted to say I wasn't convinced this is a fix for the above, but I
can see how the change can be seen as a regression if Solaris booted
before that change in behavior, so I'm fine with leaving the "Fixes:" tag.

Thanks, Roger.

Patch

diff --git a/xen/arch/x86/include/asm/msr-index.h b/xen/arch/x86/include/asm/msr-index.h
index 9cdb5b2625..2adcdf344f 100644
--- a/xen/arch/x86/include/asm/msr-index.h
+++ b/xen/arch/x86/include/asm/msr-index.h
@@ -144,6 +144,8 @@ 
 #define MSR_RTIT_ADDR_A(n)                 (0x00000580 + (n) * 2)
 #define MSR_RTIT_ADDR_B(n)                 (0x00000581 + (n) * 2)
 
+#define MSR_RAPL_POWER_UNIT                 0x00000606
+
 #define MSR_U_CET                           0x000006a0
 #define MSR_S_CET                           0x000006a2
 #define  CET_SHSTK_EN                       (_AC(1, ULL) <<  0)
diff --git a/xen/arch/x86/msr.c b/xen/arch/x86/msr.c
index 289cf10b78..b14d42dacf 100644
--- a/xen/arch/x86/msr.c
+++ b/xen/arch/x86/msr.c
@@ -169,6 +169,22 @@  int guest_rdmsr(struct vcpu *v, uint32_t msr, uint64_t *val)
         if ( likely(!is_cpufreq_controller(d)) || rdmsr_safe(msr, *val) == 0 )
             break;
         goto gp_fault;
+    
+        /*
+         * Solaris 11.4 DomU tries to use read this MSR without setting up a
+         * proper #GP handler leading to a crash. Emulate this MSR by giving a
+         * legal value.
+         */
+    case MSR_RAPL_POWER_UNIT:
+        if ( !(cp->x86_vendor & (X86_VENDOR_INTEL | X86_VENDOR_CENTAUR)) )
+            goto gp_fault;
+
+        /*
+         * Return a legal register content with all default values defined in
+         * Intel Architecture Software Developer Manual 16.10.1 RAPL Interfaces
+         */
+        *val = 0x0000A1003;
+        break;
 
     case MSR_IA32_THERM_STATUS:
         if ( cp->x86_vendor != X86_VENDOR_INTEL )