diff mbox series

[RFC] Avoid dom0/HVM performance penalty from MSR access tightening

Message ID 949b4776e23e4607776685a7e2705b9e77f5b717.camel@gmail.com (mailing list archive)
State New, archived
Headers show
Series [RFC] Avoid dom0/HVM performance penalty from MSR access tightening | expand

Commit Message

Alex Olson Feb. 10, 2022, 5:27 p.m. UTC
I'm seeing strange performance issues under Xen on a Supermicro server with a Xeon D-1541 CPU caused by an MSR-related commit.

Commit 322ec7c89f6640ee2a99d1040b6f786cf04872cf 'x86/pv: disallow access to unknown MSRs'
surprisingly introduces a severe performance penality where dom0 has about 1/8th
the normal CPU performance. Even even when 'xenpm' is used to select the
performance governor and operate the CPU at maximum frequency, actual CPU
performance is still 1/2 of normal (as well as using "cpufreq=xen,performance").

The patch below fixes it but I don't fully understand why.

Basically, when *reads* of MSR_IA32_THERM_CONTROL are blocked, dom0 and
guests (pinned to other CPUs) see the performance issues.

For benchmarking purposes, I built a small C program that runs a "for
loop" 
4Billion iterations and timed its execution. In dom0, the
performance issues
also cause HVM guest startup time to go from 9-10
seconds to almost 80 seconds.

I assumed Xen was managing CPU frequency and thus blocking related MSR
access by dom0 (or any other domain). However,  clearly something else
is happening and I don't understand why.

I initially attempted to copy the same logic as the write MSR case. This
was effective at fixing the dom0 performance issue, but still left other
domains running at 1/2 speed. Hence, the change below has no access control.


If anyone has any insight as to what is really happening, I would be all ears
as I am unsure if the change below is a proper solution.

Thanks

-Alex

---
---
 xen/arch/x86/pv/emul-priv-op.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

Comments

Andrew Cooper Feb. 10, 2022, 6:27 p.m. UTC | #1
On 10/02/2022 17:27, Alex Olson wrote:
> I'm seeing strange performance issues under Xen on a Supermicro server with a Xeon D-1541 CPU caused by an MSR-related commit.
>
> Commit 322ec7c89f6640ee2a99d1040b6f786cf04872cf 'x86/pv: disallow access to unknown MSRs'
> surprisingly introduces a severe performance penality where dom0 has about 1/8th
> the normal CPU performance. Even even when 'xenpm' is used to select the
> performance governor and operate the CPU at maximum frequency, actual CPU
> performance is still 1/2 of normal (as well as using "cpufreq=xen,performance").
>
> The patch below fixes it but I don't fully understand why.
>
> Basically, when *reads* of MSR_IA32_THERM_CONTROL are blocked, dom0 and
> guests (pinned to other CPUs) see the performance issues.
>
> For benchmarking purposes, I built a small C program that runs a "for
> loop" 
> 4Billion iterations and timed its execution. In dom0, the
> performance issues
> also cause HVM guest startup time to go from 9-10
> seconds to almost 80 seconds.
>
> I assumed Xen was managing CPU frequency and thus blocking related MSR
> access by dom0 (or any other domain). However,  clearly something else
> is happening and I don't understand why.
>
> I initially attempted to copy the same logic as the write MSR case. This
> was effective at fixing the dom0 performance issue, but still left other
> domains running at 1/2 speed. Hence, the change below has no access control.
>
>
> If anyone has any insight as to what is really happening, I would be all ears
> as I am unsure if the change below is a proper solution.

Well that's especially entertaining...

So your patch edits pv/emul-priv-op.c#read_msr(), so is only changing
the behaviour for PV dom0.

What exactly is your small C program doing?


The change that that patch made was to turn a read which previously
succeeded into a #GP fault.

The read has already been bogus, even if they appeared to work before. 
When dom0 is scheduled around, it no longer knows which MSR it is
actually reading, so at the best, the data being read is racy as to
which CPU you're instantaneously scheduled on.


At a guess, something in Linux is doing something especially dumb when
given #GP and is falling into a tight loop of trying to read the MSR. 
Do you happen to know which of those two is the more dominating factor?

~Andrew
Roger Pau Monné Feb. 11, 2022, 8:28 a.m. UTC | #2
On Thu, Feb 10, 2022 at 11:27:15AM -0600, Alex Olson wrote:
> I'm seeing strange performance issues under Xen on a Supermicro server with a Xeon D-1541 CPU caused by an MSR-related commit.
> 
> Commit 322ec7c89f6640ee2a99d1040b6f786cf04872cf 'x86/pv: disallow access to unknown MSRs'
> surprisingly introduces a severe performance penality where dom0 has about 1/8th
> the normal CPU performance. Even even when 'xenpm' is used to select the
> performance governor and operate the CPU at maximum frequency, actual CPU
> performance is still 1/2 of normal (as well as using "cpufreq=xen,performance").
> 
> The patch below fixes it but I don't fully understand why.
> 
> Basically, when *reads* of MSR_IA32_THERM_CONTROL are blocked, dom0 and
> guests (pinned to other CPUs) see the performance issues.

You only mention MSR_IA32_THERM_CONTROL here...

> For benchmarking purposes, I built a small C program that runs a "for
> loop" 
> 4Billion iterations and timed its execution. In dom0, the
> performance issues
> also cause HVM guest startup time to go from 9-10
> seconds to almost 80 seconds.
> 
> I assumed Xen was managing CPU frequency and thus blocking related MSR
> access by dom0 (or any other domain). However,  clearly something else
> is happening and I don't understand why.
> 
> I initially attempted to copy the same logic as the write MSR case. This
> was effective at fixing the dom0 performance issue, but still left other
> domains running at 1/2 speed. Hence, the change below has no access control.
> 
> 
> If anyone has any insight as to what is really happening, I would be all ears
> as I am unsure if the change below is a proper solution.
> 
> Thanks
> 
> -Alex
> 
> ---
> ---
>  xen/arch/x86/pv/emul-priv-op.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/xen/arch/x86/pv/emul-priv-op.c b/xen/arch/x86/pv/emul-priv-op.c
> index 7f4279a051..f254479bda 100644
> --- a/xen/arch/x86/pv/emul-priv-op.c
> +++ b/xen/arch/x86/pv/emul-priv-op.c
> @@ -970,6 +970,18 @@ static int read_msr(unsigned int reg, uint64_t *val,
>          *val = 0;
>          return X86EMUL_OKAY;
>  
> +    /* being unable to read MSR_IA32_THERM_CONTROL seems to significantly affect
> +     * dom0 and thus HVM guest startup performance, as well as PVH VMs.
> +     */
> +    case MSR_IA32_THERM_CONTROL:
> +    case MSR_IA32_ENERGY_PERF_BIAS:

...yet in the patch you also allow access to
MSR_IA32_ENERGY_PERF_BIAS, which makes me wonder whether
MSR_IA32_THERM_CONTROL is the only required one.

It could help to post full logs Xen + Linux dmesgs.

Is this reproducible with different Linux versions?

Thanks, Roger.
Alex Olson Feb. 23, 2022, 3:38 p.m. UTC | #3
I appreciate your interest, apologies for not replying right away. I've been
digging deeper to have a more meaningful resposne.

I had attempted to instrument the MSR reads, but only saw a small number reads
being blocked by the code change. They appear to be the list below and the
others seem fairly harmless:

0x00000034	MSR_SMI_COUNT
0x0000019a	IA32_CLOCK_MODULATION/MSR_IA32_THERM_CONTROL MSR
0x000003f8	MSR_PKG_C3_RESIDENCY
0x000003f9	MSR_PKG_C6_RESIDENCY
0x000003fa	MSR_PKG_C7_RESIDENCY
0x00000606	MSR_RAPL_POWER_UNIT
0x0000060d	MSR_PKG_C2_RESIDENCY
0x00000611	MSR_PKG_ENERGY_STATUS
0x00000619	MSR_DRAM_ENERGY_STATUS
0x00000630	MSR_PKG_C8_RESIDENCY
0x00000631	MSR_PKG_C9_RESIDENCY
0x00000632	MSR_PKG_C10_RESIDENCY
0x00000639	MSR_PP0_ENERGY_STATUS
0x00000641	MSR_PP1_ENERGY_STATUS

As for my test program, it is just a crude loop compiled with "gcc -O3",
normally takes about 10 seconds to execute:
int main()
{
    for (volatile int i=1; i!=0; ++i){}
    return 0;
}

The relative changes in execution time of the test program and also that  HVM
guest startup time (associated with the "qemu" process being busy) completely
agreed.  I also observed the same changes under a PVH guest for the test
program.

Thus, it seemed like the CPU was somehow operating a different frequency than
expected, rather than faults consuming execution time.

-- (after a lot more investigation) --

Further instrumentation showed that the
IA32_CLOCK_MODULATION/MSR_IA32_THERM_CONTROL MSR initially had value
"0x10"  which appears to be invalid both in the Intel Software Developer's
manual and what I think I'm seeing in the ACPI tables.

In dom0 Linux 5.2.38,  this value seems to have caused the
acpi_processor_get_throttling_ptc() function to see an invalid result from
acpi_get_throttling_state() and thus execute __acpi_processor_set_throttling()
which wrote the MSR with a value of zero and had the side effect of disabling
throttling (restoring normal performance).  (This all happened as the CPUs were
detected).

When the unknown MSR reads are blocked, the call to
__acpi_processor_set_throttling() did not occur since the MSR read did not
result in the invalid value -- thus the CPU remained in a throttling state.

So far, this seems to explain the dom0 performance issues I saw.

The domU observation was related... In some of my testing, dom0 was limited (via
Xen command-line) to a small number of cores so that the others could be
dedicated to other domains.  When a domU VM was launched on the others (not used
by dom0), its MSR remained at the original value resulting in low performance
since dom0 hadn't a chance to rewrite it...   Thus, I saw different domU
behavior based on the number of cores allocated to dom0.


-- summary --

In desparation, I ended up resetting BIOS settings to defaults and mysteriously
this issue doesn't occur anymore.  Not sure what could have gone wrong before as
the original settings were not far from defaults.  It seems my issues stemmed
from the server's BIOS setting the throttling MSR to an invalid value but it had
illuminated some unusual behaviors under Xen...

It seems to me there are a few findings useful to the Xen developers from
venturing down this rabbithole:

1) For conditions in which MSR registers are writeable from PV guests (such as
dom0),  they should probably be readable well, looks like MSR_IA32_THERM_CONTROL
is currently one of a small number of "unreadable" but writeable
MSRs.  Otherwise seemingly valid read-(check/modify)-write operations will
behave incorrectly under Xen.

2) As Xen controls CPU frequency and c-states,  might there be benefit to it
being extended to manage Clock Modulation / Throttling? (I wasn't expecting dom0
to be able to influence this!)

3) Perhaps PV domains (such as dom0) should not be allowed to modify such MSRs
at all since it would result in unintended effects depending on how CPU pools
and dom0 are managed?

Regards,

-Alex



On Thu, 2022-02-10 at 18:27 +0000, Andrew Cooper wrote:
> On 10/02/2022 17:27, Alex Olson wrote:
> > I'm seeing strange performance issues under Xen on a Supermicro server with
> > a Xeon D-1541 CPU caused by an MSR-related commit.
> > 
> > Commit 322ec7c89f6640ee2a99d1040b6f786cf04872cf 'x86/pv: disallow access to
> > unknown MSRs'
> > surprisingly introduces a severe performance penality where dom0 has about
> > 1/8th
> > the normal CPU performance. Even even when 'xenpm' is used to select the
> > performance governor and operate the CPU at maximum frequency, actual CPU
> > performance is still 1/2 of normal (as well as using
> > "cpufreq=xen,performance").
> > 
> > The patch below fixes it but I don't fully understand why.
> > 
> > Basically, when *reads* of MSR_IA32_THERM_CONTROL are blocked, dom0 and
> > guests (pinned to other CPUs) see the performance issues.
> > 
> > For benchmarking purposes, I built a small C program that runs a "for
> > loop" 
> > 4Billion iterations and timed its execution. In dom0, the
> > performance issues
> > also cause HVM guest startup time to go from 9-10
> > seconds to almost 80 seconds.
> > 
> > I assumed Xen was managing CPU frequency and thus blocking related MSR
> > access by dom0 (or any other domain). However,  clearly something else
> > is happening and I don't understand why.
> > 
> > I initially attempted to copy the same logic as the write MSR case. This
> > was effective at fixing the dom0 performance issue, but still left other
> > domains running at 1/2 speed. Hence, the change below has no access control.
> > 
> > 
> > If anyone has any insight as to what is really happening, I would be all
> > ears
> > as I am unsure if the change below is a proper solution.
> 
> Well that's especially entertaining...
> 
> So your patch edits pv/emul-priv-op.c#read_msr(), so is only changing
> the behaviour for PV dom0.
> 
> What exactly is your small C program doing?
> 
> 
> The change that that patch made was to turn a read which previously
> succeeded into a #GP fault.
> 
> The read has already been bogus, even if they appeared to work before. 
> When dom0 is scheduled around, it no longer knows which MSR it is
> actually reading, so at the best, the data being read is racy as to
> which CPU you're instantaneously scheduled on.
> 
> 
> At a guess, something in Linux is doing something especially dumb when
> given #GP and is falling into a tight loop of trying to read the MSR. 
> Do you happen to know which of those two is the more dominating factor?
> 
> ~Andrew
Alex Olson Feb. 23, 2022, 3:44 p.m. UTC | #4
Hi Roger,

See my other reply which is more detailed.  While enabling reads of
"MSR_IA32_ENERGY_PERF_BIAS" did not cause any effect in my case, it is one of a
handful of exceptions in which MSRs are writeable but not readable. I believe
this may result in potentially unexpected behavior in  read-check/modify-write
cases.

As such, I now see that my original patch of making these MSRs globally readable
is too lenient and the conditions should likely be restricted to the same as
those in which writes are allowed.

In my particular case, it looks like all my troubles resulted from the BIOS
setting MSR_IA32_THERM_CONTROL to an invalid value and the recent code change
prevented dom0 from seeing (and correcting) it...

Regards,

-Alex


On Fri, 2022-02-11 at 09:28 +0100, Roger Pau Monné wrote:
> On Thu, Feb 10, 2022 at 11:27:15AM -0600, Alex Olson wrote:
> > I'm seeing strange performance issues under Xen on a Supermicro server with
> > a Xeon D-1541 CPU caused by an MSR-related commit.
> > 
> > Commit 322ec7c89f6640ee2a99d1040b6f786cf04872cf 'x86/pv: disallow access to
> > unknown MSRs'
> > surprisingly introduces a severe performance penality where dom0 has about
> > 1/8th
> > the normal CPU performance. Even even when 'xenpm' is used to select the
> > performance governor and operate the CPU at maximum frequency, actual CPU
> > performance is still 1/2 of normal (as well as using
> > "cpufreq=xen,performance").
> > 
> > The patch below fixes it but I don't fully understand why.
> > 
> > Basically, when *reads* of MSR_IA32_THERM_CONTROL are blocked, dom0 and
> > guests (pinned to other CPUs) see the performance issues.
> 
> You only mention MSR_IA32_THERM_CONTROL here...
> 
> > For benchmarking purposes, I built a small C program that runs a "for
> > loop" 
> > 4Billion iterations and timed its execution. In dom0, the
> > performance issues
> > also cause HVM guest startup time to go from 9-10
> > seconds to almost 80 seconds.
> > 
> > I assumed Xen was managing CPU frequency and thus blocking related MSR
> > access by dom0 (or any other domain). However,  clearly something else
> > is happening and I don't understand why.
> > 
> > I initially attempted to copy the same logic as the write MSR case. This
> > was effective at fixing the dom0 performance issue, but still left other
> > domains running at 1/2 speed. Hence, the change below has no access control.
> > 
> > 
> > If anyone has any insight as to what is really happening, I would be all
> > ears
> > as I am unsure if the change below is a proper solution.
> > 
> > Thanks
> > 
> > -Alex
> > 
> > ---
> > ---
> >  xen/arch/x86/pv/emul-priv-op.c | 12 ++++++++++++
> >  1 file changed, 12 insertions(+)
> > 
> > diff --git a/xen/arch/x86/pv/emul-priv-op.c b/xen/arch/x86/pv/emul-priv-op.c
> > index 7f4279a051..f254479bda 100644
> > --- a/xen/arch/x86/pv/emul-priv-op.c
> > +++ b/xen/arch/x86/pv/emul-priv-op.c
> > @@ -970,6 +970,18 @@ static int read_msr(unsigned int reg, uint64_t *val,
> >          *val = 0;
> >          return X86EMUL_OKAY;
> >  
> > +    /* being unable to read MSR_IA32_THERM_CONTROL seems to significantly
> > affect
> > +     * dom0 and thus HVM guest startup performance, as well as PVH VMs.
> > +     */
> > +    case MSR_IA32_THERM_CONTROL:
> > +    case MSR_IA32_ENERGY_PERF_BIAS:
> 
> ...yet in the patch you also allow access to
> MSR_IA32_ENERGY_PERF_BIAS, which makes me wonder whether
> MSR_IA32_THERM_CONTROL is the only required one.
> 
> It could help to post full logs Xen + Linux dmesgs.
> 
> Is this reproducible with different Linux versions?
> 
> Thanks, Roger.
Roger Pau Monné Feb. 23, 2022, 4:11 p.m. UTC | #5
On Wed, Feb 23, 2022 at 09:38:56AM -0600, Alex Olson wrote:
> I appreciate your interest, apologies for not replying right away. I've been
> digging deeper to have a more meaningful resposne.
> 
> I had attempted to instrument the MSR reads, but only saw a small number reads
> being blocked by the code change. They appear to be the list below and the
> others seem fairly harmless:
> 
> 0x00000034	MSR_SMI_COUNT
> 0x0000019a	IA32_CLOCK_MODULATION/MSR_IA32_THERM_CONTROL MSR
> 0x000003f8	MSR_PKG_C3_RESIDENCY
> 0x000003f9	MSR_PKG_C6_RESIDENCY
> 0x000003fa	MSR_PKG_C7_RESIDENCY
> 0x00000606	MSR_RAPL_POWER_UNIT
> 0x0000060d	MSR_PKG_C2_RESIDENCY
> 0x00000611	MSR_PKG_ENERGY_STATUS
> 0x00000619	MSR_DRAM_ENERGY_STATUS
> 0x00000630	MSR_PKG_C8_RESIDENCY
> 0x00000631	MSR_PKG_C9_RESIDENCY
> 0x00000632	MSR_PKG_C10_RESIDENCY
> 0x00000639	MSR_PP0_ENERGY_STATUS
> 0x00000641	MSR_PP1_ENERGY_STATUS
> 
> As for my test program, it is just a crude loop compiled with "gcc -O3",
> normally takes about 10 seconds to execute:
> int main()
> {
>     for (volatile int i=1; i!=0; ++i){}
>     return 0;
> }
> 
> The relative changes in execution time of the test program and also that  HVM
> guest startup time (associated with the "qemu" process being busy) completely
> agreed.  I also observed the same changes under a PVH guest for the test
> program.
> 
> Thus, it seemed like the CPU was somehow operating a different frequency than
> expected, rather than faults consuming execution time.
> 
> -- (after a lot more investigation) --
> 
> Further instrumentation showed that the
> IA32_CLOCK_MODULATION/MSR_IA32_THERM_CONTROL MSR initially had value
> "0x10"  which appears to be invalid both in the Intel Software Developer's
> manual and what I think I'm seeing in the ACPI tables.
> 
> In dom0 Linux 5.2.38,  this value seems to have caused the
> acpi_processor_get_throttling_ptc() function to see an invalid result from
> acpi_get_throttling_state() and thus execute __acpi_processor_set_throttling()
> which wrote the MSR with a value of zero and had the side effect of disabling
> throttling (restoring normal performance).  (This all happened as the CPUs were
> detected).
> 
> When the unknown MSR reads are blocked, the call to
> __acpi_processor_set_throttling() did not occur since the MSR read did not
> result in the invalid value -- thus the CPU remained in a throttling state.
> 
> So far, this seems to explain the dom0 performance issues I saw.
> 
> The domU observation was related... In some of my testing, dom0 was limited (via
> Xen command-line) to a small number of cores so that the others could be
> dedicated to other domains.  When a domU VM was launched on the others (not used
> by dom0), its MSR remained at the original value resulting in low performance
> since dom0 hadn't a chance to rewrite it...   Thus, I saw different domU
> behavior based on the number of cores allocated to dom0.
> 
> 
> -- summary --
> 
> In desparation, I ended up resetting BIOS settings to defaults and mysteriously
> this issue doesn't occur anymore.  Not sure what could have gone wrong before as
> the original settings were not far from defaults.  It seems my issues stemmed
> from the server's BIOS setting the throttling MSR to an invalid value but it had
> illuminated some unusual behaviors under Xen...
> 
> It seems to me there are a few findings useful to the Xen developers from
> venturing down this rabbithole:
> 
> 1) For conditions in which MSR registers are writeable from PV guests (such as
> dom0),  they should probably be readable well, looks like MSR_IA32_THERM_CONTROL
> is currently one of a small number of "unreadable" but writeable
> MSRs.  Otherwise seemingly valid read-(check/modify)-write operations will
> behave incorrectly under Xen.

So it's one of those MSRs that's only writable when dom0 has it's
vCPUs pinned. We could allow dom0 to read from it in that case (that's
an oversight of the MSR handling rework), but it would seem better to
just remove access to it altogether: it's bogus to allow dom0 to play
with the MSR in the first place IMO, and it won't really solve issues
like the one reported here unless dom0 vCPUs == pCPUs and all are
pinned, so that dom0 can fix the MSR in all CPUs.

Thanks, Roger.
Jan Beulich Feb. 23, 2022, 4:27 p.m. UTC | #6
On 23.02.2022 16:38, Alex Olson wrote:
> It seems to me there are a few findings useful to the Xen developers from
> venturing down this rabbithole:
> 
> 1) For conditions in which MSR registers are writeable from PV guests (such as
> dom0),  they should probably be readable well, looks like MSR_IA32_THERM_CONTROL
> is currently one of a small number of "unreadable" but writeable
> MSRs.  Otherwise seemingly valid read-(check/modify)-write operations will
> behave incorrectly under Xen.

Hmm, this is indeed odd, just like for the adjacent MSR_IA32_ENERGY_PERF_BIAS.
But changing the behavior for such things requires (a) doing archaeology and
(b) being certain that no old Dom0 may be affected by an adjustment. Quite
likely this write-only behavior is a result from removing read access in the
general case. IOW I think we want to re-add read access for these two MSRs
(and any others fitting this pattern) for Dom0. Care to make a patch?

It's perhaps worth noting that (write) access to these two MSRs sits behind
is_hwdom_pinned_vcpu(). This is a mode we generally recommend against using
anyway. I'm not even sure we consider it (security) supported.

> 2) As Xen controls CPU frequency and c-states,  might there be benefit to it
> being extended to manage Clock Modulation / Throttling? (I wasn't expecting dom0
> to be able to influence this!)

This had been the plan many, many years ago. Yet no-one ever came forward
with any code, afaik.

> 3) Perhaps PV domains (such as dom0) should not be allowed to modify such MSRs
> at all since it would result in unintended effects depending on how CPU pools
> and dom0 are managed?

Well, the general rule already is to limit what can be written. So wrong
cases now need to really be dealt with per individual MSR.

Jan
Jan Beulich Feb. 23, 2022, 4:31 p.m. UTC | #7
On 23.02.2022 17:11, Roger Pau Monné wrote:
> On Wed, Feb 23, 2022 at 09:38:56AM -0600, Alex Olson wrote:
>> 1) For conditions in which MSR registers are writeable from PV guests (such as
>> dom0),  they should probably be readable well, looks like MSR_IA32_THERM_CONTROL
>> is currently one of a small number of "unreadable" but writeable
>> MSRs.  Otherwise seemingly valid read-(check/modify)-write operations will
>> behave incorrectly under Xen.
> 
> So it's one of those MSRs that's only writable when dom0 has it's
> vCPUs pinned. We could allow dom0 to read from it in that case (that's
> an oversight of the MSR handling rework), but it would seem better to
> just remove access to it altogether: it's bogus to allow dom0 to play
> with the MSR in the first place IMO, and it won't really solve issues
> like the one reported here unless dom0 vCPUs == pCPUs and all are
> pinned, so that dom0 can fix the MSR in all CPUs.

Dropping this is imo only legitimate if we decide to do away with
"cpufreq=dom0-kernel" and alike. This limited access permission to
certain MSRs was largely if not exclusively to make this extended
Dom0 control work (which as a prereq took pinning Dom0's vCPU-s).

Jan
Roger Pau Monné Feb. 24, 2022, 8:52 a.m. UTC | #8
On Wed, Feb 23, 2022 at 05:31:53PM +0100, Jan Beulich wrote:
> On 23.02.2022 17:11, Roger Pau Monné wrote:
> > On Wed, Feb 23, 2022 at 09:38:56AM -0600, Alex Olson wrote:
> >> 1) For conditions in which MSR registers are writeable from PV guests (such as
> >> dom0),  they should probably be readable well, looks like MSR_IA32_THERM_CONTROL
> >> is currently one of a small number of "unreadable" but writeable
> >> MSRs.  Otherwise seemingly valid read-(check/modify)-write operations will
> >> behave incorrectly under Xen.
> > 
> > So it's one of those MSRs that's only writable when dom0 has it's
> > vCPUs pinned. We could allow dom0 to read from it in that case (that's
> > an oversight of the MSR handling rework), but it would seem better to
> > just remove access to it altogether: it's bogus to allow dom0 to play
> > with the MSR in the first place IMO, and it won't really solve issues
> > like the one reported here unless dom0 vCPUs == pCPUs and all are
> > pinned, so that dom0 can fix the MSR in all CPUs.
> 
> Dropping this is imo only legitimate if we decide to do away with
> "cpufreq=dom0-kernel" and alike.

I would be fine with that. I think the mode is bogus anyway: dom0
doesn't have enough knowledge to take meaningful decisions, and it
would require that dom0 vCPUs == pCPUs, or else it's only acting on a
subset of CPUs which is already bogus IMO.

Thanks, Roger.
diff mbox series

Patch

diff --git a/xen/arch/x86/pv/emul-priv-op.c b/xen/arch/x86/pv/emul-priv-op.c
index 7f4279a051..f254479bda 100644
--- a/xen/arch/x86/pv/emul-priv-op.c
+++ b/xen/arch/x86/pv/emul-priv-op.c
@@ -970,6 +970,18 @@  static int read_msr(unsigned int reg, uint64_t *val,
         *val = 0;
         return X86EMUL_OKAY;
 
+    /* being unable to read MSR_IA32_THERM_CONTROL seems to significantly affect
+     * dom0 and thus HVM guest startup performance, as well as PVH VMs.
+     */
+    case MSR_IA32_THERM_CONTROL:
+    case MSR_IA32_ENERGY_PERF_BIAS:
+        *val = 0;
+        if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL )
+            break;
+        if ( rdmsr_safe(reg, *val) == 0)
+            return X86EMUL_OKAY;
+        break;
+
     case MSR_P6_PERFCTR(0) ... MSR_P6_PERFCTR(7):
     case MSR_P6_EVNTSEL(0) ... MSR_P6_EVNTSEL(3):
     case MSR_CORE_PERF_FIXED_CTR0 ... MSR_CORE_PERF_FIXED_CTR2: