diff mbox series

[for-4.20,v2] x86/intel: Fix PERF_GLOBAL fixup when virtualised

Message ID 20250121165626.380627-1-andrew.cooper3@citrix.com (mailing list archive)
State New
Headers show
Series [for-4.20,v2] x86/intel: Fix PERF_GLOBAL fixup when virtualised | expand

Commit Message

Andrew Cooper Jan. 21, 2025, 4:56 p.m. UTC
Logic using performance counters needs to look at
MSR_MISC_ENABLE.PERF_AVAILABLE before touching any other resources.

When virtualised under ESX, Xen dies with a #GP fault trying to read
MSR_CORE_PERF_GLOBAL_CTRL.

Factor this logic out into a separate function (it's already too squashed to
the RHS), and insert a check of MSR_MISC_ENABLE.PERF_AVAILABLE.

This also avoids setting X86_FEATURE_ARCH_PERFMON if MSR_MISC_ENABLE says that
PERF is unavailable, although oprofile (the only consumer of this flag)
cross-checks too.

Fixes: 6bdb965178bb ("x86/intel: ensure Global Performance Counter Control is setup correctly")
Reported-by: Jonathan Katz <jonathan.katz@aptar.com>
Link: https://xcp-ng.org/forum/topic/10286/nesting-xcp-ng-on-esx-8
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Oleksii Kurochko <oleksii.kurochko@gmail.com>

v2:
 * Drop _safe() on MSR_IA32_MISC_ENABLE read.
 * Fix the setting of X86_FEATURE_ARCH_PERFMON.

Untested, but this is the same pattern used by oprofile and watchdog setup.

I've intentionally stopped using Intel style.  This file is already mixed (as
visible even in context), and it doesn't remotely resemble it's Linux origin
any more.

For 4.20.  This regressions has already been backported.
---
 xen/arch/x86/cpu/intel.c | 64 +++++++++++++++++++++++-----------------
 1 file changed, 37 insertions(+), 27 deletions(-)


base-commit: c3f5d1bb40b57d467cb4051eafa86f5933ec9003

Comments

Jan Beulich Jan. 21, 2025, 5:04 p.m. UTC | #1
On 21.01.2025 17:56, Andrew Cooper wrote:
> --- a/xen/arch/x86/cpu/intel.c
> +++ b/xen/arch/x86/cpu/intel.c
> @@ -535,39 +535,49 @@ static void intel_log_freq(const struct cpuinfo_x86 *c)
>      printk("%u MHz\n", (factor * max_ratio + 50) / 100);
>  }
>  
> +static void init_intel_perf(struct cpuinfo_x86 *c)
> +{
> +    uint64_t val;
> +    unsigned int eax, ver, nr_cnt;
> +
> +    if ( c->cpuid_level <= 9 ||
> +         ({  rdmsrl(MSR_IA32_MISC_ENABLE, val);

Just curious (not an objection or anything): Is there a reason you have
two padding blanks here instead of just one? (Really we may want to gain
a function-like form to invoke RDMSR, but that's orthogonal to the change
here.)

Jan
Andrew Cooper Jan. 21, 2025, 5:07 p.m. UTC | #2
On 21/01/2025 5:04 pm, Jan Beulich wrote:
> On 21.01.2025 17:56, Andrew Cooper wrote:
>> --- a/xen/arch/x86/cpu/intel.c
>> +++ b/xen/arch/x86/cpu/intel.c
>> @@ -535,39 +535,49 @@ static void intel_log_freq(const struct cpuinfo_x86 *c)
>>      printk("%u MHz\n", (factor * max_ratio + 50) / 100);
>>  }
>>  
>> +static void init_intel_perf(struct cpuinfo_x86 *c)
>> +{
>> +    uint64_t val;
>> +    unsigned int eax, ver, nr_cnt;
>> +
>> +    if ( c->cpuid_level <= 9 ||
>> +         ({  rdmsrl(MSR_IA32_MISC_ENABLE, val);
> Just curious (not an objection or anything): Is there a reason you have
> two padding blanks here instead of just one?

Alignment with the next line.

> (Really we may want to gain
> a function-like form to invoke RDMSR, but that's orthogonal to the change
> here.)

Indeed.

* def0701b5373 - (xen-nj-msr) switch rdmsrl => rdmsr (30 hours ago)
<Andrew Cooper>
* 1a3f92abccf1 - rdmsr (30 hours ago) <Andrew Cooper>
* 01c9ec7d9482 - rdmsr_safe (30 hours ago) <Andrew Cooper>
* 7ec72a0379b2 - fix error printing in write_msr() (30 hours ago)
<Andrew Cooper>
* 3ff3d60835a5 - drop wrmsrl (30 hours ago) <Andrew Cooper>
* 136128799b4a - wrmsr cleanup (30 hours ago) <Andrew Cooper>
* b2ed78d2e7e3 - x86/msr: Move MSR_FEATURE_CONTROL handling to the new
MSR infrastructure (30 hours ago) <Andrew Cooper>
* 7691edea3d67 - x86/msr: Clean up the MSR_DEBUGCTL constants (30 hours
ago) <Andrew Cooper>
* 77ba2827a955 - x86/msr: Clean up the MSR_MISC_ENABLE constants (30
hours ago) <Andrew Cooper>
* 7f2768cfc4b3 - ---upstream--- (30 hours ago) <Andrew Cooper>
* 8b2e048fdd14 - x86/msr: Introduce msr_{set,clear}_bits() helpers (30
hours ago) <Andrew Cooper>
* 562f88503342 - x86/msr: Clean up the MSR_FEATURE_CONTROL constants (30
hours ago) <Andrew Cooper>
* 199888c9e2f8 - x86/msr: Clean up the
MSR_{PLATFORM_INFO,MISC_FEATURES_ENABLES} constants (30 hours ago)
<Andrew Cooper>
* c3f5d1bb40b5 - (tag: 4.20.0-rc2, xenbits/staging, xenbits/master,
upstream/staging, upstream/master, origin/staging, origin/master,
origin/HEAD, staging, pending, master) automation/cirrus-ci: introduce
FreeBSD randconfig builds (4 days ago) <Roger Pau Monne>

That was work I did while sat in an airport unable to leave XenSummit in
Nanjing...

It's blocked on arguments over naming.

~Andrew
Jan Beulich Jan. 22, 2025, 7:12 a.m. UTC | #3
On 21.01.2025 18:07, Andrew Cooper wrote:
> On 21/01/2025 5:04 pm, Jan Beulich wrote:
>> On 21.01.2025 17:56, Andrew Cooper wrote:
>>> --- a/xen/arch/x86/cpu/intel.c
>>> +++ b/xen/arch/x86/cpu/intel.c
>>> @@ -535,39 +535,49 @@ static void intel_log_freq(const struct cpuinfo_x86 *c)
>>>      printk("%u MHz\n", (factor * max_ratio + 50) / 100);
>>>  }
>>>  
>>> +static void init_intel_perf(struct cpuinfo_x86 *c)
>>> +{
>>> +    uint64_t val;
>>> +    unsigned int eax, ver, nr_cnt;
>>> +
>>> +    if ( c->cpuid_level <= 9 ||
>>> +         ({  rdmsrl(MSR_IA32_MISC_ENABLE, val);
>> Just curious (not an objection or anything): Is there a reason you have
>> two padding blanks here instead of just one?
> 
> Alignment with the next line.

Yet that's then merely a matter of removing a blank from that line too,
isn't it?

>> (Really we may want to gain
>> a function-like form to invoke RDMSR, but that's orthogonal to the change
>> here.)
> 
> Indeed.
> 
> * def0701b5373 - (xen-nj-msr) switch rdmsrl => rdmsr (30 hours ago)
> <Andrew Cooper>
> * 1a3f92abccf1 - rdmsr (30 hours ago) <Andrew Cooper>
> * 01c9ec7d9482 - rdmsr_safe (30 hours ago) <Andrew Cooper>
> * 7ec72a0379b2 - fix error printing in write_msr() (30 hours ago)
> <Andrew Cooper>
> * 3ff3d60835a5 - drop wrmsrl (30 hours ago) <Andrew Cooper>
> * 136128799b4a - wrmsr cleanup (30 hours ago) <Andrew Cooper>
> * b2ed78d2e7e3 - x86/msr: Move MSR_FEATURE_CONTROL handling to the new
> MSR infrastructure (30 hours ago) <Andrew Cooper>
> * 7691edea3d67 - x86/msr: Clean up the MSR_DEBUGCTL constants (30 hours
> ago) <Andrew Cooper>
> * 77ba2827a955 - x86/msr: Clean up the MSR_MISC_ENABLE constants (30
> hours ago) <Andrew Cooper>
> * 7f2768cfc4b3 - ---upstream--- (30 hours ago) <Andrew Cooper>
> * 8b2e048fdd14 - x86/msr: Introduce msr_{set,clear}_bits() helpers (30
> hours ago) <Andrew Cooper>
> * 562f88503342 - x86/msr: Clean up the MSR_FEATURE_CONTROL constants (30
> hours ago) <Andrew Cooper>
> * 199888c9e2f8 - x86/msr: Clean up the
> MSR_{PLATFORM_INFO,MISC_FEATURES_ENABLES} constants (30 hours ago)
> <Andrew Cooper>
> * c3f5d1bb40b5 - (tag: 4.20.0-rc2, xenbits/staging, xenbits/master,
> upstream/staging, upstream/master, origin/staging, origin/master,
> origin/HEAD, staging, pending, master) automation/cirrus-ci: introduce
> FreeBSD randconfig builds (4 days ago) <Roger Pau Monne>
> 
> That was work I did while sat in an airport unable to leave XenSummit in
> Nanjing...
> 
> It's blocked on arguments over naming.

Is it? I couldn't find any replies of mine in my outbox (and a quick
attempt to search by subjects on the web also didn't reveal anything),
but then Nanjing also was quite some time ago.

I fear you will not like this, but from a more general perspective:
When you say "blocked" for your own patches, it's usually the case that
the ball is in your court. Interestingly other people's patches (say
Roger's or mine) are, when they are "blocked", also often lacking
feedback from you. While it's apparently close to impossible to get
you to reply on such threads rooting in other people's patches,
shouldn't it at least be entirely in your own interest to keep the ball
rolling when it comes to your own ones? That is, make an attempt (or
perhaps repeated ones) to come to some form of agreement?

Plus for anything you deem blocked, you know where the list of pending
x86 patches is that you could add yours to. Since in that table we
record last posting dates, finding the patches is then much easier as
well.

Jan
diff mbox series

Patch

diff --git a/xen/arch/x86/cpu/intel.c b/xen/arch/x86/cpu/intel.c
index 6a7347968ba2..6a680ba38dc9 100644
--- a/xen/arch/x86/cpu/intel.c
+++ b/xen/arch/x86/cpu/intel.c
@@ -535,39 +535,49 @@  static void intel_log_freq(const struct cpuinfo_x86 *c)
     printk("%u MHz\n", (factor * max_ratio + 50) / 100);
 }
 
+static void init_intel_perf(struct cpuinfo_x86 *c)
+{
+    uint64_t val;
+    unsigned int eax, ver, nr_cnt;
+
+    if ( c->cpuid_level <= 9 ||
+         ({  rdmsrl(MSR_IA32_MISC_ENABLE, val);
+             !(val & MSR_IA32_MISC_ENABLE_PERF_AVAIL); }) )
+        return;
+
+    eax = cpuid_eax(10);
+    ver = eax & 0xff;
+    nr_cnt = (eax >> 8) & 0xff;
+
+    if ( ver && nr_cnt > 1 && nr_cnt <= 32 )
+    {
+        unsigned int cnt_mask = (1UL << nr_cnt) - 1;
+
+        /*
+         * On (some?) Sapphire/Emerald Rapids platforms each package-BSP
+         * starts with all the enable bits for the general-purpose PMCs
+         * cleared.  Adjust so counters can be enabled from EVNTSEL.
+         */
+        rdmsrl(MSR_CORE_PERF_GLOBAL_CTRL, val);
+
+        if ( (val & cnt_mask) != cnt_mask )
+        {
+            printk("FIRMWARE BUG: CPU%u invalid PERF_GLOBAL_CTRL: %#"PRIx64" adjusting to %#"PRIx64"\n",
+                   smp_processor_id(), val, val | cnt_mask);
+            wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, val | cnt_mask);
+        }
+
+        __set_bit(X86_FEATURE_ARCH_PERFMON, c->x86_capability);
+    }
+}
+
 static void cf_check init_intel(struct cpuinfo_x86 *c)
 {
 	/* Detect the extended topology information if available */
 	detect_extended_topology(c);
 
 	init_intel_cacheinfo(c);
-	if (c->cpuid_level > 9) {
-		unsigned eax = cpuid_eax(10);
-		unsigned int cnt = (eax >> 8) & 0xff;
-
-		/* Check for version and the number of counters */
-		if ((eax & 0xff) && (cnt > 1) && (cnt <= 32)) {
-			uint64_t global_ctrl;
-			unsigned int cnt_mask = (1UL << cnt) - 1;
-
-			/*
-			 * On (some?) Sapphire/Emerald Rapids platforms each
-			 * package-BSP starts with all the enable bits for the
-			 * general-purpose PMCs cleared.  Adjust so counters
-			 * can be enabled from EVNTSEL.
-			 */
-			rdmsrl(MSR_CORE_PERF_GLOBAL_CTRL, global_ctrl);
-			if ((global_ctrl & cnt_mask) != cnt_mask) {
-				printk("CPU%u: invalid PERF_GLOBAL_CTRL: %#"
-				       PRIx64 " adjusting to %#" PRIx64 "\n",
-				       smp_processor_id(), global_ctrl,
-				       global_ctrl | cnt_mask);
-				wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL,
-				       global_ctrl | cnt_mask);
-			}
-			__set_bit(X86_FEATURE_ARCH_PERFMON, c->x86_capability);
-		}
-	}
+	init_intel_perf(c);
 
 	if ( !cpu_has(c, X86_FEATURE_XTOPOLOGY) )
 	{