diff mbox series

[for-4.20,v2] x86/intel: Fix PERF_GLOBAL fixup when virtualised

Message ID 20250121165626.380627-1-andrew.cooper3@citrix.com (mailing list archive)
State New
Headers show
Series [for-4.20,v2] x86/intel: Fix PERF_GLOBAL fixup when virtualised | expand

Commit Message

Andrew Cooper Jan. 21, 2025, 4:56 p.m. UTC
Logic using performance counters needs to look at
MSR_MISC_ENABLE.PERF_AVAILABLE before touching any other resources.

When virtualised under ESX, Xen dies with a #GP fault trying to read
MSR_CORE_PERF_GLOBAL_CTRL.

Factor this logic out into a separate function (it's already too squashed to
the RHS), and insert a check of MSR_MISC_ENABLE.PERF_AVAILABLE.

This also avoids setting X86_FEATURE_ARCH_PERFMON if MSR_MISC_ENABLE says that
PERF is unavailable, although oprofile (the only consumer of this flag)
cross-checks too.

Fixes: 6bdb965178bb ("x86/intel: ensure Global Performance Counter Control is setup correctly")
Reported-by: Jonathan Katz <jonathan.katz@aptar.com>
Link: https://xcp-ng.org/forum/topic/10286/nesting-xcp-ng-on-esx-8
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Oleksii Kurochko <oleksii.kurochko@gmail.com>

v2:
 * Drop _safe() on MSR_IA32_MISC_ENABLE read.
 * Fix the setting of X86_FEATURE_ARCH_PERFMON.

Untested, but this is the same pattern used by oprofile and watchdog setup.

I've intentionally stopped using Intel style.  This file is already mixed (as
visible even in context), and it doesn't remotely resemble it's Linux origin
any more.

For 4.20.  This regressions has already been backported.
---
 xen/arch/x86/cpu/intel.c | 64 +++++++++++++++++++++++-----------------
 1 file changed, 37 insertions(+), 27 deletions(-)


base-commit: c3f5d1bb40b57d467cb4051eafa86f5933ec9003

Comments

Jan Beulich Jan. 21, 2025, 5:04 p.m. UTC | #1
On 21.01.2025 17:56, Andrew Cooper wrote:
> --- a/xen/arch/x86/cpu/intel.c
> +++ b/xen/arch/x86/cpu/intel.c
> @@ -535,39 +535,49 @@ static void intel_log_freq(const struct cpuinfo_x86 *c)
>      printk("%u MHz\n", (factor * max_ratio + 50) / 100);
>  }
>  
> +static void init_intel_perf(struct cpuinfo_x86 *c)
> +{
> +    uint64_t val;
> +    unsigned int eax, ver, nr_cnt;
> +
> +    if ( c->cpuid_level <= 9 ||
> +         ({  rdmsrl(MSR_IA32_MISC_ENABLE, val);

Just curious (not an objection or anything): Is there a reason you have
two padding blanks here instead of just one? (Really we may want to gain
a function-like form to invoke RDMSR, but that's orthogonal to the change
here.)

Jan
Andrew Cooper Jan. 21, 2025, 5:07 p.m. UTC | #2
On 21/01/2025 5:04 pm, Jan Beulich wrote:
> On 21.01.2025 17:56, Andrew Cooper wrote:
>> --- a/xen/arch/x86/cpu/intel.c
>> +++ b/xen/arch/x86/cpu/intel.c
>> @@ -535,39 +535,49 @@ static void intel_log_freq(const struct cpuinfo_x86 *c)
>>      printk("%u MHz\n", (factor * max_ratio + 50) / 100);
>>  }
>>  
>> +static void init_intel_perf(struct cpuinfo_x86 *c)
>> +{
>> +    uint64_t val;
>> +    unsigned int eax, ver, nr_cnt;
>> +
>> +    if ( c->cpuid_level <= 9 ||
>> +         ({  rdmsrl(MSR_IA32_MISC_ENABLE, val);
> Just curious (not an objection or anything): Is there a reason you have
> two padding blanks here instead of just one?

Alignment with the next line.

> (Really we may want to gain
> a function-like form to invoke RDMSR, but that's orthogonal to the change
> here.)

Indeed.

* def0701b5373 - (xen-nj-msr) switch rdmsrl => rdmsr (30 hours ago)
<Andrew Cooper>
* 1a3f92abccf1 - rdmsr (30 hours ago) <Andrew Cooper>
* 01c9ec7d9482 - rdmsr_safe (30 hours ago) <Andrew Cooper>
* 7ec72a0379b2 - fix error printing in write_msr() (30 hours ago)
<Andrew Cooper>
* 3ff3d60835a5 - drop wrmsrl (30 hours ago) <Andrew Cooper>
* 136128799b4a - wrmsr cleanup (30 hours ago) <Andrew Cooper>
* b2ed78d2e7e3 - x86/msr: Move MSR_FEATURE_CONTROL handling to the new
MSR infrastructure (30 hours ago) <Andrew Cooper>
* 7691edea3d67 - x86/msr: Clean up the MSR_DEBUGCTL constants (30 hours
ago) <Andrew Cooper>
* 77ba2827a955 - x86/msr: Clean up the MSR_MISC_ENABLE constants (30
hours ago) <Andrew Cooper>
* 7f2768cfc4b3 - ---upstream--- (30 hours ago) <Andrew Cooper>
* 8b2e048fdd14 - x86/msr: Introduce msr_{set,clear}_bits() helpers (30
hours ago) <Andrew Cooper>
* 562f88503342 - x86/msr: Clean up the MSR_FEATURE_CONTROL constants (30
hours ago) <Andrew Cooper>
* 199888c9e2f8 - x86/msr: Clean up the
MSR_{PLATFORM_INFO,MISC_FEATURES_ENABLES} constants (30 hours ago)
<Andrew Cooper>
* c3f5d1bb40b5 - (tag: 4.20.0-rc2, xenbits/staging, xenbits/master,
upstream/staging, upstream/master, origin/staging, origin/master,
origin/HEAD, staging, pending, master) automation/cirrus-ci: introduce
FreeBSD randconfig builds (4 days ago) <Roger Pau Monne>

That was work I did while sat in an airport unable to leave XenSummit in
Nanjing...

It's blocked on arguments over naming.

~Andrew
diff mbox series

Patch

diff --git a/xen/arch/x86/cpu/intel.c b/xen/arch/x86/cpu/intel.c
index 6a7347968ba2..6a680ba38dc9 100644
--- a/xen/arch/x86/cpu/intel.c
+++ b/xen/arch/x86/cpu/intel.c
@@ -535,39 +535,49 @@  static void intel_log_freq(const struct cpuinfo_x86 *c)
     printk("%u MHz\n", (factor * max_ratio + 50) / 100);
 }
 
+static void init_intel_perf(struct cpuinfo_x86 *c)
+{
+    uint64_t val;
+    unsigned int eax, ver, nr_cnt;
+
+    if ( c->cpuid_level <= 9 ||
+         ({  rdmsrl(MSR_IA32_MISC_ENABLE, val);
+             !(val & MSR_IA32_MISC_ENABLE_PERF_AVAIL); }) )
+        return;
+
+    eax = cpuid_eax(10);
+    ver = eax & 0xff;
+    nr_cnt = (eax >> 8) & 0xff;
+
+    if ( ver && nr_cnt > 1 && nr_cnt <= 32 )
+    {
+        unsigned int cnt_mask = (1UL << nr_cnt) - 1;
+
+        /*
+         * On (some?) Sapphire/Emerald Rapids platforms each package-BSP
+         * starts with all the enable bits for the general-purpose PMCs
+         * cleared.  Adjust so counters can be enabled from EVNTSEL.
+         */
+        rdmsrl(MSR_CORE_PERF_GLOBAL_CTRL, val);
+
+        if ( (val & cnt_mask) != cnt_mask )
+        {
+            printk("FIRMWARE BUG: CPU%u invalid PERF_GLOBAL_CTRL: %#"PRIx64" adjusting to %#"PRIx64"\n",
+                   smp_processor_id(), val, val | cnt_mask);
+            wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, val | cnt_mask);
+        }
+
+        __set_bit(X86_FEATURE_ARCH_PERFMON, c->x86_capability);
+    }
+}
+
 static void cf_check init_intel(struct cpuinfo_x86 *c)
 {
 	/* Detect the extended topology information if available */
 	detect_extended_topology(c);
 
 	init_intel_cacheinfo(c);
-	if (c->cpuid_level > 9) {
-		unsigned eax = cpuid_eax(10);
-		unsigned int cnt = (eax >> 8) & 0xff;
-
-		/* Check for version and the number of counters */
-		if ((eax & 0xff) && (cnt > 1) && (cnt <= 32)) {
-			uint64_t global_ctrl;
-			unsigned int cnt_mask = (1UL << cnt) - 1;
-
-			/*
-			 * On (some?) Sapphire/Emerald Rapids platforms each
-			 * package-BSP starts with all the enable bits for the
-			 * general-purpose PMCs cleared.  Adjust so counters
-			 * can be enabled from EVNTSEL.
-			 */
-			rdmsrl(MSR_CORE_PERF_GLOBAL_CTRL, global_ctrl);
-			if ((global_ctrl & cnt_mask) != cnt_mask) {
-				printk("CPU%u: invalid PERF_GLOBAL_CTRL: %#"
-				       PRIx64 " adjusting to %#" PRIx64 "\n",
-				       smp_processor_id(), global_ctrl,
-				       global_ctrl | cnt_mask);
-				wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL,
-				       global_ctrl | cnt_mask);
-			}
-			__set_bit(X86_FEATURE_ARCH_PERFMON, c->x86_capability);
-		}
-	}
+	init_intel_perf(c);
 
 	if ( !cpu_has(c, X86_FEATURE_XTOPOLOGY) )
 	{