diff mbox series

[1/2] x86/hpet: Use another crystalball to evaluate HPET usability

Message ID da80b8dd-177c-d27a-9a00-c9538a139d37@suse.com (mailing list archive)
State New, archived
Headers show
Series x86: HPET PC10 quirk workaround and some cleanup | expand

Commit Message

Jan Beulich Oct. 19, 2021, 7:07 a.m. UTC
From: Thomas Gleixner <tglx@linutronix.de>

On recent Intel systems the HPET stops working when the system reaches PC10
idle state.

The approach of adding PCI ids to the early quirks to disable HPET on
these systems is a whack a mole game which makes no sense.

Check for PC10 instead and force disable HPET if supported. The check is
overbroad as it does not take ACPI, mwait-idle enablement and command
line parameters into account. That's fine as long as there is at least
PMTIMER available to calibrate the TSC frequency. The decision can be
overruled by adding "clocksource=hpet" on the Xen command line.

Remove the related PCI quirks for affected Coffee Lake systems as they
are not longer required. That should also cover all other systems, i.e.
Ice Lake, Tiger Lake, and newer generations, which are most likely
affected by this as well.

Fixes: Yet another hardware trainwreck
Reported-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[Linux commit: 6e3cd95234dc1eda488f4f487c281bac8fef4d9b]

I have to admit that the purpose of checking CPUID5_ECX_INTERRUPT_BREAK
is unclear to me, but I didn't want to diverge in technical aspects from
the Linux commit.

In mwait_pc10_supported(), besides some cosmetic adjustments, avoid UB
from shifting left a signed 4-bit constant by 28 bits.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v2: Fully different replacement of "x86: avoid HPET use also on certain
    Coffee Lake H".

Comments

Roger Pau Monné Oct. 19, 2021, 11:30 a.m. UTC | #1
On Tue, Oct 19, 2021 at 09:07:39AM +0200, Jan Beulich wrote:
> From: Thomas Gleixner <tglx@linutronix.de>
> 
> On recent Intel systems the HPET stops working when the system reaches PC10
> idle state.
> 
> The approach of adding PCI ids to the early quirks to disable HPET on
> these systems is a whack a mole game which makes no sense.
> 
> Check for PC10 instead and force disable HPET if supported. The check is
> overbroad as it does not take ACPI, mwait-idle enablement and command
> line parameters into account. That's fine as long as there is at least
> PMTIMER available to calibrate the TSC frequency. The decision can be
> overruled by adding "clocksource=hpet" on the Xen command line.
> 
> Remove the related PCI quirks for affected Coffee Lake systems as they
> are not longer required. That should also cover all other systems, i.e.
> Ice Lake, Tiger Lake, and newer generations, which are most likely
> affected by this as well.
> 
> Fixes: Yet another hardware trainwreck
> Reported-by: Jakub Kicinski <kuba@kernel.org>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> [Linux commit: 6e3cd95234dc1eda488f4f487c281bac8fef4d9b]
> 
> I have to admit that the purpose of checking CPUID5_ECX_INTERRUPT_BREAK
> is unclear to me, but I didn't want to diverge in technical aspects from
> the Linux commit.
> 
> In mwait_pc10_supported(), besides some cosmetic adjustments, avoid UB
> from shifting left a signed 4-bit constant by 28 bits.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Acked-by: Roger Pau Monné <roger.pau@citrix.com>

Just one comment below.

> ---
> v2: Fully different replacement of "x86: avoid HPET use also on certain
>     Coffee Lake H".
> 
> --- a/xen/arch/x86/time.c
> +++ b/xen/arch/x86/time.c
> @@ -34,6 +34,7 @@
>  #include <asm/fixmap.h>
>  #include <asm/guest.h>
>  #include <asm/mc146818rtc.h>
> +#include <asm/mwait.h>
>  #include <asm/div64.h>
>  #include <asm/acpi.h>
>  #include <asm/hpet.h>
> @@ -395,14 +396,43 @@ static int64_t __init init_hpet(struct p
>              }
>  
>          /*
> -         * Some Coffee Lake platforms have a skewed HPET timer once the SoCs
> -         * entered PC10.
> +         * Some Coffee Lake and later platforms have a skewed HPET timer once
> +         * they entered PC10.
> +         *
> +         * Check whether the system supports PC10. If so force disable HPET as
> +         * that stops counting in PC10. This check is overbroad as it does not
> +         * take any of the following into account:
> +         *
> +         *	- ACPI tables
> +         *	- Enablement of mwait-idle
> +         *	- Command line arguments which limit mwait-idle C-state support
> +         *
> +         * That's perfectly fine. HPET is a piece of hardware designed by
> +         * committee and the only reasons why it is still in use on modern
> +         * systems is the fact that it is impossible to reliably query TSC and
> +         * CPU frequency via CPUID or firmware.
> +         *
> +         * If HPET is functional it is useful for calibrating TSC, but this can
> +         * be done via PMTIMER as well which seems to be the last remaining
> +         * timer on X86/INTEL platforms that has not been completely wreckaged
> +         * by feature creep.
> +         *
> +         * In theory HPET support should be removed altogether, but there are
> +         * older systems out there which depend on it because TSC and APIC timer
> +         * are dysfunctional in deeper C-states.
>           */
> -        if ( pci_conf_read16(PCI_SBDF(0, 0, 0, 0),
> -                             PCI_VENDOR_ID) == PCI_VENDOR_ID_INTEL &&
> -             pci_conf_read16(PCI_SBDF(0, 0, 0, 0),
> -                             PCI_DEVICE_ID) == 0x3ec4 )
> -            hpet_address = 0;
> +        if ( mwait_pc10_supported() )
> +        {
> +            uint64_t pcfg;
> +
> +            rdmsrl(MSR_PKG_CST_CONFIG_CONTROL, pcfg);
> +            if ( (pcfg & 0xf) < 8 )
> +                /* nothing */;
> +            else if ( !strcmp(opt_clocksource, pts->id) )
> +                printk("HPET use requested via command line, but dysfunctional in PC10\n");
> +            else
> +                hpet_address = 0;

Should we print a message that HPET is being disabled?

Thanks, Roger.
Jan Beulich Oct. 19, 2021, 12:08 p.m. UTC | #2
On 19.10.2021 13:30, Roger Pau Monné wrote:
> On Tue, Oct 19, 2021 at 09:07:39AM +0200, Jan Beulich wrote:
>> From: Thomas Gleixner <tglx@linutronix.de>
>>
>> On recent Intel systems the HPET stops working when the system reaches PC10
>> idle state.
>>
>> The approach of adding PCI ids to the early quirks to disable HPET on
>> these systems is a whack a mole game which makes no sense.
>>
>> Check for PC10 instead and force disable HPET if supported. The check is
>> overbroad as it does not take ACPI, mwait-idle enablement and command
>> line parameters into account. That's fine as long as there is at least
>> PMTIMER available to calibrate the TSC frequency. The decision can be
>> overruled by adding "clocksource=hpet" on the Xen command line.
>>
>> Remove the related PCI quirks for affected Coffee Lake systems as they
>> are not longer required. That should also cover all other systems, i.e.
>> Ice Lake, Tiger Lake, and newer generations, which are most likely
>> affected by this as well.
>>
>> Fixes: Yet another hardware trainwreck
>> Reported-by: Jakub Kicinski <kuba@kernel.org>
>> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
>> [Linux commit: 6e3cd95234dc1eda488f4f487c281bac8fef4d9b]
>>
>> I have to admit that the purpose of checking CPUID5_ECX_INTERRUPT_BREAK
>> is unclear to me, but I didn't want to diverge in technical aspects from
>> the Linux commit.
>>
>> In mwait_pc10_supported(), besides some cosmetic adjustments, avoid UB
>> from shifting left a signed 4-bit constant by 28 bits.
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> 
> Acked-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks.

>> @@ -395,14 +396,43 @@ static int64_t __init init_hpet(struct p
>>              }
>>  
>>          /*
>> -         * Some Coffee Lake platforms have a skewed HPET timer once the SoCs
>> -         * entered PC10.
>> +         * Some Coffee Lake and later platforms have a skewed HPET timer once
>> +         * they entered PC10.
>> +         *
>> +         * Check whether the system supports PC10. If so force disable HPET as
>> +         * that stops counting in PC10. This check is overbroad as it does not
>> +         * take any of the following into account:
>> +         *
>> +         *	- ACPI tables
>> +         *	- Enablement of mwait-idle
>> +         *	- Command line arguments which limit mwait-idle C-state support
>> +         *
>> +         * That's perfectly fine. HPET is a piece of hardware designed by
>> +         * committee and the only reasons why it is still in use on modern
>> +         * systems is the fact that it is impossible to reliably query TSC and
>> +         * CPU frequency via CPUID or firmware.
>> +         *
>> +         * If HPET is functional it is useful for calibrating TSC, but this can
>> +         * be done via PMTIMER as well which seems to be the last remaining
>> +         * timer on X86/INTEL platforms that has not been completely wreckaged
>> +         * by feature creep.
>> +         *
>> +         * In theory HPET support should be removed altogether, but there are
>> +         * older systems out there which depend on it because TSC and APIC timer
>> +         * are dysfunctional in deeper C-states.
>>           */
>> -        if ( pci_conf_read16(PCI_SBDF(0, 0, 0, 0),
>> -                             PCI_VENDOR_ID) == PCI_VENDOR_ID_INTEL &&
>> -             pci_conf_read16(PCI_SBDF(0, 0, 0, 0),
>> -                             PCI_DEVICE_ID) == 0x3ec4 )
>> -            hpet_address = 0;
>> +        if ( mwait_pc10_supported() )
>> +        {
>> +            uint64_t pcfg;
>> +
>> +            rdmsrl(MSR_PKG_CST_CONFIG_CONTROL, pcfg);
>> +            if ( (pcfg & 0xf) < 8 )
>> +                /* nothing */;
>> +            else if ( !strcmp(opt_clocksource, pts->id) )
>> +                printk("HPET use requested via command line, but dysfunctional in PC10\n");
>> +            else
>> +                hpet_address = 0;
> 
> Should we print a message that HPET is being disabled?

There is one, and it was even visible in patch context that you
did strip from your reply:

         if ( !hpet_address )
             printk("Disabling HPET for being unreliable\n");

Jan
Roger Pau Monné Oct. 19, 2021, 12:45 p.m. UTC | #3
On Tue, Oct 19, 2021 at 02:08:38PM +0200, Jan Beulich wrote:
> On 19.10.2021 13:30, Roger Pau Monné wrote:
> > On Tue, Oct 19, 2021 at 09:07:39AM +0200, Jan Beulich wrote:
> >> From: Thomas Gleixner <tglx@linutronix.de>
> >>
> >> On recent Intel systems the HPET stops working when the system reaches PC10
> >> idle state.
> >>
> >> The approach of adding PCI ids to the early quirks to disable HPET on
> >> these systems is a whack a mole game which makes no sense.
> >>
> >> Check for PC10 instead and force disable HPET if supported. The check is
> >> overbroad as it does not take ACPI, mwait-idle enablement and command
> >> line parameters into account. That's fine as long as there is at least
> >> PMTIMER available to calibrate the TSC frequency. The decision can be
> >> overruled by adding "clocksource=hpet" on the Xen command line.
> >>
> >> Remove the related PCI quirks for affected Coffee Lake systems as they
> >> are not longer required. That should also cover all other systems, i.e.
> >> Ice Lake, Tiger Lake, and newer generations, which are most likely
> >> affected by this as well.
> >>
> >> Fixes: Yet another hardware trainwreck
> >> Reported-by: Jakub Kicinski <kuba@kernel.org>
> >> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> >> [Linux commit: 6e3cd95234dc1eda488f4f487c281bac8fef4d9b]
> >>
> >> I have to admit that the purpose of checking CPUID5_ECX_INTERRUPT_BREAK
> >> is unclear to me, but I didn't want to diverge in technical aspects from
> >> the Linux commit.
> >>
> >> In mwait_pc10_supported(), besides some cosmetic adjustments, avoid UB
> >> from shifting left a signed 4-bit constant by 28 bits.
> >>
> >> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> > 
> > Acked-by: Roger Pau Monné <roger.pau@citrix.com>
> 
> Thanks.
> 
> >> @@ -395,14 +396,43 @@ static int64_t __init init_hpet(struct p
> >>              }
> >>  
> >>          /*
> >> -         * Some Coffee Lake platforms have a skewed HPET timer once the SoCs
> >> -         * entered PC10.
> >> +         * Some Coffee Lake and later platforms have a skewed HPET timer once
> >> +         * they entered PC10.
> >> +         *
> >> +         * Check whether the system supports PC10. If so force disable HPET as
> >> +         * that stops counting in PC10. This check is overbroad as it does not
> >> +         * take any of the following into account:
> >> +         *
> >> +         *	- ACPI tables
> >> +         *	- Enablement of mwait-idle
> >> +         *	- Command line arguments which limit mwait-idle C-state support
> >> +         *
> >> +         * That's perfectly fine. HPET is a piece of hardware designed by
> >> +         * committee and the only reasons why it is still in use on modern
> >> +         * systems is the fact that it is impossible to reliably query TSC and
> >> +         * CPU frequency via CPUID or firmware.
> >> +         *
> >> +         * If HPET is functional it is useful for calibrating TSC, but this can
> >> +         * be done via PMTIMER as well which seems to be the last remaining
> >> +         * timer on X86/INTEL platforms that has not been completely wreckaged
> >> +         * by feature creep.
> >> +         *
> >> +         * In theory HPET support should be removed altogether, but there are
> >> +         * older systems out there which depend on it because TSC and APIC timer
> >> +         * are dysfunctional in deeper C-states.
> >>           */
> >> -        if ( pci_conf_read16(PCI_SBDF(0, 0, 0, 0),
> >> -                             PCI_VENDOR_ID) == PCI_VENDOR_ID_INTEL &&
> >> -             pci_conf_read16(PCI_SBDF(0, 0, 0, 0),
> >> -                             PCI_DEVICE_ID) == 0x3ec4 )
> >> -            hpet_address = 0;
> >> +        if ( mwait_pc10_supported() )
> >> +        {
> >> +            uint64_t pcfg;
> >> +
> >> +            rdmsrl(MSR_PKG_CST_CONFIG_CONTROL, pcfg);
> >> +            if ( (pcfg & 0xf) < 8 )
> >> +                /* nothing */;
> >> +            else if ( !strcmp(opt_clocksource, pts->id) )
> >> +                printk("HPET use requested via command line, but dysfunctional in PC10\n");
> >> +            else
> >> +                hpet_address = 0;
> > 
> > Should we print a message that HPET is being disabled?
> 
> There is one, and it was even visible in patch context that you
> did strip from your reply:

I meant something about being disabled for PC10, but I think the
generic one is fine enough.

Thanks, Roger.
Andrew Cooper Oct. 25, 2021, 10:53 p.m. UTC | #4
On 19/10/2021 08:07, Jan Beulich wrote:
> From: Thomas Gleixner <tglx@linutronix.de>
>
> On recent Intel systems the HPET stops working when the system reaches PC10
> idle state.
>
> The approach of adding PCI ids to the early quirks to disable HPET on
> these systems is a whack a mole game which makes no sense.
>
> Check for PC10 instead and force disable HPET if supported. The check is
> overbroad as it does not take ACPI, mwait-idle enablement and command
> line parameters into account. That's fine as long as there is at least
> PMTIMER available to calibrate the TSC frequency. The decision can be
> overruled by adding "clocksource=hpet" on the Xen command line.
>
> Remove the related PCI quirks for affected Coffee Lake systems as they
> are not longer required. That should also cover all other systems, i.e.
> Ice Lake, Tiger Lake, and newer generations, which are most likely
> affected by this as well.
>
> Fixes: Yet another hardware trainwreck
> Reported-by: Jakub Kicinski <kuba@kernel.org>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> [Linux commit: 6e3cd95234dc1eda488f4f487c281bac8fef4d9b]
>
> I have to admit that the purpose of checking CPUID5_ECX_INTERRUPT_BREAK
> is unclear to me, but I didn't want to diverge in technical aspects from
> the Linux commit.
>
> In mwait_pc10_supported(), besides some cosmetic adjustments, avoid UB
> from shifting left a signed 4-bit constant by 28 bits.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

This breaks booting on recent Intel platforms.

Ian: Complete blocker for the release.

~Andrew
diff mbox series

Patch

--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -34,6 +34,7 @@ 
 #include <asm/fixmap.h>
 #include <asm/guest.h>
 #include <asm/mc146818rtc.h>
+#include <asm/mwait.h>
 #include <asm/div64.h>
 #include <asm/acpi.h>
 #include <asm/hpet.h>
@@ -395,14 +396,43 @@  static int64_t __init init_hpet(struct p
             }
 
         /*
-         * Some Coffee Lake platforms have a skewed HPET timer once the SoCs
-         * entered PC10.
+         * Some Coffee Lake and later platforms have a skewed HPET timer once
+         * they entered PC10.
+         *
+         * Check whether the system supports PC10. If so force disable HPET as
+         * that stops counting in PC10. This check is overbroad as it does not
+         * take any of the following into account:
+         *
+         *	- ACPI tables
+         *	- Enablement of mwait-idle
+         *	- Command line arguments which limit mwait-idle C-state support
+         *
+         * That's perfectly fine. HPET is a piece of hardware designed by
+         * committee and the only reasons why it is still in use on modern
+         * systems is the fact that it is impossible to reliably query TSC and
+         * CPU frequency via CPUID or firmware.
+         *
+         * If HPET is functional it is useful for calibrating TSC, but this can
+         * be done via PMTIMER as well which seems to be the last remaining
+         * timer on X86/INTEL platforms that has not been completely wreckaged
+         * by feature creep.
+         *
+         * In theory HPET support should be removed altogether, but there are
+         * older systems out there which depend on it because TSC and APIC timer
+         * are dysfunctional in deeper C-states.
          */
-        if ( pci_conf_read16(PCI_SBDF(0, 0, 0, 0),
-                             PCI_VENDOR_ID) == PCI_VENDOR_ID_INTEL &&
-             pci_conf_read16(PCI_SBDF(0, 0, 0, 0),
-                             PCI_DEVICE_ID) == 0x3ec4 )
-            hpet_address = 0;
+        if ( mwait_pc10_supported() )
+        {
+            uint64_t pcfg;
+
+            rdmsrl(MSR_PKG_CST_CONFIG_CONTROL, pcfg);
+            if ( (pcfg & 0xf) < 8 )
+                /* nothing */;
+            else if ( !strcmp(opt_clocksource, pts->id) )
+                printk("HPET use requested via command line, but dysfunctional in PC10\n");
+            else
+                hpet_address = 0;
+        }
 
         if ( !hpet_address )
             printk("Disabling HPET for being unreliable\n");
--- a/xen/arch/x86/cpu/mwait-idle.c
+++ b/xen/arch/x86/cpu/mwait-idle.c
@@ -1308,3 +1308,20 @@  int __init mwait_idle_init(struct notifi
 
 	return err;
 }
+
+/* Helper function for HPET. */
+bool __init mwait_pc10_supported(void)
+{
+	unsigned int ecx, edx, dummy;
+
+	if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL ||
+	    !cpu_has_monitor ||
+	    boot_cpu_data.cpuid_level < CPUID_MWAIT_LEAF)
+		return false;
+
+	cpuid(CPUID_MWAIT_LEAF, &dummy, &dummy, &ecx, &edx);
+
+	return (ecx & CPUID5_ECX_EXTENSIONS_SUPPORTED) &&
+	       (ecx & CPUID5_ECX_INTERRUPT_BREAK) &&
+	       (edx >> 28);
+}
--- a/xen/include/asm-x86/mwait.h
+++ b/xen/include/asm-x86/mwait.h
@@ -1,6 +1,8 @@ 
 #ifndef __ASM_X86_MWAIT_H__
 #define __ASM_X86_MWAIT_H__
 
+#include <xen/types.h>
+
 #define MWAIT_SUBSTATE_MASK		0xf
 #define MWAIT_CSTATE_MASK		0xf
 #define MWAIT_SUBSTATE_SIZE		4
@@ -12,5 +14,6 @@ 
 #define MWAIT_ECX_INTERRUPT_BREAK	0x1
 
 void mwait_idle_with_hints(unsigned int eax, unsigned int ecx);
+bool mwait_pc10_supported(void);
 
 #endif /* __ASM_X86_MWAIT_H__ */