
[v3,2/2] x86/idle: prevent entering C6 with in service interrupts on Intel

Message ID 20200515135802.63853-3-roger.pau@citrix.com
State New, archived
Series x86/idle: fix for Intel ISR errata

Commit Message

Roger Pau Monne May 15, 2020, 1:58 p.m. UTC
Apply a workaround for Intel errata BDX99, CLX30, SKX100, CFW125,
BDF104, BDH85, BDM135, KWB131: "A Pending Fixed Interrupt May Be
Dispatched Before an Interrupt of The Same Priority Completes".

Apply the workaround to all server and client models (big cores) from
Broadwell to Cascade Lake. The workaround is grouped together with the
existing fix for erratum AAJ72, and the "eoi" part of the function name
is dropped.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Changes since v2:
 - Use x86_match_cpu and apply the workaround to all models from
   Broadwell to Cascade Lake.
 - Rename command line option to disable-c6-errata.

Changes since v1:
 - Unify workaround with errata_c6_eoi_workaround.
 - Properly check state in both acpi and mwait drivers.
---
 docs/misc/xen-command-line.pandoc |  9 +++++++
 xen/arch/x86/acpi/cpu_idle.c      | 39 +++++++++++++++++++++++++++----
 xen/arch/x86/cpu/mwait-idle.c     |  2 +-
 xen/include/asm-x86/cpuidle.h     |  2 +-
 4 files changed, 46 insertions(+), 6 deletions(-)
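The condition that the reworked errata_c6_workaround() evaluates can be modelled in isolation. This is a simplified, hypothetical sketch: struct cpu_info, model_in_list(), c6_fix_needed() and c6_workaround_active() are not Xen APIs, and the real checks (cpu_has_apic, directed_eoi_enabled, x86_match_cpu(), cpu_has_pending_apic_eoi()) become plain booleans. Only the single AAJ72 model visible in the diff context (0x2f) is listed, because the rest of that table is not shown in this patch. Unlike Xen, the sketch recomputes the result on every call instead of caching it in a static.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the CPU properties the real code queries. */
struct cpu_info {
    bool vendor_intel;
    unsigned int family;
    unsigned int model;
    bool has_apic;
    bool directed_eoi_enabled;
};

/* Only the AAJ72 entry visible in the diff context; the full list is elided. */
static const unsigned int eoi_errata_models[] = { 0x2f };

/* The isr_errata table added by this patch. */
static const unsigned int isr_errata_models[] = {
    0x47, 0x3d, 0x4f, 0x56, /* Broadwell */
    0x5e, 0x4e,             /* Skylake (client) */
    0x55,                   /* {Sky/Cascade}lake (server) */
    0x9e, 0x8e,             /* {Kaby/Coffee/Whiskey/Amber} Lake */
    0x66,                   /* Cannon Lake */
};

#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Rough equivalent of x86_match_cpu() with INTEL_FAM6_MODEL() entries. */
bool model_in_list(const struct cpu_info *c, const unsigned int *list, size_t n)
{
    assert(c != NULL && list != NULL);
    if (!c->vendor_intel || c->family != 6)
        return false;
    for (size_t i = 0; i < n; i++)
        if (list[i] == c->model)
            return true;
    return false;
}

/* cpu_has_apic && ((!directed_eoi_enabled && eoi match) || isr match) */
bool c6_fix_needed(const struct cpu_info *c)
{
    return c->has_apic &&
           ((!c->directed_eoi_enabled &&
             model_in_list(c, eoi_errata_models, ARRAY_LEN(eoi_errata_models))) ||
            model_in_list(c, isr_errata_models, ARRAY_LEN(isr_errata_models)));
}

/* The workaround only kicks in while an interrupt is still awaiting EOI. */
bool c6_workaround_active(const struct cpu_info *c, bool pending_eoi)
{
    return c6_fix_needed(c) && pending_eoi;
}
```

For example, a model 0x55 part needs the fix whether or not directed EOI is enabled, whereas model 0x2f only needs it when directed EOI is disabled.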

Comments

Jan Beulich May 18, 2020, 3:05 p.m. UTC | #1
On 15.05.2020 15:58, Roger Pau Monne wrote:
> --- a/docs/misc/xen-command-line.pandoc
> +++ b/docs/misc/xen-command-line.pandoc
> @@ -652,6 +652,15 @@ Specify the size of the console debug trace buffer. By specifying `cpu:`
>  additionally a trace buffer of the specified size is allocated per cpu.
>  The debug trace feature is only enabled in debugging builds of Xen.
>  
> +### disable-c6-errata

Hmm, yes please - a disable for errata! ;-)

How about "avoid-c6-errata", and then perhaps as a sub-option to
"cpuidle="? (If we really want a control for this in the first
place.)

> @@ -573,10 +574,40 @@ bool errata_c6_eoi_workaround(void)
>              INTEL_FAM6_MODEL(0x2f),
>              { }
>          };
> +        /*
> +         * Errata BDX99, CLX30, SKX100, CFW125, BDF104, BDH85, BDM135, KWB131:
> +         * A Pending Fixed Interrupt May Be Dispatched Before an Interrupt of
> +         * The Same Priority Completes.
> +         *
> +         * Resuming from C6 Sleep-State, with Fixed Interrupts of the same
> +         * priority queued (in the corresponding bits of the IRR and ISR APIC
> +         * registers), the processor may dispatch the second interrupt (from
> +         * the IRR bit) before the first interrupt has completed and written to
> +         * the EOI register, causing the first interrupt to never complete.
> +         */
> +        const static struct x86_cpu_id isr_errata[] = {

Same nit as for patch 1 here.

Jan
Roger Pau Monne May 18, 2020, 3:45 p.m. UTC | #2
On Mon, May 18, 2020 at 05:05:12PM +0200, Jan Beulich wrote:
> On 15.05.2020 15:58, Roger Pau Monne wrote:
> > --- a/docs/misc/xen-command-line.pandoc
> > +++ b/docs/misc/xen-command-line.pandoc
> > @@ -652,6 +652,15 @@ Specify the size of the console debug trace buffer. By specifying `cpu:`
> >  additionally a trace buffer of the specified size is allocated per cpu.
> >  The debug trace feature is only enabled in debugging builds of Xen.
> >  
> > +### disable-c6-errata
> 
> Hmm, yes please - a disable for errata! ;-)
> 
> How about "avoid-c6-errata", and then perhaps as a sub-option to
> "cpuidle="? (If we really want a control for this in the first
> place.)

Right, I see I'm very bad at naming. Not sure it's even worth it
maybe?

I can remove it completely from the patch if that is OK.

Thanks, Roger.
Jan Beulich May 18, 2020, 3:47 p.m. UTC | #3
On 18.05.2020 17:45, Roger Pau Monné wrote:
> On Mon, May 18, 2020 at 05:05:12PM +0200, Jan Beulich wrote:
>> On 15.05.2020 15:58, Roger Pau Monne wrote:
>>> --- a/docs/misc/xen-command-line.pandoc
>>> +++ b/docs/misc/xen-command-line.pandoc
>>> @@ -652,6 +652,15 @@ Specify the size of the console debug trace buffer. By specifying `cpu:`
>>>  additionally a trace buffer of the specified size is allocated per cpu.
>>>  The debug trace feature is only enabled in debugging builds of Xen.
>>>  
>>> +### disable-c6-errata
>>
>> Hmm, yes please - a disable for errata! ;-)
>>
>> How about "avoid-c6-errata", and then perhaps as a sub-option to
>> "cpuidle="? (If we really want a control for this in the first
>> place.)
> 
> Right, I see I'm very bad at naming. Not sure it's even worth it
> maybe?
> 
> I can remove it completely from the patch if that is OK.

I'd be fine without. Andrew?

Jan
Andrew Cooper May 20, 2020, 6:38 p.m. UTC | #4
On 18/05/2020 16:47, Jan Beulich wrote:
> On 18.05.2020 17:45, Roger Pau Monné wrote:
>> On Mon, May 18, 2020 at 05:05:12PM +0200, Jan Beulich wrote:
>>> On 15.05.2020 15:58, Roger Pau Monne wrote:
>>>> --- a/docs/misc/xen-command-line.pandoc
>>>> +++ b/docs/misc/xen-command-line.pandoc
>>>> @@ -652,6 +652,15 @@ Specify the size of the console debug trace buffer. By specifying `cpu:`
>>>>  additionally a trace buffer of the specified size is allocated per cpu.
>>>>  The debug trace feature is only enabled in debugging builds of Xen.
>>>>  
>>>> +### disable-c6-errata
>>> Hmm, yes please - a disable for errata! ;-)
>>>
>>> How about "avoid-c6-errata", and then perhaps as a sub-option to
>>> "cpuidle="? (If we really want a control for this in the first
>>> place.)
>> Right, I see I'm very bad at naming. Not sure it's even worth it
>> maybe?
>>
>> I can remove it completely from the patch if that is OK.
> I'd be fine without. Andrew?

Yeah - the only thing people can do with this is shoot themselves in the
foot.

There's frankly no need to give them the option in the first place.

~Andrew
Andrew Cooper May 20, 2020, 9:30 p.m. UTC | #5
On 15/05/2020 14:58, Roger Pau Monne wrote:
> Apply a workaround for Intel errata BDX99, CLX30, SKX100, CFW125,
> BDF104, BDH85, BDM135, KWB131: "A Pending Fixed Interrupt May Be
> Dispatched Before an Interrupt of The Same Priority Completes".

HSM175 et al., so presumably HSD and HSE as well.

On the Broadwell side at least, BDD and BDW in addition.

~Andrew
Roger Pau Monne May 21, 2020, 8:45 a.m. UTC | #6
On Wed, May 20, 2020 at 10:30:11PM +0100, Andrew Cooper wrote:
> On 15/05/2020 14:58, Roger Pau Monne wrote:
> > Apply a workaround for Intel errata BDX99, CLX30, SKX100, CFW125,
> > BDF104, BDH85, BDM135, KWB131: "A Pending Fixed Interrupt May Be
> > Dispatched Before an Interrupt of The Same Priority Completes".
> 
> HSM175 et al, so presumably a HSD, and HSE as well.
> 
> On the broadwell side at least, BDD BDW in addition

But those are a different errata AFAICT ('An APIC Timer Interrupt
During Core C6 Entry May be Lost') and the workaround should also be
different I think. We should mark the lapic timer as not reliable on
C6 or higher states in lapic_timer_reliable_states, so that it's
disabled before entering sleep?

Thanks, Roger.
Andrew Cooper May 21, 2020, 4:27 p.m. UTC | #7
On 21/05/2020 09:45, Roger Pau Monné wrote:
> On Wed, May 20, 2020 at 10:30:11PM +0100, Andrew Cooper wrote:
>> On 15/05/2020 14:58, Roger Pau Monne wrote:
>>> Apply a workaround for Intel errata BDX99, CLX30, SKX100, CFW125,
>>> BDF104, BDH85, BDM135, KWB131: "A Pending Fixed Interrupt May Be
>>> Dispatched Before an Interrupt of The Same Priority Completes".
>> HSM175 et al, so presumably a HSD, and HSE as well.
>>
>> On the broadwell side at least, BDD BDW in addition
> But those are a different errata AFAICT ('An APIC Timer Interrupt
> During Core C6 Entry May be Lost') and the workaround should also be
> different I think.

Hmm, so it is.

The issue in question here definitely does affect Haswell, because that
is where we first observed it.  There was also a report on xen-devel
against Haswell.

If the errata are missing, then I think Intel needs some more chasing to
work out the real extent of the problems.

> We should mark the lapic timer as not reliable on
> C6 or higher states in lapic_timer_reliable_states, so that it's
> disabled before entering sleep?

Probably should.

~Andrew
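Roger's suggestion of treating the local APIC timer as unreliable in C6 and deeper can be sketched as a bitmask operation. This assumes, as in Linux's intel_idle (from which mwait-idle.c is derived), that lapic_timer_reliable_states holds one bit per C-state and that CPUs with an always-running APIC timer (ARAT) start with every bit set. The helpers model_cpu_has_arat(), mark_unreliable_from() and lapic_timer_reliable() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* All bits set: the timer keeps running in every C-state (e.g. ARAT CPUs). */
#define LAPIC_TIMER_ALWAYS_RELIABLE 0xffffffffu

/* Bit N set means the local APIC timer is reliable in C-state N; default is C1 only. */
static unsigned int lapic_timer_reliable_states = 1u << 1;

/* Hypothetical: model a CPU advertising an always-running APIC timer. */
void model_cpu_has_arat(void)
{
    lapic_timer_reliable_states = LAPIC_TIMER_ALWAYS_RELIABLE;
}

/* Hypothetical: clear the "reliable" bit for first_cstate and every deeper state. */
void mark_unreliable_from(unsigned int first_cstate)
{
    assert(first_cstate < 32);
    /* Keep only bits 0 .. first_cstate - 1. */
    lapic_timer_reliable_states &= (1u << first_cstate) - 1u;
}

/*
 * When this returns false, the idle loop would stop the local APIC timer
 * before entering the state and rely on a fallback (broadcast) timer.
 */
bool lapic_timer_reliable(unsigned int cstate)
{
    assert(cstate < 32);
    return (lapic_timer_reliable_states & (1u << cstate)) != 0;
}
```

On an ARAT-capable part, mark_unreliable_from(6) leaves C1 to C5 marked reliable and forces C6 and deeper onto the fallback path.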

Patch

diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
index ee12b0f53f..8dd944b357 100644
--- a/docs/misc/xen-command-line.pandoc
+++ b/docs/misc/xen-command-line.pandoc
@@ -652,6 +652,15 @@  Specify the size of the console debug trace buffer. By specifying `cpu:`
 additionally a trace buffer of the specified size is allocated per cpu.
 The debug trace feature is only enabled in debugging builds of Xen.
 
+### disable-c6-errata
+> `= <boolean>`
+
+> Default: `true for affected Intel CPUs`
+
+Workaround for Intel errata AAJ72 and BDX99, CLX30, SKX100, CFW125, BDF104,
+BDH85, BDM135, KWB131. Prevents entering C6 idle states when certain conditions
+are met, in order to avoid triggering the listed errata.
+
 ### dma_bits
 > `= <integer>`
 
diff --git a/xen/arch/x86/acpi/cpu_idle.c b/xen/arch/x86/acpi/cpu_idle.c
index 0efdaff21b..2fa1ccc031 100644
--- a/xen/arch/x86/acpi/cpu_idle.c
+++ b/xen/arch/x86/acpi/cpu_idle.c
@@ -548,9 +548,10 @@  void trace_exit_reason(u32 *irq_traced)
     }
 }
 
-bool errata_c6_eoi_workaround(void)
+bool errata_c6_workaround(void)
 {
     static int8_t __read_mostly fix_needed = -1;
+    boolean_param("disable-c6-errata", fix_needed);
 
     if ( unlikely(fix_needed == -1) )
     {
@@ -573,10 +574,40 @@  bool errata_c6_eoi_workaround(void)
             INTEL_FAM6_MODEL(0x2f),
             { }
         };
+        /*
+         * Errata BDX99, CLX30, SKX100, CFW125, BDF104, BDH85, BDM135, KWB131:
+         * A Pending Fixed Interrupt May Be Dispatched Before an Interrupt of
+         * The Same Priority Completes.
+         *
+         * Resuming from C6 Sleep-State, with Fixed Interrupts of the same
+         * priority queued (in the corresponding bits of the IRR and ISR APIC
+         * registers), the processor may dispatch the second interrupt (from
+         * the IRR bit) before the first interrupt has completed and written to
+         * the EOI register, causing the first interrupt to never complete.
+         */
+        const static struct x86_cpu_id isr_errata[] = {
+            /* Broadwell */
+            INTEL_FAM6_MODEL(0x47),
+            INTEL_FAM6_MODEL(0x3d),
+            INTEL_FAM6_MODEL(0x4f),
+            INTEL_FAM6_MODEL(0x56),
+            /* Skylake (client) */
+            INTEL_FAM6_MODEL(0x5e),
+            INTEL_FAM6_MODEL(0x4e),
+            /* {Sky/Cascade}lake (server) */
+            INTEL_FAM6_MODEL(0x55),
+            /* {Kaby/Coffee/Whiskey/Amber} Lake */
+            INTEL_FAM6_MODEL(0x9e),
+            INTEL_FAM6_MODEL(0x8e),
+            /* Cannon Lake */
+            INTEL_FAM6_MODEL(0x66),
+            { }
+        };
 #undef INTEL_FAM6_MODEL
 
-        fix_needed = cpu_has_apic && !directed_eoi_enabled &&
-                     x86_match_cpu(eoi_errata);
+        fix_needed = cpu_has_apic &&
+                     ((!directed_eoi_enabled && x86_match_cpu(eoi_errata)) ||
+                      x86_match_cpu(isr_errata));
     }
 
     return (fix_needed && cpu_has_pending_apic_eoi());
@@ -685,7 +716,7 @@  static void acpi_processor_idle(void)
         return;
     }
 
-    if ( (cx->type >= ACPI_STATE_C3) && errata_c6_eoi_workaround() )
+    if ( (cx->type >= ACPI_STATE_C3) && errata_c6_workaround() )
         cx = power->safe_state;
 
 
diff --git a/xen/arch/x86/cpu/mwait-idle.c b/xen/arch/x86/cpu/mwait-idle.c
index 88a3e160c5..52eab81bf8 100644
--- a/xen/arch/x86/cpu/mwait-idle.c
+++ b/xen/arch/x86/cpu/mwait-idle.c
@@ -770,7 +770,7 @@  static void mwait_idle(void)
 		return;
 	}
 
-	if ((cx->type >= 3) && errata_c6_eoi_workaround())
+	if ((cx->type >= 3) && errata_c6_workaround())
 		cx = power->safe_state;
 
 	eax = cx->address;
diff --git a/xen/include/asm-x86/cpuidle.h b/xen/include/asm-x86/cpuidle.h
index 13879f58a1..dc7298a538 100644
--- a/xen/include/asm-x86/cpuidle.h
+++ b/xen/include/asm-x86/cpuidle.h
@@ -26,5 +26,5 @@  void update_idle_stats(struct acpi_processor_power *,
 void update_last_cx_stat(struct acpi_processor_power *,
                          struct acpi_processor_cx *, uint64_t);
 
-bool errata_c6_eoi_workaround(void);
+bool errata_c6_workaround(void);
 #endif /* __X86_ASM_CPUIDLE_H__ */
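Both idle drivers apply the same demotion step: when the selected state's type is 3 or greater (ACPI_STATE_C3 in the ACPI driver, a literal 3 in mwait-idle.c) and the workaround applies, the safe state is used instead. The sketch below models that step with a hypothetical struct cx_state and pick_state() helper. Unlike the patch, it takes the workaround result as a precomputed flag, whereas the real code only calls errata_c6_workaround() for deep enough states thanks to `&&` short-circuiting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct acpi_processor_cx. */
struct cx_state {
    unsigned int type; /* compared against 3, as in both idle drivers */
    const char *name;
};

/*
 * Return the state to actually enter: demote deep states to safe_state
 * while the erratum workaround is required, otherwise keep the choice.
 */
const struct cx_state *pick_state(const struct cx_state *cx,
                                  const struct cx_state *safe_state,
                                  bool workaround_needed)
{
    assert(cx != NULL && safe_state != NULL);
    if (cx->type >= 3 && workaround_needed)
        return safe_state;
    return cx;
}
```

Shallow states are never demoted, so the workaround only costs power while an interrupt is still awaiting EOI on an affected CPU.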