Message ID | 20200515135802.63853-3-roger.pau@citrix.com (mailing list archive) |
---|---|
State | New, archived |
Series | x86/idle: fix for Intel ISR errata |
On 15.05.2020 15:58, Roger Pau Monne wrote:
> --- a/docs/misc/xen-command-line.pandoc
> +++ b/docs/misc/xen-command-line.pandoc
> @@ -652,6 +652,15 @@ Specify the size of the console debug trace buffer. By specifying `cpu:`
>  additionally a trace buffer of the specified size is allocated per cpu.
>  The debug trace feature is only enabled in debugging builds of Xen.
>
> +### disable-c6-errata

Hmm, yes please - a disable for errata! ;-)

How about "avoid-c6-errata", and then perhaps as a sub-option to
"cpuidle="? (If we really want a control for this in the first
place.)

> @@ -573,10 +574,40 @@ bool errata_c6_eoi_workaround(void)
>              INTEL_FAM6_MODEL(0x2f),
>              { }
>          };
> +        /*
> +         * Errata BDX99, CLX30, SKX100, CFW125, BDF104, BDH85, BDM135, KWB131:
> +         * A Pending Fixed Interrupt May Be Dispatched Before an Interrupt of
> +         * The Same Priority Completes.
> +         *
> +         * Resuming from C6 Sleep-State, with Fixed Interrupts of the same
> +         * priority queued (in the corresponding bits of the IRR and ISR APIC
> +         * registers), the processor may dispatch the second interrupt (from
> +         * the IRR bit) before the first interrupt has completed and written to
> +         * the EOI register, causing the first interrupt to never complete.
> +         */
> +        const static struct x86_cpu_id isr_errata[] = {

Same nit as for patch 1 here.

Jan
On Mon, May 18, 2020 at 05:05:12PM +0200, Jan Beulich wrote:
> On 15.05.2020 15:58, Roger Pau Monne wrote:
> > --- a/docs/misc/xen-command-line.pandoc
> > +++ b/docs/misc/xen-command-line.pandoc
> > @@ -652,6 +652,15 @@ Specify the size of the console debug trace buffer. By specifying `cpu:`
> >  additionally a trace buffer of the specified size is allocated per cpu.
> >  The debug trace feature is only enabled in debugging builds of Xen.
> >
> > +### disable-c6-errata
>
> Hmm, yes please - a disable for errata! ;-)
>
> How about "avoid-c6-errata", and then perhaps as a sub-option to
> "cpuidle="? (If we really want a control for this in the first
> place.)

Right, I see I'm very bad at naming. Not sure it's even worth it
maybe?

I can remove it completely from the patch if that is OK.

Thanks, Roger.
On 18.05.2020 17:45, Roger Pau Monné wrote:
> On Mon, May 18, 2020 at 05:05:12PM +0200, Jan Beulich wrote:
>> On 15.05.2020 15:58, Roger Pau Monne wrote:
>>> --- a/docs/misc/xen-command-line.pandoc
>>> +++ b/docs/misc/xen-command-line.pandoc
>>> @@ -652,6 +652,15 @@ Specify the size of the console debug trace buffer. By specifying `cpu:`
>>>  additionally a trace buffer of the specified size is allocated per cpu.
>>>  The debug trace feature is only enabled in debugging builds of Xen.
>>>
>>> +### disable-c6-errata
>>
>> Hmm, yes please - a disable for errata! ;-)
>>
>> How about "avoid-c6-errata", and then perhaps as a sub-option to
>> "cpuidle="? (If we really want a control for this in the first
>> place.)
>
> Right, I see I'm very bad at naming. Not sure it's even worth it
> maybe?
>
> I can remove it completely from the patch if that is OK.

I'd be fine without. Andrew?

Jan
On 18/05/2020 16:47, Jan Beulich wrote:
> On 18.05.2020 17:45, Roger Pau Monné wrote:
>> On Mon, May 18, 2020 at 05:05:12PM +0200, Jan Beulich wrote:
>>> On 15.05.2020 15:58, Roger Pau Monne wrote:
>>>> --- a/docs/misc/xen-command-line.pandoc
>>>> +++ b/docs/misc/xen-command-line.pandoc
>>>> @@ -652,6 +652,15 @@ Specify the size of the console debug trace buffer. By specifying `cpu:`
>>>>  additionally a trace buffer of the specified size is allocated per cpu.
>>>>  The debug trace feature is only enabled in debugging builds of Xen.
>>>>
>>>> +### disable-c6-errata
>>> Hmm, yes please - a disable for errata! ;-)
>>>
>>> How about "avoid-c6-errata", and then perhaps as a sub-option to
>>> "cpuidle="? (If we really want a control for this in the first
>>> place.)
>> Right, I see I'm very bad at naming. Not sure it's even worth it
>> maybe?
>>
>> I can remove it completely from the patch if that is OK.
> I'd be fine without. Andrew?

Yeah - the only thing people can do with this is shoot themselves in
the foot.

There's frankly no need to give them the option in the first place.

~Andrew
On 15/05/2020 14:58, Roger Pau Monne wrote:
> Apply a workaround for Intel errata BDX99, CLX30, SKX100, CFW125,
> BDF104, BDH85, BDM135, KWB131: "A Pending Fixed Interrupt May Be
> Dispatched Before an Interrupt of The Same Priority Completes".

HSM175 et al, so presumably a HSD, and HSE as well.

On the broadwell side at least, BDD BDW in addition

~Andrew
On Wed, May 20, 2020 at 10:30:11PM +0100, Andrew Cooper wrote:
> On 15/05/2020 14:58, Roger Pau Monne wrote:
> > Apply a workaround for Intel errata BDX99, CLX30, SKX100, CFW125,
> > BDF104, BDH85, BDM135, KWB131: "A Pending Fixed Interrupt May Be
> > Dispatched Before an Interrupt of The Same Priority Completes".
>
> HSM175 et al, so presumably a HSD, and HSE as well.
>
> On the broadwell side at least, BDD BDW in addition

But those are a different errata AFAICT ('An APIC Timer Interrupt
During Core C6 Entry May be Lost') and the workaround should also be
different I think. We should mark the lapic timer as not reliable on
C6 or higher states in lapic_timer_reliable_states, so that it's
disabled before entering sleep?

Thanks, Roger.
On 21/05/2020 09:45, Roger Pau Monné wrote:
> On Wed, May 20, 2020 at 10:30:11PM +0100, Andrew Cooper wrote:
>> On 15/05/2020 14:58, Roger Pau Monne wrote:
>>> Apply a workaround for Intel errata BDX99, CLX30, SKX100, CFW125,
>>> BDF104, BDH85, BDM135, KWB131: "A Pending Fixed Interrupt May Be
>>> Dispatched Before an Interrupt of The Same Priority Completes".
>> HSM175 et al, so presumably a HSD, and HSE as well.
>>
>> On the broadwell side at least, BDD BDW in addition
> But those are a different errata AFAICT ('An APIC Timer Interrupt
> During Core C6 Entry May be Lost') and the workaround should also be
> different I think.

Hmm, so it is.

The issue in question here definitely does affect Haswell, because
that is where we first observed it. There was also a report on
xen-devel against Haswell.

If the errata are missing, then I think Intel needs some more chasing
to work out the real extent of the problems.

> We should mark the lapic timer as not reliable on
> C6 or higher states in lapic_timer_reliable_states, so that it's
> disabled before entering sleep?

Probably should.

~Andrew
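Roger's suggestion for the separate APIC timer erratum could look roughly like the sketch below. It is only a model, not Xen code: it assumes the reliability information is a bitmask with one bit per idle-state index (as the name lapic_timer_reliable_states suggests), and the helper names and the index chosen for C6 are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model: bit N set means the local APIC timer is trusted to
 * keep working while the CPU sits in idle-state index N.
 */
#define TIMER_ALWAYS_RELIABLE UINT32_MAX

/* Clear the "reliable" bit for state first_bad and every deeper state. */
static uint32_t mark_unreliable_from(uint32_t mask, unsigned int first_bad)
{
    if (first_bad >= 32)
        return mask; /* nothing representable to clear */

    return mask & ((UINT32_C(1) << first_bad) - 1);
}

/* True if the timer may be left running when entering the given state. */
static bool timer_reliable_in(uint32_t mask, unsigned int state)
{
    return state < 32 && (mask & (UINT32_C(1) << state)) != 0;
}

/* Usage: treat index 3 as the first affected (C6-like) state. */
void lapic_sketch_self_check(void)
{
    uint32_t mask = mark_unreliable_from(TIMER_ALWAYS_RELIABLE, 3);

    assert(timer_reliable_in(mask, 2));  /* shallower state: keep the timer */
    assert(!timer_reliable_in(mask, 3)); /* affected state: switch it off */
}
```

With such a mask, the idle driver would stop relying on the local APIC timer before entering an affected state and use another wakeup source, which is the effect Roger describes.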
diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
index ee12b0f53f..8dd944b357 100644
--- a/docs/misc/xen-command-line.pandoc
+++ b/docs/misc/xen-command-line.pandoc
@@ -652,6 +652,15 @@ Specify the size of the console debug trace buffer. By specifying `cpu:`
 additionally a trace buffer of the specified size is allocated per cpu.
 The debug trace feature is only enabled in debugging builds of Xen.
 
+### disable-c6-errata
+> `= <boolean>`
+
+> Default: `true for affected Intel CPUs`
+
+Workaround for Intel errata AAJ72 and BDX99, CLX30, SKX100, CFW125, BDF104,
+BDH85, BDM135, KWB131. Prevent entering C6 idle states when certain conditions
+are meet in order to avoid triggering the listed erratas.
+
 ### dma_bits
 > `= <integer>`
 
diff --git a/xen/arch/x86/acpi/cpu_idle.c b/xen/arch/x86/acpi/cpu_idle.c
index 0efdaff21b..2fa1ccc031 100644
--- a/xen/arch/x86/acpi/cpu_idle.c
+++ b/xen/arch/x86/acpi/cpu_idle.c
@@ -548,9 +548,10 @@ void trace_exit_reason(u32 *irq_traced)
     }
 }
 
-bool errata_c6_eoi_workaround(void)
+bool errata_c6_workaround(void)
 {
     static int8_t __read_mostly fix_needed = -1;
+    boolean_param("disable-c6-errata", fix_needed);
 
     if ( unlikely(fix_needed == -1) )
     {
@@ -573,10 +574,40 @@ bool errata_c6_eoi_workaround(void)
             INTEL_FAM6_MODEL(0x2f),
             { }
         };
+        /*
+         * Errata BDX99, CLX30, SKX100, CFW125, BDF104, BDH85, BDM135, KWB131:
+         * A Pending Fixed Interrupt May Be Dispatched Before an Interrupt of
+         * The Same Priority Completes.
+         *
+         * Resuming from C6 Sleep-State, with Fixed Interrupts of the same
+         * priority queued (in the corresponding bits of the IRR and ISR APIC
+         * registers), the processor may dispatch the second interrupt (from
+         * the IRR bit) before the first interrupt has completed and written to
+         * the EOI register, causing the first interrupt to never complete.
+         */
+        const static struct x86_cpu_id isr_errata[] = {
+            /* Broadwell */
+            INTEL_FAM6_MODEL(0x47),
+            INTEL_FAM6_MODEL(0x3d),
+            INTEL_FAM6_MODEL(0x4f),
+            INTEL_FAM6_MODEL(0x56),
+            /* Skylake (client) */
+            INTEL_FAM6_MODEL(0x5e),
+            INTEL_FAM6_MODEL(0x4e),
+            /* {Sky/Cascade}lake (server) */
+            INTEL_FAM6_MODEL(0x55),
+            /* {Kaby/Coffee/Whiskey/Amber} Lake */
+            INTEL_FAM6_MODEL(0x9e),
+            INTEL_FAM6_MODEL(0x8e),
+            /* Cannon Lake */
+            INTEL_FAM6_MODEL(0x66),
+            { }
+        };
 #undef INTEL_FAM6_MODEL
 
-        fix_needed = cpu_has_apic && !directed_eoi_enabled &&
-                     x86_match_cpu(eoi_errata);
+        fix_needed = cpu_has_apic &&
+                     ((!directed_eoi_enabled && x86_match_cpu(eoi_errata)) ||
+                      x86_match_cpu(isr_errata));
     }
 
     return (fix_needed && cpu_has_pending_apic_eoi());
@@ -685,7 +716,7 @@ static void acpi_processor_idle(void)
         return;
     }
 
-    if ( (cx->type >= ACPI_STATE_C3) && errata_c6_eoi_workaround() )
+    if ( (cx->type >= ACPI_STATE_C3) && errata_c6_workaround() )
         cx = power->safe_state;
 
 
diff --git a/xen/arch/x86/cpu/mwait-idle.c b/xen/arch/x86/cpu/mwait-idle.c
index 88a3e160c5..52eab81bf8 100644
--- a/xen/arch/x86/cpu/mwait-idle.c
+++ b/xen/arch/x86/cpu/mwait-idle.c
@@ -770,7 +770,7 @@ static void mwait_idle(void)
 		return;
 	}
 
-	if ((cx->type >= 3) && errata_c6_eoi_workaround())
+	if ((cx->type >= 3) && errata_c6_workaround())
 		cx = power->safe_state;
 
 	eax = cx->address;
diff --git a/xen/include/asm-x86/cpuidle.h b/xen/include/asm-x86/cpuidle.h
index 13879f58a1..dc7298a538 100644
--- a/xen/include/asm-x86/cpuidle.h
+++ b/xen/include/asm-x86/cpuidle.h
@@ -26,5 +26,5 @@ void update_idle_stats(struct acpi_processor_power *,
 void update_last_cx_stat(struct acpi_processor_power *,
                          struct acpi_processor_cx *, uint64_t);
 
-bool errata_c6_eoi_workaround(void);
+bool errata_c6_workaround(void);
 #endif /* __X86_ASM_CPUIDLE_H__ */
Apply a workaround for Intel errata BDX99, CLX30, SKX100, CFW125,
BDF104, BDH85, BDM135, KWB131: "A Pending Fixed Interrupt May Be
Dispatched Before an Interrupt of The Same Priority Completes".

Apply the errata to all server and client models (big cores) from
Broadwell to Cascade Lake. The workaround is grouped together with the
existing fix for errata AAJ72, and the eoi from the function name is
removed.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Changes since v2:
 - Use x86_match_cpu and apply the workaround to all models from
   Broadwell to Cascade Lake.
 - Rename command line option to disable-c6-errata.

Changes since v1:
 - Unify workaround with errata_c6_eoi_workaround.
 - Properly check state in both acpi and mwait drivers.
---
 docs/misc/xen-command-line.pandoc |  9 +++++++
 xen/arch/x86/acpi/cpu_idle.c      | 39 +++++++++++++++++++++++++++----
 xen/arch/x86/cpu/mwait-idle.c     |  2 +-
 xen/include/asm-x86/cpuidle.h     |  2 +-
 4 files changed, 46 insertions(+), 6 deletions(-)
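For readers who want to check the combined condition outside of Xen, the decision made by errata_c6_workaround() can be modelled in a few lines of standalone C. This is a sketch under stated assumptions, not the patch itself: the tables are plain family 6 model numbers rather than struct x86_cpu_id, the eoi_errata table holds only the one entry visible in the diff context (earlier entries fall outside the hunk), and model_listed(), c6_fix_needed() and must_avoid_c6() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the x86_cpu_id tables: family 6 models only. */
static const unsigned int eoi_errata[] = {
    0x2f, /* only entry visible in the diff context */
};
static const unsigned int isr_errata[] = {
    0x47, 0x3d, 0x4f, 0x56, /* Broadwell */
    0x5e, 0x4e,             /* Skylake (client) */
    0x55,                   /* {Sky/Cascade}lake (server) */
    0x9e, 0x8e,             /* {Kaby/Coffee/Whiskey/Amber} Lake */
    0x66,                   /* Cannon Lake */
};

#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Linear search, standing in for x86_match_cpu(). */
static bool model_listed(const unsigned int *tbl, size_t n, unsigned int model)
{
    for (size_t i = 0; i < n; i++)
        if (tbl[i] == model)
            return true;
    return false;
}

/* Mirrors the new fix_needed expression from the patch. */
static bool c6_fix_needed(bool has_apic, bool directed_eoi, unsigned int model)
{
    return has_apic &&
           ((!directed_eoi &&
             model_listed(eoi_errata, ARRAY_LEN(eoi_errata), model)) ||
            model_listed(isr_errata, ARRAY_LEN(isr_errata), model));
}

/* Mirrors the return statement: fall back only while an EOI is pending. */
static bool must_avoid_c6(bool fix_needed, bool pending_apic_eoi)
{
    return fix_needed && pending_apic_eoi;
}

/* Usage: a few representative cases. */
void c6_sketch_self_check(void)
{
    assert(c6_fix_needed(true, true, 0x55));   /* ISR erratum ignores directed EOI */
    assert(!c6_fix_needed(true, true, 0x2f));  /* EOI erratum masked by directed EOI */
    assert(c6_fix_needed(true, false, 0x2f));
    assert(!must_avoid_c6(true, false));       /* nothing in service: C6 is fine */
}
```

The point the model makes explicit is that directed EOI only exempts the older AAJ72-style case; CPUs in the isr_errata list take the safe state whenever an interrupt is still in service, regardless of that setting.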