diff mbox

[1/4] KVM: svm: prevent MWAIT in guest with erratum 400

Message ID 20170503193733.13409-2-rkrcmar@redhat.com (mailing list archive)
State New, archived
Headers show

Commit Message

Radim Krčmář May 3, 2017, 7:37 p.m. UTC
The host might miss APIC timer interrupts if the guest enters a specific
C-state.  Quoting the erratum:

  400 APIC Timer Interrupt Does Not Occur in Processor C-States

  Description

  An APIC timer interrupt that becomes pending in low-power states C1E
  or C3 will not cause the processor to enter the C0 state even if the
  interrupt is enabled by Timer Local Vector Table Entry[Mask],
  APIC320[16]). APIC timer functionality is otherwise unaffected.

  Potential Effect on System

  System hang may occur provided that the operating system has not
  configured another interrupt source.  APIC timer interrupts may be
  delayed or, when the APIC timer is configured in rollover mode
  (APIC320[17]), the APIC timer may roll over multiple times in the
  low-power state with only one interrupt presented after the processor
  resumes. The standard use of the APIC timer does not make this effect
  significant.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
---
 arch/x86/kvm/x86.h | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

Comments

Borislav Petkov May 3, 2017, 8:11 p.m. UTC | #1
On Wed, May 03, 2017 at 09:37:30PM +0200, Radim Krčmář wrote:
> The host might miss APIC timer interrupts if the guest enters a specific
> C-state.  Quoting the erratum:
> 
>   400 APIC Timer Interrupt Does Not Occur in Processor C-States
> 
>   Description
> 
>   An APIC timer interrupt that becomes pending in low-power states C1E
>   or C3 will not cause the processor to enter the C0 state even if the
>   interrupt is enabled by Timer Local Vector Table Entry[Mask],
>   APIC320[16]). APIC timer functionality is otherwise unaffected.
> 
>   Potential Effect on System
> 
>   System hang may occur provided that the operating system has not
>   configured another interrupt source.  APIC timer interrupts may be
>   delayed or, when the APIC timer is configured in rollover mode
>   (APIC320[17]), the APIC timer may roll over multiple times in the
>   low-power state with only one interrupt presented after the processor
>   resumes. The standard use of the APIC timer does not make this effect
>   significant.
> 
> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
> ---
>  arch/x86/kvm/x86.h | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
> index 612067074905..3ed7dd8737ab 100644
> --- a/arch/x86/kvm/x86.h
> +++ b/arch/x86/kvm/x86.h
> @@ -223,8 +223,7 @@ static inline bool kvm_mwait_in_guest(void)
>  
>  	switch (boot_cpu_data.x86_vendor) {
>  	case X86_VENDOR_AMD:
> -		/* All AMD CPUs have a working MWAIT implementation */
> -		return true;
> +		return !boot_cpu_has_bug(X86_BUG_AMD_E400);

Well, this looks wrong: it is X86_BUG_AMD_APIC_C1E, which actually
denotes that we must enable the E400 workaround because the platform
actually goes into C1E.

X86_BUG_AMD_E400 gets set only on the affected f/m/s range but if the
BIOS doesn't put the CPU in C1E, we don't hit the erratum and all is
peachy.

Also, what do APIC timer interrupts even have to do with MWAIT-ing in
the guest, especially if we enable the workaround and switch to HPET on
the host? Maybe I'm missing something here...
Radim Krčmář May 4, 2017, 2:02 p.m. UTC | #2
2017-05-03 22:11+0200, Borislav Petkov:
> On Wed, May 03, 2017 at 09:37:30PM +0200, Radim Krčmář wrote:
>> The host might miss APIC timer interrupts if the guest enters a specific
>> C-state.  Quoting the erratum:
>> 
>>   400 APIC Timer Interrupt Does Not Occur in Processor C-States
>> 
>>   Description
>> 
>>   An APIC timer interrupt that becomes pending in low-power states C1E
>>   or C3 will not cause the processor to enter the C0 state even if the
>>   interrupt is enabled by Timer Local Vector Table Entry[Mask],
>>   APIC320[16]). APIC timer functionality is otherwise unaffected.
>> 
>>   Potential Effect on System
>> 
>>   System hang may occur provided that the operating system has not
>>   configured another interrupt source.  APIC timer interrupts may be
>>   delayed or, when the APIC timer is configured in rollover mode
>>   (APIC320[17]), the APIC timer may roll over multiple times in the
>>   low-power state with only one interrupt presented after the processor
>>   resumes. The standard use of the APIC timer does not make this effect
>>   significant.
>> 
>> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
>> ---
>>  arch/x86/kvm/x86.h | 3 +--
>>  1 file changed, 1 insertion(+), 2 deletions(-)
>> 
>> diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
>> index 612067074905..3ed7dd8737ab 100644
>> --- a/arch/x86/kvm/x86.h
>> +++ b/arch/x86/kvm/x86.h
>> @@ -223,8 +223,7 @@ static inline bool kvm_mwait_in_guest(void)
>>  
>>  	switch (boot_cpu_data.x86_vendor) {
>>  	case X86_VENDOR_AMD:
>> -		/* All AMD CPUs have a working MWAIT implementation */
>> -		return true;
>> +		return !boot_cpu_has_bug(X86_BUG_AMD_E400);
> 
> Well, this looks wrong: it is X86_BUG_AMD_APIC_C1E, which actually
> denotes that we must enable the E400 workaround because the platform
> actually goes into C1E.

X86_BUG_AMD_APIC_C1E doesn't cover C3, which is why I used
X86_BUG_AMD_E400.

> X86_BUG_AMD_E400 gets set only on the affected f/m/s range but if the
> BIOS doesn't put the CPU in C1E, we don't hit the erratum and all is
> peachy.
> 
> Also, what do APIC timer interrupts even have to do with MWAIT-ing in
> the guest, especially if we enable the workaround and switch to HPET on
> the host? Maybe I'm missing something here...

The host uses APIC timer when entering a guest and I assumed that MWAIT
can change C states, but it seems that affected AMD models do not even
support MWAIT hints and the package is in C0 the whole time.

I'll drop this patch if this is what you meant, thanks.
Borislav Petkov May 4, 2017, 4:45 p.m. UTC | #3
On Thu, May 04, 2017 at 04:02:44PM +0200, Radim Krčmář wrote:
> X86_BUG_AMD_APIC_C1E doesn't cover C3, which is why I used
> X86_BUG_AMD_E400.

Practically, the CPUs which even support C3 with E400 are single-core
devices and C3 support was removed with revision C3 (yap, the same as
the ACPI state) so I think we can safely ignore C3 here.

I mean, otherwise, we would be seeing E400 in action as our workaround
would not be sufficient. And even then, I don't think we ever go into C3
on those machines as we do HLT which enters C1.

> The host uses APIC timer when entering a guest and I assumed that MWAIT
> can change C states, but it seems that affected AMD models do not even
> support MWAIT hints and the package is in C0 the whole time.

Right, we never do MWAIT when idle on AMD but HLT.

I notice now that MWAIT text has received recent additions talking
about C-state hints but we'd need to evaluate that first and how power
consumption behaves and it probably would work the same way as it does
on Intel.

Thanks.
diff mbox

Patch

diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 612067074905..3ed7dd8737ab 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -223,8 +223,7 @@  static inline bool kvm_mwait_in_guest(void)
 
 	switch (boot_cpu_data.x86_vendor) {
 	case X86_VENDOR_AMD:
-		/* All AMD CPUs have a working MWAIT implementation */
-		return true;
+		return !boot_cpu_has_bug(X86_BUG_AMD_E400);
 	case X86_VENDOR_INTEL:
 		/* Handle Intel below */
 		break;