
Host latency peaks due to kvm-intel

Message ID 4A6AD69E.7030201@web.de (mailing list archive)
State New, archived

Commit Message

Jan Kiszka July 25, 2009, 9:55 a.m. UTC
Avi Kivity wrote:
> On 07/24/2009 12:41 PM, Jan Kiszka wrote:
>> I vaguely recall that someone promised to add a feature reporting
>> facility for all those nice things, modern VM-extensions may or may not
>> support (something like or even an extension of /proc/cpuinfo). What is
>> the state of this plan? Would be specifically interesting for Intel CPUs
>> as there seem to be many of them out there with restrictions for special
>> use cases - like real-time.
>>    
> 
> Newer kernels do report some vmx features (like flexpriority) in
> /proc/cpuinfo but not all.
> 

Ah, nice. Then we just need this?

------------>

From: Jan Kiszka <jan.kiszka@siemens.com>
Subject: [PATCH] x86: Report VMX feature vwbinvd

Not all VMX-capable CPUs support guest exits on wbinvd execution. If
this is not supported, the instruction will run natively on behalf of
the guest. This can cause multi-millisecond latencies on the host, which
is very problematic in real-time scenarios.

Report the wbinvd trapping feature along with other VMX feature flags,
calling it 'vwbinvd' ('virtual wbinvd').

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---

 arch/x86/include/asm/cpufeature.h |    1 +
 arch/x86/kernel/cpu/intel.c       |    4 ++++
 2 files changed, 5 insertions(+), 0 deletions(-)
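
For illustration only, here is a minimal user-space sketch of how a tool
might look for the proposed flag once the kernel reports it. It assumes the
flag shows up as the word 'vwbinvd' in the "flags" line of /proc/cpuinfo, as
this patch proposes; the thread does not confirm the final name. The helper
has_cpu_flag() is hypothetical and only does whole-word matching on a line
the caller has already read.

```c
#include <stddef.h>
#include <string.h>

/*
 * Return 1 if `flag` appears as a whole, whitespace-separated word in
 * `flags_line` (for example the value part of the "flags" line from
 * /proc/cpuinfo), and 0 otherwise. Hypothetical helper, not a kernel API.
 */
int has_cpu_flag(const char *flags_line, const char *flag)
{
	size_t len = strlen(flag);
	const char *p = flags_line;

	if (len == 0)
		return 0;

	while ((p = strstr(p, flag)) != NULL) {
		/* A match only counts if it is not part of a longer word. */
		int starts_word = (p == flags_line) ||
				  p[-1] == ' ' || p[-1] == '\t';
		int ends_word = p[len] == '\0' || p[len] == ' ' ||
				p[len] == '\n';

		if (starts_word && ends_word)
			return 1;
		p++;
	}
	return 0;
}
```

A caller would read /proc/cpuinfo line by line, pick a line starting with
"flags", and pass the text after the colon together with "vwbinvd". The
whole-word check matters because short names such as "ept" can occur inside
longer flag names.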

Comments

Avi Kivity July 25, 2009, 1:27 p.m. UTC | #1
On 07/25/2009 12:55 PM, Jan Kiszka wrote:
> Avi Kivity wrote:
>    
>> On 07/24/2009 12:41 PM, Jan Kiszka wrote:
>>      
>>> I vaguely recall that someone promised to add a feature reporting
>>> facility for all those nice things, modern VM-extensions may or may not
>>> support (something like or even an extension of /proc/cpuinfo). What is
>>> the state of this plan? Would be specifically interesting for Intel CPUs
>>> as there seem to be many of them out there with restrictions for special
>>> use cases - like real-time.
>>>
>>>        
>> Newer kernels do report some vmx features (like flexpriority) in
>> /proc/cpuinfo but not all.
>>
>>      
>
> Ah, nice. Then we just need this?
>
> ------------>
>
> From: Jan Kiszka<jan.kiszka@siemens.com>
> Subject: [PATCH] x86: Report VMX feature vwbinvd
>
> Not all VMX-capable CPUs support guest exits on wbinvd execution. If
> this is not supported, the instruction will run natively on behalf of
> the guest. This can cause multi-millisecond latencies to the host which
> is very problematic in real-time scenarios.
>
> Report the wbinvd trapping feature along with other VMX feature flags,
> calling it 'vwbinvd' ('virtual wbinvd').
>
>    

What about AMD CPUs, which can always trap wbinvd? Do we set the bit, or
do we trust the user to know that it isn't needed on AMD (I suppose the
latter)?

This should go in via tip.git, it isn't really kvm related (except that 
kvm should start reading these caps one day instead of querying the 
hardware directly).
Jan Kiszka July 26, 2009, 2:23 p.m. UTC | #2
Avi Kivity wrote:
> On 07/25/2009 12:55 PM, Jan Kiszka wrote:
>> Avi Kivity wrote:
>>   
>>> On 07/24/2009 12:41 PM, Jan Kiszka wrote:
>>>     
>>>> I vaguely recall that someone promised to add a feature reporting
>>>> facility for all those nice things, modern VM-extensions may or may not
>>>> support (something like or even an extension of /proc/cpuinfo). What is
>>>> the state of this plan? Would be specifically interesting for Intel
>>>> CPUs
>>>> as there seem to be many of them out there with restrictions for
>>>> special
>>>> use cases - like real-time.
>>>>
>>>>        
>>> Newer kernels do report some vmx features (like flexpriority) in
>>> /proc/cpuinfo but not all.
>>>
>>>      
>>
>> Ah, nice. Then we just need this?
>>
>> ------------>
>>
>> From: Jan Kiszka<jan.kiszka@siemens.com>
>> Subject: [PATCH] x86: Report VMX feature vwbinvd
>>
>> Not all VMX-capable CPUs support guest exits on wbinvd execution. If
>> this is not supported, the instruction will run natively on behalf of
>> the guest. This can cause multi-millisecond latencies to the host which
>> is very problematic in real-time scenarios.
>>
>> Report the wbinvd trapping feature along with other VMX feature flags,
>> calling it 'vwbinvd' ('virtual wbinvd').
>>
>>    
> 
> What about AMD cpus that can always trap wbinvd?  do we set the bit or
> do we trust the user to know that it isn't needed on AMD (I suppose the
> latter)?

I also think that the feature flags should remain vendor-specific.

> 
> This should go in via tip.git, it isn't really kvm related (except that
> kvm should start reading these caps one day instead of querying the
> hardware directly).
> 

OK, will go that way. I will probably also add some flags for AMD's NPT,
Intel's EPT and the new unrestricted guest mode while at it.

Jan
H. Peter Anvin July 26, 2009, 7:16 p.m. UTC | #3
Jan Kiszka wrote:
> Avi Kivity wrote:
>> On 07/24/2009 12:41 PM, Jan Kiszka wrote:
>>> I vaguely recall that someone promised to add a feature reporting
>>> facility for all those nice things, modern VM-extensions may or may not
>>> support (something like or even an extension of /proc/cpuinfo). What is
>>> the state of this plan? Would be specifically interesting for Intel CPUs
>>> as there seem to be many of them out there with restrictions for special
>>> use cases - like real-time.
>>>    
>> Newer kernels do report some vmx features (like flexpriority) in
>> /proc/cpuinfo but not all.
>>
> 
> Ah, nice. Then we just need this?
> 

Fine with me.

Acked-by: H. Peter Anvin <hpa@zytor.com>

However, I guess the real question is whether we shouldn't export ALL VMX
features in a consistent way instead?

	-hpa
Yang, Sheng July 27, 2009, 1:11 a.m. UTC | #4
On Monday 27 July 2009 03:16:27 H. Peter Anvin wrote:
> Jan Kiszka wrote:
> > Avi Kivity wrote:
> >> On 07/24/2009 12:41 PM, Jan Kiszka wrote:
> >>> I vaguely recall that someone promised to add a feature reporting
> >>> facility for all those nice things, modern VM-extensions may or may not
> >>> support (something like or even an extension of /proc/cpuinfo). What is
> >>> the state of this plan? Would be specifically interesting for Intel
> >>> CPUs as there seem to be many of them out there with restrictions for
> >>> special use cases - like real-time.
> >>
> >> Newer kernels do report some vmx features (like flexpriority) in
> >> /proc/cpuinfo but not all.
> >
> > Ah, nice. Then we just need this?
>
> Fine with me.
>
> Acked-by: H. Peter Anvin <hpa@zytor.com>
>
> However, I guess the real question if we shouldn't export ALL VMX
> features in a consistent way instead?
>
When I added feature reporting to cpuinfo, I only put the highlight features
there; otherwise the VMX feature list would be at least as long as the CPU one.

I also suggested a separate field for virtualization features, but some
concerns about userspace tools were raised.

Since we indeed have quite a lot of features, and will get more, would it be
better to export part of the struct vmcs_config entries (that is,
pin_based_exec_ctrl, cpu_based_exec_ctrl, and cpu_based_2nd_exec_ctrl) through
/sys/module/kvm_intel/? Putting every feature into cpuinfo does not seem
necessary for such a big list.
Jan Kiszka July 27, 2009, 9:08 a.m. UTC | #5
[ carrying this to LKML ]

Yang, Sheng wrote:
> On Monday 27 July 2009 03:16:27 H. Peter Anvin wrote:
>> Jan Kiszka wrote:
>>> Avi Kivity wrote:
>>>> On 07/24/2009 12:41 PM, Jan Kiszka wrote:
>>>>> I vaguely recall that someone promised to add a feature reporting
>>>>> facility for all those nice things, modern VM-extensions may or may not
>>>>> support (something like or even an extension of /proc/cpuinfo). What is
>>>>> the state of this plan? Would be specifically interesting for Intel
>>>>> CPUs as there seem to be many of them out there with restrictions for
>>>>> special use cases - like real-time.
>>>> Newer kernels do report some vmx features (like flexpriority) in
>>>> /proc/cpuinfo but not all.
>>> Ah, nice. Then we just need this?
>> Fine with me.
>>
>> Acked-by: H. Peter Anvin <hpa@zytor.com>
>>
>> However, I guess the real question if we shouldn't export ALL VMX
>> features in a consistent way instead?
>>
> When I add feature reporting to cpuinfo, I just put highlight features there, 
> otherwise the VMX feature list would at least as long as CPU one.

That could become true. But the question is always what the highlights
are. Often this depends on the hypervisor as it may implement
workarounds for missing features differently (or not at all). So I'm
also for exposing feature information consistently.

> 
> I have also suggested another field for virtualization feature for it, but 
> some concern again userspace tools raised.
> 
> For we got indeed quite a lot features, and would get more, would it better to 
> export the part of struct vmcs_config entries(that's pin_based_exec_ctrl, 
> cpu_based_exec_ctrl, and cpu_based_2nd_exec_ctrl) through 
> sys/module/kvm_intel/? Put every feature to cpuinfo seems not that necessary 
> for such a big list.

I don't think this information should only come from KVM. Suppose you
didn't build it into some kernel but still want to find out what your
system is able to provide.

What about adding some dedicated /proc entry for CPU virtualization
features, say /proc/hvminfo?

Jan
Yang, Sheng July 27, 2009, 9:29 a.m. UTC | #6
On Monday 27 July 2009 17:08:42 Jan Kiszka wrote:
> [ carrying this to LKML ]
>
> Yang, Sheng wrote:
> > On Monday 27 July 2009 03:16:27 H. Peter Anvin wrote:
> >> Jan Kiszka wrote:
> >>> Avi Kivity wrote:
> >>>> On 07/24/2009 12:41 PM, Jan Kiszka wrote:
> >>>>> I vaguely recall that someone promised to add a feature reporting
> >>>>> facility for all those nice things, modern VM-extensions may or may
> >>>>> not support (something like or even an extension of /proc/cpuinfo).
> >>>>> What is the state of this plan? Would be specifically interesting for
> >>>>> Intel CPUs as there seem to be many of them out there with
> >>>>> restrictions for special use cases - like real-time.
> >>>>
> >>>> Newer kernels do report some vmx features (like flexpriority) in
> >>>> /proc/cpuinfo but not all.
> >>>
> >>> Ah, nice. Then we just need this?
> >>
> >> Fine with me.
> >>
> >> Acked-by: H. Peter Anvin <hpa@zytor.com>
> >>
> >> However, I guess the real question if we shouldn't export ALL VMX
> >> features in a consistent way instead?
> >
> > When I add feature reporting to cpuinfo, I just put highlight features
> > there, otherwise the VMX feature list would at least as long as CPU one.
>
> That could become true. But the question is always what the highlights
> are. Often this depends on the hypervisor as it may implement
> workarounds for missing features differently (or not at all). So I'm
> also for exposing feature information consistently.

(CC Andi and Ingo)

By highlights I mean the features we gain a lot from, such as FlexPriority,
EPT, and VPID. They can be vendor-specific. And I am talking about hardware
capabilities here, so whatever workarounds the hypervisor implements are out
of scope.
>
> > I have also suggested another field for virtualization feature for it,
> > but some concern again userspace tools raised.
> >
> > For we got indeed quite a lot features, and would get more, would it
> > better to export the part of struct vmcs_config entries(that's
> > pin_based_exec_ctrl, cpu_based_exec_ctrl, and cpu_based_2nd_exec_ctrl)
> > through
> > sys/module/kvm_intel/? Put every feature to cpuinfo seems not that
> > necessary for such a big list.
>
> I don't think this information should only come from KVM. Consider you
> didn't build it into some kernel but still want to find out what your
> system is able to provide.

Yes, agree.
>
> What about adding some dedicated /proc entry for CPU virtualization
> features, say /proc/hvminfo?

Well, compared to this, I would still prefer a new item in /proc/cpuinfo,
since these are still CPU features, like Andi did for power management (IIRC).

Any other preferred location?
Avi Kivity July 27, 2009, 10:31 a.m. UTC | #7
On 07/27/2009 12:08 PM, Jan Kiszka wrote:
>> When I add feature reporting to cpuinfo, I just put highlight features there,
>> otherwise the VMX feature list would at least as long as CPU one.
>>      
>
> That could become true. But the question is always what the highlights
> are. Often this depends on the hypervisor as it may implement
> workarounds for missing features differently (or not at all). So I'm
> also for exposing feature information consistently.
>    


I'd put everything in there.  It's information that is often useful.  
Even minor features can expose bugs in the hypervisor.

>> I have also suggested another field for virtualization feature for it, but
>> some concern again userspace tools raised.
>>
>> For we got indeed quite a lot features, and would get more, would it better to
>> export the part of struct vmcs_config entries(that's pin_based_exec_ctrl,
>> cpu_based_exec_ctrl, and cpu_based_2nd_exec_ctrl) through
>> sys/module/kvm_intel/? Put every feature to cpuinfo seems not that necessary
>> for such a big list.
>>      
>
> I don't think this information should only come from KVM. Consider you
> didn't build it into some kernel but still want to find out what your
> system is able to provide.
>
> What about adding some dedicated /proc entry for CPU virtualization
> features, say /proc/hvminfo?
>    

The flags line is already very long, and already has some virt features,
so I see no problem extending it.  If we don't want that, I'd prefer a
virtualization line in /proc/cpuinfo rather than a new file.

Patch

diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index 4a28d22..8647524 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -165,6 +165,7 @@ 
 #define X86_FEATURE_FLEXPRIORITY (8*32+ 2) /* Intel FlexPriority */
 #define X86_FEATURE_EPT         (8*32+ 3) /* Intel Extended Page Table */
 #define X86_FEATURE_VPID        (8*32+ 4) /* Intel Virtual Processor ID */
+#define X86_FEATURE_VWBINVD     (8*32+ 5) /* Guest Exiting on WBINVD */
 
 #if defined(__KERNEL__) && !defined(__ASSEMBLY__)
 
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 3260ab0..2d921b0 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -297,6 +297,7 @@  static void __cpuinit detect_vmx_virtcap(struct cpuinfo_x86 *c)
 #define X86_VMX_FEATURE_PROC_CTLS2_VIRT_APIC	0x00000001
 #define X86_VMX_FEATURE_PROC_CTLS2_EPT		0x00000002
 #define X86_VMX_FEATURE_PROC_CTLS2_VPID		0x00000020
+#define X86_VMX_FEATURE_PROC_CTLS2_VWBINVD	0x00000040
 
 	u32 vmx_msr_low, vmx_msr_high, msr_ctl, msr_ctl2;
 
@@ -305,6 +306,7 @@  static void __cpuinit detect_vmx_virtcap(struct cpuinfo_x86 *c)
 	clear_cpu_cap(c, X86_FEATURE_FLEXPRIORITY);
 	clear_cpu_cap(c, X86_FEATURE_EPT);
 	clear_cpu_cap(c, X86_FEATURE_VPID);
+	clear_cpu_cap(c, X86_FEATURE_VWBINVD);
 
 	rdmsr(MSR_IA32_VMX_PROCBASED_CTLS, vmx_msr_low, vmx_msr_high);
 	msr_ctl = vmx_msr_high | vmx_msr_low;
@@ -323,6 +325,8 @@  static void __cpuinit detect_vmx_virtcap(struct cpuinfo_x86 *c)
 			set_cpu_cap(c, X86_FEATURE_EPT);
 		if (msr_ctl2 & X86_VMX_FEATURE_PROC_CTLS2_VPID)
 			set_cpu_cap(c, X86_FEATURE_VPID);
+		if (msr_ctl2 & X86_VMX_FEATURE_PROC_CTLS2_VWBINVD)
+			set_cpu_cap(c, X86_FEATURE_VWBINVD);
 	}
 }
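
As a rough stand-alone illustration of the check the last hunk adds, the
sketch below tests the WBINVD-exiting bit on a pair of 32-bit MSR halves.
The mask values are copied from the patch. The helper name
vmx_reports_wbinvd_exiting() is hypothetical, and combining the halves as
high | low is an assumption that mirrors the msr_ctl line visible in the
hunk context; the corresponding msr_ctl2 assignment is not shown in the
diff.

```c
#include <stdint.h>

/* Secondary processor-based control bits, values as in the patch. */
#define CTLS2_VIRT_APIC	0x00000001u
#define CTLS2_EPT	0x00000002u
#define CTLS2_VPID	0x00000020u
#define CTLS2_VWBINVD	0x00000040u

/*
 * Return 1 if the given halves of the secondary-controls capability MSR
 * advertise WBINVD exiting, 0 otherwise. Hypothetical helper for
 * illustration; the kernel reads the halves with rdmsr() instead of
 * taking them as arguments.
 */
int vmx_reports_wbinvd_exiting(uint32_t msr_low, uint32_t msr_high)
{
	/* Assumed to mirror "msr_ctl = vmx_msr_high | vmx_msr_low". */
	uint32_t msr_ctl2 = msr_high | msr_low;

	return (msr_ctl2 & CTLS2_VWBINVD) != 0;
}
```

Keeping the bit test in a pure function like this makes it easy to try
against captured MSR values without running on VMX hardware.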