EOI acceleration for high bandwidth IO

Message ID 9832F13BD22FB94A829F798DA4A8280501B9CF542E@pdsmsx503.ccr.corp.intel.com (mailing list archive)
State New, archived

Commit Message

Dong, Eddie July 6, 2009, 1:42 p.m. UTC
EOI is one of the key VM exits under high-bandwidth I/O, such as VT-d with a 10Gb/s NIC.
    This patch accelerates guest EOI emulation by using the VM exit
    information provided by hardware.
    
    Signed-off-by: Eddie Dong <eddie.dong@intel.com>

Comments

Avi Kivity July 6, 2009, 2:03 p.m. UTC | #1
On 07/06/2009 04:42 PM, Dong, Eddie wrote:
>      EOI is one of key VM Exit at high bandwidth IO such as VT-d with 10Gb/s NIC.
>      This patch accelerate guest EOI emulation utilizing HW VM Exit
>      information.
>    

Won't this fail if the guest uses STOSD to issue the EOI?

(of course, no guest does this, just looking for potential problems)
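
To make the STOSD concern concrete, here is a minimal sketch. It assumes a made-up register model (`struct guest_regs`) and hypothetical helper names, none of which are KVM code. A plain MOV store has no side effects beyond the store itself, so completing the EOI and advancing the instruction pointer is enough. STOSD also updates EDI, so a fast path that only skips the instruction leaves the guest with a stale EDI.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, minimal guest register model; not a KVM structure. */
struct guest_regs {
	uint32_t eax;
	uint32_t edi;
	uint32_t eip;
	bool df;	/* EFLAGS.DF, the string direction flag */
};

/*
 * Fast path: the EOI itself is assumed to be completed elsewhere;
 * only the instruction pointer is advanced.
 */
static void fast_path_skip(struct guest_regs *r, uint32_t insn_len)
{
	assert(insn_len > 0 && insn_len <= 15);	/* x86 length limit */
	r->eip += insn_len;
}

/*
 * Full emulation of a non-REP STOSD: besides the store of EAX to
 * [EDI] (not modelled here), EDI moves by 4 in the direction
 * selected by DF.
 */
static void emulate_stosd(struct guest_regs *r, uint32_t insn_len)
{
	assert(insn_len > 0 && insn_len <= 15);
	r->edi += r->df ? (uint32_t)-4 : 4u;	/* wraps modulo 2^32 */
	r->eip += insn_len;
}
```

Starting from identical state, the two paths agree on EIP but differ in EDI, which is the guest-visible effect a skip-only path would lose.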
Dong, Eddie July 6, 2009, 2:34 p.m. UTC | #2
Avi Kivity wrote:
> On 07/06/2009 04:42 PM, Dong, Eddie wrote:
>>      EOI is one of key VM Exit at high bandwidth IO such as VT-d
>>      with 10Gb/s NIC. This patch accelerate guest EOI emulation
>>      utilizing HW VM Exit information.
>> 
> 
> Won't this fail if the guest uses STOSD to issue the EOI?
> 
Good catch. Should we use an exclusion list for the opcode?
Or use a decode cache for hot IPs (RO in EPT for gip)?

We noticed a huge number of vEOIs with a 10Gb/s NIC: the EOI rate is ~70 kHz.
With SR-IOV it could go much higher, even to the million level. Decode and
emulation cost about 7K cycles, while the short path may spend only 3-4K cycles.

Eddie
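
As a rough back-of-the-envelope check of the figures above (~70 kHz EOI rate, ~7K cycles for decode and emulation, 3-4K cycles for the short path), the sketch below multiplies rate by per-exit cost. The helper names are hypothetical, and any core clock passed to `core_share_percent` (for example 2.5 GHz) is an assumption that does not come from the thread.

```c
#include <assert.h>
#include <stdint.h>

/* Cycles per second spent on EOI exits: exits per second times cycles per exit. */
static uint64_t eoi_cycles_per_sec(uint64_t eoi_per_sec, uint64_t cycles_per_eoi)
{
	return eoi_per_sec * cycles_per_eoi;
}

/* Share of one core, in percent, for an assumed core clock frequency. */
static double core_share_percent(uint64_t cycles_per_sec, uint64_t core_hz)
{
	assert(core_hz > 0);
	return 100.0 * (double)cycles_per_sec / (double)core_hz;
}
```

With these inputs, 70,000 EOIs per second at 7,000 cycles each is 490 million cycles per second, and at 3,500 cycles each it is 245 million. At one million EOIs per second the full-emulation cost would be 7 billion cycles per second.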
Avi Kivity July 6, 2009, 2:53 p.m. UTC | #3
On 07/06/2009 05:34 PM, Dong, Eddie wrote:
> Avi Kivity wrote:
>    
>> On 07/06/2009 04:42 PM, Dong, Eddie wrote:
>>      
>>>       EOI is one of key VM Exit at high bandwidth IO such as VT-d
>>>       with 10Gb/s NIC. This patch accelerate guest EOI emulation
>>>       utilizing HW VM Exit information.
>>>
>>>        
>> Won't this fail if the guest uses STOSD to issue the EOI?
>>
>>      
> Good catch, should we use an exclusion list for the opcode?
>    

That means fetching the opcode and doing partial decoding, which will 
negate the advantage.

> Or use decode cache for hot IP (RO in EPT for gip)?
>    

How can you tell if the code did not change?

I think it's reasonable to assume that the guest won't use STOSD for EOI 
though, and to apply your patch.  There's no risk to the host.

> We noticed huge amount of vEOI in 10Gb/s NIC which is ~70KHZ for EOI.
> With SR-IOV, it could go up much more to even million level. Decode and
> emulation cost 7K cycles, while short path may only spend 3-4K cycles.
>    

Yes, and I think we can drop the short path further to almost zero by 
using paravirtualization.  It would work for Linux and Windows x86 (with 
something similar to tpr patching).  Unfortunately it won't work on 
Windows x64 since it doesn't allow patching.

We can also expose x2apic (already merged) or the Hyper-V enlightenment,
which converts EOI into an MSR write; that is fairly fast.
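
For reference, a small sketch of why x2APIC turns the EOI into an MSR write: in x2APIC mode the local APIC registers are reached through MSRs starting at 0x800, with each 16-byte-aligned xAPIC MMIO offset mapping to MSR 0x800 + (offset >> 4). The macro and function names below are made up for illustration and are not kernel identifiers.

```c
#include <assert.h>
#include <stdint.h>

#define XAPIC_EOI_OFFSET 0xB0u	/* MMIO offset of the EOI register */
#define X2APIC_MSR_BASE  0x800u	/* first MSR of the x2APIC register space */

/*
 * Map an xAPIC MMIO register offset to its x2APIC MSR index.
 * Caveat: the 64-bit ICR is a special case (one MSR instead of two
 * 32-bit registers), which this simple mapping does not handle.
 */
static uint32_t x2apic_msr_for_offset(uint32_t mmio_offset)
{
	assert(mmio_offset < 0x1000u && (mmio_offset & 0xFu) == 0);
	return X2APIC_MSR_BASE + (mmio_offset >> 4);
}
```

Under this mapping the EOI register at offset 0xB0 becomes MSR 0x80B, so the guest signals EOI with a WRMSR rather than a memory store that has to be decoded.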
Patch

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index ccafe0d..b63138f 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -875,6 +875,15 @@  void kvm_lapic_set_tpr(struct kvm_vcpu *vcpu, unsigned long cr8)
 		     | (apic_get_reg(apic, APIC_TASKPRI) & 4));
 }
 
+void kvm_lapic_set_eoi(struct kvm_vcpu *vcpu)
+{
+	struct kvm_lapic *apic = vcpu->arch.apic;
+
+	if (apic)
+		apic_set_eoi(apic);
+}
+EXPORT_SYMBOL_GPL(kvm_lapic_set_eoi);
+
 u64 kvm_lapic_get_cr8(struct kvm_vcpu *vcpu)
 {
 	struct kvm_lapic *apic = vcpu->arch.apic;
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 40010b0..3a7a29a 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -27,6 +27,7 @@  int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu);
 void kvm_lapic_reset(struct kvm_vcpu *vcpu);
 u64 kvm_lapic_get_cr8(struct kvm_vcpu *vcpu);
 void kvm_lapic_set_tpr(struct kvm_vcpu *vcpu, unsigned long cr8);
+void kvm_lapic_set_eoi(struct kvm_vcpu *vcpu);
 void kvm_lapic_set_base(struct kvm_vcpu *vcpu, u64 value);
 u64 kvm_lapic_get_base(struct kvm_vcpu *vcpu);
 void kvm_apic_set_version(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 3a75db3..6eea29d 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3125,6 +3125,12 @@  static int handle_apic_access(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
 
 	exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
 	offset = exit_qualification & 0xffful;
+	if (((exit_qualification >> 12) & 0xf) == 1 &&
+	    offset == APIC_EOI) {	/* access type 1: data write to EOI */
+		kvm_lapic_set_eoi(vcpu);
+		skip_emulated_instruction(vcpu);
+		return 1;
+	}
 
 	er = emulate_instruction(vcpu, kvm_run, 0, 0, 0);
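
A standalone sketch of the exit-qualification check used by the fast path, assuming the layout for APIC-access exits: bits 11:0 hold the offset into the APIC page and bits 15:12 hold the access type, where 1 means a linear data write. The names here are hypothetical. Note that in C `==` binds more tightly than `&`, so the mask has to be parenthesized before the comparison. The second helper spells out how the expression would parse without those parentheses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EOI_OFFSET        0xB0u	/* same value as the kernel's APIC_EOI */
#define ACCESS_TYPE_WRITE 1u	/* linear access for a data write */

/* True when the exit qualification describes a data write to the EOI register. */
static bool is_eoi_write(uint64_t exit_qualification)
{
	uint64_t offset = exit_qualification & 0xfffu;
	uint64_t type = (exit_qualification >> 12) & 0xfu;

	return type == ACCESS_TYPE_WRITE && offset == EOI_OFFSET;
}

/*
 * How "(q >> 12) & 0xf == 1" parses without the extra parentheses:
 * "0xf == 1" is evaluated first and yields 0, so the test is always false.
 */
static bool eoi_check_without_parens(uint64_t q)
{
	return ((q >> 12) & (0xf == 1)) != 0 && (q & 0xfffu) == EOI_OFFSET;
}

/* Small usage demonstration of both helpers. */
void eoi_decode_demo(void)
{
	assert(is_eoi_write(0x10B0u));			/* write to offset 0xB0 */
	assert(!is_eoi_write(0x00B0u));			/* read of offset 0xB0 */
	assert(!eoi_check_without_parens(0x10B0u));	/* never matches */
}
```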