KVM: nVMX: nested VPID emulation

Message ID	BLU436-SMTP172159ECF89EA5C21830024805D0@phx.gbl (mailing list archive)
State	New, archived
Headers	Return-Path: <kvm-owner@kernel.org> Message-ID: <BLU436-SMTP172159ECF89EA5C21830024805D0@phx.gbl> From: Wanpeng Li <wanpeng.li@hotmail.com> To: Paolo Bonzini <pbonzini@redhat.com> CC: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Wanpeng Li <wanpeng.li@hotmail.com> Subject: [PATCH] KVM: nVMX: nested VPID emulation Date: Mon, 14 Sep 2015 20:52:23 +0800 MIME-Version: 1.0 Content-Type: text/plain Sender: kvm-owner@vger.kernel.org Precedence: bulk

Wanpeng Li Sept. 14, 2015, 12:52 p.m. UTC

VPID is used to tag address spaces and avoid TLB flushes. Currently L0 uses
the same VPID to run L1 and all of its guests, so KVM flushes the VPID when
switching between L1 and L2.

This patch advertises VPID to the L1 hypervisor, so that the address spaces of
L1 and L2 can be treated separately and the TLB flush can be avoided when
switching between L1 and L2. This yields a ~3x performance improvement for
lmbench 8p/64k ctxsw.
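As a rough illustration of the mechanism (a toy model, not KVM or hardware code; every name below is hypothetical): each TLB entry carries the VPID that was running when it was filled, and a lookup only matches entries tagged with the running VPID. If L1 and L2 shared one VPID, L2 could hit L1's translation for the same guest-virtual page, which is why L0 must flush on every switch. With distinct VPIDs both sets of entries can coexist, and a flush can target a single VPID.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_TLB_SIZE 8   /* hypothetical, tiny fully associative TLB */

/* One cached translation, tagged with the VPID that was running when it was filled. */
struct toy_tlb_entry {
	bool valid;
	uint16_t vpid;
	uint64_t gva;   /* guest-virtual page */
	uint64_t hpa;   /* host-physical page */
};

struct toy_tlb {
	struct toy_tlb_entry e[TOY_TLB_SIZE];
	unsigned next;   /* round-robin replacement cursor */
};

/* Cache a translation under the currently running VPID. */
static void toy_tlb_fill(struct toy_tlb *t, uint16_t vpid, uint64_t gva, uint64_t hpa)
{
	t->e[t->next] = (struct toy_tlb_entry){ .valid = true, .vpid = vpid, .gva = gva, .hpa = hpa };
	t->next = (t->next + 1) % TOY_TLB_SIZE;
}

/* A lookup only hits entries whose tag matches the running VPID. */
static bool toy_tlb_lookup(const struct toy_tlb *t, uint16_t vpid, uint64_t gva, uint64_t *hpa)
{
	for (unsigned i = 0; i < TOY_TLB_SIZE; i++) {
		const struct toy_tlb_entry *x = &t->e[i];
		if (x->valid && x->vpid == vpid && x->gva == gva) {
			*hpa = x->hpa;
			return true;
		}
	}
	return false;
}

/* Drop every entry tagged with one VPID (the effect of a single-context INVVPID). */
static void toy_tlb_flush_vpid(struct toy_tlb *t, uint16_t vpid)
{
	for (unsigned i = 0; i < TOY_TLB_SIZE; i++)
		if (t->e[i].vpid == vpid)
			t->e[i].valid = false;
}
```

In this model, flushing L2's VPID leaves L1's cached translations intact, which is the saving the patch is after.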

Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
---
 arch/x86/kvm/vmx.c | 39 ++++++++++++++++++++++++++++++++-------
 1 file changed, 32 insertions(+), 7 deletions(-)

Jan Kiszka Sept. 14, 2015, 2:54 p.m. UTC | #1

On 2015-09-14 14:52, Wanpeng Li wrote:
> VPID is used to tag address spaces and avoid TLB flushes. Currently L0 uses
> the same VPID to run L1 and all of its guests, so KVM flushes the VPID when
> switching between L1 and L2.
> 
> This patch advertises VPID to the L1 hypervisor, so that the address spaces of
> L1 and L2 can be treated separately and the TLB flush can be avoided when
> switching between L1 and L2. This yields a ~3x performance improvement for
> lmbench 8p/64k ctxsw.
> 
> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
> ---
>  arch/x86/kvm/vmx.c | 39 ++++++++++++++++++++++++++++++++-------
>  1 file changed, 32 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index da1590e..06bc31e 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -1157,6 +1157,11 @@ static inline bool nested_cpu_has_virt_x2apic_mode(struct vmcs12 *vmcs12)
>  	return nested_cpu_has2(vmcs12, SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE);
>  }
>  
> +static inline bool nested_cpu_has_vpid(struct vmcs12 *vmcs12)
> +{
> +	return nested_cpu_has2(vmcs12, SECONDARY_EXEC_ENABLE_VPID);
> +}
> +
>  static inline bool nested_cpu_has_apic_reg_virt(struct vmcs12 *vmcs12)
>  {
>  	return nested_cpu_has2(vmcs12, SECONDARY_EXEC_APIC_REGISTER_VIRT);
> @@ -2471,6 +2476,7 @@ static void nested_vmx_setup_ctls_msrs(struct vcpu_vmx *vmx)
>  		SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES |
>  		SECONDARY_EXEC_RDTSCP |
>  		SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE |
> +		SECONDARY_EXEC_ENABLE_VPID |
>  		SECONDARY_EXEC_APIC_REGISTER_VIRT |
>  		SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY |
>  		SECONDARY_EXEC_WBINVD_EXITING |
> @@ -4160,7 +4166,7 @@ static void allocate_vpid(struct vcpu_vmx *vmx)
>  	int vpid;
>  
>  	vmx->vpid = 0;
> -	if (!enable_vpid)
> +	if (!enable_vpid || is_guest_mode(&vmx->vcpu))
>  		return;
>  	spin_lock(&vmx_vpid_lock);
>  	vpid = find_first_zero_bit(vmx_vpid_bitmap, VMX_NR_VPIDS);
> @@ -6738,6 +6744,14 @@ static int handle_vmclear(struct kvm_vcpu *vcpu)
>  	}
>  	vmcs12 = kmap(page);
>  	vmcs12->launch_state = 0;
> +	if (enable_vpid) {
> +		if (nested_cpu_has_vpid(vmcs12)) {
> +			spin_lock(&vmx_vpid_lock);
> +			if (vmcs12->virtual_processor_id != 0)
> +				__clear_bit(vmcs12->virtual_processor_id, vmx_vpid_bitmap);
> +			spin_unlock(&vmx_vpid_lock);

Maybe enhance free_vpid (and also allocate_vpid) to work generically and
let the caller decide where to take the vpid from or where to store it?
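A minimal sketch of the generalization suggested here, in plain C rather than the kernel bitmap helpers: the allocator only hands out a number (0 meaning none available) and the caller decides where to store it. The function names, the pool size, and the absence of locking are assumptions of this sketch, not the interface of any later revision.

```c
#include <assert.h>
#include <limits.h>

#define TOY_NR_VPIDS 16   /* hypothetical small pool; the real field is 16 bits wide */
#define TOY_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define TOY_WORDS ((TOY_NR_VPIDS + TOY_BITS_PER_LONG - 1) / TOY_BITS_PER_LONG)

/* Bit i set means VPID i is in use; bit 0 is pre-set because VPID 0 is reserved. */
static unsigned long toy_vpid_bitmap[TOY_WORDS] = { 1UL };

/*
 * Generic allocator: returns a free VPID, or 0 when the pool is exhausted,
 * and leaves it to the caller to decide where the result is stored
 * (vmx->vpid for L1, some nested field for L2, ...). Kernel code would hold
 * vmx_vpid_lock around the bitmap accesses; this single-threaded sketch does not.
 */
static int toy_allocate_vpid(void)
{
	for (int i = 1; i < TOY_NR_VPIDS; i++) {
		unsigned long *word = &toy_vpid_bitmap[i / TOY_BITS_PER_LONG];
		unsigned long mask = 1UL << (i % TOY_BITS_PER_LONG);

		if (!(*word & mask)) {
			*word |= mask;
			return i;
		}
	}
	return 0;
}

/* Freeing 0 (or an out-of-range value) is a no-op, so callers need no special case. */
static void toy_free_vpid(int vpid)
{
	if (vpid <= 0 || vpid >= TOY_NR_VPIDS)
		return;
	toy_vpid_bitmap[vpid / TOY_BITS_PER_LONG] &= ~(1UL << (vpid % TOY_BITS_PER_LONG));
}
```

With this split, the handle_vmclear hunk would reduce to a single free call on whatever field the caller chose to store the nested VPID in.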

> +		}
> +	}
>  	kunmap(page);
>  	nested_release_page(page);
>  
> @@ -9189,6 +9203,7 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
>  {
>  	struct vcpu_vmx *vmx = to_vmx(vcpu);
>  	u32 exec_control;
> +	int vpid;
>  
>  	vmcs_write16(GUEST_ES_SELECTOR, vmcs12->guest_es_selector);
>  	vmcs_write16(GUEST_CS_SELECTOR, vmcs12->guest_cs_selector);
> @@ -9438,13 +9453,21 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
>  	else
>  		vmcs_write64(TSC_OFFSET, vmx->nested.vmcs01_tsc_offset);
>  
> +
>  	if (enable_vpid) {
> -		/*
> -		 * Trivially support vpid by letting L2s share their parent
> -		 * L1's vpid. TODO: move to a more elaborate solution, giving
> -		 * each L2 its own vpid and exposing the vpid feature to L1.
> -		 */
> -		vmcs_write16(VIRTUAL_PROCESSOR_ID, vmx->vpid);
> +		if (nested_cpu_has_vpid(vmcs12)) {
> +			if (vmcs12->virtual_processor_id == 0) {
> +				spin_lock(&vmx_vpid_lock);
> +				vpid = find_first_zero_bit(vmx_vpid_bitmap, VMX_NR_VPIDS);
> +				if (vpid < VMX_NR_VPIDS)
> +					__set_bit(vpid, vmx_vpid_bitmap);
> +				spin_unlock(&vmx_vpid_lock);
> +				vmcs_write16(VIRTUAL_PROCESSOR_ID, vpid);

It's a bit non-obvious that vpid == VMX_NR_VPIDS (no free vpids) will
lead to vpid == 0 when writing VIRTUAL_PROCESSOR_ID. You should leave at
least a comment. Or generalize allocate_vpid as that one is already
clearer in this regard.
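The truncation can be reproduced in isolation. This sketch assumes VMX_NR_VPIDS is 1 << 16 and that vmcs_write16() takes a 16-bit value, so passing the "no free VPID" sentinel silently writes 0. The helper toy_pick_vpid() is hypothetical and simply makes that fallback explicit, mirroring what allocate_vpid() does with vmx->vpid.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NR_VPIDS (1 << 16)   /* assumed equal to VMX_NR_VPIDS */

static uint16_t toy_vpid_field;   /* stand-in for the 16-bit VIRTUAL_PROCESSOR_ID field */

/* Like vmcs_write16(), the parameter is 16 bits wide, so wider values are truncated. */
static void toy_vmcs_write16(uint16_t value)
{
	toy_vpid_field = value;
}

/* Make the "pool exhausted" case explicit instead of relying on truncation. */
static int toy_pick_vpid(int found)
{
	/*
	 * 'found' is the result of a find_first_zero_bit()-style search over
	 * TOY_NR_VPIDS bits; TOY_NR_VPIDS itself means "no free VPID". Fall back
	 * to 0, the same value allocate_vpid() leaves in vmx->vpid in that case.
	 */
	return found < TOY_NR_VPIDS ? found : 0;
}
```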

> +			} else
> +				vmcs_write16(VIRTUAL_PROCESSOR_ID, vmcs12->virtual_processor_id);
> +		} else
> +			vmcs_write16(VIRTUAL_PROCESSOR_ID, vmx->vpid);
> +
>  		vmx_flush_tlb(vcpu);
>  	}
>  
> @@ -9973,6 +9996,8 @@ static void prepare_vmcs12(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
>  		vmcs12_save_pending_event(vcpu, vmcs12);
>  	}
>  
> +	if (nested_cpu_has_vpid(vmcs12))
> +		vmcs12->virtual_processor_id = vmcs_read16(VIRTUAL_PROCESSOR_ID);
>  	/*
>  	 * Drop what we picked up for L2 via vmx_complete_interrupts. It is
>  	 * preserved above and would only end up incorrectly in L1.
> 

Last but not least: the guest can now easily exhaust the host's pool of
vpids simply by spawning plenty of VCPUs for L2, no? Is this acceptable,
or should there be some limit?

Jan

Bandan Das Sept. 14, 2015, 4:08 p.m. UTC | #2

Wanpeng Li <wanpeng.li@hotmail.com> writes:

> VPID is used to tag address spaces and avoid TLB flushes. Currently L0 uses
> the same VPID to run L1 and all of its guests, so KVM flushes the VPID when
> switching between L1 and L2.
>
> This patch advertises VPID to the L1 hypervisor, so that the address spaces of
> L1 and L2 can be treated separately and the TLB flush can be avoided when
> switching between L1 and L2. This yields a ~3x performance improvement for
> lmbench 8p/64k ctxsw.

TLB flush does context invalidation and while that should result in
some improvement, I never expected a 3x improvement for any workload!
Interesting :)

> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
> ---
>  arch/x86/kvm/vmx.c | 39 ++++++++++++++++++++++++++++++++-------
>  1 file changed, 32 insertions(+), 7 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index da1590e..06bc31e 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -1157,6 +1157,11 @@ static inline bool nested_cpu_has_virt_x2apic_mode(struct vmcs12 *vmcs12)
>  	return nested_cpu_has2(vmcs12, SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE);
>  }
>  
> +static inline bool nested_cpu_has_vpid(struct vmcs12 *vmcs12)
> +{
> +	return nested_cpu_has2(vmcs12, SECONDARY_EXEC_ENABLE_VPID);
> +}
> +
>  static inline bool nested_cpu_has_apic_reg_virt(struct vmcs12 *vmcs12)
>  {
>  	return nested_cpu_has2(vmcs12, SECONDARY_EXEC_APIC_REGISTER_VIRT);
> @@ -2471,6 +2476,7 @@ static void nested_vmx_setup_ctls_msrs(struct vcpu_vmx *vmx)
>  		SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES |
>  		SECONDARY_EXEC_RDTSCP |
>  		SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE |
> +		SECONDARY_EXEC_ENABLE_VPID |
>  		SECONDARY_EXEC_APIC_REGISTER_VIRT |
>  		SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY |
>  		SECONDARY_EXEC_WBINVD_EXITING |
> @@ -4160,7 +4166,7 @@ static void allocate_vpid(struct vcpu_vmx *vmx)
>  	int vpid;
>  
>  	vmx->vpid = 0;
> -	if (!enable_vpid)
> +	if (!enable_vpid || is_guest_mode(&vmx->vcpu))
>  		return;
>  	spin_lock(&vmx_vpid_lock);
>  	vpid = find_first_zero_bit(vmx_vpid_bitmap, VMX_NR_VPIDS);
> @@ -6738,6 +6744,14 @@ static int handle_vmclear(struct kvm_vcpu *vcpu)
>  	}
>  	vmcs12 = kmap(page);
>  	vmcs12->launch_state = 0;
> +	if (enable_vpid) {
> +		if (nested_cpu_has_vpid(vmcs12)) {
> +			spin_lock(&vmx_vpid_lock);
> +			if (vmcs12->virtual_processor_id != 0)
> +				__clear_bit(vmcs12->virtual_processor_id, vmx_vpid_bitmap);
> +			spin_unlock(&vmx_vpid_lock);
> +		}
> +	}
>  	kunmap(page);
>  	nested_release_page(page);

I don't think this is enough; we should also check for set "nested" bits
in free_vpid() and clear them. There should be some sort of mapping between the
nested guest vpid and the actual vpid so that we can just clear those bits.
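A possible shape for the suggested mapping, as a hypothetical sketch: each vCPU records which host VPID backs each vpid12, so vCPU teardown can release all of them even if L1 never issues VMCLEAR. The table size, struct, and helper names are assumptions, and the release step only counts instead of touching a real bitmap.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_MAX_NESTED 4   /* hypothetical per-vCPU limit on tracked nested VPIDs */

/* Hypothetical per-vCPU table: which host VPID (vpid02) backs each vpid12. */
struct toy_nested_vpids {
	uint16_t vpid12[TOY_MAX_NESTED];   /* value L1 put in its vmcs12 */
	uint16_t vpid02[TOY_MAX_NESTED];   /* host VPID L0 used to run that L2 */
	int count;
};

static int toy_released;   /* counts host VPIDs handed back to the global pool */

static void toy_release_host_vpid(uint16_t vpid)
{
	if (vpid != 0)
		toy_released++;   /* real code would clear the bit in vmx_vpid_bitmap */
}

/* Record a mapping; returns -1 when the table is full. */
static int toy_map(struct toy_nested_vpids *m, uint16_t vpid12, uint16_t vpid02)
{
	if (m->count == TOY_MAX_NESTED)
		return -1;
	m->vpid12[m->count] = vpid12;
	m->vpid02[m->count] = vpid02;
	m->count++;
	return 0;
}

/* Look up the host VPID for a vpid12; 0 means "not mapped". */
static uint16_t toy_lookup(const struct toy_nested_vpids *m, uint16_t vpid12)
{
	for (int i = 0; i < m->count; i++)
		if (m->vpid12[i] == vpid12)
			return m->vpid02[i];
	return 0;
}

/* What free_vpid() would additionally do on vCPU teardown: release every nested mapping. */
static void toy_free_nested(struct toy_nested_vpids *m)
{
	for (int i = 0; i < m->count; i++)
		toy_release_host_vpid(m->vpid02[i]);
	m->count = 0;
}
```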

> @@ -9189,6 +9203,7 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
>  {
>  	struct vcpu_vmx *vmx = to_vmx(vcpu);
>  	u32 exec_control;
> +	int vpid;
>  
>  	vmcs_write16(GUEST_ES_SELECTOR, vmcs12->guest_es_selector);
>  	vmcs_write16(GUEST_CS_SELECTOR, vmcs12->guest_cs_selector);
> @@ -9438,13 +9453,21 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
>  	else
>  		vmcs_write64(TSC_OFFSET, vmx->nested.vmcs01_tsc_offset);
>  
> +

Spurious blank line here.

>  	if (enable_vpid) {
> -		/*
> -		 * Trivially support vpid by letting L2s share their parent
> -		 * L1's vpid. TODO: move to a more elaborate solution, giving
> -		 * each L2 its own vpid and exposing the vpid feature to L1.
> -		 */
> -		vmcs_write16(VIRTUAL_PROCESSOR_ID, vmx->vpid);
> +		if (nested_cpu_has_vpid(vmcs12)) {
> +			if (vmcs12->virtual_processor_id == 0) {

OK, so if we advertise vpid to the nested hypervisor, isn't it going to
write this field during setup? At least
that's what Linux does, no?

> +				spin_lock(&vmx_vpid_lock);
> +				vpid = find_first_zero_bit(vmx_vpid_bitmap, VMX_NR_VPIDS);
> +				if (vpid < VMX_NR_VPIDS)
> +					__set_bit(vpid, vmx_vpid_bitmap);
> +				spin_unlock(&vmx_vpid_lock);
> +				vmcs_write16(VIRTUAL_PROCESSOR_ID, vpid);
> +			} else
> +				vmcs_write16(VIRTUAL_PROCESSOR_ID, vmcs12->virtual_processor_id);
> +		} else
> +			vmcs_write16(VIRTUAL_PROCESSOR_ID, vmx->vpid);
> +

I guess L1 shouldn't know what vpid L0 chose to run L2. If L1 vmreads,
it should get what it expects for the value of vpid, not the one L0 chose.
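A minimal model of the separation described here, assuming emulated VMREADs are answered from vmcs12 (all names are hypothetical): the hardware VPID chosen by L0 lives only in vmcs02, and nothing copies it back into vmcs12 on nested VM exit, unlike the prepare_vmcs12 hunk in this patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, heavily simplified per-vCPU view of the two VPID fields. */
struct toy_vcpu {
	uint16_t vmcs12_vpid;   /* vpid12: the value L1 wrote with VMWRITE */
	uint16_t vmcs02_vpid;   /* vpid02: the value L0 loads into hardware for L2 */
};

/* Nested VM entry: hardware gets L0's choice; L1's value is left untouched. */
static void toy_nested_entry(struct toy_vcpu *v, uint16_t vpid02)
{
	v->vmcs02_vpid = vpid02;
}

/* Nested VM exit: deliberately no "v->vmcs12_vpid = v->vmcs02_vpid" here. */
static void toy_nested_exit(struct toy_vcpu *v)
{
	(void)v;
}

/* Emulated VMREAD of VIRTUAL_PROCESSOR_ID is served from vmcs12 in this model. */
static uint16_t toy_l1_vmread_vpid(const struct toy_vcpu *v)
{
	return v->vmcs12_vpid;
}
```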

>  		vmx_flush_tlb(vcpu);
>  	}

So, this isn't removed? I thought it wasn't needed anymore.

> @@ -9973,6 +9996,8 @@ static void prepare_vmcs12(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
>  		vmcs12_save_pending_event(vcpu, vmcs12);
>  	}
>  
> +	if (nested_cpu_has_vpid(vmcs12))
> +		vmcs12->virtual_processor_id = vmcs_read16(VIRTUAL_PROCESSOR_ID);
>  	/*
>  	 * Drop what we picked up for L2 via vmx_complete_interrupts. It is
>  	 * preserved above and would only end up incorrectly in L1.
Date: Mon, 14 Sep 2015 21:37:52 +0530
In-Reply-To: <BLU436-SMTP172159ECF89EA5C21830024805D0@phx.gbl> (Wanpeng Li's
	message of "Mon, 14 Sep 2015 20:52:23 +0800")
Message-ID: <jpgh9mxt25j.fsf@linux.bootlegged.copy>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.5 (gnu/linux)
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Wanpeng Li Sept. 15, 2015, 10:14 a.m. UTC | #3

On 9/14/15 10:54 PM, Jan Kiszka wrote:
> On 2015-09-14 14:52, Wanpeng Li wrote:
>> VPID is used to tag address spaces and avoid TLB flushes. Currently L0 uses
>> the same VPID to run L1 and all of its guests, so KVM flushes the VPID when
>> switching between L1 and L2.
>>
>> This patch advertises VPID to the L1 hypervisor, so that the address spaces of
>> L1 and L2 can be treated separately and the TLB flush can be avoided when
>> switching between L1 and L2. This yields a ~3x performance improvement for
>> lmbench 8p/64k ctxsw.
>>
>> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
>> ---
>>   arch/x86/kvm/vmx.c | 39 ++++++++++++++++++++++++++++++++-------
>>   1 file changed, 32 insertions(+), 7 deletions(-)
>>
>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> index da1590e..06bc31e 100644
>> --- a/arch/x86/kvm/vmx.c
>> +++ b/arch/x86/kvm/vmx.c
>> @@ -1157,6 +1157,11 @@ static inline bool nested_cpu_has_virt_x2apic_mode(struct vmcs12 *vmcs12)
>>   	return nested_cpu_has2(vmcs12, SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE);
>>   }
>>   
>> +static inline bool nested_cpu_has_vpid(struct vmcs12 *vmcs12)
>> +{
>> +	return nested_cpu_has2(vmcs12, SECONDARY_EXEC_ENABLE_VPID);
>> +}
>> +
>>   static inline bool nested_cpu_has_apic_reg_virt(struct vmcs12 *vmcs12)
>>   {
>>   	return nested_cpu_has2(vmcs12, SECONDARY_EXEC_APIC_REGISTER_VIRT);
>> @@ -2471,6 +2476,7 @@ static void nested_vmx_setup_ctls_msrs(struct vcpu_vmx *vmx)
>>   		SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES |
>>   		SECONDARY_EXEC_RDTSCP |
>>   		SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE |
>> +		SECONDARY_EXEC_ENABLE_VPID |
>>   		SECONDARY_EXEC_APIC_REGISTER_VIRT |
>>   		SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY |
>>   		SECONDARY_EXEC_WBINVD_EXITING |
>> @@ -4160,7 +4166,7 @@ static void allocate_vpid(struct vcpu_vmx *vmx)
>>   	int vpid;
>>   
>>   	vmx->vpid = 0;
>> -	if (!enable_vpid)
>> +	if (!enable_vpid || is_guest_mode(&vmx->vcpu))
>>   		return;
>>   	spin_lock(&vmx_vpid_lock);
>>   	vpid = find_first_zero_bit(vmx_vpid_bitmap, VMX_NR_VPIDS);
>> @@ -6738,6 +6744,14 @@ static int handle_vmclear(struct kvm_vcpu *vcpu)
>>   	}
>>   	vmcs12 = kmap(page);
>>   	vmcs12->launch_state = 0;
>> +	if (enable_vpid) {
>> +		if (nested_cpu_has_vpid(vmcs12)) {
>> +			spin_lock(&vmx_vpid_lock);
>> +			if (vmcs12->virtual_processor_id != 0)
>> +				__clear_bit(vmcs12->virtual_processor_id, vmx_vpid_bitmap);
>> +			spin_unlock(&vmx_vpid_lock);
> Maybe enhance free_vpid (and also allocate_vpid) to work generically and
> let the caller decide where to take the vpid from or where to store it?

Good idea.

>
>> +		}
>> +	}
>>   	kunmap(page);
>>   	nested_release_page(page);
>>   
>> @@ -9189,6 +9203,7 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
>>   {
>>   	struct vcpu_vmx *vmx = to_vmx(vcpu);
>>   	u32 exec_control;
>> +	int vpid;
>>   
>>   	vmcs_write16(GUEST_ES_SELECTOR, vmcs12->guest_es_selector);
>>   	vmcs_write16(GUEST_CS_SELECTOR, vmcs12->guest_cs_selector);
>> @@ -9438,13 +9453,21 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
>>   	else
>>   		vmcs_write64(TSC_OFFSET, vmx->nested.vmcs01_tsc_offset);
>>   
>> +
>>   	if (enable_vpid) {
>> -		/*
>> -		 * Trivially support vpid by letting L2s share their parent
>> -		 * L1's vpid. TODO: move to a more elaborate solution, giving
>> -		 * each L2 its own vpid and exposing the vpid feature to L1.
>> -		 */
>> -		vmcs_write16(VIRTUAL_PROCESSOR_ID, vmx->vpid);
>> +		if (nested_cpu_has_vpid(vmcs12)) {
>> +			if (vmcs12->virtual_processor_id == 0) {
>> +				spin_lock(&vmx_vpid_lock);
>> +				vpid = find_first_zero_bit(vmx_vpid_bitmap, VMX_NR_VPIDS);
>> +				if (vpid < VMX_NR_VPIDS)
>> +					__set_bit(vpid, vmx_vpid_bitmap);
>> +				spin_unlock(&vmx_vpid_lock);
>> +				vmcs_write16(VIRTUAL_PROCESSOR_ID, vpid);
> It's a bit non-obvious that vpid == VMX_NR_VPIDS (no free vpids) will
> lead to vpid == 0 when writing VIRTUAL_PROCESSOR_ID. You should leave at
> least a comment. Or generalize allocate_vpid as that one is already
> clearer in this regard.

Ditto.

>
>> +			} else
>> +				vmcs_write16(VIRTUAL_PROCESSOR_ID, vmcs12->virtual_processor_id);
>> +		} else
>> +			vmcs_write16(VIRTUAL_PROCESSOR_ID, vmx->vpid);
>> +
>>   		vmx_flush_tlb(vcpu);
>>   	}
>>   
>> @@ -9973,6 +9996,8 @@ static void prepare_vmcs12(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
>>   		vmcs12_save_pending_event(vcpu, vmcs12);
>>   	}
>>   
>> +	if (nested_cpu_has_vpid(vmcs12))
>> +		vmcs12->virtual_processor_id = vmcs_read16(VIRTUAL_PROCESSOR_ID);
>>   	/*
>>   	 * Drop what we picked up for L2 via vmx_complete_interrupts. It is
>>   	 * preserved above and would only end up incorrectly in L1.
>>
> Last but not least: the guest can now easily exhaust the host's pool of
> vpids simply by spawning plenty of VCPUs for L2, no? Is this acceptable,
> or should there be some limit?

In v2, I reuse the value of vpid02 and issue one invvpid whenever vpid12
changes, so the scenario you pointed out can be avoided.

Regards,
Wanpeng Li

Wanpeng Li Sept. 15, 2015, 10:18 a.m. UTC | #4

On 9/15/15 12:08 AM, Bandan Das wrote:
> Wanpeng Li <wanpeng.li@hotmail.com> writes:
>
>> VPID is used to tag address spaces and avoid TLB flushes. Currently L0 uses
>> the same VPID to run L1 and all of its guests, so KVM flushes the VPID when
>> switching between L1 and L2.
>>
>> This patch advertises VPID to the L1 hypervisor, so that the address spaces of
>> L1 and L2 can be treated separately and the TLB flush can be avoided when
>> switching between L1 and L2. This yields a ~3x performance improvement for
>> lmbench 8p/64k ctxsw.
> TLB flush does context invalidation and while that should result in
> some improvement, I never expected a 3x improvement for any workload!
> Interesting :)

The result still looks good when testing v2.

>
>> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
>> ---
>>   arch/x86/kvm/vmx.c | 39 ++++++++++++++++++++++++++++++++-------
>>   1 file changed, 32 insertions(+), 7 deletions(-)
>>
>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> index da1590e..06bc31e 100644
>> --- a/arch/x86/kvm/vmx.c
>> +++ b/arch/x86/kvm/vmx.c
>> @@ -1157,6 +1157,11 @@ static inline bool nested_cpu_has_virt_x2apic_mode(struct vmcs12 *vmcs12)
>>   	return nested_cpu_has2(vmcs12, SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE);
>>   }
>>   
>> +static inline bool nested_cpu_has_vpid(struct vmcs12 *vmcs12)
>> +{
>> +	return nested_cpu_has2(vmcs12, SECONDARY_EXEC_ENABLE_VPID);
>> +}
>> +
>>   static inline bool nested_cpu_has_apic_reg_virt(struct vmcs12 *vmcs12)
>>   {
>>   	return nested_cpu_has2(vmcs12, SECONDARY_EXEC_APIC_REGISTER_VIRT);
>> @@ -2471,6 +2476,7 @@ static void nested_vmx_setup_ctls_msrs(struct vcpu_vmx *vmx)
>>   		SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES |
>>   		SECONDARY_EXEC_RDTSCP |
>>   		SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE |
>> +		SECONDARY_EXEC_ENABLE_VPID |
>>   		SECONDARY_EXEC_APIC_REGISTER_VIRT |
>>   		SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY |
>>   		SECONDARY_EXEC_WBINVD_EXITING |
>> @@ -4160,7 +4166,7 @@ static void allocate_vpid(struct vcpu_vmx *vmx)
>>   	int vpid;
>>   
>>   	vmx->vpid = 0;
>> -	if (!enable_vpid)
>> +	if (!enable_vpid || is_guest_mode(&vmx->vcpu))
>>   		return;
>>   	spin_lock(&vmx_vpid_lock);
>>   	vpid = find_first_zero_bit(vmx_vpid_bitmap, VMX_NR_VPIDS);
>> @@ -6738,6 +6744,14 @@ static int handle_vmclear(struct kvm_vcpu *vcpu)
>>   	}
>>   	vmcs12 = kmap(page);
>>   	vmcs12->launch_state = 0;
>> +	if (enable_vpid) {
>> +		if (nested_cpu_has_vpid(vmcs12)) {
>> +			spin_lock(&vmx_vpid_lock);
>> +			if (vmcs12->virtual_processor_id != 0)
>> +				__clear_bit(vmcs12->virtual_processor_id, vmx_vpid_bitmap);
>> +			spin_unlock(&vmx_vpid_lock);
>> +		}
>> +	}
>>   	kunmap(page);
>>   	nested_release_page(page);
> I don't think this is enough; we should also check for set "nested" bits
> in free_vpid() and clear them. There should be some sort of mapping between the
> nested guest vpid and the actual vpid so that we can just clear those bits.

Agreed.

>
>> @@ -9189,6 +9203,7 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
>>   {
>>   	struct vcpu_vmx *vmx = to_vmx(vcpu);
>>   	u32 exec_control;
>> +	int vpid;
>>   
>>   	vmcs_write16(GUEST_ES_SELECTOR, vmcs12->guest_es_selector);
>>   	vmcs_write16(GUEST_CS_SELECTOR, vmcs12->guest_cs_selector);
>> @@ -9438,13 +9453,21 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
>>   	else
>>   		vmcs_write64(TSC_OFFSET, vmx->nested.vmcs01_tsc_offset);
>>   
>> +
> Spurious blank line here.
>
>>   	if (enable_vpid) {
>> -		/*
>> -		 * Trivially support vpid by letting L2s share their parent
>> -		 * L1's vpid. TODO: move to a more elaborate solution, giving
>> -		 * each L2 its own vpid and exposing the vpid feature to L1.
>> -		 */
>> -		vmcs_write16(VIRTUAL_PROCESSOR_ID, vmx->vpid);
>> +		if (nested_cpu_has_vpid(vmcs12)) {
>> +			if (vmcs12->virtual_processor_id == 0) {
> OK, so if we advertise vpid to the nested hypervisor, isn't it going to
> write this field during setup? At least
> that's what Linux does, no?

Agreed; in v2 I allocate vpid02 during initialization.

>
>> +				spin_lock(&vmx_vpid_lock);
>> +				vpid = find_first_zero_bit(vmx_vpid_bitmap, VMX_NR_VPIDS);
>> +				if (vpid < VMX_NR_VPIDS)
>> +					__set_bit(vpid, vmx_vpid_bitmap);
>> +				spin_unlock(&vmx_vpid_lock);
>> +				vmcs_write16(VIRTUAL_PROCESSOR_ID, vpid);
>> +			} else
>> +				vmcs_write16(VIRTUAL_PROCESSOR_ID, vmcs12->virtual_processor_id);
>> +		} else
>> +			vmcs_write16(VIRTUAL_PROCESSOR_ID, vmx->vpid);
>> +
> I guess L1 shouldn't know what vpid L0 chose to run L2. If L1 vmreads,
> it should get what it expects for the value of vpid, not the one L0 chose.

Agreed.

>
>>   		vmx_flush_tlb(vcpu);
>>   	}
> So, this isn't removed? I thought it wasn't needed anymore.

Please review v2. :-)

Regards,
Wanpeng Li

Jan Kiszka Sept. 15, 2015, 5:32 p.m. UTC | #5

On 2015-09-15 12:14, Wanpeng Li wrote:
> On 9/14/15 10:54 PM, Jan Kiszka wrote:
>> Last but not least: the guest can now easily exhaust the host's pool of
>> vpids simply by spawning plenty of VCPUs for L2, no? Is this acceptable,
>> or should there be some limit?
> 
> In v2, I reuse the value of vpid02 and issue one invvpid whenever vpid12
> changes, so the scenario you pointed out can be avoided.

I cannot yet follow why there is no chance for L1 to consume all vpids
that the host manages in that single, global bitmap simply by spawning a
lot of nested VCPUs for some L2. What forces L1 to call nested
vmclear, apparently the only way (besides destroying nested VCPUs) to
release such vpids again?
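To make this concern concrete, the toy model below mimics the v1 flow as it appears in the patch above: the first nested entry with virtual_processor_id == 0 takes a VPID from a global pool, prepare_vmcs12 leaves it in vmcs12, and only an emulated VMCLEAR gives it back. The pool size and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_NR_VPIDS 64   /* hypothetical small host pool */

static bool toy_used[TOY_NR_VPIDS] = { [0] = true };   /* VPID 0 is reserved */

/* Global allocator shared by every VM on the host; 0 means exhausted. */
static uint16_t toy_alloc(void)
{
	for (int i = 1; i < TOY_NR_VPIDS; i++) {
		if (!toy_used[i]) {
			toy_used[i] = true;
			return (uint16_t)i;
		}
	}
	return 0;
}

struct toy_vmcs12 { uint16_t virtual_processor_id; };

/* v1 behaviour in miniature: the first entry allocates, and the value stays in vmcs12. */
static void toy_v1_nested_entry(struct toy_vmcs12 *v12)
{
	if (v12->virtual_processor_id == 0)
		v12->virtual_processor_id = toy_alloc();
}

/* Only an emulated VMCLEAR returns the VPID to the pool. */
static void toy_v1_vmclear(const struct toy_vmcs12 *v12)
{
	if (v12->virtual_processor_id != 0)
		toy_used[v12->virtual_processor_id] = false;
}
```

An L1 that keeps creating vmcs12s and never issues VMCLEAR drains the shared pool, affecting every other VM on the host.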

Jan

Wanpeng Li Sept. 16, 2015, 2:36 a.m. UTC | #6

On 9/16/15 1:32 AM, Jan Kiszka wrote:
> On 2015-09-15 12:14, Wanpeng Li wrote:
>> On 9/14/15 10:54 PM, Jan Kiszka wrote:
>>> Last but not least: the guest can now easily exhaust the host's pool of
>>> vpids simply by spawning plenty of VCPUs for L2, no? Is this acceptable,
>>> or should there be some limit?
>> In v2, I reuse the value of vpid02 and issue one invvpid whenever vpid12
>> changes, so the scenario you pointed out can be avoided.
> I cannot yet follow why there is no chance for L1 to consume all vpids
> that the host manages in that single, global bitmap simply by spawning a
> lot of nested VCPUs for some L2. What forces L1 to call nested
> vmclear, apparently the only way (besides destroying nested VCPUs) to
> release such vpids again?

In v2, there is no direct mapping between vpid02 and vpid12. vpid02 is
allocated per vCPU by L0 and reused; whenever the value of vpid12 changes,
one invvpid is issued during nested vmentry. vpid12 is allocated by L1 for
L2, so it does not affect the global bitmap (used for vpid01 and vpid02
allocation) even if L1 spawns a lot of nested vCPUs.

Regards,
Wanpeng Li

Jan Kiszka Sept. 16, 2015, 5:20 a.m. UTC | #7

On 2015-09-16 04:36, Wanpeng Li wrote:
> On 9/16/15 1:32 AM, Jan Kiszka wrote:
>> On 2015-09-15 12:14, Wanpeng Li wrote:
>>> On 9/14/15 10:54 PM, Jan Kiszka wrote:
>>>> Last but not least: the guest can now easily exhaust the host's pool of
>>>> vpids simply by spawning plenty of VCPUs for L2, no? Is this acceptable,
>>>> or should there be some limit?
>>> In v2, I reuse the value of vpid02 and issue one invvpid whenever vpid12
>>> changes, so the scenario you pointed out can be avoided.
>> I cannot yet follow why there is no chance for L1 to consume all vpids
>> that the host manages in that single, global bitmap simply by spawning a
>> lot of nested VCPUs for some L2. What forces L1 to call nested
>> vmclear, apparently the only way (besides destroying nested VCPUs) to
>> release such vpids again?
> 
> In v2, there is no direct mapping between vpid02 and vpid12. vpid02 is
> allocated per vCPU by L0 and reused; whenever the value of vpid12 changes,
> one invvpid is issued during nested vmentry. vpid12 is allocated by L1 for
> L2, so it does not affect the global bitmap (used for vpid01 and vpid02
> allocation) even if L1 spawns a lot of nested vCPUs.

Ah, I see, you limit allocation to one additional host-side vpid per
VCPU, for nesting. That looks better. That also means all vpids for L2
will be folded onto that single vpid in hardware, right? So the major
benefit comes from having separate vpids when switching between L1 and
L2, in fact.

Jan

Wanpeng Li Sept. 16, 2015, 6:10 a.m. UTC | #8

On 9/16/15 1:20 PM, Jan Kiszka wrote:
> On 2015-09-16 04:36, Wanpeng Li wrote:
>> On 9/16/15 1:32 AM, Jan Kiszka wrote:
>>> On 2015-09-15 12:14, Wanpeng Li wrote:
>>>> On 9/14/15 10:54 PM, Jan Kiszka wrote:
>>>>> Last but not least: the guest can now easily exhaust the host's pool of
>>>>> vpids simply by spawning plenty of VCPUs for L2, no? Is this acceptable,
>>>>> or should there be some limit?
>>>> In v2, I reuse the value of vpid02 and issue one invvpid whenever vpid12
>>>> changes, so the scenario you pointed out can be avoided.
>>> I cannot yet follow why there is no chance for L1 to consume all vpids
>>> that the host manages in that single, global bitmap simply by spawning a
>>> lot of nested VCPUs for some L2. What forces L1 to call nested
>>> vmclear, apparently the only way (besides destroying nested VCPUs) to
>>> release such vpids again?
>> In v2, there is no direct mapping between vpid02 and vpid12. vpid02 is
>> allocated per vCPU by L0 and reused; whenever the value of vpid12 changes,
>> one invvpid is issued during nested vmentry. vpid12 is allocated by L1 for
>> L2, so it does not affect the global bitmap (used for vpid01 and vpid02
>> allocation) even if L1 spawns a lot of nested vCPUs.
> Ah, I see, you limit allocation to one additional host-side vpid per
> VCPU, for nesting. That looks better. That also means all vpids for L2
> will be folded onto that single vpid in hardware, right? So the major

Exactly.

> benefit comes from having separate vpids when switching between L1 and
> L2, in fact.

And also when L2's vCPUs are not scheduled in/out on L1. Btw, your review
of v3 would be greatly appreciated. :-)

Regards,
Wanpeng Li
