diff mbox

KVM: nVMX: nested TPR shadow/threshold emulation

Message ID 1406721880-5490-1-git-send-email-wanpeng.li@linux.intel.com (mailing list archive)
State New, archived
Headers show

Commit Message

Wanpeng Li July 30, 2014, 12:04 p.m. UTC
This patch fix bug https://bugzilla.kernel.org/show_bug.cgi?id=61411

TPR shadow/threshold feature is important to speed up the Windows guest. 
Besides, it is a must feature for certain VMM.

We map virtual APIC page address and TPR threshold from L1 VMCS. If 
TPR_BELOW_THRESHOLD VM exit is triggered by L2 guest and L1 interested 
in, we inject it into L1 VMM for handling.

Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
---
 arch/x86/kvm/vmx.c | 22 ++++++++++++++++++----
 1 file changed, 18 insertions(+), 4 deletions(-)

Comments

Paolo Bonzini July 30, 2014, 3:20 p.m. UTC | #1
Il 30/07/2014 14:04, Wanpeng Li ha scritto:
> @@ -7962,14 +7965,14 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
>  		if (!vmx->rdtscp_enabled)
>  			exec_control &= ~SECONDARY_EXEC_RDTSCP;
>  		/* Take the following fields only from vmcs12 */
> -		exec_control &= ~(SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES |
> -				  SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY |
> +		exec_control &= ~(SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY |
>                                    SECONDARY_EXEC_APIC_REGISTER_VIRT);

This change is wrong.  You don't have to take L0's "virtualize APIC
accesses" setting into account, because while running L2 you cannot
modify L1's CR8 (only the virtual nested one).

> +
> +			virtual_apic_page = nested_get_page(vcpu,
> +						vmcs12->virtual_apic_page_addr);
> +			if (vmcs_read64(VIRTUAL_APIC_PAGE_ADDR) !=
> +					page_to_phys(virtual_apic_page))
> +				vmcs_write64(VIRTUAL_APIC_PAGE_ADDR,
> +					page_to_phys(virtual_apic_page));
> +			nested_release_page(virtual_apic_page);
> +

You cannot release this page here.  You need to the exactly the same
thing that is done for apic_access_page.

One thing:

> +	if (nested_cpu_has(vmcs12, CPU_BASED_TPR_SHADOW))
> +		vmcs_write32(TPR_THRESHOLD, vmcs12->tpr_threshold);

I think you can just do this write unconditionally, since most
hypervisors will enable this.  Also, you probably can add the tpr
threshold field to the read-write fields for shadow VMCS.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Wanpeng Li July 31, 2014, 8:03 a.m. UTC | #2
Hi Paolo,
On Wed, Jul 30, 2014 at 05:20:58PM +0200, Paolo Bonzini wrote:
>Il 30/07/2014 14:04, Wanpeng Li ha scritto:
>> @@ -7962,14 +7965,14 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
>>  		if (!vmx->rdtscp_enabled)
>>  			exec_control &= ~SECONDARY_EXEC_RDTSCP;
>>  		/* Take the following fields only from vmcs12 */
>> -		exec_control &= ~(SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES |
>> -				  SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY |
>> +		exec_control &= ~(SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY |
>>                                    SECONDARY_EXEC_APIC_REGISTER_VIRT);
>
>This change is wrong.  You don't have to take L0's "virtualize APIC
>accesses" setting into account, because while running L2 you cannot
>modify L1's CR8 (only the virtual nested one).
>

Agreed.

>> +
>> +			virtual_apic_page = nested_get_page(vcpu,
>> +						vmcs12->virtual_apic_page_addr);
>> +			if (vmcs_read64(VIRTUAL_APIC_PAGE_ADDR) !=
>> +					page_to_phys(virtual_apic_page))
>> +				vmcs_write64(VIRTUAL_APIC_PAGE_ADDR,
>> +					page_to_phys(virtual_apic_page));
>> +			nested_release_page(virtual_apic_page);
>> +
>
>You cannot release this page here.  You need to the exactly the same
>thing that is done for apic_access_page.
>

Agreed.

>One thing:
>
>> +	if (nested_cpu_has(vmcs12, CPU_BASED_TPR_SHADOW))
>> +		vmcs_write32(TPR_THRESHOLD, vmcs12->tpr_threshold);
>
>I think you can just do this write unconditionally, since most
>hypervisors will enable this.  Also, you probably can add the tpr

What will happen if a hypervisor doesn't enable it? I make it more 
cleaner in version two.

>threshold field to the read-write fields for shadow VMCS.
>

Agreed.

Regards,
Wanpeng Li 

>Paolo
>--
>To unsubscribe from this list: send the line "unsubscribe kvm" in
>the body of a message to majordomo@vger.kernel.org
>More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Paolo Bonzini July 31, 2014, 9:25 a.m. UTC | #3
Il 31/07/2014 10:03, Wanpeng Li ha scritto:
>> One thing:
>>
>>> +	if (nested_cpu_has(vmcs12, CPU_BASED_TPR_SHADOW))
>>> +		vmcs_write32(TPR_THRESHOLD, vmcs12->tpr_threshold);
>>
>> I think you can just do this write unconditionally, since most
>> hypervisors will enable this.  Also, you probably can add the tpr
> 
> What will happen if a hypervisor doesn't enable it? I make it more 
> cleaner in version two.

TPR_THRESHOLD will be likely written as zero, but the processor will
never use it anyway.  It's just a small optimization because
nested_cpu_has(vmcs12, CPU_BASED_TPR_SHADOW) will almost always be true.

Paolo

>> threshold field to the read-write fields for shadow VMCS.
> 
> Agreed.
> 
> Regards,
> Wanpeng Li 

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Zhang, Yang Z Aug. 1, 2014, 12:57 a.m. UTC | #4
Paolo Bonzini wrote on 2014-07-31:
> Il 31/07/2014 10:03, Wanpeng Li ha scritto:
>>> One thing:
>>> 
>>>> +	if (nested_cpu_has(vmcs12, CPU_BASED_TPR_SHADOW))
>>>> +		vmcs_write32(TPR_THRESHOLD, vmcs12->tpr_threshold);
>>> 
>>> I think you can just do this write unconditionally, since most
>>> hypervisors will enable this.  Also, you probably can add the tpr
>> 
>> What will happen if a hypervisor doesn't enable it? I make it more
>> cleaner in version two.
> 
> TPR_THRESHOLD will be likely written as zero, but the processor will
> never use it anyway.  It's just a small optimization because
> nested_cpu_has(vmcs12, CPU_BASED_TPR_SHADOW) will almost always be true.

Theoretically, you are right. But we should not expect all VMMs follow it. It is not worth to violate the SDM just for saving two or three instructions' cost.

> 
> Paolo
> 
>>> threshold field to the read-write fields for shadow VMCS.
>> 
>> Agreed.
>> 
>> Regards,
>> Wanpeng Li


Best regards,
Yang


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Paolo Bonzini Aug. 1, 2014, 6:35 a.m. UTC | #5
Il 01/08/2014 02:57, Zhang, Yang Z ha scritto:
> > TPR_THRESHOLD will be likely written as zero, but the processor will
> > never use it anyway.  It's just a small optimization because
> > nested_cpu_has(vmcs12, CPU_BASED_TPR_SHADOW) will almost always be true.
> 
> Theoretically, you are right. But we should not expect all VMMs
> follow it. It is not worth to violate the SDM just for saving two or
> three instructions' cost.

Yes, you do need an "if (cpu_has_vmx_tpr_shadow())" around the
vmcs_write32.  But still, checking nested_cpu_has is not strictly
necessary.  Right now they both are a single AND, but I have plans to
change all of the cpu_has_*() checks to static keys.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Zhang, Yang Z Aug. 1, 2014, 6:44 a.m. UTC | #6
Paolo Bonzini wrote on 2014-08-01:
> Il 01/08/2014 02:57, Zhang, Yang Z ha scritto:
>>> TPR_THRESHOLD will be likely written as zero, but the processor
>>> will never use it anyway.  It's just a small optimization because
>>> nested_cpu_has(vmcs12, CPU_BASED_TPR_SHADOW) will almost always
> be true.
>> 
>> Theoretically, you are right. But we should not expect all VMMs
>> follow it. It is not worth to violate the SDM just for saving two or
>> three instructions' cost.
> 
> Yes, you do need an "if (cpu_has_vmx_tpr_shadow())" around the
> vmcs_write32.  But still, checking nested_cpu_has is not strictly necessary.
> Right now they both are a single AND, but I have plans to change all
> of the
> cpu_has_*() checks to static keys.

See v2 patch. It isn't a problem anymore.

> 
> Paolo


Best regards,
Yang


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index a3845b8..f60846c 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2331,7 +2331,7 @@  static __init void nested_vmx_setup_ctls_msrs(void)
 		CPU_BASED_MOV_DR_EXITING | CPU_BASED_UNCOND_IO_EXITING |
 		CPU_BASED_USE_IO_BITMAPS | CPU_BASED_MONITOR_EXITING |
 		CPU_BASED_RDPMC_EXITING | CPU_BASED_RDTSC_EXITING |
-		CPU_BASED_PAUSE_EXITING |
+		CPU_BASED_PAUSE_EXITING | CPU_BASED_TPR_SHADOW |
 		CPU_BASED_ACTIVATE_SECONDARY_CONTROLS;
 	/*
 	 * We can allow some features even when not supported by the
@@ -6937,7 +6937,7 @@  static bool nested_vmx_exit_handled(struct kvm_vcpu *vcpu)
 	case EXIT_REASON_MCE_DURING_VMENTRY:
 		return 0;
 	case EXIT_REASON_TPR_BELOW_THRESHOLD:
-		return 1;
+		return nested_cpu_has(vmcs12, CPU_BASED_TPR_SHADOW);
 	case EXIT_REASON_APIC_ACCESS:
 		return nested_cpu_has2(vmcs12,
 			SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES);
@@ -7058,6 +7058,9 @@  static int vmx_handle_exit(struct kvm_vcpu *vcpu)
 
 static void update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr)
 {
+	if (is_guest_mode(vcpu))
+		return;
+
 	if (irr == -1 || tpr < irr) {
 		vmcs_write32(TPR_THRESHOLD, 0);
 		return;
@@ -7962,14 +7965,14 @@  static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
 		if (!vmx->rdtscp_enabled)
 			exec_control &= ~SECONDARY_EXEC_RDTSCP;
 		/* Take the following fields only from vmcs12 */
-		exec_control &= ~(SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES |
-				  SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY |
+		exec_control &= ~(SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY |
                                   SECONDARY_EXEC_APIC_REGISTER_VIRT);
 		if (nested_cpu_has(vmcs12,
 				CPU_BASED_ACTIVATE_SECONDARY_CONTROLS))
 			exec_control |= vmcs12->secondary_vm_exec_control;
 
 		if (exec_control & SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES) {
+			struct page *virtual_apic_page;
 			/*
 			 * Translate L1 physical address to host physical
 			 * address for vmcs02. Keep the page pinned, so this
@@ -7992,6 +7995,15 @@  static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
 			else
 				vmcs_write64(APIC_ACCESS_ADDR,
 				  page_to_phys(vmx->nested.apic_access_page));
+
+			virtual_apic_page = nested_get_page(vcpu,
+						vmcs12->virtual_apic_page_addr);
+			if (vmcs_read64(VIRTUAL_APIC_PAGE_ADDR) !=
+					page_to_phys(virtual_apic_page))
+				vmcs_write64(VIRTUAL_APIC_PAGE_ADDR,
+					page_to_phys(virtual_apic_page));
+			nested_release_page(virtual_apic_page);
+
 		} else if (vm_need_virtualize_apic_accesses(vmx->vcpu.kvm)) {
 			exec_control |=
 				SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES;
@@ -8002,6 +8014,8 @@  static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
 		vmcs_write32(SECONDARY_VM_EXEC_CONTROL, exec_control);
 	}
 
+	if (nested_cpu_has(vmcs12, CPU_BASED_TPR_SHADOW))
+		vmcs_write32(TPR_THRESHOLD, vmcs12->tpr_threshold);
 
 	/*
 	 * Set host-state according to L0's settings (vmcs12 is irrelevant here)