
KVM: nVMX: Fix L2 guest hang if shadow page tables on EPT

Message ID 1489761691-11441-1-git-send-email-wanpeng.li@hotmail.com
State New, archived

Commit Message

Wanpeng Li March 17, 2017, 2:41 p.m. UTC
From: Wanpeng Li <wanpeng.li@hotmail.com>

The L2 guest hangs when shadow page tables are used on EPT. The trace on L1
shows L2 repeatedly exiting with reason EXCEPTION_NMI because of a page fault:

qemu-system-x86-2821  [003] d..2    45.848814: kvm_entry: vcpu 0
qemu-system-x86-2821  [003] ...1    45.848827: kvm_exit: reason EXCEPTION_NMI rip 0xe05b info fe05b 80000b0e
qemu-system-x86-2821  [003] ...1    45.848827: kvm_page_fault: address fe05b error_code 14

Commit 7ca29de21362 (KVM: nVMX: fix CR3 load if L2 uses PAE paging and EPT)
prevents KVM from loading L2's PDPTRs by dereferencing L2's CR3, since CR3 is
uninitialized in real mode. Hyper-V as L1 emulates L2 real mode with PAE
paging and EPT enabled. However, during system boot the guest switches from
protected mode (a sub-mode of legacy mode) to long mode, and the check in
nested_vmx_load_cr3() prevents the PDPTRs from being loaded while L2 is still
in protected mode with PAE paging, under both nested EPT and shadow page
tables on EPT. The original commit was only meant to prevent dereferencing
L2's CR3 when the L1 hypervisor emulates L2's real mode through virtual-8086
mode.

This patch fixes the hang by allowing the PDPTRs to be loaded when PAE paging
and EPT are enabled, unless vm86_active is set.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Ladi Prosek <lprosek@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
---
 arch/x86/kvm/vmx.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Comments

Paolo Bonzini March 17, 2017, 2:47 p.m. UTC | #1
On 17/03/2017 15:41, Wanpeng Li wrote:
> [...]

Please provide a testcase.  I know this is a regression, but I'm not
going to merge the fix without a corresponding patch to kvm-unit-tests.

Paolo

Ladi Prosek March 17, 2017, 5:28 p.m. UTC | #2
On Fri, Mar 17, 2017 at 3:41 PM, Wanpeng Li <kernellwp@gmail.com> wrote:
> [...]
>
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index c664365..2b2a05f 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -9933,7 +9933,7 @@ static bool nested_cr3_valid(struct kvm_vcpu *vcpu, unsigned long val)
>  static int nested_vmx_load_cr3(struct kvm_vcpu *vcpu, unsigned long cr3, bool nested_ept,
>                                u32 *entry_failure_code)
>  {
> -       if (cr3 != kvm_read_cr3(vcpu) || (!nested_ept && pdptrs_changed(vcpu))) {
> +       if (cr3 != kvm_read_cr3(vcpu) || pdptrs_changed(vcpu)) {
>                 if (!nested_cr3_valid(vcpu, cr3)) {
>                         *entry_failure_code = ENTRY_FAIL_DEFAULT;
>                         return 1;
> @@ -9944,7 +9944,7 @@ static int nested_vmx_load_cr3(struct kvm_vcpu *vcpu, unsigned long cr3, bool ne
>                  * must not be dereferenced.
>                  */
>                 if (!is_long_mode(vcpu) && is_pae(vcpu) && is_paging(vcpu) &&
> -                   !nested_ept) {
> +                   !(nested_ept && to_vmx(vcpu)->rmode.vm86_active)) {

This change breaks Hyper-V on KVM. L2 hangs on start-up, same symptoms
as before 7ca29de21362.

I'll take a closer look next week. Is there an easy way for me to
reproduce the issue you're seeing?

>                         if (!load_pdptrs(vcpu, vcpu->arch.walk_mmu, cr3)) {
>                                 *entry_failure_code = ENTRY_FAIL_PDPTE;
>                                 return 1;
> --
> 2.7.4
>
Paolo Bonzini March 17, 2017, 5:33 p.m. UTC | #3
On 17/03/2017 18:28, Ladi Prosek wrote:
> On Fri, Mar 17, 2017 at 3:41 PM, Wanpeng Li <kernellwp@gmail.com> wrote:
>> [...]
>>                 if (!is_long_mode(vcpu) && is_pae(vcpu) && is_paging(vcpu) &&
>> -                   !nested_ept) {
>> +                   !(nested_ept && to_vmx(vcpu)->rmode.vm86_active)) {
> 
> This change breaks Hyper-V on KVM. L2 hangs on start-up, same symptoms
> as before 7ca29de21362.

Looks like we need _two_ testcases then... :)

Paolo

> I'll take a closer look next week. Is there an easy way for me to
> reproduce the issue you're seeing?

Patch

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index c664365..2b2a05f 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -9933,7 +9933,7 @@ static bool nested_cr3_valid(struct kvm_vcpu *vcpu, unsigned long val)
 static int nested_vmx_load_cr3(struct kvm_vcpu *vcpu, unsigned long cr3, bool nested_ept,
 			       u32 *entry_failure_code)
 {
-	if (cr3 != kvm_read_cr3(vcpu) || (!nested_ept && pdptrs_changed(vcpu))) {
+	if (cr3 != kvm_read_cr3(vcpu) || pdptrs_changed(vcpu)) {
 		if (!nested_cr3_valid(vcpu, cr3)) {
 			*entry_failure_code = ENTRY_FAIL_DEFAULT;
 			return 1;
@@ -9944,7 +9944,7 @@ static int nested_vmx_load_cr3(struct kvm_vcpu *vcpu, unsigned long cr3, bool ne
 		 * must not be dereferenced.
 		 */
 		if (!is_long_mode(vcpu) && is_pae(vcpu) && is_paging(vcpu) &&
-		    !nested_ept) {
+		    !(nested_ept && to_vmx(vcpu)->rmode.vm86_active)) {
 			if (!load_pdptrs(vcpu, vcpu->arch.walk_mmu, cr3)) {
 				*entry_failure_code = ENTRY_FAIL_PDPTE;
 				return 1;