KVM: nVMX: do not pin the VMCS12

Message ID 1501163686-13648-1-git-send-email-pbonzini@redhat.com (mailing list archive)
State New, archived

Commit Message

Paolo Bonzini July 27, 2017, 1:54 p.m. UTC
Since the current implementation of VMCS12 does a memcpy in and out
of guest memory, we do not need current_vmcs12 and current_vmcs12_page
anymore.  current_vmptr is enough to read and write the VMCS12.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/vmx.c | 23 ++++++-----------------
 1 file changed, 6 insertions(+), 17 deletions(-)

Comments

David Matlack July 27, 2017, 5:20 p.m. UTC | #1
On Thu, Jul 27, 2017 at 6:54 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> Since the current implementation of VMCS12 does a memcpy in and out
> of guest memory, we do not need current_vmcs12 and current_vmcs12_page
> anymore.  current_vmptr is enough to read and write the VMCS12.

This patch also fixes dirty tracking (memslot->dirty_bitmap) of the
VMCS12 page, because the flush now goes through
kvm_vcpu_write_guest_page() instead of a memcpy into a kmap'ed page.
nested_release_page() only marks the struct page dirty, not the memslot.
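
To illustrate the difference, here is a minimal user-space sketch. It is a
toy model, not kernel code: struct toy_memslot, toy_write_mapped() and
toy_write_guest_page() are hypothetical names made up for this example. It
only assumes that a slot-aware write helper records the page in a per-slot
dirty bitmap, while a store through a host mapping does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_PAGE_SHIFT 12
#define TOY_PAGE_SIZE  (1u << TOY_PAGE_SHIFT)
#define TOY_NPAGES     4

/* Hypothetical stand-in for a memslot: backing memory plus a dirty bitmap. */
struct toy_memslot {
	uint8_t mem[TOY_NPAGES][TOY_PAGE_SIZE];
	uint8_t dirty_bitmap;	/* one bit per guest page */
};

/* Direct store through a host mapping: the slot's bitmap is not updated. */
static void toy_write_mapped(struct toy_memslot *slot, uint64_t gfn,
			     const void *data, size_t len)
{
	assert(gfn < TOY_NPAGES && len <= TOY_PAGE_SIZE);
	memcpy(slot->mem[gfn], data, len);
}

/* Slot-aware store: copy the data, then record the page as dirty. */
static int toy_write_guest_page(struct toy_memslot *slot, uint64_t gfn,
				const void *data, size_t len)
{
	if (gfn >= TOY_NPAGES || len > TOY_PAGE_SIZE)
		return -1;
	memcpy(slot->mem[gfn], data, len);
	slot->dirty_bitmap |= (uint8_t)(1u << gfn);
	return 0;
}

/* Report whether the bitmap has the page recorded as dirty. */
static bool toy_page_is_dirty(const struct toy_memslot *slot, uint64_t gfn)
{
	return gfn < TOY_NPAGES && ((slot->dirty_bitmap >> gfn) & 1u);
}
```

In this model a consumer that only scans dirty_bitmap (as live migration
does with the real bitmap) would miss the page written by
toy_write_mapped() and pick up the one written by toy_write_guest_page().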

>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  arch/x86/kvm/vmx.c | 23 ++++++-----------------
>  1 file changed, 6 insertions(+), 17 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index b37161808352..142f16ebdca2 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -416,9 +416,6 @@ struct nested_vmx {
>
>         /* The guest-physical address of the current VMCS L1 keeps for L2 */
>         gpa_t current_vmptr;
> -       /* The host-usable pointer to the above */
> -       struct page *current_vmcs12_page;
> -       struct vmcs12 *current_vmcs12;
>         /*
>          * Cache of the guest's VMCS, existing outside of guest memory.
>          * Loaded from guest memory during VMPTRLD. Flushed to guest
> @@ -7183,10 +7180,6 @@ static inline void nested_release_vmcs12(struct vcpu_vmx *vmx)
>         if (vmx->nested.current_vmptr == -1ull)
>                 return;
>
> -       /* current_vmptr and current_vmcs12 are always set/reset together */
> -       if (WARN_ON(vmx->nested.current_vmcs12 == NULL))
> -               return;
> -
>         if (enable_shadow_vmcs) {
>                 /* copy to memory all shadowed fields in case
>                    they were modified */
> @@ -7199,13 +7192,11 @@ static inline void nested_release_vmcs12(struct vcpu_vmx *vmx)
>         vmx->nested.posted_intr_nv = -1;
>
>         /* Flush VMCS12 to guest memory */
> -       memcpy(vmx->nested.current_vmcs12, vmx->nested.cached_vmcs12,
> -              VMCS12_SIZE);
> +       kvm_vcpu_write_guest_page(&vmx->vcpu,
> +                                 vmx->nested.current_vmptr >> PAGE_SHIFT,
> +                                 vmx->nested.cached_vmcs12, 0, VMCS12_SIZE);

Have you hit any "suspicious RCU usage" error messages during VM
teardown with this patch? We did when we replaced memcpy with
kvm_write_guest a while back. IIRC it was due to kvm->srcu not being
held in one of the teardown paths. kvm_write_guest() expects it to be
held in order to access memslots.

We fixed this by skipping the VMCS12 flush during VMXOFF. I'll send
that patch along with a few other nVMX dirty tracking related patches
I've been meaning to get upstreamed.
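
A rough sketch of that idea, as a user-space toy rather than the patch
itself: the release helper takes a flag saying whether to flush, and a
teardown caller passes false so the guest-memory write (which needs the
read-side lock) never happens. All names here (struct toy_vcpu, srcu_held,
toy_release_vmcs12, lockdep_splats) are hypothetical and only model the
behaviour described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_VMPTR_NONE UINT64_MAX	/* mirrors the -1ull sentinel */

/* Hypothetical model of the per-vCPU nested state. */
struct toy_vcpu {
	bool srcu_held;		/* is the read-side lock held by the caller? */
	uint64_t current_vmptr;	/* TOY_VMPTR_NONE when nothing is loaded */
	int cached_vmcs12;	/* stand-in for the cached VMCS12 contents */
	int guest_vmcs12;	/* stand-in for the copy in guest memory */
	int lockdep_splats;	/* counts writes done without the lock */
};

/* A guest-memory write; complains (by counting) if the lock is missing. */
static void toy_write_guest(struct toy_vcpu *v)
{
	if (!v->srcu_held)
		v->lockdep_splats++;
	v->guest_vmcs12 = v->cached_vmcs12;
}

/* Release the current VMCS12, flushing the cache only when asked to. */
static void toy_release_vmcs12(struct toy_vcpu *v, bool flush)
{
	if (v->current_vmptr == TOY_VMPTR_NONE)
		return;
	if (flush)
		toy_write_guest(v);
	v->current_vmptr = TOY_VMPTR_NONE;
}
```

A teardown path would call toy_release_vmcs12(v, false), while a VMPTRLD or
VMCLEAR style path, which runs with the lock held, would pass true.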

>
> -       kunmap(vmx->nested.current_vmcs12_page);
> -       nested_release_page(vmx->nested.current_vmcs12_page);
>         vmx->nested.current_vmptr = -1ull;
> -       vmx->nested.current_vmcs12 = NULL;
>  }
>
>  /*
> @@ -7623,14 +7614,13 @@ static int handle_vmptrld(struct kvm_vcpu *vcpu)
>                 }
>
>                 nested_release_vmcs12(vmx);
> -               vmx->nested.current_vmcs12 = new_vmcs12;
> -               vmx->nested.current_vmcs12_page = page;
>                 /*
>                  * Load VMCS12 from guest memory since it is not already
>                  * cached.
>                  */
> -               memcpy(vmx->nested.cached_vmcs12,
> -                      vmx->nested.current_vmcs12, VMCS12_SIZE);
> +               memcpy(vmx->nested.cached_vmcs12, new_vmcs12, VMCS12_SIZE);
> +               kunmap(page);

+ nested_release_page_clean(page);
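
The reason for the extra call: the page was obtained with a reference and
then mapped, and kunmap() only undoes the mapping. Now that the page is no
longer stored in current_vmcs12_page for a later release, the reference has
to be dropped here. A toy model of that balance, with hypothetical names
(struct toy_page and the toy_* helpers are not kernel APIs):

```c
#include <assert.h>

/* Hypothetical page with separate reference and mapping counters. */
struct toy_page {
	int refcount;	/* references held on the page */
	int mapcount;	/* active temporary mappings */
};

static void toy_get_page(struct toy_page *p) { p->refcount++; }
static void toy_kmap(struct toy_page *p) { p->mapcount++; }

static void toy_kunmap(struct toy_page *p)
{
	assert(p->mapcount > 0);
	p->mapcount--;		/* drops the mapping, not the reference */
}

static void toy_release_page_clean(struct toy_page *p)
{
	assert(p->refcount > 0);
	p->refcount--;		/* drops the reference taken by toy_get_page() */
}

/* Load path: get + map, copy, unmap, and optionally release. */
static void toy_load(struct toy_page *p, int release)
{
	toy_get_page(p);
	toy_kmap(p);
	/* ... copy the VMCS12 out of the mapping here ... */
	toy_kunmap(p);
	if (release)
		toy_release_page_clean(p);
}
```

Calling toy_load(&page, 0) leaves refcount one higher on every call, which
is the leak; toy_load(&page, 1) leaves both counters where they started.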

> +
>                 set_current_vmptr(vmx, vmptr);
>         }
>
> @@ -9354,7 +9344,6 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id)
>
>         vmx->nested.posted_intr_nv = -1;
>         vmx->nested.current_vmptr = -1ull;
> -       vmx->nested.current_vmcs12 = NULL;
>
>         vmx->msr_ia32_feature_control_valid_bits = FEATURE_CONTROL_LOCKED;
>
> --
> 1.8.3.1
>

David Hildenbrand July 27, 2017, 5:54 p.m. UTC | #2
On 27.07.2017 15:54, Paolo Bonzini wrote:
> Since the current implementation of VMCS12 does a memcpy in and out
> of guest memory, we do not need current_vmcs12 and current_vmcs12_page
> anymore.  current_vmptr is enough to read and write the VMCS12.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

This looks like the right thing to do!

(and as mentioned, also properly marks the page as dirty)

Reviewed-by: David Hildenbrand <david@redhat.com>

Wanpeng Li July 28, 2017, 1:28 a.m. UTC | #3
2017-07-28 1:20 GMT+08:00 David Matlack <dmatlack@google.com>:
> On Thu, Jul 27, 2017 at 6:54 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>> Since the current implementation of VMCS12 does a memcpy in and out
>> of guest memory, we do not need current_vmcs12 and current_vmcs12_page
>> anymore.  current_vmptr is enough to read and write the VMCS12.
>
> This patch also fixes dirty tracking (memslot->dirty_bitmap) of the
> VMCS12 page by using kvm_write_guest. nested_release_page() only marks
> the struct page dirty.
>
>>
>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>> ---
>>  arch/x86/kvm/vmx.c | 23 ++++++-----------------
>>  1 file changed, 6 insertions(+), 17 deletions(-)
>>
>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> index b37161808352..142f16ebdca2 100644
>> --- a/arch/x86/kvm/vmx.c
>> +++ b/arch/x86/kvm/vmx.c
>> @@ -416,9 +416,6 @@ struct nested_vmx {
>>
>>         /* The guest-physical address of the current VMCS L1 keeps for L2 */
>>         gpa_t current_vmptr;
>> -       /* The host-usable pointer to the above */
>> -       struct page *current_vmcs12_page;
>> -       struct vmcs12 *current_vmcs12;
>>         /*
>>          * Cache of the guest's VMCS, existing outside of guest memory.
>>          * Loaded from guest memory during VMPTRLD. Flushed to guest
>> @@ -7183,10 +7180,6 @@ static inline void nested_release_vmcs12(struct vcpu_vmx *vmx)
>>         if (vmx->nested.current_vmptr == -1ull)
>>                 return;
>>
>> -       /* current_vmptr and current_vmcs12 are always set/reset together */
>> -       if (WARN_ON(vmx->nested.current_vmcs12 == NULL))
>> -               return;
>> -
>>         if (enable_shadow_vmcs) {
>>                 /* copy to memory all shadowed fields in case
>>                    they were modified */
>> @@ -7199,13 +7192,11 @@ static inline void nested_release_vmcs12(struct vcpu_vmx *vmx)
>>         vmx->nested.posted_intr_nv = -1;
>>
>>         /* Flush VMCS12 to guest memory */
>> -       memcpy(vmx->nested.current_vmcs12, vmx->nested.cached_vmcs12,
>> -              VMCS12_SIZE);
>> +       kvm_vcpu_write_guest_page(&vmx->vcpu,
>> +                                 vmx->nested.current_vmptr >> PAGE_SHIFT,
>> +                                 vmx->nested.cached_vmcs12, 0, VMCS12_SIZE);
>
> Have you hit any "suspicious RCU usage" error messages during VM

Yeah, I observe this splat when testing Paolo's patch today.

[87214.855344] =============================
[87214.855346] WARNING: suspicious RCU usage
[87214.855348] 4.13.0-rc2+ #2 Tainted: G           OE
[87214.855350] -----------------------------
[87214.855352] ./include/linux/kvm_host.h:573 suspicious rcu_dereference_check() usage!
[87214.855353]
other info that might help us debug this:

[87214.855355]
rcu_scheduler_active = 2, debug_locks = 1
[87214.855357] 1 lock held by qemu-system-x86/17059:
[87214.855359]  #0:  (&vcpu->mutex){+.+.+.}, at: [<ffffffffc051bb12>] vcpu_load+0x22/0x80 [kvm]
[87214.855396]
stack backtrace:
[87214.855399] CPU: 3 PID: 17059 Comm: qemu-system-x86 Tainted: G           OE   4.13.0-rc2+ #2
[87214.855401] Hardware name: LENOVO ThinkCentre M8500t-N000/SHARKBAY, BIOS FBKTC1AUS 02/16/2016
[87214.855403] Call Trace:
[87214.855408]  dump_stack+0x99/0xce
[87214.855413]  lockdep_rcu_suspicious+0xc5/0x100
[87214.855423]  kvm_vcpu_gfn_to_memslot+0x166/0x180 [kvm]
[87214.855432]  kvm_vcpu_write_guest_page+0x24/0x50 [kvm]
[87214.855438]  free_nested.part.76+0x76/0x270 [kvm_intel]
[87214.855443]  vmx_free_vcpu+0x7a/0xc0 [kvm_intel]
[87214.855454]  kvm_arch_destroy_vm+0x104/0x1d0 [kvm]
[87214.855463]  kvm_put_kvm+0x17a/0x2b0 [kvm]
[87214.855473]  kvm_vm_release+0x21/0x30 [kvm]
[87214.855477]  __fput+0xfb/0x240
[87214.855482]  ____fput+0xe/0x10
[87214.855485]  task_work_run+0x7e/0xb0
[87214.855490]  do_exit+0x323/0xcf0
[87214.855494]  ? get_signal+0x318/0x930
[87214.855498]  ? _raw_spin_unlock_irq+0x2c/0x60
[87214.855503]  do_group_exit+0x50/0xd0
[87214.855507]  get_signal+0x24f/0x930
[87214.855514]  do_signal+0x37/0x750
[87214.855518]  ? __might_fault+0x3e/0x90
[87214.855523]  ? __might_fault+0x85/0x90
[87214.855527]  ? exit_to_usermode_loop+0x2b/0x100
[87214.855531]  ? __this_cpu_preempt_check+0x13/0x20
[87214.855535]  exit_to_usermode_loop+0xab/0x100
[87214.855539]  syscall_return_slowpath+0x153/0x160
[87214.855542]  entry_SYSCALL_64_fastpath+0xc0/0xc2
[87214.855545] RIP: 0033:0x7ff40d24a26d


Regards,
Wanpeng Li

> teardown with this patch? We did when we replaced memcpy with
> kvm_write_guest a while back. IIRC it was due to kvm->srcu not being
> held in one of the teardown paths. kvm_write_guest() expects it to be
> held in order to access memslots.
>
> We fixed this by skipping the VMCS12 flush during VMXOFF. I'll send
> that patch along with a few other nVMX dirty tracking related patches
> I've been meaning to get upstreamed.
>
>>
>> -       kunmap(vmx->nested.current_vmcs12_page);
>> -       nested_release_page(vmx->nested.current_vmcs12_page);
>>         vmx->nested.current_vmptr = -1ull;
>> -       vmx->nested.current_vmcs12 = NULL;
>>  }
>>
>>  /*
>> @@ -7623,14 +7614,13 @@ static int handle_vmptrld(struct kvm_vcpu *vcpu)
>>                 }
>>
>>                 nested_release_vmcs12(vmx);
>> -               vmx->nested.current_vmcs12 = new_vmcs12;
>> -               vmx->nested.current_vmcs12_page = page;
>>                 /*
>>                  * Load VMCS12 from guest memory since it is not already
>>                  * cached.
>>                  */
>> -               memcpy(vmx->nested.cached_vmcs12,
>> -                      vmx->nested.current_vmcs12, VMCS12_SIZE);
>> +               memcpy(vmx->nested.cached_vmcs12, new_vmcs12, VMCS12_SIZE);
>> +               kunmap(page);
>
> + nested_release_page_clean(page);
>
>> +
>>                 set_current_vmptr(vmx, vmptr);
>>         }
>>
>> @@ -9354,7 +9344,6 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id)
>>
>>         vmx->nested.posted_intr_nv = -1;
>>         vmx->nested.current_vmptr = -1ull;
>> -       vmx->nested.current_vmcs12 = NULL;
>>
>>         vmx->msr_ia32_feature_control_valid_bits = FEATURE_CONTROL_LOCKED;
>>
>> --
>> 1.8.3.1
>>

Radim Krčmář Aug. 2, 2017, 8:36 p.m. UTC | #4
2017-07-27 10:20-0700, David Matlack:
> On Thu, Jul 27, 2017 at 6:54 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> > Since the current implementation of VMCS12 does a memcpy in and out
> > of guest memory, we do not need current_vmcs12 and current_vmcs12_page
> > anymore.  current_vmptr is enough to read and write the VMCS12.
> 
> This patch also fixes dirty tracking (memslot->dirty_bitmap) of the
> VMCS12 page by using kvm_write_guest. nested_release_page() only marks
> the struct page dirty.
> 
> >
> > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> > ---
> > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> > @@ -7623,14 +7614,13 @@ static int handle_vmptrld(struct kvm_vcpu *vcpu)
> >                 }
> >
> >                 nested_release_vmcs12(vmx);
> > -               vmx->nested.current_vmcs12 = new_vmcs12;
> > -               vmx->nested.current_vmcs12_page = page;
> >                 /*
> >                  * Load VMCS12 from guest memory since it is not already
> >                  * cached.
> >                  */
> > -               memcpy(vmx->nested.cached_vmcs12,
> > -                      vmx->nested.current_vmcs12, VMCS12_SIZE);
> > +               memcpy(vmx->nested.cached_vmcs12, new_vmcs12, VMCS12_SIZE);
> > +               kunmap(page);
> 
> + nested_release_page_clean(page);

Added this and your note about the dirty bit when applying,

thanks.

Patch

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index b37161808352..142f16ebdca2 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -416,9 +416,6 @@  struct nested_vmx {
 
 	/* The guest-physical address of the current VMCS L1 keeps for L2 */
 	gpa_t current_vmptr;
-	/* The host-usable pointer to the above */
-	struct page *current_vmcs12_page;
-	struct vmcs12 *current_vmcs12;
 	/*
 	 * Cache of the guest's VMCS, existing outside of guest memory.
 	 * Loaded from guest memory during VMPTRLD. Flushed to guest
@@ -7183,10 +7180,6 @@  static inline void nested_release_vmcs12(struct vcpu_vmx *vmx)
 	if (vmx->nested.current_vmptr == -1ull)
 		return;
 
-	/* current_vmptr and current_vmcs12 are always set/reset together */
-	if (WARN_ON(vmx->nested.current_vmcs12 == NULL))
-		return;
-
 	if (enable_shadow_vmcs) {
 		/* copy to memory all shadowed fields in case
 		   they were modified */
@@ -7199,13 +7192,11 @@  static inline void nested_release_vmcs12(struct vcpu_vmx *vmx)
 	vmx->nested.posted_intr_nv = -1;
 
 	/* Flush VMCS12 to guest memory */
-	memcpy(vmx->nested.current_vmcs12, vmx->nested.cached_vmcs12,
-	       VMCS12_SIZE);
+	kvm_vcpu_write_guest_page(&vmx->vcpu,
+				  vmx->nested.current_vmptr >> PAGE_SHIFT,
+				  vmx->nested.cached_vmcs12, 0, VMCS12_SIZE);
 
-	kunmap(vmx->nested.current_vmcs12_page);
-	nested_release_page(vmx->nested.current_vmcs12_page);
 	vmx->nested.current_vmptr = -1ull;
-	vmx->nested.current_vmcs12 = NULL;
 }
 
 /*
@@ -7623,14 +7614,13 @@  static int handle_vmptrld(struct kvm_vcpu *vcpu)
 		}
 
 		nested_release_vmcs12(vmx);
-		vmx->nested.current_vmcs12 = new_vmcs12;
-		vmx->nested.current_vmcs12_page = page;
 		/*
 		 * Load VMCS12 from guest memory since it is not already
 		 * cached.
 		 */
-		memcpy(vmx->nested.cached_vmcs12,
-		       vmx->nested.current_vmcs12, VMCS12_SIZE);
+		memcpy(vmx->nested.cached_vmcs12, new_vmcs12, VMCS12_SIZE);
+		kunmap(page);
+
 		set_current_vmptr(vmx, vmptr);
 	}
 
@@ -9354,7 +9344,6 @@  static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id)
 
 	vmx->nested.posted_intr_nv = -1;
 	vmx->nested.current_vmptr = -1ull;
-	vmx->nested.current_vmcs12 = NULL;
 
 	vmx->msr_ia32_feature_control_valid_bits = FEATURE_CONTROL_LOCKED;