kvm: call kvm_arch_destroy_vm if vm creation fails

Message ID 20191023171435.46287-1-jmattson@google.com (mailing list archive)
State New, archived

Commit Message

Jim Mattson Oct. 23, 2019, 5:14 p.m. UTC
From: John Sperbeck <jsperbeck@google.com>

In kvm_create_vm(), if we've successfully called kvm_arch_init_vm(), but
then fail later in the function, we need to call kvm_arch_destroy_vm()
so that it can do any necessary cleanup (like freeing memory).

Fixes: 44a95dae1d229a ("KVM: x86: Detect and Initialize AVIC support")
Signed-off-by: John Sperbeck <jsperbeck@google.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
---
 virt/kvm/kvm_main.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
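
For illustration only, the unwind ordering described in the commit message
can be sketched in userspace C. Every name below (struct vm, arch_init_vm(),
arch_destroy_vm(), create_vm(), live_arch_allocs) is a hypothetical stand-in
and not the kernel code; the sketch only assumes that the arch init hook
allocates something that the arch destroy hook must free.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct kvm; not the kernel structure. */
struct vm {
	void *arch_state;	/* allocated by arch_init_vm() */
};

static int live_arch_allocs;	/* outstanding arch allocations */

static int arch_init_vm(struct vm *vm)
{
	vm->arch_state = malloc(64);
	if (!vm->arch_state)
		return -1;
	live_arch_allocs++;
	return 0;
}

static void arch_destroy_vm(struct vm *vm)
{
	free(vm->arch_state);
	vm->arch_state = NULL;
	live_arch_allocs--;
}

/*
 * Returns NULL on failure. fail_late simulates an error in a step that
 * runs after arch_init_vm() has already succeeded.
 */
static struct vm *create_vm(int fail_late)
{
	struct vm *vm = calloc(1, sizeof(*vm));

	if (!vm)
		return NULL;

	if (arch_init_vm(vm))
		goto out_err_no_arch_destroy_vm;

	if (fail_late)
		goto out_err;

	return vm;

out_err:
	/* Arch init succeeded, so its cleanup has to run on this path. */
	arch_destroy_vm(vm);
out_err_no_arch_destroy_vm:
	free(vm);
	return NULL;
}

static void destroy_vm(struct vm *vm)
{
	arch_destroy_vm(vm);
	free(vm);
}
```

Without the arch_destroy_vm() call under out_err, a late failure would leave
live_arch_allocs at 1 in this sketch, which is the kind of leak the patch is
meant to close.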

Comments

Sean Christopherson Oct. 23, 2019, 6:21 p.m. UTC | #1
On Wed, Oct 23, 2019 at 10:14:35AM -0700, Jim Mattson wrote:
> From: John Sperbeck <jsperbeck@google.com>
> 
> In kvm_create_vm(), if we've successfully called kvm_arch_init_vm(), but
> then fail later in the function, we need to call kvm_arch_destroy_vm()
> so that it can do any necessary cleanup (like freeing memory).
> 
> Fixes: 44a95dae1d229a ("KVM: x86: Detect and Initialize AVIC support")
> Signed-off-by: John Sperbeck <jsperbeck@google.com>
> Signed-off-by: Jim Mattson <jmattson@google.com>
> ---
>  virt/kvm/kvm_main.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index fd68fbe0a75d2..10ac7ae03677b 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -645,7 +645,7 @@ static struct kvm *kvm_create_vm(unsigned long type)
>  
>  	r = kvm_arch_init_vm(kvm, type);
>  	if (r)
> -		goto out_err_no_disable;
> +		goto out_err_no_arch_destroy_vm;
>  
>  	r = hardware_enable_all();
>  	if (r)
> @@ -698,10 +698,12 @@ static struct kvm *kvm_create_vm(unsigned long type)
>  	hardware_disable_all();
>  out_err_no_disable:
>  	refcount_set(&kvm->users_count, 0);
> +	kvm_arch_destroy_vm(kvm);

Calling destroy_vm() after zeroing the refcount could lead to a refcount
underrun (and a WARN with CONFIG_REFCOUNT_FULL=y) if an arch were to do
kvm_put_kvm() in destroy_vm() to pair with a kvm_get_kvm() in create_vm().
I doubt any arch actually does that, but it's technically possible since
kvm_arch_init_vm() is called with users_count=1.

If we wanted to be paranoid, a follow-up patch could change refcount_set()
to WARN_ON(!refcount_dec_and_test()), e.g.:

	kvm_arch_destroy_vm(kvm);
	WARN_ON(!refcount_dec_and_test(&kvm->users_count));

>  	for (i = 0; i < KVM_NR_BUSES; i++)
>  		kfree(kvm_get_bus(kvm, i));
>  	for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++)
>  		kvm_free_memslots(kvm, __kvm_memslots(kvm, i));
> +out_err_no_arch_destroy_vm:
>  	kvm_arch_free_vm(kvm);
>  	mmdrop(current->mm);
>  	return ERR_PTR(r);
> -- 
> 2.23.0.866.gb869b98d4c-goog
>
Junaid Shahid Oct. 24, 2019, 2:59 a.m. UTC | #2
On 10/23/19 11:21 AM, Sean Christopherson wrote:
>>  out_err_no_disable:
>>  	refcount_set(&kvm->users_count, 0);
>> +	kvm_arch_destroy_vm(kvm);
> 
> Calling destroy_vm() after zeroing the refcount could lead to a refcount
> underrun (and a WARN with CONFIG_REFCOUNT_FULL=y) if an arch were to do
> kvm_put_kvm() in destroy_vm() to pair with a kvm_get_kvm() in create_vm().
> I doubt any arch actually does that, but it's technically possible since
> kvm_arch_init_vm() is called with users_count=1.
> 
> If we wanted to be paranoid, a follow-up patch could change refcount_set()
> to WARN_ON(!refcount_dec_and_test()), e.g.:
> 
> 	kvm_arch_destroy_vm(kvm);
> 	WARN_ON(!refcount_dec_and_test(&kvm->users_count));
> 

AFAICT the kvm->users_count is already 0 before kvm_arch_destroy_vm() is
called from kvm_destroy_vm() in the normal case. So there really shouldn't
be any arch that does a kvm_put_kvm() inside kvm_arch_destroy_vm(). I think
it might be better to keep the kvm_arch_destroy_vm() call after the
refcount_set() to be consistent with the normal path.
Paolo Bonzini Oct. 24, 2019, 10:08 a.m. UTC | #3
On 24/10/19 04:59, Junaid Shahid wrote:
> AFAICT the kvm->users_count is already 0 before kvm_arch_destroy_vm()
> is called from kvm_destroy_vm() in the normal case.

Yes:

        if (refcount_dec_and_test(&kvm->users_count))
                kvm_destroy_vm(kvm);

where

| int atomic_inc_and_test(atomic_t *v);
| int atomic_dec_and_test(atomic_t *v);
|
| These two routines increment and decrement by 1, respectively, the
| given atomic counter.  They return a boolean indicating whether the
| resulting counter value was zero or not.

> So there really
> shouldn't be any arch that does a kvm_put_kvm() inside
> kvm_arch_destroy_vm(). I think it might be better to keep the
> kvm_arch_destroy_vm() call after the refcount_set() to be consistent
> with the normal path.

I agree, so I am applying Jim's patch.  If anything, we may want to WARN
if the refcount is not 1 before the refcount_set.

Paolo
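
As an illustration of the point above, here is a userspace sketch built on
C11 atomics rather than the kernel's refcount_t. The names struct vm,
dec_and_test(), put_vm() and count_seen_by_arch_destroy are hypothetical.
It only shows that, when teardown is gated on a decrement-and-test, the
destroy hook runs with the count already at 0.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct kvm; not the kernel structure. */
struct vm {
	atomic_int users_count;
	int count_seen_by_arch_destroy;	/* recorded for illustration only */
	bool destroyed;
};

/* Userspace analogue of refcount_dec_and_test(): true if the new value is 0. */
static bool dec_and_test(atomic_int *v)
{
	/* atomic_fetch_sub() returns the value held before the subtraction. */
	return atomic_fetch_sub(v, 1) == 1;
}

static void arch_destroy_vm(struct vm *vm)
{
	vm->count_seen_by_arch_destroy = atomic_load(&vm->users_count);
}

static void destroy_vm(struct vm *vm)
{
	arch_destroy_vm(vm);
	vm->destroyed = true;
}

/* Analogue of kvm_put_kvm(): the last put triggers teardown. */
static void put_vm(struct vm *vm)
{
	if (dec_and_test(&vm->users_count))
		destroy_vm(vm);
}
```

In this sketch a put issued from inside arch_destroy_vm() would push the
count below zero, which is why keeping the error path's ordering the same as
the normal path is the simpler rule.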
Sean Christopherson Oct. 24, 2019, 6:14 p.m. UTC | #4
On Thu, Oct 24, 2019 at 12:08:29PM +0200, Paolo Bonzini wrote:
> On 24/10/19 04:59, Junaid Shahid wrote:
> > AFAICT the kvm->users_count is already 0 before kvm_arch_destroy_vm()
> > is called from kvm_destroy_vm() in the normal case.
> 
> Yes:
> 
>         if (refcount_dec_and_test(&kvm->users_count))
>                 kvm_destroy_vm(kvm);
> 
> where
> 
> | int atomic_inc_and_test(atomic_t *v);
> | int atomic_dec_and_test(atomic_t *v);
> |
> | These two routines increment and decrement by 1, respectively, the
> | given atomic counter.  They return a boolean indicating whether the
> | resulting counter value was zero or not.
> 
> > So there really
> > shouldn't be any arch that does a kvm_put_kvm() inside
> > kvm_arch_destroy_vm(). I think it might be better to keep the
> > kvm_arch_destroy_vm() call after the refcount_set() to be consistent
> > with the normal path.
> 
> I agree, so I am applying Jim's patch.

Junaid also pointed out that x86 will dereference a NULL kvm->memslots[].

> If anything, we may want to WARN if the refcount is not 1 before the
> refcount_set.

What about moving "refcount_set(&kvm->users_count, 1)" to right before the
VM is added to vm_list, i.e. after arch code and init'ing the mmu_notifier?
Along with a comment explaining that kvm_get_kvm() is illegal while the VM
is being created.

That'd eliminate the refcount_set() in the error path, which is confusing,
at least for me.  It'd also obviate the need for an explicit WARN since
running with refcount debugging would immediately flag any arch that
tried to use kvm_get_kvm() during kvm_arch_init_vm().

Moving the refcount_set() could be done along with rearranging the memslots
and buses allocation/cleanup in a preparatory patch before adding the call
to kvm_arch_destroy_vm().
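
A rough userspace sketch of the reordering proposed above, for illustration
only. The names struct vm, get_vm(), arch_init_vm(), create_vm() and the
bad_get flag are hypothetical; the flag stands in for the warning that
refcount debugging would raise on an increment from 0. The code is
single-threaded and makes no attempt to be race-free.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct kvm; not the kernel structure. */
struct vm {
	atomic_int users_count;	/* 0 while the VM is under construction */
	bool bad_get;		/* stands in for a refcount debugging WARN */
};

/* Analogue of kvm_get_kvm(): taking a reference on a zero count is a bug. */
static void get_vm(struct vm *vm)
{
	if (atomic_load(&vm->users_count) == 0) {
		vm->bad_get = true;
		return;
	}
	atomic_fetch_add(&vm->users_count, 1);
}

/* Hypothetical arch hook; a misbehaving one takes a reference too early. */
static int arch_init_vm(struct vm *vm, bool takes_ref)
{
	if (takes_ref)
		get_vm(vm);
	return 0;
}

static int create_vm(struct vm *vm, bool arch_takes_ref)
{
	atomic_init(&vm->users_count, 0);
	vm->bad_get = false;

	if (arch_init_vm(vm, arch_takes_ref))
		return -1;	/* nothing to reset: the count was never raised */

	/* Publish point: only now does the VM get its first reference. */
	atomic_store(&vm->users_count, 1);
	return 0;
}
```

With the count held at 0 until the publish point, the error path in this
sketch has no count to reset, and an early get_vm() is caught without a
dedicated check in create_vm().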
Paolo Bonzini Oct. 24, 2019, 6:55 p.m. UTC | #5
On 24/10/19 20:14, Sean Christopherson wrote:
> On Thu, Oct 24, 2019 at 12:08:29PM +0200, Paolo Bonzini wrote:
>> On 24/10/19 04:59, Junaid Shahid wrote:
>>> AFAICT the kvm->users_count is already 0 before kvm_arch_destroy_vm()
>>> is called from kvm_destroy_vm() in the normal case.
>>
>> Yes:
>>
>>         if (refcount_dec_and_test(&kvm->users_count))
>>                 kvm_destroy_vm(kvm);
>>
>> where
>>
>> | int atomic_inc_and_test(atomic_t *v);
>> | int atomic_dec_and_test(atomic_t *v);
>> |
>> | These two routines increment and decrement by 1, respectively, the
>> | given atomic counter.  They return a boolean indicating whether the
>> | resulting counter value was zero or not.
>>
>>> So there really
>>> shouldn't be any arch that does a kvm_put_kvm() inside
>>> kvm_arch_destroy_vm(). I think it might be better to keep the
>>> kvm_arch_destroy_vm() call after the refcount_set() to be consistent
>>> with the normal path.
>>
>> I agree, so I am applying Jim's patch.
> 
> Junaid also pointed out that x86 will dereference a NULL kvm->memslots[].
> 
>> If anything, we may want to WARN if the refcount is not 1 before the
>> refcount_set.
> 
> What about moving "refcount_set(&kvm->users_count, 1)" to right before the
> VM is added to vm_list, i.e. after arch code and init'ing the mmu_notifier?
> Along with a comment explaining that kvm_get_kvm() is illegal while the VM
> is being created.
> 
> That'd eliminate the refcount_set() in the error path, which is confusing,
> at least for me.  It'd also obviate the need for an explicit WARN since
> running with refcount debugging would immediately flag any arch that
> tried to use kvm_get_kvm() during kvm_arch_init_vm().
> 
> Moving the refcount_set() could be done along with rearranging the memslots
> and buses allocation/cleanup in a preparatory patch before adding the call
> to kvm_arch_destroy_vm().

Sounds good.

Paolo
Patch

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index fd68fbe0a75d2..10ac7ae03677b 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -645,7 +645,7 @@  static struct kvm *kvm_create_vm(unsigned long type)
 
 	r = kvm_arch_init_vm(kvm, type);
 	if (r)
-		goto out_err_no_disable;
+		goto out_err_no_arch_destroy_vm;
 
 	r = hardware_enable_all();
 	if (r)
@@ -698,10 +698,12 @@  static struct kvm *kvm_create_vm(unsigned long type)
 	hardware_disable_all();
 out_err_no_disable:
 	refcount_set(&kvm->users_count, 0);
+	kvm_arch_destroy_vm(kvm);
 	for (i = 0; i < KVM_NR_BUSES; i++)
 		kfree(kvm_get_bus(kvm, i));
 	for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++)
 		kvm_free_memslots(kvm, __kvm_memslots(kvm, i));
+out_err_no_arch_destroy_vm:
 	kvm_arch_free_vm(kvm);
 	mmdrop(current->mm);
 	return ERR_PTR(r);