Message ID | 20191023171435.46287-1-jmattson@google.com (mailing list archive) |
---|---|
State | New, archived |
Series | kvm: call kvm_arch_destroy_vm if vm creation fails |
On Wed, Oct 23, 2019 at 10:14:35AM -0700, Jim Mattson wrote:
> From: John Sperbeck <jsperbeck@google.com>
>
> In kvm_create_vm(), if we've successfully called kvm_arch_init_vm(), but
> then fail later in the function, we need to call kvm_arch_destroy_vm()
> so that it can do any necessary cleanup (like freeing memory).
>
> Fixes: 44a95dae1d229a ("KVM: x86: Detect and Initialize AVIC support")
> Signed-off-by: John Sperbeck <jsperbeck@google.com>
> Signed-off-by: Jim Mattson <jmattson@google.com>
> ---
>  virt/kvm/kvm_main.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index fd68fbe0a75d2..10ac7ae03677b 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -645,7 +645,7 @@ static struct kvm *kvm_create_vm(unsigned long type)
>
>  	r = kvm_arch_init_vm(kvm, type);
>  	if (r)
> -		goto out_err_no_disable;
> +		goto out_err_no_arch_destroy_vm;
>
>  	r = hardware_enable_all();
>  	if (r)
> @@ -698,10 +698,12 @@ static struct kvm *kvm_create_vm(unsigned long type)
>  	hardware_disable_all();
>  out_err_no_disable:
>  	refcount_set(&kvm->users_count, 0);
> +	kvm_arch_destroy_vm(kvm);

Calling destroy_vm() after zeroing the refcount could lead to a refcount
underrun (and a WARN with CONFIG_REFCOUNT_FULL=y) if an arch were to do
kvm_put_kvm() in destroy_vm() to pair with a kvm_get_kvm() in create_vm().
I doubt any arch actually does that, but it's technically possible since
kvm_arch_init_vm() is called with users_count=1.

If we wanted to be paranoid, a follow-up patch could change refcount_set()
to WARN_ON(!refcount_dec_and_test()), e.g.:

	kvm_arch_destroy_vm(kvm);
	WARN_ON(!refcount_dec_and_test(&kvm->users_count));

>  	for (i = 0; i < KVM_NR_BUSES; i++)
>  		kfree(kvm_get_bus(kvm, i));
>  	for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++)
>  		kvm_free_memslots(kvm, __kvm_memslots(kvm, i));
> +out_err_no_arch_destroy_vm:
>  	kvm_arch_free_vm(kvm);
>  	mmdrop(current->mm);
>  	return ERR_PTR(r);
> --
> 2.23.0.866.gb869b98d4c-goog
>
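The underrun concern can be illustrated with a small userspace model. This is only a sketch under stated assumptions: `toy_refcount`, `toy_set()`, `toy_dec_and_test()` and `toy_arch_destroy()` are hypothetical stand-ins, not the kernel's refcount_t API, and the arch hook that drops a reference is exactly the case described above as unlikely to exist in any real arch.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for refcount_t; not the kernel implementation. */
struct toy_refcount {
	int count;
	bool underflowed;	/* set where a checked refcount would WARN */
};

/* Models refcount_set(). */
static void toy_set(struct toy_refcount *r, int n)
{
	r->count = n;
}

/*
 * Models a put: decrement and report whether the count reached zero.
 * A decrement of a count that is already zero is flagged as an
 * underflow and the count is left unchanged.
 */
static bool toy_dec_and_test(struct toy_refcount *r)
{
	if (r->count <= 0) {
		r->underflowed = true;
		return false;
	}
	return --r->count == 0;
}

/* Hypothetical arch hook that drops a reference it took during init. */
static void toy_arch_destroy(struct toy_refcount *r)
{
	toy_dec_and_test(r);
}

/* Error path as patched: zero the count, then call the arch hook. */
static bool error_path_set_then_destroy(void)
{
	/* 2 = creator's reference + the hypothetical arch reference */
	struct toy_refcount r = { .count = 2, .underflowed = false };

	toy_set(&r, 0);
	toy_arch_destroy(&r);
	return r.underflowed;
}

/* Suggested ordering: arch hook first, then drop the creator's reference. */
static bool error_path_destroy_then_put(void)
{
	struct toy_refcount r = { .count = 2, .underflowed = false };
	bool last;

	toy_arch_destroy(&r);
	last = toy_dec_and_test(&r);
	/* Report a problem if we underflowed or were not the last user. */
	return r.underflowed || !last;
}
```

In this model the first ordering trips the underflow flag, while the second drops both references cleanly. Whether the second ordering is preferable in the kernel depends on whether any arch really takes such a reference.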
On 10/23/19 11:21 AM, Sean Christopherson wrote:
>>  out_err_no_disable:
>>  	refcount_set(&kvm->users_count, 0);
>> +	kvm_arch_destroy_vm(kvm);
>
> Calling destroy_vm() after zeroing the refcount could lead to a refcount
> underrun (and a WARN with CONFIG_REFCOUNT_FULL=y) if an arch were to do
> kvm_put_kvm() in destroy_vm() to pair with a kvm_get_kvm() in create_vm().
> I doubt any arch actually does that, but it's technically possible since
> kvm_arch_init_vm() is called with users_count=1.
>
> If we wanted to be paranoid, a follow-up patch could change refcount_set()
> to WARN_ON(!refcount_dec_and_test()), e.g.:
>
> 	kvm_arch_destroy_vm(kvm);
> 	WARN_ON(!refcount_dec_and_test(&kvm->users_count));
>

AFAICT the kvm->users_count is already 0 before kvm_arch_destroy_vm() is
called from kvm_destroy_vm() in the normal case. So there really shouldn't
be any arch that does a kvm_put_kvm() inside kvm_arch_destroy_vm(). I think
it might be better to keep the kvm_arch_destroy_vm() call after the
refcount_set() to be consistent with the normal path.
On 24/10/19 04:59, Junaid Shahid wrote:
> AFAICT the kvm->users_count is already 0 before kvm_arch_destroy_vm()
> is called from kvm_destroy_vm() in the normal case.

Yes:

	if (refcount_dec_and_test(&kvm->users_count))
		kvm_destroy_vm(kvm);

where

| int atomic_inc_and_test(atomic_t *v);
| int atomic_dec_and_test(atomic_t *v);
|
| These two routines increment and decrement by 1, respectively, the
| given atomic counter.  They return a boolean indicating whether the
| resulting counter value was zero or not.

> So there really
> shouldn't be any arch that does a kvm_put_kvm() inside
> kvm_arch_destroy_vm(). I think it might be better to keep the
> kvm_arch_destroy_vm() call after the refcount_set() to be consistent
> with the normal path.

I agree, so I am applying Jim's patch.

If anything, we may want to WARN if the refcount is not 1 before the
refcount_set.

Paolo
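A minimal userspace sketch of the normal teardown path, assuming C11 atomics as a stand-in for the kernel primitives. The names `model_vm`, `model_dec_and_test()`, `model_put_vm()` and `model_destroy_vm()` are hypothetical and only mirror the shape of the kvm_put_kvm() snippet quoted in this message. It shows that the destroy callback only ever runs once the count has already reached zero.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of a VM with a user count. */
struct model_vm {
	atomic_int users_count;
	int count_seen_by_destroy;	/* count observed inside destroy */
};

/*
 * Decrement by one and return true if the result is zero.
 * atomic_fetch_sub() returns the value held before the subtraction,
 * so a previous value of 1 means the counter is now 0.
 */
static bool model_dec_and_test(atomic_int *v)
{
	return atomic_fetch_sub(v, 1) == 1;
}

/* Stand-in for the destroy path; records the count it was entered with. */
static void model_destroy_vm(struct model_vm *vm)
{
	vm->count_seen_by_destroy = atomic_load(&vm->users_count);
}

/* Same shape as the put path: destroy only when the last user leaves. */
static void model_put_vm(struct model_vm *vm)
{
	if (model_dec_and_test(&vm->users_count))
		model_destroy_vm(vm);
}

/* Drop every reference and return the count that destroy observed. */
static int count_seen_after_last_put(int initial_refs)
{
	struct model_vm vm;
	int i;

	atomic_init(&vm.users_count, initial_refs);
	vm.count_seen_by_destroy = -1;

	for (i = 0; i < initial_refs; i++)
		model_put_vm(&vm);

	return vm.count_seen_by_destroy;
}
```

Regardless of how many references were held, the model's destroy function sees a count of zero, which matches the argument that a put inside the arch destroy hook would have nothing left to drop.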
On Thu, Oct 24, 2019 at 12:08:29PM +0200, Paolo Bonzini wrote:
> On 24/10/19 04:59, Junaid Shahid wrote:
> > AFAICT the kvm->users_count is already 0 before kvm_arch_destroy_vm()
> > is called from kvm_destroy_vm() in the normal case.
>
> Yes:
>
> 	if (refcount_dec_and_test(&kvm->users_count))
> 		kvm_destroy_vm(kvm);
>
> where
>
> | int atomic_inc_and_test(atomic_t *v);
> | int atomic_dec_and_test(atomic_t *v);
> |
> | These two routines increment and decrement by 1, respectively, the
> | given atomic counter.  They return a boolean indicating whether the
> | resulting counter value was zero or not.
>
> > So there really
> > shouldn't be any arch that does a kvm_put_kvm() inside
> > kvm_arch_destroy_vm(). I think it might be better to keep the
> > kvm_arch_destroy_vm() call after the refcount_set() to be consistent
> > with the normal path.
>
> I agree, so I am applying Jim's patch.

Junaid also pointed out that x86 will dereference a NULL kvm->memslots[].

> If anything, we may want to WARN if the refcount is not 1 before the
> refcount_set.

What about moving "refcount_set(&kvm->users_count, 1)" to right before the
VM is added to vm_list, i.e. after arch code and init'ing the mmu_notifier?
Along with a comment explaining that kvm_get_kvm() is illegal while the VM
is being created.

That'd eliminate the refcount_set() in the error path, which is confusing,
at least for me. It'd also obviate the need for an explicit WARN since
running with refcount debugging would immediately flag any arch that
tried to use kvm_get_kvm() during kvm_arch_init_vm().

Moving the refcount_set() could be done along with rearranging the memslots
and buses allocation/cleanup in a preparatory patch before adding the call
to kvm_arch_destroy_vm().
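The proposed reordering can be sketched in userspace as follows. This is an illustration under assumptions, not the actual follow-up patch: `toy_vm`, `toy_get()` and the two `arch_init_*` callbacks are hypothetical, and the "flag an increment from zero" behaviour stands in for what a checked refcount is expected to report.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a VM under construction. */
struct toy_vm {
	int users_count;
	bool warned;	/* set where refcount debugging would complain */
};

/* Models a get: taking a reference on a zero count is flagged. */
static void toy_get(struct toy_vm *vm)
{
	if (vm->users_count == 0) {
		vm->warned = true;
		return;
	}
	vm->users_count++;
}

typedef void (*toy_arch_init_fn)(struct toy_vm *vm);

/* Well-behaved arch init: takes no reference during creation. */
static void arch_init_plain(struct toy_vm *vm)
{
	(void)vm;
}

/* Misbehaving arch init: tries to take a reference during creation. */
static void arch_init_takes_ref(struct toy_vm *vm)
{
	toy_get(vm);
}

/*
 * Proposed ordering: arch init runs while the count is still 0, and the
 * count is set to 1 only right before the VM would be published.
 * Returns true if the arch code was caught taking a reference early.
 */
static bool create_with_late_refcount(toy_arch_init_fn arch_init)
{
	struct toy_vm vm = { .users_count = 0, .warned = false };

	arch_init(&vm);
	vm.users_count = 1;	/* right before the VM would join vm_list */
	return vm.warned;
}
```

With the count left at zero during arch init, an early get is caught immediately in this model, and the error path no longer needs to reset the count.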
On 24/10/19 20:14, Sean Christopherson wrote:
> On Thu, Oct 24, 2019 at 12:08:29PM +0200, Paolo Bonzini wrote:
>> On 24/10/19 04:59, Junaid Shahid wrote:
>>> AFAICT the kvm->users_count is already 0 before kvm_arch_destroy_vm()
>>> is called from kvm_destroy_vm() in the normal case.
>>
>> Yes:
>>
>> 	if (refcount_dec_and_test(&kvm->users_count))
>> 		kvm_destroy_vm(kvm);
>>
>> where
>>
>> | int atomic_inc_and_test(atomic_t *v);
>> | int atomic_dec_and_test(atomic_t *v);
>> |
>> | These two routines increment and decrement by 1, respectively, the
>> | given atomic counter.  They return a boolean indicating whether the
>> | resulting counter value was zero or not.
>>
>>> So there really
>>> shouldn't be any arch that does a kvm_put_kvm() inside
>>> kvm_arch_destroy_vm(). I think it might be better to keep the
>>> kvm_arch_destroy_vm() call after the refcount_set() to be consistent
>>> with the normal path.
>>
>> I agree, so I am applying Jim's patch.
>
> Junaid also pointed out that x86 will dereference a NULL kvm->memslots[].
>
>> If anything, we may want to WARN if the refcount is not 1 before the
>> refcount_set.
>
> What about moving "refcount_set(&kvm->users_count, 1)" to right before the
> VM is added to vm_list, i.e. after arch code and init'ing the mmu_notifier?
> Along with a comment explaining that kvm_get_kvm() is illegal while the VM
> is being created.
>
> That'd eliminate the refcount_set() in the error path, which is confusing,
> at least for me. It'd also obviate the need for an explicit WARN since
> running with refcount debugging would immediately flag any arch that
> tried to use kvm_get_kvm() during kvm_arch_init_vm().
>
> Moving the refcount_set() could be done along with rearranging the memslots
> and buses allocation/cleanup in a preparatory patch before adding the call
> to kvm_arch_destroy_vm().

Sounds good.

Paolo
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index fd68fbe0a75d2..10ac7ae03677b 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -645,7 +645,7 @@ static struct kvm *kvm_create_vm(unsigned long type)
 
 	r = kvm_arch_init_vm(kvm, type);
 	if (r)
-		goto out_err_no_disable;
+		goto out_err_no_arch_destroy_vm;
 
 	r = hardware_enable_all();
 	if (r)
@@ -698,10 +698,12 @@ static struct kvm *kvm_create_vm(unsigned long type)
 	hardware_disable_all();
 out_err_no_disable:
 	refcount_set(&kvm->users_count, 0);
+	kvm_arch_destroy_vm(kvm);
 	for (i = 0; i < KVM_NR_BUSES; i++)
 		kfree(kvm_get_bus(kvm, i));
 	for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++)
 		kvm_free_memslots(kvm, __kvm_memslots(kvm, i));
+out_err_no_arch_destroy_vm:
 	kvm_arch_free_vm(kvm);
 	mmdrop(current->mm);
 	return ERR_PTR(r);