Message ID: 20220303183328.1499189-1-dmatlack@google.com
State: New, archived
On 3/3/22 19:33, David Matlack wrote:
> [Resending with --cc-cover to fix CCs.]
>
> This series fixes a long-standing theoretical bug where the KVM module
> can be unloaded while there are still references to a struct kvm. This
> can be reproduced with a simple patch [1] to run some delayed work 10
> seconds after a VM file descriptor is released.
>
> This bug was originally fixed by Ben Gardon <bgardon@google.com> in
> Google's kernel due to a race with an internal kernel daemon.
>
> KVM's async_pf code looks susceptible to this race since its inception,
> but clearly no one has noticed. Upcoming changes to the TDP MMU will
> move zapping to asynchronous workers which is much more likely to hit
> this issue. Fix it now to close the gap in async_pf and prepare for the
> TDP MMU zapping changes.
>
> While here I noticed some further cleanups that could be done in the
> async_pf code. It seems unnecessary for the async_pf code to grab a
> reference via kvm_get_kvm() because there's code to synchronously drain
> its queue of work in kvm_destroy_vm() -> ... ->
> kvm_clear_async_pf_completion_queue() (at least on x86). This is
> actually dead code because kvm_destroy_vm() will never be called while
> there are references to kvm.users_count (which the async_pf callbacks
> themselves hold). It seems we could either drop the reference grabbing
> in async_pf.c or drop the call to kvm_clear_async_pf_completion_queue().

Adding a WARN makes sense, but then you'd probably keep the call anyway
just to avoid a use-after-free in case the WARN triggers.

Queued with a note on patch 1:

+	/*
+	 * When the fd passed to this ioctl() is opened it pins the module,
+	 * but try_module_get() also prevents getting a reference if the module
+	 * is in MODULE_STATE_GOING (e.g. if someone ran "rmmod --wait").
+	 */

Thanks,

Paolo

> These patches apply on the latest kvm/queue commit b13a3befc815 ("KVM:
> selftests: Add test to populate a VM with the max possible guest mem")
> after reverting commit c9bdd0aa6800 ("KVM: allow struct kvm to outlive
> the file descriptors").
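For context, here is a minimal sketch of the module-pinning pattern the
queued comment describes, not the series' exact patch: take a module
reference on the VM creation path and drop it only once the last
kvm_put_kvm() has run, so kvm.ko cannot be unloaded while any struct kvm
is alive. It assumes kvm_chardev_ops.owner points at the KVM module, as
kvm_init() sets up; the helper names are illustrative:

	/*
	 * Sketch only: pin the module that owns /dev/kvm for the VM's
	 * lifetime. try_module_get() fails if the module is already in
	 * MODULE_STATE_GOING (e.g. "rmmod --wait"), so VM creation
	 * cannot race with module unload.
	 */
	static inline int kvm_pin_module(void)
	{
		if (!try_module_get(kvm_chardev_ops.owner))
			return -ENODEV;	/* module is on its way out */
		return 0;
	}

	/* Counterpart, called after the last reference to the VM is gone. */
	static inline void kvm_unpin_module(void)
	{
		module_put(kvm_chardev_ops.owner);
	}

The point is the pairing: the get happens when the VM is created, and
the put happens only after kvm_destroy_vm() has finished running module
code.

The reproduction patch [1] mentioned in the cover letter: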
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 9536ffa0473b..db827cf6a7a7 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -771,6 +771,8 @@ struct kvm {
 	struct notifier_block pm_notifier;
 #endif
 	char stats_id[KVM_STATS_NAME_SIZE];
+
+	struct delayed_work run_after_vm_release_work;
 };
 
 #define kvm_err(fmt, ...) \
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 64eb99444688..35ae6d32dae5 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1258,12 +1258,25 @@ void kvm_put_kvm_no_destroy(struct kvm *kvm)
 }
 EXPORT_SYMBOL_GPL(kvm_put_kvm_no_destroy);
 
+static void run_after_vm_release(struct work_struct *work)
+{
+	struct delayed_work *dwork = to_delayed_work(work);
+	struct kvm *kvm = container_of(dwork, struct kvm, run_after_vm_release_work);
+
+	pr_info("I'm still alive!\n");
+	kvm_put_kvm(kvm);
+}
+
 static int kvm_vm_release(struct inode *inode, struct file *filp)
 {
 	struct kvm *kvm = filp->private_data;
 
 	kvm_irqfd_release(kvm);
 
+	kvm_get_kvm(kvm);
+	INIT_DELAYED_WORK(&kvm->run_after_vm_release_work, run_after_vm_release);
+	schedule_delayed_work(&kvm->run_after_vm_release_work, 10 * HZ);
+
 	kvm_put_kvm(kvm);
 	return 0;
 }
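With this patch applied, kvm_vm_release() takes an extra reference with
kvm_get_kvm() and hands it to the delayed work, so the fd's own
kvm_put_kvm() no longer drops users_count to zero. kvm_destroy_vm() only
runs 10 seconds later, from the final kvm_put_kvm() in
run_after_vm_release(). Nothing pins kvm.ko during that window, so an
"rmmod kvm" in the meantime frees the module text that the pending work
and kvm_destroy_vm() itself will execute, which is exactly the bug the
series closes.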