diff mbox series

[2/3] KVM: Unconditionally get a ref to /dev/kvm module when creating a VM

Message ID 20220816053937.2477106-3-seanjc@google.com (mailing list archive)
State New, archived
Series KVM: kvm_create_vm() bug fixes and cleanup

Commit Message

Sean Christopherson Aug. 16, 2022, 5:39 a.m. UTC
Unconditionally get a reference to the /dev/kvm module when creating a VM
instead of using try_module_get(), which will fail if the module is in
the process of being forcefully unloaded.  The error handling when
try_module_get() fails doesn't properly unwind all that has been done,
e.g. doesn't call kvm_arch_pre_destroy_vm() and doesn't remove the VM
from the global list.  Not removing VMs from the global list tends to be
fatal, e.g. leads to use-after-free explosions.

The obvious alternative would be to add proper unwinding, but the
justification for using try_module_get(), "rmmod --wait", is completely
bogus as support for "rmmod --wait", i.e. delete_module() without
O_NONBLOCK, was removed by commit 3f2b9c9cdf38 ("module: remove rmmod
--wait option.") nearly a decade ago.

It's still possible for try_module_get() to fail due to the module dying
(more like being killed), as the module will be tagged MODULE_STATE_GOING
by "rmmod --force", i.e. delete_module(..., O_TRUNC), but playing nice
with forced unloading is an exercise in futility and gives a false sense
of security.  Using try_module_get() only prevents acquiring _new_
references, it doesn't magically put the references held by other VMs,
and forced unloading doesn't wait, i.e. "rmmod --force" on KVM is all but
guaranteed to cause spectacular fireworks; the window where KVM will fail
try_module_get() is tiny compared to the window where KVM is building and
running the VM with an elevated module refcount.

Addressing KVM's inability to play nice with "rmmod --force" is firmly
out-of-scope.  Forcefully unloading any module taints the kernel (for
obvious reasons) _and_ requires the kernel to be built with
CONFIG_MODULE_FORCE_UNLOAD=y, which is off by default and comes with the
amusing disclaimer that it's "mainly for kernel developers and desperate
users".  In other words, KVM is free to scoff at bug reports due to using
"rmmod --force" while VMs may be running.

Fixes: 5f6de5cbebee ("KVM: Prevent module exit until all VMs are freed")
Cc: stable@vger.kernel.org
Cc: David Matlack <dmatlack@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 virt/kvm/kvm_main.c | 14 ++++----------
 1 file changed, 4 insertions(+), 10 deletions(-)
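
The difference between the two reference-taking helpers discussed above
can be illustrated with a toy userspace model.  This is only a sketch,
not the kernel's implementation: every toy_* name is hypothetical, the
refcount is a plain int rather than an atomic, and the assertion in
toy_get() is an assumption of the model.  It shows that a forced unload
only blocks _new_ conditional references and leaves existing ones alone.

```c
/* Toy userspace model; all toy_* names are hypothetical, not kernel APIs. */
#include <assert.h>
#include <stdbool.h>

enum toy_state { TOY_LIVE, TOY_GOING };

struct toy_module {
	enum toy_state state;
	int refcnt;
};

/* Models try_module_get(): refuse new references once the module is going. */
static bool toy_try_get(struct toy_module *m)
{
	if (m->state == TOY_GOING)
		return false;
	m->refcnt++;
	return true;
}

/*
 * Models __module_get(): the caller already holds a reference (for KVM,
 * the open /dev/kvm fd), so the count is bumped unconditionally.  The
 * assertion is an assumption of this model.
 */
static void toy_get(struct toy_module *m)
{
	assert(m->refcnt > 0);
	m->refcnt++;
}

/* Models module_put(): drop one reference. */
static void toy_put(struct toy_module *m)
{
	assert(m->refcnt > 0);
	m->refcnt--;
}

/* Models a forced unload: tag the module as going without waiting for refs. */
static void toy_force_unload(struct toy_module *m)
{
	m->state = TOY_GOING;
}
```

After toy_force_unload(), toy_try_get() fails but the references that
were taken earlier are still outstanding, which is why failing the
conditional get offers no real protection against a forced unload.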

Comments

David Matlack Aug. 16, 2022, 5:01 p.m. UTC | #1
On Tue, Aug 16, 2022 at 05:39:36AM +0000, Sean Christopherson wrote:
> Unconditionally get a reference to the /dev/kvm module when creating a VM
> instead of using try_module_get(), which will fail if the module is in
> the process of being forcefully unloaded.  The error handling when
> try_module_get() fails doesn't properly unwind all that has been done,
> e.g. doesn't call kvm_arch_pre_destroy_vm() and doesn't remove the VM
> from the global list.  Not removing VMs from the global list tends to be
> fatal, e.g. leads to use-after-free explosions.
> 
> The obvious alternative would be to add proper unwinding, but the
> justification for using try_module_get(), "rmmod --wait", is completely
> bogus as support for "rmmod --wait", i.e. delete_module() without
> O_NONBLOCK, was removed by commit 3f2b9c9cdf38 ("module: remove rmmod
> --wait option.") nearly a decade ago.

Ah! include/linux/module.h may also need a cleanup then. The comment
above __module_get() explicitly mentions "rmmod --wait", which is what
led me to use try_module_get() for commit 5f6de5cbebee ("KVM: Prevent
module exit until all VMs are freed").

Sean Christopherson Aug. 16, 2022, 9:43 p.m. UTC | #2
On Tue, Aug 16, 2022, David Matlack wrote:
> On Tue, Aug 16, 2022 at 05:39:36AM +0000, Sean Christopherson wrote:
> > Unconditionally get a reference to the /dev/kvm module when creating a VM
> > instead of using try_module_get(), which will fail if the module is in
> > the process of being forcefully unloaded.  The error handling when
> > try_module_get() fails doesn't properly unwind all that has been done,
> > e.g. doesn't call kvm_arch_pre_destroy_vm() and doesn't remove the VM
> > from the global list.  Not removing VMs from the global list tends to be
> > fatal, e.g. leads to use-after-free explosions.
> > 
> > The obvious alternative would be to add proper unwinding, but the
> > justification for using try_module_get(), "rmmod --wait", is completely
> > bogus as support for "rmmod --wait", i.e. delete_module() without
> > O_NONBLOCK, was removed by commit 3f2b9c9cdf38 ("module: remove rmmod
> > --wait option.") nearly a decade ago.
> 
> Ah! include/linux/module.h may also need a cleanup then. The comment
> above __module_get() explicitly mentions "rmmod --wait", which is what
> led me to use try_module_get() for commit 5f6de5cbebee ("KVM: Prevent
> module exit until all VMs are freed").

Ugh, I didn't see that one.  The whole thing is a mess.  try_module_get() also
has a comment (just below the "rmmod --wait" comment) saying that it's the one
true way of doing things, but that's at best misleading for cases like this where
a module is taking a reference of _itself_.

The man pages are also woefully out of date :-/

Patch

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index ee5f48cc100b..15e304e059d4 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1134,6 +1134,9 @@  static struct kvm *kvm_create_vm(unsigned long type, const char *fdname)
 	if (!kvm)
 		return ERR_PTR(-ENOMEM);
 
+	/* KVM is pinned via open("/dev/kvm"), the fd passed to this ioctl(). */
+	__module_get(kvm_chardev_ops.owner);
+
 	KVM_MMU_LOCK_INIT(kvm);
 	mmgrab(current->mm);
 	kvm->mm = current->mm;
@@ -1226,16 +1229,6 @@  static struct kvm *kvm_create_vm(unsigned long type, const char *fdname)
 	preempt_notifier_inc();
 	kvm_init_pm_notifier(kvm);
 
-	/*
-	 * When the fd passed to this ioctl() is opened it pins the module,
-	 * but try_module_get() also prevents getting a reference if the module
-	 * is in MODULE_STATE_GOING (e.g. if someone ran "rmmod --wait").
-	 */
-	if (!try_module_get(kvm_chardev_ops.owner)) {
-		r = -ENODEV;
-		goto out_err;
-	}
-
 	return kvm;
 
 out_err:
@@ -1259,6 +1252,7 @@  static struct kvm *kvm_create_vm(unsigned long type, const char *fdname)
 out_err_no_srcu:
 	kvm_arch_free_vm(kvm);
 	mmdrop(current->mm);
+	module_put(kvm_chardev_ops.owner);
 	return ERR_PTR(r);
 }
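
The shape of the fix, taking the cannot-fail reference first and
dropping it at the last error label, can be sketched with a toy
userspace model.  This is a sketch under stated assumptions, not
kvm_create_vm() itself: toy_counts, toy_vm, toy_create_vm() and the
fail_step parameter are all hypothetical, and the ordering of steps is
simplified for illustration.

```c
/* Toy model of goto-based unwinding; all toy_* names are hypothetical. */
#include <assert.h>
#include <stdlib.h>

struct toy_counts {
	int module_refs;	/* stands in for the module refcount */
	int mm_refs;		/* stands in for the mmgrab() count */
	int listed_vms;		/* stands in for entries on a global VM list */
};

struct toy_vm {
	int unused;
};

/* fail_step: 0 = succeed, 1 = fail before listing, 2 = fail after listing. */
static struct toy_vm *toy_create_vm(struct toy_counts *c, int fail_step)
{
	struct toy_vm *vm = calloc(1, sizeof(*vm));

	if (!vm)
		return NULL;

	/* Take the reference that cannot fail first... */
	c->module_refs++;
	c->mm_refs++;

	if (fail_step == 1)
		goto out_err_early;

	c->listed_vms++;

	if (fail_step == 2)
		goto out_err;

	return vm;

out_err:
	c->listed_vms--;
out_err_early:
	/* ...so every error path falls through to the code that drops it. */
	c->mm_refs--;
	c->module_refs--;
	free(vm);
	return NULL;
}
```

Because the unconditional get has no failure case, it adds no new error
label, and each existing label releases it simply by falling through to
the final cleanup.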