
[7/8] KVM: gmem: Avoid race with kvm_gmem_release and mmu notifier

Message ID c3128665745b58500f71f46db6969d02cabcc8db.1692119201.git.isaku.yamahata@intel.com (mailing list archive)
State New, archived
Series KVM: gmem: Adding hooks for SEV and TDX

Commit Message

Isaku Yamahata Aug. 15, 2023, 5:18 p.m. UTC
From: Isaku Yamahata <isaku.yamahata@intel.com>

Take slots_lock around kvm_flush_shadow_all().  On process exit,
kvm_gmem_release() (via fput()) and kvm_mmu_notifier_release() (via
mmput()) can be called simultaneously, because vhost
(/dev/vhost_{net,vsock}) can defer the mmu_notifier release, i.e.
kvm_mmu_notifier_release(), to its kernel thread.  Vhost uses
get_task_mm() and mmput() so that its kernel thread can access the
process's memory, and the final mmput() can therefore happen after the
gmem file has already been closed.
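
A minimal sketch of the deferral pattern being described (hypothetical
worker code, not vhost's actual implementation):

#include <linux/kthread.h>
#include <linux/sched/mm.h>

/*
 * The worker pins the owner's mm, so the final mmput() -- and with it
 * the mmu_notifier ->release() callback -- can run on the worker thread
 * after the owning process has exited and fput() its files.
 */
static int worker_fn(void *data)
{
	struct mm_struct *mm = data;

	/* ... access the owner's memory through mm ... */

	mmput(mm);	/* possibly drops the final mm reference */
	return 0;
}

static void start_worker(void)
{
	struct mm_struct *mm = get_task_mm(current);	/* pin the mm */

	if (mm)
		kthread_run(worker_fn, mm, "sketch-worker");
}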

As a result, kvm_flush_shadow_all() and kvm_gmem_release() can run
concurrently.  With TDX, the HKID release in kvm_flush_shadow_all() can
then race with the private memory release in kvm_gmem_release().  Take
slots_lock in kvm_mmu_notifier_release() to serialize the two.

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
---
 virt/kvm/kvm_main.c | 7 +++++++
 1 file changed, 7 insertions(+)

Comments

Sean Christopherson Aug. 18, 2023, 6:15 p.m. UTC | #1
On Tue, Aug 15, 2023, isaku.yamahata@intel.com wrote:
> From: Isaku Yamahata <isaku.yamahata@intel.com>
> 
> Take slots_lock around kvm_flush_shadow_all().  On process exit,
> kvm_gmem_release() (via fput()) and kvm_mmu_notifier_release() (via
> mmput()) can be called simultaneously, because vhost
> (/dev/vhost_{net,vsock}) can defer the mmu_notifier release, i.e.
> kvm_mmu_notifier_release(), to its kernel thread.  Vhost uses
> get_task_mm() and mmput() so that its kernel thread can access the
> process's memory, and the final mmput() can therefore happen after the
> gmem file has already been closed.
> 
> As a result, kvm_flush_shadow_all() and kvm_gmem_release() can run
> concurrently.

KVM shouldn't reclaim memory on file release, it should instead do that on the
inode being "evicted": https://lore.kernel.org/all/ZLGiEfJZTyl7M8mS@google.com
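
Roughly, ->release() runs whenever the last reference to a given struct
file is dropped, while inode eviction runs only once every file
referencing the inode is gone.  A sketch of that split with illustrative
names (not the actual guest_memfd code):

#include <linux/fs.h>
#include <linux/mm.h>

/* Unbinding happens per-file; reclaim waits for the inode to die. */
static int gmem_file_release(struct inode *inode, struct file *file)
{
	/* Unbind this file from its memslots; do NOT free backing pages. */
	return 0;
}

static void gmem_evict_inode(struct inode *inode)
{
	/* No users can remain: reclaim the guest-private pages. */
	truncate_inode_pages_final(inode->i_mapping);
	clear_inode(inode);
}

static const struct super_operations gmem_super_ops = {
	.evict_inode	= gmem_evict_inode,
};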

> With TDX, the HKID release in kvm_flush_shadow_all() can then race
> with the private memory release in kvm_gmem_release().  Take
> slots_lock in kvm_mmu_notifier_release() to serialize the two.

No, the right answer is to not release the HKID until the VM is destroyed.  gmem
has a reference to its associated kvm instance, and so that will naturally ensure
all memory encrypted with the HKID is freed before the HKID is released.
kvm_flush_shadow_all() should only tear down page tables, it shouldn't be freeing
guest_memfd memory.

Then patches 6-8 go away.
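
The lifetime ordering being described, as a sketch with hypothetical
helpers (the real guest_memfd binding code differs): because each gmem
file holds a reference on its VM, kvm_destroy_vm() -- the point where
TDX would release the HKID -- cannot run until every gmem file has been
released and its encrypted pages freed.

#include <linux/kvm_host.h>

struct gmem_file {
	struct kvm *kvm;
};

static void gmem_bind(struct gmem_file *gmem, struct kvm *kvm)
{
	kvm_get_kvm(kvm);	/* gmem pins the VM */
	gmem->kvm = kvm;
}

static void gmem_release(struct gmem_file *gmem)
{
	/* Free the guest-private pages first, while the HKID is valid. */

	/*
	 * Dropping the last reference runs kvm_destroy_vm(), the only
	 * point at which the HKID can then be safely released.
	 */
	kvm_put_kvm(gmem->kvm);
}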

Patch

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 49380cd62367..4855d0b7a859 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -927,9 +927,16 @@  static void kvm_mmu_notifier_release(struct mmu_notifier *mn,
 	struct kvm *kvm = mmu_notifier_to_kvm(mn);
 	int idx;
 
+	/*
+	 * Avoid a race with kvm_gmem_release().  This function is called
+	 * via the mmu_notifier ->release() hook, while kvm_gmem_release()
+	 * is called via fput() on process exit.
+	 */
+	mutex_lock(&kvm->slots_lock);
 	idx = srcu_read_lock(&kvm->srcu);
 	kvm_flush_shadow_all(kvm);
 	srcu_read_unlock(&kvm->srcu, idx);
+	mutex_unlock(&kvm->slots_lock);
 }
 
 static const struct mmu_notifier_ops kvm_mmu_notifier_ops = {
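
For context, the path the new mutex serializes against: in the
guest_memfd series, kvm_gmem_release() unbinds memslots and frees the
backing memory under slots_lock, along the lines of this simplified
sketch (not the exact upstream code):

static int kvm_gmem_release(struct inode *inode, struct file *file)
{
	struct kvm_gmem *gmem = file->private_data;
	struct kvm *kvm = gmem->kvm;

	mutex_lock(&kvm->slots_lock);
	/* Unbind memslots and free the guest-private memory. */
	mutex_unlock(&kvm->slots_lock);

	kvm_put_kvm(kvm);
	return 0;
}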