Message ID | 20211223123011.41044-12-chao.p.peng@linux.intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | KVM: mm: fd-based approach for supporting KVM guest private memory | expand |
On Thu, Dec 23, 2021, Chao Peng wrote: > This new function establishes the mapping in KVM page tables for a > given gfn range. It can be used in the memory fallocate callback for > memfd based memory to establish the mapping for KVM secondary MMU when > the pages are allocated in the memory backend. NAK, under no circumstance should KVM install SPTEs in response to allocating memory in a file. The correct thing to do is to invalidate the gfn range associated with the newly mapped range, i.e. wipe out any shared SPTEs associated with the memslot.
On Thu, Dec 23, 2021 at 06:06:19PM +0000, Sean Christopherson wrote: > On Thu, Dec 23, 2021, Chao Peng wrote: > > This new function establishes the mapping in KVM page tables for a > > given gfn range. It can be used in the memory fallocate callback for > > memfd based memory to establish the mapping for KVM secondary MMU when > > the pages are allocated in the memory backend. > > NAK, under no circumstance should KVM install SPTEs in response to allocating > memory in a file. The correct thing to do is to invalidate the gfn range > associated with the newly mapped range, i.e. wipe out any shared SPTEs associated > with the memslot. Right, thanks.
On Fri, Dec 24, 2021 at 12:13:51PM +0800, Chao Peng wrote: > On Thu, Dec 23, 2021 at 06:06:19PM +0000, Sean Christopherson wrote: > > On Thu, Dec 23, 2021, Chao Peng wrote: > > > This new function establishes the mapping in KVM page tables for a > > > given gfn range. It can be used in the memory fallocate callback for > > > memfd based memory to establish the mapping for KVM secondary MMU when > > > the pages are allocated in the memory backend. > > > > NAK, under no circumstance should KVM install SPTEs in response to allocating > > memory in a file. The correct thing to do is to invalidate the gfn range > > associated with the newly mapped range, i.e. wipe out any shared SPTEs associated > > with the memslot. > > Right, thanks. BTW, I think the current fallocate() callback is just useless as long as we don't want to install KVM SPTEs in response to allocating memory in a file. The invalidation of the shared SPTEs should be notified through mmu_notifier of the shared memory backend, not memfd_notifier of the private memory backend. Thanks, Chao
On Fri, Dec 31, 2021, Chao Peng wrote: > On Fri, Dec 24, 2021 at 12:13:51PM +0800, Chao Peng wrote: > > On Thu, Dec 23, 2021 at 06:06:19PM +0000, Sean Christopherson wrote: > > > On Thu, Dec 23, 2021, Chao Peng wrote: > > > > This new function establishes the mapping in KVM page tables for a > > > > given gfn range. It can be used in the memory fallocate callback for > > > > memfd based memory to establish the mapping for KVM secondary MMU when > > > > the pages are allocated in the memory backend. > > > > > > NAK, under no circumstance should KVM install SPTEs in response to allocating > > > memory in a file. The correct thing to do is to invalidate the gfn range > > > associated with the newly mapped range, i.e. wipe out any shared SPTEs associated > > > with the memslot. > > > > Right, thanks. > > BTW, I think the current fallocate() callback is just useless as long as > we don't want to install KVM SPTEs in response to allocating memory in a > file. The invalidation of the shared SPTEs should be notified through > mmu_notifier of the shared memory backend, not memfd_notifier of the > private memory backend. No, because the private fd is the final source of truth as to whether or not a GPA is private, e.g. userspace may choose to not unmap the shared backing. KVM's rule per Paolo's/this proposoal is that a GPA is private if it has a private memslot and is present in the private backing store. And the other core rule is that KVM must never map both the private and shared variants of a GPA into the guest.
On Tue, Jan 04, 2022 at 05:31:30PM +0000, Sean Christopherson wrote: > On Fri, Dec 31, 2021, Chao Peng wrote: > > On Fri, Dec 24, 2021 at 12:13:51PM +0800, Chao Peng wrote: > > > On Thu, Dec 23, 2021 at 06:06:19PM +0000, Sean Christopherson wrote: > > > > On Thu, Dec 23, 2021, Chao Peng wrote: > > > > > This new function establishes the mapping in KVM page tables for a > > > > > given gfn range. It can be used in the memory fallocate callback for > > > > > memfd based memory to establish the mapping for KVM secondary MMU when > > > > > the pages are allocated in the memory backend. > > > > > > > > NAK, under no circumstance should KVM install SPTEs in response to allocating > > > > memory in a file. The correct thing to do is to invalidate the gfn range > > > > associated with the newly mapped range, i.e. wipe out any shared SPTEs associated > > > > with the memslot. > > > > > > Right, thanks. > > > > BTW, I think the current fallocate() callback is just useless as long as > > we don't want to install KVM SPTEs in response to allocating memory in a > > file. The invalidation of the shared SPTEs should be notified through > > mmu_notifier of the shared memory backend, not memfd_notifier of the > > private memory backend. > > No, because the private fd is the final source of truth as to whether or not a > GPA is private, e.g. userspace may choose to not unmap the shared backing. > KVM's rule per Paolo's/this proposoal is that a GPA is private if it has a private > memslot and is present in the private backing store. And the other core rule is > that KVM must never map both the private and shared variants of a GPA into the > guest. That's true, but I'm wondering if zapping the shared variant can be handled at the time when the private one gets mapped in the KVM page fault. No bothering the backing store to dedicate a callback to tell KVM. Chao
On Wed, Jan 05, 2022, Chao Peng wrote: > On Tue, Jan 04, 2022 at 05:31:30PM +0000, Sean Christopherson wrote: > > On Fri, Dec 31, 2021, Chao Peng wrote: > > > On Fri, Dec 24, 2021 at 12:13:51PM +0800, Chao Peng wrote: > > > > On Thu, Dec 23, 2021 at 06:06:19PM +0000, Sean Christopherson wrote: > > > > > On Thu, Dec 23, 2021, Chao Peng wrote: > > > > > > This new function establishes the mapping in KVM page tables for a > > > > > > given gfn range. It can be used in the memory fallocate callback for > > > > > > memfd based memory to establish the mapping for KVM secondary MMU when > > > > > > the pages are allocated in the memory backend. > > > > > > > > > > NAK, under no circumstance should KVM install SPTEs in response to allocating > > > > > memory in a file. The correct thing to do is to invalidate the gfn range > > > > > associated with the newly mapped range, i.e. wipe out any shared SPTEs associated > > > > > with the memslot. > > > > > > > > Right, thanks. > > > > > > BTW, I think the current fallocate() callback is just useless as long as > > > we don't want to install KVM SPTEs in response to allocating memory in a > > > file. The invalidation of the shared SPTEs should be notified through > > > mmu_notifier of the shared memory backend, not memfd_notifier of the > > > private memory backend. > > > > No, because the private fd is the final source of truth as to whether or not a > > GPA is private, e.g. userspace may choose to not unmap the shared backing. > > KVM's rule per Paolo's/this proposoal is that a GPA is private if it has a private > > memslot and is present in the private backing store. And the other core rule is > > that KVM must never map both the private and shared variants of a GPA into the > > guest. > > That's true, but I'm wondering if zapping the shared variant can be > handled at the time when the private one gets mapped in the KVM page > fault. No bothering the backing store to dedicate a callback to tell > KVM. Hmm, I don't think that would work for the TDP MMU due to page faults taking mmu_lock for read. E.g. if two vCPUs concurrently fault in both the shared and private variants, a race could exist where the private page fault sees the gfn as private and the shared page fault sees it as shared. In that case, both faults will install a SPTE and KVM would end up running with both variants mapped into the guest. There's also a performance penalty, as KVM would need to walk the shared EPT tree on every private page fault.
On Wed, Jan 05, 2022 at 05:03:23PM +0000, Sean Christopherson wrote: > On Wed, Jan 05, 2022, Chao Peng wrote: > > On Tue, Jan 04, 2022 at 05:31:30PM +0000, Sean Christopherson wrote: > > > On Fri, Dec 31, 2021, Chao Peng wrote: > > > > On Fri, Dec 24, 2021 at 12:13:51PM +0800, Chao Peng wrote: > > > > > On Thu, Dec 23, 2021 at 06:06:19PM +0000, Sean Christopherson wrote: > > > > > > On Thu, Dec 23, 2021, Chao Peng wrote: > > > > > > > This new function establishes the mapping in KVM page tables for a > > > > > > > given gfn range. It can be used in the memory fallocate callback for > > > > > > > memfd based memory to establish the mapping for KVM secondary MMU when > > > > > > > the pages are allocated in the memory backend. > > > > > > > > > > > > NAK, under no circumstance should KVM install SPTEs in response to allocating > > > > > > memory in a file. The correct thing to do is to invalidate the gfn range > > > > > > associated with the newly mapped range, i.e. wipe out any shared SPTEs associated > > > > > > with the memslot. > > > > > > > > > > Right, thanks. > > > > > > > > BTW, I think the current fallocate() callback is just useless as long as > > > > we don't want to install KVM SPTEs in response to allocating memory in a > > > > file. The invalidation of the shared SPTEs should be notified through > > > > mmu_notifier of the shared memory backend, not memfd_notifier of the > > > > private memory backend. > > > > > > No, because the private fd is the final source of truth as to whether or not a > > > GPA is private, e.g. userspace may choose to not unmap the shared backing. > > > KVM's rule per Paolo's/this proposoal is that a GPA is private if it has a private > > > memslot and is present in the private backing store. And the other core rule is > > > that KVM must never map both the private and shared variants of a GPA into the > > > guest. > > > > That's true, but I'm wondering if zapping the shared variant can be > > handled at the time when the private one gets mapped in the KVM page > > fault. No bothering the backing store to dedicate a callback to tell > > KVM. > > Hmm, I don't think that would work for the TDP MMU due to page faults taking > mmu_lock for read. E.g. if two vCPUs concurrently fault in both the shared and > private variants, a race could exist where the private page fault sees the gfn > as private and the shared page fault sees it as shared. In that case, both faults > will install a SPTE and KVM would end up running with both variants mapped into the > guest. > > There's also a performance penalty, as KVM would need to walk the shared EPT tree > on every private page fault. Make sense. Thanks, Chao
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 1d275e9d76b5..2856eb662a21 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -1568,6 +1568,53 @@ static __always_inline bool kvm_handle_gfn_range(struct kvm *kvm, return ret; } +bool kvm_map_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range) +{ + struct kvm_vcpu *vcpu; + kvm_pfn_t pfn; + gfn_t gfn; + int idx; + bool ret = true; + + /* Need vcpu context for kvm_mmu_do_page_fault. */ + vcpu = kvm_get_vcpu(kvm, 0); + if (mutex_lock_killable(&vcpu->mutex)) + return false; + + vcpu_load(vcpu); + idx = srcu_read_lock(&kvm->srcu); + + kvm_mmu_reload(vcpu); + + gfn = range->start; + while (gfn < range->end) { + if (signal_pending(current)) { + ret = false; + break; + } + + if (need_resched()) + cond_resched(); + + pfn = kvm_mmu_do_page_fault(vcpu, gfn << PAGE_SHIFT, + PFERR_WRITE_MASK | PFERR_USER_MASK, + false); + if (is_error_noslot_pfn(pfn) || kvm->vm_bugged) { + ret = false; + break; + } + + gfn++; + } + + srcu_read_unlock(&kvm->srcu, idx); + vcpu_put(vcpu); + + mutex_unlock(&vcpu->mutex); + + return ret; +} + bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range) { bool flush = false; diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index be567925831b..8c2359175509 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -241,6 +241,8 @@ struct kvm_gfn_range { pte_t pte; bool may_block; }; + +bool kvm_map_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range); bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range); bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range); bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range); diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index f495c1a313bd..660ce15973ad 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -471,6 +471,11 @@ EXPORT_SYMBOL_GPL(kvm_destroy_vcpus); #if defined(CONFIG_MEMFD_OPS) ||\ (defined(CONFIG_MMU_NOTIFIER) && defined(KVM_ARCH_WANT_MMU_NOTIFIER)) +bool __weak kvm_map_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range) +{ + return false; +} + typedef bool (*gfn_handler_t)(struct kvm *kvm, struct kvm_gfn_range *range); typedef void (*on_lock_fn_t)(struct kvm *kvm, unsigned long start,