From patchwork Thu Nov 11 14:13:42 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chao Peng X-Patchwork-Id: 12614935 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 04663C4332F for ; Thu, 11 Nov 2021 14:21:31 +0000 (UTC) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 9CDFD61106 for ; Thu, 11 Nov 2021 14:21:30 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 9CDFD61106 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=linux.intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=nongnu.org Received: from localhost ([::1]:56612 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1mlAxF-00027m-Rl for qemu-devel@archiver.kernel.org; Thu, 11 Nov 2021 09:21:29 -0500 Received: from eggs.gnu.org ([209.51.188.92]:40496) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1mlArW-00007K-A2 for qemu-devel@nongnu.org; Thu, 11 Nov 2021 09:15:34 -0500 Received: from mga11.intel.com ([192.55.52.93]:59141) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1mlArT-0006Ub-Py for qemu-devel@nongnu.org; Thu, 11 Nov 2021 09:15:34 -0500 X-IronPort-AV: E=McAfee;i="6200,9189,10164"; a="230378088" X-IronPort-AV: E=Sophos;i="5.87,226,1631602800"; d="scan'208";a="230378088" Received: from orsmga007.jf.intel.com ([10.7.209.58]) by fmsmga102.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Nov 2021 06:15:26 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.87,226,1631602800"; d="scan'208";a="492555579" Received: from chaop.bj.intel.com ([10.240.192.101]) by orsmga007.jf.intel.com with ESMTP; 11 Nov 2021 06:15:12 -0800 From: Chao Peng To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, qemu-devel@nongnu.org Subject: [RFC PATCH 3/6] kvm: x86: add private_ops to memslot Date: Thu, 11 Nov 2021 22:13:42 +0800 Message-Id: <20211111141352.26311-4-chao.p.peng@linux.intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20211111141352.26311-1-chao.p.peng@linux.intel.com> References: <20211111141352.26311-1-chao.p.peng@linux.intel.com> Received-SPF: none client-ip=192.55.52.93; envelope-from=chao.p.peng@linux.intel.com; helo=mga11.intel.com X-Spam_score_int: -68 X-Spam_score: -6.9 X-Spam_bar: ------ X-Spam_report: (-6.9 / 5.0 requ) BAYES_00=-1.9, RCVD_IN_DNSWL_HI=-5, SPF_HELO_NONE=0.001, SPF_NONE=0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Wanpeng Li , jun.nakajima@intel.com, david@redhat.com, "J . Bruce Fields" , dave.hansen@intel.com, "H . Peter Anvin" , Chao Peng , ak@linux.intel.com, Jonathan Corbet , Joerg Roedel , x86@kernel.org, Hugh Dickins , Ingo Molnar , Borislav Petkov , luto@kernel.org, Thomas Gleixner , Vitaly Kuznetsov , Jim Mattson , Sean Christopherson , susie.li@intel.com, Jeff Layton , john.ji@intel.com, Yu Zhang , Paolo Bonzini , Andrew Morton , "Kirill A . Shutemov" Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel" Guest memory for guest private memslot is designed to be backed by an "enlightened" file descriptor(fd). Some callbacks (working on the fd) are implemented by some other kernel subsystems who want to provide guest private memory to help KVM to establish the memory mapping. Signed-off-by: Sean Christopherson Signed-off-by: Yu Zhang Signed-off-by: Chao Peng --- Documentation/virt/kvm/api.rst | 1 + arch/x86/kvm/mmu/mmu.c | 47 ++++++++++++++++++++++++++++++---- arch/x86/kvm/mmu/paging_tmpl.h | 3 ++- include/linux/kvm_host.h | 8 ++++++ include/uapi/linux/kvm.h | 3 +++ 5 files changed, 56 insertions(+), 6 deletions(-) diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst index 47054a79d395..16c06bf10302 100644 --- a/Documentation/virt/kvm/api.rst +++ b/Documentation/virt/kvm/api.rst @@ -1260,6 +1260,7 @@ yet and must be cleared on entry. __u64 guest_phys_addr; __u64 memory_size; /* bytes */ __u64 userspace_addr; /* start of the userspace allocated memory */ + __u32 fd; /* memory fd that provides guest memory */ }; /* for kvm_memory_region::flags */ diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 8483c15eac6f..af5ecf4ef62a 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -2932,6 +2932,19 @@ static int host_pfn_mapping_level(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn, return level; } +static int host_private_pfn_mapping_level(const struct kvm_memory_slot *slot, + gfn_t gfn) +{ + kvm_pfn_t pfn; + int page_level = PG_LEVEL_4K; + + pfn = slot->private_ops->get_lock_pfn(slot, gfn, &page_level); + if (pfn >= 0) + slot->private_ops->put_unlock_pfn(pfn); + + return page_level; +} + int kvm_mmu_max_mapping_level(struct kvm *kvm, struct kvm_memory_slot *slot, gfn_t gfn, kvm_pfn_t pfn, int max_level) { @@ -2947,6 +2960,9 @@ int kvm_mmu_max_mapping_level(struct kvm *kvm, struct kvm_memory_slot *slot, if (max_level == PG_LEVEL_4K) return PG_LEVEL_4K; + if (memslot_is_private(slot)) + return host_private_pfn_mapping_level(slot, gfn); + return host_pfn_mapping_level(kvm, gfn, pfn, slot); } @@ -3926,10 +3942,13 @@ static bool kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, static bool try_async_pf(struct kvm_vcpu *vcpu, bool prefault, gfn_t gfn, gpa_t cr2_or_gpa, kvm_pfn_t *pfn, hva_t *hva, - bool write, bool *writable) + bool write, bool *writable, bool private) { - struct kvm_memory_slot *slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn); + struct kvm_memory_slot *slot; bool async; + int page_level; + + slot = __kvm_vcpu_gfn_to_memslot(vcpu, gfn, private); /* Don't expose aliases for no slot GFNs or private memslots */ if ((cr2_or_gpa & vcpu_gpa_stolen_mask(vcpu)) && @@ -3945,6 +3964,17 @@ static bool try_async_pf(struct kvm_vcpu *vcpu, bool prefault, gfn_t gfn, return false; } + if (private) { + *pfn = slot->private_ops->get_lock_pfn(slot, gfn, &page_level); + if (*pfn < 0) + *pfn = KVM_PFN_ERR_FAULT; + if (writable) + *writable = slot->flags & KVM_MEM_READONLY ? + false : true; + + return false; + } + async = false; *pfn = __gfn_to_pfn_memslot(slot, gfn, false, &async, write, writable, hva); @@ -3971,7 +4001,9 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code, kvm_pfn_t *pfn) { bool write = error_code & PFERR_WRITE_MASK; + bool private = is_private_gfn(vcpu, gpa >> PAGE_SHIFT); bool map_writable; + struct kvm_memory_slot *slot; gfn_t gfn = vcpu_gpa_to_gfn_unalias(vcpu, gpa); unsigned long mmu_seq; @@ -3995,7 +4027,7 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code, smp_rmb(); if (try_async_pf(vcpu, prefault, gfn, gpa, pfn, &hva, - write, &map_writable)) + write, &map_writable, private)) return RET_PF_RETRY; if (handle_abnormal_pfn(vcpu, is_tdp ? 0 : gpa, gfn, *pfn, ACC_ALL, &r)) @@ -4008,7 +4040,7 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code, else write_lock(&vcpu->kvm->mmu_lock); - if (!is_noslot_pfn(*pfn) && + if (!private && !is_noslot_pfn(*pfn) && mmu_notifier_retry_hva(vcpu->kvm, mmu_seq, hva)) goto out_unlock; r = make_mmu_pages_available(vcpu); @@ -4027,7 +4059,12 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code, read_unlock(&vcpu->kvm->mmu_lock); else write_unlock(&vcpu->kvm->mmu_lock); - kvm_release_pfn_clean(*pfn); + + if (!private) + kvm_release_pfn_clean(*pfn); + else + slot->private_ops->put_unlock_pfn(*pfn); + return r; } diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h index 1fc3a0826072..5ffeb9c85fba 100644 --- a/arch/x86/kvm/mmu/paging_tmpl.h +++ b/arch/x86/kvm/mmu/paging_tmpl.h @@ -799,6 +799,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gpa_t addr, u32 error_code, { bool write_fault = error_code & PFERR_WRITE_MASK; bool user_fault = error_code & PFERR_USER_MASK; + bool private = is_private_gfn(vcpu, addr >> PAGE_SHIFT); struct guest_walker walker; int r; kvm_pfn_t pfn; @@ -854,7 +855,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gpa_t addr, u32 error_code, smp_rmb(); if (try_async_pf(vcpu, prefault, walker.gfn, addr, &pfn, &hva, - write_fault, &map_writable)) + write_fault, &map_writable, private)) return RET_PF_RETRY; if (handle_abnormal_pfn(vcpu, addr, walker.gfn, pfn, walker.pte_access, &r)) diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 8e5b197230ed..83345460c5f5 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -347,6 +347,12 @@ static inline int kvm_vcpu_exiting_guest_mode(struct kvm_vcpu *vcpu) return cmpxchg(&vcpu->mode, IN_GUEST_MODE, EXITING_GUEST_MODE); } +struct kvm_private_memory_ops { + unsigned long (*get_lock_pfn)(const struct kvm_memory_slot *slot, + gfn_t gfn, int *page_level); + void (*put_unlock_pfn)(unsigned long pfn); +}; + /* * Some of the bitops functions do not support too long bitmaps. * This number must be determined not to exceed such limits. @@ -362,6 +368,8 @@ struct kvm_memory_slot { u32 flags; short id; u16 as_id; + struct file *file; + struct kvm_private_memory_ops *private_ops; }; static inline bool kvm_slot_dirty_track_enabled(struct kvm_memory_slot *slot) diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index d3c9caf86d80..8d20caae9180 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -100,6 +100,9 @@ struct kvm_userspace_memory_region { __u64 guest_phys_addr; __u64 memory_size; /* bytes */ __u64 userspace_addr; /* start of the userspace allocated memory */ +#ifdef KVM_PRIVATE_ADDRESS_SPACE + __u32 fd; /* valid if memslot is guest private memory */ +#endif }; /*