Message ID | 20240904030751.117579-20-rick.p.edgecombe@intel.com (mailing list archive)
---|---
State | New, archived
Series | TDX MMU Part 2
On Tue, Sep 03, 2024 at 08:07:49PM -0700, Rick Edgecombe wrote: > From: Isaku Yamahata <isaku.yamahata@intel.com> > > Add a new ioctl for the user space VMM to initialize guest memory with the > specified memory contents. > > Because TDX protects the guest's memory, the creation of the initial guest > memory requires a dedicated TDX module API, TDH.MEM.PAGE.ADD(), instead of > directly copying the memory contents into the guest's memory in the case of > the default VM type. > > Define a new subcommand, KVM_TDX_INIT_MEM_REGION, of vCPU-scoped > KVM_MEMORY_ENCRYPT_OP. Check if the GFN is already pre-allocated, assign > the guest page in Secure-EPT, copy the initial memory contents into the > guest memory, and encrypt the guest memory. Optionally, extend the memory > measurement of the TDX guest. > > Discussion history: > - Originally, KVM_TDX_INIT_MEM_REGION used the callback of the TDP MMU of > the KVM page fault handler. It issues TDX SEAMCALL deep in the call > stack, and the ioctl passes down the necessary parameters. [2] rejected > it. [3] suggests that the call to the TDX module should be invoked in a > shallow call stack. > > - Instead, introduce guest memory pre-population [1] that doesn't update > vendor-specific part (Secure-EPT in TDX case) and the vendor-specific > code (KVM_TDX_INIT_MEM_REGION) updates only vendor-specific parts without > modifying the KVM TDP MMU suggested at [4] > > Crazy idea. For TDX S-EPT, what if KVM_MAP_MEMORY does all of the > SEPT.ADD stuff, which doesn't affect the measurement, and even fills in > KVM's copy of the leaf EPTE, but tdx_sept_set_private_spte() doesn't do > anything if the TD isn't finalized? > > Then KVM provides a dedicated TDX ioctl(), i.e. what is/was > KVM_TDX_INIT_MEM_REGION, to do PAGE.ADD. KVM_TDX_INIT_MEM_REGION > wouldn't need to map anything, it would simply need to verify that the > pfn from guest_memfd() is the same as what's in the TDP MMU. 
> > - Use the common guest_memfd population function, kvm_gmem_populate() > instead of a custom function. It should check whether the PFN > from TDP MMU is the same as the one from guest_memfd. [1] > > - Instead of forcing userspace to do two passes, pre-map the guest > initial memory in tdx_gmem_post_populate. [5] > > Link: https://lore.kernel.org/kvm/20240419085927.3648704-1-pbonzini@redhat.com/ [1] > Link: https://lore.kernel.org/kvm/Zbrj5WKVgMsUFDtb@google.com/ [2] > Link: https://lore.kernel.org/kvm/Zh8DHbb8FzoVErgX@google.com/ [3] > Link: https://lore.kernel.org/kvm/Ze-TJh0BBOWm9spT@google.com/ [4] > Link: https://lore.kernel.org/kvm/CABgObfa=a3cKcKJHQRrCs-3Ty8ppSRou=dhi6Q+KdZnom0Zegw@mail.gmail.com/ [5] > Suggested-by: Sean Christopherson <seanjc@google.com> > Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com> > Co-developed-by: Yan Zhao <yan.y.zhao@intel.com> > Signed-off-by: Yan Zhao <yan.y.zhao@intel.com> > Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com> > --- > TDX MMU part 2 v1: > - Update the code according to latest gmem update. > https://lore.kernel.org/kvm/CABgObfa=a3cKcKJHQRrCs-3Ty8ppSRou=dhi6Q+KdZnom0Zegw@mail.gmail.com/ > - Fixup an alignment bug reported by Binbin. > - Rename KVM_MEMORY_MAPPING => KVM_MAP_MEMORY (Sean) > - Drop issuing TDH.MEM.PAGE.ADD() on KVM_MAP_MEMORY(), defer it to > KVM_TDX_INIT_MEM_REGION. (Sean) > - Added nr_premapped to track the number of premapped pages > - Drop tdx_post_mmu_map_page(). > - Drop kvm_slot_can_be_private() check (Paolo) > - Use kvm_tdp_mmu_gpa_is_mapped() (Paolo) > > v19: > - Switched to use KVM_MEMORY_MAPPING > - Dropped measurement extension > - updated commit message. 
private_page_add() => set_private_spte() > --- > arch/x86/include/uapi/asm/kvm.h | 9 ++ > arch/x86/kvm/vmx/tdx.c | 150 ++++++++++++++++++++++++++++++++ > virt/kvm/kvm_main.c | 1 + > 3 files changed, 160 insertions(+) > > diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h > index 39636be5c891..789d1d821b4f 100644 > --- a/arch/x86/include/uapi/asm/kvm.h > +++ b/arch/x86/include/uapi/asm/kvm.h > @@ -931,6 +931,7 @@ enum kvm_tdx_cmd_id { > KVM_TDX_CAPABILITIES = 0, > KVM_TDX_INIT_VM, > KVM_TDX_INIT_VCPU, > + KVM_TDX_INIT_MEM_REGION, > KVM_TDX_GET_CPUID, > > KVM_TDX_CMD_NR_MAX, > @@ -996,4 +997,12 @@ struct kvm_tdx_init_vm { > struct kvm_cpuid2 cpuid; > }; > > +#define KVM_TDX_MEASURE_MEMORY_REGION _BITULL(0) > + > +struct kvm_tdx_init_mem_region { > + __u64 source_addr; > + __u64 gpa; > + __u64 nr_pages; > +}; > + > #endif /* _ASM_X86_KVM_H */ > diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c > index 50ce24905062..796d1a495a66 100644 > --- a/arch/x86/kvm/vmx/tdx.c > +++ b/arch/x86/kvm/vmx/tdx.c > @@ -8,6 +8,7 @@ > #include "tdx_ops.h" > #include "vmx.h" > #include "mmu/spte.h" > +#include "common.h" > > #undef pr_fmt > #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt > @@ -1586,6 +1587,152 @@ static int tdx_vcpu_init(struct kvm_vcpu *vcpu, struct kvm_tdx_cmd *cmd) > return 0; > } > > +struct tdx_gmem_post_populate_arg { > + struct kvm_vcpu *vcpu; > + __u32 flags; > +}; > + > +static int tdx_gmem_post_populate(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn, > + void __user *src, int order, void *_arg) > +{ > + u64 error_code = PFERR_GUEST_FINAL_MASK | PFERR_PRIVATE_ACCESS; > + struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm); > + struct tdx_gmem_post_populate_arg *arg = _arg; > + struct kvm_vcpu *vcpu = arg->vcpu; > + gpa_t gpa = gfn_to_gpa(gfn); > + u8 level = PG_LEVEL_4K; > + struct page *page; > + int ret, i; > + u64 err, entry, level_state; > + > + /* > + * Get the source page if it has been faulted in. 
Return failure if the > + * source page has been swapped out or unmapped in primary memory. > + */ > + ret = get_user_pages_fast((unsigned long)src, 1, 0, &page); > + if (ret < 0) > + return ret; > + if (ret != 1) > + return -ENOMEM; > + > + if (!kvm_mem_is_private(kvm, gfn)) { > + ret = -EFAULT; > + goto out_put_page; > + } > + > + ret = kvm_tdp_map_page(vcpu, gpa, error_code, &level); > + if (ret < 0) > + goto out_put_page; > + > + read_lock(&kvm->mmu_lock); Although mirrored root can't be zapped with shared lock currently, is it better to hold write_lock() here? It should bring no extra overhead in a normal condition when the tdx_gmem_post_populate() is called. > + > + if (!kvm_tdp_mmu_gpa_is_mapped(vcpu, gpa)) { > + ret = -ENOENT; > + goto out; > + } > + > + ret = 0; > + do { > + err = tdh_mem_page_add(kvm_tdx, gpa, pfn_to_hpa(pfn), > + pfn_to_hpa(page_to_pfn(page)), > + &entry, &level_state); > + } while (err == TDX_ERROR_SEPT_BUSY); > + if (err) { > + ret = -EIO; > + goto out; > + } > + > + WARN_ON_ONCE(!atomic64_read(&kvm_tdx->nr_premapped)); > + atomic64_dec(&kvm_tdx->nr_premapped); > + > + if (arg->flags & KVM_TDX_MEASURE_MEMORY_REGION) { > + for (i = 0; i < PAGE_SIZE; i += TDX_EXTENDMR_CHUNKSIZE) { > + err = tdh_mr_extend(kvm_tdx, gpa + i, &entry, > + &level_state); > + if (err) { > + ret = -EIO; > + break; > + } > + } > + } > + > +out: > + read_unlock(&kvm->mmu_lock); > +out_put_page: > + put_page(page); > + return ret; > +} > + > +static int tdx_vcpu_init_mem_region(struct kvm_vcpu *vcpu, struct kvm_tdx_cmd *cmd) > +{ > + struct kvm *kvm = vcpu->kvm; > + struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm); > + struct kvm_tdx_init_mem_region region; > + struct tdx_gmem_post_populate_arg arg; > + long gmem_ret; > + int ret; > + > + if (!to_tdx(vcpu)->initialized) > + return -EINVAL; > + > + /* Once TD is finalized, the initial guest memory is fixed. 
*/ > + if (is_td_finalized(kvm_tdx)) > + return -EINVAL; > + > + if (cmd->flags & ~KVM_TDX_MEASURE_MEMORY_REGION) > + return -EINVAL; > + > + if (copy_from_user(®ion, u64_to_user_ptr(cmd->data), sizeof(region))) > + return -EFAULT; > + > + if (!PAGE_ALIGNED(region.source_addr) || !PAGE_ALIGNED(region.gpa) || > + !region.nr_pages || > + region.gpa + (region.nr_pages << PAGE_SHIFT) <= region.gpa || > + !kvm_is_private_gpa(kvm, region.gpa) || > + !kvm_is_private_gpa(kvm, region.gpa + (region.nr_pages << PAGE_SHIFT) - 1)) > + return -EINVAL; > + > + mutex_lock(&kvm->slots_lock); > + > + kvm_mmu_reload(vcpu); > + ret = 0; > + while (region.nr_pages) { > + if (signal_pending(current)) { > + ret = -EINTR; > + break; > + } > + > + arg = (struct tdx_gmem_post_populate_arg) { > + .vcpu = vcpu, > + .flags = cmd->flags, > + }; > + gmem_ret = kvm_gmem_populate(kvm, gpa_to_gfn(region.gpa), > + u64_to_user_ptr(region.source_addr), > + 1, tdx_gmem_post_populate, &arg); > + if (gmem_ret < 0) { > + ret = gmem_ret; > + break; > + } > + > + if (gmem_ret != 1) { > + ret = -EIO; > + break; > + } > + > + region.source_addr += PAGE_SIZE; > + region.gpa += PAGE_SIZE; > + region.nr_pages--; > + > + cond_resched(); > + } > + > + mutex_unlock(&kvm->slots_lock); > + > + if (copy_to_user(u64_to_user_ptr(cmd->data), ®ion, sizeof(region))) > + ret = -EFAULT; > + return ret; > +} > + > int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp) > { > struct kvm_tdx *kvm_tdx = to_kvm_tdx(vcpu->kvm); > @@ -1605,6 +1752,9 @@ int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp) > case KVM_TDX_INIT_VCPU: > ret = tdx_vcpu_init(vcpu, &cmd); > break; > + case KVM_TDX_INIT_MEM_REGION: > + ret = tdx_vcpu_init_mem_region(vcpu, &cmd); > + break; > case KVM_TDX_GET_CPUID: > ret = tdx_vcpu_get_cpuid(vcpu, &cmd); > break; > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c > index 73fc3334721d..0822db480719 100644 > --- a/virt/kvm/kvm_main.c > +++ b/virt/kvm/kvm_main.c > @@ -2639,6 +2639,7 @@ 
struct kvm_memory_slot *kvm_vcpu_gfn_to_memslot(struct kvm_vcpu *vcpu, gfn_t gfn > > return NULL; > } > +EXPORT_SYMBOL_GPL(kvm_vcpu_gfn_to_memslot); > > bool kvm_is_visible_gfn(struct kvm *kvm, gfn_t gfn) > { > -- > 2.34.1 >
On Tue, 2024-09-03 at 20:07 -0700, Rick Edgecombe wrote: > +static int tdx_gmem_post_populate(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn, > + void __user *src, int order, void *_arg) > +{ > + u64 error_code = PFERR_GUEST_FINAL_MASK | PFERR_PRIVATE_ACCESS; > + struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm); > + struct tdx_gmem_post_populate_arg *arg = _arg; > + struct kvm_vcpu *vcpu = arg->vcpu; > + gpa_t gpa = gfn_to_gpa(gfn); > + u8 level = PG_LEVEL_4K; > + struct page *page; > + int ret, i; > + u64 err, entry, level_state; > + > + /* > + * Get the source page if it has been faulted in. Return failure if > the > + * source page has been swapped out or unmapped in primary memory. > + */ > + ret = get_user_pages_fast((unsigned long)src, 1, 0, &page); > + if (ret < 0) > + return ret; > + if (ret != 1) > + return -ENOMEM; > + > + if (!kvm_mem_is_private(kvm, gfn)) { > + ret = -EFAULT; > + goto out_put_page; > + } Paolo had said he was going to add this check in gmem code. I thought it was not added but it actually is. So we can drop this check.
On Wed, 2024-09-04 at 12:53 +0800, Yan Zhao wrote: > > + if (!kvm_mem_is_private(kvm, gfn)) { > > + ret = -EFAULT; > > + goto out_put_page; > > + } > > + > > + ret = kvm_tdp_map_page(vcpu, gpa, error_code, &level); > > + if (ret < 0) > > + goto out_put_page; > > + > > + read_lock(&kvm->mmu_lock); > Although mirrored root can't be zapped with shared lock currently, is it > better to hold write_lock() here? > > It should bring no extra overhead in a normal condition when the > tdx_gmem_post_populate() is called. I think we should hold the weakest lock we can. Otherwise someday someone could run into it and think the write_lock() is required. It will add confusion. What was the benefit of a write lock? Just in case we got it wrong?
On Wed, 2024-09-04 at 07:01 -0700, Rick Edgecombe wrote: > On Wed, 2024-09-04 at 12:53 +0800, Yan Zhao wrote: > > > + if (!kvm_mem_is_private(kvm, gfn)) { > > > + ret = -EFAULT; > > > + goto out_put_page; > > > + } > > > + > > > + ret = kvm_tdp_map_page(vcpu, gpa, error_code, &level); > > > + if (ret < 0) > > > + goto out_put_page; > > > + > > > + read_lock(&kvm->mmu_lock); > > Although mirrored root can't be zapped with shared lock currently, is it > > better to hold write_lock() here? > > > > It should bring no extra overhead in a normal condition when the > > tdx_gmem_post_populate() is called. > > I think we should hold the weakest lock we can. Otherwise someday someone > could > run into it and think the write_lock() is required. It will add confusion. > > What was the benefit of a write lock? Just in case we got it wrong? I just tried to draft a comment to make it look less weird, but I think actually even the mmu_read lock is technically unnecessary because we hold both filemap_invalidate_lock() and slots_lock. The cases we care about: memslot deletion - slots_lock protects gmem hole punch - filemap_invalidate_lock() protects set attributes - slots_lock protects others? So I guess all the mirror zapping cases that could execute concurrently are already covered by other locks. If we skipped grabbing the mmu lock completely it would trigger the assertion in kvm_tdp_mmu_gpa_is_mapped(). Removing the assert would probably make kvm_tdp_mmu_gpa_is_mapped() a bit dangerous. Hmm. Maybe a comment like this: /* * The case to care about here is a PTE getting zapped concurrently and * this function erroneously thinking a page is mapped in the mirror EPT. * The private mem zapping paths are already covered by other locks held * here, but grab an mmu read_lock to not trigger the assert in * kvm_tdp_mmu_gpa_is_mapped(). */ Yan, do you think it is sufficient?
On Sat, Sep 07, 2024 at 12:30:00AM +0800, Edgecombe, Rick P wrote: > On Wed, 2024-09-04 at 07:01 -0700, Rick Edgecombe wrote: > > On Wed, 2024-09-04 at 12:53 +0800, Yan Zhao wrote: > > > > + if (!kvm_mem_is_private(kvm, gfn)) { > > > > + ret = -EFAULT; > > > > + goto out_put_page; > > > > + } > > > > + > > > > + ret = kvm_tdp_map_page(vcpu, gpa, error_code, &level); > > > > + if (ret < 0) > > > > + goto out_put_page; > > > > + > > > > + read_lock(&kvm->mmu_lock); > > > Although mirrored root can't be zapped with shared lock currently, is it > > > better to hold write_lock() here? > > > > > > It should bring no extra overhead in a normal condition when the > > > tdx_gmem_post_populate() is called. > > > > I think we should hold the weakest lock we can. Otherwise someday someone > > could > > run into it and think the write_lock() is required. It will add confusion. > > > > What was the benefit of a write lock? Just in case we got it wrong? > > I just tried to draft a comment to make it look less weird, but I think actually > even the mmu_read lock is technically unnecessary because we hold both > filemap_invalidate_lock() and slots_lock. The cases we care about: > memslot deletion - slots_lock protects > gmem hole punch - filemap_invalidate_lock() protects > set attributes - slots_lock protects > others? > > So I guess all the mirror zapping cases that could execute concurrently are > already covered by other locks. If we skipped grabbing the mmu lock completely > it would trigger the assertion in kvm_tdp_mmu_gpa_is_mapped(). Removing the > assert would probably make kvm_tdp_mmu_gpa_is_mapped() a bit dangerous. Hmm. > > Maybe a comment like this: > /* > * The case to care about here is a PTE getting zapped concurrently and > * this function erroneously thinking a page is mapped in the mirror EPT. > * The private mem zapping paths are already covered by other locks held > * here, but grab an mmu read_lock to not trigger the assert in > * kvm_tdp_mmu_gpa_is_mapped(). 
> */ > > Yan, do you think it is sufficient? Yes, with current code base, I think it's sufficient. Thanks! I asked that question was just to confirm whether we need to guard against the potential removal of SPTE under a shared lock, given the change is small and KVM_TDX_INIT_MEM_REGION() is not on performance critical path.
On 9/6/24 18:30, Edgecombe, Rick P wrote: > /* > * The case to care about here is a PTE getting zapped concurrently and > * this function erroneously thinking a page is mapped in the mirror EPT. > * The private mem zapping paths are already covered by other locks held > * here, but grab an mmu read_lock to not trigger the assert in > * kvm_tdp_mmu_gpa_is_mapped(). > */ > > Yan, do you think it is sufficient? If you're actually requiring that the other locks are sufficient, then there can be no ENOENT. Maybe: /* * The private mem cannot be zapped after kvm_tdp_map_page() * because all paths are covered by slots_lock and the * filemap invalidate lock. Check that they are indeed enough. */ if (IS_ENABLED(CONFIG_KVM_PROVE_MMU)) { scoped_guard(read_lock, &kvm->mmu_lock) { if (KVM_BUG_ON(kvm, !kvm_tdp_mmu_gpa_is_mapped(vcpu, gpa))) { ret = -EIO; goto out; } } } Paolo
On 9/4/24 05:07, Rick Edgecombe wrote: > From: Isaku Yamahata <isaku.yamahata@intel.com> > > Add a new ioctl for the user space VMM to initialize guest memory with the > specified memory contents. > > Because TDX protects the guest's memory, the creation of the initial guest > memory requires a dedicated TDX module API, TDH.MEM.PAGE.ADD(), instead of > directly copying the memory contents into the guest's memory in the case of > the default VM type. > > Define a new subcommand, KVM_TDX_INIT_MEM_REGION, of vCPU-scoped > KVM_MEMORY_ENCRYPT_OP. Check if the GFN is already pre-allocated, assign > the guest page in Secure-EPT, copy the initial memory contents into the > guest memory, and encrypt the guest memory. Optionally, extend the memory > measurement of the TDX guest. > > Discussion history: While useful for the reviewers, in the end this is the simplest possible userspace API (the one that we started with) and the objections just went away because it reuses the infrastructure that was introduced for pre-faulting memory. So I'd replace everything with: --- The ioctl uses the vCPU file descriptor because of the TDX module's requirement that the memory is added to the S-EPT (via TDH.MEM.SEPT.ADD) prior to initialization (TDH.MEM.PAGE.ADD). Accessing the MMU in turn requires a vCPU file descriptor, just like for KVM_PRE_FAULT_MEMORY. In fact, the post-populate callback is able to reuse the same logic used by KVM_PRE_FAULT_MEMORY, so that userspace can do everything with a single ioctl. Note that this is the only way to invoke TDH.MEM.SEPT.ADD before the TD is finalized, as userspace cannot use KVM_PRE_FAULT_MEMORY at that point. This ensures that there cannot be pages in the S-EPT awaiting TDH.MEM.PAGE.ADD, which would be treated incorrectly as spurious by tdp_mmu_map_handle_target_level() (KVM would see the SPTE as PRESENT, but the corresponding S-EPT entry will be !PRESENT). 
--- Part of the second paragraph comes from your link [4], https://lore.kernel.org/kvm/Ze-TJh0BBOWm9spT@google.com/, but updated for recent changes to KVM_PRE_FAULT_MEMORY. This drops the historical information that is not particularly relevant for the future, it updates what's relevant to mention changes done for SEV-SNP, and also preserves most of the other information: * why the vCPU file descriptor * the desirability of a single ioctl for userspace * the relationship between KVM_TDX_INIT_MEM_REGION and KVM_PRE_FAULT_MEMORY Paolo
On Tue, 2024-09-10 at 12:13 +0200, Paolo Bonzini wrote: > > Yan, do you think it is sufficient? > > If you're actually requiring that the other locks are sufficient, then > there can be no ENOENT. > > Maybe: > > /* > * The private mem cannot be zapped after kvm_tdp_map_page() > * because all paths are covered by slots_lock and the > * filemap invalidate lock. Check that they are indeed enough. > */ > if (IS_ENABLED(CONFIG_KVM_PROVE_MMU)) { > scoped_guard(read_lock, &kvm->mmu_lock) { > if (KVM_BUG_ON(kvm, > !kvm_tdp_mmu_gpa_is_mapped(vcpu, gpa))) { > ret = -EIO; > goto out; > } > } > } True. We can put it behind CONFIG_KVM_PROVE_MMU.
On Tue, 2024-09-10 at 12:16 +0200, Paolo Bonzini wrote: > While useful for the reviewers, in the end this is the simplest possible > userspace API (the one that we started with) and the objections just > went away because it reuses the infrastructure that was introduced for > pre-faulting memory. > > So I'd replace everything with: Sure, thanks.