[RFC,v6,058/104] KVM: x86/tdp_mmu: implement MapGPA hypercall for TDX

Message ID fb8f699b7cdd1dc54c13b663e66dfa2cc82c5cd3.1651774250.git.isaku.yamahata@intel.com
State New, archived
Series KVM TDX basic feature support

Commit Message

Isaku Yamahata May 5, 2022, 6:14 p.m. UTC
From: Isaku Yamahata <isaku.yamahata@intel.com>

The TDX Guest-Hypervisor Communication Interface (GHCI) specification
defines the MapGPA hypercall for a guest TD to request that the host VMM map
a given GPA range as private or shared.

The hypercall means the guest TD will use the GPA as shared (or private) and
won't use it as private (or shared).  The VMM should enforce this GPA usage;
it doesn't have to actually map the GPA on the hypercall request.
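
For reference, here is a minimal sketch (not part of this patch) of how the
TDG.VP.VMCALL<MapGPA> exit handler, added elsewhere in this series, is
expected to drive the kvm_mmu_map_gpa() helper introduced below.  The handler
name and the retry plumbing are illustrative only:

  static int example_handle_map_gpa(struct kvm_vcpu *vcpu, gpa_t gpa, gpa_t size)
  {
  	struct kvm *kvm = vcpu->kvm;
  	gfn_t start = gpa_to_gfn(gpa);
  	gfn_t end = gpa_to_gfn(gpa + size);
  	/* A GPA without the shared bit set is a request to convert to private. */
  	bool allow_private = !(start & kvm_gfn_shared_mask(kvm));
  	int ret;

  	ret = kvm_mmu_map_gpa(vcpu, &start, end, allow_private);
  	if (ret == -EAGAIN) {
  		/*
  		 * 'start' now holds the GFN to resume from (with the shared
  		 * bit reflecting allow_private); report gfn_to_gpa(start)
  		 * back to the guest so it can retry the hypercall.
  		 */
  	}
  	return ret;
  }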

- Allocate 4K PTEs to record the SPTE_SHARED_MASK bit.

- Zap the aliased region.
  If a shared (or private) GPA is requested, zap the private (or shared) GPA
  (modulo the shared bit).

- Record that the requested GPA is shared (or private) with SPTE_SHARED_MASK
  in the SPTEs of both the shared and private EPT tables.
  - With SPTE_SHARED_MASK set, a shared GPA is allowed.
  - With SPTE_SHARED_MASK cleared, a private GPA is allowed.

  The reason to record SPTE_SHARED_MASK in both the shared and private EPTs
  is to optimize the EPT violation path for the normal guest TD execution
  path and penalize the map_gpa hypercall path instead.

  If the guest TD faults on a GPA that isn't allowed (modulo the shared bit),
  KVM doesn't resolve the EPT violation and lets the vcpu retry.  The vcpu
  will keep faulting until another vcpu maps the region with the MapGPA
  hypercall.  SPTE_SHARED_MASK is cleared in the non-present SPTE value
  (shadow_nonpresent_value), so the default behavior doesn't change.  The
  resulting per-SPTE transitions are summarized in the sketch after this
  list.

- Don't map the GPA.
  The GPA is mapped on the next EPT violation.
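
The per-4K-SPTE transitions listed above boil down to the following condensed
sketch (for illustration only; SPTE_SHARED_MASK, SHADOW_NONPRESENT_VALUE and
shadow_nonpresent_spte() are introduced by earlier patches in this series, and
the helper name is hypothetical):

  /* New leaf SPTE value computed by map_gpa for one 4K entry. */
  static u64 example_map_gpa_new_spte(bool private_ept, bool allow_private,
  				      u64 old_spte)
  {
  	if (private_ept) {
  		if (allow_private)
  			/* Private mapping allowed: just clear the mark. */
  			return old_spte & ~SPTE_SHARED_MASK;
  		/* Shared requested: zap any private mapping and set the mark. */
  		return is_shadow_present_pte(old_spte) ?
  			(shadow_nonpresent_spte(old_spte) | SPTE_SHARED_MASK) :
  			(old_spte | SPTE_SHARED_MASK);
  	}
  	/* Shared EPT: zap on a private request, otherwise set the mark. */
  	return allow_private ? SHADOW_NONPRESENT_VALUE :
  		(old_spte | SPTE_SHARED_MASK);
  }

In the patch itself, the zapping cases go through tdp_mmu_set_spte() so that
__handle_changed_spte() runs, while a pure SPTE_SHARED_MASK flip uses
kvm_tdp_mmu_write_spte() because no further side effect is needed.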

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
---
 arch/x86/kvm/mmu.h         |   3 +
 arch/x86/kvm/mmu/mmu.c     | 106 +++++++++++++++
 arch/x86/kvm/mmu/tdp_mmu.c | 271 ++++++++++++++++++++++++++++++++++++-
 arch/x86/kvm/mmu/tdp_mmu.h |   5 +
 4 files changed, 382 insertions(+), 3 deletions(-)

Patch

diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index d02c0274777a..beff084d6cd3 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -316,6 +316,9 @@  void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end);
 
 int kvm_arch_write_log_dirty(struct kvm_vcpu *vcpu);
 
+int kvm_mmu_map_gpa(struct kvm_vcpu *vcpu, gfn_t *startp, gfn_t end,
+		    bool allow_private);
+
 int kvm_mmu_post_init_vm(struct kvm *kvm);
 void kvm_mmu_pre_destroy_vm(struct kvm *kvm);
 
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index f4284e9cf9ec..497e2b9e58cc 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -6317,6 +6317,112 @@  void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm, u64 gen)
 	}
 }
 
+static int kvm_mmu_populate_nonleaf(struct kvm_vcpu *vcpu, gfn_t start, gfn_t end)
+{
+	struct kvm *kvm = vcpu->kvm;
+	struct kvm_memslots *slots;
+	struct kvm_memslot_iter iter;
+	int ret = 0;
+
+	/* No need to populate as mmu_map_gpa() handles single GPA. */
+	if (!is_tdp_mmu_enabled(kvm))
+		return 0;
+
+	slots = __kvm_memslots(kvm, 0 /* only normal ram. not SMM. */);
+	kvm_for_each_memslot_in_gfn_range(&iter, slots, start, end) {
+		struct kvm_memory_slot *memslot = iter.slot;
+		gfn_t s = max(start, memslot->base_gfn);
+		gfn_t e = min(end, memslot->base_gfn + memslot->npages);
+
+		if (WARN_ON_ONCE(s >= e))
+			continue;
+
+		ret = kvm_tdp_mmu_populate_nonleaf(vcpu, kvm_gfn_private(kvm, s),
+						kvm_gfn_private(kvm, e), true, false);
+		if (ret)
+			break;
+		ret = kvm_tdp_mmu_populate_nonleaf(vcpu, kvm_gfn_shared(kvm, s),
+						kvm_gfn_shared(kvm, e), false, false);
+		if (ret)
+			break;
+	}
+	return ret;
+}
+
+int kvm_mmu_map_gpa(struct kvm_vcpu *vcpu, gfn_t *startp, gfn_t end,
+		bool allow_private)
+{
+	struct kvm *kvm = vcpu->kvm;
+	struct kvm_memslots *slots;
+	struct kvm_memslot_iter iter;
+	gfn_t start = *startp;
+	int ret;
+
+	if (!kvm_gfn_shared_mask(kvm))
+		return -EOPNOTSUPP;
+
+	start = start & ~kvm_gfn_shared_mask(kvm);
+	end = end & ~kvm_gfn_shared_mask(kvm);
+
+	/*
+	 * Allocate S-EPT pages first so that the subsequent operations on
+	 * leaf SPTE entries can be done without memory allocation.
+	 */
+	while (true) {
+		ret = mmu_topup_memory_caches(vcpu, false);
+		if (ret)
+			return ret;
+
+		mutex_lock(&kvm->slots_lock);
+		write_lock(&kvm->mmu_lock);
+
+		ret = kvm_mmu_populate_nonleaf(vcpu, start, end);
+		if (!ret)
+			break;
+
+		write_unlock(&kvm->mmu_lock);
+		mutex_unlock(&kvm->slots_lock);
+		if (ret == -EAGAIN) {
+			if (need_resched())
+				cond_resched();
+			continue;
+		}
+		return ret;
+	}
+
+	slots = __kvm_memslots(kvm, 0 /* only normal ram. not SMM. */);
+	kvm_for_each_memslot_in_gfn_range(&iter, slots, start, end) {
+		struct kvm_memory_slot *memslot = iter.slot;
+		gfn_t s = max(start, memslot->base_gfn);
+		gfn_t e = min(end, memslot->base_gfn + memslot->npages);
+
+		if (WARN_ON_ONCE(s >= e))
+			continue;
+		if (is_tdp_mmu_enabled(kvm)) {
+			ret = kvm_tdp_mmu_map_gpa(vcpu, &s, e, allow_private);
+			if (ret) {
+				start = s;
+				break;
+			}
+		} else {
+			ret = -EOPNOTSUPP;
+			break;
+		}
+	}
+
+	write_unlock(&kvm->mmu_lock);
+	mutex_unlock(&kvm->slots_lock);
+
+	if (ret == -EAGAIN) {
+		if (allow_private)
+			*startp = kvm_gfn_private(kvm, start);
+		else
+			*startp = kvm_gfn_shared(kvm, start);
+	}
+	return ret;
+}
+EXPORT_SYMBOL_GPL(kvm_mmu_map_gpa);
+
 static unsigned long
 mmu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
 {
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 1d7642a0acc9..8bcb241cc12c 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -658,6 +658,13 @@  static void __handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
 		}
 		change.sept_page = sept_page;
 
+		/*
+		 * SPTE_SHARED_MASK is only changed by map_gpa, which holds
+		 * the write lock of mmu_lock.
+		 */
+		WARN_ON(shared &&
+			(spte_shared_mask(old_spte) !=
+				spte_shared_mask(new_spte)));
 		static_call(kvm_x86_handle_changed_private_spte)(kvm, &change);
 	}
 }
@@ -1303,7 +1310,8 @@  static int tdp_mmu_link_sp(struct kvm *kvm, struct tdp_iter *iter,
 	return 0;
 }
 
-static int tdp_mmu_populate_nonleaf(struct kvm_vcpu *vcpu, struct tdp_iter *iter, bool account_nx)
+static int tdp_mmu_populate_nonleaf(
+	struct kvm_vcpu *vcpu, struct tdp_iter *iter, bool account_nx, bool shared)
 {
 	struct kvm_mmu_page *sp;
 	int ret;
@@ -1314,7 +1322,7 @@  static int tdp_mmu_populate_nonleaf(struct kvm_vcpu *vcpu, struct tdp_iter *iter
 	sp = tdp_mmu_alloc_sp(vcpu, iter->is_private, false);
 	tdp_mmu_init_child_sp(sp, iter);
 
-	ret = tdp_mmu_link_sp(vcpu->kvm, iter, sp, account_nx, true);
+	ret = tdp_mmu_link_sp(vcpu->kvm, iter, sp, account_nx, shared);
 	if (ret)
 		tdp_mmu_free_sp(sp);
 	return ret;
@@ -1390,7 +1398,7 @@  int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 			if (is_removed_spte(iter.old_spte))
 				break;
 
-			if (tdp_mmu_populate_nonleaf(vcpu, &iter, account_nx))
+			if (tdp_mmu_populate_nonleaf(vcpu, &iter, account_nx, true))
 				break;
 		}
 	}
@@ -2096,6 +2104,263 @@  bool kvm_tdp_mmu_write_protect_gfn(struct kvm *kvm,
 	return spte_set;
 }
 
+/*
+ * Allocate shadow page tables for the given gfn range so that the following
+ * operations on SPTEs can be done without memory allocation.
+ */
+int kvm_tdp_mmu_populate_nonleaf(
+	struct kvm_vcpu *vcpu, gfn_t start, gfn_t end, bool is_private, bool shared)
+{
+	struct kvm *kvm = vcpu->kvm;
+	struct tdp_iter iter;
+	int ret = 0;
+
+	kvm_lockdep_assert_mmu_lock_held(kvm, false);
+	rcu_read_lock();
+	tdp_mmu_for_each_pte(iter, vcpu->arch.mmu, is_private, start, end) {
+		if (iter.level == PG_LEVEL_4K)
+			continue;
+		if (is_shadow_present_pte(iter.old_spte) &&
+			is_large_pte(iter.old_spte)) {
+			/* TODO: large page support. */
+			WARN_ON_ONCE(true);
+			return -ENOSYS;
+		}
+
+		if (is_shadow_present_pte(iter.old_spte))
+			continue;
+
+		/*
+		 * Guarantee that alloc_tdp_mmu_page() succeeds, as it
+		 * assumes page allocation from the cache always succeeds.
+		 */
+		if (vcpu->arch.mmu_page_header_cache.nobjs == 0 ||
+			vcpu->arch.mmu_shadow_page_cache.nobjs == 0 ||
+			vcpu->arch.mmu_private_sp_cache.nobjs == 0) {
+			ret = -EAGAIN;
+			break;
+		}
+
+		/*
+		 * The write lock of mmu_lock is held.  No other thread
+		 * can freeze the SPTE.
+		 */
+		ret = tdp_mmu_populate_nonleaf(vcpu, &iter, false, shared);
+		if (ret) {
+			/* As the write lock is held, this shouldn't happen. */
+			WARN_ON_ONCE(true);
+			break;
+		}
+	}
+	rcu_read_unlock();
+
+	return ret;
+}
+
+typedef void (*update_spte_t)(
+	struct kvm *kvm, struct tdp_iter *iter, bool allow_private);
+
+static int kvm_tdp_mmu_update_range(struct kvm_vcpu *vcpu, bool is_private,
+				gfn_t start, gfn_t end, gfn_t *nextp,
+				update_spte_t fn, bool allow_private)
+{
+	struct kvm *kvm = vcpu->kvm;
+	struct tdp_iter iter;
+	int ret = 0;
+
+	rcu_read_lock();
+	tdp_mmu_for_each_pte(iter, vcpu->arch.mmu, is_private, start, end) {
+		if (iter.level == PG_LEVEL_4K) {
+			fn(kvm, &iter, allow_private);
+			continue;
+		}
+
+		/*
+		 * Whether a GPA is allowed to be private or shared is recorded
+		 * at 4K granularity in the leaf SPTE as SPTE_SHARED_MASK.
+		 * Break large pages into 4K entries.
+		 */
+		if (is_shadow_present_pte(iter.old_spte) &&
+			is_large_pte(iter.old_spte)) {
+			/*
+			 * TODO: large page support.
+			 * Large pages aren't supported for TDX yet.
+			 */
+			WARN_ON_ONCE(true);
+			tdp_mmu_set_spte(kvm, &iter, SHADOW_NONPRESENT_VALUE);
+			iter.old_spte = kvm_tdp_mmu_read_spte(iter.sptep);
+		}
+
+		if (!is_shadow_present_pte(iter.old_spte)) {
+			/*
+			 * Guarantee that alloc_tdp_mmu_page() succeeds, as it
+			 * assumes page allocation from the cache always succeeds.
+			 */
+			if (vcpu->arch.mmu_page_header_cache.nobjs == 0 ||
+				vcpu->arch.mmu_shadow_page_cache.nobjs == 0 ||
+				vcpu->arch.mmu_private_sp_cache.nobjs == 0) {
+				ret = -EAGAIN;
+				break;
+			}
+			/*
+			 * The write lock of mmu_lock is held.  No other thread
+			 * can freeze the SPTE.
+			 */
+			ret = tdp_mmu_populate_nonleaf(vcpu, &iter, false, false);
+			if (ret) {
+				/* As the write lock is held, this shouldn't happen. */
+				WARN_ON_ONCE(true);
+				break;
+			}
+		}
+	}
+	rcu_read_unlock();
+
+	if (ret == -EAGAIN)
+		*nextp = iter.next_last_level_gfn;
+
+	return ret;
+}
+
+static void kvm_tdp_mmu_update_shared_spte(
+	struct kvm *kvm, struct tdp_iter *iter, bool allow_private)
+{
+	u64 new_spte;
+
+	WARN_ON(iter->is_private);
+	if (allow_private) {
+		/* Zap SPTE and clear SPTE_SHARED_MASK */
+		new_spte = SHADOW_NONPRESENT_VALUE;
+		if (new_spte != iter->old_spte)
+			tdp_mmu_set_spte(kvm, iter, new_spte);
+	} else {
+		new_spte = iter->old_spte | SPTE_SHARED_MASK;
+		/* No side effect is needed */
+		if (new_spte != iter->old_spte)
+			kvm_tdp_mmu_write_spte(iter->sptep, new_spte);
+	}
+}
+
+static void kvm_tdp_mmu_update_private_spte(
+	struct kvm *kvm, struct tdp_iter *iter, bool allow_private)
+{
+	u64 new_spte;
+
+	WARN_ON(!iter->is_private);
+	if (allow_private) {
+		new_spte = iter->old_spte & ~SPTE_SHARED_MASK;
+		/* No side effect is needed */
+		if (new_spte != iter->old_spte)
+			kvm_tdp_mmu_write_spte(iter->sptep, new_spte);
+	} else {
+		if (is_shadow_present_pte(iter->old_spte)) {
+			/* Zap SPTE */
+			new_spte = shadow_nonpresent_spte(iter->old_spte) |
+				SPTE_SHARED_MASK;
+			if (new_spte != iter->old_spte)
+				tdp_mmu_set_spte(kvm, iter, new_spte);
+		} else {
+			new_spte = iter->old_spte | SPTE_SHARED_MASK;
+			/* No side effect is needed */
+			if (new_spte != iter->old_spte)
+				kvm_tdp_mmu_write_spte(iter->sptep, new_spte);
+		}
+	}
+}
+
+/*
+ * Whether a GPA is allowed to be private or shared is recorded in both the
+ * private and shared leaf SPTE entries as SPTE_SHARED_MASK.  They must match.
+ * private leaf spte entry
+ * - present: private mapping is allowed. (already mapped)
+ * - non-present: private mapping is allowed.
+ * - present | SPTE_SHARED_MASK: invalid state.
+ * - non-present | SPTE_SHARED_MASK: shared mapping is allowed.
+ *                                        may or may not be mapped as shared.
+ * shared leaf spte entry
+ * - present: invalid state
+ * - non-present: private mapping is allowed.
+ * - present | SPTE_SHARED_MASK: shared mapping is allowed (already mapped)
+ * - non-present | SPTE_SHARED_MASK: shared mapping is allowed.
+ *
+ * state change of private spte:
+ * map_gpa(private):
+ *      private EPT entry: clear SPTE_SHARED_MASK
+ *	  present: nop
+ *	  non-present: nop
+ *	  non-present | SPTE_SHARED_MASK -> non-present
+ *	shared EPT entry: zap and clear SPTE_SHARED_MASK
+ *	  any -> non-present
+ * map_gpa(shared):
+ *	private EPT entry: zap and set SPTE_SHARED_MASK
+ *	  present     -> non-present | SPTE_SHARED_MASK
+ *	  non-present -> non-present | SPTE_SHARED_MASK
+ *	  non-present | SPTE_SHARED_MASK: nop
+ *	shared EPT entry: set SPTE_SHARED_MASK
+ *	  present | SPTE_SHARED_MASK: nop
+ *	  non-present -> non-present | SPTE_SHARED_MASK
+ *	  non-present | SPTE_SHARED_MASK: nop
+ * map(private GPA):
+ *	private EPT entry: try to populate
+ *	  present: nop
+ *	  non-present -> present
+ *	  non-present | SPTE_SHARED_MASK: nop. looping on EPT violation
+ *	shared EPT entry: nop
+ * map(shared GPA):
+ *	private EPT entry: nop
+ *	shared EPT entry: populate
+ *	  present | SPTE_SHARED_MASK: nop
+ *	  non-present | SPTE_SHARED_MASK -> present | SPTE_SHARED_MASK
+ *	  non-present: nop. looping on EPT violation
+ * zap(private GPA):
+ *	private EPT entry: zap and keep SPTE_SHARED_MASK
+ *	  present | SPTE_SHARED_MASK -> non-present | SPTE_SHARED_MASK
+ *	  non-present: nop as is_shadow_present_pte() is checked
+ *	  non-present | SPTE_SHARED_MASK: nop by is_shadow_present_pte()
+ *	shared EPT entry: nop
+ * zap(shared GPA):
+ *	private EPT entry: nop
+ *	shared EPT entry: zap and keep SPTE_SHARED_MASK
+ *	  present | SPTE_SHARED_MASK -> non-present | SPTE_SHARED_MASK
+ *	  non-present | SPTE_SHARED_MASK: nop
+ *	  non-present: nop.
+ */
+int kvm_tdp_mmu_map_gpa(struct kvm_vcpu *vcpu,
+			gfn_t *startp, gfn_t end, bool allow_private)
+{
+	struct kvm *kvm = vcpu->kvm;
+	struct kvm_mmu *mmu = vcpu->arch.mmu;
+	gfn_t start = *startp;
+	gfn_t next;
+	int ret = 0;
+
+	lockdep_assert_held_write(&kvm->mmu_lock);
+	WARN_ON(start & kvm_gfn_shared_mask(kvm));
+	WARN_ON(end & kvm_gfn_shared_mask(kvm));
+
+	if (!VALID_PAGE(mmu->root.hpa) || !VALID_PAGE(mmu->private_root_hpa))
+		return -EINVAL;
+
+	next = end;
+	ret = kvm_tdp_mmu_update_range(
+		vcpu, false, kvm_gfn_shared(kvm, start), kvm_gfn_shared(kvm, end),
+		&next, kvm_tdp_mmu_update_shared_spte, allow_private);
+	if (ret) {
+		kvm_flush_remote_tlbs_with_address(kvm, start, next - start);
+		return ret;
+	}
+
+	ret = kvm_tdp_mmu_update_range(
+		vcpu, true, kvm_gfn_private(kvm, start), kvm_gfn_private(kvm, end),
+		&next, kvm_tdp_mmu_update_private_spte, allow_private);
+	if (ret == -EAGAIN) {
+		*startp = next;
+		end = *startp;
+	}
+	kvm_flush_remote_tlbs_with_address(kvm, start, end - start);
+	return ret;
+}
+
 /*
  * Return the level of the lowest level SPTE added to sptes.
  * That SPTE may be non-present.
diff --git a/arch/x86/kvm/mmu/tdp_mmu.h b/arch/x86/kvm/mmu/tdp_mmu.h
index d1655571eb2f..4d1c27911134 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.h
+++ b/arch/x86/kvm/mmu/tdp_mmu.h
@@ -51,6 +51,11 @@  void kvm_tdp_mmu_try_split_huge_pages(struct kvm *kvm,
 				      gfn_t start, gfn_t end,
 				      int target_level, bool shared);
 
+int kvm_tdp_mmu_populate_nonleaf(struct kvm_vcpu *vcpu, gfn_t start, gfn_t end,
+				bool is_private, bool shared);
+int kvm_tdp_mmu_map_gpa(struct kvm_vcpu *vcpu,
+			gfn_t *startp, gfn_t end, bool allow_private);
+
 static inline void kvm_tdp_mmu_walk_lockless_begin(void)
 {
 	rcu_read_lock();