[v19,110/130] KVM: TDX: Handle TDX PV MMIO hypercall

Message ID	a4421e0f2eafc17b4703c920936e32489d2382a3.1708933498.git.isaku.yamahata@intel.com (mailing list archive)
State	New, archived
Headers	show Received: from mgamail.intel.com (unknown [192.198.163.8]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 44C70219EB; Mon, 26 Feb 2024 08:29:07 +0000 (UTC) From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini <pbonzini@redhat.com>, erdemaktas@google.com, Sean Christopherson <seanjc@google.com>, Sagi Shahar <sagis@google.com>, Kai Huang <kai.huang@intel.com>, chen.bo@intel.com, hang.yuan@intel.com, tina.zhang@intel.com, Sean Christopherson <sean.j.christopherson@intel.com> Subject: [PATCH v19 110/130] KVM: TDX: Handle TDX PV MMIO hypercall Date: Mon, 26 Feb 2024 00:26:52 -0800 Message-Id: <a4421e0f2eafc17b4703c920936e32489d2382a3.1708933498.git.isaku.yamahata@intel.com> In-Reply-To: <cover.1708933498.git.isaku.yamahata@intel.com> References: <cover.1708933498.git.isaku.yamahata@intel.com> Precedence: bulk MIME-Version: 1.0 Content-Transfer-Encoding: 8bit
Series	[v19,001/130] x86/virt/tdx: Rename _offset to _member for TD_SYSINFO_MAP() macro \| expand [v19,001/130] x86/virt/tdx: Rename _offset to _member for TD_SYSINFO_MAP() macro [v19,002/130] x86/virt/tdx: Move TDMR metadata fields map table to local variable [v19,003/130] x86/virt/tdx: Unbind global metadata read with 'struct tdx_tdmr_sysinfo' [v19,004/130] x86/virt/tdx: Support global metadata read for all element sizes [v19,005/130] x86/virt/tdx: Export global metadata read infrastructure [v19,006/130] x86/virt/tdx: Export TDX KeyID information [v19,007/130] x86/virt/tdx: Export SEAMCALL functions [v19,008/130] x86/tdx: Warning with 32bit build shift-count-overflow [v19,009/130] KVM: x86: Add gmem hook for determining max NPT mapping level [v19,010/130] KVM: x86: Pass is_private to gmem hook of gmem_max_level [v19,011/130] KVM: Add new members to struct kvm_gfn_range to operate on [v19,012/130] KVM: x86/mmu: Pass around full 64-bit error code for the KVM page fault [v19,013/130] KVM: x86: Use PFERR_GUEST_ENC_MASK to indicate fault is private [v19,014/130] KVM: Add KVM vcpu ioctl to pre-populate guest memory [v19,015/130] KVM: Document KVM_MEMORY_MAPPING ioctl [v19,016/130] KVM: x86/mmu: Introduce kvm_mmu_map_tdp_page() for use by TDX [v19,017/130] KVM: x86: Implement kvm_arch_{, pre_}vcpu_memory_mapping() [v19,018/130] KVM: x86/mmu: Assume guest MMIOs are shared [v19,019/130] KVM: x86: Add is_vm_type_supported callback [v19,020/130] KVM: VMX: Move out vmx_x86_ops to 'main.c' to wrap VMX and TDX [v19,021/130] KVM: x86/vmx: initialize loaded_vmcss_on_cpu in vmx_init() [v19,022/130] KVM: x86/vmx: Refactor KVM VMX module init/exit functions [v19,023/130] KVM: TDX: Initialize the TDX module when loading the KVM intel kernel module [v19,024/130] KVM: TDX: Add placeholders for TDX VM/vcpu structure [v19,025/130] KVM: TDX: Make TDX VM type supported [v19,026/130,MARKER] The start of TDX KVM patch series: TDX architectural definitions [v19,027/130] KVM: TDX: Define TDX architectural definitions [v19,028/130] KVM: TDX: Add TDX "architectural" error codes [v19,029/130] KVM: TDX: Add C wrapper functions for SEAMCALLs to the TDX module [v19,030/130] KVM: TDX: Add helper functions to print TDX SEAMCALL error [v19,031/130,MARKER] The start of TDX KVM patch series: TD VM creation/destruction [v19,032/130] KVM: TDX: Add helper functions to allocate/free TDX private host key id [v19,033/130] KVM: TDX: Add helper function to read TDX metadata in array [v19,034/130] KVM: TDX: Get system-wide info about TDX module on initialization [v19,035/130] KVM: TDX: Add place holder for TDX VM specific mem_enc_op ioctl [v19,036/130] KVM: TDX: x86: Add ioctl to get TDX systemwide parameters [v19,037/130] KVM: TDX: Make KVM_CAP_MAX_VCPUS backend specific [v19,038/130] KVM: TDX: create/destroy VM structure [v19,039/130] KVM: TDX: initialize VM with TDX specific parameters [v19,040/130] KVM: TDX: Make pmu_intel.c ignore guest TD case [v19,041/130] KVM: TDX: Refuse to unplug the last cpu on the package [v19,042/130,MARKER] The start of TDX KVM patch series: TD vcpu creation/destruction [v19,043/130] KVM: TDX: create/free TDX vcpu structure [v19,044/130] KVM: TDX: Do TDX specific vcpu initialization [v19,045/130,MARKER] The start of TDX KVM patch series: KVM MMU GPA shared bits [v19,046/130] KVM: x86/mmu: Add address conversion functions for TDX shared bit of GPA [v19,047/130,MARKER] The start of TDX KVM patch series: KVM TDP refactoring for TDX [v19,048/130] KVM: Allow page-sized MMU caches to be initialized with custom 64-bit values [v19,049/130] KVM: x86/mmu: Replace hardcoded value 0 for the initial value for SPTE [v19,050/130] KVM: x86/mmu: Allow non-zero value for non-present SPTE and removed SPTE [v19,051/130] KVM: x86/mmu: Add Suppress VE bit to shadow_mmio_mask/shadow_present_mask [v19,052/130] KVM: x86/mmu: Track shadow MMIO value on a per-VM basis [v19,053/130] KVM: x86/mmu: Disallow fast page fault on private GPA [v19,054/130] KVM: VMX: Introduce test mode related to EPT violation VE [v19,055/130,MARKER] The start of TDX KVM patch series: KVM TDP MMU hooks [v19,056/130] KVM: x86/tdp_mmu: Init role member of struct kvm_mmu_page at allocation [v19,057/130] KVM: x86/mmu: Add a new is_private member for union kvm_mmu_page_role [v19,058/130] KVM: x86/mmu: Add a private pointer to struct kvm_mmu_page [v19,059/130] KVM: x86/tdp_mmu: Don't zap private pages for unsupported cases [v19,060/130] KVM: x86/tdp_mmu: Apply mmu notifier callback to only shared GPA [v19,061/130] KVM: x86/tdp_mmu: Sprinkle __must_check [v19,062/130] KVM: x86/tdp_mmu: Support TDX private mapping for TDP MMU [v19,063/130,MARKER] The start of TDX KVM patch series: TDX EPT violation [v19,064/130] KVM: x86/mmu: Do not enable page track for TD guest [v19,065/130] KVM: VMX: Split out guts of EPT violation to common/exposed function [v19,066/130] KVM: TDX: Add accessors VMX VMCS helpers [v19,067/130] KVM: TDX: Add load_mmu_pgd method for TDX [v19,068/130] KVM: TDX: Retry seamcall when TDX_OPERAND_BUSY with operand SEPT [v19,069/130] KVM: TDX: Require TDP MMU and mmio caching for TDX [v19,070/130] KVM: TDX: TDP MMU TDX support [v19,071/130] KVM: TDX: MTRR: implement get_mt_mask() for TDX [v19,072/130,MARKER] The start of TDX KVM patch series: TD finalization [v19,073/130] KVM: x86: Add hooks in kvm_arch_vcpu_memory_mapping() [v19,074/130] KVM: TDX: Create initial guest memory [v19,075/130] KVM: TDX: Extend memory measurement with initial guest memory [v19,076/130] KVM: TDX: Finalize VM initialization [v19,077/130,MARKER] The start of TDX KVM patch series: TD vcpu enter/exit [v19,078/130] KVM: TDX: Implement TDX vcpu enter/exit path [v19,079/130] KVM: TDX: vcpu_run: save/restore host state(host kernel gs) [v19,080/130] KVM: TDX: restore host xsave state when exit from the guest TD [v19,081/130] KVM: x86: Allow to update cached values in kvm_user_return_msrs w/o wrmsr [v19,082/130] KVM: TDX: restore user ret MSRs [v19,083/130] KVM: TDX: Add TSX_CTRL msr into uret_msrs list [v19,084/130,MARKER] The start of TDX KVM patch series: TD vcpu exits/interrupts/hypercalls [v19,085/130] KVM: TDX: Complete interrupts after tdexit [v19,086/130] KVM: TDX: restore debug store when TD exit [v19,087/130] KVM: TDX: handle vcpu migration over logical processor [v19,088/130] KVM: x86: Add a switch_db_regs flag to handle TDX's auto-switched behavior [v19,089/130] KVM: TDX: Add support for find pending IRQ in a protected local APIC [v19,090/130] KVM: x86: Assume timer IRQ was injected if APIC state is proteced [v19,091/130] KVM: TDX: remove use of struct vcpu_vmx from posted_interrupt.c [v19,092/130] KVM: TDX: Implement interrupt injection [v19,093/130] KVM: TDX: Implements vcpu request_immediate_exit [v19,094/130] KVM: TDX: Implement methods to inject NMI [v19,095/130] KVM: VMX: Modify NMI and INTR handlers to take intr_info as function argument [v19,096/130] KVM: VMX: Move NMI/exception handler to common helper [v19,097/130] KVM: x86: Split core of hypercall emulation to helper function [v19,098/130] KVM: TDX: Add a place holder to handle TDX VM exit [v19,099/130] KVM: TDX: Handle vmentry failure for INTEL TD guest [v19,100/130] KVM: TDX: handle EXIT_REASON_OTHER_SMI [v19,101/130] KVM: TDX: handle ept violation/misconfig exit [v19,102/130] KVM: TDX: handle EXCEPTION_NMI and EXTERNAL_INTERRUPT [v19,103/130] KVM: TDX: Handle EXIT_REASON_OTHER_SMI with MSMI [v19,104/130] KVM: TDX: Add a place holder for handler of TDX hypercalls (TDG.VP.VMCALL) [v19,105/130] KVM: TDX: handle KVM hypercall with TDG.VP.VMCALL [v19,106/130] KVM: TDX: Add KVM Exit for TDX TDG.VP.VMCALL [v19,107/130] KVM: TDX: Handle TDX PV CPUID hypercall [v19,108/130] KVM: TDX: Handle TDX PV HLT hypercall [v19,109/130] KVM: TDX: Handle TDX PV port io hypercall [v19,110/130] KVM: TDX: Handle TDX PV MMIO hypercall [v19,111/130] KVM: TDX: Implement callbacks for MSR operations for TDX [v19,112/130] KVM: TDX: Handle TDX PV rdmsr/wrmsr hypercall [v19,113/130] KVM: TDX: Handle MSR MTRRCap and MTRRDefType access [v19,114/130] KVM: TDX: Handle MSR IA32_FEAT_CTL MSR and IA32_MCG_EXT_CTL [v19,115/130] KVM: TDX: Handle TDG.VP.VMCALL<GetTdVmCallInfo> hypercall [v19,116/130] KVM: TDX: Silently discard SMI request [v19,117/130] KVM: TDX: Silently ignore INIT/SIPI [v19,118/130] KVM: TDX: Add methods to ignore accesses to CPU state [v19,119/130] KVM: TDX: Add methods to ignore guest instruction emulation [v19,120/130] KVM: TDX: Add a method to ignore dirty logging [v19,121/130] KVM: TDX: Add methods to ignore VMX preemption timer [v19,122/130] KVM: TDX: Add methods to ignore accesses to TSC [v19,123/130] KVM: TDX: Ignore setting up mce [v19,124/130] KVM: TDX: Add a method to ignore for TDX to ignore hypercall patch [v19,125/130] KVM: TDX: Add methods to ignore virtual apic related operation [v19,126/130] KVM: TDX: Inhibit APICv for TDX guest [v19,127/130] Documentation/virt/kvm: Document on Trust Domain Extensions(TDX) [v19,128/130] KVM: x86: design documentation on TDX support of x86 KVM TDP MMU [v19,129/130] RFC: KVM: x86: Add x86 callback to check cpuid [v19,130/130] RFC: KVM: x86, TDX: Add check for KVM_SET_CPUID2

On 2/26/2024 4:26 PM, isaku.yamahata@intel.com wrote: > From: Sean Christopherson <sean.j.christopherson@intel.com> > > Export kvm_io_bus_read and kvm_mmio tracepoint and wire up TDX PV MMIO > hypercall to the KVM backend functions. > > kvm_io_bus_read/write() searches KVM device emulated in kernel of the given > MMIO address and emulates the MMIO. As TDX PV MMIO also needs it, export > kvm_io_bus_read(). kvm_io_bus_write() is already exported. TDX PV MMIO > emulates some of MMIO itself. To add trace point consistently with x86 > kvm, export kvm_mmio tracepoint. > > Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> > Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com> > Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> > --- > arch/x86/kvm/vmx/tdx.c | 114 +++++++++++++++++++++++++++++++++++++++++ > arch/x86/kvm/x86.c | 1 + > virt/kvm/kvm_main.c | 2 + > 3 files changed, 117 insertions(+) > > diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c > index 55fc6cc6c816..389bb95d2af0 100644 > --- a/arch/x86/kvm/vmx/tdx.c > +++ b/arch/x86/kvm/vmx/tdx.c > @@ -1217,6 +1217,118 @@ static int tdx_emulate_io(struct kvm_vcpu *vcpu) > return ret; > } > > +static int tdx_complete_mmio(struct kvm_vcpu *vcpu) > +{ > + unsigned long val = 0; > + gpa_t gpa; > + int size; > + > + KVM_BUG_ON(vcpu->mmio_needed != 1, vcpu->kvm); > + vcpu->mmio_needed = 0; > + > + if (!vcpu->mmio_is_write) { > + gpa = vcpu->mmio_fragments[0].gpa; > + size = vcpu->mmio_fragments[0].len; > + > + memcpy(&val, vcpu->run->mmio.data, size); > + tdvmcall_set_return_val(vcpu, val); > + trace_kvm_mmio(KVM_TRACE_MMIO_READ, size, gpa, &val); > + } > + return 1; > +} > + > +static inline int tdx_mmio_write(struct kvm_vcpu *vcpu, gpa_t gpa, int size, > + unsigned long val) > +{ > + if (kvm_iodevice_write(vcpu, &vcpu->arch.apic->dev, gpa, size, &val) && > + kvm_io_bus_write(vcpu, KVM_MMIO_BUS, gpa, size, &val)) > + return -EOPNOTSUPP; > + > + trace_kvm_mmio(KVM_TRACE_MMIO_WRITE, size, gpa, &val); > + return 0; > +} > + > +static inline int tdx_mmio_read(struct kvm_vcpu *vcpu, gpa_t gpa, int size) > +{ > + unsigned long val; > + > + if (kvm_iodevice_read(vcpu, &vcpu->arch.apic->dev, gpa, size, &val) && > + kvm_io_bus_read(vcpu, KVM_MMIO_BUS, gpa, size, &val)) > + return -EOPNOTSUPP; > + > + tdvmcall_set_return_val(vcpu, val); > + trace_kvm_mmio(KVM_TRACE_MMIO_READ, size, gpa, &val); > + return 0; > +} > + > +static int tdx_emulate_mmio(struct kvm_vcpu *vcpu) > +{ > + struct kvm_memory_slot *slot; > + int size, write, r; > + unsigned long val; > + gpa_t gpa; > + > + KVM_BUG_ON(vcpu->mmio_needed, vcpu->kvm); > + > + size = tdvmcall_a0_read(vcpu); > + write = tdvmcall_a1_read(vcpu); > + gpa = tdvmcall_a2_read(vcpu); > + val = write ? tdvmcall_a3_read(vcpu) : 0; > + > + if (size != 1 && size != 2 && size != 4 && size != 8) > + goto error; > + if (write != 0 && write != 1) > + goto error; > + > + /* Strip the shared bit, allow MMIO with and without it set. */ Based on the discussion https://lore.kernel.org/all/ZcUO5sFEAIH68JIA@google.com/ Do we still allow the MMIO without shared bit? > + gpa = gpa & ~gfn_to_gpa(kvm_gfn_shared_mask(vcpu->kvm)); > + > + if (size > 8u || ((gpa + size - 1) ^ gpa) & PAGE_MASK) "size > 8u" can be removed, since based on the check of size above, it can't be greater than 8. > + goto error; > + > + slot = kvm_vcpu_gfn_to_memslot(vcpu, gpa_to_gfn(gpa)); > + if (slot && !(slot->flags & KVM_MEMSLOT_INVALID)) > + goto error; > + > + if (!kvm_io_bus_write(vcpu, KVM_FAST_MMIO_BUS, gpa, 0, NULL)) { Should this be checked for write first? I check the handle_ept_misconfig() in VMX, it doesn't check write first neither. Functionally, it should be OK since guest will not read the address range of fast mmio. So the read case will be filtered out by ioeventfd_write(). But it has take a long way to get to ioeventfd_write(). Isn't it more efficient to check write first? > + trace_kvm_fast_mmio(gpa); > + return 1; > + } > + > + if (write) > + r = tdx_mmio_write(vcpu, gpa, size, val); > + else > + r = tdx_mmio_read(vcpu, gpa, size); > + if (!r) { > + /* Kernel completed device emulation. */ > + tdvmcall_set_return_code(vcpu, TDVMCALL_SUCCESS); > + return 1; > + } > + > + /* Request the device emulation to userspace device model. */ > + vcpu->mmio_needed = 1; > + vcpu->mmio_is_write = write; > + vcpu->arch.complete_userspace_io = tdx_complete_mmio; > + > + vcpu->run->mmio.phys_addr = gpa; > + vcpu->run->mmio.len = size; > + vcpu->run->mmio.is_write = write; > + vcpu->run->exit_reason = KVM_EXIT_MMIO; > + > + if (write) { > + memcpy(vcpu->run->mmio.data, &val, size); > + } else { > + vcpu->mmio_fragments[0].gpa = gpa; > + vcpu->mmio_fragments[0].len = size; > + trace_kvm_mmio(KVM_TRACE_MMIO_READ_UNSATISFIED, size, gpa, NULL); > + } > + return 0; > + > +error: > + tdvmcall_set_return_code(vcpu, TDVMCALL_INVALID_OPERAND); > + return 1; > +} > + > static int handle_tdvmcall(struct kvm_vcpu *vcpu) > { > if (tdvmcall_exit_type(vcpu)) > @@ -1229,6 +1341,8 @@ static int handle_tdvmcall(struct kvm_vcpu *vcpu) > return tdx_emulate_hlt(vcpu); > case EXIT_REASON_IO_INSTRUCTION: > return tdx_emulate_io(vcpu); > + case EXIT_REASON_EPT_VIOLATION: > + return tdx_emulate_mmio(vcpu); > default: > break; > } > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > index 03950368d8db..d5b18cad9dcd 100644 > --- a/arch/x86/kvm/x86.c > +++ b/arch/x86/kvm/x86.c > @@ -13975,6 +13975,7 @@ EXPORT_SYMBOL_GPL(kvm_sev_es_string_io); > > EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_entry); > EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_exit); > +EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_mmio); > EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_fast_mmio); > EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_inj_virq); > EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_page_fault); > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c > index e27c22449d85..bc14e1f2610c 100644 > --- a/virt/kvm/kvm_main.c > +++ b/virt/kvm/kvm_main.c > @@ -2689,6 +2689,7 @@ struct kvm_memory_slot *kvm_vcpu_gfn_to_memslot(struct kvm_vcpu *vcpu, gfn_t gfn > > return NULL; > } > +EXPORT_SYMBOL_GPL(kvm_vcpu_gfn_to_memslot); > > bool kvm_is_visible_gfn(struct kvm *kvm, gfn_t gfn) > { > @@ -5992,6 +5993,7 @@ int kvm_io_bus_read(struct kvm_vcpu *vcpu, enum kvm_bus bus_idx, gpa_t addr, > r = __kvm_io_bus_read(vcpu, bus, &range, val); > return r < 0 ? r : 0; > } > +EXPORT_SYMBOL_GPL(kvm_io_bus_read); > > int kvm_io_bus_register_dev(struct kvm *kvm, enum kvm_bus bus_idx, gpa_t addr, > int len, struct kvm_io_device *dev)

[v19,110/130] KVM: TDX: Handle TDX PV MMIO hypercall

Commit Message

Comments

Patch