[v18,023/121] KVM: TDX: Make KVM_CAP_MAX_VCPUS backend specific

Message ID	ed33ebe29b231e8e657cd610a983fa603b10f530.1705965634.git.isaku.yamahata@intel.com (mailing list archive)
State	New, archived
Headers	Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.15]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4A6F259168; Mon, 22 Jan 2024 23:55:10 +0000 (UTC)
From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini <pbonzini@redhat.com>, erdemaktas@google.com, Sean Christopherson <seanjc@google.com>, Sagi Shahar <sagis@google.com>, Kai Huang <kai.huang@intel.com>, chen.bo@intel.com, hang.yuan@intel.com, tina.zhang@intel.com
Subject: [PATCH v18 023/121] KVM: TDX: Make KVM_CAP_MAX_VCPUS backend specific
Date: Mon, 22 Jan 2024 15:52:59 -0800
Message-Id: <ed33ebe29b231e8e657cd610a983fa603b10f530.1705965634.git.isaku.yamahata@intel.com>
In-Reply-To: <cover.1705965634.git.isaku.yamahata@intel.com>
References: <cover.1705965634.git.isaku.yamahata@intel.com>
Precedence: bulk
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Series	KVM TDX basic feature support

On 2/1/2024 2:16 PM, Yuan Yao wrote:
> On Wed, Jan 24, 2024 at 09:17:15AM +0800, Binbin Wu wrote:
>>
>> On 1/23/2024 7:52 AM, isaku.yamahata@intel.com wrote:
>>> From: Isaku Yamahata <isaku.yamahata@intel.com>
>>>
>>> TDX has its own limitation on the maximum number of vcpus that the guest
>>> can accommodate. Allow x86 kvm backend to implement its own KVM_ENABLE_CAP
>>> handler and implement TDX backend for KVM_CAP_MAX_VCPUS. user space VMM,
>>> e.g. qemu, can specify its value instead of KVM_MAX_VCPUS.
>> For legacy VM, KVM just provides the interface to query the max_vcpus.
>> Why TD needs to provide a interface for userspace to set the limitation?
>> What's the scenario?
> I think the reason is TDH.MNG.INIT needs it:
>
> TD_PARAMS:
>     MAX_VCPUS:
>         offset: 16 bytes.
>         type: Unsigned 16b Integer.
>         size: 2.
>         Description: Maximum number of VCPUs.

Thanks for the explanation.
I am also wondering whether this info could be passed via KVM_TDX_INIT_VM
instead. Because userspace is only allowed to set a value no greater than
min(KVM_MAX_VCPUS, TDX_MAX_VCPUS), providing the extra cap KVM_CAP_MAX_VCPUS
doesn't add any restriction compared to providing it in KVM_TDX_INIT_VM.

>
> May better to clarify this in the commit yet.
>
>>
>>> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
>>> ---
>>> v18:
>>> - use TDX instead of "x86, tdx" in subject
>>> - use min(max_vcpu, TDX_MAX_VCPU) instead of
>>>   min3(max_vcpu, KVM_MAX_VCPU, TDX_MAX_VCPU)
>>> - make "if (KVM_MAX_VCPU) and if (TDX_MAX_VCPU)" into one if statement
>>> ---
>>>  arch/x86/include/asm/kvm-x86-ops.h |  2 ++
>>>  arch/x86/include/asm/kvm_host.h    |  2 ++
>>>  arch/x86/kvm/vmx/main.c            | 22 ++++++++++++++++++++++
>>>  arch/x86/kvm/vmx/tdx.c             | 29 +++++++++++++++++++++++++++++
>>>  arch/x86/kvm/vmx/x86_ops.h         |  5 +++++
>>>  arch/x86/kvm/x86.c                 |  4 ++++
>>>  6 files changed, 64 insertions(+)
>>>
>>> diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
>>> index 943b21b8b106..2f976c0f3116 100644
>>> --- a/arch/x86/include/asm/kvm-x86-ops.h
>>> +++ b/arch/x86/include/asm/kvm-x86-ops.h
>>> @@ -21,6 +21,8 @@ KVM_X86_OP(hardware_unsetup)
>>>  KVM_X86_OP(has_emulated_msr)
>>>  KVM_X86_OP(vcpu_after_set_cpuid)
>>>  KVM_X86_OP(is_vm_type_supported)
>>> +KVM_X86_OP_OPTIONAL(max_vcpus);
>>> +KVM_X86_OP_OPTIONAL(vm_enable_cap)
>>>  KVM_X86_OP(vm_init)
>>>  KVM_X86_OP_OPTIONAL(vm_destroy)
>>>  KVM_X86_OP_OPTIONAL_RET0(vcpu_precreate)
>>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>>> index 26f4668b0273..db44a92e5659 100644
>>> --- a/arch/x86/include/asm/kvm_host.h
>>> +++ b/arch/x86/include/asm/kvm_host.h
>>> @@ -1602,7 +1602,9 @@ struct kvm_x86_ops {
>>>  	void (*vcpu_after_set_cpuid)(struct kvm_vcpu *vcpu);
>>>  	bool (*is_vm_type_supported)(unsigned long vm_type);
>>> +	int (*max_vcpus)(struct kvm *kvm);
>>>  	unsigned int vm_size;
>>> +	int (*vm_enable_cap)(struct kvm *kvm, struct kvm_enable_cap *cap);
>>>  	int (*vm_init)(struct kvm *kvm);
>>>  	void (*vm_destroy)(struct kvm *kvm);
>>> diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
>>> index 50da807d7aea..4611f305a450 100644
>>> --- a/arch/x86/kvm/vmx/main.c
>>> +++ b/arch/x86/kvm/vmx/main.c
>>> @@ -6,6 +6,7 @@
>>>  #include "nested.h"
>>>  #include "pmu.h"
>>>  #include "tdx.h"
>>> +#include "tdx_arch.h"
>>>  static bool enable_tdx __ro_after_init;
>>>  module_param_named(tdx, enable_tdx, bool, 0444);
>>> @@ -16,6 +17,17 @@ static bool vt_is_vm_type_supported(unsigned long type)
>>>  		(enable_tdx && tdx_is_vm_type_supported(type));
>>>  }
>>> +static int vt_max_vcpus(struct kvm *kvm)
>>> +{
>>> +	if (!kvm)
>>> +		return KVM_MAX_VCPUS;
>>> +
>>> +	if (is_td(kvm))
>>> +		return min(kvm->max_vcpus, TDX_MAX_VCPUS);
>>> +
>>> +	return kvm->max_vcpus;
>>> +}
>>> +
>>>  static int vt_hardware_enable(void)
>>>  {
>>>  	int ret;
>>> @@ -54,6 +66,14 @@ static void vt_hardware_unsetup(void)
>>>  	vmx_hardware_unsetup();
>>>  }
>>> +static int vt_vm_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap)
>>> +{
>>> +	if (is_td(kvm))
>>> +		return tdx_vm_enable_cap(kvm, cap);
>>> +
>>> +	return -EINVAL;
>>> +}
>>> +
>>>  static int vt_vm_init(struct kvm *kvm)
>>>  {
>>>  	if (is_td(kvm))
>>> @@ -91,7 +111,9 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
>>>  	.has_emulated_msr = vmx_has_emulated_msr,
>>>  	.is_vm_type_supported = vt_is_vm_type_supported,
>>> +	.max_vcpus = vt_max_vcpus,
>>>  	.vm_size = sizeof(struct kvm_vmx),
>>> +	.vm_enable_cap = vt_vm_enable_cap,
>>>  	.vm_init = vt_vm_init,
>>>  	.vm_destroy = vmx_vm_destroy,
>>> diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
>>> index 8c463407f8a8..876ad7895b88 100644
>>> --- a/arch/x86/kvm/vmx/tdx.c
>>> +++ b/arch/x86/kvm/vmx/tdx.c
>>> @@ -100,6 +100,35 @@ struct tdx_info {
>>>  /* Info about the TDX module. */
>>>  static struct tdx_info *tdx_info;
>>> +int tdx_vm_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap)
>>> +{
>>> +	int r;
>>> +
>>> +	switch (cap->cap) {
>>> +	case KVM_CAP_MAX_VCPUS: {
>>> +		if (cap->flags || cap->args[0] == 0)
>>> +			return -EINVAL;
>>> +		if (cap->args[0] > KVM_MAX_VCPUS ||
>>> +		    cap->args[0] > TDX_MAX_VCPUS)
>>> +			return -E2BIG;
>>> +
>>> +		mutex_lock(&kvm->lock);
>>> +		if (kvm->created_vcpus)
>>> +			r = -EBUSY;
>>> +		else {
>>> +			kvm->max_vcpus = cap->args[0];
>>> +			r = 0;
>>> +		}
>>> +		mutex_unlock(&kvm->lock);
>>> +		break;
>>> +	}
>>> +	default:
>>> +		r = -EINVAL;
>>> +		break;
>>> +	}
>>> +	return r;
>>> +}
>>> +
>>>  static int tdx_get_capabilities(struct kvm_tdx_cmd *cmd)
>>>  {
>>>  	struct kvm_tdx_capabilities __user *user_caps;
>>> diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h
>>> index 6e238142b1e8..3a3be66888da 100644
>>> --- a/arch/x86/kvm/vmx/x86_ops.h
>>> +++ b/arch/x86/kvm/vmx/x86_ops.h
>>> @@ -139,12 +139,17 @@ int __init tdx_hardware_setup(struct kvm_x86_ops *x86_ops);
>>>  void tdx_hardware_unsetup(void);
>>>  bool tdx_is_vm_type_supported(unsigned long type);
>>> +int tdx_vm_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap);
>>>  int tdx_vm_ioctl(struct kvm *kvm, void __user *argp);
>>>  #else
>>>  static inline int tdx_hardware_setup(struct kvm_x86_ops *x86_ops) { return -EOPNOTSUPP; }
>>>  static inline void tdx_hardware_unsetup(void) {}
>>>  static inline bool tdx_is_vm_type_supported(unsigned long type) { return false; }
>>> +static inline int tdx_vm_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap)
>>> +{
>>> +	return -EINVAL;
>>> +};
>>>  static inline int tdx_vm_ioctl(struct kvm *kvm, void __user *argp) { return -EOPNOTSUPP; }
>>>  #endif
>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>>> index dd3a23d56621..a1389ddb1b33 100644
>>> --- a/arch/x86/kvm/x86.c
>>> +++ b/arch/x86/kvm/x86.c
>>> @@ -4726,6 +4726,8 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>>>  		break;
>>>  	case KVM_CAP_MAX_VCPUS:
>>>  		r = KVM_MAX_VCPUS;
>>> +		if (kvm_x86_ops.max_vcpus)
>>> +			r = static_call(kvm_x86_max_vcpus)(kvm);
>>>  		break;
>>>  	case KVM_CAP_MAX_VCPU_ID:
>>>  		r = KVM_MAX_VCPU_IDS;
>>> @@ -6683,6 +6685,8 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
>>>  		break;
>>>  	default:
>>>  		r = -EINVAL;
>>> +		if (kvm_x86_ops.vm_enable_cap)
>>> +			r = static_call(kvm_x86_vm_enable_cap)(kvm, cap);
>>>  		break;
>>>  	}
>>>  	return r;
>>
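
To make the point about the bound easier to follow, here is a rough userspace model of the checks in the quoted tdx_vm_enable_cap() and vt_max_vcpus(). It is only a sketch, not kernel code: struct model_vm, model_set_max_vcpus(), model_query_max_vcpus() and both MODEL_* limits are hypothetical names and placeholder values invented for illustration, and the kvm->lock taken by the real handler is left out because the model is single-threaded.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder limits (assumptions); the real values live in kernel headers. */
#define MODEL_KVM_MAX_VCPUS 1024u
#define MODEL_TDX_MAX_VCPUS 65535u /* TD_PARAMS.MAX_VCPUS is a 16-bit field */

/* Hypothetical stand-in for the two struct kvm fields the patch touches. */
struct model_vm {
	unsigned int max_vcpus;
	unsigned int created_vcpus;
};

/* Mirrors the KVM_CAP_MAX_VCPUS branch of the quoted tdx_vm_enable_cap(). */
int model_set_max_vcpus(struct model_vm *vm, uint32_t flags, uint64_t requested)
{
	assert(vm != NULL);

	/* No flags are defined, and zero vcpus is not a valid limit. */
	if (flags || requested == 0)
		return -EINVAL;

	/* The request must fit under both the KVM and the TDX limit. */
	if (requested > MODEL_KVM_MAX_VCPUS || requested > MODEL_TDX_MAX_VCPUS)
		return -E2BIG;

	/* The limit can only change before the first vcpu is created. */
	if (vm->created_vcpus)
		return -EBUSY;

	vm->max_vcpus = (unsigned int)requested;
	return 0;
}

/* Mirrors the quoted vt_max_vcpus(): what a capability query would report. */
unsigned int model_query_max_vcpus(const struct model_vm *vm, int is_td)
{
	if (!vm)
		return MODEL_KVM_MAX_VCPUS;
	if (is_td && vm->max_vcpus > MODEL_TDX_MAX_VCPUS)
		return MODEL_TDX_MAX_VCPUS;
	return vm->max_vcpus;
}
```

In this model the only values that can ever be stored are in the range 1..min(MODEL_KVM_MAX_VCPUS, MODEL_TDX_MAX_VCPUS), which is the same bound a check inside KVM_TDX_INIT_VM would have to enforce.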
