Message ID | 20241201035358.2193078-2-binbin.wu@linux.intel.com (mailing list archive)
---|---
State | New, archived
Series | KVM: TDX: TDX hypercalls may exit to userspace
On Sun, Dec 01, 2024 at 11:53:50AM +0800, Binbin Wu wrote: >From: Isaku Yamahata <isaku.yamahata@intel.com> > >Introduce the wiring for handling TDX VM exits by implementing the >callbacks .get_exit_info(), and .handle_exit(). Additionally, add >error handling during the TDX VM exit flow, and add a place holder >to handle various exit reasons. Add helper functions to retrieve >exit information, exit qualifications, and more. > >Contention Handling: The TDH.VP.ENTER operation may contend with TDH.MEM.* >operations for secure EPT or TD EPOCH. If contention occurs, the return >value will have TDX_OPERAND_BUSY set with operand type, prompting the vCPU >to attempt re-entry into the guest via the fast path. > >Error Handling: The following scenarios will return to userspace with >KVM_EXIT_INTERNAL_ERROR. >- TDX_SW_ERROR: This includes #UD caused by SEAMCALL instruction if the > CPU isn't in VMX operation, #GP caused by SEAMCALL instruction when TDX > isn't enabled by the BIOS, and TDX_SEAMCALL_VMFAILINVALID when SEAM > firmware is not loaded or disabled. >- TDX_ERROR: This indicates some check failed in the TDX module, preventing > the vCPU from running. >- TDX_NON_RECOVERABLE: Set by the TDX module when the error is > non-recoverable, indicating that the TDX guest is dead or the vCPU is > disabled. This also covers failed_vmentry case, which must have > TDX_NON_RECOVERABLE set since off-TD debug feature has not been enabled. > An exception is the triple fault, which also sets TDX_NON_RECOVERABLE > but exits to userspace with KVM_EXIT_SHUTDOWN, aligning with the VMX > case. >- Any unhandled VM exit reason will also return to userspace with > KVM_EXIT_INTERNAL_ERROR. > >Suggested-by: Sean Christopherson <seanjc@google.com> >Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com> >Co-developed-by: Binbin Wu <binbin.wu@linux.intel.com> >Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Chao Gao <chao.gao@intel.com> [..] > fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu, bool force_immediate_exit) > { > struct vcpu_tdx *tdx = to_tdx(vcpu); >@@ -837,9 +900,26 @@ fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu, bool force_immediate_exit) > tdx->prep_switch_state = TDX_PREP_SW_STATE_UNRESTORED; > > vcpu->arch.regs_avail &= ~VMX_REGS_LAZY_LOAD_SET; >+ >+ if (unlikely((tdx->vp_enter_ret & TDX_SW_ERROR) == TDX_SW_ERROR)) >+ return EXIT_FASTPATH_NONE; >+ >+ if (unlikely(tdx_check_exit_reason(vcpu, EXIT_REASON_MCE_DURING_VMENTRY))) >+ kvm_machine_check(); I was wondering if EXIT_REASON_MCE_DURING_VMENTRY should be handled in the switch-case in tdx_handle_exit(), because I saw there is a dedicated handler for VMX. But it looks like EXIT_REASON_MCE_DURING_VMENTRY is a kind of VM-entry failure, so it won't reach that switch-case. And VMX's handler for EXIT_REASON_MCE_DURING_VMENTRY is actually dead code and can be removed.
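For reference on the dead-code point: the failed_vmentry check in VMX's handle_exit path returns to userspace before the per-reason handler dispatch, so a handler keyed on EXIT_REASON_MCE_DURING_VMENTRY (which only ever occurs as a failed VM entry) can never run. A simplified sketch of that control flow, paraphrased rather than quoted from vmx.c:

	if (exit_reason.failed_vmentry) {
		dump_vmcs(vcpu);
		vcpu->run->exit_reason = KVM_EXIT_FAIL_ENTRY;
		vcpu->run->fail_entry.hardware_entry_failure_reason =
			exit_reason.full;
		vcpu->run->fail_entry.cpu = vcpu->arch.last_vmentry_cpu;
		return 0;
	}

	/* Only reached for completed VM exits, so a dedicated
	 * EXIT_REASON_MCE_DURING_VMENTRY entry in the handler table
	 * is unreachable from here. */
	return kvm_vmx_exit_handlers[exit_reason.basic](vcpu);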
>+ > trace_kvm_exit(vcpu, KVM_ISA_VMX); > >- return EXIT_FASTPATH_NONE; >+ if (unlikely(tdx_has_exit_reason(vcpu) && tdexit_exit_reason(vcpu).failed_vmentry)) >+ return EXIT_FASTPATH_NONE; >+ >+ return tdx_exit_handlers_fastpath(vcpu); >+} >+ >+static int tdx_handle_triple_fault(struct kvm_vcpu *vcpu) >+{ >+ vcpu->run->exit_reason = KVM_EXIT_SHUTDOWN; >+ vcpu->mmio_needed = 0; >+ return 0; > } > > void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int pgd_level) >@@ -1135,6 +1215,88 @@ int tdx_sept_remove_private_spte(struct kvm *kvm, gfn_t gfn, > return tdx_sept_drop_private_spte(kvm, gfn, level, pfn); > } > >+int tdx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t fastpath) >+{ >+ struct vcpu_tdx *tdx = to_tdx(vcpu); >+ u64 vp_enter_ret = tdx->vp_enter_ret; >+ union vmx_exit_reason exit_reason; >+ >+ if (fastpath != EXIT_FASTPATH_NONE) >+ return 1; >+ >+ /* >+ * Handle TDX SW errors, including TDX_SEAMCALL_UD, TDX_SEAMCALL_GP and >+ * TDX_SEAMCALL_VMFAILINVALID. >+ */ >+ if (unlikely((vp_enter_ret & TDX_SW_ERROR) == TDX_SW_ERROR)) { >+ KVM_BUG_ON(!kvm_rebooting, vcpu->kvm); >+ goto unhandled_exit; >+ } >+ >+ /* >+ * Without off-TD debug enabled, failed_vmentry case must have >+ * TDX_NON_RECOVERABLE set. >+ */ >+ if (unlikely(vp_enter_ret & (TDX_ERROR | TDX_NON_RECOVERABLE))) { >+ /* Triple fault is non-recoverable. */ >+ if (unlikely(tdx_check_exit_reason(vcpu, EXIT_REASON_TRIPLE_FAULT))) >+ return tdx_handle_triple_fault(vcpu); >+ >+ kvm_pr_unimpl("TD vp_enter_ret 0x%llx, hkid 0x%x hkid pa 0x%llx\n", >+ vp_enter_ret, to_kvm_tdx(vcpu->kvm)->hkid, >+ set_hkid_to_hpa(0, to_kvm_tdx(vcpu->kvm)->hkid)); >+ goto unhandled_exit; >+ } >+ >+ /* From now, the seamcall status should be TDX_SUCCESS. */ >+ WARN_ON_ONCE((vp_enter_ret & TDX_SEAMCALL_STATUS_MASK) != TDX_SUCCESS); >+ exit_reason = tdexit_exit_reason(vcpu); >+ >+ switch (exit_reason.basic) { >+ default: >+ break; >+ } >+ >+unhandled_exit: >+ vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR; >+ vcpu->run->internal.suberror = KVM_INTERNAL_ERROR_UNEXPECTED_EXIT_REASON; >+ vcpu->run->internal.ndata = 2; >+ vcpu->run->internal.data[0] = vp_enter_ret; >+ vcpu->run->internal.data[1] = vcpu->arch.last_vmentry_cpu; >+ return 0; >+} >+ >+void tdx_get_exit_info(struct kvm_vcpu *vcpu, u32 *reason, >+ u64 *info1, u64 *info2, u32 *intr_info, u32 *error_code) >+{ >+ struct vcpu_tdx *tdx = to_tdx(vcpu); >+ >+ if (tdx_has_exit_reason(vcpu)) { >+ /* >+ * Encode some useful info from the the 64 bit return code >+ * into the 32 bit exit 'reason'. If the VMX exit reason is >+ * valid, just set it to those bits. >+ */ >+ *reason = (u32)tdx->vp_enter_ret; >+ *info1 = tdexit_exit_qual(vcpu); >+ *info2 = tdexit_ext_exit_qual(vcpu); >+ } else { >+ /* >+ * When the VMX exit reason in vp_enter_ret is not valid, >+ * overload the VMX_EXIT_REASONS_FAILED_VMENTRY bit (31) to >+ * mean the vmexit code is not valid. Set the other bits to >+ * try to avoid picking a value that may someday be a valid >+ * VMX exit code. >+ */ >+ *reason = 0xFFFFFFFF; >+ *info1 = 0; >+ *info2 = 0; >+ } >+ >+ *intr_info = tdexit_intr_info(vcpu); If there is no valid exit reason, shouldn't intr_info be set to 0? >+ *error_code = 0; >+} >+
On 12/9/2024 7:21 PM, Chao Gao wrote: > On Sun, Dec 01, 2024 at 11:53:50AM +0800, Binbin Wu wrote: >> From: Isaku Yamahata <isaku.yamahata@intel.com> >> >> Introduce the wiring for handling TDX VM exits by implementing the >> callbacks .get_exit_info(), and .handle_exit(). Additionally, add >> error handling during the TDX VM exit flow, and add a place holder >> to handle various exit reasons. Add helper functions to retrieve >> exit information, exit qualifications, and more. >> >> Contention Handling: The TDH.VP.ENTER operation may contend with TDH.MEM.* >> operations for secure EPT or TD EPOCH. If contention occurs, the return >> value will have TDX_OPERAND_BUSY set with operand type, prompting the vCPU >> to attempt re-entry into the guest via the fast path. >> >> Error Handling: The following scenarios will return to userspace with >> KVM_EXIT_INTERNAL_ERROR. >> - TDX_SW_ERROR: This includes #UD caused by SEAMCALL instruction if the >> CPU isn't in VMX operation, #GP caused by SEAMCALL instruction when TDX >> isn't enabled by the BIOS, and TDX_SEAMCALL_VMFAILINVALID when SEAM >> firmware is not loaded or disabled. >> - TDX_ERROR: This indicates some check failed in the TDX module, preventing >> the vCPU from running. >> - TDX_NON_RECOVERABLE: Set by the TDX module when the error is >> non-recoverable, indicating that the TDX guest is dead or the vCPU is >> disabled. This also covers failed_vmentry case, which must have >> TDX_NON_RECOVERABLE set since off-TD debug feature has not been enabled. >> An exception is the triple fault, which also sets TDX_NON_RECOVERABLE >> but exits to userspace with KVM_EXIT_SHUTDOWN, aligning with the VMX >> case. >> - Any unhandled VM exit reason will also return to userspace with >> KVM_EXIT_INTERNAL_ERROR. >> >> Suggested-by: Sean Christopherson <seanjc@google.com> >> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com> >> Co-developed-by: Binbin Wu <binbin.wu@linux.intel.com> >> Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com> > Reviewed-by: Chao Gao <chao.gao@intel.com> > > [..] > >> fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu, bool force_immediate_exit) >> { >> struct vcpu_tdx *tdx = to_tdx(vcpu); >> @@ -837,9 +900,26 @@ fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu, bool force_immediate_exit) >> tdx->prep_switch_state = TDX_PREP_SW_STATE_UNRESTORED; >> >> vcpu->arch.regs_avail &= ~VMX_REGS_LAZY_LOAD_SET; >> + >> + if (unlikely((tdx->vp_enter_ret & TDX_SW_ERROR) == TDX_SW_ERROR)) >> + return EXIT_FASTPATH_NONE; >> + >> + if (unlikely(tdx_check_exit_reason(vcpu, EXIT_REASON_MCE_DURING_VMENTRY))) >> + kvm_machine_check(); > I was wondering if EXIT_REASON_MCE_DURING_VMENTRY should be handled in the > switch-case in tdx_handle_exit(), because I saw there is a dedicated handler > for VMX. But it looks like EXIT_REASON_MCE_DURING_VMENTRY is a kind of VM-entry > failure, so it won't reach that switch-case. Yes, similar to VMX: for TDX, EXIT_REASON_MCE_DURING_VMENTRY will be captured by failed_vmentry, which will have TDX_NON_RECOVERABLE set and will be handled before the switch-case. > And VMX's handler for > EXIT_REASON_MCE_DURING_VMENTRY is actually dead code and can be removed. I think so. For VMX, the exit_reason.failed_vmentry case returns to userspace, so it never reaches the handler for EXIT_REASON_MCE_DURING_VMENTRY.
> >> + >> trace_kvm_exit(vcpu, KVM_ISA_VMX); >> >> - return EXIT_FASTPATH_NONE; >> + if (unlikely(tdx_has_exit_reason(vcpu) && tdexit_exit_reason(vcpu).failed_vmentry)) >> + return EXIT_FASTPATH_NONE; >> + >> + return tdx_exit_handlers_fastpath(vcpu); >> +} >> + >> +static int tdx_handle_triple_fault(struct kvm_vcpu *vcpu) >> +{ >> + vcpu->run->exit_reason = KVM_EXIT_SHUTDOWN; >> + vcpu->mmio_needed = 0; >> + return 0; >> } >> >> void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int pgd_level) >> @@ -1135,6 +1215,88 @@ int tdx_sept_remove_private_spte(struct kvm *kvm, gfn_t gfn, >> return tdx_sept_drop_private_spte(kvm, gfn, level, pfn); >> } >> >> +int tdx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t fastpath) >> +{ >> + struct vcpu_tdx *tdx = to_tdx(vcpu); >> + u64 vp_enter_ret = tdx->vp_enter_ret; >> + union vmx_exit_reason exit_reason; >> + >> + if (fastpath != EXIT_FASTPATH_NONE) >> + return 1; >> + >> + /* >> + * Handle TDX SW errors, including TDX_SEAMCALL_UD, TDX_SEAMCALL_GP and >> + * TDX_SEAMCALL_VMFAILINVALID. >> + */ >> + if (unlikely((vp_enter_ret & TDX_SW_ERROR) == TDX_SW_ERROR)) { >> + KVM_BUG_ON(!kvm_rebooting, vcpu->kvm); >> + goto unhandled_exit; >> + } >> + >> + /* >> + * Without off-TD debug enabled, failed_vmentry case must have >> + * TDX_NON_RECOVERABLE set. >> + */ >> + if (unlikely(vp_enter_ret & (TDX_ERROR | TDX_NON_RECOVERABLE))) { >> + /* Triple fault is non-recoverable. */ >> + if (unlikely(tdx_check_exit_reason(vcpu, EXIT_REASON_TRIPLE_FAULT))) >> + return tdx_handle_triple_fault(vcpu); >> + >> + kvm_pr_unimpl("TD vp_enter_ret 0x%llx, hkid 0x%x hkid pa 0x%llx\n", >> + vp_enter_ret, to_kvm_tdx(vcpu->kvm)->hkid, >> + set_hkid_to_hpa(0, to_kvm_tdx(vcpu->kvm)->hkid)); >> + goto unhandled_exit; >> + } >> + >> + /* From now, the seamcall status should be TDX_SUCCESS. */ >> + WARN_ON_ONCE((vp_enter_ret & TDX_SEAMCALL_STATUS_MASK) != TDX_SUCCESS); >> + exit_reason = tdexit_exit_reason(vcpu); >> + >> + switch (exit_reason.basic) { >> + default: >> + break; >> + } >> + >> +unhandled_exit: >> + vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR; >> + vcpu->run->internal.suberror = KVM_INTERNAL_ERROR_UNEXPECTED_EXIT_REASON; >> + vcpu->run->internal.ndata = 2; >> + vcpu->run->internal.data[0] = vp_enter_ret; >> + vcpu->run->internal.data[1] = vcpu->arch.last_vmentry_cpu; >> + return 0; >> +} >> + >> +void tdx_get_exit_info(struct kvm_vcpu *vcpu, u32 *reason, >> + u64 *info1, u64 *info2, u32 *intr_info, u32 *error_code) >> +{ >> + struct vcpu_tdx *tdx = to_tdx(vcpu); >> + >> + if (tdx_has_exit_reason(vcpu)) { >> + /* >> + * Encode some useful info from the the 64 bit return code >> + * into the 32 bit exit 'reason'. If the VMX exit reason is >> + * valid, just set it to those bits. >> + */ >> + *reason = (u32)tdx->vp_enter_ret; >> + *info1 = tdexit_exit_qual(vcpu); >> + *info2 = tdexit_ext_exit_qual(vcpu); >> + } else { >> + /* >> + * When the VMX exit reason in vp_enter_ret is not valid, >> + * overload the VMX_EXIT_REASONS_FAILED_VMENTRY bit (31) to >> + * mean the vmexit code is not valid. Set the other bits to >> + * try to avoid picking a value that may someday be a valid >> + * VMX exit code. >> + */ >> + *reason = 0xFFFFFFFF; >> + *info1 = 0; >> + *info2 = 0; >> + } >> + >> + *intr_info = tdexit_intr_info(vcpu); > If there is no valid exit reason, shouldn't intr_info be set to 0? Yes. Will fix it. Thanks! > >> + *error_code = 0; >> +} >> +
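A minimal sketch of the agreed fix, moving the intr_info read under the valid-exit-reason branch so userspace never sees a stale value (illustrative only, not the code as reposted):

	if (tdx_has_exit_reason(vcpu)) {
		*reason = (u32)tdx->vp_enter_ret;
		*info1 = tdexit_exit_qual(vcpu);
		*info2 = tdexit_ext_exit_qual(vcpu);
		*intr_info = tdexit_intr_info(vcpu);
	} else {
		*reason = 0xFFFFFFFF;
		*info1 = 0;
		*info2 = 0;
		*intr_info = 0;
	}
	*error_code = 0;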
On 12/1/2024 11:53 AM, Binbin Wu wrote: > From: Isaku Yamahata <isaku.yamahata@intel.com> > > Introduce the wiring for handling TDX VM exits by implementing the > callbacks .get_exit_info(), and .handle_exit(). Additionally, add > error handling during the TDX VM exit flow, and add a place holder > to handle various exit reasons. Add helper functions to retrieve > exit information, exit qualifications, and more. > > Contention Handling: The TDH.VP.ENTER operation may contend with TDH.MEM.* > operations for secure EPT or TD EPOCH. If contention occurs, the return > value will have TDX_OPERAND_BUSY set with operand type, prompting the vCPU > to attempt re-entry into the guest via the fast path. > > Error Handling: The following scenarios will return to userspace with > KVM_EXIT_INTERNAL_ERROR. > - TDX_SW_ERROR: This includes #UD caused by SEAMCALL instruction if the > CPU isn't in VMX operation, #GP caused by SEAMCALL instruction when TDX > isn't enabled by the BIOS, and TDX_SEAMCALL_VMFAILINVALID when SEAM > firmware is not loaded or disabled. > - TDX_ERROR: This indicates some check failed in the TDX module, preventing > the vCPU from running. > - TDX_NON_RECOVERABLE: Set by the TDX module when the error is > non-recoverable, indicating that the TDX guest is dead or the vCPU is > disabled. This also covers failed_vmentry case, which must have > TDX_NON_RECOVERABLE set since off-TD debug feature has not been enabled. > An exception is the triple fault, which also sets TDX_NON_RECOVERABLE > but exits to userspace with KVM_EXIT_SHUTDOWN, aligning with the VMX > case. > - Any unhandled VM exit reason will also return to userspace with > KVM_EXIT_INTERNAL_ERROR. > > Suggested-by: Sean Christopherson <seanjc@google.com> > Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com> > Co-developed-by: Binbin Wu <binbin.wu@linux.intel.com> > Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com> > --- > Hypercalls exit to userspace breakout: > - Dropped Paolo's Reviewed-by since the change is not subtle. > - Mention addition of .get_exit_info() handler in changelog. (Binbin) > - tdh_sept_seamcall() -> tdx_seamcall_sept() in comments. (Binbin) > - Do not open code TDX_ERROR_SEPT_BUSY. (Binbin) > - "TDH.VP.ENTRY" -> "TDH.VP.ENTER". (Binbin) > - Remove the use of union tdx_exit_reason. (Sean) > https://lore.kernel.org/kvm/ZfSExlemFMKjBtZb@google.com/ > - Add tdx_check_exit_reason() to check a VMX exit reason against the > status code of TDH.VP.ENTER. > - Move the handling of TDX_ERROR_SEPT_BUSY and (TDX_OPERAND_BUSY | > TDX_OPERAND_ID_TD_EPOCH) into fast path, and add a helper function > tdx_exit_handlers_fastpath(). > - Remove the warning on TDX_SW_ERROR in fastpath, but return without > further handling. > - Call kvm_machine_check() for EXIT_REASON_MCE_DURING_VMENTRY, align > with VMX case. > - On failed_vmentry in fast path, return without further handling. > - Exit to userspace for #UD and #GP. > - Fix whitespace in tdx_get_exit_info() > - Add a comment in tdx_handle_exit() to describe failed_vmentry case > is handled by TDX_NON_RECOVERABLE handling. > - Move the code of handling NMI, exception and external interrupts out > of the patch, i.e., the NMI handling in tdx_vcpu_enter_exit() and the > wiring of .handle_exit_irqoff() are removed. > - Drop the check for VCPU_TD_STATE_INITIALIZED in tdx_handle_exit() > because it has been checked in tdx_vcpu_pre_run(). > - Update changelog. 
> --- > arch/x86/include/asm/tdx.h | 1 + > arch/x86/kvm/vmx/main.c | 25 +++++- > arch/x86/kvm/vmx/tdx.c | 164 ++++++++++++++++++++++++++++++++++- > arch/x86/kvm/vmx/tdx_errno.h | 3 + > arch/x86/kvm/vmx/x86_ops.h | 8 ++ > 5 files changed, 198 insertions(+), 3 deletions(-) > > diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h > index 77477b905dca..01409a59224d 100644 > --- a/arch/x86/include/asm/tdx.h > +++ b/arch/x86/include/asm/tdx.h > @@ -18,6 +18,7 @@ > * TDX module. > */ > #define TDX_ERROR _BITUL(63) > +#define TDX_NON_RECOVERABLE _BITUL(62) > #define TDX_SW_ERROR (TDX_ERROR | GENMASK_ULL(47, 40)) > #define TDX_SEAMCALL_VMFAILINVALID (TDX_SW_ERROR | _UL(0xFFFF0000)) > > diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c > index f8acb1dc7c10..4f6faeb6e8e5 100644 > --- a/arch/x86/kvm/vmx/main.c > +++ b/arch/x86/kvm/vmx/main.c > @@ -165,6 +165,15 @@ static fastpath_t vt_vcpu_run(struct kvm_vcpu *vcpu, bool force_immediate_exit) > return vmx_vcpu_run(vcpu, force_immediate_exit); > } > > +static int vt_handle_exit(struct kvm_vcpu *vcpu, > + enum exit_fastpath_completion fastpath) > +{ > + if (is_td_vcpu(vcpu)) > + return tdx_handle_exit(vcpu, fastpath); > + > + return vmx_handle_exit(vcpu, fastpath); > +} > + > static void vt_flush_tlb_all(struct kvm_vcpu *vcpu) > { > if (is_td_vcpu(vcpu)) { > @@ -212,6 +221,18 @@ static void vt_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, > vmx_load_mmu_pgd(vcpu, root_hpa, pgd_level); > } > > +static void vt_get_exit_info(struct kvm_vcpu *vcpu, u32 *reason, > + u64 *info1, u64 *info2, u32 *intr_info, u32 *error_code) > +{ > + if (is_td_vcpu(vcpu)) { > + tdx_get_exit_info(vcpu, reason, info1, info2, intr_info, > + error_code); > + return; > + } > + > + vmx_get_exit_info(vcpu, reason, info1, info2, intr_info, error_code); > +} > + > static int vt_mem_enc_ioctl(struct kvm *kvm, void __user *argp) > { > if (!is_td(kvm)) > @@ -305,7 +326,7 @@ struct kvm_x86_ops vt_x86_ops __initdata = { > > .vcpu_pre_run = vt_vcpu_pre_run, > .vcpu_run = vt_vcpu_run, > - .handle_exit = vmx_handle_exit, > + .handle_exit = vt_handle_exit, > .skip_emulated_instruction = vmx_skip_emulated_instruction, > .update_emulated_instruction = vmx_update_emulated_instruction, > .set_interrupt_shadow = vmx_set_interrupt_shadow, > @@ -340,7 +361,7 @@ struct kvm_x86_ops vt_x86_ops __initdata = { > .set_identity_map_addr = vmx_set_identity_map_addr, > .get_mt_mask = vmx_get_mt_mask, > > - .get_exit_info = vmx_get_exit_info, > + .get_exit_info = vt_get_exit_info, > > .vcpu_after_set_cpuid = vmx_vcpu_after_set_cpuid, > > diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c > index f975bb323f60..3dcbdb5a7bf8 100644 > --- a/arch/x86/kvm/vmx/tdx.c > +++ b/arch/x86/kvm/vmx/tdx.c > @@ -186,6 +186,54 @@ static __always_inline hpa_t set_hkid_to_hpa(hpa_t pa, u16 hkid) > return pa | ((hpa_t)hkid << boot_cpu_data.x86_phys_bits); > } > > +static __always_inline union vmx_exit_reason tdexit_exit_reason(struct kvm_vcpu *vcpu) > +{ > + return (union vmx_exit_reason)(u32)(to_tdx(vcpu)->vp_enter_ret); > +} > + > +/* > + * There is no simple way to check some bit(s) to decide whether the return > + * value of TDH.VP.ENTER has a VMX exit reason or not. E.g., > + * TDX_NON_RECOVERABLE_TD_WRONG_APIC_MODE has exit reason but with error bit > + * (bit 63) set, TDX_NON_RECOVERABLE_TD_CORRUPTED_MD has no exit reason but with > + * error bit cleared. 
> + */ > +static __always_inline bool tdx_has_exit_reason(struct kvm_vcpu *vcpu) > +{ > + u64 status = to_tdx(vcpu)->vp_enter_ret & TDX_SEAMCALL_STATUS_MASK; > + > + return status == TDX_SUCCESS || status == TDX_NON_RECOVERABLE_VCPU || > + status == TDX_NON_RECOVERABLE_TD || > + status == TDX_NON_RECOVERABLE_TD_NON_ACCESSIBLE || > + status == TDX_NON_RECOVERABLE_TD_WRONG_APIC_MODE; > +} > + > +static __always_inline bool tdx_check_exit_reason(struct kvm_vcpu *vcpu, u16 reason) > +{ > + return tdx_has_exit_reason(vcpu) && > + (u16)tdexit_exit_reason(vcpu).basic == reason; > +} > + > +static __always_inline unsigned long tdexit_exit_qual(struct kvm_vcpu *vcpu) > +{ > + return kvm_rcx_read(vcpu); > +} > + > +static __always_inline unsigned long tdexit_ext_exit_qual(struct kvm_vcpu *vcpu) > +{ > + return kvm_rdx_read(vcpu); > +} > + > +static __always_inline unsigned long tdexit_gpa(struct kvm_vcpu *vcpu) > +{ > + return kvm_r8_read(vcpu); > +} > + > +static __always_inline unsigned long tdexit_intr_info(struct kvm_vcpu *vcpu) > +{ > + return kvm_r9_read(vcpu); > +} > + > static inline void tdx_hkid_free(struct kvm_tdx *kvm_tdx) > { > tdx_guest_keyid_free(kvm_tdx->hkid); > @@ -824,6 +872,21 @@ static noinstr void tdx_vcpu_enter_exit(struct kvm_vcpu *vcpu) > guest_state_exit_irqoff(); > } > > +static fastpath_t tdx_exit_handlers_fastpath(struct kvm_vcpu *vcpu) > +{ > + u64 vp_enter_ret = to_tdx(vcpu)->vp_enter_ret; > + > + /* See the comment of tdx_seamcall_sept(). */ > + if (unlikely(vp_enter_ret == TDX_ERROR_SEPT_BUSY)) > + return EXIT_FASTPATH_REENTER_GUEST; > + > + /* TDH.VP.ENTER checks TD EPOCH which can contend with TDH.MEM.TRACK. */ > + if (unlikely(vp_enter_ret == (TDX_OPERAND_BUSY | TDX_OPERAND_ID_TD_EPOCH))) > + return EXIT_FASTPATH_REENTER_GUEST; > + > + return EXIT_FASTPATH_NONE; > +} > + > fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu, bool force_immediate_exit) > { > struct vcpu_tdx *tdx = to_tdx(vcpu); > @@ -837,9 +900,26 @@ fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu, bool force_immediate_exit) > tdx->prep_switch_state = TDX_PREP_SW_STATE_UNRESTORED; > > vcpu->arch.regs_avail &= ~VMX_REGS_LAZY_LOAD_SET; > + > + if (unlikely((tdx->vp_enter_ret & TDX_SW_ERROR) == TDX_SW_ERROR)) > + return EXIT_FASTPATH_NONE; > + > + if (unlikely(tdx_check_exit_reason(vcpu, EXIT_REASON_MCE_DURING_VMENTRY))) > + kvm_machine_check(); > + > trace_kvm_exit(vcpu, KVM_ISA_VMX); > > - return EXIT_FASTPATH_NONE; > + if (unlikely(tdx_has_exit_reason(vcpu) && tdexit_exit_reason(vcpu).failed_vmentry)) > + return EXIT_FASTPATH_NONE; > + > + return tdx_exit_handlers_fastpath(vcpu); > +} > + > +static int tdx_handle_triple_fault(struct kvm_vcpu *vcpu) > +{ > + vcpu->run->exit_reason = KVM_EXIT_SHUTDOWN; > + vcpu->mmio_needed = 0; > + return 0; This function is just the same as handle_triple_fault() in vmx.c, why not use it instead? > } > > void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int pgd_level) > @@ -1135,6 +1215,88 @@ int tdx_sept_remove_private_spte(struct kvm *kvm, gfn_t gfn, > return tdx_sept_drop_private_spte(kvm, gfn, level, pfn); > } > > +int tdx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t fastpath) > +{ > + struct vcpu_tdx *tdx = to_tdx(vcpu); > + u64 vp_enter_ret = tdx->vp_enter_ret; > + union vmx_exit_reason exit_reason; > + > + if (fastpath != EXIT_FASTPATH_NONE) > + return 1; > + > + /* > + * Handle TDX SW errors, including TDX_SEAMCALL_UD, TDX_SEAMCALL_GP and > + * TDX_SEAMCALL_VMFAILINVALID. 
> + */ > + if (unlikely((vp_enter_ret & TDX_SW_ERROR) == TDX_SW_ERROR)) { > + KVM_BUG_ON(!kvm_rebooting, vcpu->kvm); > + goto unhandled_exit; > + } > + > + /* > + * Without off-TD debug enabled, failed_vmentry case must have > + * TDX_NON_RECOVERABLE set. > + */ This comment is confusing. I'm not sure why it is put here; the code below does nothing with exit_reason.failed_vmentry. > + if (unlikely(vp_enter_ret & (TDX_ERROR | TDX_NON_RECOVERABLE))) { > + /* Triple fault is non-recoverable. */ > + if (unlikely(tdx_check_exit_reason(vcpu, EXIT_REASON_TRIPLE_FAULT))) > + return tdx_handle_triple_fault(vcpu); > + > + kvm_pr_unimpl("TD vp_enter_ret 0x%llx, hkid 0x%x hkid pa 0x%llx\n", > + vp_enter_ret, to_kvm_tdx(vcpu->kvm)->hkid, > + set_hkid_to_hpa(0, to_kvm_tdx(vcpu->kvm)->hkid)); The need for "hkid" and "hkid pa" indeed requires clarification. Especially the "hkid pa", which is the result of applying the HKID of the current TD to physical address 0. I cannot think of any reason why we need such info. > + goto unhandled_exit; > + } > + > + /* From now, the seamcall status should be TDX_SUCCESS. */ > + WARN_ON_ONCE((vp_enter_ret & TDX_SEAMCALL_STATUS_MASK) != TDX_SUCCESS); Is there any case where TDX_SUCCESS comes with additional non-zero information in the lower 32 bits? I thought TDX_SUCCESS was a whole 64-bit status code. > + exit_reason = tdexit_exit_reason(vcpu); > + > + switch (exit_reason.basic) { > + default: > + break; > + } > + > +unhandled_exit: > + vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR; > + vcpu->run->internal.suberror = KVM_INTERNAL_ERROR_UNEXPECTED_EXIT_REASON; > + vcpu->run->internal.ndata = 2; > + vcpu->run->internal.data[0] = vp_enter_ret; > + vcpu->run->internal.data[1] = vcpu->arch.last_vmentry_cpu; > + return 0; > +} > + > +void tdx_get_exit_info(struct kvm_vcpu *vcpu, u32 *reason, > + u64 *info1, u64 *info2, u32 *intr_info, u32 *error_code) > +{ > + struct vcpu_tdx *tdx = to_tdx(vcpu); > + > + if (tdx_has_exit_reason(vcpu)) { > + /* > + * Encode some useful info from the the 64 bit return code > + * into the 32 bit exit 'reason'. If the VMX exit reason is > + * valid, just set it to those bits. > + */ > + *reason = (u32)tdx->vp_enter_ret; > + *info1 = tdexit_exit_qual(vcpu); > + *info2 = tdexit_ext_exit_qual(vcpu); > + } else { > + /* > + * When the VMX exit reason in vp_enter_ret is not valid, > + * overload the VMX_EXIT_REASONS_FAILED_VMENTRY bit (31) to > + * mean the vmexit code is not valid. Set the other bits to > + * try to avoid picking a value that may someday be a valid > + * VMX exit code. > + */ > + *reason = 0xFFFFFFFF; > + *info1 = 0; > + *info2 = 0; > + } > + > + *intr_info = tdexit_intr_info(vcpu); > + *error_code = 0; > +} > + > static int tdx_get_capabilities(struct kvm_tdx_cmd *cmd) > { > const struct tdx_sys_info_td_conf *td_conf = &tdx_sysinfo->td_conf; > diff --git a/arch/x86/kvm/vmx/tdx_errno.h b/arch/x86/kvm/vmx/tdx_errno.h > index f9dbb3a065cc..6ff4672c4181 100644 > --- a/arch/x86/kvm/vmx/tdx_errno.h > +++ b/arch/x86/kvm/vmx/tdx_errno.h > @@ -10,6 +10,9 @@ > * TDX SEAMCALL Status Codes (returned in RAX) > */ > #define TDX_NON_RECOVERABLE_VCPU 0x4000000100000000ULL > +#define TDX_NON_RECOVERABLE_TD 0x4000000200000000ULL > +#define TDX_NON_RECOVERABLE_TD_NON_ACCESSIBLE 0x6000000500000000ULL > +#define TDX_NON_RECOVERABLE_TD_WRONG_APIC_MODE 0x6000000700000000ULL Not the fault of this patch. 
There are other status codes defined in arch/x86/include/asm/tdx.h /* * TDX module SEAMCALL leaf function error codes */ #define TDX_SUCCESS 0ULL #define TDX_RND_NO_ENTROPY 0x8000020300000000ULL It's better to put them in one single place. > #define TDX_INTERRUPTED_RESUMABLE 0x8000000300000000ULL > #define TDX_OPERAND_INVALID 0xC000010000000000ULL > #define TDX_OPERAND_BUSY 0x8000020000000000ULL > diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h > index 02b33390e1bf..1c18943e0e1d 100644 > --- a/arch/x86/kvm/vmx/x86_ops.h > +++ b/arch/x86/kvm/vmx/x86_ops.h > @@ -133,6 +133,10 @@ int tdx_vcpu_pre_run(struct kvm_vcpu *vcpu); > fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu, bool force_immediate_exit); > void tdx_prepare_switch_to_guest(struct kvm_vcpu *vcpu); > void tdx_vcpu_put(struct kvm_vcpu *vcpu); > +int tdx_handle_exit(struct kvm_vcpu *vcpu, > + enum exit_fastpath_completion fastpath); > +void tdx_get_exit_info(struct kvm_vcpu *vcpu, u32 *reason, > + u64 *info1, u64 *info2, u32 *intr_info, u32 *error_code); > > int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp); > > @@ -167,6 +171,10 @@ static inline fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu, bool force_immediat > } > static inline void tdx_prepare_switch_to_guest(struct kvm_vcpu *vcpu) {} > static inline void tdx_vcpu_put(struct kvm_vcpu *vcpu) {} > +static inline int tdx_handle_exit(struct kvm_vcpu *vcpu, > + enum exit_fastpath_completion fastpath) { return 0; } > +static inline void tdx_get_exit_info(struct kvm_vcpu *vcpu, u32 *reason, u64 *info1, > + u64 *info2, u32 *intr_info, u32 *error_code) {} > > static inline int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp) { return -EOPNOTSUPP; } >
On 12/13/2024 4:57 PM, Xiaoyao Li wrote: > On 12/1/2024 11:53 AM, Binbin Wu wrote: >> [...] >> + >> +static int tdx_handle_triple_fault(struct kvm_vcpu *vcpu) >> +{ >> + vcpu->run->exit_reason = KVM_EXIT_SHUTDOWN; >> + vcpu->mmio_needed = 0; >> + return 0; > > This function is just the same as handle_triple_fault() in vmx.c, why not use it instead? Yes, handle_triple_fault() could be moved to vmx.h and then it can be used by TDX code. Will do so. > >> } >> void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int pgd_level) >> @@ -1135,6 +1215,88 @@ int tdx_sept_remove_private_spte(struct kvm *kvm, gfn_t gfn, >> return tdx_sept_drop_private_spte(kvm, gfn, level, pfn); >> } >> +int tdx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t fastpath) >> +{ >> + struct vcpu_tdx *tdx = to_tdx(vcpu); >> + u64 vp_enter_ret = tdx->vp_enter_ret; >> + union vmx_exit_reason exit_reason; >> + >> + if (fastpath != EXIT_FASTPATH_NONE) >> + return 1; >> + >> + /* >> + * Handle TDX SW errors, including TDX_SEAMCALL_UD, TDX_SEAMCALL_GP and >> + * TDX_SEAMCALL_VMFAILINVALID. >> + */ >> + if (unlikely((vp_enter_ret & TDX_SW_ERROR) == TDX_SW_ERROR)) { >> + KVM_BUG_ON(!kvm_rebooting, vcpu->kvm); >> + goto unhandled_exit; >> + } >> + >> + /* >> + * Without off-TD debug enabled, failed_vmentry case must have >> + * TDX_NON_RECOVERABLE set. >> + */ > > This comment is confusing. I'm not sure why it is put here; the code below does nothing with exit_reason.failed_vmentry. Because when failed_vmentry occurs, vp_enter_ret will have TDX_NON_RECOVERABLE set, so it will be handled below. > >> + if (unlikely(vp_enter_ret & (TDX_ERROR | TDX_NON_RECOVERABLE))) { >> + /* Triple fault is non-recoverable. */ >> + if (unlikely(tdx_check_exit_reason(vcpu, EXIT_REASON_TRIPLE_FAULT))) >> + return tdx_handle_triple_fault(vcpu); >> + >> + kvm_pr_unimpl("TD vp_enter_ret 0x%llx, hkid 0x%x hkid pa 0x%llx\n", >> + vp_enter_ret, to_kvm_tdx(vcpu->kvm)->hkid, >> + set_hkid_to_hpa(0, to_kvm_tdx(vcpu->kvm)->hkid)); > > The need for "hkid" and "hkid pa" indeed requires clarification. Especially the "hkid pa", which is the result of applying the HKID of the current TD to physical address 0. I cannot think of any reason why we need such info. Yes, set_hkid_to_hpa(0, to_kvm_tdx(vcpu->kvm)->hkid) should be removed. I didn't notice it. Thanks! > >> + goto unhandled_exit; >> + } >> + >> + /* From now, the seamcall status should be TDX_SUCCESS. */ >> + WARN_ON_ONCE((vp_enter_ret & TDX_SEAMCALL_STATUS_MASK) != TDX_SUCCESS); > > Is there any case where TDX_SUCCESS comes with additional non-zero information in the lower 32 bits? I thought TDX_SUCCESS was a whole 64-bit status code. The TDX status code uses the upper 32 bits. When the status code is TDX_SUCCESS and has a valid VMX exit reason, the lower 32 bits hold the VMX exit reason. You can refer to the TDX module ABI spec or interface_function_completion_status.json from the intel-tdx-module-1.5-abi-table for details. 
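To make that layout concrete, a small sketch of how the 64-bit value decomposes, using only masks and helpers already present in the patch (an illustration, not new API):

	u64 vp_enter_ret = to_tdx(vcpu)->vp_enter_ret;

	/* Upper 32 bits: the TDX completion status class, e.g. TDX_SUCCESS. */
	u64 status = vp_enter_ret & TDX_SEAMCALL_STATUS_MASK;

	/* Lower 32 bits: the VMX exit reason, meaningful only for the
	 * status classes enumerated in tdx_has_exit_reason(). */
	union vmx_exit_reason reason = (union vmx_exit_reason)(u32)vp_enter_ret;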
> >> + exit_reason = tdexit_exit_reason(vcpu); >> + >> + switch (exit_reason.basic) { >> + default: >> + break; >> + } >> + >> +unhandled_exit: >> + vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR; >> + vcpu->run->internal.suberror = KVM_INTERNAL_ERROR_UNEXPECTED_EXIT_REASON; >> + vcpu->run->internal.ndata = 2; >> + vcpu->run->internal.data[0] = vp_enter_ret; >> + vcpu->run->internal.data[1] = vcpu->arch.last_vmentry_cpu; >> + return 0; >> +} >> + >> +void tdx_get_exit_info(struct kvm_vcpu *vcpu, u32 *reason, >> + u64 *info1, u64 *info2, u32 *intr_info, u32 *error_code) >> +{ >> + struct vcpu_tdx *tdx = to_tdx(vcpu); >> + >> + if (tdx_has_exit_reason(vcpu)) { >> + /* >> + * Encode some useful info from the the 64 bit return code >> + * into the 32 bit exit 'reason'. If the VMX exit reason is >> + * valid, just set it to those bits. >> + */ >> + *reason = (u32)tdx->vp_enter_ret; >> + *info1 = tdexit_exit_qual(vcpu); >> + *info2 = tdexit_ext_exit_qual(vcpu); >> + } else { >> + /* >> + * When the VMX exit reason in vp_enter_ret is not valid, >> + * overload the VMX_EXIT_REASONS_FAILED_VMENTRY bit (31) to >> + * mean the vmexit code is not valid. Set the other bits to >> + * try to avoid picking a value that may someday be a valid >> + * VMX exit code. >> + */ >> + *reason = 0xFFFFFFFF; >> + *info1 = 0; >> + *info2 = 0; >> + } >> + >> + *intr_info = tdexit_intr_info(vcpu); >> + *error_code = 0; >> +} >> + >> static int tdx_get_capabilities(struct kvm_tdx_cmd *cmd) >> { >> const struct tdx_sys_info_td_conf *td_conf = &tdx_sysinfo->td_conf; >> diff --git a/arch/x86/kvm/vmx/tdx_errno.h b/arch/x86/kvm/vmx/tdx_errno.h >> index f9dbb3a065cc..6ff4672c4181 100644 >> --- a/arch/x86/kvm/vmx/tdx_errno.h >> +++ b/arch/x86/kvm/vmx/tdx_errno.h >> @@ -10,6 +10,9 @@ >> * TDX SEAMCALL Status Codes (returned in RAX) >> */ >> #define TDX_NON_RECOVERABLE_VCPU 0x4000000100000000ULL >> +#define TDX_NON_RECOVERABLE_TD 0x4000000200000000ULL >> +#define TDX_NON_RECOVERABLE_TD_NON_ACCESSIBLE 0x6000000500000000ULL >> +#define TDX_NON_RECOVERABLE_TD_WRONG_APIC_MODE 0x6000000700000000ULL > > Not the fault of this patch. > > There are other status codes defined in arch/x86/include/asm/tdx.h > > /* > * TDX module SEAMCALL leaf function error codes > */ > #define TDX_SUCCESS 0ULL > #define TDX_RND_NO_ENTROPY 0x8000020300000000ULL > > It's better to put them in one single place. Agree. Thanks! 
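A sketch of the consolidation being agreed on here, gathering the definitions quoted above into a single header (values exactly as given in the thread; where they finally live is an open choice):

	/* TDX module SEAMCALL completion status codes (returned in RAX) */
	#define TDX_SUCCESS				0ULL
	#define TDX_NON_RECOVERABLE_VCPU		0x4000000100000000ULL
	#define TDX_NON_RECOVERABLE_TD			0x4000000200000000ULL
	#define TDX_NON_RECOVERABLE_TD_NON_ACCESSIBLE	0x6000000500000000ULL
	#define TDX_NON_RECOVERABLE_TD_WRONG_APIC_MODE	0x6000000700000000ULL
	#define TDX_INTERRUPTED_RESUMABLE		0x8000000300000000ULL
	#define TDX_RND_NO_ENTROPY			0x8000020300000000ULL
	#define TDX_OPERAND_BUSY			0x8000020000000000ULL
	#define TDX_OPERAND_INVALID			0xC000010000000000ULL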
> >> #define TDX_INTERRUPTED_RESUMABLE 0x8000000300000000ULL >> #define TDX_OPERAND_INVALID 0xC000010000000000ULL >> #define TDX_OPERAND_BUSY 0x8000020000000000ULL >> diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h >> index 02b33390e1bf..1c18943e0e1d 100644 >> --- a/arch/x86/kvm/vmx/x86_ops.h >> +++ b/arch/x86/kvm/vmx/x86_ops.h >> @@ -133,6 +133,10 @@ int tdx_vcpu_pre_run(struct kvm_vcpu *vcpu); >> fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu, bool force_immediate_exit); >> void tdx_prepare_switch_to_guest(struct kvm_vcpu *vcpu); >> void tdx_vcpu_put(struct kvm_vcpu *vcpu); >> +int tdx_handle_exit(struct kvm_vcpu *vcpu, >> + enum exit_fastpath_completion fastpath); >> +void tdx_get_exit_info(struct kvm_vcpu *vcpu, u32 *reason, >> + u64 *info1, u64 *info2, u32 *intr_info, u32 *error_code); >> int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp); >> @@ -167,6 +171,10 @@ static inline fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu, bool force_immediat >> } >> static inline void tdx_prepare_switch_to_guest(struct kvm_vcpu *vcpu) {} >> static inline void tdx_vcpu_put(struct kvm_vcpu *vcpu) {} >> +static inline int tdx_handle_exit(struct kvm_vcpu *vcpu, >> + enum exit_fastpath_completion fastpath) { return 0; } >> +static inline void tdx_get_exit_info(struct kvm_vcpu *vcpu, u32 *reason, u64 *info1, >> + u64 *info2, u32 *intr_info, u32 *error_code) {} >> static inline int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp) { return -EOPNOTSUPP; } >
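Picking up the earlier agreement to share the triple-fault handler rather than duplicate it: a minimal sketch, assuming the helper moves to a header visible to both vmx.c and tdx.c (the exact location is left to the final patch):

	/* In vmx.h, or another header shared by vmx.c and tdx.c. */
	static inline int handle_triple_fault(struct kvm_vcpu *vcpu)
	{
		vcpu->run->exit_reason = KVM_EXIT_SHUTDOWN;
		vcpu->mmio_needed = 0;
		return 0;
	}

Both the VMX and TDX handle_exit paths could then call this one copy instead of each carrying their own.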
On 12/16/2024 8:54 AM, Binbin Wu wrote: > > > > On 12/13/2024 4:57 PM, Xiaoyao Li wrote: >> On 12/1/2024 11:53 AM, Binbin Wu wrote: ... >> >>> } >>> void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int >>> pgd_level) >>> @@ -1135,6 +1215,88 @@ int tdx_sept_remove_private_spte(struct kvm >>> *kvm, gfn_t gfn, >>> return tdx_sept_drop_private_spte(kvm, gfn, level, pfn); >>> } >>> +int tdx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t fastpath) >>> +{ >>> + struct vcpu_tdx *tdx = to_tdx(vcpu); >>> + u64 vp_enter_ret = tdx->vp_enter_ret; >>> + union vmx_exit_reason exit_reason; >>> + >>> + if (fastpath != EXIT_FASTPATH_NONE) >>> + return 1; >>> + >>> + /* >>> + * Handle TDX SW errors, including TDX_SEAMCALL_UD, >>> TDX_SEAMCALL_GP and >>> + * TDX_SEAMCALL_VMFAILINVALID. >>> + */ >>> + if (unlikely((vp_enter_ret & TDX_SW_ERROR) == TDX_SW_ERROR)) { >>> + KVM_BUG_ON(!kvm_rebooting, vcpu->kvm); >>> + goto unhandled_exit; >>> + } >>> + >>> + /* >>> + * Without off-TD debug enabled, failed_vmentry case must have >>> + * TDX_NON_RECOVERABLE set. >>> + */ >> >> This comment is confusing. I'm not sure why it is put here; the code >> below does nothing with exit_reason.failed_vmentry. > > Because when failed_vmentry occurs, vp_enter_ret will have > TDX_NON_RECOVERABLE set, so it will be handled below. The wording is somewhat confusing; to me it implies something like: WARN_ON(!debug_td() && exit_reason.failed_vmentry && !(vp_enter_ret & TDX_NON_RECOVERABLE)) Besides, VMX returns KVM_EXIT_FAIL_ENTRY for vm-entry failure. So the question is why TDX cannot do it the same way? >> >>> + if (unlikely(vp_enter_ret & (TDX_ERROR | TDX_NON_RECOVERABLE))) { >>> + /* Triple fault is non-recoverable. */ >>> + if (unlikely(tdx_check_exit_reason(vcpu, >>> EXIT_REASON_TRIPLE_FAULT))) >>> + return tdx_handle_triple_fault(vcpu); >>> + >>> + kvm_pr_unimpl("TD vp_enter_ret 0x%llx, hkid 0x%x hkid pa >>> 0x%llx\n", >>> + vp_enter_ret, to_kvm_tdx(vcpu->kvm)->hkid, >>> + set_hkid_to_hpa(0, to_kvm_tdx(vcpu->kvm)->hkid)); >> >> The need for "hkid" and "hkid pa" indeed requires clarification. >> Especially the "hkid pa", which is the result of applying the HKID of the >> current TD to physical address 0. I cannot think of any reason why >> we need such info. > Yes, set_hkid_to_hpa(0, to_kvm_tdx(vcpu->kvm)->hkid) should be removed. > I didn't notice it. Don't forget to justify why HKID is useful here. To me, HKID can be dropped as well. > Thanks! > > >> >>> + goto unhandled_exit; >>> + } >>> + >>> + /* From now, the seamcall status should be TDX_SUCCESS. */ >>> + WARN_ON_ONCE((vp_enter_ret & TDX_SEAMCALL_STATUS_MASK) != >>> TDX_SUCCESS); >> >> Is there any case where TDX_SUCCESS comes with additional non-zero >> information in the lower 32 bits? I thought TDX_SUCCESS was a whole >> 64-bit status code. > The TDX status code uses the upper 32 bits. > > When the status code is TDX_SUCCESS and has a valid VMX exit reason, the > lower > 32 bits hold the VMX exit reason. > > You can refer to the TDX module ABI spec or > interface_function_completion_status.json > from the intel-tdx-module-1.5-abi-table for details. > I see. (I asked a silly question; I even missed the normal exit case.)
On 12/16/2024 12:37 PM, Xiaoyao Li wrote: > On 12/16/2024 8:54 AM, Binbin Wu wrote: >> >> >> >> On 12/13/2024 4:57 PM, Xiaoyao Li wrote: >>> On 12/1/2024 11:53 AM, Binbin Wu wrote: > ... >>> >>>> } >>>> void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int pgd_level) >>>> @@ -1135,6 +1215,88 @@ int tdx_sept_remove_private_spte(struct kvm *kvm, gfn_t gfn, >>>> return tdx_sept_drop_private_spte(kvm, gfn, level, pfn); >>>> } >>>> +int tdx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t fastpath) >>>> +{ >>>> + struct vcpu_tdx *tdx = to_tdx(vcpu); >>>> + u64 vp_enter_ret = tdx->vp_enter_ret; >>>> + union vmx_exit_reason exit_reason; >>>> + >>>> + if (fastpath != EXIT_FASTPATH_NONE) >>>> + return 1; >>>> + >>>> + /* >>>> + * Handle TDX SW errors, including TDX_SEAMCALL_UD, TDX_SEAMCALL_GP and >>>> + * TDX_SEAMCALL_VMFAILINVALID. >>>> + */ >>>> + if (unlikely((vp_enter_ret & TDX_SW_ERROR) == TDX_SW_ERROR)) { >>>> + KVM_BUG_ON(!kvm_rebooting, vcpu->kvm); >>>> + goto unhandled_exit; >>>> + } >>>> + >>>> + /* >>>> + * Without off-TD debug enabled, failed_vmentry case must have >>>> + * TDX_NON_RECOVERABLE set. >>>> + */ >>> >>> This comment is confusing. I'm not sure why it is put here; the code below does nothing with exit_reason.failed_vmentry. >> >> Because when failed_vmentry occurs, vp_enter_ret will have >> TDX_NON_RECOVERABLE set, so it will be handled below. > > The wording is somewhat confusing; to me it implies something like: > > WARN_ON(!debug_td() && exit_reason.failed_vmentry && > !(vp_enter_ret & TDX_NON_RECOVERABLE)) The comment tried to say that the failed_vmentry case is also covered by TDX_NON_RECOVERABLE, since off-TD debug is not supported yet. As it's causing confusion, I'd like to add code to handle failed_vmentry separately and align with VMX by using KVM_EXIT_FAIL_ENTRY, as suggested (see the sketch after this reply). > > Besides, VMX returns KVM_EXIT_FAIL_ENTRY for vm-entry failure. So the question is why TDX cannot do it the same way? > >>> >>>> + if (unlikely(vp_enter_ret & (TDX_ERROR | TDX_NON_RECOVERABLE))) { >>>> + /* Triple fault is non-recoverable. */ >>>> + if (unlikely(tdx_check_exit_reason(vcpu, EXIT_REASON_TRIPLE_FAULT))) >>>> + return tdx_handle_triple_fault(vcpu); >>>> + >>>> + kvm_pr_unimpl("TD vp_enter_ret 0x%llx, hkid 0x%x hkid pa 0x%llx\n", >>>> + vp_enter_ret, to_kvm_tdx(vcpu->kvm)->hkid, >>>> + set_hkid_to_hpa(0, to_kvm_tdx(vcpu->kvm)->hkid)); >>> >>> The need for "hkid" and "hkid pa" indeed requires clarification. Especially the "hkid pa", which is the result of applying the HKID of the current TD to physical address 0. I cannot think of any reason why we need such info. >> Yes, set_hkid_to_hpa(0, to_kvm_tdx(vcpu->kvm)->hkid) should be removed. >> I didn't notice it. > > Don't forget to justify why HKID is useful here. To me, HKID can be dropped as well. OK. > >> Thanks! >> >> >>> >>>> + goto unhandled_exit; >>>> + } >>>> + >>>> + /* From now, the seamcall status should be TDX_SUCCESS. */ >>>> + WARN_ON_ONCE((vp_enter_ret & TDX_SEAMCALL_STATUS_MASK) != TDX_SUCCESS); >>> >>> Is there any case where TDX_SUCCESS comes with additional non-zero information in the lower 32 bits? I thought TDX_SUCCESS was a whole 64-bit status code. >> The TDX status code uses the upper 32 bits. >> >> When the status code is TDX_SUCCESS and has a valid VMX exit reason, the lower >> 32 bits hold the VMX exit reason. >> >> You can refer to the TDX module ABI spec or interface_function_completion_status.json >> from the intel-tdx-module-1.5-abi-table for details. >> > > I see. (I asked a silly question; I even missed the normal exit case.)
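One way the separate failed_vmentry handling could look, mirroring what VMX reports through KVM_EXIT_FAIL_ENTRY (a sketch of the direction agreed above, not the final code; it assumes the .full field of union vmx_exit_reason as on the VMX side):

	if (unlikely(tdx_has_exit_reason(vcpu) &&
		     tdexit_exit_reason(vcpu).failed_vmentry)) {
		vcpu->run->exit_reason = KVM_EXIT_FAIL_ENTRY;
		vcpu->run->fail_entry.hardware_entry_failure_reason =
			tdexit_exit_reason(vcpu).full;
		vcpu->run->fail_entry.cpu = vcpu->arch.last_vmentry_cpu;
		return 0;
	}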
On Sun, Dec 1, 2024 at 4:52 AM Binbin Wu <binbin.wu@linux.intel.com> wrote: > > From: Isaku Yamahata <isaku.yamahata@intel.com> > > Introduce the wiring for handling TDX VM exits by implementing the > callbacks .get_exit_info(), and .handle_exit(). Linux 6.14 adds get_entry_info. For lack of a better place, I added it in this patch when rebasing kvm-coco-queue:

static void vt_get_entry_info(struct kvm_vcpu *vcpu, u32 *intr_info,
			      u32 *error_code)
{
	*intr_info = 0;
	*error_code = 0;

	if (is_td_vcpu(vcpu))
		return;

	vmx_get_entry_info(vcpu, intr_info, error_code);
}

Thanks, Paolo
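Presumably the new callback is then wired into vt_x86_ops next to .get_exit_info; a sketch of the assumed hunk (not taken from the posted rebase):

	@@ struct kvm_x86_ops vt_x86_ops __initdata = {
	 	.get_exit_info = vt_get_exit_info,
	+	.get_entry_info = vt_get_entry_info,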
diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h index 77477b905dca..01409a59224d 100644 --- a/arch/x86/include/asm/tdx.h +++ b/arch/x86/include/asm/tdx.h @@ -18,6 +18,7 @@ * TDX module. */ #define TDX_ERROR _BITUL(63) +#define TDX_NON_RECOVERABLE _BITUL(62) #define TDX_SW_ERROR (TDX_ERROR | GENMASK_ULL(47, 40)) #define TDX_SEAMCALL_VMFAILINVALID (TDX_SW_ERROR | _UL(0xFFFF0000)) diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c index f8acb1dc7c10..4f6faeb6e8e5 100644 --- a/arch/x86/kvm/vmx/main.c +++ b/arch/x86/kvm/vmx/main.c @@ -165,6 +165,15 @@ static fastpath_t vt_vcpu_run(struct kvm_vcpu *vcpu, bool force_immediate_exit) return vmx_vcpu_run(vcpu, force_immediate_exit); } +static int vt_handle_exit(struct kvm_vcpu *vcpu, + enum exit_fastpath_completion fastpath) +{ + if (is_td_vcpu(vcpu)) + return tdx_handle_exit(vcpu, fastpath); + + return vmx_handle_exit(vcpu, fastpath); +} + static void vt_flush_tlb_all(struct kvm_vcpu *vcpu) { if (is_td_vcpu(vcpu)) { @@ -212,6 +221,18 @@ static void vt_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, vmx_load_mmu_pgd(vcpu, root_hpa, pgd_level); } +static void vt_get_exit_info(struct kvm_vcpu *vcpu, u32 *reason, + u64 *info1, u64 *info2, u32 *intr_info, u32 *error_code) +{ + if (is_td_vcpu(vcpu)) { + tdx_get_exit_info(vcpu, reason, info1, info2, intr_info, + error_code); + return; + } + + vmx_get_exit_info(vcpu, reason, info1, info2, intr_info, error_code); +} + static int vt_mem_enc_ioctl(struct kvm *kvm, void __user *argp) { if (!is_td(kvm)) @@ -305,7 +326,7 @@ struct kvm_x86_ops vt_x86_ops __initdata = { .vcpu_pre_run = vt_vcpu_pre_run, .vcpu_run = vt_vcpu_run, - .handle_exit = vmx_handle_exit, + .handle_exit = vt_handle_exit, .skip_emulated_instruction = vmx_skip_emulated_instruction, .update_emulated_instruction = vmx_update_emulated_instruction, .set_interrupt_shadow = vmx_set_interrupt_shadow, @@ -340,7 +361,7 @@ struct kvm_x86_ops vt_x86_ops __initdata = { .set_identity_map_addr = vmx_set_identity_map_addr, .get_mt_mask = vmx_get_mt_mask, - .get_exit_info = vmx_get_exit_info, + .get_exit_info = vt_get_exit_info, .vcpu_after_set_cpuid = vmx_vcpu_after_set_cpuid, diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index f975bb323f60..3dcbdb5a7bf8 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -186,6 +186,54 @@ static __always_inline hpa_t set_hkid_to_hpa(hpa_t pa, u16 hkid) return pa | ((hpa_t)hkid << boot_cpu_data.x86_phys_bits); } +static __always_inline union vmx_exit_reason tdexit_exit_reason(struct kvm_vcpu *vcpu) +{ + return (union vmx_exit_reason)(u32)(to_tdx(vcpu)->vp_enter_ret); +} + +/* + * There is no simple way to check some bit(s) to decide whether the return + * value of TDH.VP.ENTER has a VMX exit reason or not. E.g., + * TDX_NON_RECOVERABLE_TD_WRONG_APIC_MODE has exit reason but with error bit + * (bit 63) set, TDX_NON_RECOVERABLE_TD_CORRUPTED_MD has no exit reason but with + * error bit cleared. 
+ */ +static __always_inline bool tdx_has_exit_reason(struct kvm_vcpu *vcpu) +{ + u64 status = to_tdx(vcpu)->vp_enter_ret & TDX_SEAMCALL_STATUS_MASK; + + return status == TDX_SUCCESS || status == TDX_NON_RECOVERABLE_VCPU || + status == TDX_NON_RECOVERABLE_TD || + status == TDX_NON_RECOVERABLE_TD_NON_ACCESSIBLE || + status == TDX_NON_RECOVERABLE_TD_WRONG_APIC_MODE; +} + +static __always_inline bool tdx_check_exit_reason(struct kvm_vcpu *vcpu, u16 reason) +{ + return tdx_has_exit_reason(vcpu) && + (u16)tdexit_exit_reason(vcpu).basic == reason; +} + +static __always_inline unsigned long tdexit_exit_qual(struct kvm_vcpu *vcpu) +{ + return kvm_rcx_read(vcpu); +} + +static __always_inline unsigned long tdexit_ext_exit_qual(struct kvm_vcpu *vcpu) +{ + return kvm_rdx_read(vcpu); +} + +static __always_inline unsigned long tdexit_gpa(struct kvm_vcpu *vcpu) +{ + return kvm_r8_read(vcpu); +} + +static __always_inline unsigned long tdexit_intr_info(struct kvm_vcpu *vcpu) +{ + return kvm_r9_read(vcpu); +} + static inline void tdx_hkid_free(struct kvm_tdx *kvm_tdx) { tdx_guest_keyid_free(kvm_tdx->hkid); @@ -824,6 +872,21 @@ static noinstr void tdx_vcpu_enter_exit(struct kvm_vcpu *vcpu) guest_state_exit_irqoff(); } +static fastpath_t tdx_exit_handlers_fastpath(struct kvm_vcpu *vcpu) +{ + u64 vp_enter_ret = to_tdx(vcpu)->vp_enter_ret; + + /* See the comment of tdx_seamcall_sept(). */ + if (unlikely(vp_enter_ret == TDX_ERROR_SEPT_BUSY)) + return EXIT_FASTPATH_REENTER_GUEST; + + /* TDH.VP.ENTER checks TD EPOCH which can contend with TDH.MEM.TRACK. */ + if (unlikely(vp_enter_ret == (TDX_OPERAND_BUSY | TDX_OPERAND_ID_TD_EPOCH))) + return EXIT_FASTPATH_REENTER_GUEST; + + return EXIT_FASTPATH_NONE; +} + fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu, bool force_immediate_exit) { struct vcpu_tdx *tdx = to_tdx(vcpu); @@ -837,9 +900,26 @@ fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu, bool force_immediate_exit) tdx->prep_switch_state = TDX_PREP_SW_STATE_UNRESTORED; vcpu->arch.regs_avail &= ~VMX_REGS_LAZY_LOAD_SET; + + if (unlikely((tdx->vp_enter_ret & TDX_SW_ERROR) == TDX_SW_ERROR)) + return EXIT_FASTPATH_NONE; + + if (unlikely(tdx_check_exit_reason(vcpu, EXIT_REASON_MCE_DURING_VMENTRY))) + kvm_machine_check(); + trace_kvm_exit(vcpu, KVM_ISA_VMX); - return EXIT_FASTPATH_NONE; + if (unlikely(tdx_has_exit_reason(vcpu) && tdexit_exit_reason(vcpu).failed_vmentry)) + return EXIT_FASTPATH_NONE; + + return tdx_exit_handlers_fastpath(vcpu); +} + +static int tdx_handle_triple_fault(struct kvm_vcpu *vcpu) +{ + vcpu->run->exit_reason = KVM_EXIT_SHUTDOWN; + vcpu->mmio_needed = 0; + return 0; } void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int pgd_level) @@ -1135,6 +1215,88 @@ int tdx_sept_remove_private_spte(struct kvm *kvm, gfn_t gfn, return tdx_sept_drop_private_spte(kvm, gfn, level, pfn); } +int tdx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t fastpath) +{ + struct vcpu_tdx *tdx = to_tdx(vcpu); + u64 vp_enter_ret = tdx->vp_enter_ret; + union vmx_exit_reason exit_reason; + + if (fastpath != EXIT_FASTPATH_NONE) + return 1; + + /* + * Handle TDX SW errors, including TDX_SEAMCALL_UD, TDX_SEAMCALL_GP and + * TDX_SEAMCALL_VMFAILINVALID. + */ + if (unlikely((vp_enter_ret & TDX_SW_ERROR) == TDX_SW_ERROR)) { + KVM_BUG_ON(!kvm_rebooting, vcpu->kvm); + goto unhandled_exit; + } + + /* + * Without off-TD debug enabled, failed_vmentry case must have + * TDX_NON_RECOVERABLE set. + */ + if (unlikely(vp_enter_ret & (TDX_ERROR | TDX_NON_RECOVERABLE))) { + /* Triple fault is non-recoverable. 
*/ + if (unlikely(tdx_check_exit_reason(vcpu, EXIT_REASON_TRIPLE_FAULT))) + return tdx_handle_triple_fault(vcpu); + + kvm_pr_unimpl("TD vp_enter_ret 0x%llx, hkid 0x%x hkid pa 0x%llx\n", + vp_enter_ret, to_kvm_tdx(vcpu->kvm)->hkid, + set_hkid_to_hpa(0, to_kvm_tdx(vcpu->kvm)->hkid)); + goto unhandled_exit; + } + + /* From now, the seamcall status should be TDX_SUCCESS. */ + WARN_ON_ONCE((vp_enter_ret & TDX_SEAMCALL_STATUS_MASK) != TDX_SUCCESS); + exit_reason = tdexit_exit_reason(vcpu); + + switch (exit_reason.basic) { + default: + break; + } + +unhandled_exit: + vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR; + vcpu->run->internal.suberror = KVM_INTERNAL_ERROR_UNEXPECTED_EXIT_REASON; + vcpu->run->internal.ndata = 2; + vcpu->run->internal.data[0] = vp_enter_ret; + vcpu->run->internal.data[1] = vcpu->arch.last_vmentry_cpu; + return 0; +} + +void tdx_get_exit_info(struct kvm_vcpu *vcpu, u32 *reason, + u64 *info1, u64 *info2, u32 *intr_info, u32 *error_code) +{ + struct vcpu_tdx *tdx = to_tdx(vcpu); + + if (tdx_has_exit_reason(vcpu)) { + /* + * Encode some useful info from the 64 bit return code + * into the 32 bit exit 'reason'. If the VMX exit reason is + * valid, just set it to those bits. + */ + *reason = (u32)tdx->vp_enter_ret; + *info1 = tdexit_exit_qual(vcpu); + *info2 = tdexit_ext_exit_qual(vcpu); + } else { + /* + * When the VMX exit reason in vp_enter_ret is not valid, + * overload the VMX_EXIT_REASONS_FAILED_VMENTRY bit (31) to + * mean the vmexit code is not valid. Set the other bits to + * try to avoid picking a value that may someday be a valid + * VMX exit code. + */ + *reason = 0xFFFFFFFF; + *info1 = 0; + *info2 = 0; + } + + *intr_info = tdexit_intr_info(vcpu); + *error_code = 0; +} + static int tdx_get_capabilities(struct kvm_tdx_cmd *cmd) { const struct tdx_sys_info_td_conf *td_conf = &tdx_sysinfo->td_conf; diff --git a/arch/x86/kvm/vmx/tdx_errno.h b/arch/x86/kvm/vmx/tdx_errno.h index f9dbb3a065cc..6ff4672c4181 100644 --- a/arch/x86/kvm/vmx/tdx_errno.h +++ b/arch/x86/kvm/vmx/tdx_errno.h @@ -10,6 +10,9 @@ * TDX SEAMCALL Status Codes (returned in RAX) */ #define TDX_NON_RECOVERABLE_VCPU 0x4000000100000000ULL +#define TDX_NON_RECOVERABLE_TD 0x4000000200000000ULL +#define TDX_NON_RECOVERABLE_TD_NON_ACCESSIBLE 0x6000000500000000ULL +#define TDX_NON_RECOVERABLE_TD_WRONG_APIC_MODE 0x6000000700000000ULL #define TDX_INTERRUPTED_RESUMABLE 0x8000000300000000ULL #define TDX_OPERAND_INVALID 0xC000010000000000ULL #define TDX_OPERAND_BUSY 0x8000020000000000ULL diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h index 02b33390e1bf..1c18943e0e1d 100644 --- a/arch/x86/kvm/vmx/x86_ops.h +++ b/arch/x86/kvm/vmx/x86_ops.h @@ -133,6 +133,10 @@ int tdx_vcpu_pre_run(struct kvm_vcpu *vcpu); fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu, bool force_immediate_exit); void tdx_prepare_switch_to_guest(struct kvm_vcpu *vcpu); void tdx_vcpu_put(struct kvm_vcpu *vcpu); +int tdx_handle_exit(struct kvm_vcpu *vcpu, + enum exit_fastpath_completion fastpath); +void tdx_get_exit_info(struct kvm_vcpu *vcpu, u32 *reason, + u64 *info1, u64 *info2, u32 *intr_info, u32 *error_code); int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp); @@ -167,6 +171,10 @@ static inline fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu, bool force_immediat } static inline void tdx_prepare_switch_to_guest(struct kvm_vcpu *vcpu) {} static inline void tdx_vcpu_put(struct kvm_vcpu *vcpu) {} +static inline int tdx_handle_exit(struct kvm_vcpu *vcpu, + enum exit_fastpath_completion fastpath) { return 0; } +static 
inline void tdx_get_exit_info(struct kvm_vcpu *vcpu, u32 *reason, u64 *info1, + u64 *info2, u32 *intr_info, u32 *error_code) {} static inline int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp) { return -EOPNOTSUPP; }