@@ -1854,6 +1854,8 @@ static int tdx_handle_ept_violation(struct kvm_vcpu *vcpu)
{
gpa_t gpa = tdexit_gpa(vcpu);
unsigned long exit_qual;
+ bool local_retry = false;
+ int ret;
if (vt_is_tdx_private_gpa(vcpu->kvm, gpa)) {
if (tdx_is_sept_violation_unexpected_pending(vcpu)) {
@@ -1872,6 +1874,24 @@ static int tdx_handle_ept_violation(struct kvm_vcpu *vcpu)
* due to aliasing a single HPA to multiple GPAs.
*/
exit_qual = EPT_VIOLATION_ACC_WRITE;
+
+ /*
+ * Mapping of private memory may return RET_PF_RETRY due to
+ * SEAMCALL contention, e.g.
+ * - TDH.MEM.PAGE.AUG/TDH.MEM.SEPT.ADD on local vCPU may
+ * contend with TDH.VP.ENTER (due to 0-step mitigation)
+ * on a remote vCPU.
+ * - TDH.MEM.PAGE.AUG/TDH.MEM.SEPT.ADD on local vCPU may
+ * contend with TDG.MEM.PAGE.ACCEPT on a remote vCPU.
+ *
+ * Retry internally in TDX to prevent exacerbating the
+ * activation of 0-step mitigation on local vCPU.
+ * However, despite these retries, the 0-step mitigation on the
+ * local vCPU may still be triggered due to:
+ * - Exiting on signals, interrupts.
+ * - KVM_EXIT_MEMORY_FAULT.
+ */
+ local_retry = true;
} else {
exit_qual = tdexit_exit_qual(vcpu);
/*
@@ -1884,7 +1904,24 @@ static int tdx_handle_ept_violation(struct kvm_vcpu *vcpu)
}
trace_kvm_page_fault(vcpu, tdexit_gpa(vcpu), exit_qual);
- return __vmx_handle_ept_violation(vcpu, tdexit_gpa(vcpu), exit_qual);
+
+ while (1) {
+ ret = __vmx_handle_ept_violation(vcpu, gpa, exit_qual);
+
+ if (ret != RET_PF_RETRY || !local_retry)
+ break;
+
+ /*
+ * Break and keep the orig return value.
+ * Signal & irq handling will be done later in vcpu_run()
+ */
+ if (signal_pending(current) || pi_has_pending_interrupt(vcpu) ||
+ kvm_test_request(KVM_REQ_NMI, vcpu) || vcpu->arch.nmi_pending)
+ break;
+
+ cond_resched();
+ }
+ return ret;
}
int tdx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t fastpath)
Retry locally in the TDX EPT violation handler for private memory to reduce the chances for tdh_mem_sept_add()/tdh_mem_page_aug() to contend with tdh_vp_enter(). TDX EPT violation installs private pages via tdh_mem_sept_add() and tdh_mem_page_aug(). The two may have contention with tdh_vp_enter() or TDCALLs. Resources SHARED users EXCLUSIVE users ------------------------------------------------------------ SEPT tree tdh_mem_sept_add tdh_vp_enter(0-step mitigation) tdh_mem_page_aug ------------------------------------------------------------ SEPT entry tdh_mem_sept_add (Host lock) tdh_mem_page_aug (Host lock) tdg_mem_page_accept (Guest lock) tdg_mem_page_attr_rd (Guest lock) tdg_mem_page_attr_wr (Guest lock) Though the contention between tdh_mem_sept_add()/tdh_mem_page_aug() and TDCALLs may be removed in future TDX module, their contention with tdh_vp_enter() due to 0-step mitigation still persists. The TDX module may trigger 0-step mitigation in SEAMCALL TDH.VP.ENTER, which works as follows: 0. Each TDH.VP.ENTER records the guest RIP on TD entry. 1. When the TDX module encounters a VM exit with reason EPT_VIOLATION, it checks if the guest RIP is the same as last guest RIP on TD entry. -if yes, it means the EPT violation is caused by the same instruction that caused the last VM exit. Then, the TDX module increases the guest RIP no-progress count. When the count increases from 0 to the threshold (currently 6), the TDX module records the faulting GPA into a last_epf_gpa_list. -if no, it means the guest RIP has made progress. So, the TDX module resets the RIP no-progress count and the last_epf_gpa_list. 2. On the next TDH.VP.ENTER, the TDX module (after saving the guest RIP on TD entry) checks if the last_epf_gpa_list is empty. -if yes, TD entry continues without acquiring the lock on the SEPT tree. -if no, it triggers the 0-step mitigation by acquiring the exclusive lock on SEPT tree, walking the EPT tree to check if all page faults caused by the GPAs in the last_epf_gpa_list have been resolved before continuing TD entry. Since KVM TDP MMU usually re-enters guest whenever it exits to userspace (e.g. for KVM_EXIT_MEMORY_FAULT) or encounters a BUSY, it is possible for a tdh_vp_enter() to be called more than the threshold count before a page fault is addressed, triggering contention when tdh_vp_enter() attempts to acquire exclusive lock on SEPT tree. Retry locally in TDX EPT violation handler to reduce the count of invoking tdh_vp_enter(), hence reducing the possibility of its contention with tdh_mem_sept_add()/tdh_mem_page_aug(). However, the 0-step mitigation and the contention are still not eliminated due to KVM_EXIT_MEMORY_FAULT, signals/interrupts, and cases when one instruction faults more GFNs than the threshold count. Signed-off-by: Yan Zhao <yan.y.zhao@intel.com> --- arch/x86/kvm/vmx/tdx.c | 39 ++++++++++++++++++++++++++++++++++++++- 1 file changed, 38 insertions(+), 1 deletion(-)