Message ID | 20250113021138.18875-1-yan.y.zhao@intel.com (mailing list archive)
---|---
State | New
Series | KVM: TDX SEPT SEAMCALL retry
On Mon, 2025-01-13 at 10:11 +0800, Yan Zhao wrote:
> Return RET_PF* (excluding RET_PF_EMULATE/RET_PF_CONTINUE/RET_PF_INVALID)
> instead of 1 in kvm_mmu_page_fault().
>
> The callers of kvm_mmu_page_fault() are KVM page fault handlers (i.e.,
> npf_interception(), handle_ept_misconfig(), __vmx_handle_ept_violation(),
> kvm_handle_page_fault()). They either check if the return value is > 0 (as
> in npf_interception()) or pass it further to vcpu_run() to decide whether
> to break out of the kernel loop and return to the user when r <= 0.
> Therefore, returning any positive value is equivalent to returning 1.
>
> Warn if r == RET_PF_CONTINUE (which should not be a valid value) to ensure
> a positive return value.
>
> This is a preparation to allow TDX's EPT violation handler to check the
> RET_PF* value and retry internally for RET_PF_RETRY.
>
> No functional changes are intended.

Any reason why this can't go ahead of the TDX patches? Seems pretty generic
cleanup.
On Wed, Jan 15, 2025 at 06:24:43AM +0800, Edgecombe, Rick P wrote:
> On Mon, 2025-01-13 at 10:11 +0800, Yan Zhao wrote:
> > Return RET_PF* (excluding RET_PF_EMULATE/RET_PF_CONTINUE/RET_PF_INVALID)
> > instead of 1 in kvm_mmu_page_fault().
> >
> > The callers of kvm_mmu_page_fault() are KVM page fault handlers (i.e.,
> > npf_interception(), handle_ept_misconfig(), __vmx_handle_ept_violation(),
> > kvm_handle_page_fault()). They either check if the return value is > 0 (as
> > in npf_interception()) or pass it further to vcpu_run() to decide whether
> > to break out of the kernel loop and return to the user when r <= 0.
> > Therefore, returning any positive value is equivalent to returning 1.
> >
> > Warn if r == RET_PF_CONTINUE (which should not be a valid value) to ensure
> > a positive return value.
> >
> > This is a preparation to allow TDX's EPT violation handler to check the
> > RET_PF* value and retry internally for RET_PF_RETRY.
> >
> > No functional changes are intended.
>
> Any reason why this can't go ahead of the TDX patches? Seems pretty generic
> cleanup.
Hmm, I wouldn't consider this a cleanup, as returning 1 to indicate
continuation of the kernel loop is a well-established convention in
arch/x86/kvm. Returning a positive RET_PF_* in this patch is primarily a
preparatory step for the subsequent patch, "KVM: TDX: Retry locally in TDX
EPT violation handler on RET_PF_RETRY".
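[Editor's note: for readers less familiar with the convention referenced
above, the sketch below shows the rough shape of the arch/x86/kvm run loop.
example_vcpu_loop() and handle_exit() are illustrative stand-ins, not the
actual kernel functions; the real loop lives in vcpu_run() in
arch/x86/kvm/x86.c.]

/*
 * Simplified, illustrative sketch of the arch/x86/kvm run-loop
 * convention: a positive return value means "keep running the guest",
 * while r <= 0 means "break out of the kernel loop and return to
 * userspace (or report an error)".  Any positive RET_PF_* value is
 * therefore equivalent to returning 1 here.
 */
static int example_vcpu_loop(struct kvm_vcpu *vcpu)
{
	int r = 1;

	while (r > 0) {
		/* ...enter the guest and handle the resulting VM exit... */
		r = handle_exit(vcpu);	/* may end up in kvm_mmu_page_fault() */
	}

	/* r <= 0: propagate to userspace or report the error. */
	return r;
}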
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index eedc6ff37b89..53dcf600e934 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -6120,8 +6120,16 @@ int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err
 	else if (r == RET_PF_SPURIOUS)
 		vcpu->stat.pf_spurious++;
 
+	/*
+	 * None of handle_mmio_page_fault(), kvm_mmu_do_page_fault(), or
+	 * kvm_mmu_write_protect_fault() return RET_PF_CONTINUE.
+	 * kvm_mmu_do_page_fault() only uses RET_PF_CONTINUE internally to
+	 * indicate continuing the page fault handling up to the final
+	 * page table mapping phase.
+	 */
+	WARN_ON_ONCE(r == RET_PF_CONTINUE);
 	if (r != RET_PF_EMULATE)
-		return 1;
+		return r;
 
 emulate:
 	return x86_emulate_instruction(vcpu, cr2_or_gpa, emulation_type, insn,
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index 957636660149..4fde91cade1b 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -315,9 +315,7 @@ int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault);
  * tracepoints via TRACE_DEFINE_ENUM() in mmutrace.h
  *
  * Note, all values must be greater than or equal to zero so as not to encroach
- * on -errno return values. Somewhat arbitrarily use '0' for CONTINUE, which
- * will allow for efficient machine code when checking for CONTINUE, e.g.
- * "TEST %rax, %rax, JNZ", as all "stop!" values are non-zero.
+ * on -errno return values.
  */
 enum {
 	RET_PF_CONTINUE = 0,
@@ -329,6 +327,14 @@ enum {
 	RET_PF_SPURIOUS,
 };
 
+/*
+ * Define RET_PF_CONTINUE as 0 to allow for
+ * - efficient machine code when checking for CONTINUE, e.g.
+ *   "TEST %rax, %rax, JNZ", as all "stop!" values are non-zero,
+ * - kvm_mmu_do_page_fault() to return other RET_PF_* as a positive value.
+ */
+static_assert(RET_PF_CONTINUE == 0);
+
 static inline void kvm_mmu_prepare_memory_fault_exit(struct kvm_vcpu *vcpu,
 						     struct kvm_page_fault *fault)
 {
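[Editor's note: the sketch below illustrates what returning the RET_PF_*
value enables for the follow-up patch "KVM: TDX: Retry locally in TDX EPT
violation handler on RET_PF_RETRY". It is hypothetical: the function name,
signature, and retry policy are assumptions for illustration, not the
actual code from that series.]

/*
 * Hypothetical sketch of a caller retrying locally on RET_PF_RETRY,
 * which only becomes possible once kvm_mmu_page_fault() returns the
 * RET_PF_* value instead of collapsing all "resume the guest" cases
 * into 1.
 */
static int tdx_handle_ept_violation_sketch(struct kvm_vcpu *vcpu,
					   gpa_t gpa, u64 error_code)
{
	int r;

	do {
		r = kvm_mmu_page_fault(vcpu, gpa, error_code, NULL, 0);
		/*
		 * A real implementation would also need to break out on
		 * pending signals and reschedule requests rather than
		 * spinning here unconditionally.
		 */
	} while (r == RET_PF_RETRY);

	return r;
}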
Return RET_PF* (excluding RET_PF_EMULATE/RET_PF_CONTINUE/RET_PF_INVALID)
instead of 1 in kvm_mmu_page_fault().

The callers of kvm_mmu_page_fault() are KVM page fault handlers (i.e.,
npf_interception(), handle_ept_misconfig(), __vmx_handle_ept_violation(),
kvm_handle_page_fault()). They either check if the return value is > 0 (as
in npf_interception()) or pass it further to vcpu_run() to decide whether
to break out of the kernel loop and return to the user when r <= 0.
Therefore, returning any positive value is equivalent to returning 1.

Warn if r == RET_PF_CONTINUE (which should not be a valid value) to ensure
a positive return value.

This is a preparation to allow TDX's EPT violation handler to check the
RET_PF* value and retry internally for RET_PF_RETRY.

No functional changes are intended.

Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
---
 arch/x86/kvm/mmu/mmu.c          | 10 +++++++++-
 arch/x86/kvm/mmu/mmu_internal.h | 12 +++++++++---
 2 files changed, 18 insertions(+), 4 deletions(-)