[2/7] KVM: x86/mmu: Return RET_PF* instead of 1 in kvm_mmu_page_fault()

Message ID 20250113021138.18875-1-yan.y.zhao@intel.com (mailing list archive)
State New
Series KVM: TDX SEPT SEAMCALL retry

Commit Message

Yan Zhao Jan. 13, 2025, 2:11 a.m. UTC
Return RET_PF* (excluding RET_PF_EMULATE/RET_PF_CONTINUE/RET_PF_INVALID)
instead of 1 in kvm_mmu_page_fault().

The callers of kvm_mmu_page_fault() are KVM page fault handlers (i.e.,
npf_interception(), handle_ept_misconfig(), __vmx_handle_ept_violation(),
kvm_handle_page_fault()). They either check whether the return value is > 0
(as in npf_interception()) or pass it on to vcpu_run(), which breaks out of
the kernel loop and returns to userspace when r <= 0.
Therefore, returning any positive value is equivalent to returning 1.
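
For illustration, a minimal sketch of that caller-side convention, simplified
from the vcpu_run() loop (illustrative only, not the verbatim upstream code):

	static int example_vcpu_loop(struct kvm_vcpu *vcpu)
	{
		int r;

		for (;;) {
			/* May reach kvm_mmu_page_fault() via the fault handlers. */
			r = vcpu_enter_guest(vcpu);

			/* r < 0: error; r == 0: exit to userspace; r > 0: re-enter. */
			if (r <= 0)
				return r;
		}
	}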

Warn if r == RET_PF_CONTINUE (which should not be a valid value) to ensure
a positive return value.

This is a preparation to allow TDX's EPT violation handler to check the
RET_PF* value and retry internally for RET_PF_RETRY.

No functional changes are intended.

Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
---
 arch/x86/kvm/mmu/mmu.c          | 10 +++++++++-
 arch/x86/kvm/mmu/mmu_internal.h | 12 +++++++++---
 2 files changed, 18 insertions(+), 4 deletions(-)

Comments

Edgecombe, Rick P Jan. 14, 2025, 10:24 p.m. UTC | #1
On Mon, 2025-01-13 at 10:11 +0800, Yan Zhao wrote:
> Return RET_PF* (excluding RET_PF_EMULATE/RET_PF_CONTINUE/RET_PF_INVALID)
> instead of 1 in kvm_mmu_page_fault().
> 
> The callers of kvm_mmu_page_fault() are KVM page fault handlers (i.e.,
> npf_interception(), handle_ept_misconfig(), __vmx_handle_ept_violation(),
> kvm_handle_page_fault()). They either check whether the return value is > 0
> (as in npf_interception()) or pass it on to vcpu_run(), which breaks out of
> the kernel loop and returns to userspace when r <= 0.
> Therefore, returning any positive value is equivalent to returning 1.
> 
> Warn if r == RET_PF_CONTINUE (which should not be a valid value) to ensure
> a positive return value.
> 
> This is a preparation to allow TDX's EPT violation handler to check the
> RET_PF* value and retry internally for RET_PF_RETRY.
> 
> No functional changes are intended.

Any reason why this can't go ahead of the TDX patches? Seems pretty generic
cleanup.
Yan Zhao Jan. 15, 2025, 4:58 a.m. UTC | #2
On Wed, Jan 15, 2025 at 06:24:43AM +0800, Edgecombe, Rick P wrote:
> On Mon, 2025-01-13 at 10:11 +0800, Yan Zhao wrote:
> > Return RET_PF* (excluding RET_PF_EMULATE/RET_PF_CONTINUE/RET_PF_INVALID)
> > instead of 1 in kvm_mmu_page_fault().
> > 
> > The callers of kvm_mmu_page_fault() are KVM page fault handlers (i.e.,
> > npf_interception(), handle_ept_misconfig(), __vmx_handle_ept_violation(),
> > kvm_handle_page_fault()). They either check whether the return value is > 0
> > (as in npf_interception()) or pass it on to vcpu_run(), which breaks out of
> > the kernel loop and returns to userspace when r <= 0.
> > Therefore, returning any positive value is equivalent to returning 1.
> > 
> > Warn if r == RET_PF_CONTINUE (which should not be a valid value) to ensure
> > a positive return value.
> > 
> > This is a preparation to allow TDX's EPT violation handler to check the
> > RET_PF* value and retry internally for RET_PF_RETRY.
> > 
> > No functional changes are intended.
> 
> Any reason why this can't go ahead of the TDX patches? Seems pretty generic
> cleanup.
Hmm, I wouldn't consider this a cleanup, as returning 1 to indicate continuation
of the kernel loop is a well-established convention in arch/x86/kvm.

Returning a positive RET_PF_* in this patch is primarily a preparatory step for
the subsequent patch,
"KVM: TDX: Retry locally in TDX EPT violation handler on RET_PF_RETRY".

Patch

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index eedc6ff37b89..53dcf600e934 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -6120,8 +6120,16 @@  int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err
 	else if (r == RET_PF_SPURIOUS)
 		vcpu->stat.pf_spurious++;
 
+	/*
+	 * None of handle_mmio_page_fault(), kvm_mmu_do_page_fault(), or
+	 * kvm_mmu_write_protect_fault() return RET_PF_CONTINUE.
+	 * kvm_mmu_do_page_fault() only uses RET_PF_CONTINUE internally to
+	 * indicate continuing the page fault handling until the final
+	 * page table mapping phase.
+	 */
+	WARN_ON_ONCE(r == RET_PF_CONTINUE);
 	if (r != RET_PF_EMULATE)
-		return 1;
+		return r;
 
 emulate:
 	return x86_emulate_instruction(vcpu, cr2_or_gpa, emulation_type, insn,
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index 957636660149..4fde91cade1b 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -315,9 +315,7 @@  int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault);
  * tracepoints via TRACE_DEFINE_ENUM() in mmutrace.h
  *
  * Note, all values must be greater than or equal to zero so as not to encroach
- * on -errno return values.  Somewhat arbitrarily use '0' for CONTINUE, which
- * will allow for efficient machine code when checking for CONTINUE, e.g.
- * "TEST %rax, %rax, JNZ", as all "stop!" values are non-zero.
+ * on -errno return values.
  */
 enum {
 	RET_PF_CONTINUE = 0,
@@ -329,6 +327,14 @@  enum {
 	RET_PF_SPURIOUS,
 };
 
+/*
+ * Define RET_PF_CONTINUE as 0 to allow for
+ * - efficient machine code when checking for CONTINUE, e.g.
+ *   "TEST %rax, %rax, JNZ", as all "stop!" values are non-zero,
+ * - kvm_mmu_do_page_fault() to return other RET_PF_* as a positive value.
+ */
+static_assert(RET_PF_CONTINUE == 0);
+
 static inline void kvm_mmu_prepare_memory_fault_exit(struct kvm_vcpu *vcpu,
 						     struct kvm_page_fault *fault)
 {