[5/5] kvm/x86: rework guest entry logic

Message ID 20220111153539.2532246-6-mark.rutland@arm.com (mailing list archive)
State New, archived
Series kvm: fix latent guest entry/exit bugs

Commit Message

Mark Rutland Jan. 11, 2022, 3:35 p.m. UTC
For consistency and clarity, migrate x86 over to the generic helpers for
guest timing and lockdep/RCU/tracing management, and remove the
x86-specific helpers.

Prior to this patch, the guest timing was entered in
kvm_guest_enter_irqoff() (called by svm_vcpu_enter_exit() and
vmx_vcpu_enter_exit()), and was exited by the call to
vtime_account_guest_exit() within vcpu_enter_guest().

To minimize duplication and to more clearly balance entry and exit, both
entry and exit of guest timing are placed in vcpu_enter_guest(), using
the new guest_timing_{enter,exit}_irqoff() helpers. This may result in a
small amount of additional time being accounted towards guests.

Other than this, there should be no functional change as a result of
this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jim Mattson <jmattson@google.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Wanpeng Li <wanpengli@tencent.com>
---
 arch/x86/kvm/svm/svm.c |  4 ++--
 arch/x86/kvm/vmx/vmx.c |  4 ++--
 arch/x86/kvm/x86.c     |  4 +++-
 arch/x86/kvm/x86.h     | 45 ------------------------------------------
 4 files changed, 7 insertions(+), 50 deletions(-)
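
For reference, the generic timing helpers used above are introduced earlier in
this series; simplified down to just the timing calls, they look roughly like
this (a sketch, not the verbatim patch 1/5 code):

  static __always_inline void guest_timing_enter_irqoff(void)
  {
          /* Begin accounting CPU time to the guest. */
          vtime_account_guest_enter();
  }

  static __always_inline void guest_timing_exit_irqoff(void)
  {
          /* Flush the cputime accrued while the guest was running. */
          vtime_account_guest_exit();
  }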

Comments

Sean Christopherson Jan. 13, 2022, 8:50 p.m. UTC | #1
On Tue, Jan 11, 2022, Mark Rutland wrote:
> For consistency and clarity, migrate x86 over to the generic helpers for
> guest timing and lockdep/RCU/tracing management, and remove the
> x86-specific helpers.
> 
> Prior to this patch, the guest timing was entered in
> kvm_guest_enter_irqoff() (called by svm_vcpu_enter_exit() and
> vmx_vcpu_enter_exit()), and was exited by the call to
> vtime_account_guest_exit() within vcpu_enter_guest().
> 
> To minimize duplication and to more clearly balance entry and exit, both
> entry and exit of guest timing are placed in vcpu_enter_guest(), using
> the new guest_timing_{enter,exit}_irqoff() helpers. This may result in a
> small amount of additional time being accounted towards guests.

This can be further qualified to state that it only affects time accounting when
using context tracking; tick-based accounting is unaffected because IRQs are
disabled the entire time.

And this might actually be a (benign?) bug fix for context tracking accounting in
the EXIT_FASTPATH_REENTER_GUEST case (commits ae95f566b3d2, "KVM: X86: TSCDEADLINE
MSR emulation fastpath", and 26efe2fd92e5, "KVM: VMX: Handle preemption timer
fastpath").  In those cases, KVM will enter the guest multiple times without
bouncing through vtime_account_guest_exit().  That means vtime_guest_enter() will
be called when the CPU is already "in guest", and will call vtime_account_system()
when it really should call vtime_account_guest().  account_system_time() does
check PF_VCPU and redirect to account_guest_time(), so it appears to be benign,
but it's at least odd.
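
For anyone following along, the PF_VCPU redirect referenced here lives in
account_system_time() in kernel/sched/cputime.c; trimmed down to the relevant
branch (a sketch, not the verbatim source):

  void account_system_time(struct task_struct *p, int hardirq_offset, u64 cputime)
  {
          /* Time landing on a vCPU task outside hardirq is charged to the guest. */
          if ((p->flags & PF_VCPU) && (irq_count() - hardirq_offset == 0)) {
                  account_guest_time(p, cputime);
                  return;
          }

          /* ... otherwise it is accounted as irq/softirq/system time ... */
  }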

> Other than this, there should be no functional change as a result of
> this patch.

...

> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index e50e97ac4408..bd3873b90889 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -9876,6 +9876,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>  		set_debugreg(0, 7);
>  	}
>  
> +	guest_timing_enter_irqoff();
> +
>  	for (;;) {
>  		/*
>  		 * Assert that vCPU vs. VM APICv state is consistent.  An APICv
> @@ -9949,7 +9951,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>  	 * of accounting via context tracking, but the loss of accuracy is
>  	 * acceptable for all known use cases.
>  	 */
> -	vtime_account_guest_exit();
> +	guest_timing_exit_irqoff();
>  
>  	if (lapic_in_kernel(vcpu)) {
>  		s64 delta = vcpu->arch.apic->lapic_timer.advance_expire_delta;
Mark Rutland Jan. 14, 2022, 12:05 p.m. UTC | #2
On Thu, Jan 13, 2022 at 08:50:00PM +0000, Sean Christopherson wrote:
> On Tue, Jan 11, 2022, Mark Rutland wrote:
> > For consistency and clarity, migrate x86 over to the generic helpers for
> > guest timing and lockdep/RCU/tracing management, and remove the
> > x86-specific helpers.
> > 
> > Prior to this patch, the guest timing was entered in
> > kvm_guest_enter_irqoff() (called by svm_vcpu_enter_exit() and
> > vmx_vcpu_enter_exit()), and was exited by the call to
> > vtime_account_guest_exit() within vcpu_enter_guest().
> > 
> > To minimize duplication and to more clearly balance entry and exit, both
> > entry and exit of guest timing are placed in vcpu_enter_guest(), using
> > the new guest_timing_{enter,exit}_irqoff() helpers. This may result in a
> > small amount of additional time being accounted towards guests.
> 
> This can be further qualified to state that it only affects time accounting when
> using context tracking; tick-based accounting is unaffected because IRQs are
> disabled the entire time.

Ok. I'll replace that last sentence with:

  When context tracking is used, a small amount of additional time will be
  accounted towards guests; tick-based accounting is unaffected, as IRQs are
  disabled at this point and not enabled until after the return from the guest.

> 
> And this might actually be a (benign?) bug fix for context tracking accounting in
> the EXIT_FASTPATH_REENTER_GUEST case (commits ae95f566b3d2, "KVM: X86: TSCDEADLINE
> MSR emulation fastpath", and 26efe2fd92e5, "KVM: VMX: Handle preemption timer
> fastpath").  In those cases, KVM will enter the guest multiple times without
> bouncing through vtime_account_guest_exit().  That means vtime_guest_enter() will
> be called when the CPU is already "in guest", and will call vtime_account_system()
> when it really should call vtime_account_guest().  account_system_time() does
> check PF_VCPU and redirect to account_guest_time(), so it appears to be benign,
> but it's at least odd.
> 
> > Other than this, there should be no functional change as a result of
> > this patch.

I've added wording:

  This also corrects (benign) mis-balanced context tracking accounting
  introduced in commits:
  
    ae95f566b3d22ade ("KVM: X86: TSCDEADLINE MSR emulation fastpath")
    26efe2fd92e50822 ("KVM: VMX: Handle preemption timer fastpath")
  
  Where KVM can enter a guest multiple times, calling vtime_guest_enter()
  without a corresponding call to vtime_account_guest_exit(), and with
  vtime_account_system() called when vtime_account_guest() should be used.
  As account_system_time() checks PF_VCPU and calls account_guest_time(),
  this doesn't result in any functional problem, but is unnecessarily
  confusing.

... and deleted the "no functional change" line for now.
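
For illustration, the mis-balance in the fastpath case has roughly this shape
in the pre-patch vcpu_enter_guest() (pseudocode, not the actual source):

  for (;;) {
          /*
           * vmx/svm_vcpu_enter_exit() entered guest timing via
           * kvm_guest_enter_irqoff() on every iteration ...
           */
          exit_fastpath = static_call(kvm_x86_vcpu_run)(vcpu);

          /*
           * ... but nothing on the REENTER_GUEST path exited guest
           * timing, so vtime_guest_enter() is called again with
           * PF_VCPU already set.
           */
          if (likely(exit_fastpath != EXIT_FASTPATH_REENTER_GUEST))
                  break;
  }

  vtime_account_guest_exit();     /* one timing exit for N timing entries */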

I assume that other than the naming of the entry/exit functions you're happy
with this patch?

Thanks,
Mark.

> ...
> 
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index e50e97ac4408..bd3873b90889 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -9876,6 +9876,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
> >  		set_debugreg(0, 7);
> >  	}
> >  
> > +	guest_timing_enter_irqoff();
> > +
> >  	for (;;) {
> >  		/*
> >  		 * Assert that vCPU vs. VM APICv state is consistent.  An APICv
> > @@ -9949,7 +9951,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
> >  	 * of accounting via context tracking, but the loss of accuracy is
> >  	 * acceptable for all known use cases.
> >  	 */
> > -	vtime_account_guest_exit();
> > +	guest_timing_exit_irqoff();
> >  
> >  	if (lapic_in_kernel(vcpu)) {
> >  		s64 delta = vcpu->arch.apic->lapic_timer.advance_expire_delta;
Sean Christopherson Jan. 14, 2022, 4:49 p.m. UTC | #3
On Fri, Jan 14, 2022, Mark Rutland wrote:
> I assume that other than the naming of the entry/exit functions you're happy
> with this patch?

Yep!

Patch

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 5151efa424ac..af5d90de243f 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3814,7 +3814,7 @@  static noinstr void svm_vcpu_enter_exit(struct kvm_vcpu *vcpu)
 	struct vcpu_svm *svm = to_svm(vcpu);
 	unsigned long vmcb_pa = svm->current_vmcb->pa;
 
-	kvm_guest_enter_irqoff();
+	exit_to_guest_mode();
 
 	if (sev_es_guest(vcpu->kvm)) {
 		__svm_sev_es_vcpu_run(vmcb_pa);
@@ -3834,7 +3834,7 @@  static noinstr void svm_vcpu_enter_exit(struct kvm_vcpu *vcpu)
 		vmload(__sme_page_pa(sd->save_area));
 	}
 
-	kvm_guest_exit_irqoff();
+	enter_from_guest_mode();
 }
 
 static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 0dbf94eb954f..3dd240ef6414 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6593,7 +6593,7 @@  static fastpath_t vmx_exit_handlers_fastpath(struct kvm_vcpu *vcpu)
 static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu,
 					struct vcpu_vmx *vmx)
 {
-	kvm_guest_enter_irqoff();
+	exit_to_guest_mode();
 
 	/* L1D Flush includes CPU buffer clear to mitigate MDS */
 	if (static_branch_unlikely(&vmx_l1d_should_flush))
@@ -6609,7 +6609,7 @@  static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu,
 
 	vcpu->arch.cr2 = native_read_cr2();
 
-	kvm_guest_exit_irqoff();
+	enter_from_guest_mode();
 }
 
 static fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e50e97ac4408..bd3873b90889 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9876,6 +9876,8 @@  static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 		set_debugreg(0, 7);
 	}
 
+	guest_timing_enter_irqoff();
+
 	for (;;) {
 		/*
 		 * Assert that vCPU vs. VM APICv state is consistent.  An APICv
@@ -9949,7 +9951,7 @@  static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 	 * of accounting via context tracking, but the loss of accuracy is
 	 * acceptable for all known use cases.
 	 */
-	vtime_account_guest_exit();
+	guest_timing_exit_irqoff();
 
 	if (lapic_in_kernel(vcpu)) {
 		s64 delta = vcpu->arch.apic->lapic_timer.advance_expire_delta;
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 4abcd8d9836d..8e50645ac740 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -10,51 +10,6 @@ 
 
 void kvm_spurious_fault(void);
 
-static __always_inline void kvm_guest_enter_irqoff(void)
-{
-	/*
-	 * VMENTER enables interrupts (host state), but the kernel state is
-	 * interrupts disabled when this is invoked. Also tell RCU about
-	 * it. This is the same logic as for exit_to_user_mode().
-	 *
-	 * This ensures that e.g. latency analysis on the host observes
-	 * guest mode as interrupt enabled.
-	 *
-	 * guest_enter_irqoff() informs context tracking about the
-	 * transition to guest mode and if enabled adjusts RCU state
-	 * accordingly.
-	 */
-	instrumentation_begin();
-	trace_hardirqs_on_prepare();
-	lockdep_hardirqs_on_prepare(CALLER_ADDR0);
-	instrumentation_end();
-
-	guest_enter_irqoff();
-	lockdep_hardirqs_on(CALLER_ADDR0);
-}
-
-static __always_inline void kvm_guest_exit_irqoff(void)
-{
-	/*
-	 * VMEXIT disables interrupts (host state), but tracing and lockdep
-	 * have them in state 'on' as recorded before entering guest mode.
-	 * Same as enter_from_user_mode().
-	 *
-	 * context_tracking_guest_exit() restores host context and reinstates
-	 * RCU if enabled and required.
-	 *
-	 * This needs to be done immediately after VM-Exit, before any code
-	 * that might contain tracepoints or call out to the greater world,
-	 * e.g. before x86_spec_ctrl_restore_host().
-	 */
-	lockdep_hardirqs_off(CALLER_ADDR0);
-	context_tracking_guest_exit();
-
-	instrumentation_begin();
-	trace_hardirqs_off_finish();
-	instrumentation_end();
-}
-
 #define KVM_NESTED_VMENTER_CONSISTENCY_CHECK(consistency_check)		\
 ({									\
 	bool failed = (consistency_check);				\