diff mbox series

[v3,1/9] KVM: lapic: Hard cap the auto-calculated timer advancement

Message ID 20190416203248.29429-2-sean.j.christopherson@intel.com (mailing list archive)
State New, archived
Headers show
Series KVM: lapic: Fix a variety of timer adv issues | expand

Commit Message

Sean Christopherson April 16, 2019, 8:32 p.m. UTC
To minimize the latency of timer interrupts as observed by the guest,
KVM adjusts the values it programs into the host timers to account for
the host's overhead of programming and handling the timer event.  Now
that the timer advancement is automatically tuned during runtime, it's
effectively unbounded by default, e.g. if KVM is running as L1 the
advancement can measure in hundreds of milliseconds.

Place a somewhat arbitrary hard cap of 5000ns on the auto-calculated
advancement, as large advancements can break reasonable assumptions of
the guest, e.g. that a timer configured to fire after 1ms won't arrive
on the next instruction.  Although KVM busy waits to mitigate the timer
event arriving too early, complications can arise when shifting the
interrupt too far, e.g. vmx.flat/interrupt in kvm-unit-tests will fail
when its "host" exits on interrupts (because the INTR is injected before
the gets executes STI+HLT).  Arguably the unit test is "broken" in the
sense that delaying the timer interrupt by 1ms doesn't technically
guarantee the interrupt will arrive after STI+HLT, but it's a reasonable
assumption that KVM should support.

Furthermore, an unbounded advancement also effectively unbounds the time
spent busy waiting, e.g. if the guest programs a timer with a very large
delay.

Arguably the advancement logic could simply be disabled when running as
L1, but KVM needs to bound the advancement time regardless, e.g. if the
TSC is unstable and the calculations get wildly out of whack.  And
allowing the advancement when running as L1 is a good stress test of
sorts for the logic.

Cc: Liran Alon <liran.alon@oracle.com>
Cc: Wanpeng Li <wanpengli@tencent.com>
Fixes: 3b8a5df6c4dc6 ("KVM: LAPIC: Tune lapic_timer_advance_ns automatically")
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/kvm/lapic.c | 5 +++++
 1 file changed, 5 insertions(+)

Comments

Paolo Bonzini April 17, 2019, 12:57 p.m. UTC | #1
On 16/04/19 22:32, Sean Christopherson wrote:
> To minimize the latency of timer interrupts as observed by the guest,
> KVM adjusts the values it programs into the host timers to account for
> the host's overhead of programming and handling the timer event.  Now
> that the timer advancement is automatically tuned during runtime, it's
> effectively unbounded by default, e.g. if KVM is running as L1 the
> advancement can measure in hundreds of milliseconds.
> 
> Place a somewhat arbitrary hard cap of 5000ns on the auto-calculated
> advancement, as large advancements can break reasonable assumptions of
> the guest, e.g. that a timer configured to fire after 1ms won't arrive
> on the next instruction.  Although KVM busy waits to mitigate the timer
> event arriving too early, complications can arise when shifting the
> interrupt too far, e.g. vmx.flat/interrupt in kvm-unit-tests will fail
> when its "host" exits on interrupts (because the INTR is injected before
> the gets executes STI+HLT).  Arguably the unit test is "broken" in the
> sense that delaying the timer interrupt by 1ms doesn't technically
> guarantee the interrupt will arrive after STI+HLT, but it's a reasonable
> assumption that KVM should support.
> 
> Furthermore, an unbounded advancement also effectively unbounds the time
> spent busy waiting, e.g. if the guest programs a timer with a very large
> delay.
> 
> Arguably the advancement logic could simply be disabled when running as
> L1, but KVM needs to bound the advancement time regardless, e.g. if the
> TSC is unstable and the calculations get wildly out of whack.  And
> allowing the advancement when running as L1 is a good stress test of
> sorts for the logic.
> 
> Cc: Liran Alon <liran.alon@oracle.com>
> Cc: Wanpeng Li <wanpengli@tencent.com>
> Fixes: 3b8a5df6c4dc6 ("KVM: LAPIC: Tune lapic_timer_advance_ns automatically")
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> ---
>  arch/x86/kvm/lapic.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 9bf70cf84564..92446cba9b24 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -74,6 +74,7 @@ static bool lapic_timer_advance_adjust_done = false;
>  #define LAPIC_TIMER_ADVANCE_ADJUST_DONE 100
>  /* step-by-step approximation to mitigate fluctuation */
>  #define LAPIC_TIMER_ADVANCE_ADJUST_STEP 8
> +#define LAPIC_TIMER_ADVANCE_MAX_NS	5000
>  
>  static inline int apic_test_vector(int vec, void *bitmap)
>  {
> @@ -1522,6 +1523,10 @@ void wait_lapic_expire(struct kvm_vcpu *vcpu)
>  		}
>  		if (abs(guest_tsc - tsc_deadline) < LAPIC_TIMER_ADVANCE_ADJUST_DONE)
>  			lapic_timer_advance_adjust_done = true;
> +		if (unlikely(lapic_timer_advance_ns > LAPIC_TIMER_ADVANCE_MAX_NS)) {
> +			lapic_timer_advance_ns = LAPIC_TIMER_ADVANCE_MAX_NS;
> +			lapic_timer_advance_adjust_done = true;
> +		}

I would treat this case as "advancing the timer has failed miserably"
and reset lapic_timer_advance_ns to 0.

Paolo
Sean Christopherson April 17, 2019, 2:34 p.m. UTC | #2
On Wed, Apr 17, 2019 at 02:57:55PM +0200, Paolo Bonzini wrote:
> On 16/04/19 22:32, Sean Christopherson wrote:
> > To minimize the latency of timer interrupts as observed by the guest,
> > KVM adjusts the values it programs into the host timers to account for
> > the host's overhead of programming and handling the timer event.  Now
> > that the timer advancement is automatically tuned during runtime, it's
> > effectively unbounded by default, e.g. if KVM is running as L1 the
> > advancement can measure in hundreds of milliseconds.
> > 
> > Place a somewhat arbitrary hard cap of 5000ns on the auto-calculated
> > advancement, as large advancements can break reasonable assumptions of
> > the guest, e.g. that a timer configured to fire after 1ms won't arrive
> > on the next instruction.  Although KVM busy waits to mitigate the timer
> > event arriving too early, complications can arise when shifting the
> > interrupt too far, e.g. vmx.flat/interrupt in kvm-unit-tests will fail
> > when its "host" exits on interrupts (because the INTR is injected before
> > the gets executes STI+HLT).  Arguably the unit test is "broken" in the
> > sense that delaying the timer interrupt by 1ms doesn't technically
> > guarantee the interrupt will arrive after STI+HLT, but it's a reasonable
> > assumption that KVM should support.
> > 
> > Furthermore, an unbounded advancement also effectively unbounds the time
> > spent busy waiting, e.g. if the guest programs a timer with a very large
> > delay.
> > 
> > Arguably the advancement logic could simply be disabled when running as
> > L1, but KVM needs to bound the advancement time regardless, e.g. if the
> > TSC is unstable and the calculations get wildly out of whack.  And
> > allowing the advancement when running as L1 is a good stress test of
> > sorts for the logic.
> > 
> > Cc: Liran Alon <liran.alon@oracle.com>
> > Cc: Wanpeng Li <wanpengli@tencent.com>
> > Fixes: 3b8a5df6c4dc6 ("KVM: LAPIC: Tune lapic_timer_advance_ns automatically")
> > Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> > ---
> >  arch/x86/kvm/lapic.c | 5 +++++
> >  1 file changed, 5 insertions(+)
> > 
> > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > index 9bf70cf84564..92446cba9b24 100644
> > --- a/arch/x86/kvm/lapic.c
> > +++ b/arch/x86/kvm/lapic.c
> > @@ -74,6 +74,7 @@ static bool lapic_timer_advance_adjust_done = false;
> >  #define LAPIC_TIMER_ADVANCE_ADJUST_DONE 100
> >  /* step-by-step approximation to mitigate fluctuation */
> >  #define LAPIC_TIMER_ADVANCE_ADJUST_STEP 8
> > +#define LAPIC_TIMER_ADVANCE_MAX_NS	5000
> >  
> >  static inline int apic_test_vector(int vec, void *bitmap)
> >  {
> > @@ -1522,6 +1523,10 @@ void wait_lapic_expire(struct kvm_vcpu *vcpu)
> >  		}
> >  		if (abs(guest_tsc - tsc_deadline) < LAPIC_TIMER_ADVANCE_ADJUST_DONE)
> >  			lapic_timer_advance_adjust_done = true;
> > +		if (unlikely(lapic_timer_advance_ns > LAPIC_TIMER_ADVANCE_MAX_NS)) {
> > +			lapic_timer_advance_ns = LAPIC_TIMER_ADVANCE_MAX_NS;
> > +			lapic_timer_advance_adjust_done = true;
> > +		}
> 
> I would treat this case as "advancing the timer has failed miserably"
> and reset lapic_timer_advance_ns to 0.

As in, disable the feature, correct?  That works for me.
Paolo Bonzini April 17, 2019, 3 p.m. UTC | #3
On 17/04/19 16:34, Sean Christopherson wrote:
>>> @@ -1522,6 +1523,10 @@ void wait_lapic_expire(struct kvm_vcpu *vcpu)
>>>  		}
>>>  		if (abs(guest_tsc - tsc_deadline) < LAPIC_TIMER_ADVANCE_ADJUST_DONE)
>>>  			lapic_timer_advance_adjust_done = true;
>>> +		if (unlikely(lapic_timer_advance_ns > LAPIC_TIMER_ADVANCE_MAX_NS)) {
>>> +			lapic_timer_advance_ns = LAPIC_TIMER_ADVANCE_MAX_NS;
>>> +			lapic_timer_advance_adjust_done = true;
>>> +		}
>> I would treat this case as "advancing the timer has failed miserably"
>> and reset lapic_timer_advance_ns to 0.
> As in, disable the feature, correct?  That works for me.

Yes.

Paolo
diff mbox series

Patch

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 9bf70cf84564..92446cba9b24 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -74,6 +74,7 @@  static bool lapic_timer_advance_adjust_done = false;
 #define LAPIC_TIMER_ADVANCE_ADJUST_DONE 100
 /* step-by-step approximation to mitigate fluctuation */
 #define LAPIC_TIMER_ADVANCE_ADJUST_STEP 8
+#define LAPIC_TIMER_ADVANCE_MAX_NS	5000
 
 static inline int apic_test_vector(int vec, void *bitmap)
 {
@@ -1522,6 +1523,10 @@  void wait_lapic_expire(struct kvm_vcpu *vcpu)
 		}
 		if (abs(guest_tsc - tsc_deadline) < LAPIC_TIMER_ADVANCE_ADJUST_DONE)
 			lapic_timer_advance_adjust_done = true;
+		if (unlikely(lapic_timer_advance_ns > LAPIC_TIMER_ADVANCE_MAX_NS)) {
+			lapic_timer_advance_ns = LAPIC_TIMER_ADVANCE_MAX_NS;
+			lapic_timer_advance_adjust_done = true;
+		}
 	}
 }