diff mbox series

[1/7] KVM: lapic: Hard cap the auto-calculated timer advancement

Message ID 20190412201834.10831-2-sean.j.christopherson@intel.com (mailing list archive)
State New, archived
Headers show
Series KVM: lapic: Fix a variety of timer adv issues | expand

Commit Message

Sean Christopherson April 12, 2019, 8:18 p.m. UTC
To minimize the latency of timer interrupts as observed by the guest,
KVM adjusts the values it programs into the host timers to account for
the host's overhead of programming and handling the timer event.  Now
that the timer advancement is automatically tuned during runtime, it's
effectively unbounded by default, e.g. if KVM is running as L1 the
advancement can measure in hundreds of milliseconds.

Place a somewhat arbitrary hard cap of 5000ns on the auto-calculated
advancement, as large advancements can break reasonable assumptions of
the guest, e.g. that a timer configured to fire after 1ms won't arrive
on the next instruction.  Although KVM busy waits to mitigate the timer
event arriving too early, complications can arise when shifting the
interrupt too far, e.g. vmx.flat/interrupt in kvm-unit-tests will fail
when its "host" exits on interrupts (because the INTR is injected before
the gets executes STI+HLT).  Arguably the unit test is "broken" in the
sense that delaying the timer interrupt by 1ms doesn't technically
guarantee the interrupt will arrive after STI+HLT, but it's a reasonable
assumption that KVM should support.

Furthermore, an unbounded advancement also effectively unbounds the time
spent busy waiting, e.g. if the guest programs a timer with a very large
delay.

Arguably the advancement logic could simply be disabled when running as
L1, but KVM needs to bound the advancement time regardless, e.g. if the
TSC is unstable and the calculations get wildly out of whack.  And
allowing the advancement when running as L1 is a good stress test of
sorts for the logic.

Cc: Liran Alon <liran.alon@oracle.com>
Cc: Wanpeng Li <wanpengli@tencent.com>
Fixes: 3b8a5df6c4dc6 ("KVM: LAPIC: Tune lapic_timer_advance_ns automatically")
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/kvm/lapic.c | 5 +++++
 1 file changed, 5 insertions(+)

Comments

Liran Alon April 14, 2019, 10:22 a.m. UTC | #1
> On 12 Apr 2019, at 23:18, Sean Christopherson <sean.j.christopherson@intel.com> wrote:
> 
> To minimize the latency of timer interrupts as observed by the guest,
> KVM adjusts the values it programs into the host timers to account for
> the host's overhead of programming and handling the timer event.  Now
> that the timer advancement is automatically tuned during runtime, it's
> effectively unbounded by default, e.g. if KVM is running as L1 the
> advancement can measure in hundreds of milliseconds.
> 
> Place a somewhat arbitrary hard cap of 5000ns on the auto-calculated
> advancement, as large advancements can break reasonable assumptions of
> the guest, e.g. that a timer configured to fire after 1ms won't arrive
> on the next instruction.  Although KVM busy waits to mitigate the timer
> event arriving too early, complications can arise when shifting the
> interrupt too far, e.g. vmx.flat/interrupt in kvm-unit-tests will fail
> when its "host" exits on interrupts (because the INTR is injected before
> the gets executes STI+HLT).  Arguably the unit test is "broken" in the
> sense that delaying the timer interrupt by 1ms doesn't technically
> guarantee the interrupt will arrive after STI+HLT, but it's a reasonable
> assumption that KVM should support.
> 
> Furthermore, an unbounded advancement also effectively unbounds the time
> spent busy waiting, e.g. if the guest programs a timer with a very large
> delay.
> 
> Arguably the advancement logic could simply be disabled when running as
> L1, but KVM needs to bound the advancement time regardless, e.g. if the
> TSC is unstable and the calculations get wildly out of whack.  And
> allowing the advancement when running as L1 is a good stress test of
> sorts for the logic.
> 
> Cc: Liran Alon <liran.alon@oracle.com>
> Cc: Wanpeng Li <wanpengli@tencent.com>
> Fixes: 3b8a5df6c4dc6 ("KVM: LAPIC: Tune lapic_timer_advance_ns automatically")
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>

I completely agree with the above. The below change also seems correct to me.
Reviewed-by: Liran Alon <liran.alon@oracle.com>

Having said that, I think we are starting to have too many hard-coded parameters for the algorithm adjusting lapic_timer_advance_ns.
(LAPIC_TIMER_ADVANCE_ADJUST_DONE, LAPIC_TIMER_ADVANCE_ADJUST_STEP and now LAPIC_TIMER_ADVANCE_MAX_NS).
We should consider changing them to KVM module parameters.

In addition, this patch led me wondering what will happen when start_sw_tscdeadline() will sub lapic_timer_advance_ns from “expire” such that (expire < now).
I think that because apic->lapic_timer.timer is init with HRTIMER_MODE_ABS_PINNED, it is not allowed to run in softirq and therefore will never expire.
Thus, I think that start_sw_tscdeadline() should check if (expire < now) after sub of lapic_timer_advance_ns and if so call apic_timer_expired() instead of hrtimer_start().
I can submit such a patch if you agree.

-Liran

> ---
> arch/x86/kvm/lapic.c | 5 +++++
> 1 file changed, 5 insertions(+)
> 
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 9bf70cf84564..92446cba9b24 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -74,6 +74,7 @@ static bool lapic_timer_advance_adjust_done = false;
> #define LAPIC_TIMER_ADVANCE_ADJUST_DONE 100
> /* step-by-step approximation to mitigate fluctuation */
> #define LAPIC_TIMER_ADVANCE_ADJUST_STEP 8
> +#define LAPIC_TIMER_ADVANCE_MAX_NS	5000
> 
> static inline int apic_test_vector(int vec, void *bitmap)
> {
> @@ -1522,6 +1523,10 @@ void wait_lapic_expire(struct kvm_vcpu *vcpu)
> 		}
> 		if (abs(guest_tsc - tsc_deadline) < LAPIC_TIMER_ADVANCE_ADJUST_DONE)
> 			lapic_timer_advance_adjust_done = true;
> +		if (unlikely(lapic_timer_advance_ns > LAPIC_TIMER_ADVANCE_MAX_NS)) {
> +			lapic_timer_advance_ns = LAPIC_TIMER_ADVANCE_MAX_NS;
> +			lapic_timer_advance_adjust_done = true;
> +		}
> 	}
> }
> 
> -- 
> 2.21.0
>
diff mbox series

Patch

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 9bf70cf84564..92446cba9b24 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -74,6 +74,7 @@  static bool lapic_timer_advance_adjust_done = false;
 #define LAPIC_TIMER_ADVANCE_ADJUST_DONE 100
 /* step-by-step approximation to mitigate fluctuation */
 #define LAPIC_TIMER_ADVANCE_ADJUST_STEP 8
+#define LAPIC_TIMER_ADVANCE_MAX_NS	5000
 
 static inline int apic_test_vector(int vec, void *bitmap)
 {
@@ -1522,6 +1523,10 @@  void wait_lapic_expire(struct kvm_vcpu *vcpu)
 		}
 		if (abs(guest_tsc - tsc_deadline) < LAPIC_TIMER_ADVANCE_ADJUST_DONE)
 			lapic_timer_advance_adjust_done = true;
+		if (unlikely(lapic_timer_advance_ns > LAPIC_TIMER_ADVANCE_MAX_NS)) {
+			lapic_timer_advance_ns = LAPIC_TIMER_ADVANCE_MAX_NS;
+			lapic_timer_advance_adjust_done = true;
+		}
 	}
 }