Message ID | 20190412201834.10831-2-sean.j.christopherson@intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | KVM: lapic: Fix a variety of timer adv issues | expand |
> On 12 Apr 2019, at 23:18, Sean Christopherson <sean.j.christopherson@intel.com> wrote: > > To minimize the latency of timer interrupts as observed by the guest, > KVM adjusts the values it programs into the host timers to account for > the host's overhead of programming and handling the timer event. Now > that the timer advancement is automatically tuned during runtime, it's > effectively unbounded by default, e.g. if KVM is running as L1 the > advancement can measure in hundreds of milliseconds. > > Place a somewhat arbitrary hard cap of 5000ns on the auto-calculated > advancement, as large advancements can break reasonable assumptions of > the guest, e.g. that a timer configured to fire after 1ms won't arrive > on the next instruction. Although KVM busy waits to mitigate the timer > event arriving too early, complications can arise when shifting the > interrupt too far, e.g. vmx.flat/interrupt in kvm-unit-tests will fail > when its "host" exits on interrupts (because the INTR is injected before > the gets executes STI+HLT). Arguably the unit test is "broken" in the > sense that delaying the timer interrupt by 1ms doesn't technically > guarantee the interrupt will arrive after STI+HLT, but it's a reasonable > assumption that KVM should support. > > Furthermore, an unbounded advancement also effectively unbounds the time > spent busy waiting, e.g. if the guest programs a timer with a very large > delay. > > Arguably the advancement logic could simply be disabled when running as > L1, but KVM needs to bound the advancement time regardless, e.g. if the > TSC is unstable and the calculations get wildly out of whack. And > allowing the advancement when running as L1 is a good stress test of > sorts for the logic. > > Cc: Liran Alon <liran.alon@oracle.com> > Cc: Wanpeng Li <wanpengli@tencent.com> > Fixes: 3b8a5df6c4dc6 ("KVM: LAPIC: Tune lapic_timer_advance_ns automatically") > Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> I completely agree with the above. The below change also seems correct to me. Reviewed-by: Liran Alon <liran.alon@oracle.com> Having said that, I think we are starting to have too many hard-coded parameters for the algorithm adjusting lapic_timer_advance_ns. (LAPIC_TIMER_ADVANCE_ADJUST_DONE, LAPIC_TIMER_ADVANCE_ADJUST_STEP and now LAPIC_TIMER_ADVANCE_MAX_NS). We should consider changing them to KVM module parameters. In addition, this patch led me wondering what will happen when start_sw_tscdeadline() will sub lapic_timer_advance_ns from “expire” such that (expire < now). I think that because apic->lapic_timer.timer is init with HRTIMER_MODE_ABS_PINNED, it is not allowed to run in softirq and therefore will never expire. Thus, I think that start_sw_tscdeadline() should check if (expire < now) after sub of lapic_timer_advance_ns and if so call apic_timer_expired() instead of hrtimer_start(). I can submit such a patch if you agree. -Liran > --- > arch/x86/kvm/lapic.c | 5 +++++ > 1 file changed, 5 insertions(+) > > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c > index 9bf70cf84564..92446cba9b24 100644 > --- a/arch/x86/kvm/lapic.c > +++ b/arch/x86/kvm/lapic.c > @@ -74,6 +74,7 @@ static bool lapic_timer_advance_adjust_done = false; > #define LAPIC_TIMER_ADVANCE_ADJUST_DONE 100 > /* step-by-step approximation to mitigate fluctuation */ > #define LAPIC_TIMER_ADVANCE_ADJUST_STEP 8 > +#define LAPIC_TIMER_ADVANCE_MAX_NS 5000 > > static inline int apic_test_vector(int vec, void *bitmap) > { > @@ -1522,6 +1523,10 @@ void wait_lapic_expire(struct kvm_vcpu *vcpu) > } > if (abs(guest_tsc - tsc_deadline) < LAPIC_TIMER_ADVANCE_ADJUST_DONE) > lapic_timer_advance_adjust_done = true; > + if (unlikely(lapic_timer_advance_ns > LAPIC_TIMER_ADVANCE_MAX_NS)) { > + lapic_timer_advance_ns = LAPIC_TIMER_ADVANCE_MAX_NS; > + lapic_timer_advance_adjust_done = true; > + } > } > } > > -- > 2.21.0 >
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index 9bf70cf84564..92446cba9b24 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -74,6 +74,7 @@ static bool lapic_timer_advance_adjust_done = false; #define LAPIC_TIMER_ADVANCE_ADJUST_DONE 100 /* step-by-step approximation to mitigate fluctuation */ #define LAPIC_TIMER_ADVANCE_ADJUST_STEP 8 +#define LAPIC_TIMER_ADVANCE_MAX_NS 5000 static inline int apic_test_vector(int vec, void *bitmap) { @@ -1522,6 +1523,10 @@ void wait_lapic_expire(struct kvm_vcpu *vcpu) } if (abs(guest_tsc - tsc_deadline) < LAPIC_TIMER_ADVANCE_ADJUST_DONE) lapic_timer_advance_adjust_done = true; + if (unlikely(lapic_timer_advance_ns > LAPIC_TIMER_ADVANCE_MAX_NS)) { + lapic_timer_advance_ns = LAPIC_TIMER_ADVANCE_MAX_NS; + lapic_timer_advance_adjust_done = true; + } } }
To minimize the latency of timer interrupts as observed by the guest, KVM adjusts the values it programs into the host timers to account for the host's overhead of programming and handling the timer event. Now that the timer advancement is automatically tuned during runtime, it's effectively unbounded by default, e.g. if KVM is running as L1 the advancement can measure in hundreds of milliseconds. Place a somewhat arbitrary hard cap of 5000ns on the auto-calculated advancement, as large advancements can break reasonable assumptions of the guest, e.g. that a timer configured to fire after 1ms won't arrive on the next instruction. Although KVM busy waits to mitigate the timer event arriving too early, complications can arise when shifting the interrupt too far, e.g. vmx.flat/interrupt in kvm-unit-tests will fail when its "host" exits on interrupts (because the INTR is injected before the gets executes STI+HLT). Arguably the unit test is "broken" in the sense that delaying the timer interrupt by 1ms doesn't technically guarantee the interrupt will arrive after STI+HLT, but it's a reasonable assumption that KVM should support. Furthermore, an unbounded advancement also effectively unbounds the time spent busy waiting, e.g. if the guest programs a timer with a very large delay. Arguably the advancement logic could simply be disabled when running as L1, but KVM needs to bound the advancement time regardless, e.g. if the TSC is unstable and the calculations get wildly out of whack. And allowing the advancement when running as L1 is a good stress test of sorts for the logic. Cc: Liran Alon <liran.alon@oracle.com> Cc: Wanpeng Li <wanpengli@tencent.com> Fixes: 3b8a5df6c4dc6 ("KVM: LAPIC: Tune lapic_timer_advance_ns automatically") Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> --- arch/x86/kvm/lapic.c | 5 +++++ 1 file changed, 5 insertions(+)