Message ID | 1565329531-12327-1-git-send-email-wanpengli@tencent.com (mailing list archive) |
---|---|
State | New, archived |
Series | KVM: LAPIC: Periodically revaluate appropriate lapic_timer_advance_ns |

On 09/08/19 07:45, Wanpeng Li wrote:
> From: Wanpeng Li <wanpengli@tencent.com>
>
> Even for realtime CPUs, cache line bounces, frequency scaling, the presence
> of higher-priority RT tasks, etc. can cause different responses. These
> interferences should be considered, so periodically re-evaluate whether
> or not the lapic_timer_advance_ns value is the best: do nothing if it is,
> otherwise recalculate it.

How much fluctuation do you observe between different runs?

Paolo

On Fri, 9 Aug 2019 at 18:24, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 09/08/19 07:45, Wanpeng Li wrote:
> > From: Wanpeng Li <wanpengli@tencent.com>
> >
> > Even for realtime CPUs, cache line bounces, frequency scaling, the presence
> > of higher-priority RT tasks, etc. can cause different responses. These
> > interferences should be considered, so periodically re-evaluate whether
> > or not the lapic_timer_advance_ns value is the best: do nothing if it is,
> > otherwise recalculate it.
>
> How much fluctuation do you observe between different runs?

Sometimes it can be ~1000 cycles after converting to guest TSC frequency.

Regards,
Wanpeng Li

On 12/08/19 11:06, Wanpeng Li wrote:
> On Fri, 9 Aug 2019 at 18:24, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>
>> On 09/08/19 07:45, Wanpeng Li wrote:
>>> From: Wanpeng Li <wanpengli@tencent.com>
>>>
>>> Even for realtime CPUs, cache line bounces, frequency scaling, the presence
>>> of higher-priority RT tasks, etc. can cause different responses. These
>>> interferences should be considered, so periodically re-evaluate whether
>>> or not the lapic_timer_advance_ns value is the best: do nothing if it is,
>>> otherwise recalculate it.
>>
>> How much fluctuation do you observe between different runs?
>
> Sometimes it can be ~1000 cycles after converting to guest TSC frequency.

Hmm, I wonder if we need some kind of continuous smoothing.  Something like

	if (abs(advance_expire_delta) < LAPIC_TIMER_ADVANCE_ADJUST_DONE) {
		/* no update for random fluctuations */
		return;
	}

	if (unlikely(timer_advance_ns > 5000))
		timer_advance_ns = LAPIC_TIMER_ADVANCE_ADJUST_INIT;
	apic->lapic_timer.timer_advance_ns = timer_advance_ns;

and removing all the timer_advance_adjust_done stuff.  What do you think?

Paolo
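
For readers following along, the continuous-smoothing idea suggested above can be modelled outside the kernel roughly as follows. This is only a sketch under stated assumptions, not the code that was proposed or merged: `struct timer_model`, `adjust_continuous()` and the `MODEL_*` names are hypothetical, the value 100 for the "done" threshold is assumed, and the cycles-to-nanoseconds step (`delta * 1000000 / tsc_khz`, clamped to 1/8 of the current value) is modelled on the "too early / too late" code around the hunk in this patch.

```c
#include <stdint.h>
#include <stdlib.h>

/* Values mirroring the patch context; MODEL_ADJUST_DONE is an assumption. */
#define MODEL_ADJUST_INIT 1000u  /* ns, as LAPIC_TIMER_ADVANCE_ADJUST_INIT */
#define MODEL_ADJUST_STEP 8u     /* as LAPIC_TIMER_ADVANCE_ADJUST_STEP */
#define MODEL_ADJUST_DONE 100    /* guest TSC cycles, assumed threshold */
#define MODEL_ADVANCE_MAX 5000u  /* ns, the cap used in the suggestion */

/* Hypothetical stand-in for the per-vCPU timer state. */
struct timer_model {
	uint32_t timer_advance_ns;
};

/*
 * delta_cycles: observed expiry error in guest TSC cycles
 *               (negative = too early, positive = too late).
 * tsc_khz:      guest TSC frequency in kHz, assumed non-zero.
 */
static void adjust_continuous(struct timer_model *t, int64_t delta_cycles,
			      uint32_t tsc_khz)
{
	uint32_t adv = t->timer_advance_ns;
	uint64_t ns;
	uint32_t step;

	/* No update for random fluctuations below the threshold. */
	if (llabs(delta_cycles) < MODEL_ADJUST_DONE)
		return;

	/* Convert the error to ns and limit each step to adv / 8. */
	ns = (uint64_t)llabs(delta_cycles) * 1000000ULL / tsc_khz;
	step = adv / MODEL_ADJUST_STEP;
	if (ns < step)
		step = (uint32_t)ns;

	if (delta_cycles < 0)
		adv -= step;	/* fired too early: advance less */
	else
		adv += step;	/* fired too late: advance more */

	/* Runaway value: fall back to the initial estimate. */
	if (adv > MODEL_ADVANCE_MAX)
		adv = MODEL_ADJUST_INIT;

	t->timer_advance_ns = adv;
}
```

With a hypothetical 2 GHz guest TSC (`tsc_khz = 2000000`), an error of 4000 cycles is 2000 ns, so a starting value of 1000 ns moves by the clamped step of 125 ns rather than by the full error.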

On Wed, 14 Aug 2019 at 20:50, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 12/08/19 11:06, Wanpeng Li wrote:
> > On Fri, 9 Aug 2019 at 18:24, Paolo Bonzini <pbonzini@redhat.com> wrote:
> >>
> >> On 09/08/19 07:45, Wanpeng Li wrote:
> >>> From: Wanpeng Li <wanpengli@tencent.com>
> >>>
> >>> Even for realtime CPUs, cache line bounces, frequency scaling, the presence
> >>> of higher-priority RT tasks, etc. can cause different responses. These
> >>> interferences should be considered, so periodically re-evaluate whether
> >>> or not the lapic_timer_advance_ns value is the best: do nothing if it is,
> >>> otherwise recalculate it.
> >>
> >> How much fluctuation do you observe between different runs?
> >
> > Sometimes it can be ~1000 cycles after converting to guest TSC frequency.
>
> Hmm, I wonder if we need some kind of continuous smoothing.  Something like

Actually this can fluctuate drastically, rather than smoothly, during
testing (running a Linux guest instead of kvm-unit-tests).

>
>         if (abs(advance_expire_delta) < LAPIC_TIMER_ADVANCE_ADJUST_DONE) {
>                 /* no update for random fluctuations */
>                 return;
>         }
>
>         if (unlikely(timer_advance_ns > 5000))
>                 timer_advance_ns = LAPIC_TIMER_ADVANCE_ADJUST_INIT;
>         apic->lapic_timer.timer_advance_ns = timer_advance_ns;
>
> and removing all the timer_advance_adjust_done stuff.  What do you think?

I just sent out v2, which periodically re-evaluates and takes the minimal
conservative value from these re-evaluation points. Please have a look. :)

Regards,
Wanpeng Li

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index df5cd07..8b62008 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -69,6 +69,7 @@
 #define LAPIC_TIMER_ADVANCE_ADJUST_INIT 1000
 /* step-by-step approximation to mitigate fluctuation */
 #define LAPIC_TIMER_ADVANCE_ADJUST_STEP 8
+#define LAPIC_TIMER_ADVANCE_RECALC_PERIOD (600 * HZ)
 
 static inline int apic_test_vector(int vec, void *bitmap)
 {
@@ -1484,6 +1485,17 @@ static inline void adjust_lapic_timer_advance(struct kvm_vcpu *vcpu,
 	u32 timer_advance_ns = apic->lapic_timer.timer_advance_ns;
 	u64 ns;
 
+	/* periodic revaluate */
+	if (unlikely(apic->lapic_timer.timer_advance_adjust_done)) {
+		apic->lapic_timer.recalc_timer_advance_ns = jiffies +
+			LAPIC_TIMER_ADVANCE_RECALC_PERIOD;
+		if (abs(advance_expire_delta) > LAPIC_TIMER_ADVANCE_ADJUST_DONE) {
+			timer_advance_ns = LAPIC_TIMER_ADVANCE_ADJUST_INIT;
+			apic->lapic_timer.timer_advance_adjust_done = false;
+		} else
+			return;
+	}
+
 	/* too early */
 	if (advance_expire_delta < 0) {
 		ns = -advance_expire_delta * 1000000ULL;
@@ -1523,7 +1535,8 @@ static void __kvm_wait_lapic_expire(struct kvm_vcpu *vcpu)
 	if (guest_tsc < tsc_deadline)
 		__wait_lapic_expire(vcpu, tsc_deadline - guest_tsc);
 
-	if (unlikely(!apic->lapic_timer.timer_advance_adjust_done))
+	if (unlikely(!apic->lapic_timer.timer_advance_adjust_done) ||
+	    time_before(apic->lapic_timer.recalc_timer_advance_ns, jiffies))
 		adjust_lapic_timer_advance(vcpu, apic->lapic_timer.advance_expire_delta);
 }
 
@@ -2301,6 +2314,7 @@ int kvm_create_lapic(struct kvm_vcpu *vcpu, int timer_advance_ns)
 	if (timer_advance_ns == -1) {
 		apic->lapic_timer.timer_advance_ns = LAPIC_TIMER_ADVANCE_ADJUST_INIT;
 		apic->lapic_timer.timer_advance_adjust_done = false;
+		apic->lapic_timer.recalc_timer_advance_ns = jiffies;
 	} else {
 		apic->lapic_timer.timer_advance_ns = timer_advance_ns;
 		apic->lapic_timer.timer_advance_adjust_done = true;
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 50053d2..31ced36 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -31,6 +31,7 @@ struct kvm_timer {
 	u32 timer_mode_mask;
 	u64 tscdeadline;
 	u64 expired_tscdeadline;
+	unsigned long recalc_timer_advance_ns;
 	u32 timer_advance_ns;
 	s64 advance_expire_delta;
 	atomic_t pending;			/* accumulated triggered timers */
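
The gating logic in the patch above (re-run the adjustment either while tuning is unfinished or once the recalculation deadline has passed) may be easier to follow as a small userspace model. This is a sketch only: `struct advance_state`, `needs_adjust()`, `periodic_recheck()` and the `MODEL_*` constants are hypothetical names, `MODEL_HZ = 250` and the threshold of 100 cycles are assumed values, and `model_time_before()` imitates the wrap-safe comparison that the kernel's `time_before()` performs on `jiffies`.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define MODEL_HZ            250UL              /* assumed tick rate */
#define MODEL_RECALC_PERIOD (600 * MODEL_HZ)   /* 600 s, as in the patch */
#define MODEL_ADJUST_INIT   1000u              /* ns */
#define MODEL_ADJUST_DONE   100                /* cycles, assumed threshold */

/* Hypothetical stand-in for the fields the patch touches. */
struct advance_state {
	unsigned long recalc_deadline;  /* models recalc_timer_advance_ns */
	bool adjust_done;               /* models timer_advance_adjust_done */
	uint32_t timer_advance_ns;
};

/*
 * True if tick count a is before b, even across counter wrap-around.
 * Relies on the usual two's-complement conversion to signed long.
 */
static bool model_time_before(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

/* Mirrors the condition added to __kvm_wait_lapic_expire(). */
static bool needs_adjust(const struct advance_state *s, unsigned long now)
{
	return !s->adjust_done || model_time_before(s->recalc_deadline, now);
}

/*
 * Mirrors the block added at the top of adjust_lapic_timer_advance().
 * Returns true if the caller should go on to the step-by-step adjustment.
 */
static bool periodic_recheck(struct advance_state *s, unsigned long now,
			     int64_t delta_cycles)
{
	if (!s->adjust_done)
		return true;		/* still in the initial tuning phase */

	/* Schedule the next re-evaluation one period from now. */
	s->recalc_deadline = now + MODEL_RECALC_PERIOD;

	if (llabs(delta_cycles) > MODEL_ADJUST_DONE) {
		/* Current value no longer fits: restart tuning from INIT. */
		s->timer_advance_ns = MODEL_ADJUST_INIT;
		s->adjust_done = false;
		return true;
	}
	return false;			/* value is still good, do nothing */
}
```

One consequence visible in the model: when the error exceeds the threshold at a re-evaluation point, tuning restarts from the initial 1000 ns rather than from the previously converged value.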