KVM: LAPIC: Periodically revaluate appropriate lapic_timer_advance_ns

Message ID 1565329531-12327-1-git-send-email-wanpengli@tencent.com (mailing list archive)
State New, archived

Commit Message

Wanpeng Li Aug. 9, 2019, 5:45 a.m. UTC
From: Wanpeng Li <wanpengli@tencent.com>

Even on realtime CPUs, cache line bounces, frequency scaling, the presence
of higher-priority RT tasks, etc. can change the timer response. Take these
interferences into account and periodically re-evaluate whether the
lapic_timer_advance_ns value is still the best one: do nothing if it is,
otherwise recalculate it.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
---
 arch/x86/kvm/lapic.c | 16 +++++++++++++++-
 arch/x86/kvm/lapic.h |  1 +
 2 files changed, 16 insertions(+), 1 deletion(-)
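
The re-evaluation gate described in the commit message can be sketched in user-space C roughly as follows. This is a minimal sketch, not the kernel code: `struct timer_state`, `periodic_reevaluate()`, the tick units, and the value 100 for the "done" threshold are assumptions made for illustration. Only the 1000 ns initial value is taken from the patch context. The sketch also folds the caller's "is the period over?" check into the same function, whereas the patch keeps that check in the caller.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define ADVANCE_ADJUST_INIT 1000 /* ns, initial value shown in the patch context */
#define ADVANCE_ADJUST_DONE 100  /* cycles; assumed threshold for this sketch */
#define RECALC_PERIOD_TICKS 600  /* hypothetical tick units, stands in for 600 * HZ */

/* Hypothetical stand-in for the relevant fields of struct kvm_timer. */
struct timer_state {
	uint32_t timer_advance_ns;
	bool adjust_done;
	unsigned long recalc_at; /* tick at which the next re-evaluation is due */
};

/*
 * Returns true when the caller should run (or keep running) the tuning
 * steps, false when the current timer_advance_ns should be left alone.
 */
static bool periodic_reevaluate(struct timer_state *t, int64_t delta_cycles,
				unsigned long now)
{
	if (!t->adjust_done)
		return true; /* initial tuning is still in progress */

	/* Signed difference so the comparison survives counter wraparound. */
	if ((long)(t->recalc_at - now) >= 0)
		return false; /* re-evaluation period has not elapsed yet */

	t->recalc_at = now + RECALC_PERIOD_TICKS;

	if (llabs(delta_cycles) <= ADVANCE_ADJUST_DONE)
		return false; /* tuned value is still good enough */

	/* Value has drifted: restart tuning from the initial value. */
	t->timer_advance_ns = ADVANCE_ADJUST_INIT;
	t->adjust_done = false;
	return true;
}
```

With this shape, a vCPU whose delta stays within the threshold pays only one comparison per re-evaluation period, and tuning restarts only when the measured delta has drifted.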

Comments

Paolo Bonzini Aug. 9, 2019, 10:24 a.m. UTC | #1
On 09/08/19 07:45, Wanpeng Li wrote:
> From: Wanpeng Li <wanpengli@tencent.com>
> 
> Even on realtime CPUs, cache line bounces, frequency scaling, the presence
> of higher-priority RT tasks, etc. can change the timer response. Take these
> interferences into account and periodically re-evaluate whether the
> lapic_timer_advance_ns value is still the best one: do nothing if it is,
> otherwise recalculate it.

How much fluctuation do you observe between different runs?

Paolo
Wanpeng Li Aug. 12, 2019, 9:06 a.m. UTC | #2
On Fri, 9 Aug 2019 at 18:24, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 09/08/19 07:45, Wanpeng Li wrote:
> > From: Wanpeng Li <wanpengli@tencent.com>
> >
> > Even on realtime CPUs, cache line bounces, frequency scaling, the presence
> > of higher-priority RT tasks, etc. can change the timer response. Take these
> > interferences into account and periodically re-evaluate whether the
> > lapic_timer_advance_ns value is still the best one: do nothing if it is,
> > otherwise recalculate it.
>
> How much fluctuation do you observe between different runs?

It can sometimes reach ~1000 cycles after converting to the guest TSC frequency.

Regards,
Wanpeng Li
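
To put the ~1000 cycle figure in time units, the conversion between TSC cycles and nanoseconds can be sketched as below. The helper name `tsc_cycles_to_ns()` and the example frequencies are assumptions for illustration; the formula itself is just ns = cycles * 10^6 / tsc_khz, the same scaling the `* 1000000ULL` in the patch context points to.

```c
#include <stdint.h>

/*
 * Convert a cycle count to nanoseconds for a TSC running at tsc_khz kHz.
 * Returns 0 for a zero frequency instead of dividing by zero.
 */
static uint64_t tsc_cycles_to_ns(uint64_t cycles, uint32_t tsc_khz)
{
	if (tsc_khz == 0)
		return 0;
	return cycles * 1000000ULL / tsc_khz;
}
```

For example, on a hypothetical guest with a 2.5 GHz TSC (2500000 kHz), 1000 cycles correspond to 400 ns, which is a large fraction of the 1000 ns initial advance value.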
Paolo Bonzini Aug. 14, 2019, 12:50 p.m. UTC | #3
On 12/08/19 11:06, Wanpeng Li wrote:
> On Fri, 9 Aug 2019 at 18:24, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>
>> On 09/08/19 07:45, Wanpeng Li wrote:
>>> From: Wanpeng Li <wanpengli@tencent.com>
>>>
>>> Even on realtime CPUs, cache line bounces, frequency scaling, the presence
>>> of higher-priority RT tasks, etc. can change the timer response. Take these
>>> interferences into account and periodically re-evaluate whether the
>>> lapic_timer_advance_ns value is still the best one: do nothing if it is,
>>> otherwise recalculate it.
>>
>> How much fluctuation do you observe between different runs?
> 
> It can sometimes reach ~1000 cycles after converting to the guest TSC frequency.

Hmm, I wonder if we need some kind of continuous smoothing.  Something like

        if (abs(advance_expire_delta) < LAPIC_TIMER_ADVANCE_ADJUST_DONE) {
                /* no update for random fluctuations */
                return;
        }

        if (unlikely(timer_advance_ns > 5000))
                timer_advance_ns = LAPIC_TIMER_ADVANCE_ADJUST_INIT;
        apic->lapic_timer.timer_advance_ns = timer_advance_ns;

and removing all the timer_advance_adjust_done stuff.  What do you think?

Paolo
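
The continuous variant suggested above could look roughly like the following user-space sketch. It is not the proposed kernel change: `adjust_continuously()` and `min_u32()` are hypothetical names, the threshold value 100 is an assumption, and the step computation is modeled on the "step-by-step approximation" comment in the patch context (step divisor 8). The 5000 ns cap and the 1000 ns reset value come from the snippet in this message.

```c
#include <stdint.h>
#include <stdlib.h>

#define ADJUST_DONE    100  /* cycles; assumed threshold */
#define ADJUST_INIT    1000 /* ns */
#define ADJUST_STEP    8    /* limits each correction to 1/8 of the current value */
#define ADVANCE_MAX_NS 5000 /* ns; cap used in the snippet above */

static uint32_t min_u32(uint32_t a, uint32_t b)
{
	return a < b ? a : b;
}

/* Returns the new advance value; no "done" flag is kept anywhere. */
static uint32_t adjust_continuously(uint32_t advance_ns, int64_t delta_cycles,
				    uint32_t tsc_khz)
{
	uint64_t ns;
	uint32_t step;

	/* No update for random fluctuations (or an unknown frequency). */
	if (tsc_khz == 0 || llabs(delta_cycles) < ADJUST_DONE)
		return advance_ns;

	/* Convert the error to ns and limit the size of one correction. */
	ns = (uint64_t)llabs(delta_cycles) * 1000000ULL / tsc_khz;
	step = min_u32(ns > UINT32_MAX ? UINT32_MAX : (uint32_t)ns,
		       advance_ns / ADJUST_STEP);

	if (delta_cycles < 0)
		advance_ns -= step; /* fired too early: advance less */
	else
		advance_ns += step; /* fired too late: advance more */

	/* Runaway value: fall back to the initial value. */
	if (advance_ns > ADVANCE_MAX_NS)
		advance_ns = ADJUST_INIT;

	return advance_ns;
}
```

Because every call either ignores a small delta or applies a bounded correction, this form needs neither a completion flag nor a re-evaluation deadline.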
Wanpeng Li Aug. 15, 2019, 4:04 a.m. UTC | #4
On Wed, 14 Aug 2019 at 20:50, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 12/08/19 11:06, Wanpeng Li wrote:
> > On Fri, 9 Aug 2019 at 18:24, Paolo Bonzini <pbonzini@redhat.com> wrote:
> >>
> >> On 09/08/19 07:45, Wanpeng Li wrote:
> >>> From: Wanpeng Li <wanpengli@tencent.com>
> >>>
> >>> Even on realtime CPUs, cache line bounces, frequency scaling, the presence
> >>> of higher-priority RT tasks, etc. can change the timer response. Take these
> >>> interferences into account and periodically re-evaluate whether the
> >>> lapic_timer_advance_ns value is still the best one: do nothing if it is,
> >>> otherwise recalculate it.
> >>
> >> How much fluctuation do you observe between different runs?
> >
> > It can sometimes reach ~1000 cycles after converting to the guest TSC frequency.
>
> Hmm, I wonder if we need some kind of continuous smoothing.  Something like

Actually, during testing (running a Linux guest instead of
kvm-unit-tests) the delta can fluctuate drastically rather than
drift smoothly.

>
>         if (abs(advance_expire_delta) < LAPIC_TIMER_ADVANCE_ADJUST_DONE) {
>                 /* no update for random fluctuations */
>                 return;
>         }
>
>         if (unlikely(timer_advance_ns > 5000))
>                 timer_advance_ns = LAPIC_TIMER_ADVANCE_ADJUST_INIT;
>         apic->lapic_timer.timer_advance_ns = timer_advance_ns;
>
> and removing all the timer_advance_adjust_done stuff.  What do you think?

I just sent out v2, which periodically re-evaluates and takes a minimal
conservative value from these re-evaluation points. Please have a look. :)

Regards,
Wanpeng Li

Patch

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index df5cd07..8b62008 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -69,6 +69,7 @@ 
 #define LAPIC_TIMER_ADVANCE_ADJUST_INIT 1000
 /* step-by-step approximation to mitigate fluctuation */
 #define LAPIC_TIMER_ADVANCE_ADJUST_STEP 8
+#define LAPIC_TIMER_ADVANCE_RECALC_PERIOD (600 * HZ)
 
 static inline int apic_test_vector(int vec, void *bitmap)
 {
@@ -1484,6 +1485,17 @@  static inline void adjust_lapic_timer_advance(struct kvm_vcpu *vcpu,
 	u32 timer_advance_ns = apic->lapic_timer.timer_advance_ns;
 	u64 ns;
 
+	/* periodically re-evaluate the tuned value */
+	if (unlikely(apic->lapic_timer.timer_advance_adjust_done)) {
+		apic->lapic_timer.recalc_timer_advance_ns = jiffies +
+			LAPIC_TIMER_ADVANCE_RECALC_PERIOD;
+		if (abs(advance_expire_delta) > LAPIC_TIMER_ADVANCE_ADJUST_DONE) {
+			timer_advance_ns = LAPIC_TIMER_ADVANCE_ADJUST_INIT;
+			apic->lapic_timer.timer_advance_adjust_done = false;
+		} else
+			return;
+	}
+
 	/* too early */
 	if (advance_expire_delta < 0) {
 		ns = -advance_expire_delta * 1000000ULL;
@@ -1523,7 +1535,8 @@  static void __kvm_wait_lapic_expire(struct kvm_vcpu *vcpu)
 	if (guest_tsc < tsc_deadline)
 		__wait_lapic_expire(vcpu, tsc_deadline - guest_tsc);
 
-	if (unlikely(!apic->lapic_timer.timer_advance_adjust_done))
+	if (unlikely(!apic->lapic_timer.timer_advance_adjust_done) ||
+		time_before(apic->lapic_timer.recalc_timer_advance_ns, jiffies))
 		adjust_lapic_timer_advance(vcpu, apic->lapic_timer.advance_expire_delta);
 }
 
@@ -2301,6 +2314,7 @@  int kvm_create_lapic(struct kvm_vcpu *vcpu, int timer_advance_ns)
 	if (timer_advance_ns == -1) {
 		apic->lapic_timer.timer_advance_ns = LAPIC_TIMER_ADVANCE_ADJUST_INIT;
 		apic->lapic_timer.timer_advance_adjust_done = false;
+		apic->lapic_timer.recalc_timer_advance_ns = jiffies;
 	} else {
 		apic->lapic_timer.timer_advance_ns = timer_advance_ns;
 		apic->lapic_timer.timer_advance_adjust_done = true;
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 50053d2..31ced36 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -31,6 +31,7 @@  struct kvm_timer {
 	u32 timer_mode_mask;
 	u64 tscdeadline;
 	u64 expired_tscdeadline;
+	unsigned long recalc_timer_advance_ns;
 	u32 timer_advance_ns;
 	s64 advance_expire_delta;
 	atomic_t pending;			/* accumulated triggered timers */
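
The new `recalc_timer_advance_ns` field holds a jiffies deadline despite its `_ns` suffix, and the patch compares it with `time_before()` rather than `<`. The user-space sketch below illustrates why a wraparound-safe comparison matters. `SKETCH_HZ`, `ticks_before()` and `next_recalc()` are hypothetical names for illustration; the signed-difference comparison mirrors how the kernel's `time_before()` works.

```c
#include <stdbool.h>
#include <limits.h>

#define SKETCH_HZ     250UL              /* hypothetical tick rate */
#define RECALC_PERIOD (600 * SKETCH_HZ)  /* 600 seconds' worth of ticks */

/* True if tick a lies before tick b, even across counter wraparound. */
static bool ticks_before(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

/* Deadline for the next re-evaluation; may wrap past ULONG_MAX. */
static unsigned long next_recalc(unsigned long now)
{
	return now + RECALC_PERIOD;
}
```

If the tick counter is close to `ULONG_MAX`, `next_recalc()` wraps to a small number: a plain `deadline < now` would report the deadline as already passed, while `ticks_before(deadline, now)` still reports it as lying in the future. Whatever the tick rate, `600 * HZ` ticks correspond to 600 seconds, so re-evaluation happens about every ten minutes.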