[v5,3/3] KVM: LAPIC: Apply change to TDCR right away to the timer

Message ID 1507214117-2899-1-git-send-email-wanpeng.li@hotmail.com (mailing list archive)
State New, archived

Commit Message

Wanpeng Li Oct. 5, 2017, 2:35 p.m. UTC
From: Wanpeng Li <wanpeng.li@hotmail.com>

The description in the Intel SDM of how the divide configuration
register is used: "The APIC timer frequency will be the processor's bus
clock or core crystal clock frequency divided by the value specified in
the divide configuration register."

Observation on bare metal shows that when the TDCR is changed, the TMCCT
does not change or make a big jump in value, but the rate at which it
counts down changes.

This patch updates the emulation of the APIC timer so that a change to the
divide configuration is reflected in the value of the counter and in
when the next interrupt is triggered.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
---
 arch/x86/kvm/lapic.c | 45 ++++++++++++++++++++++++++++++++-------------
 1 file changed, 32 insertions(+), 13 deletions(-)
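
The behaviour described in the commit message (the current count stays put and only the countdown rate changes) comes down to rescaling the time left until expiration by the ratio of the new divisor to the old one. The following is a minimal standalone sketch of that arithmetic, not the kernel code itself; the function name `rescale_remaining_ns` is hypothetical, and it assumes the remaining time is small enough that the multiplication does not overflow 64 bits.

```c
#include <stdint.h>

/*
 * Rescale the time left until the timer fires when the divisor changes.
 * The number of ticks left (TMCCT) is unchanged; each tick simply takes
 * new_divisor bus cycles instead of old_divisor.
 *
 * Hypothetical helper for illustration only.
 */
static uint64_t rescale_remaining_ns(uint64_t remaining_ns,
                                     uint32_t old_divisor,
                                     uint32_t new_divisor)
{
    /* A zero divisor is not a valid state; leave the value untouched. */
    if (old_divisor == 0)
        return remaining_ns;

    return remaining_ns * new_divisor / old_divisor;
}
```

For example, with 1000 ns left at divisor 2, switching to divisor 8 stretches the remaining time to 4000 ns, which mirrors the `delta * apic->divide_count / old_divisor` step in the patch.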

Comments

Radim Krčmář Oct. 5, 2017, 6:14 p.m. UTC | #1
2017-10-05 07:35-0700, Wanpeng Li:
> From: Wanpeng Li <wanpeng.li@hotmail.com>
> 
> The description in the Intel SDM of how the divide configuration
> register is used: "The APIC timer frequency will be the processor's bus
> clock or core crystal clock frequency divided by the value specified in
> the divide configuration register."
> 
> Observation of baremetal shown that when the TDCR is change, the TMCCT
> does not change or make a big jump in value, but the rate at which it
> count down change.
> 
> The patch update the emulation to APIC timer to so that a change to the
> divide configuration would be reflected in the value of the counter and
> when the next interrupt is triggered.
> 
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Radim Krčmář <rkrcmar@redhat.com>
> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
> ---
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> @@ -1474,9 +1474,24 @@ static bool set_target_expiration(struct kvm_lapic *apic)
>  		   ktime_to_ns(ktime_add_ns(now,
>  				apic->lapic_timer.period)));
>  
> +	delta = apic->lapic_timer.period;
> +	if (apic->divide_count != old_divisor) {

Hm, nothing should happen if the guest writes the same value to TDCR, but
we'll reset the timer.  (An extra argument would solve it, but maybe it
would be nicer to add a new function for updating the expiration.)

> +		remaining = ktime_sub(apic->lapic_timer.target_expiration, now);
> +		if (ktime_to_ns(remaining) < 0)
> +			remaining = 0;
> +		delta = mod_64(ktime_to_ns(remaining), apic->lapic_timer.period);
> +
> +		if (!delta)
> +			return false;
> +
> +		apic->lapic_timer.period = (u64)kvm_lapic_get_reg(apic, APIC_TMICT)
> +			* APIC_BUS_CYCLE_NS * apic->divide_count;

I'd prefer to apply the rate limiting (done earlier in this function) to
the period.  This version allows the guest to configure 128 times more
frequent interrupts in the host.
(And thinking about it, the version of [2/3] I proposed has a similar
 problem when switching from one-shot to periodic, only there it is
 unpredictably limited by the speed of KVM.)

Thanks.

> +		delta = delta * apic->divide_count / old_divisor;
> +	}
> +
>  	apic->lapic_timer.tscdeadline = kvm_read_l1_tsc(apic->vcpu, tscl) +
> -		nsec_to_cycles(apic->vcpu, apic->lapic_timer.period);
> -	apic->lapic_timer.target_expiration = ktime_add_ns(now, apic->lapic_timer.period);
> +		nsec_to_cycles(apic->vcpu, delta);
> +	apic->lapic_timer.target_expiration = ktime_add_ns(now, delta);
>  
>  	return true;
>  }
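
The first review comment in this reply suggests a separate function for updating the expiration that does nothing when the guest rewrites the same TDCR value. A rough sketch of that shape follows. It uses a simplified, hypothetical `struct toy_timer` rather than the real `struct kvm_lapic`, assumes a 1 ns bus cycle, and is not the code that was eventually merged.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified timer state; not the real struct kvm_lapic. */
struct toy_timer {
    uint32_t tmict;         /* initial count written by the guest */
    uint32_t divide_count;  /* divisor currently in effect */
    uint64_t period_ns;     /* tmict * divisor (1 ns bus cycle assumed) */
    uint64_t remaining_ns;  /* time left until the next expiration */
};

/*
 * Apply a new divisor. Returns true if the expiration changed and the
 * caller needs to re-arm the timer, false if nothing had to be done.
 */
static bool toy_update_divisor(struct toy_timer *t, uint32_t new_divisor)
{
    uint32_t old_divisor = t->divide_count;

    /* Same value written again: leave the running timer alone. */
    if (new_divisor == old_divisor || old_divisor == 0)
        return false;

    t->divide_count = new_divisor;
    t->period_ns = (uint64_t)t->tmict * new_divisor;
    t->remaining_ns = t->remaining_ns * new_divisor / old_divisor;
    return true;
}
```

The early return is the point of the sketch: a caller would cancel and restart the underlying timer only when the function returns true.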
Wanpeng Li Oct. 5, 2017, 11:14 p.m. UTC | #2
2017-10-06 2:14 GMT+08:00 Radim Krčmář <rkrcmar@redhat.com>:
> 2017-10-05 07:35-0700, Wanpeng Li:
>> From: Wanpeng Li <wanpeng.li@hotmail.com>
>>
>> The description in the Intel SDM of how the divide configuration
>> register is used: "The APIC timer frequency will be the processor's bus
>> clock or core crystal clock frequency divided by the value specified in
>> the divide configuration register."
>>
>> Observation of baremetal shown that when the TDCR is change, the TMCCT
>> does not change or make a big jump in value, but the rate at which it
>> count down change.
>>
>> The patch update the emulation to APIC timer to so that a change to the
>> divide configuration would be reflected in the value of the counter and
>> when the next interrupt is triggered.
>>
>> Cc: Paolo Bonzini <pbonzini@redhat.com>
>> Cc: Radim Krčmář <rkrcmar@redhat.com>
>> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
>> ---
>> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
>> @@ -1474,9 +1474,24 @@ static bool set_target_expiration(struct kvm_lapic *apic)
>>                  ktime_to_ns(ktime_add_ns(now,
>>                               apic->lapic_timer.period)));
>>
>> +     delta = apic->lapic_timer.period;
>> +     if (apic->divide_count != old_divisor) {
>
> Hm, nothing should happen if the guest writes the same value TDCR, but
> we'll reset the timer.  (An extra argument would solve it, but maybe it
> would be nicer to add a new function for updating the expiration.)

Agreed.

>
>> +             remaining = ktime_sub(apic->lapic_timer.target_expiration, now);
>> +             if (ktime_to_ns(remaining) < 0)
>> +                     remaining = 0;
>> +             delta = mod_64(ktime_to_ns(remaining), apic->lapic_timer.period);
>> +
>> +             if (!delta)
>> +                     return false;
>> +
>> +             apic->lapic_timer.period = (u64)kvm_lapic_get_reg(apic, APIC_TMICT)
>> +                     * APIC_BUS_CYCLE_NS * apic->divide_count;
>
> I'd prefer to apply the rate limiting (done earlier in this function) to
> the period.  This version allows the guest to configure 128 times more
> frequent interrupts in the host.
> (And thinking about it, the version of [2/3] I proposed has similar
>  problem when switching from one-shot to periodic, only there it is
>  unpredictably limited by the speed of KVM.)

We don't stop and restart the timer there, so why would the rate limit affect [2/3]?

Regards,
Wanpeng Li

>
> Thanks.
>
>> +             delta = delta * apic->divide_count / old_divisor;
>> +     }
>> +
>>       apic->lapic_timer.tscdeadline = kvm_read_l1_tsc(apic->vcpu, tscl) +
>> -             nsec_to_cycles(apic->vcpu, apic->lapic_timer.period);
>> -     apic->lapic_timer.target_expiration = ktime_add_ns(now, apic->lapic_timer.period);
>> +             nsec_to_cycles(apic->vcpu, delta);
>> +     apic->lapic_timer.target_expiration = ktime_add_ns(now, delta);
>>
>>       return true;
>>  }
Wanpeng Li Oct. 6, 2017, 12:55 p.m. UTC | #3
2017-10-06 7:14 GMT+08:00 Wanpeng Li <kernellwp@gmail.com>:
> 2017-10-06 2:14 GMT+08:00 Radim Krčmář <rkrcmar@redhat.com>:
>> 2017-10-05 07:35-0700, Wanpeng Li:
>>> From: Wanpeng Li <wanpeng.li@hotmail.com>
>>>
>>> The description in the Intel SDM of how the divide configuration
>>> register is used: "The APIC timer frequency will be the processor's bus
>>> clock or core crystal clock frequency divided by the value specified in
>>> the divide configuration register."
>>>
>>> Observation of baremetal shown that when the TDCR is change, the TMCCT
>>> does not change or make a big jump in value, but the rate at which it
>>> count down change.
>>>
>>> The patch update the emulation to APIC timer to so that a change to the
>>> divide configuration would be reflected in the value of the counter and
>>> when the next interrupt is triggered.
>>>
>>> Cc: Paolo Bonzini <pbonzini@redhat.com>
>>> Cc: Radim Krčmář <rkrcmar@redhat.com>
>>> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
>>> ---
>>> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
>>> @@ -1474,9 +1474,24 @@ static bool set_target_expiration(struct kvm_lapic *apic)
>>>                  ktime_to_ns(ktime_add_ns(now,
>>>                               apic->lapic_timer.period)));
>>>
>>> +     delta = apic->lapic_timer.period;
>>> +     if (apic->divide_count != old_divisor) {
>>
>> Hm, nothing should happen if the guest writes the same value TDCR, but
>> we'll reset the timer.  (An extra argument would solve it, but maybe it
>> would be nicer to add a new function for updating the expiration.)
>
> Agreed.
>
>>
>>> +             remaining = ktime_sub(apic->lapic_timer.target_expiration, now);
>>> +             if (ktime_to_ns(remaining) < 0)
>>> +                     remaining = 0;
>>> +             delta = mod_64(ktime_to_ns(remaining), apic->lapic_timer.period);
>>> +
>>> +             if (!delta)
>>> +                     return false;
>>> +
>>> +             apic->lapic_timer.period = (u64)kvm_lapic_get_reg(apic, APIC_TMICT)
>>> +                     * APIC_BUS_CYCLE_NS * apic->divide_count;
>>
>> I'd prefer to apply the rate limiting (done earlier in this function) to
>> the period.  This version allows the guest to configure 128 times more
>> frequent interrupts in the host.
>> (And thinking about it, the version of [2/3] I proposed has similar
>>  problem when switching from one-shot to periodic, only there it is
>>  unpredictably limited by the speed of KVM.)
>
> We didn't stop and restart the timer, why the rate will influence us for [2/3]?

This is already done in v6, please have a look. :)

Regards,
Wanpeng Li

>>
>> Thanks.
>>
>>> +             delta = delta * apic->divide_count / old_divisor;
>>> +     }
>>> +
>>>       apic->lapic_timer.tscdeadline = kvm_read_l1_tsc(apic->vcpu, tscl) +
>>> -             nsec_to_cycles(apic->vcpu, apic->lapic_timer.period);
>>> -     apic->lapic_timer.target_expiration = ktime_add_ns(now, apic->lapic_timer.period);
>>> +             nsec_to_cycles(apic->vcpu, delta);
>>> +     apic->lapic_timer.target_expiration = ktime_add_ns(now, delta);
>>>
>>>       return true;
>>>  }
Radim Krčmář Oct. 6, 2017, 1:03 p.m. UTC | #4
2017-10-06 07:14+0800, Wanpeng Li:
> 2017-10-06 2:14 GMT+08:00 Radim Krčmář <rkrcmar@redhat.com>:
> > 2017-10-05 07:35-0700, Wanpeng Li:
> >> From: Wanpeng Li <wanpeng.li@hotmail.com>
> >> +             remaining = ktime_sub(apic->lapic_timer.target_expiration, now);
> >> +             if (ktime_to_ns(remaining) < 0)
> >> +                     remaining = 0;
> >> +             delta = mod_64(ktime_to_ns(remaining), apic->lapic_timer.period);
> >> +
> >> +             if (!delta)
> >> +                     return false;
> >> +
> >> +             apic->lapic_timer.period = (u64)kvm_lapic_get_reg(apic, APIC_TMICT)
> >> +                     * APIC_BUS_CYCLE_NS * apic->divide_count;
> >
> > I'd prefer to apply the rate limiting (done earlier in this function) to
> > the period.  This version allows the guest to configure 128 times more
> > frequent interrupts in the host.
> > (And thinking about it, the version of [2/3] I proposed has similar
> >  problem when switching from one-shot to periodic, only there it is
> >  unpredictably limited by the speed of KVM.)
> 
> We didn't stop and restart the timer, why the rate will influence us for [2/3]?

It is because of the rate limiting -- the guest could set up a one-shot
timer with a short expiration and then switch to periodic mode.

It is mostly theoretical as the expiration would have to be long enough
so that the timer doesn't fire before KVM emulates the next instruction
that switches the timer to periodic mode, but shorter than the rate limit.

I see you handled that in v6, thanks!
Wanpeng Li Oct. 6, 2017, 2:03 p.m. UTC | #5
2017-10-06 21:03 GMT+08:00 Radim Krčmář <rkrcmar@redhat.com>:
> 2017-10-06 07:14+0800, Wanpeng Li:
>> 2017-10-06 2:14 GMT+08:00 Radim Krčmář <rkrcmar@redhat.com>:
>> > 2017-10-05 07:35-0700, Wanpeng Li:
>> >> From: Wanpeng Li <wanpeng.li@hotmail.com>
>> >> +             remaining = ktime_sub(apic->lapic_timer.target_expiration, now);
>> >> +             if (ktime_to_ns(remaining) < 0)
>> >> +                     remaining = 0;
>> >> +             delta = mod_64(ktime_to_ns(remaining), apic->lapic_timer.period);
>> >> +
>> >> +             if (!delta)
>> >> +                     return false;
>> >> +
>> >> +             apic->lapic_timer.period = (u64)kvm_lapic_get_reg(apic, APIC_TMICT)
>> >> +                     * APIC_BUS_CYCLE_NS * apic->divide_count;
>> >
>> > I'd prefer to apply the rate limiting (done earlier in this function) to
>> > the period.  This version allows the guest to configure 128 times more
>> > frequent interrupts in the host.
>> > (And thinking about it, the version of [2/3] I proposed has similar
>> >  problem when switching from one-shot to periodic, only there it is
>> >  unpredictably limited by the speed of KVM.)
>>
>> We didn't stop and restart the timer, why the rate will influence us for [2/3]?
>
> It is because of the rate limiting -- the guest could setup a one-shot
> timer with a short expiration and switch to periodic

Yeah. In addition, I think configuring a divisor of 128 means slower
interrupts, not faster ones.

Regards,
Wanpeng Li

>
> It is mostly theoretical as the expiration would have to be long enough
> so that the timer doesn't fire before KVM emulates the next instruction
> that switches the timer to periodic mode, but shorter than rate limit.
>
> I see you handled that in v6, thanks!
Radim Krčmář Oct. 6, 2017, 2:20 p.m. UTC | #6
2017-10-06 22:03+0800, Wanpeng Li:
> 2017-10-06 21:03 GMT+08:00 Radim Krčmář <rkrcmar@redhat.com>:
> > 2017-10-06 07:14+0800, Wanpeng Li:
> >> 2017-10-06 2:14 GMT+08:00 Radim Krčmář <rkrcmar@redhat.com>:
> >> > 2017-10-05 07:35-0700, Wanpeng Li:
> >> >> From: Wanpeng Li <wanpeng.li@hotmail.com>
> >> >> +             remaining = ktime_sub(apic->lapic_timer.target_expiration, now);
> >> >> +             if (ktime_to_ns(remaining) < 0)
> >> >> +                     remaining = 0;
> >> >> +             delta = mod_64(ktime_to_ns(remaining), apic->lapic_timer.period);
> >> >> +
> >> >> +             if (!delta)
> >> >> +                     return false;
> >> >> +
> >> >> +             apic->lapic_timer.period = (u64)kvm_lapic_get_reg(apic, APIC_TMICT)
> >> >> +                     * APIC_BUS_CYCLE_NS * apic->divide_count;
> >> >
> >> > I'd prefer to apply the rate limiting (done earlier in this function) to
> >> > the period.  This version allows the guest to configure 128 times more
> >> > frequent interrupts in the host.
> >> > (And thinking about it, the version of [2/3] I proposed has similar
> >> >  problem when switching from one-shot to periodic, only there it is
> >> >  unpredictably limited by the speed of KVM.)
> >>
> >> We didn't stop and restart the timer, why the rate will influence us for [2/3]?
> >
> > It is because of the rate limiting -- the guest could setup a one-shot
> > timer with a short expiration and switch to periodic
> 
> Yeah, in addition, I think configure 128 means more slower interrupts
> instead of faster.

Yes, it says how many cycles it takes to decrement APIC_TMCCT.

(I was only concerned about the case where the rate limit was applied with
 divide_count=128 and the divisor was then switched to 1.)
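
To make the 128-to-1 case concrete, here is a small standalone sketch of how the divisor is decoded from TDCR, how the period follows from TMICT, and why a period that passed the rate limit at divisor 128 becomes 128 times shorter at divisor 1 unless the limit is applied again. The decode follows the SDM encoding (bits 0, 1 and 3). The helper names, the 1 ns bus cycle, and the 500 us limit used in the example are assumptions for illustration, not values taken from this patch.

```c
#include <stdint.h>

#define TOY_BUS_CYCLE_NS 1ULL   /* assumed bus cycle length */

/* Decode the divide value from TDCR bits 0, 1 and 3 (SDM encoding). */
static uint32_t tdcr_to_divisor(uint32_t tdcr)
{
    uint32_t bits = (tdcr & 0x3) | ((tdcr & 0x8) >> 1);

    /* 0b000 -> 2, 0b001 -> 4, ..., 0b110 -> 128, 0b111 -> 1 */
    return 1u << ((bits + 1) & 0x7);
}

/* Timer period in nanoseconds for a given initial count and divisor. */
static uint64_t timer_period_ns(uint32_t tmict, uint32_t divisor)
{
    return (uint64_t)tmict * TOY_BUS_CYCLE_NS * divisor;
}

/* Raise a non-zero period to at least min_period_ns (0 means disabled). */
static uint64_t clamp_period_ns(uint64_t period_ns, uint64_t min_period_ns)
{
    if (period_ns == 0)
        return 0;
    return period_ns < min_period_ns ? min_period_ns : period_ns;
}
```

With TMICT = 4000 and an illustrative 500000 ns limit, the period at divisor 128 is 512000 ns and passes the limit, while at divisor 1 it drops to 4000 ns and only stays bounded if the clamp is applied to the recomputed period.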

Patch

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 6b366c1..36f9bc8 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1434,14 +1434,14 @@  static void start_sw_period(struct kvm_lapic *apic)
 		HRTIMER_MODE_ABS_PINNED);
 }
 
-static bool set_target_expiration(struct kvm_lapic *apic)
+static bool set_target_expiration(struct kvm_lapic *apic, uint32_t old_divisor)
 {
-	ktime_t now;
-	u64 tscl = rdtsc();
+	ktime_t now, remaining;
+	u64 tscl = rdtsc(), delta;
 
 	now = ktime_get();
 	apic->lapic_timer.period = (u64)kvm_lapic_get_reg(apic, APIC_TMICT)
-		* APIC_BUS_CYCLE_NS * apic->divide_count;
+		* APIC_BUS_CYCLE_NS * old_divisor;
 
 	if (!apic->lapic_timer.period)
 		return false;
@@ -1474,9 +1474,24 @@  static bool set_target_expiration(struct kvm_lapic *apic)
 		   ktime_to_ns(ktime_add_ns(now,
 				apic->lapic_timer.period)));
 
+	delta = apic->lapic_timer.period;
+	if (apic->divide_count != old_divisor) {
+		remaining = ktime_sub(apic->lapic_timer.target_expiration, now);
+		if (ktime_to_ns(remaining) < 0)
+			remaining = 0;
+		delta = mod_64(ktime_to_ns(remaining), apic->lapic_timer.period);
+
+		if (!delta)
+			return false;
+
+		apic->lapic_timer.period = (u64)kvm_lapic_get_reg(apic, APIC_TMICT)
+			* APIC_BUS_CYCLE_NS * apic->divide_count;
+		delta = delta * apic->divide_count / old_divisor;
+	}
+
 	apic->lapic_timer.tscdeadline = kvm_read_l1_tsc(apic->vcpu, tscl) +
-		nsec_to_cycles(apic->vcpu, apic->lapic_timer.period);
-	apic->lapic_timer.target_expiration = ktime_add_ns(now, apic->lapic_timer.period);
+		nsec_to_cycles(apic->vcpu, delta);
+	apic->lapic_timer.target_expiration = ktime_add_ns(now, delta);
 
 	return true;
 }
@@ -1613,12 +1628,12 @@  void kvm_lapic_restart_hv_timer(struct kvm_vcpu *vcpu)
 	restart_apic_timer(apic);
 }
 
-static void start_apic_timer(struct kvm_lapic *apic)
+static void start_apic_timer(struct kvm_lapic *apic, uint32_t old_divisor)
 {
 	atomic_set(&apic->lapic_timer.pending, 0);
 
 	if ((apic_lvtt_period(apic) || apic_lvtt_oneshot(apic))
-	    && !set_target_expiration(apic))
+	    && !set_target_expiration(apic, old_divisor))
 		return;
 
 	restart_apic_timer(apic);
@@ -1739,16 +1754,20 @@  int kvm_lapic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
 
 		hrtimer_cancel(&apic->lapic_timer.timer);
 		kvm_lapic_set_reg(apic, APIC_TMICT, val);
-		start_apic_timer(apic);
+		start_apic_timer(apic, apic->divide_count);
 		break;
 
-	case APIC_TDCR:
+	case APIC_TDCR: {
+		uint32_t current_divisor = apic->divide_count;
+
 		if (val & 4)
 			apic_debug("KVM_WRITE:TDCR %x\n", val);
 		kvm_lapic_set_reg(apic, APIC_TDCR, val);
 		update_divide_count(apic);
+		hrtimer_cancel(&apic->lapic_timer.timer);
+		start_apic_timer(apic, current_divisor);
 		break;
-
+	}
 	case APIC_ESR:
 		if (apic_x2apic_mode(apic) && val != 0) {
 			apic_debug("KVM_WRITE:ESR not zero %x\n", val);
@@ -1873,7 +1892,7 @@  void kvm_set_lapic_tscdeadline_msr(struct kvm_vcpu *vcpu, u64 data)
 
 	hrtimer_cancel(&apic->lapic_timer.timer);
 	apic->lapic_timer.tscdeadline = data;
-	start_apic_timer(apic);
+	start_apic_timer(apic, apic->divide_count);
 }
 
 void kvm_lapic_set_tpr(struct kvm_vcpu *vcpu, unsigned long cr8)
@@ -2239,7 +2258,7 @@  int kvm_apic_set_state(struct kvm_vcpu *vcpu, struct kvm_lapic_state *s)
 	apic_update_lvtt(apic);
 	apic_manage_nmi_watchdog(apic, kvm_lapic_get_reg(apic, APIC_LVT0));
 	update_divide_count(apic);
-	start_apic_timer(apic);
+	start_apic_timer(apic, apic->divide_count);
 	apic->irr_pending = true;
 	apic->isr_count = vcpu->arch.apicv_active ?
 				1 : count_vectors(apic->regs + APIC_ISR);