diff mbox

Fix almost infinite loop in APIC

Message ID 200901201920.42519.sheng@linux.intel.com (mailing list archive)
State Accepted, archived
Headers show

Commit Message

Sheng Yang Jan. 20, 2009, 11:20 a.m. UTC
On Tuesday 20 January 2009 18:41:21 Alexander Graf wrote:
> Marcelo Tosatti wrote:
> > On Thu, Jan 15, 2009 at 03:20:06PM +0800, Sheng Yang wrote:
> >>> +	 * Since reinjection is not rate-limited, use the delay
> >>> + 	 * to inject the last interrupt as an estimate.
> >>> + 	 */
> >>> +	if (unlikely(atomic_read(&apic->timer.pending) > 0)) {
> >>> +		remaining = apic->timer.injection_delay;
> >>> +		if (ktime_to_ns(remaining) > apic->timer.period)
> >>> +			remaining = ns_to_ktime(apic->timer.period);
> >>> +        } else
> >>> +		remaining = hrtimer_expires_remaining(&apic->timer.dev);
> >>
> >> A little doubt...
> >>
> >> A: time_fire
> >> B: intr_post
> >> C: read TMCCT
> >>
> >> The sequence can be ABC or ACB.
> >>
> >> injection_delay = time(B) - time(A)
> >>
> >> So it didn't count time from read TMCCT... And guest get interrupt at
> >> time(B), not quite understand why time(B) - time(A) matters here...
> >
> > Its an estimate of the delay it takes to inject an interrupt to the
> > guest once fired. Its only used if there have been accumulated ones, so
> > ACB sequence with pending=0 will use hrtimer_expires_remaining().
> >
> >> I think the reasonable here means, this interval is usable later after
> >> the accumulated interrupts are injected. From this point of view, I
> >> think current solution is reasonable. It just assume the delayed
> >> interrupts have been injected.
> >>
> >> However, seriously, any value here is wrong, no elegant one.
> >
> > I agree.
> >
> >> But I still prefer to the current solution...
> >
> > Why? The proposed version is smaller and simpler than the current
> > one IMO, and also not vulnerable to whatever bug is causing now <
> > last_update.
> >
> > And more precise, since the current scheme uses interrupt injection time
> > as if it was "count-down restart" time, which is not true.
> >
> >> And here is not the really problem for now I think. The current
> >> mechanism is mostly OK, but where is the bug... We have either have a
> >> simple fix(e.g. if now < last_update, then return 0) or dig into it. Did
> >> it worth a try? Anyway, it would return a buggy result if we have
> >> pending interrupts...
> >
> > The overflow calculation is not necessary as discussed. Alexander, can
> > you please try the following? Sheng, do you have any other suggestion to
> > test?
> >
> > /proc/timer_list output of the host when ESX is running too.
>
> Sorry for the late reply. Here's the dmesg output and /proc/timer_list
> after the issue occured. I left the warning message in that I put there
> in case the value is too high:
>
>         counter_passed = div64_u64(ktime_to_ns(passed),
>                                    (APIC_BUS_CYCLE_NS *
> apic->timer.divide_count));
>
> +       if (counter_passed > 0x7f00000000000000) {
> +               /* If we're in here we probably encountered a bug! */
> +               printk(KERN_INFO "LAPIC: Too high counter_passed value:
> 0x%lx | 0x%lx (0x%lx) | 0x%lx (0x%lx)\n", counter_passed,
> ktime_to_ns(passed), passed, now.tv64, apic
> ->timer.last_update.tv64);
> +       }
> +
>         if (counter_passed > tmcct) {
>                 if (unlikely(!apic_lvtt_period(apic))) {
>                         /* one-shot timers stick at 0 until reset */
>
>
>
> start_apic_timer: bus cycle is 1ns, now 0x00039a6f377cfbc4, timer
> initial count 0x186a0, period 100000ns, expire @ 0x00039a6f377e8264.
> start_apic_timer: bus cycle is 1ns, now 0x00039a6f3a7b5aef, timer
> initial count 0x203a0, period 132000ns, expire @ 0x00039a6f3a7d5e8f.
> last_update = 1014860818546063 now = 1014860818426085
> rec[47] when=1014860811948512 last_update=1014860812078063 pend=0
> rec[46] when=1014860818417773 last_update=1014860818546063 pend=0
> rec[45] when=1014860818285454 last_update=1014860818414063 pend=0
> rec[44] when=1014860818153119 last_update=1014860818282063 pend=0
> rec[43] when=1014860818021746 last_update=1014860818150063 pend=0
> rec[42] when=1014860817889326 last_update=1014860818018063 pend=0
> rec[41] when=1014860817757297 last_update=1014860817886063 pend=0
> rec[40] when=1014860817625682 last_update=1014860817754063 pend=0
> rec[39] when=1014860817492484 last_update=1014860817622063 pend=0
> rec[38] when=1014860817360531 last_update=1014860817490063 pend=0
> rec[37] when=1014860817228489 last_update=1014860817358063 pend=0
> rec[36] when=1014860817101250 last_update=1014860817226063 pend=0
> rec[35] when=1014860816964378 last_update=1014860817094063 pend=0
> rec[34] when=1014860816832459 last_update=1014860816962063 pend=0
> rec[33] when=1014860816700514 last_update=1014860816830063 pend=0
> rec[32] when=1014860816568558 last_update=1014860816698063 pend=0
> rec[31] when=1014860816436510 last_update=1014860816566063 pend=0
> rec[30] when=1014860816305079 last_update=1014860816434063 pend=0
> rec[29] when=1014860816172560 last_update=1014860816302063 pend=0
> rec[28] when=1014860816040449 last_update=1014860816170063 pend=0
> rec[27] when=1014860815908500 last_update=1014860816038063 pend=0
> rec[26] when=1014860815776451 last_update=1014860815906063 pend=0
> rec[25] when=1014860815644499 last_update=1014860815774063 pend=0
> rec[24] when=1014860815513100 last_update=1014860815642063 pend=0
> rec[23] when=1014860815380476 last_update=1014860815510063 pend=0
> rec[22] when=1014860815248609 last_update=1014860815378063 pend=0
> rec[21] when=1014860815116494 last_update=1014860815246063 pend=0
> rec[20] when=1014860814984441 last_update=1014860815114063 pend=0
> rec[19] when=1014860814852498 last_update=1014860814982063 pend=0
> rec[18] when=1014860814721082 last_update=1014860814850063 pend=0
> rec[17] when=1014860814588467 last_update=1014860814718063 pend=0
> rec[16] when=1014860814456530 last_update=1014860814586063 pend=0
> rec[15] when=1014860814324452 last_update=1014860814454063 pend=0
> rec[14] when=1014860814192601 last_update=1014860814322063 pend=0
> rec[13] when=1014860814060489 last_update=1014860814190063 pend=0
> rec[12] when=1014860813928546 last_update=1014860814058063 pend=0
> rec[11] when=1014860813797571 last_update=1014860813926063 pend=0
> rec[10] when=1014860813664475 last_update=1014860813794063 pend=0
> rec[9] when=1014860813532520 last_update=1014860813662063 pend=0
> rec[8] when=1014860813400463 last_update=1014860813530063 pend=0
> rec[7] when=1014860813268519 last_update=1014860813398063 pend=0
> rec[6] when=1014860813136452 last_update=1014860813266063 pend=0
> rec[5] when=1014860813007312 last_update=1014860813134063 pend=0
> rec[4] when=1014860812872314 last_update=1014860813002063 pend=0
> rec[3] when=1014860812740791 last_update=1014860812870063 pend=0
> rec[2] when=1014860812652998 last_update=1014860812738063 pend=0
> rec[1] when=1014860812476526 last_update=1014860812606063 pend=0
> rec[0] when=1014860812344635 last_update=1014860812474063 pend=0
> rec[49] when=1014860812213098 last_update=1014860812342063 pend=0
> rec[48] when=1014860812080466 last_update=1014860812210063 pend=0
> LAPIC: Too high counter_passed value: 0x7ffffffffffe2b55 |
> 0x7ffffffffffe2b55 (0x7ffffffffffe2b55) | 0x39b02b29428e5 (0x39b02b295fd8f)
> start_apic_timer: bus cycle is 1ns, now 0x00039b02b2998ada, timer
> initial count 0x8400, period 33792ns, expire @ 0x00039b02b29a0eda.
>

Thanks, Alex! 

When update time of last_update is 0x39B02B29428E5, and the start_apic_timer 
at 0x00039b02b2998ada, the delta is 0x561F5 = 352,757, very close. Looks like 
a window here?:

> static void start_apic_timer(struct kvm_lapic *apic)
> {
> 	ktime_t now = apic->timer.dev.base->get_time();
>
> 	apic->timer.last_update = now;

[window?]
>
> 	apic->timer.period = apic_get_reg(apic, APIC_TMICT) *
> 		    APIC_BUS_CYCLE_NS * apic->timer.divide_count;
> 	atomic_set(&apic->timer.pending, 0);
>
> 	if (!apic->timer.period)
> 		return;
>
> 	hrtimer_start(&apic->timer.dev,
> 		      ktime_add_ns(now, apic->timer.period),
> 		      HRTIMER_MODE_ABS);
>
> 	apic_debug("%s: bus cycle is %" PRId64 "ns, now 0x%016"
> 			   PRIx64 ", "
> 			   "timer initial count 0x%x, period %lldns, "
> 			   "expire @ 0x%016" PRIx64 ".\n", __func__,
> 			   APIC_BUS_CYCLE_NS, ktime_to_ns(now),
> 			   apic_get_reg(apic, APIC_TMICT),
> 			   apic->timer.period,
> 			   ktime_to_ns(ktime_add_ns(now,
> 					apic->timer.period)));
> }

Looks like due to the period is reduced, but the last_update use old value and 
got just updated... But guest vcpu B program vcpu A's lapic? Or intr_post of 
vcpu B happened at vcpu A? Both seems unreasonable...

Alex, can you help to add another line above to confirm the problem? e.g.


Also add a similar one at the beginning of print_last_update_record().

Thanks! (And thanks for Marcelo's excellent debug patch! :) )

Comments

Alexander Graf Jan. 20, 2009, 12:09 p.m. UTC | #1
Sheng Yang wrote:
> On Tuesday 20 January 2009 18:41:21 Alexander Graf wrote:
>   
>> Marcelo Tosatti wrote:
>>     
>>> On Thu, Jan 15, 2009 at 03:20:06PM +0800, Sheng Yang wrote:
>>>       
>>>>> +	 * Since reinjection is not rate-limited, use the delay
>>>>> + 	 * to inject the last interrupt as an estimate.
>>>>> + 	 */
>>>>> +	if (unlikely(atomic_read(&apic->timer.pending) > 0)) {
>>>>> +		remaining = apic->timer.injection_delay;
>>>>> +		if (ktime_to_ns(remaining) > apic->timer.period)
>>>>> +			remaining = ns_to_ktime(apic->timer.period);
>>>>> +        } else
>>>>> +		remaining = hrtimer_expires_remaining(&apic->timer.dev);
>>>>>           
>>>> A little doubt...
>>>>
>>>> A: time_fire
>>>> B: intr_post
>>>> C: read TMCCT
>>>>
>>>> The sequence can be ABC or ACB.
>>>>
>>>> injection_delay = time(B) - time(A)
>>>>
>>>> So it didn't count time from read TMCCT... And guest get interrupt at
>>>> time(B), not quite understand why time(B) - time(A) matters here...
>>>>         
>>> Its an estimate of the delay it takes to inject an interrupt to the
>>> guest once fired. Its only used if there have been accumulated ones, so
>>> ACB sequence with pending=0 will use hrtimer_expires_remaining().
>>>
>>>       
>>>> I think the reasonable here means, this interval is usable later after
>>>> the accumulated interrupts are injected. From this point of view, I
>>>> think current solution is reasonable. It just assume the delayed
>>>> interrupts have been injected.
>>>>
>>>> However, seriously, any value here is wrong, no elegant one.
>>>>         
>>> I agree.
>>>
>>>       
>>>> But I still prefer to the current solution...
>>>>         
>>> Why? The proposed version is smaller and simpler than the current
>>> one IMO, and also not vulnerable to whatever bug is causing now <
>>> last_update.
>>>
>>> And more precise, since the current scheme uses interrupt injection time
>>> as if it was "count-down restart" time, which is not true.
>>>
>>>       
>>>> And here is not the really problem for now I think. The current
>>>> mechanism is mostly OK, but where is the bug... We have either have a
>>>> simple fix(e.g. if now < last_update, then return 0) or dig into it. Did
>>>> it worth a try? Anyway, it would return a buggy result if we have
>>>> pending interrupts...
>>>>         
>>> The overflow calculation is not necessary as discussed. Alexander, can
>>> you please try the following? Sheng, do you have any other suggestion to
>>> test?
>>>
>>> /proc/timer_list output of the host when ESX is running too.
>>>       
>> Sorry for the late reply. Here's the dmesg output and /proc/timer_list
>> after the issue occured. I left the warning message in that I put there
>> in case the value is too high:
>>
>>         counter_passed = div64_u64(ktime_to_ns(passed),
>>                                    (APIC_BUS_CYCLE_NS *
>> apic->timer.divide_count));
>>
>> +       if (counter_passed > 0x7f00000000000000) {
>> +               /* If we're in here we probably encountered a bug! */
>> +               printk(KERN_INFO "LAPIC: Too high counter_passed value:
>> 0x%lx | 0x%lx (0x%lx) | 0x%lx (0x%lx)\n", counter_passed,
>> ktime_to_ns(passed), passed, now.tv64, apic
>> ->timer.last_update.tv64);
>> +       }
>> +
>>         if (counter_passed > tmcct) {
>>                 if (unlikely(!apic_lvtt_period(apic))) {
>>                         /* one-shot timers stick at 0 until reset */
>>
>>
>>
>> start_apic_timer: bus cycle is 1ns, now 0x00039a6f377cfbc4, timer
>> initial count 0x186a0, period 100000ns, expire @ 0x00039a6f377e8264.
>> start_apic_timer: bus cycle is 1ns, now 0x00039a6f3a7b5aef, timer
>> initial count 0x203a0, period 132000ns, expire @ 0x00039a6f3a7d5e8f.
>> last_update = 1014860818546063 now = 1014860818426085
>> rec[47] when=1014860811948512 last_update=1014860812078063 pend=0
>> rec[46] when=1014860818417773 last_update=1014860818546063 pend=0
>> rec[45] when=1014860818285454 last_update=1014860818414063 pend=0
>> rec[44] when=1014860818153119 last_update=1014860818282063 pend=0
>> rec[43] when=1014860818021746 last_update=1014860818150063 pend=0
>> rec[42] when=1014860817889326 last_update=1014860818018063 pend=0
>> rec[41] when=1014860817757297 last_update=1014860817886063 pend=0
>> rec[40] when=1014860817625682 last_update=1014860817754063 pend=0
>> rec[39] when=1014860817492484 last_update=1014860817622063 pend=0
>> rec[38] when=1014860817360531 last_update=1014860817490063 pend=0
>> rec[37] when=1014860817228489 last_update=1014860817358063 pend=0
>> rec[36] when=1014860817101250 last_update=1014860817226063 pend=0
>> rec[35] when=1014860816964378 last_update=1014860817094063 pend=0
>> rec[34] when=1014860816832459 last_update=1014860816962063 pend=0
>> rec[33] when=1014860816700514 last_update=1014860816830063 pend=0
>> rec[32] when=1014860816568558 last_update=1014860816698063 pend=0
>> rec[31] when=1014860816436510 last_update=1014860816566063 pend=0
>> rec[30] when=1014860816305079 last_update=1014860816434063 pend=0
>> rec[29] when=1014860816172560 last_update=1014860816302063 pend=0
>> rec[28] when=1014860816040449 last_update=1014860816170063 pend=0
>> rec[27] when=1014860815908500 last_update=1014860816038063 pend=0
>> rec[26] when=1014860815776451 last_update=1014860815906063 pend=0
>> rec[25] when=1014860815644499 last_update=1014860815774063 pend=0
>> rec[24] when=1014860815513100 last_update=1014860815642063 pend=0
>> rec[23] when=1014860815380476 last_update=1014860815510063 pend=0
>> rec[22] when=1014860815248609 last_update=1014860815378063 pend=0
>> rec[21] when=1014860815116494 last_update=1014860815246063 pend=0
>> rec[20] when=1014860814984441 last_update=1014860815114063 pend=0
>> rec[19] when=1014860814852498 last_update=1014860814982063 pend=0
>> rec[18] when=1014860814721082 last_update=1014860814850063 pend=0
>> rec[17] when=1014860814588467 last_update=1014860814718063 pend=0
>> rec[16] when=1014860814456530 last_update=1014860814586063 pend=0
>> rec[15] when=1014860814324452 last_update=1014860814454063 pend=0
>> rec[14] when=1014860814192601 last_update=1014860814322063 pend=0
>> rec[13] when=1014860814060489 last_update=1014860814190063 pend=0
>> rec[12] when=1014860813928546 last_update=1014860814058063 pend=0
>> rec[11] when=1014860813797571 last_update=1014860813926063 pend=0
>> rec[10] when=1014860813664475 last_update=1014860813794063 pend=0
>> rec[9] when=1014860813532520 last_update=1014860813662063 pend=0
>> rec[8] when=1014860813400463 last_update=1014860813530063 pend=0
>> rec[7] when=1014860813268519 last_update=1014860813398063 pend=0
>> rec[6] when=1014860813136452 last_update=1014860813266063 pend=0
>> rec[5] when=1014860813007312 last_update=1014860813134063 pend=0
>> rec[4] when=1014860812872314 last_update=1014860813002063 pend=0
>> rec[3] when=1014860812740791 last_update=1014860812870063 pend=0
>> rec[2] when=1014860812652998 last_update=1014860812738063 pend=0
>> rec[1] when=1014860812476526 last_update=1014860812606063 pend=0
>> rec[0] when=1014860812344635 last_update=1014860812474063 pend=0
>> rec[49] when=1014860812213098 last_update=1014860812342063 pend=0
>> rec[48] when=1014860812080466 last_update=1014860812210063 pend=0
>> LAPIC: Too high counter_passed value: 0x7ffffffffffe2b55 |
>> 0x7ffffffffffe2b55 (0x7ffffffffffe2b55) | 0x39b02b29428e5 (0x39b02b295fd8f)
>> start_apic_timer: bus cycle is 1ns, now 0x00039b02b2998ada, timer
>> initial count 0x8400, period 33792ns, expire @ 0x00039b02b29a0eda.
>>
>>     
>
> Thanks, Alex! 
>
> When update time of last_update is 0x39B02B29428E5, and the start_apic_timer 
> at 0x00039b02b2998ada, the delta is 0x561F5 = 352,757, very close. Looks like 
> a window here?:
>
>   
>> static void start_apic_timer(struct kvm_lapic *apic)
>> {
>> 	ktime_t now = apic->timer.dev.base->get_time();
>>
>> 	apic->timer.last_update = now;
>>     
>
> [window?]
>   
>> 	apic->timer.period = apic_get_reg(apic, APIC_TMICT) *
>> 		    APIC_BUS_CYCLE_NS * apic->timer.divide_count;
>> 	atomic_set(&apic->timer.pending, 0);
>>
>> 	if (!apic->timer.period)
>> 		return;
>>
>> 	hrtimer_start(&apic->timer.dev,
>> 		      ktime_add_ns(now, apic->timer.period),
>> 		      HRTIMER_MODE_ABS);
>>
>> 	apic_debug("%s: bus cycle is %" PRId64 "ns, now 0x%016"
>> 			   PRIx64 ", "
>> 			   "timer initial count 0x%x, period %lldns, "
>> 			   "expire @ 0x%016" PRIx64 ".\n", __func__,
>> 			   APIC_BUS_CYCLE_NS, ktime_to_ns(now),
>> 			   apic_get_reg(apic, APIC_TMICT),
>> 			   apic->timer.period,
>> 			   ktime_to_ns(ktime_add_ns(now,
>> 					apic->timer.period)));
>> }
>>     
>
> Looks like due to the period is reduced, but the last_update use old value and 
> got just updated... But guest vcpu B program vcpu A's lapic? Or intr_post of 
> vcpu B happened at vcpu A? Both seems unreasonable...
>
> Alex, can you help to add another line above to confirm the problem? e.g.
>
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index afac68c..15f8ed5 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -653,6 +653,8 @@ static void start_apic_timer(struct kvm_lapic *apic)
>  {
>         ktime_t now = apic->timer.dev.base->get_time();
>
> +       printk("enter start_apic_timer! vcpu %d\n", apic->vcpu->vcpu_id);
> +
>         apic->timer.last_update = now;
>
>         apic->timer.period = apic_get_reg(apic, APIC_TMICT) *
>
> Also add a similar one at the beginning of print_last_update_record().
>
> Thanks! (And thanks for Marcelo's excellent debug patch! :) )
>
>   

The VMware ESX VM is UP, so I don't think we'll get anything from the
vcpu_id apart from 0 :-). Nevertheless here's the dmesg output:

enter start_apic_timer! vcpu 0
enter start_apic_timer! vcpu 0
start_apic_timer: bus cycle is 1ns, now 0x00039fb339714386, timer
initial count 0x186a0, period 100000ns, expire @ 0x00039fb33972ca26.
enter start_apic_timer! vcpu 0
start_apic_timer: bus cycle is 1ns, now 0x00039fb33c58ebc2, timer
initial count 0x203a0, period 132000ns, expire @ 0x00039fb33c5aef62.
last_update = 1020361050214818 now = 1020361050178604
enter print_last_update_record! vcpu 0
rec[5] when=1020361043617361 last_update=1020361043746818 pend=0
rec[4] when=1020361050088268 last_update=1020361050214818 pend=0
rec[3] when=1020361049970143 last_update=1020361050082818 pend=0
rec[2] when=1020361049822220 last_update=1020361049950818 pend=0
rec[1] when=1020361049690023 last_update=1020361049818818 pend=0
rec[0] when=1020361049559209 last_update=1020361049686818 pend=0
rec[49] when=1020361049427469 last_update=1020361049554818 pend=0
rec[48] when=1020361049294815 last_update=1020361049422818 pend=0
rec[47] when=1020361049162401 last_update=1020361049290818 pend=0
rec[46] when=1020361049030431 last_update=1020361049158818 pend=0
rec[45] when=1020361048898352 last_update=1020361049026818 pend=0
rec[44] when=1020361048765413 last_update=1020361048894818 pend=0
rec[43] when=1020361048634310 last_update=1020361048762818 pend=0
rec[42] when=1020361048501343 last_update=1020361048630818 pend=0
rec[41] when=1020361048369388 last_update=1020361048498818 pend=0
rec[40] when=1020361048241616 last_update=1020361048366818 pend=0
rec[39] when=1020361048108017 last_update=1020361048234818 pend=0
rec[38] when=1020361047973826 last_update=1020361048102818 pend=0
rec[37] when=1020361047841371 last_update=1020361047970818 pend=0
rec[36] when=1020361047709322 last_update=1020361047838818 pend=0
rec[35] when=1020361047577366 last_update=1020361047706818 pend=0
rec[34] when=1020361047445316 last_update=1020361047574818 pend=0
rec[33] when=1020361047313423 last_update=1020361047442818 pend=0
rec[32] when=1020361047181310 last_update=1020361047310818 pend=0
rec[31] when=1020361047049990 last_update=1020361047178818 pend=0
rec[30] when=1020361046917375 last_update=1020361047046818 pend=0
rec[29] when=1020361046785332 last_update=1020361046914818 pend=0
rec[28] when=1020361046653374 last_update=1020361046782818 pend=0
rec[27] when=1020361046521325 last_update=1020361046650818 pend=0
rec[26] when=1020361046389374 last_update=1020361046518818 pend=0
rec[25] when=1020361046257989 last_update=1020361046386818 pend=0
rec[24] when=1020361046125335 last_update=1020361046254818 pend=0
rec[23] when=1020361045993393 last_update=1020361046122818 pend=0
rec[22] when=1020361045861330 last_update=1020361045990818 pend=0
rec[21] when=1020361045729389 last_update=1020361045858818 pend=0
rec[20] when=1020361045602604 last_update=1020361045726818 pend=0
rec[19] when=1020361045471032 last_update=1020361045594818 pend=0
rec[18] when=1020361045333385 last_update=1020361045462818 pend=0
rec[17] when=1020361045201397 last_update=1020361045330818 pend=0
rec[16] when=1020361045074634 last_update=1020361045198818 pend=0
rec[15] when=1020361044941707 last_update=1020361045066818 pend=0
rec[14] when=1020361044805373 last_update=1020361044934818 pend=0
rec[13] when=1020361044673417 last_update=1020361044802818 pend=0
rec[12] when=1020361044547848 last_update=1020361044670818 pend=0
rec[11] when=1020361044409439 last_update=1020361044538818 pend=0
rec[10] when=1020361044277316 last_update=1020361044406818 pend=0
rec[9] when=1020361044150456 last_update=1020361044274818 pend=0
rec[8] when=1020361044013204 last_update=1020361044142818 pend=0
rec[7] when=1020361043881203 last_update=1020361044010818 pend=0
rec[6] when=1020361043749913 last_update=1020361043878818 pend=0
LAPIC: Too high counter_passed value: 0x7fffffffffff7289 |
0x7fffffffffff7289 (0x7fffffffffff7289) | 0x3a00351f0442c (0x3a00351f0d1a2)
enter start_apic_timer! vcpu 0
start_apic_timer: bus cycle is 1ns, now 0x0003a00351f37054, timer
initial count 0x8400, period 33792ns, expire @ 0x0003a00351f3f454.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Sheng Yang Jan. 20, 2009, 12:30 p.m. UTC | #2
On Tuesday 20 January 2009 20:09:23 Alexander Graf wrote:
> Sheng Yang wrote:
> > On Tuesday 20 January 2009 18:41:21 Alexander Graf wrote:
> >> Marcelo Tosatti wrote:
> >>> On Thu, Jan 15, 2009 at 03:20:06PM +0800, Sheng Yang wrote:
> >>>>> +	 * Since reinjection is not rate-limited, use the delay
> >>>>> + 	 * to inject the last interrupt as an estimate.
> >>>>> + 	 */
> >>>>> +	if (unlikely(atomic_read(&apic->timer.pending) > 0)) {
> >>>>> +		remaining = apic->timer.injection_delay;
> >>>>> +		if (ktime_to_ns(remaining) > apic->timer.period)
> >>>>> +			remaining = ns_to_ktime(apic->timer.period);
> >>>>> +        } else
> >>>>> +		remaining = hrtimer_expires_remaining(&apic->timer.dev);
> >>>>
> >>>> A little doubt...
> >>>>
> >>>> A: time_fire
> >>>> B: intr_post
> >>>> C: read TMCCT
> >>>>
> >>>> The sequence can be ABC or ACB.
> >>>>
> >>>> injection_delay = time(B) - time(A)
> >>>>
> >>>> So it didn't count time from read TMCCT... And guest get interrupt at
> >>>> time(B), not quite understand why time(B) - time(A) matters here...
> >>>
> >>> Its an estimate of the delay it takes to inject an interrupt to the
> >>> guest once fired. Its only used if there have been accumulated ones, so
> >>> ACB sequence with pending=0 will use hrtimer_expires_remaining().
> >>>
> >>>> I think the reasonable here means, this interval is usable later after
> >>>> the accumulated interrupts are injected. From this point of view, I
> >>>> think current solution is reasonable. It just assume the delayed
> >>>> interrupts have been injected.
> >>>>
> >>>> However, seriously, any value here is wrong, no elegant one.
> >>>
> >>> I agree.
> >>>
> >>>> But I still prefer to the current solution...
> >>>
> >>> Why? The proposed version is smaller and simpler than the current
> >>> one IMO, and also not vulnerable to whatever bug is causing now <
> >>> last_update.
> >>>
> >>> And more precise, since the current scheme uses interrupt injection
> >>> time as if it was "count-down restart" time, which is not true.
> >>>
> >>>> And here is not the really problem for now I think. The current
> >>>> mechanism is mostly OK, but where is the bug... We have either have a
> >>>> simple fix(e.g. if now < last_update, then return 0) or dig into it.
> >>>> Did it worth a try? Anyway, it would return a buggy result if we have
> >>>> pending interrupts...
> >>>
> >>> The overflow calculation is not necessary as discussed. Alexander, can
> >>> you please try the following? Sheng, do you have any other suggestion
> >>> to test?
> >>>
> >>> /proc/timer_list output of the host when ESX is running too.
> >>
> >> Sorry for the late reply. Here's the dmesg output and /proc/timer_list
> >> after the issue occured. I left the warning message in that I put there
> >> in case the value is too high:
> >>
> >>         counter_passed = div64_u64(ktime_to_ns(passed),
> >>                                    (APIC_BUS_CYCLE_NS *
> >> apic->timer.divide_count));
> >>
> >> +       if (counter_passed > 0x7f00000000000000) {
> >> +               /* If we're in here we probably encountered a bug! */
> >> +               printk(KERN_INFO "LAPIC: Too high counter_passed value:
> >> 0x%lx | 0x%lx (0x%lx) | 0x%lx (0x%lx)\n", counter_passed,
> >> ktime_to_ns(passed), passed, now.tv64, apic
> >> ->timer.last_update.tv64);
> >> +       }
> >> +
> >>         if (counter_passed > tmcct) {
> >>                 if (unlikely(!apic_lvtt_period(apic))) {
> >>                         /* one-shot timers stick at 0 until reset */
> >>
> >>
> >>
> >> start_apic_timer: bus cycle is 1ns, now 0x00039a6f377cfbc4, timer
> >> initial count 0x186a0, period 100000ns, expire @ 0x00039a6f377e8264.
> >> start_apic_timer: bus cycle is 1ns, now 0x00039a6f3a7b5aef, timer
> >> initial count 0x203a0, period 132000ns, expire @ 0x00039a6f3a7d5e8f.
> >> last_update = 1014860818546063 now = 1014860818426085
> >> rec[47] when=1014860811948512 last_update=1014860812078063 pend=0
> >> rec[46] when=1014860818417773 last_update=1014860818546063 pend=0
> >> rec[45] when=1014860818285454 last_update=1014860818414063 pend=0
> >> rec[44] when=1014860818153119 last_update=1014860818282063 pend=0
> >> rec[43] when=1014860818021746 last_update=1014860818150063 pend=0
> >> rec[42] when=1014860817889326 last_update=1014860818018063 pend=0
> >> rec[41] when=1014860817757297 last_update=1014860817886063 pend=0
> >> rec[40] when=1014860817625682 last_update=1014860817754063 pend=0
> >> rec[39] when=1014860817492484 last_update=1014860817622063 pend=0
> >> rec[38] when=1014860817360531 last_update=1014860817490063 pend=0
> >> rec[37] when=1014860817228489 last_update=1014860817358063 pend=0
> >> rec[36] when=1014860817101250 last_update=1014860817226063 pend=0
> >> rec[35] when=1014860816964378 last_update=1014860817094063 pend=0
> >> rec[34] when=1014860816832459 last_update=1014860816962063 pend=0
> >> rec[33] when=1014860816700514 last_update=1014860816830063 pend=0
> >> rec[32] when=1014860816568558 last_update=1014860816698063 pend=0
> >> rec[31] when=1014860816436510 last_update=1014860816566063 pend=0
> >> rec[30] when=1014860816305079 last_update=1014860816434063 pend=0
> >> rec[29] when=1014860816172560 last_update=1014860816302063 pend=0
> >> rec[28] when=1014860816040449 last_update=1014860816170063 pend=0
> >> rec[27] when=1014860815908500 last_update=1014860816038063 pend=0
> >> rec[26] when=1014860815776451 last_update=1014860815906063 pend=0
> >> rec[25] when=1014860815644499 last_update=1014860815774063 pend=0
> >> rec[24] when=1014860815513100 last_update=1014860815642063 pend=0
> >> rec[23] when=1014860815380476 last_update=1014860815510063 pend=0
> >> rec[22] when=1014860815248609 last_update=1014860815378063 pend=0
> >> rec[21] when=1014860815116494 last_update=1014860815246063 pend=0
> >> rec[20] when=1014860814984441 last_update=1014860815114063 pend=0
> >> rec[19] when=1014860814852498 last_update=1014860814982063 pend=0
> >> rec[18] when=1014860814721082 last_update=1014860814850063 pend=0
> >> rec[17] when=1014860814588467 last_update=1014860814718063 pend=0
> >> rec[16] when=1014860814456530 last_update=1014860814586063 pend=0
> >> rec[15] when=1014860814324452 last_update=1014860814454063 pend=0
> >> rec[14] when=1014860814192601 last_update=1014860814322063 pend=0
> >> rec[13] when=1014860814060489 last_update=1014860814190063 pend=0
> >> rec[12] when=1014860813928546 last_update=1014860814058063 pend=0
> >> rec[11] when=1014860813797571 last_update=1014860813926063 pend=0
> >> rec[10] when=1014860813664475 last_update=1014860813794063 pend=0
> >> rec[9] when=1014860813532520 last_update=1014860813662063 pend=0
> >> rec[8] when=1014860813400463 last_update=1014860813530063 pend=0
> >> rec[7] when=1014860813268519 last_update=1014860813398063 pend=0
> >> rec[6] when=1014860813136452 last_update=1014860813266063 pend=0
> >> rec[5] when=1014860813007312 last_update=1014860813134063 pend=0
> >> rec[4] when=1014860812872314 last_update=1014860813002063 pend=0
> >> rec[3] when=1014860812740791 last_update=1014860812870063 pend=0
> >> rec[2] when=1014860812652998 last_update=1014860812738063 pend=0
> >> rec[1] when=1014860812476526 last_update=1014860812606063 pend=0
> >> rec[0] when=1014860812344635 last_update=1014860812474063 pend=0
> >> rec[49] when=1014860812213098 last_update=1014860812342063 pend=0
> >> rec[48] when=1014860812080466 last_update=1014860812210063 pend=0
> >> LAPIC: Too high counter_passed value: 0x7ffffffffffe2b55 |
> >> 0x7ffffffffffe2b55 (0x7ffffffffffe2b55) | 0x39b02b29428e5
> >> (0x39b02b295fd8f) start_apic_timer: bus cycle is 1ns, now
> >> 0x00039b02b2998ada, timer initial count 0x8400, period 33792ns, expire @
> >> 0x00039b02b29a0eda.
> >
> > Thanks, Alex!
> >
> > When update time of last_update is 0x39B02B29428E5, and the
> > start_apic_timer at 0x00039b02b2998ada, the delta is 0x561F5 = 352,757,
> > very close. Looks like
> >
> > a window here?:
> >> static void start_apic_timer(struct kvm_lapic *apic)
> >> {
> >> 	ktime_t now = apic->timer.dev.base->get_time();
> >>
> >> 	apic->timer.last_update = now;
> >
> > [window?]
> >
> >> 	apic->timer.period = apic_get_reg(apic, APIC_TMICT) *
> >> 		    APIC_BUS_CYCLE_NS * apic->timer.divide_count;
> >> 	atomic_set(&apic->timer.pending, 0);
> >>
> >> 	if (!apic->timer.period)
> >> 		return;
> >>
> >> 	hrtimer_start(&apic->timer.dev,
> >> 		      ktime_add_ns(now, apic->timer.period),
> >> 		      HRTIMER_MODE_ABS);
> >>
> >> 	apic_debug("%s: bus cycle is %" PRId64 "ns, now 0x%016"
> >> 			   PRIx64 ", "
> >> 			   "timer initial count 0x%x, period %lldns, "
> >> 			   "expire @ 0x%016" PRIx64 ".\n", __func__,
> >> 			   APIC_BUS_CYCLE_NS, ktime_to_ns(now),
> >> 			   apic_get_reg(apic, APIC_TMICT),
> >> 			   apic->timer.period,
> >> 			   ktime_to_ns(ktime_add_ns(now,
> >> 					apic->timer.period)));
> >> }
> >
> > Looks like due to the period is reduced, but the last_update use old
> > value and got just updated... But guest vcpu B program vcpu A's lapic? Or
> > intr_post of vcpu B happened at vcpu A? Both seems unreasonable...
> >
> > Alex, can you help to add another line above to confirm the problem? e.g.
> >
> > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > index afac68c..15f8ed5 100644
> > --- a/arch/x86/kvm/lapic.c
> > +++ b/arch/x86/kvm/lapic.c
> > @@ -653,6 +653,8 @@ static void start_apic_timer(struct kvm_lapic *apic)
> >  {
> >         ktime_t now = apic->timer.dev.base->get_time();
> >
> > +       printk("enter start_apic_timer! vcpu %d\n", apic->vcpu->vcpu_id);
> > +
> >         apic->timer.last_update = now;
> >
> >         apic->timer.period = apic_get_reg(apic, APIC_TMICT) *
> >
> > Also add a similar one at the beginning of print_last_update_record().
> >
> > Thanks! (And thanks for Marcelo's excellent debug patch! :) )
>
> The VMware ESX VM is UP, so I don't think we'll get anything from the
> vcpu_id apart from 0 :-). Nevertheless here's the dmesg output:
>
> enter start_apic_timer! vcpu 0
> enter start_apic_timer! vcpu 0
> start_apic_timer: bus cycle is 1ns, now 0x00039fb339714386, timer
> initial count 0x186a0, period 100000ns, expire @ 0x00039fb33972ca26.
> enter start_apic_timer! vcpu 0
> start_apic_timer: bus cycle is 1ns, now 0x00039fb33c58ebc2, timer
> initial count 0x203a0, period 132000ns, expire @ 0x00039fb33c5aef62.
> last_update = 1020361050214818 now = 1020361050178604
> enter print_last_update_record! vcpu 0
> rec[5] when=1020361043617361 last_update=1020361043746818 pend=0
> rec[4] when=1020361050088268 last_update=1020361050214818 pend=0
> rec[3] when=1020361049970143 last_update=1020361050082818 pend=0
> rec[2] when=1020361049822220 last_update=1020361049950818 pend=0
> rec[1] when=1020361049690023 last_update=1020361049818818 pend=0
> rec[0] when=1020361049559209 last_update=1020361049686818 pend=0
> rec[49] when=1020361049427469 last_update=1020361049554818 pend=0
> rec[48] when=1020361049294815 last_update=1020361049422818 pend=0
> rec[47] when=1020361049162401 last_update=1020361049290818 pend=0
> rec[46] when=1020361049030431 last_update=1020361049158818 pend=0
> rec[45] when=1020361048898352 last_update=1020361049026818 pend=0
> rec[44] when=1020361048765413 last_update=1020361048894818 pend=0
> rec[43] when=1020361048634310 last_update=1020361048762818 pend=0
> rec[42] when=1020361048501343 last_update=1020361048630818 pend=0
> rec[41] when=1020361048369388 last_update=1020361048498818 pend=0
> rec[40] when=1020361048241616 last_update=1020361048366818 pend=0
> rec[39] when=1020361048108017 last_update=1020361048234818 pend=0
> rec[38] when=1020361047973826 last_update=1020361048102818 pend=0
> rec[37] when=1020361047841371 last_update=1020361047970818 pend=0
> rec[36] when=1020361047709322 last_update=1020361047838818 pend=0
> rec[35] when=1020361047577366 last_update=1020361047706818 pend=0
> rec[34] when=1020361047445316 last_update=1020361047574818 pend=0
> rec[33] when=1020361047313423 last_update=1020361047442818 pend=0
> rec[32] when=1020361047181310 last_update=1020361047310818 pend=0
> rec[31] when=1020361047049990 last_update=1020361047178818 pend=0
> rec[30] when=1020361046917375 last_update=1020361047046818 pend=0
> rec[29] when=1020361046785332 last_update=1020361046914818 pend=0
> rec[28] when=1020361046653374 last_update=1020361046782818 pend=0
> rec[27] when=1020361046521325 last_update=1020361046650818 pend=0
> rec[26] when=1020361046389374 last_update=1020361046518818 pend=0
> rec[25] when=1020361046257989 last_update=1020361046386818 pend=0
> rec[24] when=1020361046125335 last_update=1020361046254818 pend=0
> rec[23] when=1020361045993393 last_update=1020361046122818 pend=0
> rec[22] when=1020361045861330 last_update=1020361045990818 pend=0
> rec[21] when=1020361045729389 last_update=1020361045858818 pend=0
> rec[20] when=1020361045602604 last_update=1020361045726818 pend=0
> rec[19] when=1020361045471032 last_update=1020361045594818 pend=0
> rec[18] when=1020361045333385 last_update=1020361045462818 pend=0
> rec[17] when=1020361045201397 last_update=1020361045330818 pend=0
> rec[16] when=1020361045074634 last_update=1020361045198818 pend=0
> rec[15] when=1020361044941707 last_update=1020361045066818 pend=0
> rec[14] when=1020361044805373 last_update=1020361044934818 pend=0
> rec[13] when=1020361044673417 last_update=1020361044802818 pend=0
> rec[12] when=1020361044547848 last_update=1020361044670818 pend=0
> rec[11] when=1020361044409439 last_update=1020361044538818 pend=0
> rec[10] when=1020361044277316 last_update=1020361044406818 pend=0
> rec[9] when=1020361044150456 last_update=1020361044274818 pend=0
> rec[8] when=1020361044013204 last_update=1020361044142818 pend=0
> rec[7] when=1020361043881203 last_update=1020361044010818 pend=0
> rec[6] when=1020361043749913 last_update=1020361043878818 pend=0
> LAPIC: Too high counter_passed value: 0x7fffffffffff7289 |
> 0x7fffffffffff7289 (0x7fffffffffff7289) | 0x3a00351f0442c (0x3a00351f0d1a2)
> enter start_apic_timer! vcpu 0
> start_apic_timer: bus cycle is 1ns, now 0x0003a00351f37054, timer
> initial count 0x8400, period 33792ns, expire @ 0x0003a00351f3f454.

Oh, you don't need to. :) I thought you are running on MP system... OK, seems 
it would become more easier. :)

Checked the data again, seems last_update is always ahead of "when", at about 
one period. That's not desired. Last_update should delay about one period...

Continue to check the code...
diff mbox

Patch

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index afac68c..15f8ed5 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -653,6 +653,8 @@  static void start_apic_timer(struct kvm_lapic *apic)
 {
        ktime_t now = apic->timer.dev.base->get_time();

+       printk("enter start_apic_timer! vcpu %d\n", apic->vcpu->vcpu_id);
+
        apic->timer.last_update = now;

        apic->timer.period = apic_get_reg(apic, APIC_TMICT) *