Message ID | 200901201920.42519.sheng@linux.intel.com (mailing list archive) |
---|---|
State | Accepted, archived |
Headers | show |
Sheng Yang wrote: > On Tuesday 20 January 2009 18:41:21 Alexander Graf wrote: > >> Marcelo Tosatti wrote: >> >>> On Thu, Jan 15, 2009 at 03:20:06PM +0800, Sheng Yang wrote: >>> >>>>> + * Since reinjection is not rate-limited, use the delay >>>>> + * to inject the last interrupt as an estimate. >>>>> + */ >>>>> + if (unlikely(atomic_read(&apic->timer.pending) > 0)) { >>>>> + remaining = apic->timer.injection_delay; >>>>> + if (ktime_to_ns(remaining) > apic->timer.period) >>>>> + remaining = ns_to_ktime(apic->timer.period); >>>>> + } else >>>>> + remaining = hrtimer_expires_remaining(&apic->timer.dev); >>>>> >>>> A little doubt... >>>> >>>> A: time_fire >>>> B: intr_post >>>> C: read TMCCT >>>> >>>> The sequence can be ABC or ACB. >>>> >>>> injection_delay = time(B) - time(A) >>>> >>>> So it didn't count time from read TMCCT... And guest get interrupt at >>>> time(B), not quite understand why time(B) - time(A) matters here... >>>> >>> Its an estimate of the delay it takes to inject an interrupt to the >>> guest once fired. Its only used if there have been accumulated ones, so >>> ACB sequence with pending=0 will use hrtimer_expires_remaining(). >>> >>> >>>> I think the reasonable here means, this interval is usable later after >>>> the accumulated interrupts are injected. From this point of view, I >>>> think current solution is reasonable. It just assume the delayed >>>> interrupts have been injected. >>>> >>>> However, seriously, any value here is wrong, no elegant one. >>>> >>> I agree. >>> >>> >>>> But I still prefer to the current solution... >>>> >>> Why? The proposed version is smaller and simpler than the current >>> one IMO, and also not vulnerable to whatever bug is causing now < >>> last_update. >>> >>> And more precise, since the current scheme uses interrupt injection time >>> as if it was "count-down restart" time, which is not true. >>> >>> >>>> And here is not the really problem for now I think. The current >>>> mechanism is mostly OK, but where is the bug... We have either have a >>>> simple fix(e.g. if now < last_update, then return 0) or dig into it. Did >>>> it worth a try? Anyway, it would return a buggy result if we have >>>> pending interrupts... >>>> >>> The overflow calculation is not necessary as discussed. Alexander, can >>> you please try the following? Sheng, do you have any other suggestion to >>> test? >>> >>> /proc/timer_list output of the host when ESX is running too. >>> >> Sorry for the late reply. Here's the dmesg output and /proc/timer_list >> after the issue occured. I left the warning message in that I put there >> in case the value is too high: >> >> counter_passed = div64_u64(ktime_to_ns(passed), >> (APIC_BUS_CYCLE_NS * >> apic->timer.divide_count)); >> >> + if (counter_passed > 0x7f00000000000000) { >> + /* If we're in here we probably encountered a bug! */ >> + printk(KERN_INFO "LAPIC: Too high counter_passed value: >> 0x%lx | 0x%lx (0x%lx) | 0x%lx (0x%lx)\n", counter_passed, >> ktime_to_ns(passed), passed, now.tv64, apic >> ->timer.last_update.tv64); >> + } >> + >> if (counter_passed > tmcct) { >> if (unlikely(!apic_lvtt_period(apic))) { >> /* one-shot timers stick at 0 until reset */ >> >> >> >> start_apic_timer: bus cycle is 1ns, now 0x00039a6f377cfbc4, timer >> initial count 0x186a0, period 100000ns, expire @ 0x00039a6f377e8264. >> start_apic_timer: bus cycle is 1ns, now 0x00039a6f3a7b5aef, timer >> initial count 0x203a0, period 132000ns, expire @ 0x00039a6f3a7d5e8f. >> last_update = 1014860818546063 now = 1014860818426085 >> rec[47] when=1014860811948512 last_update=1014860812078063 pend=0 >> rec[46] when=1014860818417773 last_update=1014860818546063 pend=0 >> rec[45] when=1014860818285454 last_update=1014860818414063 pend=0 >> rec[44] when=1014860818153119 last_update=1014860818282063 pend=0 >> rec[43] when=1014860818021746 last_update=1014860818150063 pend=0 >> rec[42] when=1014860817889326 last_update=1014860818018063 pend=0 >> rec[41] when=1014860817757297 last_update=1014860817886063 pend=0 >> rec[40] when=1014860817625682 last_update=1014860817754063 pend=0 >> rec[39] when=1014860817492484 last_update=1014860817622063 pend=0 >> rec[38] when=1014860817360531 last_update=1014860817490063 pend=0 >> rec[37] when=1014860817228489 last_update=1014860817358063 pend=0 >> rec[36] when=1014860817101250 last_update=1014860817226063 pend=0 >> rec[35] when=1014860816964378 last_update=1014860817094063 pend=0 >> rec[34] when=1014860816832459 last_update=1014860816962063 pend=0 >> rec[33] when=1014860816700514 last_update=1014860816830063 pend=0 >> rec[32] when=1014860816568558 last_update=1014860816698063 pend=0 >> rec[31] when=1014860816436510 last_update=1014860816566063 pend=0 >> rec[30] when=1014860816305079 last_update=1014860816434063 pend=0 >> rec[29] when=1014860816172560 last_update=1014860816302063 pend=0 >> rec[28] when=1014860816040449 last_update=1014860816170063 pend=0 >> rec[27] when=1014860815908500 last_update=1014860816038063 pend=0 >> rec[26] when=1014860815776451 last_update=1014860815906063 pend=0 >> rec[25] when=1014860815644499 last_update=1014860815774063 pend=0 >> rec[24] when=1014860815513100 last_update=1014860815642063 pend=0 >> rec[23] when=1014860815380476 last_update=1014860815510063 pend=0 >> rec[22] when=1014860815248609 last_update=1014860815378063 pend=0 >> rec[21] when=1014860815116494 last_update=1014860815246063 pend=0 >> rec[20] when=1014860814984441 last_update=1014860815114063 pend=0 >> rec[19] when=1014860814852498 last_update=1014860814982063 pend=0 >> rec[18] when=1014860814721082 last_update=1014860814850063 pend=0 >> rec[17] when=1014860814588467 last_update=1014860814718063 pend=0 >> rec[16] when=1014860814456530 last_update=1014860814586063 pend=0 >> rec[15] when=1014860814324452 last_update=1014860814454063 pend=0 >> rec[14] when=1014860814192601 last_update=1014860814322063 pend=0 >> rec[13] when=1014860814060489 last_update=1014860814190063 pend=0 >> rec[12] when=1014860813928546 last_update=1014860814058063 pend=0 >> rec[11] when=1014860813797571 last_update=1014860813926063 pend=0 >> rec[10] when=1014860813664475 last_update=1014860813794063 pend=0 >> rec[9] when=1014860813532520 last_update=1014860813662063 pend=0 >> rec[8] when=1014860813400463 last_update=1014860813530063 pend=0 >> rec[7] when=1014860813268519 last_update=1014860813398063 pend=0 >> rec[6] when=1014860813136452 last_update=1014860813266063 pend=0 >> rec[5] when=1014860813007312 last_update=1014860813134063 pend=0 >> rec[4] when=1014860812872314 last_update=1014860813002063 pend=0 >> rec[3] when=1014860812740791 last_update=1014860812870063 pend=0 >> rec[2] when=1014860812652998 last_update=1014860812738063 pend=0 >> rec[1] when=1014860812476526 last_update=1014860812606063 pend=0 >> rec[0] when=1014860812344635 last_update=1014860812474063 pend=0 >> rec[49] when=1014860812213098 last_update=1014860812342063 pend=0 >> rec[48] when=1014860812080466 last_update=1014860812210063 pend=0 >> LAPIC: Too high counter_passed value: 0x7ffffffffffe2b55 | >> 0x7ffffffffffe2b55 (0x7ffffffffffe2b55) | 0x39b02b29428e5 (0x39b02b295fd8f) >> start_apic_timer: bus cycle is 1ns, now 0x00039b02b2998ada, timer >> initial count 0x8400, period 33792ns, expire @ 0x00039b02b29a0eda. >> >> > > Thanks, Alex! > > When update time of last_update is 0x39B02B29428E5, and the start_apic_timer > at 0x00039b02b2998ada, the delta is 0x561F5 = 352,757, very close. Looks like > a window here?: > > >> static void start_apic_timer(struct kvm_lapic *apic) >> { >> ktime_t now = apic->timer.dev.base->get_time(); >> >> apic->timer.last_update = now; >> > > [window?] > >> apic->timer.period = apic_get_reg(apic, APIC_TMICT) * >> APIC_BUS_CYCLE_NS * apic->timer.divide_count; >> atomic_set(&apic->timer.pending, 0); >> >> if (!apic->timer.period) >> return; >> >> hrtimer_start(&apic->timer.dev, >> ktime_add_ns(now, apic->timer.period), >> HRTIMER_MODE_ABS); >> >> apic_debug("%s: bus cycle is %" PRId64 "ns, now 0x%016" >> PRIx64 ", " >> "timer initial count 0x%x, period %lldns, " >> "expire @ 0x%016" PRIx64 ".\n", __func__, >> APIC_BUS_CYCLE_NS, ktime_to_ns(now), >> apic_get_reg(apic, APIC_TMICT), >> apic->timer.period, >> ktime_to_ns(ktime_add_ns(now, >> apic->timer.period))); >> } >> > > Looks like due to the period is reduced, but the last_update use old value and > got just updated... But guest vcpu B program vcpu A's lapic? Or intr_post of > vcpu B happened at vcpu A? Both seems unreasonable... > > Alex, can you help to add another line above to confirm the problem? e.g. > > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c > index afac68c..15f8ed5 100644 > --- a/arch/x86/kvm/lapic.c > +++ b/arch/x86/kvm/lapic.c > @@ -653,6 +653,8 @@ static void start_apic_timer(struct kvm_lapic *apic) > { > ktime_t now = apic->timer.dev.base->get_time(); > > + printk("enter start_apic_timer! vcpu %d\n", apic->vcpu->vcpu_id); > + > apic->timer.last_update = now; > > apic->timer.period = apic_get_reg(apic, APIC_TMICT) * > > Also add a similar one at the beginning of print_last_update_record(). > > Thanks! (And thanks for Marcelo's excellent debug patch! :) ) > > The VMware ESX VM is UP, so I don't think we'll get anything from the vcpu_id apart from 0 :-). Nevertheless here's the dmesg output: enter start_apic_timer! vcpu 0 enter start_apic_timer! vcpu 0 start_apic_timer: bus cycle is 1ns, now 0x00039fb339714386, timer initial count 0x186a0, period 100000ns, expire @ 0x00039fb33972ca26. enter start_apic_timer! vcpu 0 start_apic_timer: bus cycle is 1ns, now 0x00039fb33c58ebc2, timer initial count 0x203a0, period 132000ns, expire @ 0x00039fb33c5aef62. last_update = 1020361050214818 now = 1020361050178604 enter print_last_update_record! vcpu 0 rec[5] when=1020361043617361 last_update=1020361043746818 pend=0 rec[4] when=1020361050088268 last_update=1020361050214818 pend=0 rec[3] when=1020361049970143 last_update=1020361050082818 pend=0 rec[2] when=1020361049822220 last_update=1020361049950818 pend=0 rec[1] when=1020361049690023 last_update=1020361049818818 pend=0 rec[0] when=1020361049559209 last_update=1020361049686818 pend=0 rec[49] when=1020361049427469 last_update=1020361049554818 pend=0 rec[48] when=1020361049294815 last_update=1020361049422818 pend=0 rec[47] when=1020361049162401 last_update=1020361049290818 pend=0 rec[46] when=1020361049030431 last_update=1020361049158818 pend=0 rec[45] when=1020361048898352 last_update=1020361049026818 pend=0 rec[44] when=1020361048765413 last_update=1020361048894818 pend=0 rec[43] when=1020361048634310 last_update=1020361048762818 pend=0 rec[42] when=1020361048501343 last_update=1020361048630818 pend=0 rec[41] when=1020361048369388 last_update=1020361048498818 pend=0 rec[40] when=1020361048241616 last_update=1020361048366818 pend=0 rec[39] when=1020361048108017 last_update=1020361048234818 pend=0 rec[38] when=1020361047973826 last_update=1020361048102818 pend=0 rec[37] when=1020361047841371 last_update=1020361047970818 pend=0 rec[36] when=1020361047709322 last_update=1020361047838818 pend=0 rec[35] when=1020361047577366 last_update=1020361047706818 pend=0 rec[34] when=1020361047445316 last_update=1020361047574818 pend=0 rec[33] when=1020361047313423 last_update=1020361047442818 pend=0 rec[32] when=1020361047181310 last_update=1020361047310818 pend=0 rec[31] when=1020361047049990 last_update=1020361047178818 pend=0 rec[30] when=1020361046917375 last_update=1020361047046818 pend=0 rec[29] when=1020361046785332 last_update=1020361046914818 pend=0 rec[28] when=1020361046653374 last_update=1020361046782818 pend=0 rec[27] when=1020361046521325 last_update=1020361046650818 pend=0 rec[26] when=1020361046389374 last_update=1020361046518818 pend=0 rec[25] when=1020361046257989 last_update=1020361046386818 pend=0 rec[24] when=1020361046125335 last_update=1020361046254818 pend=0 rec[23] when=1020361045993393 last_update=1020361046122818 pend=0 rec[22] when=1020361045861330 last_update=1020361045990818 pend=0 rec[21] when=1020361045729389 last_update=1020361045858818 pend=0 rec[20] when=1020361045602604 last_update=1020361045726818 pend=0 rec[19] when=1020361045471032 last_update=1020361045594818 pend=0 rec[18] when=1020361045333385 last_update=1020361045462818 pend=0 rec[17] when=1020361045201397 last_update=1020361045330818 pend=0 rec[16] when=1020361045074634 last_update=1020361045198818 pend=0 rec[15] when=1020361044941707 last_update=1020361045066818 pend=0 rec[14] when=1020361044805373 last_update=1020361044934818 pend=0 rec[13] when=1020361044673417 last_update=1020361044802818 pend=0 rec[12] when=1020361044547848 last_update=1020361044670818 pend=0 rec[11] when=1020361044409439 last_update=1020361044538818 pend=0 rec[10] when=1020361044277316 last_update=1020361044406818 pend=0 rec[9] when=1020361044150456 last_update=1020361044274818 pend=0 rec[8] when=1020361044013204 last_update=1020361044142818 pend=0 rec[7] when=1020361043881203 last_update=1020361044010818 pend=0 rec[6] when=1020361043749913 last_update=1020361043878818 pend=0 LAPIC: Too high counter_passed value: 0x7fffffffffff7289 | 0x7fffffffffff7289 (0x7fffffffffff7289) | 0x3a00351f0442c (0x3a00351f0d1a2) enter start_apic_timer! vcpu 0 start_apic_timer: bus cycle is 1ns, now 0x0003a00351f37054, timer initial count 0x8400, period 33792ns, expire @ 0x0003a00351f3f454. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tuesday 20 January 2009 20:09:23 Alexander Graf wrote: > Sheng Yang wrote: > > On Tuesday 20 January 2009 18:41:21 Alexander Graf wrote: > >> Marcelo Tosatti wrote: > >>> On Thu, Jan 15, 2009 at 03:20:06PM +0800, Sheng Yang wrote: > >>>>> + * Since reinjection is not rate-limited, use the delay > >>>>> + * to inject the last interrupt as an estimate. > >>>>> + */ > >>>>> + if (unlikely(atomic_read(&apic->timer.pending) > 0)) { > >>>>> + remaining = apic->timer.injection_delay; > >>>>> + if (ktime_to_ns(remaining) > apic->timer.period) > >>>>> + remaining = ns_to_ktime(apic->timer.period); > >>>>> + } else > >>>>> + remaining = hrtimer_expires_remaining(&apic->timer.dev); > >>>> > >>>> A little doubt... > >>>> > >>>> A: time_fire > >>>> B: intr_post > >>>> C: read TMCCT > >>>> > >>>> The sequence can be ABC or ACB. > >>>> > >>>> injection_delay = time(B) - time(A) > >>>> > >>>> So it didn't count time from read TMCCT... And guest get interrupt at > >>>> time(B), not quite understand why time(B) - time(A) matters here... > >>> > >>> Its an estimate of the delay it takes to inject an interrupt to the > >>> guest once fired. Its only used if there have been accumulated ones, so > >>> ACB sequence with pending=0 will use hrtimer_expires_remaining(). > >>> > >>>> I think the reasonable here means, this interval is usable later after > >>>> the accumulated interrupts are injected. From this point of view, I > >>>> think current solution is reasonable. It just assume the delayed > >>>> interrupts have been injected. > >>>> > >>>> However, seriously, any value here is wrong, no elegant one. > >>> > >>> I agree. > >>> > >>>> But I still prefer to the current solution... > >>> > >>> Why? The proposed version is smaller and simpler than the current > >>> one IMO, and also not vulnerable to whatever bug is causing now < > >>> last_update. > >>> > >>> And more precise, since the current scheme uses interrupt injection > >>> time as if it was "count-down restart" time, which is not true. > >>> > >>>> And here is not the really problem for now I think. The current > >>>> mechanism is mostly OK, but where is the bug... We have either have a > >>>> simple fix(e.g. if now < last_update, then return 0) or dig into it. > >>>> Did it worth a try? Anyway, it would return a buggy result if we have > >>>> pending interrupts... > >>> > >>> The overflow calculation is not necessary as discussed. Alexander, can > >>> you please try the following? Sheng, do you have any other suggestion > >>> to test? > >>> > >>> /proc/timer_list output of the host when ESX is running too. > >> > >> Sorry for the late reply. Here's the dmesg output and /proc/timer_list > >> after the issue occured. I left the warning message in that I put there > >> in case the value is too high: > >> > >> counter_passed = div64_u64(ktime_to_ns(passed), > >> (APIC_BUS_CYCLE_NS * > >> apic->timer.divide_count)); > >> > >> + if (counter_passed > 0x7f00000000000000) { > >> + /* If we're in here we probably encountered a bug! */ > >> + printk(KERN_INFO "LAPIC: Too high counter_passed value: > >> 0x%lx | 0x%lx (0x%lx) | 0x%lx (0x%lx)\n", counter_passed, > >> ktime_to_ns(passed), passed, now.tv64, apic > >> ->timer.last_update.tv64); > >> + } > >> + > >> if (counter_passed > tmcct) { > >> if (unlikely(!apic_lvtt_period(apic))) { > >> /* one-shot timers stick at 0 until reset */ > >> > >> > >> > >> start_apic_timer: bus cycle is 1ns, now 0x00039a6f377cfbc4, timer > >> initial count 0x186a0, period 100000ns, expire @ 0x00039a6f377e8264. > >> start_apic_timer: bus cycle is 1ns, now 0x00039a6f3a7b5aef, timer > >> initial count 0x203a0, period 132000ns, expire @ 0x00039a6f3a7d5e8f. > >> last_update = 1014860818546063 now = 1014860818426085 > >> rec[47] when=1014860811948512 last_update=1014860812078063 pend=0 > >> rec[46] when=1014860818417773 last_update=1014860818546063 pend=0 > >> rec[45] when=1014860818285454 last_update=1014860818414063 pend=0 > >> rec[44] when=1014860818153119 last_update=1014860818282063 pend=0 > >> rec[43] when=1014860818021746 last_update=1014860818150063 pend=0 > >> rec[42] when=1014860817889326 last_update=1014860818018063 pend=0 > >> rec[41] when=1014860817757297 last_update=1014860817886063 pend=0 > >> rec[40] when=1014860817625682 last_update=1014860817754063 pend=0 > >> rec[39] when=1014860817492484 last_update=1014860817622063 pend=0 > >> rec[38] when=1014860817360531 last_update=1014860817490063 pend=0 > >> rec[37] when=1014860817228489 last_update=1014860817358063 pend=0 > >> rec[36] when=1014860817101250 last_update=1014860817226063 pend=0 > >> rec[35] when=1014860816964378 last_update=1014860817094063 pend=0 > >> rec[34] when=1014860816832459 last_update=1014860816962063 pend=0 > >> rec[33] when=1014860816700514 last_update=1014860816830063 pend=0 > >> rec[32] when=1014860816568558 last_update=1014860816698063 pend=0 > >> rec[31] when=1014860816436510 last_update=1014860816566063 pend=0 > >> rec[30] when=1014860816305079 last_update=1014860816434063 pend=0 > >> rec[29] when=1014860816172560 last_update=1014860816302063 pend=0 > >> rec[28] when=1014860816040449 last_update=1014860816170063 pend=0 > >> rec[27] when=1014860815908500 last_update=1014860816038063 pend=0 > >> rec[26] when=1014860815776451 last_update=1014860815906063 pend=0 > >> rec[25] when=1014860815644499 last_update=1014860815774063 pend=0 > >> rec[24] when=1014860815513100 last_update=1014860815642063 pend=0 > >> rec[23] when=1014860815380476 last_update=1014860815510063 pend=0 > >> rec[22] when=1014860815248609 last_update=1014860815378063 pend=0 > >> rec[21] when=1014860815116494 last_update=1014860815246063 pend=0 > >> rec[20] when=1014860814984441 last_update=1014860815114063 pend=0 > >> rec[19] when=1014860814852498 last_update=1014860814982063 pend=0 > >> rec[18] when=1014860814721082 last_update=1014860814850063 pend=0 > >> rec[17] when=1014860814588467 last_update=1014860814718063 pend=0 > >> rec[16] when=1014860814456530 last_update=1014860814586063 pend=0 > >> rec[15] when=1014860814324452 last_update=1014860814454063 pend=0 > >> rec[14] when=1014860814192601 last_update=1014860814322063 pend=0 > >> rec[13] when=1014860814060489 last_update=1014860814190063 pend=0 > >> rec[12] when=1014860813928546 last_update=1014860814058063 pend=0 > >> rec[11] when=1014860813797571 last_update=1014860813926063 pend=0 > >> rec[10] when=1014860813664475 last_update=1014860813794063 pend=0 > >> rec[9] when=1014860813532520 last_update=1014860813662063 pend=0 > >> rec[8] when=1014860813400463 last_update=1014860813530063 pend=0 > >> rec[7] when=1014860813268519 last_update=1014860813398063 pend=0 > >> rec[6] when=1014860813136452 last_update=1014860813266063 pend=0 > >> rec[5] when=1014860813007312 last_update=1014860813134063 pend=0 > >> rec[4] when=1014860812872314 last_update=1014860813002063 pend=0 > >> rec[3] when=1014860812740791 last_update=1014860812870063 pend=0 > >> rec[2] when=1014860812652998 last_update=1014860812738063 pend=0 > >> rec[1] when=1014860812476526 last_update=1014860812606063 pend=0 > >> rec[0] when=1014860812344635 last_update=1014860812474063 pend=0 > >> rec[49] when=1014860812213098 last_update=1014860812342063 pend=0 > >> rec[48] when=1014860812080466 last_update=1014860812210063 pend=0 > >> LAPIC: Too high counter_passed value: 0x7ffffffffffe2b55 | > >> 0x7ffffffffffe2b55 (0x7ffffffffffe2b55) | 0x39b02b29428e5 > >> (0x39b02b295fd8f) start_apic_timer: bus cycle is 1ns, now > >> 0x00039b02b2998ada, timer initial count 0x8400, period 33792ns, expire @ > >> 0x00039b02b29a0eda. > > > > Thanks, Alex! > > > > When update time of last_update is 0x39B02B29428E5, and the > > start_apic_timer at 0x00039b02b2998ada, the delta is 0x561F5 = 352,757, > > very close. Looks like > > > > a window here?: > >> static void start_apic_timer(struct kvm_lapic *apic) > >> { > >> ktime_t now = apic->timer.dev.base->get_time(); > >> > >> apic->timer.last_update = now; > > > > [window?] > > > >> apic->timer.period = apic_get_reg(apic, APIC_TMICT) * > >> APIC_BUS_CYCLE_NS * apic->timer.divide_count; > >> atomic_set(&apic->timer.pending, 0); > >> > >> if (!apic->timer.period) > >> return; > >> > >> hrtimer_start(&apic->timer.dev, > >> ktime_add_ns(now, apic->timer.period), > >> HRTIMER_MODE_ABS); > >> > >> apic_debug("%s: bus cycle is %" PRId64 "ns, now 0x%016" > >> PRIx64 ", " > >> "timer initial count 0x%x, period %lldns, " > >> "expire @ 0x%016" PRIx64 ".\n", __func__, > >> APIC_BUS_CYCLE_NS, ktime_to_ns(now), > >> apic_get_reg(apic, APIC_TMICT), > >> apic->timer.period, > >> ktime_to_ns(ktime_add_ns(now, > >> apic->timer.period))); > >> } > > > > Looks like due to the period is reduced, but the last_update use old > > value and got just updated... But guest vcpu B program vcpu A's lapic? Or > > intr_post of vcpu B happened at vcpu A? Both seems unreasonable... > > > > Alex, can you help to add another line above to confirm the problem? e.g. > > > > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c > > index afac68c..15f8ed5 100644 > > --- a/arch/x86/kvm/lapic.c > > +++ b/arch/x86/kvm/lapic.c > > @@ -653,6 +653,8 @@ static void start_apic_timer(struct kvm_lapic *apic) > > { > > ktime_t now = apic->timer.dev.base->get_time(); > > > > + printk("enter start_apic_timer! vcpu %d\n", apic->vcpu->vcpu_id); > > + > > apic->timer.last_update = now; > > > > apic->timer.period = apic_get_reg(apic, APIC_TMICT) * > > > > Also add a similar one at the beginning of print_last_update_record(). > > > > Thanks! (And thanks for Marcelo's excellent debug patch! :) ) > > The VMware ESX VM is UP, so I don't think we'll get anything from the > vcpu_id apart from 0 :-). Nevertheless here's the dmesg output: > > enter start_apic_timer! vcpu 0 > enter start_apic_timer! vcpu 0 > start_apic_timer: bus cycle is 1ns, now 0x00039fb339714386, timer > initial count 0x186a0, period 100000ns, expire @ 0x00039fb33972ca26. > enter start_apic_timer! vcpu 0 > start_apic_timer: bus cycle is 1ns, now 0x00039fb33c58ebc2, timer > initial count 0x203a0, period 132000ns, expire @ 0x00039fb33c5aef62. > last_update = 1020361050214818 now = 1020361050178604 > enter print_last_update_record! vcpu 0 > rec[5] when=1020361043617361 last_update=1020361043746818 pend=0 > rec[4] when=1020361050088268 last_update=1020361050214818 pend=0 > rec[3] when=1020361049970143 last_update=1020361050082818 pend=0 > rec[2] when=1020361049822220 last_update=1020361049950818 pend=0 > rec[1] when=1020361049690023 last_update=1020361049818818 pend=0 > rec[0] when=1020361049559209 last_update=1020361049686818 pend=0 > rec[49] when=1020361049427469 last_update=1020361049554818 pend=0 > rec[48] when=1020361049294815 last_update=1020361049422818 pend=0 > rec[47] when=1020361049162401 last_update=1020361049290818 pend=0 > rec[46] when=1020361049030431 last_update=1020361049158818 pend=0 > rec[45] when=1020361048898352 last_update=1020361049026818 pend=0 > rec[44] when=1020361048765413 last_update=1020361048894818 pend=0 > rec[43] when=1020361048634310 last_update=1020361048762818 pend=0 > rec[42] when=1020361048501343 last_update=1020361048630818 pend=0 > rec[41] when=1020361048369388 last_update=1020361048498818 pend=0 > rec[40] when=1020361048241616 last_update=1020361048366818 pend=0 > rec[39] when=1020361048108017 last_update=1020361048234818 pend=0 > rec[38] when=1020361047973826 last_update=1020361048102818 pend=0 > rec[37] when=1020361047841371 last_update=1020361047970818 pend=0 > rec[36] when=1020361047709322 last_update=1020361047838818 pend=0 > rec[35] when=1020361047577366 last_update=1020361047706818 pend=0 > rec[34] when=1020361047445316 last_update=1020361047574818 pend=0 > rec[33] when=1020361047313423 last_update=1020361047442818 pend=0 > rec[32] when=1020361047181310 last_update=1020361047310818 pend=0 > rec[31] when=1020361047049990 last_update=1020361047178818 pend=0 > rec[30] when=1020361046917375 last_update=1020361047046818 pend=0 > rec[29] when=1020361046785332 last_update=1020361046914818 pend=0 > rec[28] when=1020361046653374 last_update=1020361046782818 pend=0 > rec[27] when=1020361046521325 last_update=1020361046650818 pend=0 > rec[26] when=1020361046389374 last_update=1020361046518818 pend=0 > rec[25] when=1020361046257989 last_update=1020361046386818 pend=0 > rec[24] when=1020361046125335 last_update=1020361046254818 pend=0 > rec[23] when=1020361045993393 last_update=1020361046122818 pend=0 > rec[22] when=1020361045861330 last_update=1020361045990818 pend=0 > rec[21] when=1020361045729389 last_update=1020361045858818 pend=0 > rec[20] when=1020361045602604 last_update=1020361045726818 pend=0 > rec[19] when=1020361045471032 last_update=1020361045594818 pend=0 > rec[18] when=1020361045333385 last_update=1020361045462818 pend=0 > rec[17] when=1020361045201397 last_update=1020361045330818 pend=0 > rec[16] when=1020361045074634 last_update=1020361045198818 pend=0 > rec[15] when=1020361044941707 last_update=1020361045066818 pend=0 > rec[14] when=1020361044805373 last_update=1020361044934818 pend=0 > rec[13] when=1020361044673417 last_update=1020361044802818 pend=0 > rec[12] when=1020361044547848 last_update=1020361044670818 pend=0 > rec[11] when=1020361044409439 last_update=1020361044538818 pend=0 > rec[10] when=1020361044277316 last_update=1020361044406818 pend=0 > rec[9] when=1020361044150456 last_update=1020361044274818 pend=0 > rec[8] when=1020361044013204 last_update=1020361044142818 pend=0 > rec[7] when=1020361043881203 last_update=1020361044010818 pend=0 > rec[6] when=1020361043749913 last_update=1020361043878818 pend=0 > LAPIC: Too high counter_passed value: 0x7fffffffffff7289 | > 0x7fffffffffff7289 (0x7fffffffffff7289) | 0x3a00351f0442c (0x3a00351f0d1a2) > enter start_apic_timer! vcpu 0 > start_apic_timer: bus cycle is 1ns, now 0x0003a00351f37054, timer > initial count 0x8400, period 33792ns, expire @ 0x0003a00351f3f454. Oh, you don't need to. :) I thought you are running on MP system... OK, seems it would become more easier. :) Checked the data again, seems last_update is always ahead of "when", at about one period. That's not desired. Last_update should delay about one period... Continue to check the code...
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index afac68c..15f8ed5 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -653,6 +653,8 @@ static void start_apic_timer(struct kvm_lapic *apic) { ktime_t now = apic->timer.dev.base->get_time(); + printk("enter start_apic_timer! vcpu %d\n", apic->vcpu->vcpu_id); + apic->timer.last_update = now; apic->timer.period = apic_get_reg(apic, APIC_TMICT) *