Message ID | 20190819230422.244888-1-delco@google.com (mailing list archive) |
---|---|
State | New, archived |
Series | KVM: lapic: restart counter on change to periodic mode |
On 20/08/19 01:04, Matt delco wrote:
> From: Matt Delco <delco@google.com>
>
> Time seems to eventually stop in a Windows VM when using Skype.
> Instrumentation shows that the OS is frequently switching the APIC
> timer between one-shot and periodic mode.  The OS is typically writing
> to both LVTT and TMICT.  When time stops the sequence observed is that
> the APIC was in one-shot mode, the timer expired, and the OS writes to
> LVTT (but not TMICT) to change to periodic mode.  No future timer events
> are received by the OS since the timer is only re-armed on TMICT writes.
>
> With this change time continues to advance in the VM.  TBD if physical
> hardware will reset the current count if/when the mode is changed to
> period and the current count is zero.
>
> Signed-off-by: Matt Delco <delco@google.com>
> ---
>  arch/x86/kvm/lapic.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 685d17c11461..fddd810eeca5 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -1935,14 +1935,19 @@ int kvm_lapic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
>
>  		break;
>
> -	case APIC_LVTT:
> +	case APIC_LVTT: {
> +		u32 timer_mode = apic->lapic_timer.timer_mode;
>  		if (!kvm_apic_sw_enabled(apic))
>  			val |= APIC_LVT_MASKED;
>  		val &= (apic_lvt_mask[0] | apic->lapic_timer.timer_mode_mask);
>  		kvm_lapic_set_reg(apic, APIC_LVTT, val);
>  		apic_update_lvtt(apic);
> +		if (timer_mode == APIC_LVT_TIMER_ONESHOT &&
> +		    apic_lvtt_period(apic) &&
> +		    !hrtimer_active(&apic->lapic_timer.timer))
> +			start_apic_timer(apic);

The manual says "A write to the LVT Timer Register that changes the
timer mode disarms the local APIC timer", but we already know this is
not true (commit dedf9c5e216902c6d34b5a0d0c40f4acbb3706d8).

Still, this needs some more explanation.  Can you cover this, as well as
the oneshot->periodic transition, in kvm-unit-tests' x86/apic.c
testcase?  Then we could try running it on bare metal and see what happens.

Thanks,

Paolo

>  		break;
> -
> +	}
>  	case APIC_TMICT:
>  		if (apic_lvtt_tscdeadline(apic))
>  			break;
>

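The sequence described in the commit message can be replayed against a toy model. The sketch below is not KVM code: `struct toy_timer`, `toy_write_tmict()`, `toy_write_lvtt()`, `toy_tick()` and `toy_replay()` are all hypothetical names, and the model only assumes what the patch description states, namely that the timer is re-armed on TMICT writes and that the proposed change restarts it on a one-shot to periodic switch when it is idle.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model only; none of these names exist in KVM. */
enum toy_mode { TOY_ONESHOT, TOY_PERIODIC };

struct toy_timer {
	enum toy_mode mode;
	uint32_t tmict;            /* initial count */
	uint32_t tmcct;            /* current count */
	bool armed;                /* stands in for hrtimer_active() */
	bool restart_on_periodic;  /* true models the proposed patch */
};

/* A TMICT write always (re)arms the timer when the count is non-zero. */
static void toy_write_tmict(struct toy_timer *t, uint32_t val)
{
	t->tmict = val;
	t->tmcct = val;
	t->armed = (val != 0);
}

/* An LVTT write changes the mode; only the patched model may restart. */
static void toy_write_lvtt(struct toy_timer *t, enum toy_mode mode)
{
	enum toy_mode old = t->mode;

	t->mode = mode;
	if (t->restart_on_periodic && old == TOY_ONESHOT &&
	    mode == TOY_PERIODIC && !t->armed && t->tmict != 0) {
		t->tmcct = t->tmict;
		t->armed = true;
	}
}

/* Advance one tick; returns true when a timer interrupt would fire. */
static bool toy_tick(struct toy_timer *t)
{
	if (!t->armed)
		return false;
	if (--t->tmcct != 0)
		return false;
	if (t->mode == TOY_PERIODIC)
		t->tmcct = t->tmict;   /* reload and keep running */
	else
		t->armed = false;      /* one-shot: stop after expiry */
	return true;
}

/*
 * Replay the reported sequence: one-shot expires, then the guest writes
 * LVTT (but not TMICT) to select periodic mode. Returns the number of
 * interrupts seen in the 12 ticks after the mode switch.
 */
static unsigned int toy_replay(bool restart_on_periodic)
{
	struct toy_timer t = {
		.mode = TOY_ONESHOT,
		.restart_on_periodic = restart_on_periodic,
	};
	unsigned int fired = 0;
	int i;

	toy_write_tmict(&t, 4);
	while (!toy_tick(&t))
		;                          /* wait for the one-shot expiry */
	toy_write_lvtt(&t, TOY_PERIODIC);
	for (i = 0; i < 12; i++)
		fired += toy_tick(&t);
	return fired;
}
```

In this model `toy_replay(false)` sees no further interrupts, which matches the stall described above, while `toy_replay(true)` keeps firing every four ticks. Whether real hardware behaves like either variant is exactly the open question in this thread.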
On Tue, Aug 20, 2019 at 01:42:37AM +0200, Paolo Bonzini wrote:
> On 20/08/19 01:04, Matt delco wrote:
> > [...]
> > +		if (timer_mode == APIC_LVT_TIMER_ONESHOT &&
> > +		    apic_lvtt_period(apic) &&
> > +		    !hrtimer_active(&apic->lapic_timer.timer))
> > +			start_apic_timer(apic);
>
> The manual says "A write to the LVT Timer Register that changes the
> timer mode disarms the local APIC timer", but we already know this is
> not true (commit dedf9c5e216902c6d34b5a0d0c40f4acbb3706d8).

That was a confirmed SDM bug that has been fixed as of the May 2019 version
of the SDM.

> Still, this needs some more explanation.  Can you cover this, as well as
> the oneshot->periodic transition, in kvm-unit-tests' x86/apic.c
> testcase?  Then we could try running it on bare metal and see what happens.

Only transitions to/from deadline should disable the timer, i.e. this blurb
from the SDM was found to be correct:

  Transitioning between TSC-deadline mode and other timer modes also
  disarms the timer.

But yeah, tests are in order, at least for oneshot->periodic and vice versa.
I can't find any internal code that tests whether transitioning between
oneshot and periodic actually rearms the timer or if it simply doesn't
disable it, and the SDM doesn't clarify what constitutes "reprogrammed".

If possible, we should also test what happens if APIC_TMCCT != 0, though
that might be tricky and/or fragile.  If the timer is rearmed on a
transition between oneshot and periodic, then I would expect it to happen
for both APIC_TMCCT==0 and APIC_TMCCT!=0.

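The disarm rule stated above (only transitions into or out of TSC-deadline mode disarm the timer) can be written as a small predicate. This is a minimal sketch of the rule as described in this message, not code from KVM or the SDM; `enum timer_mode` and `mode_change_disarms()` are hypothetical names.

```c
#include <stdbool.h>

/* Hypothetical names for the three LVT timer modes. */
enum timer_mode { MODE_ONESHOT, MODE_PERIODIC, MODE_TSCDEADLINE };

/*
 * Returns true when an LVTT write that moves the timer from old_mode to
 * new_mode should disarm it, per the rule quoted above: only transitions
 * between TSC-deadline mode and the other modes disarm the timer.
 */
static bool mode_change_disarms(enum timer_mode old_mode,
				enum timer_mode new_mode)
{
	if (old_mode == new_mode)
		return false;
	return old_mode == MODE_TSCDEADLINE || new_mode == MODE_TSCDEADLINE;
}
```

Note that the predicate says nothing about whether a oneshot<->periodic switch re-arms an expired timer; it only covers disarming, which is the part the SDM text addresses.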
+Cc Nadav

On Mon, Aug 19, 2019 at 06:07:01PM -0700, Matt Delco wrote:
> On Mon, Aug 19, 2019 at 5:37 PM Sean Christopherson <sean.j.christopherson@intel.com> wrote:
> > On Tue, Aug 20, 2019 at 01:42:37AM +0200, Paolo Bonzini wrote:
> > > [...]
> > > Still, this needs some more explanation.  Can you cover this, as well as
> > > the oneshot->periodic transition, in kvm-unit-tests' x86/apic.c
> > > testcase?  Then we could try running it on bare metal and see what
> > > happens.
>
> I looked at apic.c and test_apic_change_mode() might already be testing
> this.  It sets oneshot & TMICT, waits for the current value to get
> half-way, changes the mode to periodic, and then tries to test that the
> value wraps back to the upper half.  It then waits again for the half-way
> point, changes the mode back to oneshot, and waits for zero.  After
> reaching zero it does:
>
>     /* now tmcct == 0 and tmict != 0 */
>     apic_change_mode(APIC_LVT_TIMER_PERIODIC);
>     report("TMCCT should stay at zero", !apic_read(APIC_TMCCT));
>
> which seems to be testing that oneshot->periodic won't reset the timer if
> it's already zero.  A possible caveat is there's hardly any delay between
> the mode change and the timer read.  Emulated hardware will react
> instantaneously (at least as seen from within the VM), but hardware might
> need more time to react (though offhand I'd expect HW to be fast enough for
> this particular timer).
>
> So, it looks like the code might already be ready to run on physical
> hardware, and if it has (or does already as part of a regular test), then
> that does raise some doubt on what's the appropriate code change to make
> this work.

Nadav has been running tests on bare metal, maybe he can weigh in on
whether or not test_apic_change_mode() passes on bare metal.

> On Aug 19, 2019, at 6:56 PM, Sean Christopherson <sean.j.christopherson@intel.com> wrote:
>
> [...]
>
> Nadav has been running tests on bare metal, maybe he can weigh in on
> whether or not test_apic_change_mode() passes on bare metal.

These tests pass on bare-metal.

On Tue, 20 Aug 2019 at 12:10, Nadav Amit <nadav.amit@gmail.com> wrote:
> > On Aug 19, 2019, at 6:56 PM, Sean Christopherson <sean.j.christopherson@intel.com> wrote:
> > [...]
> > Nadav has been running tests on bare metal, maybe he can weigh in on
> > whether or not test_apic_change_mode() passes on bare metal.
>
> These tests pass on bare-metal.

Good to know.  In addition, in the Linux APIC driver, __setup_APIC_LVTT()
always writes lapic_timer_period (the number of clock cycles per jiffy)
divided by APIC_DIVISOR to APIC_TMICT during a mode switch, which avoids
the issue Matt reported.  So is it because Windows does nothing like this,
or because the Windows version Matt is testing is too old?

Regards,
Wanpeng Li

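The guest-side pattern described above, where every mode switch is followed by a TMICT write, can be sketched as follows. This is not the Linux `__setup_APIC_LVTT()` implementation: `struct write_log`, `log_apic_write()` and `guest_switch_mode()` are hypothetical helpers that record register writes instead of touching hardware. The register offsets and the periodic-mode bit are the architectural values.

```c
#include <stdint.h>

#define APIC_LVTT               0x320       /* LVT Timer register offset */
#define APIC_TMICT              0x380       /* Timer Initial Count offset */
#define APIC_LVT_TIMER_PERIODIC (1u << 17)  /* timer mode: periodic */

#define MAX_WRITES 8

/* Hypothetical recorder standing in for real APIC register writes. */
struct write_log {
	uint32_t reg[MAX_WRITES];
	uint32_t val[MAX_WRITES];
	unsigned int count;
};

static void log_apic_write(struct write_log *log, uint32_t reg, uint32_t val)
{
	if (log->count < MAX_WRITES) {
		log->reg[log->count] = reg;
		log->val[log->count] = val;
		log->count++;
	}
}

/*
 * Switch the timer mode and always reprogram the initial count afterwards,
 * so the timer is re-armed regardless of how the (virtual) APIC treats a
 * bare LVTT write.
 */
static void guest_switch_mode(struct write_log *log, int periodic,
			      uint32_t vector, uint32_t initial_count)
{
	uint32_t lvtt = vector;

	if (periodic)
		lvtt |= APIC_LVT_TIMER_PERIODIC;

	log_apic_write(log, APIC_LVTT, lvtt);
	log_apic_write(log, APIC_TMICT, initial_count);
}
```

A guest that programs the timer this way never depends on an LVTT-only write restarting an expired counter, which is why it would not hit the stall reported for the Windows guest.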
On Mon, Aug 19, 2019 at 10:09 PM Wanpeng Li <kernellwp@gmail.com> wrote:
> [...]
> Good to know this. In addition, in linux apic driver, during mode
> switch __setup_APIC_LVTT() always sets lapic_timer_period(number of
> clock cycles per jiffy)/APIC_DIVISOR to APIC_TMICT which can avoid the
> issue Matt report. So is it because there is no such stuff in windows
> or the windows version which Matt testing is too old?

I'm using Windows 10 (May 2019).  Multimedia apps on Windows tend to
request higher frequency clocks, and this in turn can affect how the
kernel configures HW timers.  I may need to examine how Windows typically
interacts with the APIC timer and see if/how this changes when Skype is
used.  The frequent timer mode changes are not something I'd expect a
reasonably behaved kernel to do.

> On Aug 19, 2019, at 10:08 PM, Wanpeng Li <kernellwp@gmail.com> wrote:
>
> On Tue, 20 Aug 2019 at 12:10, Nadav Amit <nadav.amit@gmail.com> wrote:
>> [...]
>> These tests pass on bare-metal.
>
> Good to know this. In addition, in linux apic driver, during mode
> switch __setup_APIC_LVTT() always sets lapic_timer_period(number of
> clock cycles per jiffy)/APIC_DIVISOR to APIC_TMICT which can avoid the
> issue Matt report. So is it because there is no such stuff in windows
> or the windows version which Matt testing is too old?

I find it kind of disappointing that you (and others) did not try the
kvm-unit-tests on bare-metal. :(

It should be working, once Paolo (ahem..) applies the one pending patch.
You do need a serial console though (which is usually available through
iLO/iDRAC/etc).  It should also work with UEFI/kexec, although I did not
run such tests.

On Wed, 21 Aug 2019 at 00:33, Nadav Amit <nadav.amit@gmail.com> wrote:
> [...]
> I find it kind of disappointing that you (and others) did not try the
> kvm-unit-tests of bare-metal. :(

Originally the Xen guys confirmed the testcase on bare metal; thanks for
double-checking.

Regards,
Wanpeng Li

> On Aug 20, 2019, at 5:19 PM, Wanpeng Li <kernellwp@gmail.com> wrote:
>
> On Wed, 21 Aug 2019 at 00:33, Nadav Amit <nadav.amit@gmail.com> wrote:
>>> On Aug 19, 2019, at 10:08 PM, Wanpeng Li <kernellwp@gmail.com> wrote:
>>>
>>> On Tue, 20 Aug 2019 at 12:10, Nadav Amit <nadav.amit@gmail.com> wrote:
>>>>> On Aug 19, 2019, at 6:56 PM, Sean Christopherson <sean.j.christopherson@intel.com> wrote:
>>>>>
>>>>> +Cc Nadav
>>>>>
>>>>> On Mon, Aug 19, 2019 at 06:07:01PM -0700, Matt Delco wrote:
>>>>>> On Mon, Aug 19, 2019 at 5:37 PM Sean Christopherson <
>>>>>> sean.j.christopherson@intel.com> wrote:
>>>>>>
>>>>>>> On Tue, Aug 20, 2019 at 01:42:37AM +0200, Paolo Bonzini wrote:
>>>>>>>> On 20/08/19 01:04, Matt delco wrote:
>>>>>>>>> From: Matt Delco <delco@google.com>
>>>>>>>>>
>>>>>>>>> Time seems to eventually stop in a Windows VM when using Skype.
>>>>>>>>> Instrumentation shows that the OS is frequently switching the APIC
>>>>>>>>> timer between one-shot and periodic mode. The OS is typically writing
>>>>>>>>> to both LVTT and TMICT. When time stops the sequence observed is that
>>>>>>>>> the APIC was in one-shot mode, the timer expired, and the OS writes to
>>>>>>>>> LVTT (but not TMICT) to change to periodic mode. No future timer
>>>>>>> events
>>>>>>>>> are received by the OS since the timer is only re-armed on TMICT
>>>>>>> writes.
>>>>>>>>> With this change time continues to advance in the VM. TBD if physical
>>>>>>>>> hardware will reset the current count if/when the mode is changed to
>>>>>>>>> period and the current count is zero.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Matt Delco <delco@google.com>
>>>>>>>>> ---
>>>>>>>>> arch/x86/kvm/lapic.c | 9 +++++++--
>>>>>>>>> 1 file changed, 7 insertions(+), 2 deletions(-)
>>>>>>>>>
>>>>>>>>> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
>>>>>>>>> index 685d17c11461..fddd810eeca5 100644
>>>>>>>>> --- a/arch/x86/kvm/lapic.c
>>>>>>>>> +++ b/arch/x86/kvm/lapic.c
>>>>>>>>> @@ -1935,14 +1935,19 @@ int kvm_lapic_reg_write(struct kvm_lapic
>>>>>>> *apic, u32 reg, u32 val)
>>>>>>>>> break;
>>>>>>>>>
>>>>>>>>> - case APIC_LVTT:
>>>>>>>>> + case APIC_LVTT: {
>>>>>>>>> + u32 timer_mode = apic->lapic_timer.timer_mode;
>>>>>>>>> if (!kvm_apic_sw_enabled(apic))
>>>>>>>>> val |= APIC_LVT_MASKED;
>>>>>>>>> val &= (apic_lvt_mask[0] |
>>>>>>> apic->lapic_timer.timer_mode_mask);
>>>>>>>>> kvm_lapic_set_reg(apic, APIC_LVTT, val);
>>>>>>>>> apic_update_lvtt(apic);
>>>>>>>>> + if (timer_mode == APIC_LVT_TIMER_ONESHOT &&
>>>>>>>>> + apic_lvtt_period(apic) &&
>>>>>>>>> + !hrtimer_active(&apic->lapic_timer.timer))
>>>>>>>>> + start_apic_timer(apic);
>>>>>>>>
>>>>>>>> Still, this needs some more explanation. Can you cover this, as well as
>>>>>>>> the oneshot->periodic transition, in kvm-unit-tests' x86/apic.c
>>>>>>>> testcase? Then we could try running it on bare metal and see what
>>>>>>> happens.
>>>>>>
>>>>>> I looked at apic.c and test_apic_change_mode() might already be testing
>>>>>> this. It sets oneshot & TMICT, waits for the current value to get
>>>>>> half-way, changes the mode to periodic, and then tries to test that the
>>>>>> value wraps back to the upper half. It then waits again for the half-way
>>>>>> point, changes the mode back to oneshot, and waits for zero. After
>>>>>> reaching zero it does:
>>>>>>
>>>>>> /* now tmcct == 0 and tmict != 0 */
>>>>>> apic_change_mode(APIC_LVT_TIMER_PERIODIC);
>>>>>> report("TMCCT should stay at zero", !apic_read(APIC_TMCCT));
>>>>>>
>>>>>> which seems to be testing that oneshot->periodic won't reset the timer if
>>>>>> it's already zero. A possible caveat is there's hardly any delay between
>>>>>> the mode change and the timer read. Emulated hardware will react
>>>>>> instantaneously (at least as seen from within the VM), but hardware might
>>>>>> need more time to react (though offhand I'd expect HW to be fast enough for
>>>>>> this particular timer).
>>>>>>
>>>>>> So, it looks like the code might already be ready to run on physical
>>>>>> hardware, and if it has (or does already as part of a regular test), then
>>>>>> that does raise some doubt on what's the appropriate code change to make
>>>>>> this work.
>>>>>
>>>>> Nadav has been running tests on bare metal, maybe he can weigh in on
>>>>> whether or not test_apic_change_mode() passes on bare metal.
>>>>
>>>> These tests pass on bare-metal.
>>>
>>> Good to know this. In addition, in linux apic driver, during mode
>>> switch __setup_APIC_LVTT() always sets lapic_timer_period(number of
>>> clock cycles per jiffy)/APIC_DIVISOR to APIC_TMICT which can avoid the
>>> issue Matt report. So is it because there is no such stuff in windows
>>> or the windows version which Matt testing is too old?
>>
>> I find it kind of disappointing that you (and others) did not try the
>> kvm-unit-tests of bare-metal. :(
>
> Origianlly xen guys confirm the testcase on bare-metal, thanks for
> your double confirm.

No worries, I don’t look for a “thank you” note. ;-)
On Tue, Aug 20, 2019 at 12:34:20AM -0700, Matt Delco wrote:
> On Mon, Aug 19, 2019 at 10:09 PM Wanpeng Li <kernellwp@gmail.com> wrote:
> >
> > On Tue, 20 Aug 2019 at 12:10, Nadav Amit <nadav.amit@gmail.com> wrote:
> > > These tests pass on bare-metal.
> >
> > Good to know this. In addition, in linux apic driver, during mode
> > switch __setup_APIC_LVTT() always sets lapic_timer_period(number of
> > clock cycles per jiffy)/APIC_DIVISOR to APIC_TMICT which can avoid the
> > issue Matt report. So is it because there is no such stuff in windows
> > or the windows version which Matt testing is too old?
>
> I'm using Windows 10 (May 2019). Multimedia apps on Windows tend to
> request higher frequency clocks, and this in turn can affect how the
> kernel configures HW timers. I may need to examine how Windows
> typically interacts with the APIC timer and see if/how this changes
> when Skype is used. The frequent timer mode changes are not something
> I'd expect a reasonably behaved kernel to do.

Have you tried analyzing the guest code? If we're lucky, doing so might
provide insight into what's going awry.

E.g.:

Are the LVTT/TMICT writes coming from a single blob/sequence of code
in the guest?

Is the unpaired LVTT coming from the same code sequence or is it a new
rip entirely?

Can you dump the relevant asm code sequences?
On Wed, Aug 21, 2019 at 10:17 AM Sean Christopherson <sean.j.christopherson@intel.com> wrote:
> On Tue, Aug 20, 2019 at 12:34:20AM -0700, Matt Delco wrote:
> > On Mon, Aug 19, 2019 at 10:09 PM Wanpeng Li <kernellwp@gmail.com> wrote:
> > >
> > > On Tue, 20 Aug 2019 at 12:10, Nadav Amit <nadav.amit@gmail.com> wrote:
> > > > These tests pass on bare-metal.
> > >
> > > Good to know this. In addition, in linux apic driver, during mode
> > > switch __setup_APIC_LVTT() always sets lapic_timer_period(number of
> > > clock cycles per jiffy)/APIC_DIVISOR to APIC_TMICT which can avoid the
> > > issue Matt report. So is it because there is no such stuff in windows
> > > or the windows version which Matt testing is too old?
> >
> > I'm using Windows 10 (May 2019). Multimedia apps on Windows tend to
> > request higher frequency clocks, and this in turn can affect how the
> > kernel configures HW timers. I may need to examine how Windows
> > typically interacts with the APIC timer and see if/how this changes
> > when Skype is used. The frequent timer mode changes are not something
> > I'd expect a reasonably behaved kernel to do.
>
> Have you tried analyzing the guest code? If we're lucky, doing so might
> provide insight into what's going awry.
>
> E.g.:
>
> Are the LVTT/TMICT writes are coming from a single blob/sequence of code
> in the guest?
>
> Is the unpaired LVTT coming from the same code sequence or is it a new
> rip entirely?
>
> Can you dump the relevant asm code sequences?

I have changed gears to do runtime behavioral analysis, given the
reports that the code change I proposed would deviate from hardware.

The time between writes for TMICT-then-LVTT is typically quite small,
and much smaller than the average for LVTT-then-TMICT. In the lead-up
to where time stops there are alternating writes to TMICT and LVTT,
where each write to LVTT alternates between setting periodic vs.
one-shot. The final write to LVTT (which sets periodic) comes more than
1.5 ms after the prior TMICT write (about 100x the typical delay),
which might mean the kernel opted not to write to TMICT at that point
but did so on the next clock tick.

The host kernel & kvm I've been testing with seem to be firing the
timer callbacks sooner than requested, so if the guest kernel has
optimizations based on whether it thinks there's time left on the APIC
timer then this might be causing problems. I'm going to try to pull in
some of the newer kvm changes that appear to compensate for the early
delivery and see if that also makes the time hang symptom disappear
(if not then I may start to examine things from the guest side).

Thanks.
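The kind of trace analysis described in this message could be sketched roughly as follows. This is an illustration only: the `toy_write` record, the `toy_reg` enum and the helper name are hypothetical, and the actual instrumentation format used in the thread is not shown anywhere in it. The helper scans a time-ordered log of register writes and reports the first LVTT write that trails its preceding TMICT write by more than a chosen multiple of the typical gap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical trace record; not the format of any real KVM tracepoint. */
enum toy_reg { TOY_REG_TMICT, TOY_REG_LVTT };

struct toy_write {
	uint64_t ns;        /* timestamp of the register write */
	enum toy_reg reg;   /* which register was written */
};

/*
 * Return the index of the first LVTT write that directly follows a TMICT
 * write and whose gap exceeds typical_ns * factor, or n if there is none.
 * The log is assumed to be sorted by timestamp.
 */
static size_t toy_first_late_lvtt(const struct toy_write *w, size_t n,
				  uint64_t typical_ns, uint64_t factor)
{
	for (size_t i = 1; i < n; i++) {
		if (w[i].reg != TOY_REG_LVTT || w[i - 1].reg != TOY_REG_TMICT)
			continue;
		assert(w[i].ns >= w[i - 1].ns);   /* log must be time-ordered */
		if (w[i].ns - w[i - 1].ns > typical_ns * factor)
			return i;
	}
	return n;
}
```

The threshold is passed in as a typical gap and a factor rather than hard-coded, since the "about 100x" figure above is an observation from one setup and not a constant.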
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 685d17c11461..fddd810eeca5 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1935,14 +1935,19 @@ int kvm_lapic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
 
 		break;
 
-	case APIC_LVTT:
+	case APIC_LVTT: {
+		u32 timer_mode = apic->lapic_timer.timer_mode;
 		if (!kvm_apic_sw_enabled(apic))
 			val |= APIC_LVT_MASKED;
 		val &= (apic_lvt_mask[0] | apic->lapic_timer.timer_mode_mask);
 		kvm_lapic_set_reg(apic, APIC_LVTT, val);
 		apic_update_lvtt(apic);
+		if (timer_mode == APIC_LVT_TIMER_ONESHOT &&
+		    apic_lvtt_period(apic) &&
+		    !hrtimer_active(&apic->lapic_timer.timer))
+			start_apic_timer(apic);
 		break;
-
+	}
 	case APIC_TMICT:
 		if (apic_lvtt_tscdeadline(apic))
 			break;