KVM: lapic: restart counter on change to periodic mode

Message ID 20190819230422.244888-1-delco@google.com (mailing list archive)
State New, archived

Commit Message

Matt Delco Aug. 19, 2019, 11:04 p.m. UTC
From: Matt Delco <delco@google.com>

Time seems to eventually stop in a Windows VM when using Skype.
Instrumentation shows that the OS is frequently switching the APIC
timer between one-shot and periodic mode.  The OS is typically writing
to both LVTT and TMICT.  When time stops the sequence observed is that
the APIC was in one-shot mode, the timer expired, and the OS writes to
LVTT (but not TMICT) to change to periodic mode.  No future timer events
are received by the OS since the timer is only re-armed on TMICT writes.

With this change, time continues to advance in the VM.  TBD if physical
hardware will reset the current count if/when the mode is changed to
periodic and the current count is zero.

Signed-off-by: Matt Delco <delco@google.com>
---
 arch/x86/kvm/lapic.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)
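
A minimal sketch of the sequence described above, written as a toy
userspace model in C. None of this is KVM code: the toy_* names, the
tick-based countdown, and the rearm_on_mode_change switch are assumptions
made for illustration only. With the switch off, an LVTT-only change to
periodic mode after a one-shot expiry leaves the timer idle. With it on,
the model restarts the count from the stored TMICT value, roughly what
the re-arm condition in this patch is meant to do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: hypothetical names, not taken from arch/x86/kvm/lapic.c. */
enum toy_mode { TOY_ONESHOT, TOY_PERIODIC };

struct toy_timer {
	enum toy_mode mode;
	uint32_t tmict;			/* initial count */
	uint32_t tmcct;			/* current count */
	bool armed;			/* stands in for hrtimer_active() */
	bool rearm_on_mode_change;	/* models the behaviour proposed here */
	unsigned int irqs;		/* timer interrupts delivered so far */
};

/* In this model, only a TMICT write (re)arms the timer. */
static void toy_write_tmict(struct toy_timer *t, uint32_t count)
{
	t->tmict = count;
	t->tmcct = count;
	t->armed = count != 0;
}

/* An LVTT write changes the mode and optionally restarts an idle timer. */
static void toy_write_lvtt(struct toy_timer *t, enum toy_mode new_mode)
{
	enum toy_mode old_mode = t->mode;

	t->mode = new_mode;
	if (t->rearm_on_mode_change && old_mode == TOY_ONESHOT &&
	    new_mode == TOY_PERIODIC && !t->armed)
		toy_write_tmict(t, t->tmict);
}

/* Advance the model by one count. */
static void toy_tick(struct toy_timer *t)
{
	if (!t->armed)
		return;
	assert(t->tmcct > 0);
	if (--t->tmcct != 0)
		return;
	t->irqs++;
	if (t->mode == TOY_PERIODIC)
		t->tmcct = t->tmict;	/* periodic: reload and keep going */
	else
		t->armed = false;	/* one-shot: stay at zero */
}

/*
 * Replay the reported sequence: one-shot, expiry, then an LVTT-only switch
 * to periodic. Returns the number of interrupts seen after the switch.
 */
static unsigned int toy_replay(bool rearm_on_mode_change)
{
	struct toy_timer t = {
		.mode = TOY_ONESHOT,
		.rearm_on_mode_change = rearm_on_mode_change,
	};
	unsigned int before;
	int i;

	toy_write_tmict(&t, 4);
	for (i = 0; i < 4; i++)
		toy_tick(&t);		/* the one-shot timer expires here */
	before = t.irqs;

	toy_write_lvtt(&t, TOY_PERIODIC);	/* LVTT only, no TMICT write */
	for (i = 0; i < 12; i++)
		toy_tick(&t);

	return t.irqs - before;
}
```

Calling toy_replay(false) models the stall (no further interrupts), while
toy_replay(true) keeps delivering them. Whether physical hardware behaves
like either variant is exactly the open question noted above.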

Comments

Paolo Bonzini Aug. 19, 2019, 11:42 p.m. UTC | #1
On 20/08/19 01:04, Matt delco wrote:
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 685d17c11461..fddd810eeca5 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -1935,14 +1935,19 @@ int kvm_lapic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
>  
>  		break;
>  
> -	case APIC_LVTT:
> +	case APIC_LVTT: {
> +		u32 timer_mode = apic->lapic_timer.timer_mode;
>  		if (!kvm_apic_sw_enabled(apic))
>  			val |= APIC_LVT_MASKED;
>  		val &= (apic_lvt_mask[0] | apic->lapic_timer.timer_mode_mask);
>  		kvm_lapic_set_reg(apic, APIC_LVTT, val);
>  		apic_update_lvtt(apic);
> +		if (timer_mode == APIC_LVT_TIMER_ONESHOT &&
> +		    apic_lvtt_period(apic) &&
> +		    !hrtimer_active(&apic->lapic_timer.timer))
> +			start_apic_timer(apic);

The manual says "A write to the LVT Timer Register that changes the
timer mode disarms the local APIC timer", but we already know this is
not true (commit dedf9c5e216902c6d34b5a0d0c40f4acbb3706d8).

Still, this needs some more explanation.  Can you cover this, as well as
the oneshot->periodic transition, in kvm-unit-tests' x86/apic.c
testcase?  Then we could try running it on bare metal and see what happens.

Thanks,

Paolo


>  		break;
> -
> +	}
>  	case APIC_TMICT:
>  		if (apic_lvtt_tscdeadline(apic))
>  			break;
>
Sean Christopherson Aug. 20, 2019, 12:37 a.m. UTC | #2
On Tue, Aug 20, 2019 at 01:42:37AM +0200, Paolo Bonzini wrote:
> The manual says "A write to the LVT Timer Register that changes the
> timer mode disarms the local APIC timer", but we already know this is
> not true (commit dedf9c5e216902c6d34b5a0d0c40f4acbb3706d8).

That was a confirmed SDM bug that has been fixed as of the May 2019
version of the SDM.

> 
> Still, this needs some more explanation.  Can you cover this, as well as
> the oneshot->periodic transition, in kvm-unit-tests' x86/apic.c
> testcase?  Then we could try running it on bare metal and see what happens.

Only transitions to/from deadline should disable the timer, i.e. this
blurb from the SDM was found to be correct.

  Transitioning between TSC-deadline mode and other timer modes also
  disarms the timer.

But yeah, tests are in order, at least for oneshot->periodic and vice
versa.  I can't find any internal code that tests whether transitioning
between oneshot and periodic actually rearms the timer or if it simply
doesn't disable it, and the SDM doesn't clarify what constitutes
"reprogrammed".

If possible, we should also test what happens if APIC_TMCCT != 0, though
that might be tricky and/or fragile.  If the timer is rearmed on a
transition between oneshot and periodic, then I would expect it to happen
for both APIC_TMCCT==0 and APIC_TMCCT!=0.

Sean Christopherson Aug. 20, 2019, 1:56 a.m. UTC | #3
+Cc Nadav

On Mon, Aug 19, 2019 at 06:07:01PM -0700, Matt Delco wrote:
> I looked at apic.c and test_apic_change_mode() might already be testing
> this.  It sets oneshot & TMICT, waits for the current value to get
> half-way, changes the mode to periodic, and then tries to test that the
> value wraps back to the upper half.  It then waits again for the half-way
> point, changes the mode back to oneshot, and waits for zero.  After
> reaching zero it does:
> 
> /* now tmcct == 0 and tmict != 0 */
> apic_change_mode(APIC_LVT_TIMER_PERIODIC);
> report("TMCCT should stay at zero", !apic_read(APIC_TMCCT));
> 
> which seems to be testing that oneshot->periodic won't reset the timer if
> it's already zero.  A possible caveat is there's hardly any delay between
> the mode change and the timer read.  Emulated hardware will react
> instantaneously (at least as seen from within the VM), but hardware might
> need more time to react (though offhand I'd expect HW to be fast enough for
> this particular timer).
> 
> So, it looks like the code might already be ready to run on physical
> hardware, and if it has (or does already as part of a regular test), then
> that does raise some doubt on what's the appropriate code change to make
> this work.

Nadav has been running tests on bare metal, maybe he can weigh in on
whether or not test_apic_change_mode() passes on bare metal.
Nadav Amit Aug. 20, 2019, 4:08 a.m. UTC | #4
> On Aug 19, 2019, at 6:56 PM, Sean Christopherson <sean.j.christopherson@intel.com> wrote:
> 
> Nadav has been running tests on bare metal, maybe he can weigh in on
> whether or not test_apic_change_mode() passes on bare metal.

These tests pass on bare-metal.
Wanpeng Li Aug. 20, 2019, 5:08 a.m. UTC | #5
On Tue, 20 Aug 2019 at 12:10, Nadav Amit <nadav.amit@gmail.com> wrote:
>
> These tests pass on bare-metal.

Good to know. In addition, in the Linux APIC driver, during a mode
switch __setup_APIC_LVTT() always writes lapic_timer_period (the number
of clock cycles per jiffy) / APIC_DIVISOR to APIC_TMICT, which avoids
the issue Matt reported. So is it because Windows has nothing like
this, or because the Windows version Matt is testing is too old?

Regards,
Wanpeng Li
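
The ordering described in this message can be sketched as a small
userspace model in C. This is not the kernel's code: the toy_* names,
the write log, and the divisor value of 16 are assumptions for
illustration. The point is only that a guest which follows every LVTT
write with a TMICT write re-arms the timer itself, so it never depends
on what a mode change alone does.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins; values are assumptions, not kernel definitions. */
#define TOY_APIC_DIVISOR	16
#define TOY_LVT_TIMER_PERIODIC	(1u << 17)

enum toy_reg { TOY_LVTT, TOY_TMICT };

struct toy_write {
	enum toy_reg reg;
	uint32_t val;
};

/* Records register writes in order so the sequence can be inspected. */
struct toy_log {
	struct toy_write w[4];
	unsigned int n;
};

static void toy_apic_write(struct toy_log *log, enum toy_reg reg, uint32_t val)
{
	assert(log->n < 4);
	log->w[log->n].reg = reg;
	log->w[log->n].val = val;
	log->n++;
}

/*
 * Mode switch as described above: program LVTT, then always write the
 * initial count, which restarts the countdown regardless of prior state.
 */
static void toy_setup_lvtt(struct toy_log *log, uint32_t lvtt,
			   uint32_t timer_period)
{
	toy_apic_write(log, TOY_LVTT, lvtt);
	toy_apic_write(log, TOY_TMICT, timer_period / TOY_APIC_DIVISOR);
}
```

For example, toy_setup_lvtt(&log, TOY_LVT_TIMER_PERIODIC, 1600) logs an
LVTT write followed by a TMICT write of 100. The sequence Matt observed
in the Windows guest is the same thing without the second write.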
Matt Delco Aug. 20, 2019, 7:34 a.m. UTC | #6
On Mon, Aug 19, 2019 at 10:09 PM Wanpeng Li <kernellwp@gmail.com> wrote:
>
> On Tue, 20 Aug 2019 at 12:10, Nadav Amit <nadav.amit@gmail.com> wrote:
> >
> > These tests pass on bare-metal.
>
> Good to know. In addition, in the Linux APIC driver, during a mode
> switch __setup_APIC_LVTT() always writes lapic_timer_period (the number
> of clock cycles per jiffy) / APIC_DIVISOR to APIC_TMICT, which avoids
> the issue Matt reported. So is it because Windows has nothing like
> this, or because the Windows version Matt is testing is too old?

I'm using Windows 10 (May 2019). Multimedia apps on Windows tend to
request higher frequency clocks, and this in turn can affect how the
kernel configures HW timers.  I may need to examine how Windows
typically interacts with the APIC timer and see if/how this changes
when Skype is used.  The frequent timer mode changes are not something
I'd expect a reasonably behaved kernel to do.
Nadav Amit Aug. 20, 2019, 4:33 p.m. UTC | #7
> On Aug 19, 2019, at 10:08 PM, Wanpeng Li <kernellwp@gmail.com> wrote:
> 
> On Tue, 20 Aug 2019 at 12:10, Nadav Amit <nadav.amit@gmail.com> wrote:
>> These tests pass on bare-metal.
> 
> Good to know. In addition, in the Linux APIC driver, during a mode
> switch __setup_APIC_LVTT() always writes lapic_timer_period (the number
> of clock cycles per jiffy) / APIC_DIVISOR to APIC_TMICT, which avoids
> the issue Matt reported. So is it because Windows has nothing like
> this, or because the Windows version Matt is testing is too old?

I find it kind of disappointing that you (and others) did not try the
kvm-unit-tests on bare-metal. :(

It should be working, once Paolo (ahem..) applies the one pending patch. You
do need a serial console though (which is usually available through
ilo/idrac/etc). It should also work with UEFI/kexec, although I did not run
such tests.
Wanpeng Li Aug. 21, 2019, 12:19 a.m. UTC | #8
On Wed, 21 Aug 2019 at 00:33, Nadav Amit <nadav.amit@gmail.com> wrote:
>
> > On Aug 19, 2019, at 10:08 PM, Wanpeng Li <kernellwp@gmail.com> wrote:
> >
> > On Tue, 20 Aug 2019 at 12:10, Nadav Amit <nadav.amit@gmail.com> wrote:
> >> These tests pass on bare-metal.
> >
> > Good to know this. In addition, in linux apic driver, during mode
> > switch __setup_APIC_LVTT() always sets lapic_timer_period(number of
> > clock cycles per jiffy)/APIC_DIVISOR to APIC_TMICT which can avoid the
> > issue Matt report. So is it because there is no such stuff in windows
> > or the windows version which Matt testing is too old?
>
> I find it kind of disappointing that you (and others) did not try the
> kvm-unit-tests of bare-metal. :(

Originally the Xen guys confirmed the testcase on bare-metal; thanks for
the second confirmation.

Regards,
Wanpeng Li
Nadav Amit Aug. 21, 2019, 12:26 a.m. UTC | #9
> On Aug 20, 2019, at 5:19 PM, Wanpeng Li <kernellwp@gmail.com> wrote:
> 
> On Wed, 21 Aug 2019 at 00:33, Nadav Amit <nadav.amit@gmail.com> wrote:
>>> On Aug 19, 2019, at 10:08 PM, Wanpeng Li <kernellwp@gmail.com> wrote:
>>> 
>>> On Tue, 20 Aug 2019 at 12:10, Nadav Amit <nadav.amit@gmail.com> wrote:
>>>>> On Aug 19, 2019, at 6:56 PM, Sean Christopherson <sean.j.christopherson@intel.com> wrote:
>>>>> 
>>>>> +Cc Nadav
>>>>> 
>>>>> On Mon, Aug 19, 2019 at 06:07:01PM -0700, Matt Delco wrote:
>>>>>> On Mon, Aug 19, 2019 at 5:37 PM Sean Christopherson <
>>>>>> sean.j.christopherson@intel.com> wrote:
>>>>>> 
>>>>>>> On Tue, Aug 20, 2019 at 01:42:37AM +0200, Paolo Bonzini wrote:
>>>>>>>> On 20/08/19 01:04, Matt delco wrote:
>>>>>>>>> From: Matt Delco <delco@google.com>
>>>>>>>>> 
>>>>>>>>> Time seems to eventually stop in a Windows VM when using Skype.
>>>>>>>>> Instrumentation shows that the OS is frequently switching the APIC
>>>>>>>>> timer between one-shot and periodic mode.  The OS is typically writing
>>>>>>>>> to both LVTT and TMICT.  When time stops the sequence observed is that
>>>>>>>>> the APIC was in one-shot mode, the timer expired, and the OS writes to
>>>>>>>>> LVTT (but not TMICT) to change to periodic mode.  No future timer events
>>>>>>>>> are received by the OS since the timer is only re-armed on TMICT writes.
>>>>>>>>>
>>>>>>>>> With this change time continues to advance in the VM.  TBD if physical
>>>>>>>>> hardware will reset the current count if/when the mode is changed to
>>>>>>>>> period and the current count is zero.
>>>>>>>>> 
>>>>>>>>> Signed-off-by: Matt Delco <delco@google.com>
>>>>>>>>> ---
>>>>>>>>> arch/x86/kvm/lapic.c | 9 +++++++--
>>>>>>>>> 1 file changed, 7 insertions(+), 2 deletions(-)
>>>>>>>>> 
>>>>>>>>> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
>>>>>>>>> index 685d17c11461..fddd810eeca5 100644
>>>>>>>>> --- a/arch/x86/kvm/lapic.c
>>>>>>>>> +++ b/arch/x86/kvm/lapic.c
>>>>>>>>> @@ -1935,14 +1935,19 @@ int kvm_lapic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
>>>>>>>>>          break;
>>>>>>>>> 
>>>>>>>>> -   case APIC_LVTT:
>>>>>>>>> +   case APIC_LVTT: {
>>>>>>>>> +           u32 timer_mode = apic->lapic_timer.timer_mode;
>>>>>>>>>          if (!kvm_apic_sw_enabled(apic))
>>>>>>>>>                  val |= APIC_LVT_MASKED;
>>>>>>>>>          val &= (apic_lvt_mask[0] | apic->lapic_timer.timer_mode_mask);
>>>>>>>>>          kvm_lapic_set_reg(apic, APIC_LVTT, val);
>>>>>>>>>          apic_update_lvtt(apic);
>>>>>>>>> +           if (timer_mode == APIC_LVT_TIMER_ONESHOT &&
>>>>>>>>> +               apic_lvtt_period(apic) &&
>>>>>>>>> +               !hrtimer_active(&apic->lapic_timer.timer))
>>>>>>>>> +                   start_apic_timer(apic);
>>>>>>>> 
>>>>>>>> Still, this needs some more explanation.  Can you cover this, as well as
>>>>>>>> the oneshot->periodic transition, in kvm-unit-tests' x86/apic.c
>>>>>>>> testcase?  Then we could try running it on bare metal and see what happens.
>>>>>> 
>>>>>> I looked at apic.c and test_apic_change_mode() might already be testing
>>>>>> this.  It sets oneshot & TMICT, waits for the current value to get
>>>>>> half-way, changes the mode to periodic, and then tries to test that the
>>>>>> value wraps back to the upper half.  It then waits again for the half-way
>>>>>> point, changes the mode back to oneshot, and waits for zero.  After
>>>>>> reaching zero it does:
>>>>>> 
>>>>>> /* now tmcct == 0 and tmict != 0 */
>>>>>> apic_change_mode(APIC_LVT_TIMER_PERIODIC);
>>>>>> report("TMCCT should stay at zero", !apic_read(APIC_TMCCT));
>>>>>> 
>>>>>> which seems to be testing that oneshot->periodic won't reset the timer if
>>>>>> it's already zero.  A possible caveat is there's hardly any delay between
>>>>>> the mode change and the timer read.  Emulated hardware will react
>>>>>> instantaneously (at least as seen from within the VM), but hardware might
>>>>>> need more time to react (though offhand I'd expect HW to be fast enough for
>>>>>> this particular timer).
>>>>>> 
>>>>>> So, it looks like the code might already be ready to run on physical
>>>>>> hardware, and if it has (or does already as part of a regular test), then
>>>>>> that does raise some doubt on what's the appropriate code change to make
>>>>>> this work.
>>>>> 
>>>>> Nadav has been running tests on bare metal, maybe he can weigh in on
>>>>> whether or not test_apic_change_mode() passes on bare metal.
>>>> 
>>>> These tests pass on bare-metal.
>>> 
>>> Good to know this. In addition, in linux apic driver, during mode
>>> switch __setup_APIC_LVTT() always sets lapic_timer_period(number of
>>> clock cycles per jiffy)/APIC_DIVISOR to APIC_TMICT which can avoid the
>>> issue Matt report. So is it because there is no such stuff in windows
>>> or the windows version which Matt testing is too old?
>> 
>> I find it kind of disappointing that you (and others) did not try the
>> kvm-unit-tests of bare-metal. :(
> 
> Originally the Xen guys confirmed the testcase on bare-metal; thanks for
> the second confirmation.

No worries, I'm not looking for a “thank you” note. ;-)
Sean Christopherson Aug. 21, 2019, 5:17 p.m. UTC | #10
On Tue, Aug 20, 2019 at 12:34:20AM -0700, Matt Delco wrote:
> On Mon, Aug 19, 2019 at 10:09 PM Wanpeng Li <kernellwp@gmail.com> wrote:
> >
> > On Tue, 20 Aug 2019 at 12:10, Nadav Amit <nadav.amit@gmail.com> wrote:
> > > These tests pass on bare-metal.
> >
> > Good to know this. In addition, in linux apic driver, during mode
> > switch __setup_APIC_LVTT() always sets lapic_timer_period(number of
> > clock cycles per jiffy)/APIC_DIVISOR to APIC_TMICT which can avoid the
> > issue Matt report. So is it because there is no such stuff in windows
> > or the windows version which Matt testing is too old?
> 
> I'm using Windows 10 (May 2019). Multimedia apps on Windows tend to
> request higher frequency clocks, and this in turn can affect how the
> kernel configures HW timers.  I may need to examine how Windows
> typically interacts with the APIC timer and see if/how this changes
> when Skype is used.  The frequent timer mode changes are not something
> I'd expect a reasonably behaved kernel to do.

Have you tried analyzing the guest code?  If we're lucky, doing so might
provide insight into what's going awry.

E.g.:

  Are the LVTT/TMICT writes coming from a single blob/sequence of code
  in the guest?

  Is the unpaired LVTT coming from the same code sequence or is it a new
  rip entirely?

  Can you dump the relevant asm code sequences?
Matt Delco Aug. 21, 2019, 6:03 p.m. UTC | #11
On Wed, Aug 21, 2019 at 10:17 AM Sean Christopherson
<sean.j.christopherson@intel.com> wrote:
> On Tue, Aug 20, 2019 at 12:34:20AM -0700, Matt Delco wrote:
> > On Mon, Aug 19, 2019 at 10:09 PM Wanpeng Li <kernellwp@gmail.com> wrote:
> > >
> > > On Tue, 20 Aug 2019 at 12:10, Nadav Amit <nadav.amit@gmail.com> wrote:
> > > > These tests pass on bare-metal.
> > >
> > > Good to know this. In addition, in linux apic driver, during mode
> > > switch __setup_APIC_LVTT() always sets lapic_timer_period(number of
> > > clock cycles per jiffy)/APIC_DIVISOR to APIC_TMICT which can avoid the
> > > issue Matt report. So is it because there is no such stuff in windows
> > > or the windows version which Matt testing is too old?
> >
> > I'm using Windows 10 (May 2019). Multimedia apps on Windows tend to
> > request higher frequency clocks, and this in turn can affect how the
> > kernel configures HW timers.  I may need to examine how Windows
> > typically interacts with the APIC timer and see if/how this changes
> > when Skype is used.  The frequent timer mode changes are not something
> > I'd expect a reasonably behaved kernel to do.
>
> Have you tried analyzing the guest code?  If we're lucky, doing so might
> provide insight into what's going awry.
>
> E.g.:
>
>   Are the LVTT/TMICT writes coming from a single blob/sequence of code
>   in the guest?
>
>   Is the unpaired LVTT coming from the same code sequence or is it a new
>   rip entirely?
>
>   Can you dump the relevant asm code sequences?

I have changed gears to do runtime behavioral analysis, given the
reports that the code change I proposed would deviate from hardware.

The time between writes for TMICT-then-LVTT is typically quite small,
and much smaller than the average for LVTT-then-TMICT.  In the lead-up
to where time stops there are alternating writes to TMICT and LVTT,
where each write to LVTT alternates between setting periodic and
one-shot.  The final write to LVTT (which sets periodic) comes more
than 1.5 ms after the prior TMICT write (about 100x the typical
delay), which might mean the kernel opted not to write to TMICT but
did on the next clock tick.

The host kernel and KVM I've been testing with seem to be firing the
timer callbacks sooner than requested, so if the guest kernel has
optimizations based on whether it thinks there's time left on the APIC
timer, then this might be causing problems.  I'm going to try to pull
in some of the newer KVM changes that appear to compensate for the
early delivery and see if that also makes the time hang symptom
disappear (if not, I may start to examine things from the guest
side).  Thanks.
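The gap analysis described above could be automated along the following
lines.  This is only a sketch: struct apic_write, enum apic_reg and
find_late_lvtt() are hypothetical names for a captured trace of guest
register writes, not KVM interfaces, and the "factor" threshold is an
assumption (the message above mentions roughly 100x the typical delay).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical trace record for one intercepted guest write (not a KVM type). */
enum apic_reg { REG_LVTT, REG_TMICT };

struct apic_write {
	uint64_t ns;		/* host timestamp of the write, non-decreasing */
	enum apic_reg reg;	/* register the guest wrote */
};

/*
 * Scan adjacent TMICT-then-LVTT pairs and return the index of the first
 * LVTT write whose gap exceeds `factor` times the mean gap of the earlier
 * pairs, or -1 if no such write exists.
 */
long find_late_lvtt(const struct apic_write *w, size_t n, uint64_t factor)
{
	uint64_t sum = 0, count = 0;

	assert(w != NULL || n == 0);

	for (size_t i = 1; i < n; i++) {
		if (w[i - 1].reg != REG_TMICT || w[i].reg != REG_LVTT)
			continue;

		uint64_t gap = w[i].ns - w[i - 1].ns;

		/* gap > factor * (sum / count), written without division */
		if (count > 0 && gap * count > factor * sum)
			return (long)i;

		sum += gap;
		count++;
	}
	return -1;
}
```

Feeding it a trace in which the last LVTT write trails its TMICT write by
1.5 ms, after a run of 15 ns gaps, would flag that final write.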

Patch

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 685d17c11461..fddd810eeca5 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1935,14 +1935,19 @@  int kvm_lapic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
 
 		break;
 
-	case APIC_LVTT:
+	case APIC_LVTT: {
+		u32 timer_mode = apic->lapic_timer.timer_mode;
 		if (!kvm_apic_sw_enabled(apic))
 			val |= APIC_LVT_MASKED;
 		val &= (apic_lvt_mask[0] | apic->lapic_timer.timer_mode_mask);
 		kvm_lapic_set_reg(apic, APIC_LVTT, val);
 		apic_update_lvtt(apic);
+		if (timer_mode == APIC_LVT_TIMER_ONESHOT &&
+		    apic_lvtt_period(apic) &&
+		    !hrtimer_active(&apic->lapic_timer.timer))
+			start_apic_timer(apic);
 		break;
-
+	}
 	case APIC_TMICT:
 		if (apic_lvtt_tscdeadline(apic))
 			break;