
[v3] x86/apicv: fix RTC periodic timer and apicv issue

Message ID E0A769A898ADB6449596C41F51EF62C6AD751E@SZXEMI506-MBX.china.huawei.com (mailing list archive)
State New, archived

Commit Message

Xuquan (Euler) Dec. 16, 2016, 9:40 a.m. UTC
From 89fffdd6b563b2723e24d17231715bb8c9f24f90 Mon Sep 17 00:00:00 2001
From: Quan Xu <xuquan8@huawei.com>
Date: Fri, 16 Dec 2016 17:24:01 +0800
Subject: [PATCH v3] x86/apicv: fix RTC periodic timer and apicv issue

When Xen apicv is enabled, wall clock time runs fast on a Windows7-32
guest under high load (with 2 vCPUs; xentrace shows that under high
load the count of IPI interrupts between these vCPUs increases
rapidly).

If an IPI interrupt (vector 0xe1) and a periodic timer interrupt (vector
0xd1) are both pending (both bits set in vIRR), the IPI interrupt has
higher priority than the periodic timer interrupt. Xen writes the IPI
interrupt's vector from vIRR to the guest interrupt status (RVI) as the
highest-priority one, and apicv (Virtual-Interrupt Delivery) delivers
the IPI interrupt within VMX non-root operation without a VM-Exit.
Still within VMX non-root operation, once the periodic timer
interrupt's bit is the highest one set in vIRR, apicv delivers the
periodic timer interrupt as well.

But in the current code, if Xen doesn't itself write the periodic timer
interrupt's vector to the guest interrupt status (RVI), Xen is not
aware of this delivery and does not decrease the count
(pending_intr_nr) of pending periodic timer interrupts, so Xen will
deliver a periodic timer interrupt again.

Also, since we update the periodic timer interrupt on every VM-entry,
there is a chance that an already-injected instance (before the
EOI-induced exit happens) will cause another pending IRR bit to be set
if a VM-exit happens between virtual interrupt injection (vIRR->0,
vISR->1) and the EOI-induced exit (vISR->0). Since pt_intr_post hasn't
been invoked yet, the guest then receives extra periodic timer
interrupts.

So we set eoi_exit_bitmap for intack.vector when it's higher than the
pending periodic timer interrupt. This way we can guarantee there's
always a chance to post periodic timer interrupts when the periodic
timer interrupt becomes the highest one.

Signed-off-by: Quan Xu <xuquan8@huawei.com>
---
 xen/arch/x86/hvm/vmx/intr.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)
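
For readers following the priority argument in the commit message, here
is a minimal sketch. It is not Xen code: it models vIRR as a plain
256-bit bitmap, and the type virr_model_t and both helpers are names
invented for this illustration. It only shows that when vectors 0xd1
and 0xe1 are both pending, a highest-vector scan picks 0xe1 first.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Simplified model of a 256-bit virtual IRR as 4 words of 64 bits.
 * This layout is an assumption for illustration, not Xen's vlapic layout.
 */
typedef struct {
    uint64_t word[4];
} virr_model_t;

/* Mark a vector as pending in the model. */
static void virr_set(virr_model_t *irr, unsigned int vector)
{
    assert(vector < 256);
    irr->word[vector / 64] |= (uint64_t)1 << (vector % 64);
}

/* Return the highest pending vector, or -1 if nothing is pending. */
static int virr_highest(const virr_model_t *irr)
{
    for ( int w = 3; w >= 0; w-- )
    {
        if ( irr->word[w] == 0 )
            continue;
        for ( int b = 63; b >= 0; b-- )
            if ( irr->word[w] & ((uint64_t)1 << b) )
                return w * 64 + b;
    }
    return -1;
}
```

With only 0xd1 set, virr_highest() returns 0xd1; once 0xe1 is also set
it returns 0xe1, which is the situation in which the timer vector is not
the one written to RVI.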

--
1.7.12.4

Comments

Tian, Kevin Dec. 20, 2016, 5:37 a.m. UTC | #1
> From: Xuquan (Quan Xu) [mailto:xuquan8@huawei.com]
> Sent: Friday, December 16, 2016 5:40 PM
> 

I suppose you've verified this new version, but I would still like to
get your explicit confirmation: do you still see the time accuracy
issue on your side? Have you tried guest OS types other than
Win7-32?

> ---
>  xen/arch/x86/hvm/vmx/intr.c | 15 ++++++++++++---
>  1 file changed, 12 insertions(+), 3 deletions(-)
> 
> diff --git a/xen/arch/x86/hvm/vmx/intr.c b/xen/arch/x86/hvm/vmx/intr.c
> index 639a705..d7a5716 100644
> --- a/xen/arch/x86/hvm/vmx/intr.c
> +++ b/xen/arch/x86/hvm/vmx/intr.c
> @@ -315,9 +315,17 @@ void vmx_intr_assist(void)
>          * Set eoi_exit_bitmap for periodic timer interrup to cause EOI-induced VM
>          * exit, then pending periodic time interrups have the chance to be injected
>          * for compensation
> +        * Set eoi_exit_bitmap for intack.vector when it's higher than pending
> +        * periodic time interrupts. This way we can guarantee there's always a chance
> +        * to post periodic time interrupts when periodic time interrupts becomes the
> +        * highest one
>          */
> -        if (pt_vector != -1)
> -            vmx_set_eoi_exit_bitmap(v, pt_vector);
> +        if ( pt_vector != -1 ) {
> +            if ( intack.vector > pt_vector )
> +                vmx_set_eoi_exit_bitmap(v, intack.vector);
> +            else
> +                vmx_set_eoi_exit_bitmap(v, pt_vector);
> +        }

The above can be simplified to a one-line change:
	if ( pt_vector != -1 )
		vmx_set_eoi_exit_bitmap(v, intack.vector);

> 
>          /* we need update the RVI field */
>          __vmread(GUEST_INTR_STATUS, &status);
> @@ -334,7 +342,8 @@ void vmx_intr_assist(void)
>              __vmwrite(EOI_EXIT_BITMAP(i), v->arch.hvm_vmx.eoi_exit_bitmap[i]);
>          }
> 
> -        pt_intr_post(v, intack);
> +        if ( intack.vector == pt_vector )
> +            pt_intr_post(v, intack);
>      }
>      else
>      {
> --
> 1.7.12.4
Xuquan (Euler) Dec. 20, 2016, 5:54 a.m. UTC | #2
On December 20, 2016 1:37 PM, Tian, Kevin wrote:
>> From: Xuquan (Quan Xu) [mailto:xuquan8@huawei.com]
>> Sent: Friday, December 16, 2016 5:40 PM
>>
>
>I suppose you've verified this new version, but still would like get your
>explicit confirmation - did you still see time accuracy issue in your side?
>Have you tried other guest OS types other than Win7-32?
>

I only verified it on a win7-32 guest.
I will continue to verify it on other Windows guests (I think Windows is enough, right?)


>> ---
>>  xen/arch/x86/hvm/vmx/intr.c | 15 ++++++++++++---
>>  1 file changed, 12 insertions(+), 3 deletions(-)
>>
>> diff --git a/xen/arch/x86/hvm/vmx/intr.c b/xen/arch/x86/hvm/vmx/intr.c
>> index 639a705..d7a5716 100644
>> --- a/xen/arch/x86/hvm/vmx/intr.c
>> +++ b/xen/arch/x86/hvm/vmx/intr.c
>> @@ -315,9 +315,17 @@ void vmx_intr_assist(void)
>>          * Set eoi_exit_bitmap for periodic timer interrup to cause
>EOI-induced VM
>>          * exit, then pending periodic time interrups have the chance
>to be injected
>>          * for compensation
>> +        * Set eoi_exit_bitmap for intack.vector when it's higher than
>pending
>> +        * periodic time interrupts. This way we can guarantee there's
>always a chance
>> +        * to post periodic time interrupts when periodic time
>interrupts becomes the
>> +        * highest one
>>          */
>> -        if (pt_vector != -1)
>> -            vmx_set_eoi_exit_bitmap(v, pt_vector);
>> +        if ( pt_vector != -1 ) {
>> +            if ( intack.vector > pt_vector )
>> +                vmx_set_eoi_exit_bitmap(v, intack.vector);
>> +            else
>> +                vmx_set_eoi_exit_bitmap(v, pt_vector);
>> +        }
>
>Above can be simplified as one line change:
>	if ( pt_vector != -1 )
>		vmx_set_eoi_exit_bitmap(v, intack.vector);
>

Agreed. I felt this change didn't look good, but I had no idea how to improve it. Thanks.
Also, sorry for the late v3.

Quan
Jan Beulich Dec. 20, 2016, 8:32 a.m. UTC | #3
>>> On 20.12.16 at 06:54, <xuquan8@huawei.com> wrote:
> On December 20, 2016 1:37 PM, Tian, Kevin wrote:
>>> From: Xuquan (Quan Xu) [mailto:xuquan8@huawei.com]
>>> Sent: Friday, December 16, 2016 5:40 PM
>>>
>>
>>I suppose you've verified this new version, but still would like get your
>>explicit confirmation - did you still see time accuracy issue in your side?
>>Have you tried other guest OS types other than Win7-32?
>>
> 
> I only verified it on win7-32 guest..
> I will continue to verify it on other windows guest (I think windows is 
> enough, right?)

No, I don't think Windows alone is sufficient for verification. People
run all kinds of OSes as HVM guests, and your change should not
negatively impact them. At the very least you want to also try Linux.

Jan
Jan Beulich Dec. 20, 2016, 8:34 a.m. UTC | #4
>>> On 20.12.16 at 06:37, <kevin.tian@intel.com> wrote:
>>  From: Xuquan (Quan Xu) [mailto:xuquan8@huawei.com]
>> Sent: Friday, December 16, 2016 5:40 PM
>> -        if (pt_vector != -1)
>> -            vmx_set_eoi_exit_bitmap(v, pt_vector);
>> +        if ( pt_vector != -1 ) {
>> +            if ( intack.vector > pt_vector )
>> +                vmx_set_eoi_exit_bitmap(v, intack.vector);
>> +            else
>> +                vmx_set_eoi_exit_bitmap(v, pt_vector);
>> +        }
> 
> Above can be simplified as one line change:
> 	if ( pt_vector != -1 )
> 		vmx_set_eoi_exit_bitmap(v, intack.vector);

Hmm, I don't understand. Did you mean to use max() here? Or
else how is this an equivalent of the originally proposed code?

Jan
Tian, Kevin Dec. 20, 2016, 8:53 a.m. UTC | #5
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Tuesday, December 20, 2016 4:35 PM
> 
> >>> On 20.12.16 at 06:37, <kevin.tian@intel.com> wrote:
> >>  From: Xuquan (Quan Xu) [mailto:xuquan8@huawei.com]
> >> Sent: Friday, December 16, 2016 5:40 PM
> >> -        if (pt_vector != -1)
> >> -            vmx_set_eoi_exit_bitmap(v, pt_vector);
> >> +        if ( pt_vector != -1 ) {
> >> +            if ( intack.vector > pt_vector )
> >> +                vmx_set_eoi_exit_bitmap(v, intack.vector);
> >> +            else
> >> +                vmx_set_eoi_exit_bitmap(v, pt_vector);
> >> +        }
> >
> > Above can be simplified as one line change:
> > 	if ( pt_vector != -1 )
> > 		vmx_set_eoi_exit_bitmap(v, intack.vector);
> 
> Hmm, I don't understand. Did you mean to use max() here? Or
> else how is this an equivalent of the originally proposed code?
> 

The original code is not 100% correct. The purpose is to set the EOI
exit bitmap for any vector which may block injection of pt_vector, so
that we get a chance to recognize pt_vector in a future intack and then
do the pt intr post. The simplified code has the same effect as the
original code if intack.vector >= pt_vector. I cannot come up with a
case where intack.vector might be smaller than pt_vector. If that case
does happen, we still need to enable the exit bitmap for intack.vector
instead of pt_vector for said purpose, which the original code got wrong.
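
To make the comparison concrete, here is a small stand-alone sketch. It
is not the actual intr.c code, and both function names are invented for
this mail. Each function returns the vector that the corresponding
variant would pass to vmx_set_eoi_exit_bitmap(), assuming
pt_vector != -1:

```c
#include <assert.h>

/* Vector chosen by the v3 patch: the larger of the two. */
static int eoi_vector_v3(int intack_vector, int pt_vector)
{
    assert(pt_vector != -1);
    return intack_vector > pt_vector ? intack_vector : pt_vector;
}

/* Vector chosen by the suggested one-line form: always intack.vector. */
static int eoi_vector_simplified(int intack_vector, int pt_vector)
{
    assert(pt_vector != -1);
    return intack_vector;
}
```

The two agree whenever intack_vector >= pt_vector (for example 0xe1 and
0xd1 both give 0xe1). They differ only when intack_vector is smaller, in
which case the simplified form still picks intack_vector.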

Thanks
Kevin
Jan Beulich Dec. 20, 2016, 8:57 a.m. UTC | #6
>>> On 20.12.16 at 09:53, <kevin.tian@intel.com> wrote:
>>  From: Jan Beulich [mailto:JBeulich@suse.com]
>> Sent: Tuesday, December 20, 2016 4:35 PM
>> 
>> >>> On 20.12.16 at 06:37, <kevin.tian@intel.com> wrote:
>> >>  From: Xuquan (Quan Xu) [mailto:xuquan8@huawei.com]
>> >> Sent: Friday, December 16, 2016 5:40 PM
>> >> -        if (pt_vector != -1)
>> >> -            vmx_set_eoi_exit_bitmap(v, pt_vector);
>> >> +        if ( pt_vector != -1 ) {
>> >> +            if ( intack.vector > pt_vector )
>> >> +                vmx_set_eoi_exit_bitmap(v, intack.vector);
>> >> +            else
>> >> +                vmx_set_eoi_exit_bitmap(v, pt_vector);
>> >> +        }
>> >
>> > Above can be simplified as one line change:
>> > 	if ( pt_vector != -1 )
>> > 		vmx_set_eoi_exit_bitmap(v, intack.vector);
>> 
>> Hmm, I don't understand. Did you mean to use max() here? Or
>> else how is this an equivalent of the originally proposed code?
>> 
> 
> Original code is not 100% correct. The purpose is to set EOI exit
> bitmap for any vector which may block injection of pt_vector - 
> give chance to recognize pt_vector in future intack and then do pt 
> intr post. The simplified code achieves this effect same as original
> code if intack.vector >= vector. I cannot come up a case why
> intack.vector might be smaller than vector. If this case happens,
> we still need enable exit bitmap for intack.vector instead of
> pt_vector for said purpose while original code did it wrong.

Ah, okay. Thanks for explaining this to me.

Jan
Xuquan (Euler) Dec. 20, 2016, 9:33 a.m. UTC | #7
On December 20, 2016 4:54 PM, Tian, Kevin wrote:
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> Sent: Tuesday, December 20, 2016 4:35 PM
>>
>> >>> On 20.12.16 at 06:37, <kevin.tian@intel.com> wrote:
>> >>  From: Xuquan (Quan Xu) [mailto:xuquan8@huawei.com]
>> >> Sent: Friday, December 16, 2016 5:40 PM
>> >> -        if (pt_vector != -1)
>> >> -            vmx_set_eoi_exit_bitmap(v, pt_vector);
>> >> +        if ( pt_vector != -1 ) {
>> >> +            if ( intack.vector > pt_vector )
>> >> +                vmx_set_eoi_exit_bitmap(v, intack.vector);
>> >> +            else
>> >> +                vmx_set_eoi_exit_bitmap(v, pt_vector);
>> >> +        }
>> >
>> > Above can be simplified as one line change:
>> > 	if ( pt_vector != -1 )
>> > 		vmx_set_eoi_exit_bitmap(v, intack.vector);
>>
>> Hmm, I don't understand. Did you mean to use max() here? Or else how
>> is this an equivalent of the originally proposed code?
>>
>
>Original code is not 100% correct. The purpose is to set EOI exit bitmap for
>any vector which may block injection of pt_vector - give chance to recognize
>pt_vector in future intack and then do pt intr post. The simplified code
>achieves this effect same as original code if intack.vector >= vector. I cannot
>come up a case why intack.vector might be smaller than vector. If this case
>happens, we still need enable exit bitmap for intack.vector instead of
>pt_vector for said purpose while original code did it wrong.
>

Thanks for explaining this to me too!
Your modification is better.

Quan
Xuquan (Euler) Dec. 20, 2016, 9:38 a.m. UTC | #8
On December 20, 2016 4:32 PM, Jan Beulich wrote:
>>>> On 20.12.16 at 06:54, <xuquan8@huawei.com> wrote:
>> On December 20, 2016 1:37 PM, Tian, Kevin wrote:
>>>> From: Xuquan (Quan Xu) [mailto:xuquan8@huawei.com]
>>>> Sent: Friday, December 16, 2016 5:40 PM
>>>>
>>>
>>>I suppose you've verified this new version, but still would like get
>>>your explicit confirmation - did you still see time accuracy issue in your
>side?
>>>Have you tried other guest OS types other than Win7-32?
>>>
>>
>> I only verified it on win7-32 guest..
>> I will continue to verify it on other windows guest (I think windows
>> is enough, right?)
>
>No, I don't think Windows alone is sufficient for verification. People run all
>kinds of OSes as HVM guests, and your change should not negatively impact
>them. At the very least you want to also try Linux.
>

Could I use the 'date' command to test it? I only have a server version of Linux, no desktop version.


Quan
Jan Beulich Dec. 20, 2016, 9:57 a.m. UTC | #9
>>> On 20.12.16 at 10:38, <xuquan8@huawei.com> wrote:
> On December 20, 2016 4:32 PM, Jan Beulich wrote:
>>>>> On 20.12.16 at 06:54, <xuquan8@huawei.com> wrote:
>>> On December 20, 2016 1:37 PM, Tian, Kevin wrote:
>>>>> From: Xuquan (Quan Xu) [mailto:xuquan8@huawei.com]
>>>>> Sent: Friday, December 16, 2016 5:40 PM
>>>>>
>>>>
>>>>I suppose you've verified this new version, but still would like get
>>>>your explicit confirmation - did you still see time accuracy issue in your
>>side?
>>>>Have you tried other guest OS types other than Win7-32?
>>>>
>>>
>>> I only verified it on win7-32 guest..
>>> I will continue to verify it on other windows guest (I think windows
>>> is enough, right?)
>>
>>No, I don't think Windows alone is sufficient for verification. People run all
>>kinds of OSes as HVM guests, and your change should not negatively impact
>>them. At the very least you want to also try Linux.
> 
> Cloud I use 'date' command to test it? As I only have server version of 
> LINUX, no desktop version...

Well - I'm really not sure how to best test this.

Jan
Xuquan (Euler) Dec. 20, 2016, 1:12 p.m. UTC | #10
On December 20, 2016 1:37 PM, Tian, Kevin wrote:
>> From: Xuquan (Quan Xu) [mailto:xuquan8@huawei.com]
>> Sent: Friday, December 16, 2016 5:40 PM
>I suppose you've verified this new version, but still would like get your
>explicit confirmation - did you still see time accuracy issue in your side?
>Have you tried other guest OS types other than Win7-32?
>

Kevin, I have tested it again.

__without__ my patch, on win7-64 the wall clock time appears to work fine.
It seems the issue only affects win-32.

There is an easy way to reproduce:
* pCPU should be v3 (my pCPU is "Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz")
* more than 2 vCPUs for the win32 guest.

Then run the following .bat in the win32 guest:

:abcd
echo 11111
goto :abcd





Could the Intel test team help me verify it?

>> ---
>>  xen/arch/x86/hvm/vmx/intr.c | 15 ++++++++++++---
>>  1 file changed, 12 insertions(+), 3 deletions(-)
>>
>> diff --git a/xen/arch/x86/hvm/vmx/intr.c b/xen/arch/x86/hvm/vmx/intr.c
>> index 639a705..d7a5716 100644
>> --- a/xen/arch/x86/hvm/vmx/intr.c
>> +++ b/xen/arch/x86/hvm/vmx/intr.c
>> -        if (pt_vector != -1)
>> -            vmx_set_eoi_exit_bitmap(v, pt_vector);
>> +        if ( pt_vector != -1 ) {
>> +            if ( intack.vector > pt_vector )
>> +                vmx_set_eoi_exit_bitmap(v, intack.vector);
>> +            else
>> +                vmx_set_eoi_exit_bitmap(v, pt_vector);
>> +        }
>
>Above can be simplified as one line change:
>	if ( pt_vector != -1 )
>		vmx_set_eoi_exit_bitmap(v, intack.vector);
>

I have verified this change, and it works.
Could I send out v4 with this change?


Quan
Tian, Kevin Dec. 21, 2016, 2:29 a.m. UTC | #11
> From: Xuquan (Quan Xu) [mailto:xuquan8@huawei.com]
> Sent: Tuesday, December 20, 2016 9:12 PM
> 
> On December 20, 2016 1:37 PM, Tian, Kevin wrote:
> >> From: Xuquan (Quan Xu) [mailto:xuquan8@huawei.com]
> >> Sent: Friday, December 16, 2016 5:40 PM
> >I suppose you've verified this new version, but still would like get your
> >explicit confirmation - did you still see time accuracy issue in your side?
> >Have you tried other guest OS types other than Win7-32?
> >
> 
> Kevin, I have tested it again..
> 
> __without__ my patch, for win7-64, the wall clock time looks working fine..
> It seems the issue is only for win-32..

You need to verify both with and without the patch. It's not impossible
that win7-64 has no problem without this fix but sees some regression
with the patch. This is the purpose of thorough testing.

> 
> There is a easy way to reproduce:
> *pCPU should be v3 ..(my pCPU is """Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz""")
> * more than 2 vCPUs for win32 guest.
> 
> Than run the following .bat in win32 guest:
> 
> :abcd
> echo 11111
> goto :abcd
> 
> 
> 
> 
> 
> Could Intel test team help me verify it?

Sure. Please work with Chao (CCed) offline on how you can cooperate to
have a complete test. Your help is still appreciated since you already have
the environment.

> 
> >> ---
> >>  xen/arch/x86/hvm/vmx/intr.c | 15 ++++++++++++---
> >>  1 file changed, 12 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/xen/arch/x86/hvm/vmx/intr.c b/xen/arch/x86/hvm/vmx/intr.c
> >> index 639a705..d7a5716 100644
> >> --- a/xen/arch/x86/hvm/vmx/intr.c
> >> +++ b/xen/arch/x86/hvm/vmx/intr.c
> >> -        if (pt_vector != -1)
> >> -            vmx_set_eoi_exit_bitmap(v, pt_vector);
> >> +        if ( pt_vector != -1 ) {
> >> +            if ( intack.vector > pt_vector )
> >> +                vmx_set_eoi_exit_bitmap(v, intack.vector);
> >> +            else
> >> +                vmx_set_eoi_exit_bitmap(v, pt_vector);
> >> +        }
> >
> >Above can be simplified as one line change:
> >	if ( pt_vector != -1 )
> >		vmx_set_eoi_exit_bitmap(v, intack.vector);
> >
> 
> I have verified this change.. it is working..
> Could I send out v4 with this changes?
> 

Please. I'll ack when you complete the test.

Thanks
Kevin
Tian, Kevin Dec. 21, 2016, 2:32 a.m. UTC | #12
> From: Xuquan (Quan Xu) [mailto:xuquan8@huawei.com]
> Sent: Tuesday, December 20, 2016 5:39 PM
> 
> On December 20, 2016 4:32 PM, Jan Beulich wrote:
> >>>> On 20.12.16 at 06:54, <xuquan8@huawei.com> wrote:
> >> On December 20, 2016 1:37 PM, Tian, Kevin wrote:
> >>>> From: Xuquan (Quan Xu) [mailto:xuquan8@huawei.com]
> >>>> Sent: Friday, December 16, 2016 5:40 PM
> >>>>
> >>>
> >>>I suppose you've verified this new version, but I would still like to get
> >>>your explicit confirmation - did you still see the time accuracy issue on
> >>>your side?
> >>>Have you tried guest OS types other than Win7-32?
> >>>
> >>
> >> I only verified it on a Win7-32 guest.
> >> I will continue to verify it on other Windows guests (I think Windows
> >> is enough, right?)
> >
> >No, I don't think Windows alone is sufficient for verification. People run all
> >kinds of OSes as HVM guests, and your change should not negatively impact
> >them. At the very least you want to also try Linux.
> >
> 
> Could I use the 'date' command to test it? I only have a server version of Linux, no desktop
> version...
> 
> 

Using 'date' is OK. The key is that you need to find a workload that
generates as many IPIs as you observed on the Windows guest side.
Anyway, think about the situation you described in the patch msg
and then generate a test environment accordingly. :-)

Thanks
Kevin
Xuquan (Euler) Dec. 21, 2016, 4:59 a.m. UTC | #13
On December 21, 2016 10:30 AM, Tian, Kevin wrote:
>> From: Xuquan (Quan Xu) [mailto:xuquan8@huawei.com]
>> Sent: Tuesday, December 20, 2016 9:12 PM
>>
>> On December 20, 2016 1:37 PM, Tian, Kevin wrote:
>> >> From: Xuquan (Quan Xu) [mailto:xuquan8@huawei.com]
>> >> Sent: Friday, December 16, 2016 5:40 PM
>> Could Intel test team help me verify it?
>
>Sure. Please work with Chao (CCed) offline on how you can cooperate to
>have a complete test. Your help is still appreciated since you already have
>the environment.
>

Thanks. Chao, feel free to contact me if you have any questions.



>> >> ---
>> >>  xen/arch/x86/hvm/vmx/intr.c | 15 ++++++++++++---
>> >>  1 file changed, 12 insertions(+), 3 deletions(-)
>> >>
>> >> diff --git a/xen/arch/x86/hvm/vmx/intr.c
>> >> b/xen/arch/x86/hvm/vmx/intr.c index 639a705..d7a5716 100644
>> >> --- a/xen/arch/x86/hvm/vmx/intr.c
>> >> +++ b/xen/arch/x86/hvm/vmx/intr.c
>> >> -        if (pt_vector != -1)
>> >> -            vmx_set_eoi_exit_bitmap(v, pt_vector);
>> >> +        if ( pt_vector != -1 ) {
>> >> +            if ( intack.vector > pt_vector )
>> >> +                vmx_set_eoi_exit_bitmap(v, intack.vector);
>> >> +            else
>> >> +                vmx_set_eoi_exit_bitmap(v, pt_vector);
>> >> +        }
>> >
>> >Above can be simplified as one line change:
>> >	if ( pt_vector != -1 )
>> >		vmx_set_eoi_exit_bitmap(v, intack.vector);
>> >
>>
>> I have verified this change; it works.
>> Could I send out v4 with this change?
>>
>
>Please. I'll ack when you complete the test.
>


I will send it out soon. BTW, I found an apicv performance issue that I also want to upstream, but I need more time
to prepare it.

Quan
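
The simplification suggested above (always marking intack.vector once a
periodic timer vector is pending) can be checked against the v3 if/else with
a small stand-alone model. This is only a sketch, not Xen code: struct
eoi_bitmap, bitmap_set(), mark_v3(), mark_simplified() and forms_agree() are
hypothetical names, and the check assumes that intack.vector is the highest
pending vector, so that intack.vector >= pt_vector whenever pt_vector is
pending.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NR_VECTORS 256

/* Hypothetical stand-in for the per-vCPU EOI exit bitmap (not Xen's type). */
struct eoi_bitmap {
    uint64_t word[NR_VECTORS / 64];
};

/* Set the bit for one vector in the model bitmap. */
static void bitmap_set(struct eoi_bitmap *b, int vector)
{
    assert(vector >= 0 && vector < NR_VECTORS);
    b->word[vector / 64] |= UINT64_C(1) << (vector % 64);
}

/* v3 form: mark whichever of intack_vector and pt_vector is higher. */
static void mark_v3(struct eoi_bitmap *b, int intack_vector, int pt_vector)
{
    if ( pt_vector != -1 )
    {
        if ( intack_vector > pt_vector )
            bitmap_set(b, intack_vector);
        else
            bitmap_set(b, pt_vector);
    }
}

/* Suggested form: mark intack_vector whenever a timer vector is pending. */
static void mark_simplified(struct eoi_bitmap *b, int intack_vector,
                            int pt_vector)
{
    if ( pt_vector != -1 )
        bitmap_set(b, intack_vector);
}

/*
 * Compare both forms for every pair with intack_vector >= pt_vector
 * (the assumed invariant), including the "no timer pending" case (-1).
 */
static bool forms_agree(void)
{
    for ( int pt = -1; pt < NR_VECTORS; pt++ )
    {
        for ( int intack = (pt < 0 ? 0 : pt); intack < NR_VECTORS; intack++ )
        {
            struct eoi_bitmap a, b;

            memset(&a, 0, sizeof(a));
            memset(&b, 0, sizeof(b));
            mark_v3(&a, intack, pt);
            mark_simplified(&b, intack, pt);
            if ( memcmp(&a, &b, sizeof(a)) != 0 )
                return false;
        }
    }
    return true;
}
```

Under the stated assumption the two forms set the same bit in every case; they
only diverge if intack_vector could be lower than a pending pt_vector, which
the model deliberately excludes.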

Patch

diff --git a/xen/arch/x86/hvm/vmx/intr.c b/xen/arch/x86/hvm/vmx/intr.c
index 639a705..d7a5716 100644
--- a/xen/arch/x86/hvm/vmx/intr.c
+++ b/xen/arch/x86/hvm/vmx/intr.c
@@ -315,9 +315,17 @@  void vmx_intr_assist(void)
         * Set eoi_exit_bitmap for periodic timer interrup to cause EOI-induced VM
         * exit, then pending periodic time interrups have the chance to be injected
         * for compensation
+        * Set eoi_exit_bitmap for intack.vector when it's higher than pending
+        * periodic time interrupts. This way we can guarantee there's always a chance
+        * to post periodic time interrupts when periodic time interrupts becomes the
+        * highest one
         */
-        if (pt_vector != -1)
-            vmx_set_eoi_exit_bitmap(v, pt_vector);
+        if ( pt_vector != -1 ) {
+            if ( intack.vector > pt_vector )
+                vmx_set_eoi_exit_bitmap(v, intack.vector);
+            else
+                vmx_set_eoi_exit_bitmap(v, pt_vector);
+        }

         /* we need update the RVI field */
         __vmread(GUEST_INTR_STATUS, &status);
@@ -334,7 +342,8 @@  void vmx_intr_assist(void)
             __vmwrite(EOI_EXIT_BITMAP(i), v->arch.hvm_vmx.eoi_exit_bitmap[i]);
         }

-        pt_intr_post(v, intack);
+        if ( intack.vector == pt_vector )
+            pt_intr_post(v, intack);
     }
     else
     {