diff mbox

[v4] x86/apicv: fix RTC periodic timer and apicv issue

Message ID E0A769A898ADB6449596C41F51EF62C6AD8101@SZXEMI506-MBX.china.huawei.com (mailing list archive)
State New, archived
Headers show

Commit Message

Xuquan (Euler) Dec. 21, 2016, 5:44 a.m. UTC
When Xen apicv is enabled, wall clock time is faster on Windows7-32
guest with high payload (with 2vCPU, captured from xentrace, in
high payload, the count of IPI interrupt increases rapidly between
these vCPUs).

If IPI intrrupt (vector 0xe1) and periodic timer interrupt (vector 0xd1)
are both pending (index of bit set in vIRR), unfortunately, the IPI
intrrupt is high priority than periodic timer interrupt. Xen updates
IPI interrupt bit set in vIRR to guest interrupt status (RVI) as a high
priority and apicv (Virtual-Interrupt Delivery) delivers IPI interrupt
within VMX non-root operation without a VM-Exit. Within VMX non-root
operation, if periodic timer interrupt index of bit is set in vIRR and
highest, the apicv delivers periodic timer interrupt within VMX non-root
operation as well.

But in current code, if Xen doesn't update periodic timer interrupt bit
set in vIRR to guest interrupt status (RVI) directly, Xen is not aware
of this case to decrease the count (pending_intr_nr) of pending periodic
timer interrupt, then Xen will deliver a periodic timer interrupt again.

And that we update periodic timer interrupt in every VM-entry, there is
a chance that already-injected instance (before EOI-induced exit happens)
will incur another pending IRR setting if there is a VM-exit happens
between virtual interrupt injection (vIRR->0, vISR->1) and EOI-induced
exit (vISR->0), since pt_intr_post hasn't been invoked yet, then the
guest receives more periodic timer interrupt.

So we set eoi_exit_bitmap for intack.vector when it's higher than
pending periodic time interrupts. This way we can guarantee there's
always a chance to post periodic time interrupts when periodic time
interrupts becomes the highest one.

Signed-off-by: Quan Xu <xuquan8@huawei.com>
---
 xen/arch/x86/hvm/vmx/intr.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

--
1.8.3.4

Comments

Chao Gao Dec. 21, 2016, 8:02 p.m. UTC | #1
Hi, xuquan.
I have tested it on my skylake server. W/o this patch the inaccurate
wall clock time issue only exists in Win7-32 guest. Win7-64, Win8-32, Win8-64,
Win10-32 ,Win10-64 and linux-4.8.0+ guests don't have this issue.
W/ this v4 patch, the issue disappears in Win7-32 guest and no wall lock time
related regression is found on Win7-64, Win8-32, Win8-64, Win10-32, Win10-64 
and linux-4.8.0+ guest.

In windows guest, the test procedure is
1. Create a windows guest with 2 vCPU
2. run the following .bat in guest
    :abcd
    echo 111111
    goto abcd

3. Start a stop-watch outside the guest and monitor the clock at the lower right
   corner in guest. After 120 seconds according the guest clock, stop the stop-watch.
   If the time shows in the stop-watch is about 120 seconds, then I think
   there is no the above issue in the guest. Otherwise, the time is inaccurate. 

In Win7-32 case, the stop-watch time is about 70 seconds, so the clock in guest is
obviously inaccurate.

In linux guest, the test procedure is
1. Create a linux guest with 4 vCPU
2. insmod the following linux module
   (through output of /proc/interrupt, about 850000 ipis in 13 seconds)
3. use date command to get guest time, others are same as test in windows guest

#include <linux/init.h>                                                         
#include <linux/module.h>                                                       
#include <linux/kthread.h>                                                      
#include <linux/sched.h>                                                        
#include <asm/delay.h>                                                          
MODULE_LICENSE("GPL");                                                          
void workload(void *info)                                                       
{                                                                               
        asm volatile("nop");                                                    
}                                                                               
                                                                                
void msleep(unsigned int msecs);                                                
static int ipi_generator(void * info)                                           
{                                                                               
        int i;                                                                  
        while (!kthread_should_stop()) {                                        
                for(i=0; i< 5 * 10000; i++)                                     
                {                                                               
                        smp_call_function(workload, NULL,1);                    
                }                                                               
                msleep(1);                                                      
        }                                                                       
        return 0;                                                               
}                                                                               
struct task_struct *thread;                                                     
static int __init ipi_init(void)                                                
{                                                                               
        thread = kthread_run(ipi_generator, NULL, "IPI");                       
        if (IS_ERR(thread))                                                     
                return PTR_ERR(thread);                                         
        return 0;                                                               
}                                                                               
                                                                                
static void __exit ipi_exit(void)                                               
{                                                                               
        kthread_stop(thread);                                                   
}                                                                               
module_init(ipi_init);                                                          
module_exit(ipi_exit);

Are these tests sufficient? Please let me know if you have any other thoughts.

On Wed, Dec 21, 2016 at 05:44:08AM +0000, Xuquan (Quan Xu) wrote:
>When Xen apicv is enabled, wall clock time is faster on Windows7-32
>guest with high payload (with 2vCPU, captured from xentrace, in
>high payload, the count of IPI interrupt increases rapidly between
>these vCPUs).
>
>If IPI intrrupt (vector 0xe1) and periodic timer interrupt (vector 0xd1)
>are both pending (index of bit set in vIRR), unfortunately, the IPI
>intrrupt is high priority than periodic timer interrupt. Xen updates
>IPI interrupt bit set in vIRR to guest interrupt status (RVI) as a high
>priority and apicv (Virtual-Interrupt Delivery) delivers IPI interrupt
>within VMX non-root operation without a VM-Exit. Within VMX non-root
>operation, if periodic timer interrupt index of bit is set in vIRR and
>highest, the apicv delivers periodic timer interrupt within VMX non-root
>operation as well.
>
>But in current code, if Xen doesn't update periodic timer interrupt bit
>set in vIRR to guest interrupt status (RVI) directly, Xen is not aware
>of this case to decrease the count (pending_intr_nr) of pending periodic
>timer interrupt, then Xen will deliver a periodic timer interrupt again.
>
>And that we update periodic timer interrupt in every VM-entry, there is
>a chance that already-injected instance (before EOI-induced exit happens)
>will incur another pending IRR setting if there is a VM-exit happens
>between virtual interrupt injection (vIRR->0, vISR->1) and EOI-induced
>exit (vISR->0), since pt_intr_post hasn't been invoked yet, then the
>guest receives more periodic timer interrupt.
>
>So we set eoi_exit_bitmap for intack.vector when it's higher than
>pending periodic time interrupts. This way we can guarantee there's
>always a chance to post periodic time interrupts when periodic time
>interrupts becomes the highest one.
>
>Signed-off-by: Quan Xu <xuquan8@huawei.com>
>---
> xen/arch/x86/hvm/vmx/intr.c | 9 +++++++--
> 1 file changed, 7 insertions(+), 2 deletions(-)
>
>diff --git a/xen/arch/x86/hvm/vmx/intr.c b/xen/arch/x86/hvm/vmx/intr.c
>index 639a705..0cf26b4 100644
>--- a/xen/arch/x86/hvm/vmx/intr.c
>+++ b/xen/arch/x86/hvm/vmx/intr.c
>@@ -315,9 +315,13 @@ void vmx_intr_assist(void)
>         * Set eoi_exit_bitmap for periodic timer interrup to cause EOI-induced VM
>         * exit, then pending periodic time interrups have the chance to be injected
>         * for compensation
>+        * Set eoi_exit_bitmap for intack.vector when it's higher than pending
>+        * periodic time interrupts. This way we can guarantee there's always a chance
>+        * to post periodic time interrupts when periodic time interrupts becomes the
>+        * highest one
>         */
>         if (pt_vector != -1)
>-            vmx_set_eoi_exit_bitmap(v, pt_vector);
>+            vmx_set_eoi_exit_bitmap(v, intack.vector);
>
>         /* we need update the RVI field */
>         __vmread(GUEST_INTR_STATUS, &status);
>@@ -334,7 +338,8 @@ void vmx_intr_assist(void)
>             __vmwrite(EOI_EXIT_BITMAP(i), v->arch.hvm_vmx.eoi_exit_bitmap[i]);
>         }
>
>-        pt_intr_post(v, intack);
>+        if ( intack.vector == pt_vector )
>+            pt_intr_post(v, intack);
>     }
>     else
>     {
>--
>1.8.3.4
Xuquan (Euler) Dec. 22, 2016, 3:11 a.m. UTC | #2
Hi Chao,
To me, it is sufficient.. thanks for your verification!!


Quan

On December 22, 2016 4:02 AM, Chao Gao wrote:
>Hi, xuquan.
>I have tested it on my skylake server. W/o this patch the inaccurate wall
>clock time issue only exists in Win7-32 guest. Win7-64, Win8-32, Win8-64,
>Win10-32 ,Win10-64 and linux-4.8.0+ guests don't have this issue.
>W/ this v4 patch, the issue disappears in Win7-32 guest and no wall lock
>time related regression is found on Win7-64, Win8-32, Win8-64, Win10-32,
>Win10-64 and linux-4.8.0+ guest.
>
>In windows guest, the test procedure is
>1. Create a windows guest with 2 vCPU
>2. run the following .bat in guest
>    :abcd
>    echo 111111
>    goto abcd
>
>3. Start a stop-watch outside the guest and monitor the clock at the lower
>right
>   corner in guest. After 120 seconds according the guest clock, stop the
>stop-watch.
>   If the time shows in the stop-watch is about 120 seconds, then I think
>   there is no the above issue in the guest. Otherwise, the time is
>inaccurate.
>
>In Win7-32 case, the stop-watch time is about 70 seconds, so the clock in
>guest is obviously inaccurate.
>
>In linux guest, the test procedure is
>1. Create a linux guest with 4 vCPU
>2. insmod the following linux module
>   (through output of /proc/interrupt, about 850000 ipis in 13 seconds) 3.
>use date command to get guest time, others are same as test in windows
>guest
>
>#include <linux/init.h>
>#include <linux/module.h>
>#include <linux/kthread.h>
>#include <linux/sched.h>
>#include <asm/delay.h>
>MODULE_LICENSE("GPL");
>void workload(void *info)
>{
>        asm volatile("nop");
>}
>
>void msleep(unsigned int msecs);
>static int ipi_generator(void * info)
>{
>        int i;
>        while (!kthread_should_stop()) {
>                for(i=0; i< 5 * 10000; i++)
>                {
>                        smp_call_function(workload, NULL,1);
>                }
>                msleep(1);
>        }
>        return 0;
>}
>struct task_struct *thread;
>static int __init ipi_init(void)
>{
>        thread = kthread_run(ipi_generator, NULL, "IPI");
>        if (IS_ERR(thread))
>                return PTR_ERR(thread);
>        return 0;
>}
>
>static void __exit ipi_exit(void)
>{
>        kthread_stop(thread);
>}
>module_init(ipi_init);
>module_exit(ipi_exit);
>
>Are these tests sufficient? Please let me know if you have any other
>thoughts.
>
>On Wed, Dec 21, 2016 at 05:44:08AM +0000, Xuquan (Quan Xu) wrote:
>>When Xen apicv is enabled, wall clock time is faster on Windows7-32
>>guest with high payload (with 2vCPU, captured from xentrace, in high
>>payload, the count of IPI interrupt increases rapidly between these
>>vCPUs).
>>
>>If IPI intrrupt (vector 0xe1) and periodic timer interrupt (vector
>>0xd1) are both pending (index of bit set in vIRR), unfortunately, the
>>IPI intrrupt is high priority than periodic timer interrupt. Xen
>>updates IPI interrupt bit set in vIRR to guest interrupt status (RVI)
>>as a high priority and apicv (Virtual-Interrupt Delivery) delivers IPI
>>interrupt within VMX non-root operation without a VM-Exit. Within VMX
>>non-root operation, if periodic timer interrupt index of bit is set in
>>vIRR and highest, the apicv delivers periodic timer interrupt within
>>VMX non-root operation as well.
>>
>>But in current code, if Xen doesn't update periodic timer interrupt bit
>>set in vIRR to guest interrupt status (RVI) directly, Xen is not aware
>>of this case to decrease the count (pending_intr_nr) of pending
>>periodic timer interrupt, then Xen will deliver a periodic timer interrupt
>again.
>>
>>And that we update periodic timer interrupt in every VM-entry, there is
>>a chance that already-injected instance (before EOI-induced exit
>>happens) will incur another pending IRR setting if there is a VM-exit
>>happens between virtual interrupt injection (vIRR->0, vISR->1) and
>>EOI-induced exit (vISR->0), since pt_intr_post hasn't been invoked yet,
>>then the guest receives more periodic timer interrupt.
>>
>>So we set eoi_exit_bitmap for intack.vector when it's higher than
>>pending periodic time interrupts. This way we can guarantee there's
>>always a chance to post periodic time interrupts when periodic time
>>interrupts becomes the highest one.
>>
>>Signed-off-by: Quan Xu <xuquan8@huawei.com>
>>---
>> xen/arch/x86/hvm/vmx/intr.c | 9 +++++++--
>> 1 file changed, 7 insertions(+), 2 deletions(-)
>>
>>diff --git a/xen/arch/x86/hvm/vmx/intr.c b/xen/arch/x86/hvm/vmx/intr.c
>>index 639a705..0cf26b4 100644
>>--- a/xen/arch/x86/hvm/vmx/intr.c
>>+++ b/xen/arch/x86/hvm/vmx/intr.c
>>@@ -315,9 +315,13 @@ void vmx_intr_assist(void)
>>         * Set eoi_exit_bitmap for periodic timer interrup to cause
>EOI-induced VM
>>         * exit, then pending periodic time interrups have the chance to
>be injected
>>         * for compensation
>>+        * Set eoi_exit_bitmap for intack.vector when it's higher than
>pending
>>+        * periodic time interrupts. This way we can guarantee there's
>always a chance
>>+        * to post periodic time interrupts when periodic time
>interrupts becomes the
>>+        * highest one
>>         */
>>         if (pt_vector != -1)
>>-            vmx_set_eoi_exit_bitmap(v, pt_vector);
>>+            vmx_set_eoi_exit_bitmap(v, intack.vector);
>>
>>         /* we need update the RVI field */
>>         __vmread(GUEST_INTR_STATUS, &status); @@ -334,7 +338,8
>@@ void
>>vmx_intr_assist(void)
>>             __vmwrite(EOI_EXIT_BITMAP(i),
>v->arch.hvm_vmx.eoi_exit_bitmap[i]);
>>         }
>>
>>-        pt_intr_post(v, intack);
>>+        if ( intack.vector == pt_vector )
>>+            pt_intr_post(v, intack);
>>     }
>>     else
>>     {
>>--
>>1.8.3.4
Chao Gao Dec. 22, 2016, 3:16 a.m. UTC | #3
On Thu, Dec 22, 2016 at 03:48:48PM +0800, Tian, Kevin wrote:
>sent too quick. I meant please add your tested-by here. :-)

Yes, of course. Tested-by: Chao Gao <chao.gao@intel.com>

>
>> From: Tian, Kevin
>> Sent: Thursday, December 22, 2016 3:48 PM
>> 
>> Thanks a lot!
>> 
>> > From: Gao, Chao
>> > Sent: Thursday, December 22, 2016 4:02 AM
>> >
>> > Hi, xuquan.
>> > I have tested it on my skylake server. W/o this patch the inaccurate
>> > wall clock time issue only exists in Win7-32 guest. Win7-64, Win8-32, Win8-64,
>> > Win10-32 ,Win10-64 and linux-4.8.0+ guests don't have this issue.
>> > W/ this v4 patch, the issue disappears in Win7-32 guest and no wall lock time
>> > related regression is found on Win7-64, Win8-32, Win8-64, Win10-32, Win10-64
>> > and linux-4.8.0+ guest.
>> >
>> > In windows guest, the test procedure is
>> > 1. Create a windows guest with 2 vCPU
>> > 2. run the following .bat in guest
>> >     :abcd
>> >     echo 111111
>> >     goto abcd
>> >
>> > 3. Start a stop-watch outside the guest and monitor the clock at the lower right
>> >    corner in guest. After 120 seconds according the guest clock, stop the stop-watch.
>> >    If the time shows in the stop-watch is about 120 seconds, then I think
>> >    there is no the above issue in the guest. Otherwise, the time is inaccurate.
>> >
>> > In Win7-32 case, the stop-watch time is about 70 seconds, so the clock in guest is
>> > obviously inaccurate.
>> >
>> > In linux guest, the test procedure is
>> > 1. Create a linux guest with 4 vCPU
>> > 2. insmod the following linux module
>> >    (through output of /proc/interrupt, about 850000 ipis in 13 seconds)
>> > 3. use date command to get guest time, others are same as test in windows guest
>> >
>> > #include <linux/init.h>
>> > #include <linux/module.h>
>> > #include <linux/kthread.h>
>> > #include <linux/sched.h>
>> > #include <asm/delay.h>
>> > MODULE_LICENSE("GPL");
>> > void workload(void *info)
>> > {
>> >         asm volatile("nop");
>> > }
>> >
>> > void msleep(unsigned int msecs);
>> > static int ipi_generator(void * info)
>> > {
>> >         int i;
>> >         while (!kthread_should_stop()) {
>> >                 for(i=0; i< 5 * 10000; i++)
>> >                 {
>> >                         smp_call_function(workload, NULL,1);
>> >                 }
>> >                 msleep(1);
>> >         }
>> >         return 0;
>> > }
>> > struct task_struct *thread;
>> > static int __init ipi_init(void)
>> > {
>> >         thread = kthread_run(ipi_generator, NULL, "IPI");
>> >         if (IS_ERR(thread))
>> >                 return PTR_ERR(thread);
>> >         return 0;
>> > }
>> >
>> > static void __exit ipi_exit(void)
>> > {
>> >         kthread_stop(thread);
>> > }
>> > module_init(ipi_init);
>> > module_exit(ipi_exit);
>> >
>> > Are these tests sufficient? Please let me know if you have any other thoughts.
>> >
>> > On Wed, Dec 21, 2016 at 05:44:08AM +0000, Xuquan (Quan Xu) wrote:
>> > >When Xen apicv is enabled, wall clock time is faster on Windows7-32
>> > >guest with high payload (with 2vCPU, captured from xentrace, in
>> > >high payload, the count of IPI interrupt increases rapidly between
>> > >these vCPUs).
>> > >
>> > >If IPI intrrupt (vector 0xe1) and periodic timer interrupt (vector 0xd1)
>> > >are both pending (index of bit set in vIRR), unfortunately, the IPI
>> > >intrrupt is high priority than periodic timer interrupt. Xen updates
>> > >IPI interrupt bit set in vIRR to guest interrupt status (RVI) as a high
>> > >priority and apicv (Virtual-Interrupt Delivery) delivers IPI interrupt
>> > >within VMX non-root operation without a VM-Exit. Within VMX non-root
>> > >operation, if periodic timer interrupt index of bit is set in vIRR and
>> > >highest, the apicv delivers periodic timer interrupt within VMX non-root
>> > >operation as well.
>> > >
>> > >But in current code, if Xen doesn't update periodic timer interrupt bit
>> > >set in vIRR to guest interrupt status (RVI) directly, Xen is not aware
>> > >of this case to decrease the count (pending_intr_nr) of pending periodic
>> > >timer interrupt, then Xen will deliver a periodic timer interrupt again.
>> > >
>> > >And that we update periodic timer interrupt in every VM-entry, there is
>> > >a chance that already-injected instance (before EOI-induced exit happens)
>> > >will incur another pending IRR setting if there is a VM-exit happens
>> > >between virtual interrupt injection (vIRR->0, vISR->1) and EOI-induced
>> > >exit (vISR->0), since pt_intr_post hasn't been invoked yet, then the
>> > >guest receives more periodic timer interrupt.
>> > >
>> > >So we set eoi_exit_bitmap for intack.vector when it's higher than
>> > >pending periodic time interrupts. This way we can guarantee there's
>> > >always a chance to post periodic time interrupts when periodic time
>> > >interrupts becomes the highest one.
>> > >
>> > >Signed-off-by: Quan Xu <xuquan8@huawei.com>
>> > >---
>> > > xen/arch/x86/hvm/vmx/intr.c | 9 +++++++--
>> > > 1 file changed, 7 insertions(+), 2 deletions(-)
>> > >
>> > >diff --git a/xen/arch/x86/hvm/vmx/intr.c b/xen/arch/x86/hvm/vmx/intr.c
>> > >index 639a705..0cf26b4 100644
>> > >--- a/xen/arch/x86/hvm/vmx/intr.c
>> > >+++ b/xen/arch/x86/hvm/vmx/intr.c
>> > >@@ -315,9 +315,13 @@ void vmx_intr_assist(void)
>> > >         * Set eoi_exit_bitmap for periodic timer interrup to cause EOI-induced VM
>> > >         * exit, then pending periodic time interrups have the chance to be injected
>> > >         * for compensation
>> > >+        * Set eoi_exit_bitmap for intack.vector when it's higher than pending
>> > >+        * periodic time interrupts. This way we can guarantee there's always a chance
>> > >+        * to post periodic time interrupts when periodic time interrupts becomes the
>> > >+        * highest one
>> > >         */
>> > >         if (pt_vector != -1)
>> > >-            vmx_set_eoi_exit_bitmap(v, pt_vector);
>> > >+            vmx_set_eoi_exit_bitmap(v, intack.vector);
>> > >
>> > >         /* we need update the RVI field */
>> > >         __vmread(GUEST_INTR_STATUS, &status);
>> > >@@ -334,7 +338,8 @@ void vmx_intr_assist(void)
>> > >             __vmwrite(EOI_EXIT_BITMAP(i), v->arch.hvm_vmx.eoi_exit_bitmap[i]);
>> > >         }
>> > >
>> > >-        pt_intr_post(v, intack);
>> > >+        if ( intack.vector == pt_vector )
>> > >+            pt_intr_post(v, intack);
>> > >     }
>> > >     else
>> > >     {
>> > >--
>> > >1.8.3.4
Tian, Kevin Dec. 22, 2016, 7:47 a.m. UTC | #4
Thanks a lot!

> From: Gao, Chao
> Sent: Thursday, December 22, 2016 4:02 AM
> 
> Hi, xuquan.
> I have tested it on my skylake server. W/o this patch the inaccurate
> wall clock time issue only exists in Win7-32 guest. Win7-64, Win8-32, Win8-64,
> Win10-32 ,Win10-64 and linux-4.8.0+ guests don't have this issue.
> W/ this v4 patch, the issue disappears in Win7-32 guest and no wall lock time
> related regression is found on Win7-64, Win8-32, Win8-64, Win10-32, Win10-64
> and linux-4.8.0+ guest.
> 
> In windows guest, the test procedure is
> 1. Create a windows guest with 2 vCPU
> 2. run the following .bat in guest
>     :abcd
>     echo 111111
>     goto abcd
> 
> 3. Start a stop-watch outside the guest and monitor the clock at the lower right
>    corner in guest. After 120 seconds according the guest clock, stop the stop-watch.
>    If the time shows in the stop-watch is about 120 seconds, then I think
>    there is no the above issue in the guest. Otherwise, the time is inaccurate.
> 
> In Win7-32 case, the stop-watch time is about 70 seconds, so the clock in guest is
> obviously inaccurate.
> 
> In linux guest, the test procedure is
> 1. Create a linux guest with 4 vCPU
> 2. insmod the following linux module
>    (through output of /proc/interrupt, about 850000 ipis in 13 seconds)
> 3. use date command to get guest time, others are same as test in windows guest
> 
> #include <linux/init.h>
> #include <linux/module.h>
> #include <linux/kthread.h>
> #include <linux/sched.h>
> #include <asm/delay.h>
> MODULE_LICENSE("GPL");
> void workload(void *info)
> {
>         asm volatile("nop");
> }
> 
> void msleep(unsigned int msecs);
> static int ipi_generator(void * info)
> {
>         int i;
>         while (!kthread_should_stop()) {
>                 for(i=0; i< 5 * 10000; i++)
>                 {
>                         smp_call_function(workload, NULL,1);
>                 }
>                 msleep(1);
>         }
>         return 0;
> }
> struct task_struct *thread;
> static int __init ipi_init(void)
> {
>         thread = kthread_run(ipi_generator, NULL, "IPI");
>         if (IS_ERR(thread))
>                 return PTR_ERR(thread);
>         return 0;
> }
> 
> static void __exit ipi_exit(void)
> {
>         kthread_stop(thread);
> }
> module_init(ipi_init);
> module_exit(ipi_exit);
> 
> Are these tests sufficient? Please let me know if you have any other thoughts.
> 
> On Wed, Dec 21, 2016 at 05:44:08AM +0000, Xuquan (Quan Xu) wrote:
> >When Xen apicv is enabled, wall clock time is faster on Windows7-32
> >guest with high payload (with 2vCPU, captured from xentrace, in
> >high payload, the count of IPI interrupt increases rapidly between
> >these vCPUs).
> >
> >If IPI intrrupt (vector 0xe1) and periodic timer interrupt (vector 0xd1)
> >are both pending (index of bit set in vIRR), unfortunately, the IPI
> >intrrupt is high priority than periodic timer interrupt. Xen updates
> >IPI interrupt bit set in vIRR to guest interrupt status (RVI) as a high
> >priority and apicv (Virtual-Interrupt Delivery) delivers IPI interrupt
> >within VMX non-root operation without a VM-Exit. Within VMX non-root
> >operation, if periodic timer interrupt index of bit is set in vIRR and
> >highest, the apicv delivers periodic timer interrupt within VMX non-root
> >operation as well.
> >
> >But in current code, if Xen doesn't update periodic timer interrupt bit
> >set in vIRR to guest interrupt status (RVI) directly, Xen is not aware
> >of this case to decrease the count (pending_intr_nr) of pending periodic
> >timer interrupt, then Xen will deliver a periodic timer interrupt again.
> >
> >And that we update periodic timer interrupt in every VM-entry, there is
> >a chance that already-injected instance (before EOI-induced exit happens)
> >will incur another pending IRR setting if there is a VM-exit happens
> >between virtual interrupt injection (vIRR->0, vISR->1) and EOI-induced
> >exit (vISR->0), since pt_intr_post hasn't been invoked yet, then the
> >guest receives more periodic timer interrupt.
> >
> >So we set eoi_exit_bitmap for intack.vector when it's higher than
> >pending periodic time interrupts. This way we can guarantee there's
> >always a chance to post periodic time interrupts when periodic time
> >interrupts becomes the highest one.
> >
> >Signed-off-by: Quan Xu <xuquan8@huawei.com>
> >---
> > xen/arch/x86/hvm/vmx/intr.c | 9 +++++++--
> > 1 file changed, 7 insertions(+), 2 deletions(-)
> >
> >diff --git a/xen/arch/x86/hvm/vmx/intr.c b/xen/arch/x86/hvm/vmx/intr.c
> >index 639a705..0cf26b4 100644
> >--- a/xen/arch/x86/hvm/vmx/intr.c
> >+++ b/xen/arch/x86/hvm/vmx/intr.c
> >@@ -315,9 +315,13 @@ void vmx_intr_assist(void)
> >         * Set eoi_exit_bitmap for periodic timer interrup to cause EOI-induced VM
> >         * exit, then pending periodic time interrups have the chance to be injected
> >         * for compensation
> >+        * Set eoi_exit_bitmap for intack.vector when it's higher than pending
> >+        * periodic time interrupts. This way we can guarantee there's always a chance
> >+        * to post periodic time interrupts when periodic time interrupts becomes the
> >+        * highest one
> >         */
> >         if (pt_vector != -1)
> >-            vmx_set_eoi_exit_bitmap(v, pt_vector);
> >+            vmx_set_eoi_exit_bitmap(v, intack.vector);
> >
> >         /* we need update the RVI field */
> >         __vmread(GUEST_INTR_STATUS, &status);
> >@@ -334,7 +338,8 @@ void vmx_intr_assist(void)
> >             __vmwrite(EOI_EXIT_BITMAP(i), v->arch.hvm_vmx.eoi_exit_bitmap[i]);
> >         }
> >
> >-        pt_intr_post(v, intack);
> >+        if ( intack.vector == pt_vector )
> >+            pt_intr_post(v, intack);
> >     }
> >     else
> >     {
> >--
> >1.8.3.4
Tian, Kevin Dec. 22, 2016, 7:48 a.m. UTC | #5
sent too quick. I meant please add your tested-by here. :-)

> From: Tian, Kevin
> Sent: Thursday, December 22, 2016 3:48 PM
> 
> Thanks a lot!
> 
> > From: Gao, Chao
> > Sent: Thursday, December 22, 2016 4:02 AM
> >
> > Hi, xuquan.
> > I have tested it on my skylake server. W/o this patch the inaccurate
> > wall clock time issue only exists in Win7-32 guest. Win7-64, Win8-32, Win8-64,
> > Win10-32 ,Win10-64 and linux-4.8.0+ guests don't have this issue.
> > W/ this v4 patch, the issue disappears in Win7-32 guest and no wall lock time
> > related regression is found on Win7-64, Win8-32, Win8-64, Win10-32, Win10-64
> > and linux-4.8.0+ guest.
> >
> > In windows guest, the test procedure is
> > 1. Create a windows guest with 2 vCPU
> > 2. run the following .bat in guest
> >     :abcd
> >     echo 111111
> >     goto abcd
> >
> > 3. Start a stop-watch outside the guest and monitor the clock at the lower right
> >    corner in guest. After 120 seconds according the guest clock, stop the stop-watch.
> >    If the time shows in the stop-watch is about 120 seconds, then I think
> >    there is no the above issue in the guest. Otherwise, the time is inaccurate.
> >
> > In Win7-32 case, the stop-watch time is about 70 seconds, so the clock in guest is
> > obviously inaccurate.
> >
> > In linux guest, the test procedure is
> > 1. Create a linux guest with 4 vCPU
> > 2. insmod the following linux module
> >    (through output of /proc/interrupt, about 850000 ipis in 13 seconds)
> > 3. use date command to get guest time, others are same as test in windows guest
> >
> > #include <linux/init.h>
> > #include <linux/module.h>
> > #include <linux/kthread.h>
> > #include <linux/sched.h>
> > #include <asm/delay.h>
> > MODULE_LICENSE("GPL");
> > void workload(void *info)
> > {
> >         asm volatile("nop");
> > }
> >
> > void msleep(unsigned int msecs);
> > static int ipi_generator(void * info)
> > {
> >         int i;
> >         while (!kthread_should_stop()) {
> >                 for(i=0; i< 5 * 10000; i++)
> >                 {
> >                         smp_call_function(workload, NULL,1);
> >                 }
> >                 msleep(1);
> >         }
> >         return 0;
> > }
> > struct task_struct *thread;
> > static int __init ipi_init(void)
> > {
> >         thread = kthread_run(ipi_generator, NULL, "IPI");
> >         if (IS_ERR(thread))
> >                 return PTR_ERR(thread);
> >         return 0;
> > }
> >
> > static void __exit ipi_exit(void)
> > {
> >         kthread_stop(thread);
> > }
> > module_init(ipi_init);
> > module_exit(ipi_exit);
> >
> > Are these tests sufficient? Please let me know if you have any other thoughts.
> >
> > On Wed, Dec 21, 2016 at 05:44:08AM +0000, Xuquan (Quan Xu) wrote:
> > >When Xen apicv is enabled, wall clock time is faster on Windows7-32
> > >guest with high payload (with 2vCPU, captured from xentrace, in
> > >high payload, the count of IPI interrupt increases rapidly between
> > >these vCPUs).
> > >
> > >If IPI intrrupt (vector 0xe1) and periodic timer interrupt (vector 0xd1)
> > >are both pending (index of bit set in vIRR), unfortunately, the IPI
> > >intrrupt is high priority than periodic timer interrupt. Xen updates
> > >IPI interrupt bit set in vIRR to guest interrupt status (RVI) as a high
> > >priority and apicv (Virtual-Interrupt Delivery) delivers IPI interrupt
> > >within VMX non-root operation without a VM-Exit. Within VMX non-root
> > >operation, if periodic timer interrupt index of bit is set in vIRR and
> > >highest, the apicv delivers periodic timer interrupt within VMX non-root
> > >operation as well.
> > >
> > >But in current code, if Xen doesn't update periodic timer interrupt bit
> > >set in vIRR to guest interrupt status (RVI) directly, Xen is not aware
> > >of this case to decrease the count (pending_intr_nr) of pending periodic
> > >timer interrupt, then Xen will deliver a periodic timer interrupt again.
> > >
> > >And that we update periodic timer interrupt in every VM-entry, there is
> > >a chance that already-injected instance (before EOI-induced exit happens)
> > >will incur another pending IRR setting if there is a VM-exit happens
> > >between virtual interrupt injection (vIRR->0, vISR->1) and EOI-induced
> > >exit (vISR->0), since pt_intr_post hasn't been invoked yet, then the
> > >guest receives more periodic timer interrupt.
> > >
> > >So we set eoi_exit_bitmap for intack.vector when it's higher than
> > >pending periodic time interrupts. This way we can guarantee there's
> > >always a chance to post periodic time interrupts when periodic time
> > >interrupts becomes the highest one.
> > >
> > >Signed-off-by: Quan Xu <xuquan8@huawei.com>
> > >---
> > > xen/arch/x86/hvm/vmx/intr.c | 9 +++++++--
> > > 1 file changed, 7 insertions(+), 2 deletions(-)
> > >
> > >diff --git a/xen/arch/x86/hvm/vmx/intr.c b/xen/arch/x86/hvm/vmx/intr.c
> > >index 639a705..0cf26b4 100644
> > >--- a/xen/arch/x86/hvm/vmx/intr.c
> > >+++ b/xen/arch/x86/hvm/vmx/intr.c
> > >@@ -315,9 +315,13 @@ void vmx_intr_assist(void)
> > >         * Set eoi_exit_bitmap for periodic timer interrup to cause EOI-induced VM
> > >         * exit, then pending periodic time interrups have the chance to be injected
> > >         * for compensation
> > >+        * Set eoi_exit_bitmap for intack.vector when it's higher than pending
> > >+        * periodic time interrupts. This way we can guarantee there's always a chance
> > >+        * to post periodic time interrupts when periodic time interrupts becomes the
> > >+        * highest one
> > >         */
> > >         if (pt_vector != -1)
> > >-            vmx_set_eoi_exit_bitmap(v, pt_vector);
> > >+            vmx_set_eoi_exit_bitmap(v, intack.vector);
> > >
> > >         /* we need update the RVI field */
> > >         __vmread(GUEST_INTR_STATUS, &status);
> > >@@ -334,7 +338,8 @@ void vmx_intr_assist(void)
> > >             __vmwrite(EOI_EXIT_BITMAP(i), v->arch.hvm_vmx.eoi_exit_bitmap[i]);
> > >         }
> > >
> > >-        pt_intr_post(v, intack);
> > >+        if ( intack.vector == pt_vector )
> > >+            pt_intr_post(v, intack);
> > >     }
> > >     else
> > >     {
> > >--
> > >1.8.3.4
Tian, Kevin Dec. 22, 2016, 7:50 a.m. UTC | #6
> From: Xuquan (Quan Xu) [mailto:xuquan8@huawei.com]
> Sent: Wednesday, December 21, 2016 1:44 PM
> 
> When Xen apicv is enabled, wall clock time is faster on Windows7-32
> guest with high payload (with 2vCPU, captured from xentrace, in
> high payload, the count of IPI interrupt increases rapidly between
> these vCPUs).
> 
> If IPI intrrupt (vector 0xe1) and periodic timer interrupt (vector 0xd1)
> are both pending (index of bit set in vIRR), unfortunately, the IPI
> intrrupt is high priority than periodic timer interrupt. Xen updates
> IPI interrupt bit set in vIRR to guest interrupt status (RVI) as a high
> priority and apicv (Virtual-Interrupt Delivery) delivers IPI interrupt
> within VMX non-root operation without a VM-Exit. Within VMX non-root
> operation, if periodic timer interrupt index of bit is set in vIRR and
> highest, the apicv delivers periodic timer interrupt within VMX non-root
> operation as well.
> 
> But in current code, if Xen doesn't update periodic timer interrupt bit
> set in vIRR to guest interrupt status (RVI) directly, Xen is not aware
> of this case to decrease the count (pending_intr_nr) of pending periodic
> timer interrupt, then Xen will deliver a periodic timer interrupt again.
> 
> And that we update periodic timer interrupt in every VM-entry, there is
> a chance that already-injected instance (before EOI-induced exit happens)
> will incur another pending IRR setting if there is a VM-exit happens
> between virtual interrupt injection (vIRR->0, vISR->1) and EOI-induced
> exit (vISR->0), since pt_intr_post hasn't been invoked yet, then the
> guest receives more periodic timer interrupt.
> 
> So we set eoi_exit_bitmap for intack.vector when it's higher than
> pending periodic time interrupts. This way we can guarantee there's
> always a chance to post periodic time interrupts when periodic time
> interrupts becomes the highest one.
> 
> Signed-off-by: Quan Xu <xuquan8@huawei.com>

Thanks for your consistent work on fixing this tricky issue:

Acked-by: Kevin Tian <kevin.tian@intel.com>
Jan Beulich Dec. 22, 2016, 8:11 a.m. UTC | #7
>>> On 21.12.16 at 06:44, <xuquan8@huawei.com> wrote:
> --- a/xen/arch/x86/hvm/vmx/intr.c
> +++ b/xen/arch/x86/hvm/vmx/intr.c
> @@ -315,9 +315,13 @@ void vmx_intr_assist(void)
>          * Set eoi_exit_bitmap for periodic timer interrup to cause EOI-induced VM
>          * exit, then pending periodic time interrups have the chance to be injected
>          * for compensation
> +        * Set eoi_exit_bitmap for intack.vector when it's higher than pending
> +        * periodic time interrupts. This way we can guarantee there's always a chance
> +        * to post periodic time interrupts when periodic time interrupts becomes the
> +        * highest one
>          */
>          if (pt_vector != -1)
> -            vmx_set_eoi_exit_bitmap(v, pt_vector);
> +            vmx_set_eoi_exit_bitmap(v, intack.vector);

The comment does not clarify why max(pt_vector, intack.vector)
is not needed. And I'd expect you to add
ASSERT(intack.vector >= pt_vector) then, to prove this (and one
might argue that this addition could be sufficient documentation,
albeit perhaps a brief comment next to the assertion would help
readers of this non-trivial piece of code).

Jan
Xuquan (Euler) Dec. 23, 2016, 12:24 p.m. UTC | #8
On December 22, 2016 4:12 PM, Jan Beulich wrote:
>>>> On 21.12.16 at 06:44, <xuquan8@huawei.com> wrote:
>> --- a/xen/arch/x86/hvm/vmx/intr.c
>> +++ b/xen/arch/x86/hvm/vmx/intr.c
>> @@ -315,9 +315,13 @@ void vmx_intr_assist(void)
>>          * Set eoi_exit_bitmap for periodic timer interrup to cause
>EOI-induced VM
>>          * exit, then pending periodic time interrups have the chance
>to be injected
>>          * for compensation
>> +        * Set eoi_exit_bitmap for intack.vector when it's higher than
>pending
>> +        * periodic time interrupts. This way we can guarantee there's
>always a chance
>> +        * to post periodic time interrupts when periodic time
>interrupts becomes the
>> +        * highest one
>>          */
>>          if (pt_vector != -1)
>> -            vmx_set_eoi_exit_bitmap(v, pt_vector);
>> +            vmx_set_eoi_exit_bitmap(v, intack.vector);
>
>The comment does not clarify why max(pt_vector, intack.vector) is not
>needed. And I'd expect you to add ASSERT(intack.vector >= pt_vector) then,
>to prove this (and one might argue that this addition could be sufficient
>documentation, albeit perhaps a brief comment next to the assertion
>would help readers of this non-trivial piece of code).
>
Kevin or Jan..
ASSERT(...) is ok to me.. Could you help me give a brief comment? 

Quan
Jan Beulich Jan. 3, 2017, 7:34 a.m. UTC | #9
>>> On 23.12.16 at 13:24, <xuquan8@huawei.com> wrote:
> On December 22, 2016 4:12 PM, Jan Beulich wrote:
>>>>> On 21.12.16 at 06:44, <xuquan8@huawei.com> wrote:
>>> --- a/xen/arch/x86/hvm/vmx/intr.c
>>> +++ b/xen/arch/x86/hvm/vmx/intr.c
>>> @@ -315,9 +315,13 @@ void vmx_intr_assist(void)
>>>          * Set eoi_exit_bitmap for periodic timer interrup to cause
>>EOI-induced VM
>>>          * exit, then pending periodic time interrups have the chance
>>to be injected
>>>          * for compensation
>>> +        * Set eoi_exit_bitmap for intack.vector when it's higher than
>>pending
>>> +        * periodic time interrupts. This way we can guarantee there's
>>always a chance
>>> +        * to post periodic time interrupts when periodic time
>>interrupts becomes the
>>> +        * highest one
>>>          */
>>>          if (pt_vector != -1)
>>> -            vmx_set_eoi_exit_bitmap(v, pt_vector);
>>> +            vmx_set_eoi_exit_bitmap(v, intack.vector);
>>
>>The comment does not clarify why max(pt_vector, intack.vector) is not
>>needed. And I'd expect you to add ASSERT(intack.vector >= pt_vector) then,
>>to prove this (and one might argue that this addition could be sufficient
>>documentation, albeit perhaps a brief comment next to the assertion
>>would help readers of this non-trivial piece of code).
>>
> Kevin or Jan..
> ASSERT(...) is ok to me.. Could you help me give a brief comment? 

I don't see why you couldn't simply use what Kevin said in reply
to my earlier question regarding the lack of max() here.

Jan
Xuquan (Euler) Jan. 3, 2017, 8:15 a.m. UTC | #10
>-----Original Message-----
>From: Jan Beulich [mailto:JBeulich@suse.com]
>Sent: Tuesday, January 03, 2017 3:35 PM
>To: Xuquan (Quan Xu)
>Cc: Andrew Cooper; George Dunlap; yang.zhang.wz@gmail.com; Chao Gao;
>Jun Nakajima; Kevin Tian; Lan Tianyu; xen-devel@lists.xen.org
>Subject: RE: [PATCH v4] x86/apicv: fix RTC periodic timer and apicv issue
>
>>>> On 23.12.16 at 13:24, <xuquan8@huawei.com> wrote:
>> On December 22, 2016 4:12 PM, Jan Beulich wrote:
>>>>>> On 21.12.16 at 06:44, <xuquan8@huawei.com> wrote:
>>>> --- a/xen/arch/x86/hvm/vmx/intr.c
>>>> +++ b/xen/arch/x86/hvm/vmx/intr.c
>>>> @@ -315,9 +315,13 @@ void vmx_intr_assist(void)
>>>>          * Set eoi_exit_bitmap for periodic timer interrup to cause
>>>EOI-induced VM
>>>>          * exit, then pending periodic time interrups have the
>>>> chance
>>>to be injected
>>>>          * for compensation
>>>> +        * Set eoi_exit_bitmap for intack.vector when it's higher
>>>> + than
>>>pending
>>>> +        * periodic time interrupts. This way we can guarantee
>>>> + there's
>>>always a chance
>>>> +        * to post periodic time interrupts when periodic time
>>>interrupts becomes the
>>>> +        * highest one
>>>>          */
>>>>          if (pt_vector != -1)
>>>> -            vmx_set_eoi_exit_bitmap(v, pt_vector);
>>>> +            vmx_set_eoi_exit_bitmap(v, intack.vector);
>>>
>>>The comment does not clarify why max(pt_vector, intack.vector) is not
>>>needed. And I'd expect you to add ASSERT(intack.vector >= pt_vector)
>>>then, to prove this (and one might argue that this addition could be
>>>sufficient documentation, albeit perhaps a brief comment next to the
>>>assertion would help readers of this non-trivial piece of code).
>>>
>> Kevin or Jan..
>> ASSERT(...) is ok to me.. Could you help me give a brief comment?
>
>I don't see why you couldn't simply use what Kevin said in reply to my
>earlier question regarding the lack of max() here.
>

What about """ Set EOI exit bitmap for any vector which may block injection of pt_vector - give a chance to recognize pt_vector in future intack and then do pt interrupts post"""?

Quan
Jan Beulich Jan. 3, 2017, 8:22 a.m. UTC | #11
>>> On 03.01.17 at 09:15, <xuquan8@huawei.com> wrote:

> 
>>-----Original Message-----
>>From: Jan Beulich [mailto:JBeulich@suse.com]
>>Sent: Tuesday, January 03, 2017 3:35 PM
>>To: Xuquan (Quan Xu)
>>Cc: Andrew Cooper; George Dunlap; yang.zhang.wz@gmail.com; Chao Gao;
>>Jun Nakajima; Kevin Tian; Lan Tianyu; xen-devel@lists.xen.org 
>>Subject: RE: [PATCH v4] x86/apicv: fix RTC periodic timer and apicv issue
>>
>>>>> On 23.12.16 at 13:24, <xuquan8@huawei.com> wrote:
>>> On December 22, 2016 4:12 PM, Jan Beulich wrote:
>>>>>>> On 21.12.16 at 06:44, <xuquan8@huawei.com> wrote:
>>>>> --- a/xen/arch/x86/hvm/vmx/intr.c
>>>>> +++ b/xen/arch/x86/hvm/vmx/intr.c
>>>>> @@ -315,9 +315,13 @@ void vmx_intr_assist(void)
>>>>>          * Set eoi_exit_bitmap for periodic timer interrup to cause
>>>>EOI-induced VM
>>>>>          * exit, then pending periodic time interrups have the
>>>>> chance
>>>>to be injected
>>>>>          * for compensation
>>>>> +        * Set eoi_exit_bitmap for intack.vector when it's higher
>>>>> + than
>>>>pending
>>>>> +        * periodic time interrupts. This way we can guarantee
>>>>> + there's
>>>>always a chance
>>>>> +        * to post periodic time interrupts when periodic time
>>>>interrupts becomes the
>>>>> +        * highest one
>>>>>          */
>>>>>          if (pt_vector != -1)
>>>>> -            vmx_set_eoi_exit_bitmap(v, pt_vector);
>>>>> +            vmx_set_eoi_exit_bitmap(v, intack.vector);
>>>>
>>>>The comment does not clarify why max(pt_vector, intack.vector) is not
>>>>needed. And I'd expect you to add ASSERT(intack.vector >= pt_vector)
>>>>then, to prove this (and one might argue that this addition could be
>>>>sufficient documentation, albeit perhaps a brief comment next to the
>>>>assertion would help readers of this non-trivial piece of code).
>>>>
>>> Kevin or Jan..
>>> ASSERT(...) is ok to me.. Could you help me give a brief comment?
>>
>>I don't see why you couldn't simply use what Kevin said in reply to my
>>earlier question regarding the lack of max() here.
>>
> 
> What about """ Set EOI exit bitmap for any vector which may block injection 
> of pt_vector - give a chance to recognize pt_vector in future intack and then 
> do pt interrupts post"""?

To me that doesn't explain the absence of max(pt_vector, intack.vector)
here.

Jan
Xuquan (Euler) Jan. 4, 2017, 1:58 a.m. UTC | #12
On January 03, 2017 4:23 PM, Jan Beulich wrote:
>>>> On 03.01.17 at 09:15, <xuquan8@huawei.com> wrote:
>
>>
>>>-----Original Message-----
>>>From: Jan Beulich [mailto:JBeulich@suse.com]
>>>Sent: Tuesday, January 03, 2017 3:35 PM
>>>To: Xuquan (Quan Xu)
>>>Cc: Andrew Cooper; George Dunlap; yang.zhang.wz@gmail.com; Chao
>Gao;
>>>Jun Nakajima; Kevin Tian; Lan Tianyu; xen-devel@lists.xen.org
>>>Subject: RE: [PATCH v4] x86/apicv: fix RTC periodic timer and apicv
>>>issue
>>>
>>>>>> On 23.12.16 at 13:24, <xuquan8@huawei.com> wrote:
>>>> On December 22, 2016 4:12 PM, Jan Beulich wrote:
>>>>>>>> On 21.12.16 at 06:44, <xuquan8@huawei.com> wrote:
>>>>>> --- a/xen/arch/x86/hvm/vmx/intr.c
>>>>>> +++ b/xen/arch/x86/hvm/vmx/intr.c
>>>>>> @@ -315,9 +315,13 @@ void vmx_intr_assist(void)
>>>>>>          * Set eoi_exit_bitmap for periodic timer interrup to
>>>>>> cause
>>>>>EOI-induced VM
>>>>>>          * exit, then pending periodic time interrups have the
>>>>>> chance
>>>>>to be injected
>>>>>>          * for compensation
>>>>>> +        * Set eoi_exit_bitmap for intack.vector when it's higher
>>>>>> + than
>>>>>pending
>>>>>> +        * periodic time interrupts. This way we can guarantee
>>>>>> + there's
>>>>>always a chance
>>>>>> +        * to post periodic time interrupts when periodic time
>>>>>interrupts becomes the
>>>>>> +        * highest one
>>>>>>          */
>>>>>>          if (pt_vector != -1)
>>>>>> -            vmx_set_eoi_exit_bitmap(v, pt_vector);
>>>>>> +            vmx_set_eoi_exit_bitmap(v, intack.vector);
>>>>>
>>>>>The comment does not clarify why max(pt_vector, intack.vector) is
>>>>>not needed. And I'd expect you to add ASSERT(intack.vector >=
>>>>>pt_vector) then, to prove this (and one might argue that this
>>>>>addition could be sufficient documentation, albeit perhaps a brief
>>>>>comment next to the assertion would help readers of this non-trivial
>piece of code).
>>>>>
>>>> Kevin or Jan..
>>>> ASSERT(...) is ok to me.. Could you help me give a brief comment?
>>>
>>>I don't see why you couldn't simply use what Kevin said in reply to my
>>>earlier question regarding the lack of max() here.
>>>
>>
>> What about """ Set EOI exit bitmap for any vector which may block
>> injection of pt_vector - give a chance to recognize pt_vector in
>> future intack and then do pt interrupts post"""?
>
>To me that doesn't explain the absence of max(pt_vector, intack.vector)
>here.
>

intack.vector is the highest priority vector..

Jan, thanks for your patience.. then, I'd say:
"""
intack.vector is the highest priority vector. So we set eoi_exit_bitmap
for intack.vector - give a chance to post periodic time interrupts when
periodic time interrupts become the highest one.
"""

If this is still not the comment you want, could you help me enhance it?

Quan
Jan Beulich Jan. 4, 2017, 9:17 a.m. UTC | #13
>>> On 04.01.17 at 02:58, <xuquan8@huawei.com> wrote:
> """
> intack.vector is the highest priority vector. So we set eoi_exit_bitmap
> for intack.vector - give a chance to post periodic time interrupts when
> periodic time interrupts become the highest one.
> """

Sounds fine.

Jan
diff mbox

Patch

diff --git a/xen/arch/x86/hvm/vmx/intr.c b/xen/arch/x86/hvm/vmx/intr.c
index 639a705..0cf26b4 100644
--- a/xen/arch/x86/hvm/vmx/intr.c
+++ b/xen/arch/x86/hvm/vmx/intr.c
@@ -315,9 +315,13 @@  void vmx_intr_assist(void)
         * Set eoi_exit_bitmap for periodic timer interrup to cause EOI-induced VM
         * exit, then pending periodic time interrups have the chance to be injected
         * for compensation
+        * Set eoi_exit_bitmap for intack.vector when it's higher than pending
+        * periodic time interrupts. This way we can guarantee there's always a chance
+        * to post periodic time interrupts when periodic time interrupts becomes the
+        * highest one
         */
         if (pt_vector != -1)
-            vmx_set_eoi_exit_bitmap(v, pt_vector);
+            vmx_set_eoi_exit_bitmap(v, intack.vector);

         /* we need update the RVI field */
         __vmread(GUEST_INTR_STATUS, &status);
@@ -334,7 +338,8 @@  void vmx_intr_assist(void)
             __vmwrite(EOI_EXIT_BITMAP(i), v->arch.hvm_vmx.eoi_exit_bitmap[i]);
         }

-        pt_intr_post(v, intack);
+        if ( intack.vector == pt_vector )
+            pt_intr_post(v, intack);
     }
     else
     {