
[v3] x86/apicv: Enhance posted-interrupt processing

Message ID 1488419389-8016-1-git-send-email-chao.gao@intel.com (mailing list archive)
State New, archived

Commit Message

Chao Gao March 2, 2017, 1:49 a.m. UTC
__vmx_deliver_posted_interrupt() wrongly used a softirq bit to decide whether
to suppress an IPI. Its logic was: the first time an IPI was sent, we set
the softirq bit. Next time, we would check that softirq bit before sending
another IPI. If the 1st IPI arrived at the pCPU which was in
non-root mode, the hardware would consume the IPI and sync PIR to vIRR.
During this process, neither hardware nor software clears the
softirq bit. As a result, the following IPI would be wrongly suppressed.

This patch discards the suppression check and always sends the IPI.
The softirq still needs to be raised, but with a small change:
this patch moves the point where we raise a softirq for the
'cpu != smp_processor_id()' case into the IPI interrupt handler.
Namely, don't raise a softirq for this case, and set the interrupt handler
to pi_notification_interrupt() (in which a softirq is raised) regardless of
whether posted interrupt is enabled or not. The only difference is that when
the IPI arrives at a pCPU that happens to be in non-root mode, the patch will
not raise a useless softirq, since the IPI is consumed by hardware, rather
than raising a softirq unconditionally.

Quan doesn't have enough time to upstream this fix, and asked me to do it.
This patch also merges another related patch of his
(https://lists.xenproject.org/archives/html/xen-devel/2017-02/msg02885.html).

Signed-off-by: Quan Xu <xuquan8@huawei.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
---
 xen/arch/x86/hvm/vmx/vmx.c | 56 ++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 49 insertions(+), 7 deletions(-)

Comments

Chao Gao March 2, 2017, 4:28 a.m. UTC | #1
On Thu, Mar 02, 2017 at 02:41:55AM -0700, Jan Beulich wrote:
>>>> On 02.03.17 at 02:49, <chao.gao@intel.com> wrote:
>
>The patch title, btw, makes it look like this isn't a bug fix, which is
>contrary to the understanding I've gained so far.

Thanks for your patience, for improving so much of the description for me, and
for the questions you asked. I agree with the comments I don't reply to.
How about this:
Fix a wrongly IPI suppression during posted interrupt delivery

>
>> __vmx_deliver_posted_interrupt() wrongly used a softirq bit to decide whether
>> to suppress an IPI. Its logic was: the first time an IPI was sent, we set
>> the softirq bit. Next time, we would check that softirq bit before sending
>> another IPI. If the 1st IPI arrived at the pCPU which was in
>> non-root mode, the hardware would consume the IPI and sync PIR to vIRR.
>> During the process, no one (both hardware and software) will clear the
>> softirq bit. As a result, the following IPI would be wrongly suppressed.
>> 
>> This patch discards the suppression check, always sending IPI.
>> The softirq also need to be raised. But there is a little change.
>> This patch moves the place where we raise a softirq for
>> 'cpu != smp_processor_id()' case to the IPI interrupt handler.
>> Namely, don't raise a softirq for this case and set the interrupt handler
>> to pi_notification_interrupt()(in which a softirq is raised) regardless of
>> posted interrupt enabled or not.
>
>As using a PI thing even in the non-PI case is counterintuitive, this
>needs expanding on imo. Maybe not here but in a code comment.
>Or perhaps, looking at the code, this is just not precise enough a
>description: The code is inside a cpu_has_vmx_posted_intr_processing
>conditional, and what you do is move code out of the iommu_intpost
>specific section. I.e. a reference to IOMMU may simply be missing
>here.

Yes. The right description is "regardless of VT-d PI enabled or not".

>
>> The only difference is when the IPI arrives
>> at the pCPU which is happened in non-root mode, the patch will not raise a
>
>s/patch/code/ (or some such)
>
>> useless softirq since the IPI is consumed by hardware rather than raise a
>> softirq unconditionally.
>> 
>> Quan doesn't have enough time to upstream this fix patch. He asks me to do
>> this. Merge another his related patch
>> (https://lists.xenproject.org/archives/html/xen-devel/2017-02/msg02885.html).
>
>This doesn't belong in the commit message, and hence should be after
>the first ---.
>

Will take care of this later.

>> --- a/xen/arch/x86/hvm/vmx/vmx.c
>> +++ b/xen/arch/x86/hvm/vmx/vmx.c
>> @@ -1842,13 +1842,59 @@ static void __vmx_deliver_posted_interrupt(struct vcpu *v)
>>      bool_t running = v->is_running;
>>  
>>      vcpu_unblock(v);
>> +    /*
>> +     * The underlying 'IF' excludes two cases which we don't need further
>
>Who or what is 'IF'?
>

I meant the 'if sentence'.

>> +     * operation to make sure the target vCPU (@v) will sync PIR to vIRR ASAP.
>> +     * Specifically, the two cases are:
>> +     * 1. The target vCPU is not running, meaning it is blocked or runnable.
>> +     * Since we have unblocked the target vCPU above, the target vCPU will
>> +     * sync PIR to vIRR when it is chosen to run.
>> +     * 2. The target vCPU is the current vCPU and in_irq() is false. It means
>> +     * the function is called in noninterrupt context.
>
>     * 2. The target vCPU is the current vCPU and we're in
>     * non-interrupt context.
>
>> Since we don't call
>> +     * the function in noninterrupt context after the last time a vCPU syncs
>> +     * PIR to vIRR, excluding this case is also safe.
>
>It is not really clear to me what "the function" here refers to. Surely
>the function the comment is in won't call itself, no matter what
>context.

Yes, it is not clear. How about:
The target vCPU is the current vCPU and we're in non-interrupt context.
We can't be in the window between the call to vmx_intr_assist()
and interrupts getting disabled, since no existing code reaches here in
non-interrupt context. Considering that, it is safe to just leave without
further operation.

Do you think this is correct? I am relying on an assumption here to explain why
(in_irq() || (v != current)) is reasonable; it is purely my guess.

>
>> +     */
>>      if ( running && (in_irq() || (v != current)) )
>>      {
>> +        /*
>> +         * Note: Only two cases will reach here:
>> +         * 1. The target vCPU is running on other pCPU.
>> +         * 2. The target vCPU is running on the same pCPU with the current
>> +         * vCPU
>
>This is an impossible thing: There can't be two vCPU-s running on
>the same pCPU-s at the same time. Hence ...
>
>> and the current vCPU is in interrupt context. That's to say,
>> +         * the target vCPU is the current vCPU.
>
>... just this last statement is sufficient here.
>
>> +         * Note2: Don't worry the v->processor may change since at least when
>> +         * the target vCPU is chosen to run or be blocked, there is a chance
>> +         * to sync PIR to vIRR.
>> +         */
>
>"There is a chance" reads as if it may or may not happen. How about
>"The vCPU being moved to another processor is guaranteed to sync
>PIR to vIRR, due to the involved scheduling cycle"? Of course only if
>this matches reality.

I think your description is right.

>
>>          unsigned int cpu = v->processor;
>>  
>> -        if ( !test_and_set_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu))
>> -             && (cpu != smp_processor_id()) )
>> +        /*
>> +         * For case 1, we send a IPI to the pCPU. When the IPI arrives, the
>
>... an IPI ... (also at least once more below)
>
>> +         * target vCPU maybe is running in non-root mode, running in root
>> +         * mode, runnable or blocked. If the target vCPU is running in
>> +         * non-root mode, the hardware will sync PIR to vIRR for the IPI
>> +         * vector is special to the pCPU. If the target vCPU is running in
>> +         * root-mode, the interrupt handler starts to run. In order to make
>> +         * sure the target vCPU could go back to vmx_intr_assist(), the
>> +         * interrupt handler should raise a softirq if no pending softirq.
>
>I don't understand this part: Calling vmx_intr_assist() is part of any
>VM entry, so if we're in root mode, the function will necessarily be
>called. Are you perhaps worried about the window between the call
>to vmx_intr_assist() and interrupts getting disabled (or any other
>similar one - I didn't make an attempt to collect a complete set)? If
>so, that's what needs to be mentioned here.

Yes. I only recognized this window. Will mention this window in next version.

>
>> +         * If the target vCPU is runnable, it will sync PIR to vIRR next time
>> +         * it is chose to run. In this case, a IPI and a softirq is sent to
>> +         * a wrong vCPU which we think it is not a big issue. If the target
>
>Maybe "... which will not have any adverse effect"?
>
>> +         * vCPU is blocked, since vcpu_block() checks whether there is an
>> +         * event to be delivered through local_events_need_delivery() just
>> +         * after blocking, the vCPU must have synced PIR to vIRR. Similarly,
>> +         * there is a IPI and a softirq sent to a wrong vCPU.
>> +         */
>> +        if ( cpu != smp_process_id() )
>
>Did you not even build test this patch? I don't think the construct
>above compiles.

Really sorry. I forgot to commit after I fixed this.
Actually, I found this, fixed it, and tested the patch several times (with and
without VT-d PI) to guard against obvious regressions.

>
>>              send_IPI_mask(cpumask_of(cpu), posted_intr_vector);
>> +        /*
>> +         * For case 2, raising a softirq can cause vmx_intr_assist() where PIR
>> +         * has a chance to be synced to vIRR to be called. As an optimization,
>> +         * We only need to raise a softirq when no pending softirq.
>
>How about "As any softirq will do, as an optimization we only raise
>one if none is pending already"? Again, if this is a valid statement
>only, of course.

Your description is correct IMO, and reads better.

Thanks,
Chao

>
>Jan
Jan Beulich March 2, 2017, 9:41 a.m. UTC | #2
>>> On 02.03.17 at 02:49, <chao.gao@intel.com> wrote:

The patch title, btw, makes it look like this isn't a bug fix, which is
contrary to the understanding I've gained so far.

> __vmx_deliver_posted_interrupt() wrongly used a softirq bit to decide whether
> to suppress an IPI. Its logic was: the first time an IPI was sent, we set
> the softirq bit. Next time, we would check that softirq bit before sending
> another IPI. If the 1st IPI arrived at the pCPU which was in
> non-root mode, the hardware would consume the IPI and sync PIR to vIRR.
> During the process, no one (both hardware and software) will clear the
> softirq bit. As a result, the following IPI would be wrongly suppressed.
> 
> This patch discards the suppression check, always sending IPI.
> The softirq also need to be raised. But there is a little change.
> This patch moves the place where we raise a softirq for
> 'cpu != smp_processor_id()' case to the IPI interrupt handler.
> Namely, don't raise a softirq for this case and set the interrupt handler
> to pi_notification_interrupt()(in which a softirq is raised) regardless of
> posted interrupt enabled or not.

As using a PI thing even in the non-PI case is counterintuitive, this
needs expanding on imo. Maybe not here but in a code comment.
Or perhaps, looking at the code, this is just not precise enough a
description: The code is inside a cpu_has_vmx_posted_intr_processing
conditional, and what you do is move code out of the iommu_intpost
specific section. I.e. a reference to IOMMU may simply be missing
here.

> The only difference is when the IPI arrives
> at the pCPU which is happened in non-root mode, the patch will not raise a

s/patch/code/ (or some such)

> useless softirq since the IPI is consumed by hardware rather than raise a
> softirq unconditionally.
> 
> Quan doesn't have enough time to upstream this fix patch. He asks me to do
> this. Merge another his related patch
> (https://lists.xenproject.org/archives/html/xen-devel/2017-02/msg02885.html).

This doesn't belong in the commit message, and hence should be after
the first ---.

> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -1842,13 +1842,59 @@ static void __vmx_deliver_posted_interrupt(struct vcpu *v)
>      bool_t running = v->is_running;
>  
>      vcpu_unblock(v);
> +    /*
> +     * The underlying 'IF' excludes two cases which we don't need further

Who or what is 'IF'?

> +     * operation to make sure the target vCPU (@v) will sync PIR to vIRR ASAP.
> +     * Specifically, the two cases are:
> +     * 1. The target vCPU is not running, meaning it is blocked or runnable.
> +     * Since we have unblocked the target vCPU above, the target vCPU will
> +     * sync PIR to vIRR when it is chosen to run.
> +     * 2. The target vCPU is the current vCPU and in_irq() is false. It means
> +     * the function is called in noninterrupt context.

     * 2. The target vCPU is the current vCPU and we're in
     * non-interrupt context.

> Since we don't call
> +     * the function in noninterrupt context after the last time a vCPU syncs
> +     * PIR to vIRR, excluding this case is also safe.

It is not really clear to me what "the function" here refers to. Surely
the function the comment is in won't call itself, no matter what
context.

> +     */
>      if ( running && (in_irq() || (v != current)) )
>      {
> +        /*
> +         * Note: Only two cases will reach here:
> +         * 1. The target vCPU is running on other pCPU.
> +         * 2. The target vCPU is running on the same pCPU with the current
> +         * vCPU

This is an impossible thing: There can't be two vCPU-s running on
the same pCPU-s at the same time. Hence ...

> and the current vCPU is in interrupt context. That's to say,
> +         * the target vCPU is the current vCPU.

... just this last statement is sufficient here.

> +         * Note2: Don't worry the v->processor may change since at least when
> +         * the target vCPU is chosen to run or be blocked, there is a chance
> +         * to sync PIR to vIRR.
> +         */

"There is a chance" reads as if it may or may not happen. How about
"The vCPU being moved to another processor is guaranteed to sync
PIR to vIRR, due to the involved scheduling cycle"? Of course only if
this matches reality.

>          unsigned int cpu = v->processor;
>  
> -        if ( !test_and_set_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu))
> -             && (cpu != smp_processor_id()) )
> +        /*
> +         * For case 1, we send a IPI to the pCPU. When the IPI arrives, the

... an IPI ... (also at least once more below)

> +         * target vCPU maybe is running in non-root mode, running in root
> +         * mode, runnable or blocked. If the target vCPU is running in
> +         * non-root mode, the hardware will sync PIR to vIRR for the IPI
> +         * vector is special to the pCPU. If the target vCPU is running in
> +         * root-mode, the interrupt handler starts to run. In order to make
> +         * sure the target vCPU could go back to vmx_intr_assist(), the
> +         * interrupt handler should raise a softirq if no pending softirq.

I don't understand this part: Calling vmx_intr_assist() is part of any
VM entry, so if we're in root mode, the function will necessarily be
called. Are you perhaps worried about the window between the call
to vmx_intr_assist() and interrupts getting disabled (or any other
similar one - I didn't make an attempt to collect a complete set)? If
so, that's what needs to be mentioned here.

> +         * If the target vCPU is runnable, it will sync PIR to vIRR next time
> +         * it is chose to run. In this case, a IPI and a softirq is sent to
> +         * a wrong vCPU which we think it is not a big issue. If the target

Maybe "... which will not have any adverse effect"?

> +         * vCPU is blocked, since vcpu_block() checks whether there is an
> +         * event to be delivered through local_events_need_delivery() just
> +         * after blocking, the vCPU must have synced PIR to vIRR. Similarly,
> +         * there is a IPI and a softirq sent to a wrong vCPU.
> +         */
> +        if ( cpu != smp_process_id() )

Did you not even build test this patch? I don't think the construct
above compiles.

>              send_IPI_mask(cpumask_of(cpu), posted_intr_vector);
> +        /*
> +         * For case 2, raising a softirq can cause vmx_intr_assist() where PIR
> +         * has a chance to be synced to vIRR to be called. As an optimization,
> +         * We only need to raise a softirq when no pending softirq.

How about "As any softirq will do, as an optimization we only raise
one if none is pending already"? Again, if this is a valid statement
only, of course.

Jan
Jan Beulich March 2, 2017, 12:03 p.m. UTC | #3
>>> On 02.03.17 at 05:28, <chao.gao@intel.com> wrote:
> On Thu, Mar 02, 2017 at 02:41:55AM -0700, Jan Beulich wrote:
>>>>> On 02.03.17 at 02:49, <chao.gao@intel.com> wrote:
>>
>>The patch title, btw, makes it look like this isn't a bug fix, which is
>>contrary to the understanding I've gained so far.
> 
> Thanks to your patience and your changing so much description for me. Also 
> to the questions you asked. I agree to the comments i don't reply to.
> How about this:
> Fix a wrongly IPI suppression during posted interrupt delivery

fix wrong IPI suppression during posted interrupt delivery

>>> --- a/xen/arch/x86/hvm/vmx/vmx.c
>>> +++ b/xen/arch/x86/hvm/vmx/vmx.c
>>> @@ -1842,13 +1842,59 @@ static void __vmx_deliver_posted_interrupt(struct vcpu *v)
>>>      bool_t running = v->is_running;
>>>  
>>>      vcpu_unblock(v);
>>> +    /*
>>> +     * The underlying 'IF' excludes two cases which we don't need further
>>
>>Who or what is 'IF'?
>>
> 
> I meaned the 'if sentence'.

I'm sorry, but this doesn't make it any more clear. DYM some if()
statement? But then why "underlying"? Was that meant to be
"following"?

>>> +     * operation to make sure the target vCPU (@v) will sync PIR to vIRR ASAP.
>>> +     * Specifically, the two cases are:
>>> +     * 1. The target vCPU is not running, meaning it is blocked or runnable.
>>> +     * Since we have unblocked the target vCPU above, the target vCPU will
>>> +     * sync PIR to vIRR when it is chosen to run.
>>> +     * 2. The target vCPU is the current vCPU and in_irq() is false. It means
>>> +     * the function is called in noninterrupt context.
>>
>>     * 2. The target vCPU is the current vCPU and we're in
>>     * non-interrupt context.
>>
>>> Since we don't call
>>> +     * the function in noninterrupt context after the last time a vCPU syncs
>>> +     * PIR to vIRR, excluding this case is also safe.
>>
>>It is not really clear to me what "the function" here refers to. Surely
>>the function the comment is in won't call itself, no matter what
>>context.
> 
> Yes, it is not clear. How about:
> The target vCPU is the current vCPU and we're in non-interrupt context.
> we can't be in the window between the call to vmx_intr_assist()

"we can't be" looks misleading. Perhaps "We're not being called from
the window ..."?

> and interrupts getting disabled since no existed code reachs here in
> non-interrupt context. Considering that, it is safe to just leave without
> further operation. 
> 
> Do you think this is correct? I have an assumption here to explain why
> (in_irq() || (v != current)) is reasonable. It is totally my guess.

As suggested earlier, feel free to simply refer to vcpu_kick() doing
the same.

Jan
Tian, Kevin March 3, 2017, 8:01 a.m. UTC | #4
> From: Gao, Chao
> Sent: Thursday, March 02, 2017 12:28 PM
> 
> On Thu, Mar 02, 2017 at 02:41:55AM -0700, Jan Beulich wrote:
> >>>> On 02.03.17 at 02:49, <chao.gao@intel.com> wrote:
> >> +        if ( cpu != smp_process_id() )
> >
> >Did you not even build test this patch? I don't think the construct
> >above compiles.
> 
> Really sorry. I forgot to commit after I fixed this.
> Acturally, I found this, fixed it and tested this patch several times( with or
> without VT-d PI) to avoid some obvious regression.
> 

I'm OK with the code change once the above line is fixed. Since I'll
be on vacation next week, I can give my ack here instead of being a
bottleneck, given that the next version is likely only comment changes
(I hope Jan will help ack that part):

Acked-by: Kevin Tian <kevin.tian@intel.com>

Of course, if the next version has any functional change, please
ignore this ack. :-)

Thanks
Kevin

Patch

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 5b1717d..9db4bd0 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -1842,13 +1842,59 @@  static void __vmx_deliver_posted_interrupt(struct vcpu *v)
     bool_t running = v->is_running;
 
     vcpu_unblock(v);
+    /*
+     * The underlying 'IF' excludes two cases which we don't need further
+     * operation to make sure the target vCPU (@v) will sync PIR to vIRR ASAP.
+     * Specifically, the two cases are:
+     * 1. The target vCPU is not running, meaning it is blocked or runnable.
+     * Since we have unblocked the target vCPU above, the target vCPU will
+     * sync PIR to vIRR when it is chosen to run.
+     * 2. The target vCPU is the current vCPU and in_irq() is false. It means
+     * the function is called in noninterrupt context. Since we don't call
+     * the function in noninterrupt context after the last time a vCPU syncs
+     * PIR to vIRR, excluding this case is also safe.
+     */
     if ( running && (in_irq() || (v != current)) )
     {
+        /*
+         * Note: Only two cases will reach here:
+         * 1. The target vCPU is running on other pCPU.
+         * 2. The target vCPU is running on the same pCPU with the current
+         * vCPU and the current vCPU is in interrupt context. That's to say,
+         * the target vCPU is the current vCPU.
+         *
+         * Note2: Don't worry the v->processor may change since at least when
+         * the target vCPU is chosen to run or be blocked, there is a chance
+         * to sync PIR to vIRR.
+         */
         unsigned int cpu = v->processor;
 
-        if ( !test_and_set_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu))
-             && (cpu != smp_processor_id()) )
+        /*
+         * For case 1, we send a IPI to the pCPU. When the IPI arrives, the
+         * target vCPU maybe is running in non-root mode, running in root
+         * mode, runnable or blocked. If the target vCPU is running in
+         * non-root mode, the hardware will sync PIR to vIRR for the IPI
+         * vector is special to the pCPU. If the target vCPU is running in
+         * root-mode, the interrupt handler starts to run. In order to make
+         * sure the target vCPU could go back to vmx_intr_assist(), the
+         * interrupt handler should raise a softirq if no pending softirq.
+         * If the target vCPU is runnable, it will sync PIR to vIRR next time
+         * it is chose to run. In this case, a IPI and a softirq is sent to
+         * a wrong vCPU which we think it is not a big issue. If the target
+         * vCPU is blocked, since vcpu_block() checks whether there is an
+         * event to be delivered through local_events_need_delivery() just
+         * after blocking, the vCPU must have synced PIR to vIRR. Similarly,
+         * there is a IPI and a softirq sent to a wrong vCPU.
+         */
+        if ( cpu != smp_process_id() )
             send_IPI_mask(cpumask_of(cpu), posted_intr_vector);
+        /*
+         * For case 2, raising a softirq can cause vmx_intr_assist() where PIR
+         * has a chance to be synced to vIRR to be called. As an optimization,
+         * We only need to raise a softirq when no pending softirq.
+         */
+        else if ( !softirq_pending(cpu) )
+            raise_softirq(VCPU_KICK_SOFTIRQ);
     }
 }
 
@@ -2281,13 +2327,9 @@  const struct hvm_function_table * __init start_vmx(void)
 
     if ( cpu_has_vmx_posted_intr_processing )
     {
+        alloc_direct_apic_vector(&posted_intr_vector, pi_notification_interrupt);
         if ( iommu_intpost )
-        {
-            alloc_direct_apic_vector(&posted_intr_vector, pi_notification_interrupt);
             alloc_direct_apic_vector(&pi_wakeup_vector, pi_wakeup_interrupt);
-        }
-        else
-            alloc_direct_apic_vector(&posted_intr_vector, event_check_interrupt);
     }
     else
     {