[for-4.13] x86/vmx: always sync PIR to IRR before vmentry

Message ID 20191118101600.94645-1-roger.pau@citrix.com
State Superseded

Commit Message

Roger Pau Monne Nov. 18, 2019, 10:16 a.m. UTC
When using posted interrupts on Intel hardware it's possible that the
vCPU resumes execution with a stale local APIC IRR register because
depending on the interrupts to be injected vlapic_has_pending_irq
might not be called, and thus PIR won't be synced into IRR.

Fix this by making sure PIR is always synced to IRR in vmx_intr_assist
regardless of what interrupts are pending.

While there, also simplify the code in __vmx_deliver_posted_interrupt:
there's no need to raise a softirq if the destination vCPU is the one
currently running on the pCPU. The sync of PIR into IRR will be
performed by vmx_intr_assist regardless of whether there's a softirq
pending.

Reported-by: Joe Jin <joe.jin@oracle.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Tested-by: Joe Jin <joe.jin@oracle.com>
---
 xen/arch/x86/hvm/vmx/intr.c       |  8 +++++
 xen/arch/x86/hvm/vmx/vmx.c        | 60 ++++++++++---------------------
 xen/include/asm-x86/hvm/vmx/vmx.h |  1 +
 3 files changed, 28 insertions(+), 41 deletions(-)

Comments

Jan Beulich Nov. 18, 2019, 12:01 p.m. UTC | #1
On 18.11.2019 11:16, Roger Pau Monne wrote:
> When using posted interrupts on Intel hardware it's possible that the
> vCPU resumes execution with a stale local APIC IRR register because
> depending on the interrupts to be injected vlapic_has_pending_irq
> might not be called, and thus PIR won't be synced into IRR.
> 
> Fix this by making sure PIR is always synced to IRR in vmx_intr_assist
> regardless of what interrupts are pending.

For this part, did you consider pulling ahead to the beginning
of hvm_vcpu_has_pending_irq() its call to vlapic_has_pending_irq()?
I ask because ...

> --- a/xen/arch/x86/hvm/vmx/intr.c
> +++ b/xen/arch/x86/hvm/vmx/intr.c
> @@ -232,6 +232,14 @@ void vmx_intr_assist(void)
>      enum hvm_intblk intblk;
>      int pt_vector;
>  
> +    if ( cpu_has_vmx_posted_intr_processing )
> +        /*
> +         * Always force PIR to be synced to IRR before vmentry, this is also
> +         * done by vlapic_has_pending_irq but it's possible other pending
> +         * interrupts prevent the execution of that function.
> +         */
> +        vmx_sync_pir_to_irr(v);

... this addition looks more like papering over some issue than
actually taking care of it.

Then again I wonder whether the PIR->IRR sync is actually
legitimate to perform when v != current. If it's not, then there
might be a wider set of problems (see e.g.
hvm_local_events_need_delivery()). But of course the adjustment
to hvm_vcpu_has_pending_irq() could also be to make the call
early only when v == current.

A last question is on the consequences of overly aggressive
sync-ing - that'll harm performance, but shouldn't affect
correctness if I'm not mistaken.

Jan
Jan Beulich Nov. 18, 2019, 12:26 p.m. UTC | #2
On 18.11.2019 11:16, Roger Pau Monne wrote:
> @@ -1954,48 +1952,28 @@ static void __vmx_deliver_posted_interrupt(struct vcpu *v)
>       * 2. The target vCPU is the current vCPU and we're in non-interrupt
>       * context.
>       */
> -    if ( running && (in_irq() || (v != current)) )
> -    {
> +    if ( vcpu_runnable(v) && v != current )

I'm afraid you need to be more careful with the running vs runnable
distinction here. The comment above here becomes stale with the
change (also wrt the removal of in_irq(), which I'm at least uneasy
about), and the new commentary below also largely says/assumes
"running", not "runnable".

In general I think "runnable" is the more appropriate state to
check for, as "running" is even more likely to change behind our
backs. But of course there are caveats: When observing "running",
v->processor is sufficiently certain to hold the pCPU the vCPU is
running on or has been running on last. For "runnable" that's
less helpful, because by the time you look at v->processor it may
already have changed to whichever the vCPU is (about to be)
running on.

>          /*
> -         * Note: Only two cases will reach here:
> -         * 1. The target vCPU is running on other pCPU.
> -         * 2. The target vCPU is the current vCPU.
> +         * If the vCPU is running on another pCPU send an IPI to the pCPU. When
> +         * the IPI arrives, the target vCPU may be running in non-root mode,
> +         * running in root mode, runnable or blocked. If the target vCPU is
> +         * running in non-root mode, the hardware will sync PIR to vIRR for
> +         * 'posted_intr_vector' is a special vector handled directly by the
> +         * hardware.
>           *
> -         * Note2: Don't worry the v->processor may change. The vCPU being
> -         * moved to another processor is guaranteed to sync PIR to vIRR,
> -         * due to the involved scheduling cycle.
> +         * If the target vCPU is running in root-mode, the interrupt handler
> +         * starts to run.  Considering an IPI may arrive in the window between
> +         * the call to vmx_intr_assist() and interrupts getting disabled, the
> +         * interrupt handler should raise a softirq to ensure events will be
> +         * delivered in time.
>           */
> -        unsigned int cpu = v->processor;
> +        send_IPI_mask(cpumask_of(v->processor), posted_intr_vector);
>  
> -        /*
> -         * For case 1, we send an IPI to the pCPU. When an IPI arrives, the
> -         * target vCPU maybe is running in non-root mode, running in root
> -         * mode, runnable or blocked. If the target vCPU is running in
> -         * non-root mode, the hardware will sync PIR to vIRR for
> -         * 'posted_intr_vector' is special to the pCPU. If the target vCPU is
> -         * running in root-mode, the interrupt handler starts to run.
> -         * Considering an IPI may arrive in the window between the call to
> -         * vmx_intr_assist() and interrupts getting disabled, the interrupt
> -         * handler should raise a softirq to ensure events will be delivered
> -         * in time. If the target vCPU is runnable, it will sync PIR to
> -         * vIRR next time it is chose to run. In this case, a IPI and a
> -         * softirq is sent to a wrong vCPU which will not have any adverse
> -         * effect. If the target vCPU is blocked, since vcpu_block() checks
> -         * whether there is an event to be delivered through
> -         * local_events_need_delivery() just after blocking, the vCPU must
> -         * have synced PIR to vIRR. Similarly, there is a IPI and a softirq
> -         * sent to a wrong vCPU.
> -         */
> -        if ( cpu != smp_processor_id() )
> -            send_IPI_mask(cpumask_of(cpu), posted_intr_vector);
> -        /*
> -         * For case 2, raising a softirq ensures PIR will be synced to vIRR.
> -         * As any softirq will do, as an optimization we only raise one if
> -         * none is pending already.
> -         */
> -        else if ( !softirq_pending(cpu) )
> -            raise_softirq(VCPU_KICK_SOFTIRQ);
> -    }
> +    /*
> +     * If the vCPU is not runnable or if it's the one currently running in this
> +     * pCPU there's nothing to do, the vmentry code will already sync the PIR
> +     * to IRR when resuming execution.
> +     */
>  }

Just for my own understanding - the "already" here relates to the code
addition you make to vmx_intr_assist()?

And then - is this true even for an interrupt hitting between
vmx_intr_assist() returning and the subsequent CLI in
vmx_asm_vmexit_handler()?

Jan
Roger Pau Monne Nov. 18, 2019, 1:46 p.m. UTC | #3
On Mon, Nov 18, 2019 at 01:01:58PM +0100, Jan Beulich wrote:
> On 18.11.2019 11:16, Roger Pau Monne wrote:
> > When using posted interrupts on Intel hardware it's possible that the
> > vCPU resumes execution with a stale local APIC IRR register because
> > depending on the interrupts to be injected vlapic_has_pending_irq
> > might not be called, and thus PIR won't be synced into IRR.
> > 
> > Fix this by making sure PIR is always synced to IRR in vmx_intr_assist
> > regardless of what interrupts are pending.
> 
> For this part, did you consider pulling ahead to the beginning
> of hvm_vcpu_has_pending_irq() its call to vlapic_has_pending_irq()?

I assumed the order in hvm_vcpu_has_pending_irq is there for a reason.
I could indeed move vlapic_has_pending_irq to the top, but then either
the result is discarded if for example an NMI is pending injection
(in which case there's no need to go through all the logic in
vlapic_has_pending_irq), or we invert the priority of event
injection.

I have to admit I have doubts about the code in
hvm_vcpu_has_pending_irq. I'm not sure what's the motivation for
example to give priority to HVMIRQ_callback_vector over other vectors
from the lapic.

> I ask because ...
> 
> > --- a/xen/arch/x86/hvm/vmx/intr.c
> > +++ b/xen/arch/x86/hvm/vmx/intr.c
> > @@ -232,6 +232,14 @@ void vmx_intr_assist(void)
> >      enum hvm_intblk intblk;
> >      int pt_vector;
> >  
> > +    if ( cpu_has_vmx_posted_intr_processing )
> > +        /*
> > +         * Always force PIR to be synced to IRR before vmentry, this is also
> > +         * done by vlapic_has_pending_irq but it's possible other pending
> > +         * interrupts prevent the execution of that function.
> > +         */
> > +        vmx_sync_pir_to_irr(v);
> 
> ... this addition looks more like papering over some issue than
> actually taking care of it.

Xen needs to make sure PIR is synced to IRR before entering
non-root mode. I could place the call somewhere else, or alternatively
Xen could disable interrupts, send a self-ipi with the posted vector
and enter non-root mode. That should IMO force a resync of PIR to IRR
when resuming vCPU execution, but is overly complicated.

> Then again I wonder whether the PIR->IRR sync is actually
> legitimate to perform when v != current.

IMO this is fine as long as the vCPU is not running, as in that case
the hardware is not in control of IRR.

> If it's not, then there
> might be a wider set of problems (see e.g.
> hvm_local_events_need_delivery()). But of course the adjustment
> to hvm_vcpu_has_pending_irq() could also be to make the call
> early only when v == current.

I don't think we should be that restrictive, v == current ||
!vcpu_runnable(v) ought to be safe. I've also forgotten to send my
pre-patch to introduce an assert to that effect:

https://lists.xenproject.org/archives/html/xen-devel/2019-11/msg00635.html

> A last question is that on the consequences of overly aggressive
> sync-ing - that'll harm performance, but shouldn't affect
> correctness if I'm not mistaken.

That's correct, as long as the vcpu is the current one or it's not
running.

Roger.
Jan Beulich Nov. 18, 2019, 2 p.m. UTC | #4
On 18.11.2019 14:46, Roger Pau Monné  wrote:
> On Mon, Nov 18, 2019 at 01:01:58PM +0100, Jan Beulich wrote:
>> On 18.11.2019 11:16, Roger Pau Monne wrote:
>>> When using posted interrupts on Intel hardware it's possible that the
>>> vCPU resumes execution with a stale local APIC IRR register because
>>> depending on the interrupts to be injected vlapic_has_pending_irq
>>> might not be called, and thus PIR won't be synced into IRR.
>>>
>>> Fix this by making sure PIR is always synced to IRR in vmx_intr_assist
>>> regardless of what interrupts are pending.
>>
>> For this part, did you consider pulling ahead to the beginning
>> of hvm_vcpu_has_pending_irq() its call to vlapic_has_pending_irq()?
> 
> I assumed the order in hvm_vcpu_has_pending_irq is there for a reason.
> I could indeed move vlapic_has_pending_irq to the top, but then either
> the result is discarded if for example a NMI is pending injection
> (in which case there's no need to go through all the logic in
> vlapic_has_pending_irq), or we invert the priority of event
> injection.

Changing the order of events injected is not an option afaict. The
pointless processing done is a valid concern, yet the suggestion
was specifically to have (part of) this processing to occur early.
The discarding of the result, in turn, is not a problem afaict, as
a subsequent call will return the same result (unless a higher
priority interrupt has surfaced in the meantime).

> I have to admit I have doubts about the code in
> hvm_vcpu_has_pending_irq. I'm not sure what's the motivation for
> example to give priority to HVMIRQ_callback_vector over other vectors
> from the lapic.

I vaguely recall there being a reason, but I guess it would take
some git archaeology to find out.

>> I ask because ...
>>
>>> --- a/xen/arch/x86/hvm/vmx/intr.c
>>> +++ b/xen/arch/x86/hvm/vmx/intr.c
>>> @@ -232,6 +232,14 @@ void vmx_intr_assist(void)
>>>      enum hvm_intblk intblk;
>>>      int pt_vector;
>>>  
>>> +    if ( cpu_has_vmx_posted_intr_processing )
>>> +        /*
>>> +         * Always force PIR to be synced to IRR before vmentry, this is also
>>> +         * done by vlapic_has_pending_irq but it's possible other pending
>>> +         * interrupts prevent the execution of that function.
>>> +         */
>>> +        vmx_sync_pir_to_irr(v);
>>
>> ... this addition looks more like papering over some issue than
>> actually taking care of it.
> 
> Xen needs to make sure PIR is synced to IRR before entering
> non-root mode. I could place the call somewhere else, or alternatively
> Xen could disable interrupts, send a self-ipi with the posted vector
> and enter non-root mode. That should IMO force a resync of PIR to IRR
> when resuming vCPU execution, but is overly complicated.

Indeed, further complicating things can't be the goal. But
finding the most suitable place to make the call might still be
worthwhile.

>> Then again I wonder whether the PIR->IRR sync is actually
>> legitimate to perform when v != current.
> 
> IMO this is fine as long as the vCPU is not running, as in that case
> the hardware is not in control of IRR.

Here and ...

>> If it's not, then there
>> might be a wider set of problems (see e.g.
>> hvm_local_events_need_delivery()). But of course the adjustment
>> to hvm_vcpu_has_pending_irq() could also be to make the call
>> early only when v == current.
> 
> I don't think we should be that restrictive, v == current ||
> !vcpu_runable(v) ought to be safe. I've also forgot to send my
> pre-patch to introduce an assert to that effect:
> 
> https://lists.xenproject.org/archives/html/xen-devel/2019-11/msg00635.html
> 
>> A last question is that on the consequences of overly aggressive
>> sync-ing - that'll harm performance, but shouldn't affect
>> correctness if I'm not mistaken.
> 
> That's correct, as long as the vcpu is the current one or it's not
> running.

... here I continue to be worried of races: Any check for a vCPU to
be non-running (or non-runnable) is stale the moment you inspect the
result of the check. Unless, of course, you suppress scheduling
(actions potentially making a vCPU runnable) during that time window.

Jan
Roger Pau Monne Nov. 18, 2019, 2:03 p.m. UTC | #5
On Mon, Nov 18, 2019 at 01:26:46PM +0100, Jan Beulich wrote:
> On 18.11.2019 11:16, Roger Pau Monne wrote:
> > @@ -1954,48 +1952,28 @@ static void __vmx_deliver_posted_interrupt(struct vcpu *v)
> >       * 2. The target vCPU is the current vCPU and we're in non-interrupt
> >       * context.
> >       */
> > -    if ( running && (in_irq() || (v != current)) )
> > -    {
> > +    if ( vcpu_runnable(v) && v != current )
> 
> I'm afraid you need to be more careful with the running vs runnable
> distinction here. The comment above here becomes stale with the
> change (also wrt the removal of in_irq(), which I'm at least uneasy
> about), and the new commentary below also largely says/assumes
> "running", not "runnable".

I missed fixing that comment, will take care of it in the next version.
Note also that the comment is quite pointless, it only states what the
code below is supposed to do, but doesn't give any reasoning as to why
in_irq is relevant here.

TBH I'm not sure of the point of the in_irq check, I don't think it's
relevant for the code here. It only matters whether the vCPU is
running, and whether it's running in this pCPU. Whether
__vmx_deliver_posted_interrupt is called from an irq context is
irrelevant AFAICT.

> 
> In general I think "runnable" is the more appropriate state to
> check for, as "running" is even more likely to change behind our
> backs. But of course there are caveats: When observing "running",
> v->processor is sufficiently certain to hold the pCPU the vCPU is
> running on or has been running on last. For "runnable" that's
> less helpful, because by the time you look at v->processor it may
> already have changed to whichever the vCPU is (about to be)
> running on.

I think this is fine. In fact it would also be fine from a safety PoV
to always send a posted interrupt IPI to the pCPU in v->processor: if
the vCPU is running PIR will be synced into IRR, otherwise a dummy
softirq will be recorded and PIR will be synced into IRR before
entering non-root mode.

v->is_running might be better in order to prevent sending pointless
IPIs around; will fix in v2, since we only want to send the IPI
when the vCPU is executing on a pCPU.

> >          /*
> > -         * Note: Only two cases will reach here:
> > -         * 1. The target vCPU is running on other pCPU.
> > -         * 2. The target vCPU is the current vCPU.
> > +         * If the vCPU is running on another pCPU send an IPI to the pCPU. When
> > +         * the IPI arrives, the target vCPU may be running in non-root mode,
> > +         * running in root mode, runnable or blocked. If the target vCPU is
> > +         * running in non-root mode, the hardware will sync PIR to vIRR for
> > +         * 'posted_intr_vector' is a special vector handled directly by the
> > +         * hardware.
> >           *
> > -         * Note2: Don't worry the v->processor may change. The vCPU being
> > -         * moved to another processor is guaranteed to sync PIR to vIRR,
> > -         * due to the involved scheduling cycle.
> > +         * If the target vCPU is running in root-mode, the interrupt handler
> > +         * starts to run.  Considering an IPI may arrive in the window between
> > +         * the call to vmx_intr_assist() and interrupts getting disabled, the
> > +         * interrupt handler should raise a softirq to ensure events will be
> > +         * delivered in time.
> >           */
> > -        unsigned int cpu = v->processor;
> > +        send_IPI_mask(cpumask_of(v->processor), posted_intr_vector);
> >  
> > -        /*
> > -         * For case 1, we send an IPI to the pCPU. When an IPI arrives, the
> > -         * target vCPU maybe is running in non-root mode, running in root
> > -         * mode, runnable or blocked. If the target vCPU is running in
> > -         * non-root mode, the hardware will sync PIR to vIRR for
> > -         * 'posted_intr_vector' is special to the pCPU. If the target vCPU is
> > -         * running in root-mode, the interrupt handler starts to run.
> > -         * Considering an IPI may arrive in the window between the call to
> > -         * vmx_intr_assist() and interrupts getting disabled, the interrupt
> > -         * handler should raise a softirq to ensure events will be delivered
> > -         * in time. If the target vCPU is runnable, it will sync PIR to
> > -         * vIRR next time it is chose to run. In this case, a IPI and a
> > -         * softirq is sent to a wrong vCPU which will not have any adverse
> > -         * effect. If the target vCPU is blocked, since vcpu_block() checks
> > -         * whether there is an event to be delivered through
> > -         * local_events_need_delivery() just after blocking, the vCPU must
> > -         * have synced PIR to vIRR. Similarly, there is a IPI and a softirq
> > -         * sent to a wrong vCPU.
> > -         */
> > -        if ( cpu != smp_processor_id() )
> > -            send_IPI_mask(cpumask_of(cpu), posted_intr_vector);
> > -        /*
> > -         * For case 2, raising a softirq ensures PIR will be synced to vIRR.
> > -         * As any softirq will do, as an optimization we only raise one if
> > -         * none is pending already.
> > -         */
> > -        else if ( !softirq_pending(cpu) )
> > -            raise_softirq(VCPU_KICK_SOFTIRQ);
> > -    }
> > +    /*
> > +     * If the vCPU is not runnable or if it's the one currently running in this
> > +     * pCPU there's nothing to do, the vmentry code will already sync the PIR
> > +     * to IRR when resuming execution.
> > +     */
> >  }
> 
> Just for my own understanding - the "already" here relates to the code
> addition you make to vmx_intr_assist()?

Yes.

> And then - is this true even for an interrupt hitting between
> vmx_intr_assist() returning and the subsequent CLI in
> vmx_asm_vmexit_handler()?

Should be, as that IPI will raise a softirq that would force a jump
into vmx_process_softirqs, which in turn will jump to the start of
vmx_do_vmentry, thus calling vmx_intr_assist again.

Thanks, Roger.
Jan Beulich Nov. 18, 2019, 2:19 p.m. UTC | #6
On 18.11.2019 15:03, Roger Pau Monné  wrote:
> On Mon, Nov 18, 2019 at 01:26:46PM +0100, Jan Beulich wrote:
>> On 18.11.2019 11:16, Roger Pau Monne wrote:
>>> @@ -1954,48 +1952,28 @@ static void __vmx_deliver_posted_interrupt(struct vcpu *v)
>>>       * 2. The target vCPU is the current vCPU and we're in non-interrupt
>>>       * context.
>>>       */
>>> -    if ( running && (in_irq() || (v != current)) )
>>> -    {
>>> +    if ( vcpu_runnable(v) && v != current )
>>
>> I'm afraid you need to be more careful with the running vs runnable
>> distinction here. The comment above here becomes stale with the
>> change (also wrt the removal of in_irq(), which I'm at least uneasy
>> about), and the new commentary below also largely says/assumes
>> "running", not "runnable".
> 
> I've missed to fix that comment, will take care in the next version.
> Note also that the comment is quite pointless, it only states what the
> code below is supposed to do, but doesn't give any reasoning as to why
> in_irq is relevant here.

Its main "value" is to refer to vcpu_kick(), which has ...

> TBH I'm not sure of the point of the in_irq check, I don't think it's
> relevant for the code here.

... a similar in_irq() check. Sadly that one, while having a bigger
comment, also doesn't explain what it's needed for. It looks like I
should recall the reason, but I'm sorry - I don't right now.

Jan
Roger Pau Monne Nov. 18, 2019, 2:20 p.m. UTC | #7
On Mon, Nov 18, 2019 at 03:00:00PM +0100, Jan Beulich wrote:
> On 18.11.2019 14:46, Roger Pau Monné  wrote:
> > On Mon, Nov 18, 2019 at 01:01:58PM +0100, Jan Beulich wrote:
> >> On 18.11.2019 11:16, Roger Pau Monne wrote:
> >>> When using posted interrupts on Intel hardware it's possible that the
> >>> vCPU resumes execution with a stale local APIC IRR register because
> >>> depending on the interrupts to be injected vlapic_has_pending_irq
> >>> might not be called, and thus PIR won't be synced into IRR.
> >>>
> >>> Fix this by making sure PIR is always synced to IRR in vmx_intr_assist
> >>> regardless of what interrupts are pending.
> >>
> >> For this part, did you consider pulling ahead to the beginning
> >> of hvm_vcpu_has_pending_irq() its call to vlapic_has_pending_irq()?
> > 
> > I assumed the order in hvm_vcpu_has_pending_irq is there for a reason.
> > I could indeed move vlapic_has_pending_irq to the top, but then either
> > the result is discarded if for example a NMI is pending injection
> > (in which case there's no need to go through all the logic in
> > vlapic_has_pending_irq), or we invert the priority of event
> > injection.
> 
> Changing the order of events injected is not an option afaict. The
> pointless processing done is a valid concern, yet the suggestion
> was specifically to have (part of) this processing to occur early.
> The discarding of the result, in turn, is not a problem afaict, as
> a subsequent call will return the same result (unless a higher
> priority interrupt has surfaced in the meantime).

Yes, that's fine. So you would prefer to move the call to
vlapic_has_pending_irq before any exit path in
hvm_vcpu_has_pending_irq?

> >> Then again I wonder whether the PIR->IRR sync is actually
> >> legitimate to perform when v != current.
> > 
> > IMO this is fine as long as the vCPU is not running, as in that case
> > the hardware is not in control of IRR.
> 
> Here and ...
> 
> >> If it's not, then there
> >> might be a wider set of problems (see e.g.
> >> hvm_local_events_need_delivery()). But of course the adjustment
> >> to hvm_vcpu_has_pending_irq() could also be to make the call
> >> early only when v == current.
> > 
> > I don't think we should be that restrictive, v == current ||
> > !vcpu_runable(v) ought to be safe. I've also forgot to send my
> > pre-patch to introduce an assert to that effect:
> > 
> > https://lists.xenproject.org/archives/html/xen-devel/2019-11/msg00635.html
> > 
> >> A last question is that on the consequences of overly aggressive
> >> sync-ing - that'll harm performance, but shouldn't affect
> >> correctness if I'm not mistaken.
> > 
> > That's correct, as long as the vcpu is the current one or it's not
> > running.
> 
> ... here I continue to be worried of races: Any check for a vCPU to
> be non-running (or non-runnable) is stale the moment you inspect the
> result of the check. Unless, of course, you suppress scheduling
> (actions potentially making a vCPU runnable) during that time window.

Hm, it's indeed true that syncing PIR into IRR for a vCPU not running
in the current pCPU is troublesome. Maybe syncing PIR into IRR should
only be done when v == current?

The only alternative I can think of is something like:

if ( v != current )
    vcpu_pause(v);
sync_pir_irr(v);
if ( v != current )
    vcpu_unpause(v);

Is there a need to check the IRR of vCPUs that are not running, and
does it matter if the IRR is not fully updated in that case?

Thanks, Roger.
Roger Pau Monne Nov. 18, 2019, 2:55 p.m. UTC | #8
On Mon, Nov 18, 2019 at 03:19:50PM +0100, Jan Beulich wrote:
> On 18.11.2019 15:03, Roger Pau Monné  wrote:
> > On Mon, Nov 18, 2019 at 01:26:46PM +0100, Jan Beulich wrote:
> >> On 18.11.2019 11:16, Roger Pau Monne wrote:
> >>> @@ -1954,48 +1952,28 @@ static void __vmx_deliver_posted_interrupt(struct vcpu *v)
> >>>       * 2. The target vCPU is the current vCPU and we're in non-interrupt
> >>>       * context.
> >>>       */
> >>> -    if ( running && (in_irq() || (v != current)) )
> >>> -    {
> >>> +    if ( vcpu_runnable(v) && v != current )
> >>
> >> I'm afraid you need to be more careful with the running vs runnable
> >> distinction here. The comment above here becomes stale with the
> >> change (also wrt the removal of in_irq(), which I'm at least uneasy
> >> about), and the new commentary below also largely says/assumes
> >> "running", not "runnable".
> > 
> > I've missed to fix that comment, will take care in the next version.
> > Note also that the comment is quite pointless, it only states what the
> > code below is supposed to do, but doesn't give any reasoning as to why
> > in_irq is relevant here.
> 
> It's main "value" is to refer to vcpu_kick(), which has ...
> 
> > TBH I'm not sure of the point of the in_irq check, I don't think it's
> > relevant for the code here.
> 
> ... a similar in_irq() check. Sadly that one, while having a bigger
> comment, also doesn't explain what it's needed for. It looks like I
> should recall the reason, but I'm sorry - I don't right now.

By reading the message of the commit that introduced the in_irq check
in vcpu_kick:

"The drawback is that {vmx,svm}_intr_assist() now races new event
notifications delivered by IRQ or IPI. We close down this race by
having vcpu_kick() send a dummy softirq -- this gets picked up in
IRQ-sage context and will cause retry of *_intr_assist(). We avoid
delivering the softirq where possible by avoiding it when we are
running in the non-IRQ context of the VCPU to be kicked."

AFAICT in the vcpu_kick case this is done because the softirq should
only be raised when in IRQ context in order to trigger the code in
vmx_do_vmentry to retry the call to vmx_intr_assist (this is relevant
if vcpu_kick is issued from an irq handler executed after
vmx_intr_assist and before interrupts are disabled in
vmx_do_vmentry).

I think we need something along the lines of:

if ( v->is_running && v != current )
    send_IPI_mask(cpumask_of(v->processor), posted_intr_vector);
else if ( v == current && in_irq() && !softirq_pending(smp_processor_id()) )
    raise_softirq(VCPU_KICK_SOFTIRQ);

So that vmx_intr_assist is also retried if a vector is signaled in PIR
on the vCPU currently running between the call to vmx_intr_assist and
the disabling of interrupts in vmx_do_vmentry.

Thanks, Roger.
Jan Beulich Nov. 18, 2019, 4 p.m. UTC | #9
On 18.11.2019 15:20, Roger Pau Monné  wrote:
> On Mon, Nov 18, 2019 at 03:00:00PM +0100, Jan Beulich wrote:
>> On 18.11.2019 14:46, Roger Pau Monné  wrote:
>>> On Mon, Nov 18, 2019 at 01:01:58PM +0100, Jan Beulich wrote:
>>>> On 18.11.2019 11:16, Roger Pau Monne wrote:
>>>>> When using posted interrupts on Intel hardware it's possible that the
>>>>> vCPU resumes execution with a stale local APIC IRR register because
>>>>> depending on the interrupts to be injected vlapic_has_pending_irq
>>>>> might not be called, and thus PIR won't be synced into IRR.
>>>>>
>>>>> Fix this by making sure PIR is always synced to IRR in vmx_intr_assist
>>>>> regardless of what interrupts are pending.
>>>>
>>>> For this part, did you consider pulling ahead to the beginning
>>>> of hvm_vcpu_has_pending_irq() its call to vlapic_has_pending_irq()?
>>>
>>> I assumed the order in hvm_vcpu_has_pending_irq is there for a reason.
>>> I could indeed move vlapic_has_pending_irq to the top, but then either
>>> the result is discarded if for example a NMI is pending injection
>>> (in which case there's no need to go through all the logic in
>>> vlapic_has_pending_irq), or we invert the priority of event
>>> injection.
>>
>> Changing the order of events injected is not an option afaict. The
>> pointless processing done is a valid concern, yet the suggestion
>> was specifically to have (part of) this processing to occur early.
>> The discarding of the result, in turn, is not a problem afaict, as
>> a subsequent call will return the same result (unless a higher
>> priority interrupt has surfaced in the meantime).
> 
> Yes, that's fine. So you would prefer to move the call to
> vlapic_has_pending_irq before any exit path in
> hvm_vcpu_has_pending_irq?

"Prefer" isn't really the way I would put it. I'd like this to be
considered as an alternative because, as said, I think the current
placement looks more like a plaster than a cure. I'm also open for
other suggestions. But first of all I'd like to see what the VMX
maintainers think.

>>>> Then again I wonder whether the PIR->IRR sync is actually
>>>> legitimate to perform when v != current.
>>>
>>> IMO this is fine as long as the vCPU is not running, as in that case
>>> the hardware is not in control of IRR.
>>
>> Here and ...
>>
>>>> If it's not, then there
>>>> might be a wider set of problems (see e.g.
>>>> hvm_local_events_need_delivery()). But of course the adjustment
>>>> to hvm_vcpu_has_pending_irq() could also be to make the call
>>>> early only when v == current.
>>>
>>> I don't think we should be that restrictive, v == current ||
>>> !vcpu_runable(v) ought to be safe. I've also forgot to send my
>>> pre-patch to introduce an assert to that effect:
>>>
>>> https://lists.xenproject.org/archives/html/xen-devel/2019-11/msg00635.html
>>>
>>>> A last question is that on the consequences of overly aggressive
>>>> sync-ing - that'll harm performance, but shouldn't affect
>>>> correctness if I'm not mistaken.
>>>
>>> That's correct, as long as the vcpu is the current one or it's not
>>> running.
>>
>> ... here I continue to be worried of races: Any check for a vCPU to
>> be non-running (or non-runnable) is stale the moment you inspect the
>> result of the check. Unless, of course, you suppress scheduling
>> (actions potentially making a vCPU runnable) during that time window.
> 
> Hm, it's indeed true that syncing PIR into IRR for a vCPU not running
> in the current pCPU is troublesome. Maybe syncing PIR into IRR should
> only be done when v == current?
> 
> The only alternative I can think of is something like:
> 
> if ( v != current )
>     vcpu_pause(v);
> sync_pir_irr(v);
> if ( v != current )
>     vcpu_unpause(v);
> 
> Is there a need to check the IRR of vCPUs that are not running, and
> does it matter if the IRR is not fully updated in that case?

That's one of the problems with function parameters declared as
"struct vcpu *v" - you don't know whether there's any expectation
that v == current at all times. And tracking down all uses of
certain functions can be rather tedious. IOW without quite a bit
of code auditing I'm afraid I can't tell whether there's possibly
some corner case where this might be needed.

Jan
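
The vcpu_pause()-based alternative quoted above can be sketched as a toy model (all names and types below are invented for illustration and are not the actual Xen code): writing another vCPU's IRR is only safe while that vCPU cannot run, so the sync is bracketed by a pause/unpause pair unless the target is the caller itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-ins for struct vcpu and vcpu_pause()/vcpu_unpause(). */
struct toy_vcpu {
    bool paused;
    uint64_t pir;   /* posted-interrupt requests (one word, simplified) */
    uint64_t irr;   /* vlapic IRR copy (one word, simplified) */
};

static struct toy_vcpu *toy_current;

static void toy_pause(struct toy_vcpu *v)   { v->paused = true;  }
static void toy_unpause(struct toy_vcpu *v) { v->paused = false; }

static void toy_sync_pir_irr(struct toy_vcpu *v)
{
    bool remote = (v != toy_current);

    if ( remote )
        toy_pause(v);        /* vcpu_pause(v) in the real proposal */

    /* Safe here: v is either the current vCPU or guaranteed not running. */
    v->irr |= v->pir;
    v->pir = 0;

    if ( remote )
        toy_unpause(v);      /* vcpu_unpause(v) */
}
```

This only models the safety argument being discussed, not the scheduler interaction that vcpu_pause() actually provides.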
Jan Beulich Nov. 18, 2019, 4:11 p.m. UTC | #10
On 18.11.2019 15:55, Roger Pau Monné  wrote:
> On Mon, Nov 18, 2019 at 03:19:50PM +0100, Jan Beulich wrote:
>> On 18.11.2019 15:03, Roger Pau Monné  wrote:
>>> On Mon, Nov 18, 2019 at 01:26:46PM +0100, Jan Beulich wrote:
>>>> On 18.11.2019 11:16, Roger Pau Monne wrote:
>>>>> @@ -1954,48 +1952,28 @@ static void __vmx_deliver_posted_interrupt(struct vcpu *v)
>>>>>       * 2. The target vCPU is the current vCPU and we're in non-interrupt
>>>>>       * context.
>>>>>       */
>>>>> -    if ( running && (in_irq() || (v != current)) )
>>>>> -    {
>>>>> +    if ( vcpu_runnable(v) && v != current )
>>>>
>>>> I'm afraid you need to be more careful with the running vs runnable
>>>> distinction here. The comment above here becomes stale with the
>>>> change (also wrt the removal of in_irq(), which I'm at least uneasy
>>>> about), and the new commentary below also largely says/assumes
>>>> "running", not "runnable".
>>>
>>> I've missed to fix that comment, will take care in the next version.
>>> Note also that the comment is quite pointless, it only states what the
>>> code below is supposed to do, but doesn't give any reasoning as to why
>>> in_irq is relevant here.
>>
Its main "value" is to refer to vcpu_kick(), which has ...
>>
>>> TBH I'm not sure of the point of the in_irq check, I don't think it's
>>> relevant for the code here.
>>
>> ... a similar in_irq() check. Sadly that one, while having a bigger
>> comment, also doesn't explain what it's needed for. It looks like I
>> should recall the reason, but I'm sorry - I don't right now.
> 
> By reading the message of the commit that introduced the in_irq check
> in vcpu_kick:
> 
> "The drawback is that {vmx,svm}_intr_assist() now races new event
> notifications delivered by IRQ or IPI. We close down this race by
> having vcpu_kick() send a dummy softirq -- this gets picked up in
> IRQ-safe context and will cause retry of *_intr_assist(). We avoid
> delivering the softirq where possible by avoiding it when we are
> running in the non-IRQ context of the VCPU to be kicked."
> 
> AFAICT in the vcpu_kick case this is done because the softirq should
> only be raised when in IRQ context in order to trigger the code in
> vmx_do_vmentry to retry the call to vmx_intr_assist (this is relevant
> if vcpu_kick is issued from an irq handler executed after
> vmx_intr_assist and before interrupts are disabled in
> vmx_do_vmentry).
> 
> I think we need something along the lines of:
> 
> if ( v->is_running && v != current )
>     send_IPI_mask(cpumask_of(v->processor), posted_intr_vector);
> else if ( v == current && in_irq() && !softirq_pending(smp_processor_id()) )
>     raise_softirq(VCPU_KICK_SOFTIRQ);
> 
> So that vmx_intr_assist is also retried if a vector is signaled in PIR
> on the vCPU currently running between the call to vmx_intr_assist and
> the disabling of interrupts in vmx_do_vmentry.

Looks plausible.

Jan
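
The delivery logic proposed above can be sketched as a pure decision function (the enum and function names below are invented for illustration; this is not the actual Xen implementation):

```c
#include <stdbool.h>

/* Possible outcomes of the __vmx_deliver_posted_interrupt() decision
 * as proposed in the thread.  Toy model only. */
enum pi_action { PI_NOTHING, PI_SEND_IPI, PI_RAISE_SOFTIRQ };

static enum pi_action
pi_delivery_action(bool target_running, bool target_is_current,
                   bool in_irq_context, bool softirq_already_pending)
{
    /*
     * Target vCPU is running on another pCPU: send the posted-interrupt
     * vector.  In non-root mode the hardware syncs PIR to vIRR itself;
     * in root mode the interrupt handler takes over.
     */
    if ( target_running && !target_is_current )
        return PI_SEND_IPI;

    /*
     * Target is the current vCPU and we are in IRQ context: the IRQ may
     * have arrived between vmx_intr_assist() and interrupts being
     * disabled for vmentry, so raise a (dummy) softirq to force a retry
     * of vmx_intr_assist().
     */
    if ( target_is_current && in_irq_context && !softirq_already_pending )
        return PI_RAISE_SOFTIRQ;

    /* Otherwise the PIR->IRR sync on the vmentry path is sufficient. */
    return PI_NOTHING;
}
```

The third case (neither IPI nor softirq) corresponds to the "nothing to do" comment in the patch below: the vmentry code syncs PIR to IRR anyway.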
Roger Pau Monne Nov. 21, 2019, 9:26 a.m. UTC | #11
On Mon, Nov 18, 2019 at 05:00:29PM +0100, Jan Beulich wrote:
> On 18.11.2019 15:20, Roger Pau Monné  wrote:
> > On Mon, Nov 18, 2019 at 03:00:00PM +0100, Jan Beulich wrote:
> >> On 18.11.2019 14:46, Roger Pau Monné  wrote:
> >>> On Mon, Nov 18, 2019 at 01:01:58PM +0100, Jan Beulich wrote:
> >>>> On 18.11.2019 11:16, Roger Pau Monne wrote:
> >>>>> When using posted interrupts on Intel hardware it's possible that the
> >>>>> vCPU resumes execution with a stale local APIC IRR register because
> >>>>> depending on the interrupts to be injected vlapic_has_pending_irq
> >>>>> might not be called, and thus PIR won't be synced into IRR.
> >>>>>
> >>>>> Fix this by making sure PIR is always synced to IRR in vmx_intr_assist
> >>>>> regardless of what interrupts are pending.
> >>>>
> >>>> For this part, did you consider pulling ahead to the beginning
> >>>> of hvm_vcpu_has_pending_irq() its call to vlapic_has_pending_irq()?
> >>>
> >>> I assumed the order in hvm_vcpu_has_pending_irq is there for a reason.
> >>> I could indeed move vlapic_has_pending_irq to the top, but then either
> >>> the result is discarded if for example a NMI is pending injection
> >>> (in which case there's no need to go through all the logic in
> >>> vlapic_has_pending_irq), or we invert the priority of event
> >>> injection.
> >>
> >> Changing the order of events injected is not an option afaict. The
> >> pointless processing done is a valid concern, yet the suggestion
> >> was specifically to have (part of) this processing to occur early.
> >> The discarding of the result, in turn, is not a problem afaict, as
> >> a subsequent call will return the same result (unless a higher
> >> priority interrupt has surfaced in the meantime).
> > 
> > Yes, that's fine. So you would prefer to move the call to
> > vlapic_has_pending_irq before any exit path in
> > hvm_vcpu_has_pending_irq?
> 
> "Prefer" isn't really the way I would put it. I'd like this to be
> considered as an alternative because, as said, I think the current
> placement look more like a plaster than a cure. I'm also open for
> other suggestions. But first of all I'd like to see what the VMX
> maintainers think.

Kevin/Jun, can we please get your opinion on the above item?

Thanks, Roger.
Tian, Kevin Nov. 25, 2019, 7:38 a.m. UTC | #12
> From: Roger Pau Monné [mailto:roger.pau@citrix.com]
> Sent: Thursday, November 21, 2019 5:26 PM
> 
> On Mon, Nov 18, 2019 at 05:00:29PM +0100, Jan Beulich wrote:
> > On 18.11.2019 15:20, Roger Pau Monné  wrote:
> > > On Mon, Nov 18, 2019 at 03:00:00PM +0100, Jan Beulich wrote:
> > >> On 18.11.2019 14:46, Roger Pau Monné  wrote:
> > >>> On Mon, Nov 18, 2019 at 01:01:58PM +0100, Jan Beulich wrote:
> > >>>> On 18.11.2019 11:16, Roger Pau Monne wrote:
> > >>>>> When using posted interrupts on Intel hardware it's possible that
> the
> > >>>>> vCPU resumes execution with a stale local APIC IRR register
> because
> > >>>>> depending on the interrupts to be injected vlapic_has_pending_irq
> > >>>>> might not be called, and thus PIR won't be synced into IRR.
> > >>>>>
> > >>>>> Fix this by making sure PIR is always synced to IRR in
> vmx_intr_assist
> > >>>>> regardless of what interrupts are pending.
> > >>>>
> > >>>> For this part, did you consider pulling ahead to the beginning
> > >>>> of hvm_vcpu_has_pending_irq() its call to vlapic_has_pending_irq()?
> > >>>
> > >>> I assumed the order in hvm_vcpu_has_pending_irq is there for a
> reason.
> > >>> I could indeed move vlapic_has_pending_irq to the top, but then
> either
> > >>> the result is discarded if for example a NMI is pending injection
> > >>> (in which case there's no need to go through all the logic in
> > >>> vlapic_has_pending_irq), or we invert the priority of event
> > >>> injection.
> > >>
> > >> Changing the order of events injected is not an option afaict. The
> > >> pointless processing done is a valid concern, yet the suggestion
> > >> was specifically to have (part of) this processing to occur early.
> > >> The discarding of the result, in turn, is not a problem afaict, as
> > >> a subsequent call will return the same result (unless a higher
> > >> priority interrupt has surfaced in the meantime).
> > >
> > > Yes, that's fine. So you would prefer to move the call to
> > > vlapic_has_pending_irq before any exit path in
> > > hvm_vcpu_has_pending_irq?
> >
> > "Prefer" isn't really the way I would put it. I'd like this to be
> > considered as an alternative because, as said, I think the current
> > placement look more like a plaster than a cure. I'm also open for
> > other suggestions. But first of all I'd like to see what the VMX
> > maintainers think.
> 
> Kevin/Jun, can we please get your opinion on the above item?

Give me some time to catch up...

Thanks
Kevin
Tian, Kevin Nov. 27, 2019, 2:05 a.m. UTC | #13
> From: Roger Pau Monné <roger.pau@citrix.com>
> Sent: Thursday, November 21, 2019 5:26 PM
> 
> On Mon, Nov 18, 2019 at 05:00:29PM +0100, Jan Beulich wrote:
> > On 18.11.2019 15:20, Roger Pau Monné  wrote:
> > > On Mon, Nov 18, 2019 at 03:00:00PM +0100, Jan Beulich wrote:
> > >> On 18.11.2019 14:46, Roger Pau Monné  wrote:
> > >>> On Mon, Nov 18, 2019 at 01:01:58PM +0100, Jan Beulich wrote:
> > >>>> On 18.11.2019 11:16, Roger Pau Monne wrote:
> > >>>>> When using posted interrupts on Intel hardware it's possible that the
> > >>>>> vCPU resumes execution with a stale local APIC IRR register because
> > >>>>> depending on the interrupts to be injected vlapic_has_pending_irq
> > >>>>> might not be called, and thus PIR won't be synced into IRR.
> > >>>>>
> > >>>>> Fix this by making sure PIR is always synced to IRR in vmx_intr_assist
> > >>>>> regardless of what interrupts are pending.
> > >>>>
> > >>>> For this part, did you consider pulling ahead to the beginning
> > >>>> of hvm_vcpu_has_pending_irq() its call to vlapic_has_pending_irq()?
> > >>>
> > >>> I assumed the order in hvm_vcpu_has_pending_irq is there for a
> reason.
> > >>> I could indeed move vlapic_has_pending_irq to the top, but then either
> > >>> the result is discarded if for example a NMI is pending injection
> > >>> (in which case there's no need to go through all the logic in
> > >>> vlapic_has_pending_irq), or we invert the priority of event
> > >>> injection.
> > >>
> > >> Changing the order of events injected is not an option afaict. The
> > >> pointless processing done is a valid concern, yet the suggestion
> > >> was specifically to have (part of) this processing to occur early.
> > >> The discarding of the result, in turn, is not a problem afaict, as
> > >> a subsequent call will return the same result (unless a higher
> > >> priority interrupt has surfaced in the meantime).
> > >
> > > Yes, that's fine. So you would prefer to move the call to
> > > vlapic_has_pending_irq before any exit path in
> > > hvm_vcpu_has_pending_irq?
> >
> > "Prefer" isn't really the way I would put it. I'd like this to be
> > considered as an alternative because, as said, I think the current
> > placement look more like a plaster than a cure. I'm also open for
> > other suggestions. But first of all I'd like to see what the VMX
> > maintainers think.
> 
> Kevin/Jun, can we please get your opinion on the above item?
> 

putting the sync within hvm_vcpu_has_pending_irq sounds better,
implying that all intermediate states must be synced back to 
architectural states any time software wants to check for a virtual
interrupt.

Thanks
Kevin
Tian, Kevin Nov. 27, 2019, 2:07 a.m. UTC | #14
> From: Roger Pau Monné <roger.pau@citrix.com>
> Sent: Monday, November 18, 2019 10:56 PM
> 
> On Mon, Nov 18, 2019 at 03:19:50PM +0100, Jan Beulich wrote:
> > On 18.11.2019 15:03, Roger Pau Monné  wrote:
> > > On Mon, Nov 18, 2019 at 01:26:46PM +0100, Jan Beulich wrote:
> > >> On 18.11.2019 11:16, Roger Pau Monne wrote:
> > >>> @@ -1954,48 +1952,28 @@ static void
> __vmx_deliver_posted_interrupt(struct vcpu *v)
> > >>>       * 2. The target vCPU is the current vCPU and we're in non-interrupt
> > >>>       * context.
> > >>>       */
> > >>> -    if ( running && (in_irq() || (v != current)) )
> > >>> -    {
> > >>> +    if ( vcpu_runnable(v) && v != current )
> > >>
> > >> I'm afraid you need to be more careful with the running vs runnable
> > >> distinction here. The comment above here becomes stale with the
> > >> change (also wrt the removal of in_irq(), which I'm at least uneasy
> > >> about), and the new commentary below also largely says/assumes
> > >> "running", not "runnable".
> > >
> > > I've missed to fix that comment, will take care in the next version.
> > > Note also that the comment is quite pointless, it only states what the
> > > code below is supposed to do, but doesn't give any reasoning as to why
> > > in_irq is relevant here.
> >
> > It's main "value" is to refer to vcpu_kick(), which has ...
> >
> > > TBH I'm not sure of the point of the in_irq check, I don't think it's
> > > relevant for the code here.
> >
> > ... a similar in_irq() check. Sadly that one, while having a bigger
> > comment, also doesn't explain what it's needed for. It looks like I
> > should recall the reason, but I'm sorry - I don't right now.
> 
> By reading the message of the commit that introduced the in_irq check
> in vcpu_kick:
> 
> "The drawback is that {vmx,svm}_intr_assist() now races new event
> notifications delivered by IRQ or IPI. We close down this race by
> having vcpu_kick() send a dummy softirq -- this gets picked up in
> IRQ-safe context and will cause retry of *_intr_assist(). We avoid
> delivering the softirq where possible by avoiding it when we are
> running in the non-IRQ context of the VCPU to be kicked."
> 
> AFAICT in the vcpu_kick case this is done because the softirq should
> only be raised when in IRQ context in order to trigger the code in
> vmx_do_vmentry to retry the call to vmx_intr_assist (this is relevant
> if vcpu_kick is issued from an irq handler executed after
> vmx_intr_assist and before interrupts are disabled in
> vmx_do_vmentry).
> 
> I think we need something along the lines of:
> 
> if ( v->is_running && v != current )
>     send_IPI_mask(cpumask_of(v->processor), posted_intr_vector);
> else if ( v == current && in_irq() && !softirq_pending(smp_processor_id()) )
>     raise_softirq(VCPU_KICK_SOFTIRQ);

Then what's the difference from the original logic?

> 
> So that vmx_intr_assist is also retried if a vector is signaled in PIR
> on the vCPU currently running between the call to vmx_intr_assist and
> the disabling of interrupts in vmx_do_vmentry.
>
Roger Pau Monne Nov. 27, 2019, 11:03 a.m. UTC | #15
On Wed, Nov 27, 2019 at 02:07:16AM +0000, Tian, Kevin wrote:
> > From: Roger Pau Monné <roger.pau@citrix.com>
> > Sent: Monday, November 18, 2019 10:56 PM
> > 
> > On Mon, Nov 18, 2019 at 03:19:50PM +0100, Jan Beulich wrote:
> > > On 18.11.2019 15:03, Roger Pau Monné  wrote:
> > > > On Mon, Nov 18, 2019 at 01:26:46PM +0100, Jan Beulich wrote:
> > > >> On 18.11.2019 11:16, Roger Pau Monne wrote:
> > > >>> @@ -1954,48 +1952,28 @@ static void
> > __vmx_deliver_posted_interrupt(struct vcpu *v)
> > > >>>       * 2. The target vCPU is the current vCPU and we're in non-interrupt
> > > >>>       * context.
> > > >>>       */
> > > >>> -    if ( running && (in_irq() || (v != current)) )
> > > >>> -    {
> > > >>> +    if ( vcpu_runnable(v) && v != current )
> > > >>
> > > >> I'm afraid you need to be more careful with the running vs runnable
> > > >> distinction here. The comment above here becomes stale with the
> > > >> change (also wrt the removal of in_irq(), which I'm at least uneasy
> > > >> about), and the new commentary below also largely says/assumes
> > > >> "running", not "runnable".
> > > >
> > > > I've missed to fix that comment, will take care in the next version.
> > > > Note also that the comment is quite pointless, it only states what the
> > > > code below is supposed to do, but doesn't give any reasoning as to why
> > > > in_irq is relevant here.
> > >
> > > It's main "value" is to refer to vcpu_kick(), which has ...
> > >
> > > > TBH I'm not sure of the point of the in_irq check, I don't think it's
> > > > relevant for the code here.
> > >
> > > ... a similar in_irq() check. Sadly that one, while having a bigger
> > > comment, also doesn't explain what it's needed for. It looks like I
> > > should recall the reason, but I'm sorry - I don't right now.
> > 
> > By reading the message of the commit that introduced the in_irq check
> > in vcpu_kick:
> > 
> > "The drawback is that {vmx,svm}_intr_assist() now races new event
> > notifications delivered by IRQ or IPI. We close down this race by
> > having vcpu_kick() send a dummy softirq -- this gets picked up in
> > IRQ-safe context and will cause retry of *_intr_assist(). We avoid
> > delivering the softirq where possible by avoiding it when we are
> > running in the non-IRQ context of the VCPU to be kicked."
> > 
> > AFAICT in the vcpu_kick case this is done because the softirq should
> > only be raised when in IRQ context in order to trigger the code in
> > vmx_do_vmentry to retry the call to vmx_intr_assist (this is relevant
> > if vcpu_kick is issued from an irq handler executed after
> > vmx_intr_assist and before interrupts are disabled in
> > vmx_do_vmentry).
> > 
> > I think we need something along the lines of:
> > 
> > if ( v->is_running && v != current )
> >     send_IPI_mask(cpumask_of(v->processor), posted_intr_vector);
> > else if ( v == current && in_irq() && !softirq_pending(smp_processor_id()) )
> >     raise_softirq(VCPU_KICK_SOFTIRQ);
> 
> Then what's the difference from original logic?

The original logic is:

if ( running && (in_irq() || (v != current)) )
{
        unsigned int cpu = v->processor;

        if ( cpu != smp_processor_id() )
            send_IPI_mask(cpumask_of(cpu), posted_intr_vector);
        else if ( !softirq_pending(cpu) )
            raise_softirq(VCPU_KICK_SOFTIRQ);
}

Which I find much harder to understand. For example I'm not sure
what the benefit is of doing the cpu != smp_processor_id() check
instead of simply doing v != current (like in the outer if condition).
My suggestion removes one level of nesting and IMO makes the condition
clearer, but maybe that's just my taste.

Also the original comments don't mention at all why a softirq should
be raised if v == current && in_irq, and it took me some time to
figure out why that's required. My proposed change clarifies why this
is needed, and also makes it more obvious that the softirq will only
be raised when in irq context.

Anyway, I'm not going to insist and will drop the change if it's not
deemed useful.

Thanks, Roger.
Jan Beulich Nov. 27, 2019, 11:16 a.m. UTC | #16
On 27.11.2019 12:03, Roger Pau Monné  wrote:
> On Wed, Nov 27, 2019 at 02:07:16AM +0000, Tian, Kevin wrote:
>> Then what's the difference from original logic?
> 
> The original logic is:
> 
> if ( running && (in_irq() || (v != current)) )
> {
>         unsigned int cpu = v->processor;
> 
>         if ( cpu != smp_processor_id() )
>             send_IPI_mask(cpumask_of(cpu), posted_intr_vector);
>         else if ( !softirq_pending(cpu) )
>             raise_softirq(VCPU_KICK_SOFTIRQ);
> }
> 
> Which I find much harder to understand. For example I'm not sure of
> what's the benefit of doing the cpu != smp_processor_id() check
> instead of simply doing v != current (like in the outer if condition).

There are two aspects to consider: One is that v->processor
may equal smp_processor_id() also for v != current. The other
is that without this check in the if() it would need adding
to the else-if(). I'm not sure to what degree which of the
two matters functionality-wise.

Jan
Roger Pau Monne Nov. 27, 2019, 11:29 a.m. UTC | #17
On Wed, Nov 27, 2019 at 12:16:37PM +0100, Jan Beulich wrote:
> On 27.11.2019 12:03, Roger Pau Monné  wrote:
> > On Wed, Nov 27, 2019 at 02:07:16AM +0000, Tian, Kevin wrote:
> >> Then what's the difference from original logic?
> > 
> > The original logic is:
> > 
> > if ( running && (in_irq() || (v != current)) )
> > {
> >         unsigned int cpu = v->processor;
> > 
> >         if ( cpu != smp_processor_id() )
> >             send_IPI_mask(cpumask_of(cpu), posted_intr_vector);
> >         else if ( !softirq_pending(cpu) )
> >             raise_softirq(VCPU_KICK_SOFTIRQ);
> > }
> > 
> > Which I find much harder to understand. For example I'm not sure of
> > what's the benefit of doing the cpu != smp_processor_id() check
> > instead of simply doing v != current (like in the outer if condition).
> 
> There are two aspects to consider: One is that v->processor
> may equal smp_processor_id() also for v != current. The other
> is that without this check in the if() it would need adding
> to the else-if(). I'm not sure to what degree which of the
> two matters functionality wise.

Since the vCPU is running, v->processor can only equal
smp_processor_id() if v == current, and hence I think both checks
achieve exactly the same end result; it's just that IMO doing the
outer one with v != current and the inner one with
cpu != smp_processor_id() is confusing.

Maybe I'm missing something else that actually requires doing the
inner check with v->processor and smp_processor_id().

Roger.
Jan Beulich Nov. 27, 2019, 11:34 a.m. UTC | #18
On 27.11.2019 12:29, Roger Pau Monné  wrote:
> On Wed, Nov 27, 2019 at 12:16:37PM +0100, Jan Beulich wrote:
>> On 27.11.2019 12:03, Roger Pau Monné  wrote:
>>> On Wed, Nov 27, 2019 at 02:07:16AM +0000, Tian, Kevin wrote:
>>>> Then what's the difference from original logic?
>>>
>>> The original logic is:
>>>
>>> if ( running && (in_irq() || (v != current)) )
>>> {
>>>         unsigned int cpu = v->processor;
>>>
>>>         if ( cpu != smp_processor_id() )
>>>             send_IPI_mask(cpumask_of(cpu), posted_intr_vector);
>>>         else if ( !softirq_pending(cpu) )
>>>             raise_softirq(VCPU_KICK_SOFTIRQ);
>>> }
>>>
>>> Which I find much harder to understand. For example I'm not sure of
>>> what's the benefit of doing the cpu != smp_processor_id() check
>>> instead of simply doing v != current (like in the outer if condition).
>>
>> There are two aspects to consider: One is that v->processor
>> may equal smp_processor_id() also for v != current. The other
>> is that without this check in the if() it would need adding
>> to the else-if(). I'm not sure to what degree which of the
>> two matters functionality wise.
> 
> Since the vCPU is running v->processor can only equal smp_processor_id
> if v == current,

What tells you that it is running? It had been running at the
time the flag was latched (before vcpu_unblock()), but may
have got de-scheduled in the meantime.

Jan
Roger Pau Monne Nov. 27, 2019, 11:53 a.m. UTC | #19
On Wed, Nov 27, 2019 at 12:34:09PM +0100, Jan Beulich wrote:
> On 27.11.2019 12:29, Roger Pau Monné  wrote:
> > On Wed, Nov 27, 2019 at 12:16:37PM +0100, Jan Beulich wrote:
> >> On 27.11.2019 12:03, Roger Pau Monné  wrote:
> >>> On Wed, Nov 27, 2019 at 02:07:16AM +0000, Tian, Kevin wrote:
> >>>> Then what's the difference from original logic?
> >>>
> >>> The original logic is:
> >>>
> >>> if ( running && (in_irq() || (v != current)) )
> >>> {
> >>>         unsigned int cpu = v->processor;
> >>>
> >>>         if ( cpu != smp_processor_id() )
> >>>             send_IPI_mask(cpumask_of(cpu), posted_intr_vector);
> >>>         else if ( !softirq_pending(cpu) )
> >>>             raise_softirq(VCPU_KICK_SOFTIRQ);
> >>> }
> >>>
> >>> Which I find much harder to understand. For example I'm not sure of
> >>> what's the benefit of doing the cpu != smp_processor_id() check
> >>> instead of simply doing v != current (like in the outer if condition).
> >>
> >> There are two aspects to consider: One is that v->processor
> >> may equal smp_processor_id() also for v != current. The other
> >> is that without this check in the if() it would need adding
> >> to the else-if(). I'm not sure to what degree which of the
> >> two matters functionality wise.
> > 
> > Since the vCPU is running v->processor can only equal smp_processor_id
> > if v == current,
> 
> What tells you that it is running? It had been running at the
> time the flag was latched (before vcpu_unblock()), but may
> have got de-scheduled in the meantime.

Right, but if it's not running then it doesn't really matter whether we
send an IPI or raise a softirq: the PIR to IRR sync will happen anyway
before the vCPU is resumed.

Thanks, Roger.
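
As for what "the PIR to IRR sync" above amounts to, a minimal sketch follows (invented names and a simplified flat layout; the real vmx_sync_pir_to_irr() in the patch below operates on the vlapic registers): each word of the posted-interrupt request bitmap is atomically drained into the IRR copy.

```c
#include <stdint.h>
#include <stdatomic.h>

#define PI_WORDS 4   /* 256 vectors / 64 bits per word */

/* Toy sketch of a PIR->IRR sync.  Not the Xen implementation. */
static void toy_sync_pir_to_irr(_Atomic uint64_t pir[PI_WORDS],
                                uint64_t irr[PI_WORDS])
{
    for ( unsigned int i = 0; i < PI_WORDS; i++ )
    {
        /*
         * Exchange with 0 so that bits set concurrently by a remote
         * sender are either collected now or left in PIR for the next
         * sync - they are never lost.
         */
        uint64_t pending = atomic_exchange(&pir[i], 0);

        irr[i] |= pending;
    }
}
```

The atomic exchange is the important part: a plain read-then-clear would race with a remote pCPU setting PIR bits in between.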

Patch
diff mbox series

diff --git a/xen/arch/x86/hvm/vmx/intr.c b/xen/arch/x86/hvm/vmx/intr.c
index 0d097cf1f2..ce70f4bc75 100644
--- a/xen/arch/x86/hvm/vmx/intr.c
+++ b/xen/arch/x86/hvm/vmx/intr.c
@@ -232,6 +232,14 @@  void vmx_intr_assist(void)
     enum hvm_intblk intblk;
     int pt_vector;
 
+    if ( cpu_has_vmx_posted_intr_processing )
+        /*
+         * Always force PIR to be synced to IRR before vmentry, this is also
+         * done by vlapic_has_pending_irq but it's possible other pending
+         * interrupts prevent the execution of that function.
+         */
+        vmx_sync_pir_to_irr(v);
+
     /* Block event injection when single step with MTF. */
     if ( unlikely(v->arch.hvm.single_step) )
     {
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 33e68eaddf..82a1b972c5 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -1945,8 +1945,6 @@  static void vmx_process_isr(int isr, struct vcpu *v)
 
 static void __vmx_deliver_posted_interrupt(struct vcpu *v)
 {
-    bool_t running = v->is_running;
-
     vcpu_unblock(v);
     /*
      * Just like vcpu_kick(), nothing is needed for the following two cases:
@@ -1954,48 +1952,28 @@  static void __vmx_deliver_posted_interrupt(struct vcpu *v)
      * 2. The target vCPU is the current vCPU and we're in non-interrupt
      * context.
      */
-    if ( running && (in_irq() || (v != current)) )
-    {
+    if ( vcpu_runnable(v) && v != current )
         /*
-         * Note: Only two cases will reach here:
-         * 1. The target vCPU is running on other pCPU.
-         * 2. The target vCPU is the current vCPU.
+         * If the vCPU is running on another pCPU send an IPI to the pCPU. When
+         * the IPI arrives, the target vCPU may be running in non-root mode,
+         * running in root mode, runnable or blocked. If the target vCPU is
+         * running in non-root mode, the hardware will sync PIR to vIRR for
+         * 'posted_intr_vector' is a special vector handled directly by the
+         * hardware.
          *
-         * Note2: Don't worry the v->processor may change. The vCPU being
-         * moved to another processor is guaranteed to sync PIR to vIRR,
-         * due to the involved scheduling cycle.
+         * If the target vCPU is running in root-mode, the interrupt handler
+         * starts to run.  Considering an IPI may arrive in the window between
+         * the call to vmx_intr_assist() and interrupts getting disabled, the
+         * interrupt handler should raise a softirq to ensure events will be
+         * delivered in time.
          */
-        unsigned int cpu = v->processor;
+        send_IPI_mask(cpumask_of(v->processor), posted_intr_vector);
 
-        /*
-         * For case 1, we send an IPI to the pCPU. When an IPI arrives, the
-         * target vCPU maybe is running in non-root mode, running in root
-         * mode, runnable or blocked. If the target vCPU is running in
-         * non-root mode, the hardware will sync PIR to vIRR for
-         * 'posted_intr_vector' is special to the pCPU. If the target vCPU is
-         * running in root-mode, the interrupt handler starts to run.
-         * Considering an IPI may arrive in the window between the call to
-         * vmx_intr_assist() and interrupts getting disabled, the interrupt
-         * handler should raise a softirq to ensure events will be delivered
-         * in time. If the target vCPU is runnable, it will sync PIR to
-         * vIRR next time it is chose to run. In this case, a IPI and a
-         * softirq is sent to a wrong vCPU which will not have any adverse
-         * effect. If the target vCPU is blocked, since vcpu_block() checks
-         * whether there is an event to be delivered through
-         * local_events_need_delivery() just after blocking, the vCPU must
-         * have synced PIR to vIRR. Similarly, there is a IPI and a softirq
-         * sent to a wrong vCPU.
-         */
-        if ( cpu != smp_processor_id() )
-            send_IPI_mask(cpumask_of(cpu), posted_intr_vector);
-        /*
-         * For case 2, raising a softirq ensures PIR will be synced to vIRR.
-         * As any softirq will do, as an optimization we only raise one if
-         * none is pending already.
-         */
-        else if ( !softirq_pending(cpu) )
-            raise_softirq(VCPU_KICK_SOFTIRQ);
-    }
+    /*
+     * If the vCPU is not runnable or if it's the one currently running in this
+     * pCPU there's nothing to do, the vmentry code will already sync the PIR
+     * to IRR when resuming execution.
+     */
 }
 
 static void vmx_deliver_posted_intr(struct vcpu *v, u8 vector)
@@ -2048,7 +2026,7 @@  static void vmx_deliver_posted_intr(struct vcpu *v, u8 vector)
     vcpu_kick(v);
 }
 
-static void vmx_sync_pir_to_irr(struct vcpu *v)
+void vmx_sync_pir_to_irr(struct vcpu *v)
 {
     struct vlapic *vlapic = vcpu_vlapic(v);
     unsigned int group, i;
diff --git a/xen/include/asm-x86/hvm/vmx/vmx.h b/xen/include/asm-x86/hvm/vmx/vmx.h
index ebaa74449b..c43fab7c4f 100644
--- a/xen/include/asm-x86/hvm/vmx/vmx.h
+++ b/xen/include/asm-x86/hvm/vmx/vmx.h
@@ -101,6 +101,7 @@  void vmx_update_debug_state(struct vcpu *v);
 void vmx_update_exception_bitmap(struct vcpu *v);
 void vmx_update_cpu_exec_control(struct vcpu *v);
 void vmx_update_secondary_exec_control(struct vcpu *v);
+void vmx_sync_pir_to_irr(struct vcpu *v);
 
 #define POSTED_INTR_ON  0
 #define POSTED_INTR_SN  1