[2/3] x86/nvmx: clarify and fix usage of nvmx_update_apicv
diff mbox series

Message ID 20200320190737.42110-3-roger.pau@citrix.com
State New
Headers show
Series
  • x86/nvmx: attempt to fix interrupt injection
Related show

Commit Message

Roger Pau Monné March 20, 2020, 7:07 p.m. UTC
The current usage of nvmx_update_apicv is not clear: it is deeply
intertwined with the Ack interrupt on exit VMEXIT control.

The code in nvmx_update_apicv should update the SVI (in service interrupt)
field of the guest interrupt status only when the Ack interrupt on
exit is set, as it is used to record that the interrupt being
serviced is signaled in a vmcs field, and hence hasn't been injected
as on native. It's important to record the current in service
interrupt on the guest interrupt status field, or else further
interrupts won't respect the priority of the in service one.

While clarifying the usage make sure that the SVI is only updated when
Ack on exit is set and the nested vmcs interrupt info field is valid. Or
else a guest not using the Ack on exit feature would loose interrupts as
they would be signaled as being in service on the guest interrupt
status field but won't actually be recorded on the interrupt info vmcs
field, neither injected in any other way.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
 xen/arch/x86/hvm/vmx/vvmx.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

Comments

Tian, Kevin March 24, 2020, 6:03 a.m. UTC | #1
> From: Roger Pau Monne <roger.pau@citrix.com>
> Sent: Saturday, March 21, 2020 3:08 AM
> 
> The current usage of nvmx_update_apicv is not clear: it is deeply
> intertwined with the Ack interrupt on exit VMEXIT control.
> 
> The code in nvmx_update_apicv should update the SVI (in service interrupt)
> field of the guest interrupt status only when the Ack interrupt on
> exit is set, as it is used to record that the interrupt being
> serviced is signaled in a vmcs field, and hence hasn't been injected
> as on native. It's important to record the current in service
> interrupt on the guest interrupt status field, or else further
> interrupts won't respect the priority of the in service one.
> 
> While clarifying the usage make sure that the SVI is only updated when
> Ack on exit is set and the nested vmcs interrupt info field is valid. Or
> else a guest not using the Ack on exit feature would loose interrupts as
> they would be signaled as being in service on the guest interrupt
> status field but won't actually be recorded on the interrupt info vmcs
> field, neither injected in any other way.

It is insufficient. You also need to update RVI to enable virtual injection
when Ack on exit is cleared.

> 
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> ---
>  xen/arch/x86/hvm/vmx/vvmx.c | 11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
> index 1b8461ba30..180d01e385 100644
> --- a/xen/arch/x86/hvm/vmx/vvmx.c
> +++ b/xen/arch/x86/hvm/vmx/vvmx.c
> @@ -1383,7 +1383,7 @@ static void nvmx_update_apicv(struct vcpu *v)
>  {
>      struct nestedvmx *nvmx = &vcpu_2_nvmx(v);
>      unsigned long reason = get_vvmcs(v, VM_EXIT_REASON);
> -    uint32_t intr_info = nvmx->intr.intr_info;
> +    unsigned long intr_info = get_vvmcs(v, VM_EXIT_INTR_INFO);

well, above reminds me an interesting question. Combining last and this
patch, we'd see essentially that it goes back to the state before f96e1469
(at least when Ack on exit is true). iirc, that commit was introduced to enable
nested x2apic with apicv, and your very first version even just removed
the whole nvmx_update_apicv. Then now with the new reverted logic,
are you still suffering x2apic problem? If not, does it imply the real fix
is actually coming from patch 3/3 for eoi bitmap update?

> 
>      if ( reason == EXIT_REASON_EXTERNAL_INTERRUPT &&
>           nvmx->intr.source == hvm_intsrc_lapic &&
> @@ -1399,6 +1399,15 @@ static void nvmx_update_apicv(struct vcpu *v)
>          ppr = vlapic_set_ppr(vlapic);
>          WARN_ON((ppr & 0xf0) != (vector & 0xf0));
> 
> +        /*
> +         * SVI must be updated when the interrupt has been signaled using the
> +         * Ack on exit feature, or else the currently in-service interrupt
> +         * won't be respected.
> +         *
> +         * Note that this is specific to the fact that when doing a VMEXIT an
> +         * interrupt might get delivered using the interrupt info vmcs field
> +         * instead of being injected normally.
> +         */

I'm not sure mentioning SVI specifically here is necessary, since all steps
here are required - updating iir, isr, rvi, svi, ppr, etc. It is just an emulation
of updating virtual APIC state as if a virtual interrupt delivery happens.

>          status = vector << VMX_GUEST_INTR_STATUS_SVI_OFFSET;
>          rvi = vlapic_has_pending_irq(v);
>          if ( rvi != -1 )
> --
> 2.25.0
Roger Pau Monné March 24, 2020, 9:50 a.m. UTC | #2
On Tue, Mar 24, 2020 at 06:03:28AM +0000, Tian, Kevin wrote:
> > From: Roger Pau Monne <roger.pau@citrix.com>
> > Sent: Saturday, March 21, 2020 3:08 AM
> > 
> > The current usage of nvmx_update_apicv is not clear: it is deeply
> > intertwined with the Ack interrupt on exit VMEXIT control.
> > 
> > The code in nvmx_update_apicv should update the SVI (in service interrupt)
> > field of the guest interrupt status only when the Ack interrupt on
> > exit is set, as it is used to record that the interrupt being
> > serviced is signaled in a vmcs field, and hence hasn't been injected
> > as on native. It's important to record the current in service
> > interrupt on the guest interrupt status field, or else further
> > interrupts won't respect the priority of the in service one.
> > 
> > While clarifying the usage make sure that the SVI is only updated when
> > Ack on exit is set and the nested vmcs interrupt info field is valid. Or
> > else a guest not using the Ack on exit feature would loose interrupts as
> > they would be signaled as being in service on the guest interrupt
> > status field but won't actually be recorded on the interrupt info vmcs
> > field, neither injected in any other way.
> 
> It is insufficient. You also need to update RVI to enable virtual injection
> when Ack on exit is cleared.

But RVI should be updated in vmx_intr_assist in that case, since
nvmx_intr_intercept shouldn't intercept the interrupt, as it should be
handled normally.

> > 
> > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> > ---
> >  xen/arch/x86/hvm/vmx/vvmx.c | 11 ++++++++++-
> >  1 file changed, 10 insertions(+), 1 deletion(-)
> > 
> > diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
> > index 1b8461ba30..180d01e385 100644
> > --- a/xen/arch/x86/hvm/vmx/vvmx.c
> > +++ b/xen/arch/x86/hvm/vmx/vvmx.c
> > @@ -1383,7 +1383,7 @@ static void (struct vcpu *v)
> >  {
> >      struct nestedvmx *nvmx = &vcpu_2_nvmx(v);
> >      unsigned long reason = get_vvmcs(v, VM_EXIT_REASON);
> > -    uint32_t intr_info = nvmx->intr.intr_info;
> > +    unsigned long intr_info = get_vvmcs(v, VM_EXIT_INTR_INFO);
> 
> well, above reminds me an interesting question. Combining last and this
> patch, we'd see essentially that it goes back to the state before f96e1469
> (at least when Ack on exit is true).

Well, patch 1/3 is a revert of f96e1469, so just reverting f96e1469
gets us to that state.

This patch is an attempt to clarify that nvmx_update_apicv is closely
related to the Ack on exit feature, as it modifies SVI in order to
signal the vector currently being serviced by the EXIT_INTR_INFO vmcs
field. This was not obvious to me, as at first sight I assumed
nvmx_update_apicv was actually injecting that vector into the guest.

> iirc, that commit was introduced to enable
> nested x2apic with apicv, and your very first version even just removed
> the whole nvmx_update_apicv. Then now with the new reverted logic,
> are you still suffering x2apic problem? If not, does it imply the real fix
> is actually coming from patch 3/3 for eoi bitmap update?

Yes, patch 3/3 is the one that fixed the issue. Note however that
strangely enough removing the call to nvmx_update_apicv (as my first
attempt to solve the issue) did fix it on one of my boxes. It depends
a lot on the available vmx features AFAICT.

> > 
> >      if ( reason == EXIT_REASON_EXTERNAL_INTERRUPT &&
> >           nvmx->intr.source == hvm_intsrc_lapic &&
> > @@ -1399,6 +1399,15 @@ static void nvmx_update_apicv(struct vcpu *v)
> >          ppr = vlapic_set_ppr(vlapic);
> >          WARN_ON((ppr & 0xf0) != (vector & 0xf0));
> > 
> > +        /*
> > +         * SVI must be updated when the interrupt has been signaled using the
> > +         * Ack on exit feature, or else the currently in-service interrupt
> > +         * won't be respected.
> > +         *
> > +         * Note that this is specific to the fact that when doing a VMEXIT an
> > +         * interrupt might get delivered using the interrupt info vmcs field
> > +         * instead of being injected normally.
> > +         */
> 
> I'm not sure mentioning SVI specifically here is necessary, since all steps
> here are required - updating iir, isr, rvi, svi, ppr, etc. It is just an emulation
> of updating virtual APIC state as if a virtual interrupt delivery happens.

Hm, it was hard for me to figure out that SVI is modified here in
order to signal that the Ack on exit vector is currently in service,
as opposed to being injected using the virtual interrupt delivery
mechanism.

I just wanted to clarify that the purpose of this function is not to
inject the vector in intr_info, but rather to signal that such vector
has already been injected using a different mechanism.

I'm happy to reword this, but IMO it should be clear that the purpose
of the function is not so much to inject an interrupt, but to update
the virtual interrupt delivery field in order to signal that an
interrupt has been injected using a different mechanism.

Thanks, Roger.
Tian, Kevin March 24, 2020, 10:11 a.m. UTC | #3
> From: Roger Pau Monné <roger.pau@citrix.com>
> Sent: Tuesday, March 24, 2020 5:51 PM
> 
> On Tue, Mar 24, 2020 at 06:03:28AM +0000, Tian, Kevin wrote:
> > > From: Roger Pau Monne <roger.pau@citrix.com>
> > > Sent: Saturday, March 21, 2020 3:08 AM
> > >
> > > The current usage of nvmx_update_apicv is not clear: it is deeply
> > > intertwined with the Ack interrupt on exit VMEXIT control.
> > >
> > > The code in nvmx_update_apicv should update the SVI (in service
> interrupt)
> > > field of the guest interrupt status only when the Ack interrupt on
> > > exit is set, as it is used to record that the interrupt being
> > > serviced is signaled in a vmcs field, and hence hasn't been injected
> > > as on native. It's important to record the current in service
> > > interrupt on the guest interrupt status field, or else further
> > > interrupts won't respect the priority of the in service one.
> > >
> > > While clarifying the usage make sure that the SVI is only updated when
> > > Ack on exit is set and the nested vmcs interrupt info field is valid. Or
> > > else a guest not using the Ack on exit feature would loose interrupts as
> > > they would be signaled as being in service on the guest interrupt
> > > status field but won't actually be recorded on the interrupt info vmcs
> > > field, neither injected in any other way.
> >
> > It is insufficient. You also need to update RVI to enable virtual injection
> > when Ack on exit is cleared.
> 
> But RVI should be updated in vmx_intr_assist in that case, since
> nvmx_intr_intercept shouldn't intercept the interrupt, as it should be
> handled normally.

As we discussed before, vmx_intr_assist is invoked before nvmx_switch_guest.
It is incorrectly to update RVI at that time since it might be still vmcs02 being 
active (if no pending softirq to make it invoked again).

Also nvmx_intr_intercept does intercept Ack-on-exit=0 case:

        if ( intack.source == hvm_intsrc_pic ||
                 intack.source == hvm_intsrc_lapic )
        {
            vmx_inject_extint(intack.vector, intack.source);

            ctrl = get_vvmcs(v, VM_EXIT_CONTROLS);
            if ( ctrl & VM_EXIT_ACK_INTR_ON_EXIT )
            {
                /* for now, duplicate the ack path in vmx_intr_assist */
                hvm_vcpu_ack_pending_irq(v, intack);
                pt_intr_post(v, intack);

                intack = hvm_vcpu_has_pending_irq(v);
                if ( unlikely(intack.source != hvm_intsrc_none) )
                    vmx_enable_intr_window(v, intack);
            }
            else if ( !cpu_has_vmx_virtual_intr_delivery )
                vmx_enable_intr_window(v, intack);

            return 1; <<<<<<<<
        }

> 
> > >
> > > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> > > ---
> > >  xen/arch/x86/hvm/vmx/vvmx.c | 11 ++++++++++-
> > >  1 file changed, 10 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/xen/arch/x86/hvm/vmx/vvmx.c
> b/xen/arch/x86/hvm/vmx/vvmx.c
> > > index 1b8461ba30..180d01e385 100644
> > > --- a/xen/arch/x86/hvm/vmx/vvmx.c
> > > +++ b/xen/arch/x86/hvm/vmx/vvmx.c
> > > @@ -1383,7 +1383,7 @@ static void (struct vcpu *v)
> > >  {
> > >      struct nestedvmx *nvmx = &vcpu_2_nvmx(v);
> > >      unsigned long reason = get_vvmcs(v, VM_EXIT_REASON);
> > > -    uint32_t intr_info = nvmx->intr.intr_info;
> > > +    unsigned long intr_info = get_vvmcs(v, VM_EXIT_INTR_INFO);
> >
> > well, above reminds me an interesting question. Combining last and this
> > patch, we'd see essentially that it goes back to the state before f96e1469
> > (at least when Ack on exit is true).
> 
> Well, patch 1/3 is a revert of f96e1469, so just reverting f96e1469
> gets us to that state.

you are right. I just wanted to point out that this patch alone doesn't
do any real fix for ack-on-exit=1 case. It just makes sure that ack-on-exit=0
is skipped from that function. So it won't be the one fixing your previous
problem. 
Roger Pau Monné March 24, 2020, 11:22 a.m. UTC | #4
On Tue, Mar 24, 2020 at 10:11:15AM +0000, Tian, Kevin wrote:
> > From: Roger Pau Monné <roger.pau@citrix.com>
> > Sent: Tuesday, March 24, 2020 5:51 PM
> > 
> > On Tue, Mar 24, 2020 at 06:03:28AM +0000, Tian, Kevin wrote:
> > > > From: Roger Pau Monne <roger.pau@citrix.com>
> > > > Sent: Saturday, March 21, 2020 3:08 AM
> > > >
> > > > The current usage of nvmx_update_apicv is not clear: it is deeply
> > > > intertwined with the Ack interrupt on exit VMEXIT control.
> > > >
> > > > The code in nvmx_update_apicv should update the SVI (in service
> > interrupt)
> > > > field of the guest interrupt status only when the Ack interrupt on
> > > > exit is set, as it is used to record that the interrupt being
> > > > serviced is signaled in a vmcs field, and hence hasn't been injected
> > > > as on native. It's important to record the current in service
> > > > interrupt on the guest interrupt status field, or else further
> > > > interrupts won't respect the priority of the in service one.
> > > >
> > > > While clarifying the usage make sure that the SVI is only updated when
> > > > Ack on exit is set and the nested vmcs interrupt info field is valid. Or
> > > > else a guest not using the Ack on exit feature would loose interrupts as
> > > > they would be signaled as being in service on the guest interrupt
> > > > status field but won't actually be recorded on the interrupt info vmcs
> > > > field, neither injected in any other way.
> > >
> > > It is insufficient. You also need to update RVI to enable virtual injection
> > > when Ack on exit is cleared.
> > 
> > But RVI should be updated in vmx_intr_assist in that case, since
> > nvmx_intr_intercept shouldn't intercept the interrupt, as it should be
> > handled normally.
> 
> As we discussed before, vmx_intr_assist is invoked before nvmx_switch_guest.
> 
> It is incorrectly to update RVI at that time since it might be still vmcs02 being 
> active (if no pending softirq to make it invoked again).
> 
> Also nvmx_intr_intercept does intercept Ack-on-exit=0 case:
> 
>         if ( intack.source == hvm_intsrc_pic ||
>                  intack.source == hvm_intsrc_lapic )
>         {
>             vmx_inject_extint(intack.vector, intack.source);
> 
>             ctrl = get_vvmcs(v, VM_EXIT_CONTROLS);
>             if ( ctrl & VM_EXIT_ACK_INTR_ON_EXIT )
>             {
>                 /* for now, duplicate the ack path in vmx_intr_assist */
>                 hvm_vcpu_ack_pending_irq(v, intack);
>                 pt_intr_post(v, intack);
> 
>                 intack = hvm_vcpu_has_pending_irq(v);
>                 if ( unlikely(intack.source != hvm_intsrc_none) )
>                     vmx_enable_intr_window(v, intack);
>             }
>             else if ( !cpu_has_vmx_virtual_intr_delivery )
>                 vmx_enable_intr_window(v, intack);
> 
>             return 1; <<<<<<<<

Right, I always get confused by the switch happening in the vmentry
path.

That only happens when the vcpu is in nested mode
(nestedhvm_vcpu_in_guestmode == true). That would be the case before a
vmexit, at least for the first call to vmx_intr_assist if there are
pending softirqs.

Note that if there are pending softirqs the second time
nvmx_intr_intercept will return 0.

>         }
> 
> > 
> > > >
> > > > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> > > > ---
> > > >  xen/arch/x86/hvm/vmx/vvmx.c | 11 ++++++++++-
> > > >  1 file changed, 10 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/xen/arch/x86/hvm/vmx/vvmx.c
> > b/xen/arch/x86/hvm/vmx/vvmx.c
> > > > index 1b8461ba30..180d01e385 100644
> > > > --- a/xen/arch/x86/hvm/vmx/vvmx.c
> > > > +++ b/xen/arch/x86/hvm/vmx/vvmx.c
> > > > @@ -1383,7 +1383,7 @@ static void (struct vcpu *v)
> > > >  {
> > > >      struct nestedvmx *nvmx = &vcpu_2_nvmx(v);
> > > >      unsigned long reason = get_vvmcs(v, VM_EXIT_REASON);
> > > > -    uint32_t intr_info = nvmx->intr.intr_info;
> > > > +    unsigned long intr_info = get_vvmcs(v, VM_EXIT_INTR_INFO);
> > >
> > > well, above reminds me an interesting question. Combining last and this
> > > patch, we'd see essentially that it goes back to the state before f96e1469
> > > (at least when Ack on exit is true).
> > 
> > Well, patch 1/3 is a revert of f96e1469, so just reverting f96e1469
> > gets us to that state.
> 
> you are right. I just wanted to point out that this patch alone doesn't
> do any real fix for ack-on-exit=1 case. It just makes sure that ack-on-exit=0
> is skipped from that function. So it won't be the one fixing your previous
> problem. 
Tian, Kevin March 24, 2020, 11:33 a.m. UTC | #5
> From: Roger Pau Monné <roger.pau@citrix.com>
> Sent: Tuesday, March 24, 2020 7:23 PM
> 
> On Tue, Mar 24, 2020 at 10:11:15AM +0000, Tian, Kevin wrote:
> > > From: Roger Pau Monné <roger.pau@citrix.com>
> > > Sent: Tuesday, March 24, 2020 5:51 PM
> > >
> > > On Tue, Mar 24, 2020 at 06:03:28AM +0000, Tian, Kevin wrote:
> > > > > From: Roger Pau Monne <roger.pau@citrix.com>
> > > > > Sent: Saturday, March 21, 2020 3:08 AM
> > > > >
> > > > > The current usage of nvmx_update_apicv is not clear: it is deeply
> > > > > intertwined with the Ack interrupt on exit VMEXIT control.
> > > > >
> > > > > The code in nvmx_update_apicv should update the SVI (in service
> > > interrupt)
> > > > > field of the guest interrupt status only when the Ack interrupt on
> > > > > exit is set, as it is used to record that the interrupt being
> > > > > serviced is signaled in a vmcs field, and hence hasn't been injected
> > > > > as on native. It's important to record the current in service
> > > > > interrupt on the guest interrupt status field, or else further
> > > > > interrupts won't respect the priority of the in service one.
> > > > >
> > > > > While clarifying the usage make sure that the SVI is only updated
> when
> > > > > Ack on exit is set and the nested vmcs interrupt info field is valid. Or
> > > > > else a guest not using the Ack on exit feature would loose interrupts
> as
> > > > > they would be signaled as being in service on the guest interrupt
> > > > > status field but won't actually be recorded on the interrupt info vmcs
> > > > > field, neither injected in any other way.
> > > >
> > > > It is insufficient. You also need to update RVI to enable virtual injection
> > > > when Ack on exit is cleared.
> > >
> > > But RVI should be updated in vmx_intr_assist in that case, since
> > > nvmx_intr_intercept shouldn't intercept the interrupt, as it should be
> > > handled normally.
> >
> > As we discussed before, vmx_intr_assist is invoked before
> nvmx_switch_guest.
> >
> > It is incorrectly to update RVI at that time since it might be still vmcs02
> being
> > active (if no pending softirq to make it invoked again).
> >
> > Also nvmx_intr_intercept does intercept Ack-on-exit=0 case:
> >
> >         if ( intack.source == hvm_intsrc_pic ||
> >                  intack.source == hvm_intsrc_lapic )
> >         {
> >             vmx_inject_extint(intack.vector, intack.source);
> >
> >             ctrl = get_vvmcs(v, VM_EXIT_CONTROLS);
> >             if ( ctrl & VM_EXIT_ACK_INTR_ON_EXIT )
> >             {
> >                 /* for now, duplicate the ack path in vmx_intr_assist */
> >                 hvm_vcpu_ack_pending_irq(v, intack);
> >                 pt_intr_post(v, intack);
> >
> >                 intack = hvm_vcpu_has_pending_irq(v);
> >                 if ( unlikely(intack.source != hvm_intsrc_none) )
> >                     vmx_enable_intr_window(v, intack);
> >             }
> >             else if ( !cpu_has_vmx_virtual_intr_delivery )
> >                 vmx_enable_intr_window(v, intack);
> >
> >             return 1; <<<<<<<<
> 
> Right, I always get confused by the switch happening in the vmentry
> path.
> 
> That only happens when the vcpu is in nested mode
> (nestedhvm_vcpu_in_guestmode == true). That would be the case before a
> vmexit, at least for the first call to vmx_intr_assist if there are
> pending softirqs.
> 
> Note that if there are pending softirqs the second time
> nvmx_intr_intercept will return 0.

Exactly.

> 
> >         }
> >
> > >
> > > > >
> > > > > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> > > > > ---
> > > > >  xen/arch/x86/hvm/vmx/vvmx.c | 11 ++++++++++-
> > > > >  1 file changed, 10 insertions(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/xen/arch/x86/hvm/vmx/vvmx.c
> > > b/xen/arch/x86/hvm/vmx/vvmx.c
> > > > > index 1b8461ba30..180d01e385 100644
> > > > > --- a/xen/arch/x86/hvm/vmx/vvmx.c
> > > > > +++ b/xen/arch/x86/hvm/vmx/vvmx.c
> > > > > @@ -1383,7 +1383,7 @@ static void (struct vcpu *v)
> > > > >  {
> > > > >      struct nestedvmx *nvmx = &vcpu_2_nvmx(v);
> > > > >      unsigned long reason = get_vvmcs(v, VM_EXIT_REASON);
> > > > > -    uint32_t intr_info = nvmx->intr.intr_info;
> > > > > +    unsigned long intr_info = get_vvmcs(v, VM_EXIT_INTR_INFO);
> > > >
> > > > well, above reminds me an interesting question. Combining last and this
> > > > patch, we'd see essentially that it goes back to the state before
> f96e1469
> > > > (at least when Ack on exit is true).
> > >
> > > Well, patch 1/3 is a revert of f96e1469, so just reverting f96e1469
> > > gets us to that state.
> >
> > you are right. I just wanted to point out that this patch alone doesn't
> > do any real fix for ack-on-exit=1 case. It just makes sure that ack-on-exit=0
> > is skipped from that function. So it won't be the one fixing your previous
> > problem. 
Roger Pau Monné March 24, 2020, 12:18 p.m. UTC | #6
On Tue, Mar 24, 2020 at 11:33:00AM +0000, Tian, Kevin wrote:
> > From: Roger Pau Monné <roger.pau@citrix.com>
> > Sent: Tuesday, March 24, 2020 7:23 PM
> > 
> > On Tue, Mar 24, 2020 at 10:11:15AM +0000, Tian, Kevin wrote:
> > > > From: Roger Pau Monné <roger.pau@citrix.com>
> > > > Sent: Tuesday, March 24, 2020 5:51 PM
> > > >
> > > > On Tue, Mar 24, 2020 at 06:03:28AM +0000, Tian, Kevin wrote:
> > > > > > From: Roger Pau Monne <roger.pau@citrix.com>
> > > > > > Sent: Saturday, March 21, 2020 3:08 AM
> > > > > >
> > > > > > The current usage of nvmx_update_apicv is not clear: it is deeply
> > > > > > intertwined with the Ack interrupt on exit VMEXIT control.
> > > > > >
> > > > > > The code in nvmx_update_apicv should update the SVI (in service
> > > > interrupt)
> > > > > > field of the guest interrupt status only when the Ack interrupt on
> > > > > > exit is set, as it is used to record that the interrupt being
> > > > > > serviced is signaled in a vmcs field, and hence hasn't been injected
> > > > > > as on native. It's important to record the current in service
> > > > > > interrupt on the guest interrupt status field, or else further
> > > > > > interrupts won't respect the priority of the in service one.
> > > > > >
> > > > > > While clarifying the usage make sure that the SVI is only updated
> > when
> > > > > > Ack on exit is set and the nested vmcs interrupt info field is valid. Or
> > > > > > else a guest not using the Ack on exit feature would loose interrupts
> > as
> > > > > > they would be signaled as being in service on the guest interrupt
> > > > > > status field but won't actually be recorded on the interrupt info vmcs
> > > > > > field, neither injected in any other way.
> > > > >
> > > > > It is insufficient. You also need to update RVI to enable virtual injection
> > > > > when Ack on exit is cleared.
> > > >
> > > > But RVI should be updated in vmx_intr_assist in that case, since
> > > > nvmx_intr_intercept shouldn't intercept the interrupt, as it should be
> > > > handled normally.
> > >
> > > As we discussed before, vmx_intr_assist is invoked before
> > nvmx_switch_guest.
> > >
> > > It is incorrectly to update RVI at that time since it might be still vmcs02
> > being
> > > active (if no pending softirq to make it invoked again).
> > >
> > > Also nvmx_intr_intercept does intercept Ack-on-exit=0 case:
> > >
> > >         if ( intack.source == hvm_intsrc_pic ||
> > >                  intack.source == hvm_intsrc_lapic )

I've realized that nvmx_intr_intercept will return 1 for interrupts
originating from the lapic or the pic, while nvmx_update_apicv will
only update GUEST_INTR_STATUS for interrupts originating from the
lapic. Is this correct?

Shouldn't both be in sync and handle the same interrupt sources?

Thanks, Roger.
Tian, Kevin March 26, 2020, 5:49 a.m. UTC | #7
> From: Roger Pau Monné <roger.pau@citrix.com>
> Sent: Tuesday, March 24, 2020 8:19 PM
> 
> On Tue, Mar 24, 2020 at 11:33:00AM +0000, Tian, Kevin wrote:
> > > From: Roger Pau Monné <roger.pau@citrix.com>
> > > Sent: Tuesday, March 24, 2020 7:23 PM
> > >
> > > On Tue, Mar 24, 2020 at 10:11:15AM +0000, Tian, Kevin wrote:
> > > > > From: Roger Pau Monné <roger.pau@citrix.com>
> > > > > Sent: Tuesday, March 24, 2020 5:51 PM
> > > > >
> > > > > On Tue, Mar 24, 2020 at 06:03:28AM +0000, Tian, Kevin wrote:
> > > > > > > From: Roger Pau Monne <roger.pau@citrix.com>
> > > > > > > Sent: Saturday, March 21, 2020 3:08 AM
> > > > > > >
> > > > > > > The current usage of nvmx_update_apicv is not clear: it is deeply
> > > > > > > intertwined with the Ack interrupt on exit VMEXIT control.
> > > > > > >
> > > > > > > The code in nvmx_update_apicv should update the SVI (in service
> > > > > interrupt)
> > > > > > > field of the guest interrupt status only when the Ack interrupt on
> > > > > > > exit is set, as it is used to record that the interrupt being
> > > > > > > serviced is signaled in a vmcs field, and hence hasn't been injected
> > > > > > > as on native. It's important to record the current in service
> > > > > > > interrupt on the guest interrupt status field, or else further
> > > > > > > interrupts won't respect the priority of the in service one.
> > > > > > >
> > > > > > > While clarifying the usage make sure that the SVI is only updated
> > > when
> > > > > > > Ack on exit is set and the nested vmcs interrupt info field is valid.
> Or
> > > > > > > else a guest not using the Ack on exit feature would loose
> interrupts
> > > as
> > > > > > > they would be signaled as being in service on the guest interrupt
> > > > > > > status field but won't actually be recorded on the interrupt info
> vmcs
> > > > > > > field, neither injected in any other way.
> > > > > >
> > > > > > It is insufficient. You also need to update RVI to enable virtual
> injection
> > > > > > when Ack on exit is cleared.
> > > > >
> > > > > But RVI should be updated in vmx_intr_assist in that case, since
> > > > > nvmx_intr_intercept shouldn't intercept the interrupt, as it should be
> > > > > handled normally.
> > > >
> > > > As we discussed before, vmx_intr_assist is invoked before
> > > nvmx_switch_guest.
> > > >
> > > > It is incorrectly to update RVI at that time since it might be still vmcs02
> > > being
> > > > active (if no pending softirq to make it invoked again).
> > > >
> > > > Also nvmx_intr_intercept does intercept Ack-on-exit=0 case:
> > > >
> > > >         if ( intack.source == hvm_intsrc_pic ||
> > > >                  intack.source == hvm_intsrc_lapic )
> 
> I've realized that nvmx_intr_intercept will return 1 for interrupts
> originating from the lapic or the pic, while nvmx_update_apicv will
> only update GUEST_INTR_STATUS for interrupts originating from the
> lapic. Is this correct?

Isn't apicv for virtual lapic instead of virtual pic?

> 
> Shouldn't both be in sync and handle the same interrupt sources?
> 

But I do realize one potential issue about 67f9d0b9, which may break
vpic delivery when ack-on-exit is off. We should always use interrupt
window to handle that situation for vpic. Sorry I didn't catch it when
proposing that change...

Thanks
Kevin

Patch
diff mbox series

diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
index 1b8461ba30..180d01e385 100644
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -1383,7 +1383,7 @@  static void nvmx_update_apicv(struct vcpu *v)
 {
     struct nestedvmx *nvmx = &vcpu_2_nvmx(v);
     unsigned long reason = get_vvmcs(v, VM_EXIT_REASON);
-    uint32_t intr_info = nvmx->intr.intr_info;
+    unsigned long intr_info = get_vvmcs(v, VM_EXIT_INTR_INFO);
 
     if ( reason == EXIT_REASON_EXTERNAL_INTERRUPT &&
          nvmx->intr.source == hvm_intsrc_lapic &&
@@ -1399,6 +1399,15 @@  static void nvmx_update_apicv(struct vcpu *v)
         ppr = vlapic_set_ppr(vlapic);
         WARN_ON((ppr & 0xf0) != (vector & 0xf0));
 
+        /*
+         * SVI must be updated when the interrupt has been signaled using the
+         * Ack on exit feature, or else the currently in-service interrupt
+         * won't be respected.
+         *
+         * Note that this is specific to the fact that when doing a VMEXIT an
+         * interrupt might get delivered using the interrupt info vmcs field
+         * instead of being injected normally.
+         */
         status = vector << VMX_GUEST_INTR_STATUS_SVI_OFFSET;
         rvi = vlapic_has_pending_irq(v);
         if ( rvi != -1 )