[1/2] nvmx: fix handling of interrupts

Message ID	20200108103857.77236-2-roger.pau@citrix.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <SRS0=xJ9Z=25=lists.xenproject.org=xen-devel-bounces@kernel.org> DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 3286420705 Received-SPF: None (esa3.hc3370-68.iphmx.com: no sender authenticity information available from domain of roger.pau@citrix.com) identity=pra; client-ip=162.221.158.21; receiver=esa3.hc3370-68.iphmx.com; envelope-from="roger.pau@citrix.com"; x-sender="roger.pau@citrix.com"; x-conformance=sidf_compatible Received-SPF: Pass (esa3.hc3370-68.iphmx.com: domain of roger.pau@citrix.com designates 162.221.158.21 as permitted sender) identity=mailfrom; client-ip=162.221.158.21; receiver=esa3.hc3370-68.iphmx.com; envelope-from="roger.pau@citrix.com"; x-sender="roger.pau@citrix.com"; x-conformance=sidf_compatible; x-record-type="v=spf1"; x-record-text="v=spf1 ip4:209.167.231.154 ip4:178.63.86.133 ip4:195.66.111.40/30 ip4:85.115.9.32/28 ip4:199.102.83.4 ip4:192.28.146.160 ip4:192.28.146.107 ip4:216.52.6.88 ip4:216.52.6.188 ip4:162.221.158.21 ip4:162.221.156.83 ip4:168.245.78.127 ~all" Received-SPF: None (esa3.hc3370-68.iphmx.com: no sender authenticity information available from domain of postmaster@mail.citrix.com) identity=helo; client-ip=162.221.158.21; receiver=esa3.hc3370-68.iphmx.com; envelope-from="roger.pau@citrix.com"; x-sender="postmaster@mail.citrix.com"; x-conformance=sidf_compatible IronPort-SDR: 7jiqfmdepobmiVldalbVvA7XJ2NiDBNqSr9cq58kdW8Cdcj57QGOgPZSqS+OxEzuvFdbiOzHtu 7zdCjOkjsbIW78iZl2PqrBNHI+/lfwTp35o2gb/iNBfVK/KGOvGWOzvlo66a1k510/zcpMy2OT KEc4pWHSdqWQIpJkY8FPK99jQGc6ZjTaptTR5piMLG/oIDsp99M3OrafjzdemDygbnmLu+LxPg //Y7MJVfoe8idE39+dvdudlCzEoK8m/i2eeRFqfemNmLyJgBMbegT+JL6zyG4UUl5XOTVpA1Jk KxU= From: Roger Pau Monne <roger.pau@citrix.com> To: <xen-devel@lists.xenproject.org> Date: Wed, 8 Jan 2020 11:38:56 +0100 Message-ID: <20200108103857.77236-2-roger.pau@citrix.com> In-Reply-To: <20200108103857.77236-1-roger.pau@citrix.com> References: <20200108103857.77236-1-roger.pau@citrix.com> MIME-Version: 1.0 Subject: [Xen-devel] [PATCH 1/2] nvmx: fix handling of interrupts Precedence: list Cc: Kevin Tian <kevin.tian@intel.com>, Jun Nakajima <jun.nakajima@intel.com>, Wei Liu <wl@xen.org>, Andrew Cooper <andrew.cooper3@citrix.com>, Jan Beulich <jbeulich@suse.com>, Roger Pau Monne <roger.pau@citrix.com> Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: base64 Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" <xen-devel-bounces@lists.xenproject.org>
Series	fix nested interrupt injection on virtual vmexit \| expand [0/2] fix nested interrupt injection on virtual vmexit [1/2] nvmx: fix handling of interrupts [2/2] Revert "tools/libxc: disable x2APIC when using nested virtualization"

Roger Pau Monné Jan. 8, 2020, 10:38 a.m. UTC

When doing a virtual vmexit (ie: a vmexit handled by the L1 VMM)
interrupts shouldn't be injected using the virtual interrupt delivery
mechanism, and instead should be signaled in the vmcs using the exit
reason and the interruption-information field if the "Acknowledge
interrupt on exit" vmexit control is set.

Remove the nvmx_update_apicv helper: it's bogus to attempt to inject
interrupts on virtual vmexit using the virtual interrupt delivery
assistance, and it's also bogus to ack interrupts without checking if
the vmexit "Acknowledge interrupt on exit" vmexit control is set.
nvmx_intr_intercept already handles interrupts correctly on virtual
vmexit.

Note that this fixes the usage of x2APIC by the L1 VMM, at least when
the L1 VMM is Xen.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
 xen/arch/x86/hvm/vmx/vvmx.c | 32 --------------------------------
 1 file changed, 32 deletions(-)

Tian, Kevin Jan. 19, 2020, 4:15 a.m. UTC | #1

> From: Roger Pau Monne <roger.pau@citrix.com>
> Sent: Wednesday, January 8, 2020 6:39 PM
> 
> When doing a virtual vmexit (ie: a vmexit handled by the L1 VMM)
> interrupts shouldn't be injected using the virtual interrupt delivery
> mechanism, and instead should be signaled in the vmcs using the exit
> reason and the interruption-information field if the "Acknowledge
> interrupt on exit" vmexit control is set.
> 
> Remove the nvmx_update_apicv helper: it's bogus to attempt to inject
> interrupts on virtual vmexit using the virtual interrupt delivery
> assistance, and it's also bogus to ack interrupts without checking if
> the vmexit "Acknowledge interrupt on exit" vmexit control is set.
> nvmx_intr_intercept already handles interrupts correctly on virtual
> vmexit.
> 
> Note that this fixes the usage of x2APIC by the L1 VMM, at least when
> the L1 VMM is Xen.

while this fix makes sense to me, can you also test other L1 VMMs,
so we don't overlook some other intentions covered or hidden by
removed logic?

> 
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> ---
>  xen/arch/x86/hvm/vmx/vvmx.c | 32 --------------------------------
>  1 file changed, 32 deletions(-)
> 
> diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
> index d8ab167d62..af48b0beef 100644
> --- a/xen/arch/x86/hvm/vmx/vvmx.c
> +++ b/xen/arch/x86/hvm/vmx/vvmx.c
> @@ -1316,35 +1316,6 @@ static void sync_exception_state(struct vcpu *v)
>      }
>  }
> 
> -static void nvmx_update_apicv(struct vcpu *v)
> -{
> -    struct nestedvmx *nvmx = &vcpu_2_nvmx(v);
> -    unsigned long reason = get_vvmcs(v, VM_EXIT_REASON);
> -    uint32_t intr_info = get_vvmcs(v, VM_EXIT_INTR_INFO);
> -
> -    if ( reason == EXIT_REASON_EXTERNAL_INTERRUPT &&
> -         nvmx->intr.source == hvm_intsrc_lapic &&
> -         (intr_info & INTR_INFO_VALID_MASK) )
> -    {
> -        uint16_t status;
> -        uint32_t rvi, ppr;
> -        uint32_t vector = intr_info & 0xff;
> -        struct vlapic *vlapic = vcpu_vlapic(v);
> -
> -        vlapic_ack_pending_irq(v, vector, 1);
> -
> -        ppr = vlapic_set_ppr(vlapic);
> -        WARN_ON((ppr & 0xf0) != (vector & 0xf0));
> -
> -        status = vector << VMX_GUEST_INTR_STATUS_SVI_OFFSET;
> -        rvi = vlapic_has_pending_irq(v);
> -        if ( rvi != -1 )
> -            status |= rvi & VMX_GUEST_INTR_STATUS_SUBFIELD_BITMASK;
> -
> -        __vmwrite(GUEST_INTR_STATUS, status);
> -    }
> -}
> -
>  static void virtual_vmexit(struct cpu_user_regs *regs)
>  {
>      struct vcpu *v = current;
> @@ -1393,9 +1364,6 @@ static void virtual_vmexit(struct cpu_user_regs
> *regs)
>      /* updating host cr0 to sync TS bit */
>      __vmwrite(HOST_CR0, v->arch.hvm.vmx.host_cr0);
> 
> -    if ( cpu_has_vmx_virtual_intr_delivery )
> -        nvmx_update_apicv(v);
> -
>      nvcpu->nv_vmswitch_in_progress = 0;
>  }
> 
> --
> 2.24.1

Roger Pau Monné Jan. 20, 2020, 10:19 a.m. UTC | #2

On Sun, Jan 19, 2020 at 04:15:04AM +0000, Tian, Kevin wrote:
> > From: Roger Pau Monne <roger.pau@citrix.com>
> > Sent: Wednesday, January 8, 2020 6:39 PM
> > 
> > When doing a virtual vmexit (ie: a vmexit handled by the L1 VMM)
> > interrupts shouldn't be injected using the virtual interrupt delivery
> > mechanism, and instead should be signaled in the vmcs using the exit
> > reason and the interruption-information field if the "Acknowledge
> > interrupt on exit" vmexit control is set.
> > 
> > Remove the nvmx_update_apicv helper: it's bogus to attempt to inject
> > interrupts on virtual vmexit using the virtual interrupt delivery
> > assistance, and it's also bogus to ack interrupts without checking if
> > the vmexit "Acknowledge interrupt on exit" vmexit control is set.
> > nvmx_intr_intercept already handles interrupts correctly on virtual
> > vmexit.
> > 
> > Note that this fixes the usage of x2APIC by the L1 VMM, at least when
> > the L1 VMM is Xen.
> 
> while this fix makes sense to me, can you also test other L1 VMMs,
> so we don't overlook some other intentions covered or hidden by
> removed logic?

I could test other hypervisors, but do we really expect anything
that's not Xen on Xen to work?

I'm asking because that's the only combination that's actually tested
by osstest.

Thanks, Roger.

Tian, Kevin Jan. 21, 2020, 3:34 a.m. UTC | #3

> From: Roger Pau Monné <roger.pau@citrix.com>
> Sent: Monday, January 20, 2020 6:19 PM
> 
> On Sun, Jan 19, 2020 at 04:15:04AM +0000, Tian, Kevin wrote:
> > > From: Roger Pau Monne <roger.pau@citrix.com>
> > > Sent: Wednesday, January 8, 2020 6:39 PM
> > >
> > > When doing a virtual vmexit (ie: a vmexit handled by the L1 VMM)
> > > interrupts shouldn't be injected using the virtual interrupt delivery
> > > mechanism, and instead should be signaled in the vmcs using the exit
> > > reason and the interruption-information field if the "Acknowledge
> > > interrupt on exit" vmexit control is set.
> > >
> > > Remove the nvmx_update_apicv helper: it's bogus to attempt to inject
> > > interrupts on virtual vmexit using the virtual interrupt delivery
> > > assistance, and it's also bogus to ack interrupts without checking if
> > > the vmexit "Acknowledge interrupt on exit" vmexit control is set.
> > > nvmx_intr_intercept already handles interrupts correctly on virtual
> > > vmexit.
> > >
> > > Note that this fixes the usage of x2APIC by the L1 VMM, at least when
> > > the L1 VMM is Xen.
> >
> > while this fix makes sense to me, can you also test other L1 VMMs,
> > so we don't overlook some other intentions covered or hidden by
> > removed logic?
> 
> I could test other hypervisors, but do we really expect anything
> that's not Xen on Xen to work?
> 
> I'm asking because that's the only combination that's actually tested
> by osstest.
> 
> Thanks, Roger.

If others are OK with your assumption, then it's fine. I didn't tightly 
follow the nested virtualization requirements in Xen.

On the other hand, I think this patch needs a revision. It is not bogus
to use virtual interrupt delivery on virtual VMexit, if "Ack interrupt
on exit" is off. In such case, the delivery doesn't happen until L1 
hypervisor enables interrupt to clear interrupt window. Then it does
save one exit. The only bogus point is that nvmx_udpate_apicv doesn't
check "Ack interrupt on exit". So I prefer to add such check there 
instead of completely removing this optimization.

Thanks
Kevin

Roger Pau Monné Jan. 21, 2020, 4:47 p.m. UTC | #4

On Tue, Jan 21, 2020 at 03:34:13AM +0000, Tian, Kevin wrote:
> > From: Roger Pau Monné <roger.pau@citrix.com>
> > Sent: Monday, January 20, 2020 6:19 PM
> > 
> > On Sun, Jan 19, 2020 at 04:15:04AM +0000, Tian, Kevin wrote:
> > > > From: Roger Pau Monne <roger.pau@citrix.com>
> > > > Sent: Wednesday, January 8, 2020 6:39 PM
> > > >
> > > > When doing a virtual vmexit (ie: a vmexit handled by the L1 VMM)
> > > > interrupts shouldn't be injected using the virtual interrupt delivery
> > > > mechanism, and instead should be signaled in the vmcs using the exit
> > > > reason and the interruption-information field if the "Acknowledge
> > > > interrupt on exit" vmexit control is set.
> > > >
> > > > Remove the nvmx_update_apicv helper: it's bogus to attempt to inject
> > > > interrupts on virtual vmexit using the virtual interrupt delivery
> > > > assistance, and it's also bogus to ack interrupts without checking if
> > > > the vmexit "Acknowledge interrupt on exit" vmexit control is set.
> > > > nvmx_intr_intercept already handles interrupts correctly on virtual
> > > > vmexit.
> > > >
> > > > Note that this fixes the usage of x2APIC by the L1 VMM, at least when
> > > > the L1 VMM is Xen.
> > >
> > > while this fix makes sense to me, can you also test other L1 VMMs,
> > > so we don't overlook some other intentions covered or hidden by
> > > removed logic?
> > 
> > I could test other hypervisors, but do we really expect anything
> > that's not Xen on Xen to work?
> > 
> > I'm asking because that's the only combination that's actually tested
> > by osstest.
> > 
> > Thanks, Roger.
> 
> If others are OK with your assumption, then it's fine. I didn't tightly 
> follow the nested virtualization requirements in Xen.

I can try KVM or bhyve on top of Xen, but I'm not sure whether anyone
has actually tested this, so I could be triggering other bugs in the
nested code.

> On the other hand, I think this patch needs a revision. It is not bogus
> to use virtual interrupt delivery on virtual VMexit, if "Ack interrupt
> on exit" is off. In such case, the delivery doesn't happen until L1 
> hypervisor enables interrupt to clear interrupt window. Then it does
> save one exit. The only bogus point is that nvmx_udpate_apicv doesn't
> check "Ack interrupt on exit". So I prefer to add such check there 
> instead of completely removing this optimization.

Right, if "Ack interrupt on exit" is off the interrupt will trigger a
vmexit, but it won't be acked and the vmexit interrupt information
should have bit 31 set to 0, which I think we don't set correctly.

The Intel SDM states:

"For other VM exits (including those due to external interrupts when
the “acknowledge interrupt on exit” VM-exit control is 0), the field
is marked invalid (by clearing bit 31) and the remainder of the field
is undefined."

AFAICT sync_exception_state also needs to check if VM_EXIT_CONTROLS
has VM_EXIT_ACK_INTR_ON_EXIT set, and only set VM_EXIT_INTR_INFO in
that case, do you agree?

Thanks, Roger.

Tian, Kevin Jan. 22, 2020, 2:18 a.m. UTC | #5

> From: Roger Pau Monné <roger.pau@citrix.com>
> Sent: Wednesday, January 22, 2020 12:47 AM
> 
> On Tue, Jan 21, 2020 at 03:34:13AM +0000, Tian, Kevin wrote:
> > > From: Roger Pau Monné <roger.pau@citrix.com>
> > > Sent: Monday, January 20, 2020 6:19 PM
> > >
> > > On Sun, Jan 19, 2020 at 04:15:04AM +0000, Tian, Kevin wrote:
> > > > > From: Roger Pau Monne <roger.pau@citrix.com>
> > > > > Sent: Wednesday, January 8, 2020 6:39 PM
> > > > >
> > > > > When doing a virtual vmexit (ie: a vmexit handled by the L1 VMM)
> > > > > interrupts shouldn't be injected using the virtual interrupt delivery
> > > > > mechanism, and instead should be signaled in the vmcs using the exit
> > > > > reason and the interruption-information field if the "Acknowledge
> > > > > interrupt on exit" vmexit control is set.
> > > > >
> > > > > Remove the nvmx_update_apicv helper: it's bogus to attempt to inject
> > > > > interrupts on virtual vmexit using the virtual interrupt delivery
> > > > > assistance, and it's also bogus to ack interrupts without checking if
> > > > > the vmexit "Acknowledge interrupt on exit" vmexit control is set.
> > > > > nvmx_intr_intercept already handles interrupts correctly on virtual
> > > > > vmexit.
> > > > >
> > > > > Note that this fixes the usage of x2APIC by the L1 VMM, at least when
> > > > > the L1 VMM is Xen.
> > > >
> > > > while this fix makes sense to me, can you also test other L1 VMMs,
> > > > so we don't overlook some other intentions covered or hidden by
> > > > removed logic?
> > >
> > > I could test other hypervisors, but do we really expect anything
> > > that's not Xen on Xen to work?
> > >
> > > I'm asking because that's the only combination that's actually tested
> > > by osstest.
> > >
> > > Thanks, Roger.
> >
> > If others are OK with your assumption, then it's fine. I didn't tightly
> > follow the nested virtualization requirements in Xen.
> 
> I can try KVM or bhyve on top of Xen, but I'm not sure whether anyone
> has actually tested this, so I could be triggering other bugs in the
> nested code.
> 
> > On the other hand, I think this patch needs a revision. It is not bogus
> > to use virtual interrupt delivery on virtual VMexit, if "Ack interrupt
> > on exit" is off. In such case, the delivery doesn't happen until L1
> > hypervisor enables interrupt to clear interrupt window. Then it does
> > save one exit. The only bogus point is that nvmx_udpate_apicv doesn't
> > check "Ack interrupt on exit". So I prefer to add such check there
> > instead of completely removing this optimization.
> 
> Right, if "Ack interrupt on exit" is off the interrupt will trigger a
> vmexit, but it won't be acked and the vmexit interrupt information
> should have bit 31 set to 0, which I think we don't set correctly.
> 
> The Intel SDM states:
> 
> "For other VM exits (including those due to external interrupts when
> the “acknowledge interrupt on exit” VM-exit control is 0), the field
> is marked invalid (by clearing bit 31) and the remainder of the field
> is undefined."
> 
> AFAICT sync_exception_state also needs to check if VM_EXIT_CONTROLS
> has VM_EXIT_ACK_INTR_ON_EXIT set, and only set VM_EXIT_INTR_INFO in
> that case, do you agree?
> 

nice catch.

Thanks
Kevin

Roger Pau Monné Jan. 23, 2020, 12:31 p.m. UTC | #6

On Tue, Jan 21, 2020 at 03:34:13AM +0000, Tian, Kevin wrote:
> > From: Roger Pau Monné <roger.pau@citrix.com>
> > Sent: Monday, January 20, 2020 6:19 PM
> > 
> > On Sun, Jan 19, 2020 at 04:15:04AM +0000, Tian, Kevin wrote:
> > > > From: Roger Pau Monne <roger.pau@citrix.com>
> > > > Sent: Wednesday, January 8, 2020 6:39 PM
> > > >
> > > > When doing a virtual vmexit (ie: a vmexit handled by the L1 VMM)
> > > > interrupts shouldn't be injected using the virtual interrupt delivery
> > > > mechanism, and instead should be signaled in the vmcs using the exit
> > > > reason and the interruption-information field if the "Acknowledge
> > > > interrupt on exit" vmexit control is set.
> > > >
> > > > Remove the nvmx_update_apicv helper: it's bogus to attempt to inject
> > > > interrupts on virtual vmexit using the virtual interrupt delivery
> > > > assistance, and it's also bogus to ack interrupts without checking if
> > > > the vmexit "Acknowledge interrupt on exit" vmexit control is set.
> > > > nvmx_intr_intercept already handles interrupts correctly on virtual
> > > > vmexit.
> > > >
> > > > Note that this fixes the usage of x2APIC by the L1 VMM, at least when
> > > > the L1 VMM is Xen.
> > >
> > > while this fix makes sense to me, can you also test other L1 VMMs,
> > > so we don't overlook some other intentions covered or hidden by
> > > removed logic?
> > 
> > I could test other hypervisors, but do we really expect anything
> > that's not Xen on Xen to work?
> > 
> > I'm asking because that's the only combination that's actually tested
> > by osstest.
> > 
> > Thanks, Roger.
> 
> If others are OK with your assumption, then it's fine. I didn't tightly 
> follow the nested virtualization requirements in Xen.
> 
> On the other hand, I think this patch needs a revision. It is not bogus
> to use virtual interrupt delivery on virtual VMexit, if "Ack interrupt
> on exit" is off. In such case, the delivery doesn't happen until L1 
> hypervisor enables interrupt to clear interrupt window. Then it does
> save one exit. The only bogus point is that nvmx_udpate_apicv doesn't
> check "Ack interrupt on exit". So I prefer to add such check there 
> instead of completely removing this optimization.

I went back over this, and I'm still not sure calling
nvmx_update_apicv is actually required: AFAICT vmx_intr_assist will
already inject the interrupt correctly using virtual interrupt
delivery if left pending on the vlapic. I guess the code in
nvmx_update_apicv doesn't hurt as long as it's gated on "Ack on exit"
not being enabled, but it's likely redundant.

I will send an updated patch anyway, since I would like to get this
sorted out sooner rather than later and will follow your advise of
leaving nvmx_update_apicv in place gated by a check of whether "Ack on
exit" is not enabled.

Thanks, Roger.

Tian, Kevin Feb. 3, 2020, 7:24 a.m. UTC | #7

> From: Roger Pau Monné <roger.pau@citrix.com>
> Sent: Thursday, January 23, 2020 8:32 PM
> 
> On Tue, Jan 21, 2020 at 03:34:13AM +0000, Tian, Kevin wrote:
> > > From: Roger Pau Monné <roger.pau@citrix.com>
> > > Sent: Monday, January 20, 2020 6:19 PM
> > >
> > > On Sun, Jan 19, 2020 at 04:15:04AM +0000, Tian, Kevin wrote:
> > > > > From: Roger Pau Monne <roger.pau@citrix.com>
> > > > > Sent: Wednesday, January 8, 2020 6:39 PM
> > > > >
> > > > > When doing a virtual vmexit (ie: a vmexit handled by the L1 VMM)
> > > > > interrupts shouldn't be injected using the virtual interrupt delivery
> > > > > mechanism, and instead should be signaled in the vmcs using the exit
> > > > > reason and the interruption-information field if the "Acknowledge
> > > > > interrupt on exit" vmexit control is set.
> > > > >
> > > > > Remove the nvmx_update_apicv helper: it's bogus to attempt to inject
> > > > > interrupts on virtual vmexit using the virtual interrupt delivery
> > > > > assistance, and it's also bogus to ack interrupts without checking if
> > > > > the vmexit "Acknowledge interrupt on exit" vmexit control is set.
> > > > > nvmx_intr_intercept already handles interrupts correctly on virtual
> > > > > vmexit.
> > > > >
> > > > > Note that this fixes the usage of x2APIC by the L1 VMM, at least when
> > > > > the L1 VMM is Xen.
> > > >
> > > > while this fix makes sense to me, can you also test other L1 VMMs,
> > > > so we don't overlook some other intentions covered or hidden by
> > > > removed logic?
> > >
> > > I could test other hypervisors, but do we really expect anything
> > > that's not Xen on Xen to work?
> > >
> > > I'm asking because that's the only combination that's actually tested
> > > by osstest.
> > >
> > > Thanks, Roger.
> >
> > If others are OK with your assumption, then it's fine. I didn't tightly
> > follow the nested virtualization requirements in Xen.
> >
> > On the other hand, I think this patch needs a revision. It is not bogus
> > to use virtual interrupt delivery on virtual VMexit, if "Ack interrupt
> > on exit" is off. In such case, the delivery doesn't happen until L1
> > hypervisor enables interrupt to clear interrupt window. Then it does
> > save one exit. The only bogus point is that nvmx_udpate_apicv doesn't
> > check "Ack interrupt on exit". So I prefer to add such check there
> > instead of completely removing this optimization.
> 
> I went back over this, and I'm still not sure calling
> nvmx_update_apicv is actually required: AFAICT vmx_intr_assist will
> already inject the interrupt correctly using virtual interrupt
> delivery if left pending on the vlapic. I guess the code in
> nvmx_update_apicv doesn't hurt as long as it's gated on "Ack on exit"
> not being enabled, but it's likely redundant.

It's not redundant. If you look at the code sequence, vmx_intr_assist
is invoked before nvmx_switch_guest. At that time, the L1 vCPU is still
in nested guest mode, thereby nvmx_intr_intercept takes effect which
injects the pending vector into vmcs02 and bypasses the remaining
virtual interrupt delivery logic for vmcs01. That is the main reason, imo,
why nvmx_update_apicv is introduced.

iiuc, nvmx_intr_intercept and nvmx_update_apicv work together to
complete nested interrupt injection:

(1) If "Ack interrupt on exit" is on, the pending vector is acked by 
the former and delivered in vvmexit information field.
(2) If "Ack interrupt on exit" is off and no virtual interrupt delivery, 
no ack and interrupt window is enabled by the former.
(3) Otherwise, the vector is acked by the latter and delivered through
virtual interrupt delivery (where vmcs01 has been switched in). 

However, there are two issues in current code. One is about (3), i.e.,
as you identified nvmx_update_apicv shouldn't blindly enable the
optimization without checking the Ack setting. the other is new 
about (2) - currently nvmx_intr_interrupt always enables interrupt 
window when the Ack setting is off, which actually negates the 
optimization of nvmx_update_apicv. Both should be fixed.

> 
> I will send an updated patch anyway, since I would like to get this
> sorted out sooner rather than later and will follow your advise of
> leaving nvmx_update_apicv in place gated by a check of whether "Ack on
> exit" is not enabled.
> 
> Thanks, Roger.

Roger Pau Monné Feb. 3, 2020, 11:55 a.m. UTC | #8

On Mon, Feb 03, 2020 at 07:24:04AM +0000, Tian, Kevin wrote:
> > From: Roger Pau Monné <roger.pau@citrix.com>
> > Sent: Thursday, January 23, 2020 8:32 PM
> > 
> > On Tue, Jan 21, 2020 at 03:34:13AM +0000, Tian, Kevin wrote:
> > > > From: Roger Pau Monné <roger.pau@citrix.com>
> > > > Sent: Monday, January 20, 2020 6:19 PM
> > > >
> > > > On Sun, Jan 19, 2020 at 04:15:04AM +0000, Tian, Kevin wrote:
> > > > > > From: Roger Pau Monne <roger.pau@citrix.com>
> > > > > > Sent: Wednesday, January 8, 2020 6:39 PM
> > > > > >
> > > > > > When doing a virtual vmexit (ie: a vmexit handled by the L1 VMM)
> > > > > > interrupts shouldn't be injected using the virtual interrupt delivery
> > > > > > mechanism, and instead should be signaled in the vmcs using the exit
> > > > > > reason and the interruption-information field if the "Acknowledge
> > > > > > interrupt on exit" vmexit control is set.
> > > > > >
> > > > > > Remove the nvmx_update_apicv helper: it's bogus to attempt to inject
> > > > > > interrupts on virtual vmexit using the virtual interrupt delivery
> > > > > > assistance, and it's also bogus to ack interrupts without checking if
> > > > > > the vmexit "Acknowledge interrupt on exit" vmexit control is set.
> > > > > > nvmx_intr_intercept already handles interrupts correctly on virtual
> > > > > > vmexit.
> > > > > >
> > > > > > Note that this fixes the usage of x2APIC by the L1 VMM, at least when
> > > > > > the L1 VMM is Xen.
> > > > >
> > > > > while this fix makes sense to me, can you also test other L1 VMMs,
> > > > > so we don't overlook some other intentions covered or hidden by
> > > > > removed logic?
> > > >
> > > > I could test other hypervisors, but do we really expect anything
> > > > that's not Xen on Xen to work?
> > > >
> > > > I'm asking because that's the only combination that's actually tested
> > > > by osstest.
> > > >
> > > > Thanks, Roger.
> > >
> > > If others are OK with your assumption, then it's fine. I didn't tightly
> > > follow the nested virtualization requirements in Xen.
> > >
> > > On the other hand, I think this patch needs a revision. It is not bogus
> > > to use virtual interrupt delivery on virtual VMexit, if "Ack interrupt
> > > on exit" is off. In such case, the delivery doesn't happen until L1
> > > hypervisor enables interrupt to clear interrupt window. Then it does
> > > save one exit. The only bogus point is that nvmx_udpate_apicv doesn't
> > > check "Ack interrupt on exit". So I prefer to add such check there
> > > instead of completely removing this optimization.
> > 
> > I went back over this, and I'm still not sure calling
> > nvmx_update_apicv is actually required: AFAICT vmx_intr_assist will
> > already inject the interrupt correctly using virtual interrupt
> > delivery if left pending on the vlapic. I guess the code in
> > nvmx_update_apicv doesn't hurt as long as it's gated on "Ack on exit"
> > not being enabled, but it's likely redundant.
> 
> It's not redundant. If you look at the code sequence, vmx_intr_assist
> is invoked before nvmx_switch_guest. At that time, the L1 vCPU is still
> in nested guest mode, thereby nvmx_intr_intercept takes effect which
> injects the pending vector into vmcs02 and bypasses the remaining
> virtual interrupt delivery logic for vmcs01. That is the main reason, imo,
> why nvmx_update_apicv is introduced.
> 
> iiuc, nvmx_intr_intercept and nvmx_update_apicv work together to
> complete nested interrupt injection:
> 
> (1) If "Ack interrupt on exit" is on, the pending vector is acked by 
> the former and delivered in vvmexit information field.
> (2) If "Ack interrupt on exit" is off and no virtual interrupt delivery, 
> no ack and interrupt window is enabled by the former.
> (3) Otherwise, the vector is acked by the latter and delivered through
> virtual interrupt delivery (where vmcs01 has been switched in). 
> 
> However, there are two issues in current code. One is about (3), i.e.,
> as you identified nvmx_update_apicv shouldn't blindly enable the
> optimization without checking the Ack setting. the other is new 
> about (2) - currently nvmx_intr_interrupt always enables interrupt 
> window when the Ack setting is off, which actually negates the 
> optimization of nvmx_update_apicv. Both should be fixed.

OK, I think I got it. It's likely however that vmx_intr_assist is also
called with the vmcs already switched to vmcs01 (if there's a pending
softirq for example), I guess vmx_intr_assist also copes correctly
when called with the vmcs already switched?

Thanks, Roger.

Tian, Kevin Feb. 4, 2020, 1:31 a.m. UTC | #9

> From: Roger Pau Monné <roger.pau@citrix.com>
> Sent: Monday, February 3, 2020 7:56 PM
> 
> On Mon, Feb 03, 2020 at 07:24:04AM +0000, Tian, Kevin wrote:
> > > From: Roger Pau Monné <roger.pau@citrix.com>
> > > Sent: Thursday, January 23, 2020 8:32 PM
> > >
> > > On Tue, Jan 21, 2020 at 03:34:13AM +0000, Tian, Kevin wrote:
> > > > > From: Roger Pau Monné <roger.pau@citrix.com>
> > > > > Sent: Monday, January 20, 2020 6:19 PM
> > > > >
> > > > > On Sun, Jan 19, 2020 at 04:15:04AM +0000, Tian, Kevin wrote:
> > > > > > > From: Roger Pau Monne <roger.pau@citrix.com>
> > > > > > > Sent: Wednesday, January 8, 2020 6:39 PM
> > > > > > >
> > > > > > > When doing a virtual vmexit (ie: a vmexit handled by the L1 VMM)
> > > > > > > interrupts shouldn't be injected using the virtual interrupt delivery
> > > > > > > mechanism, and instead should be signaled in the vmcs using the
> exit
> > > > > > > reason and the interruption-information field if the "Acknowledge
> > > > > > > interrupt on exit" vmexit control is set.
> > > > > > >
> > > > > > > Remove the nvmx_update_apicv helper: it's bogus to attempt to
> inject
> > > > > > > interrupts on virtual vmexit using the virtual interrupt delivery
> > > > > > > assistance, and it's also bogus to ack interrupts without checking if
> > > > > > > the vmexit "Acknowledge interrupt on exit" vmexit control is set.
> > > > > > > nvmx_intr_intercept already handles interrupts correctly on virtual
> > > > > > > vmexit.
> > > > > > >
> > > > > > > Note that this fixes the usage of x2APIC by the L1 VMM, at least
> when
> > > > > > > the L1 VMM is Xen.
> > > > > >
> > > > > > while this fix makes sense to me, can you also test other L1 VMMs,
> > > > > > so we don't overlook some other intentions covered or hidden by
> > > > > > removed logic?
> > > > >
> > > > > I could test other hypervisors, but do we really expect anything
> > > > > that's not Xen on Xen to work?
> > > > >
> > > > > I'm asking because that's the only combination that's actually tested
> > > > > by osstest.
> > > > >
> > > > > Thanks, Roger.
> > > >
> > > > If others are OK with your assumption, then it's fine. I didn't tightly
> > > > follow the nested virtualization requirements in Xen.
> > > >
> > > > On the other hand, I think this patch needs a revision. It is not bogus
> > > > to use virtual interrupt delivery on virtual VMexit, if "Ack interrupt
> > > > on exit" is off. In such case, the delivery doesn't happen until L1
> > > > hypervisor enables interrupt to clear interrupt window. Then it does
> > > > save one exit. The only bogus point is that nvmx_udpate_apicv doesn't
> > > > check "Ack interrupt on exit". So I prefer to add such check there
> > > > instead of completely removing this optimization.
> > >
> > > I went back over this, and I'm still not sure calling
> > > nvmx_update_apicv is actually required: AFAICT vmx_intr_assist will
> > > already inject the interrupt correctly using virtual interrupt
> > > delivery if left pending on the vlapic. I guess the code in
> > > nvmx_update_apicv doesn't hurt as long as it's gated on "Ack on exit"
> > > not being enabled, but it's likely redundant.
> >
> > It's not redundant. If you look at the code sequence, vmx_intr_assist
> > is invoked before nvmx_switch_guest. At that time, the L1 vCPU is still
> > in nested guest mode, thereby nvmx_intr_intercept takes effect which
> > injects the pending vector into vmcs02 and bypasses the remaining
> > virtual interrupt delivery logic for vmcs01. That is the main reason, imo,
> > why nvmx_update_apicv is introduced.
> >
> > iiuc, nvmx_intr_intercept and nvmx_update_apicv work together to
> > complete nested interrupt injection:
> >
> > (1) If "Ack interrupt on exit" is on, the pending vector is acked by
> > the former and delivered in vvmexit information field.
> > (2) If "Ack interrupt on exit" is off and no virtual interrupt delivery,
> > no ack and interrupt window is enabled by the former.
> > (3) Otherwise, the vector is acked by the latter and delivered through
> > virtual interrupt delivery (where vmcs01 has been switched in).
> >
> > However, there are two issues in current code. One is about (3), i.e.,
> > as you identified nvmx_update_apicv shouldn't blindly enable the
> > optimization without checking the Ack setting. the other is new
> > about (2) - currently nvmx_intr_interrupt always enables interrupt
> > window when the Ack setting is off, which actually negates the
> > optimization of nvmx_update_apicv. Both should be fixed.
> 
> OK, I think I got it. It's likely however that vmx_intr_assist is also
> called with the vmcs already switched to vmcs01 (if there's a pending
> softirq for example), I guess vmx_intr_assist also copes correctly
> when called with the vmcs already switched?
> 

Yes. What I described is for the case without pending softirqs.

[1/2] nvmx: fix handling of interrupts

Commit Message

Comments

Patch