diff mbox series

x86/dpci: remove the dpci EOI timer

Message ID 20210112173248.28646-1-roger.pau@citrix.com (mailing list archive)
State New
Headers show
Series x86/dpci: remove the dpci EOI timer | expand

Commit Message

Roger Pau Monné Jan. 12, 2021, 5:32 p.m. UTC
Current interrupt pass though code will setup a timer for each
interrupt injected to the guest that requires an EOI from the guest.
Such timer would perform two actions if the guest doesn't EOI the
interrupt before a given period of time. The first one is deasserting
the virtual line, the second is perform an EOI of the physical
interrupt source if it requires such.

The deasserting of the guest virtual line is wrong, since it messes
with the interrupt status of the guest. It's not clear why this was
odne in the first place, it should be the guest the one to EOI the
interrupt and thus deassert the line.

Performing an EOI of the physical interrupt source is redundant, since
there's already a timer that takes care of this for all interrupts,
not just the HVM dpci ones, see irq_guest_action_t struct eoi_timer
field.

Since both of the actions performed by the dpci timer are not
required, remove it altogether.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
As with previous patches, I'm having a hard time figuring out why this
was required in the first place. I see no reason for Xen to be
deasserting the guest virtual line. There's a comment:

/*
 * Set a timer to see if the guest can finish the interrupt or not. For
 * example, the guest OS may unmask the PIC during boot, before the
 * guest driver is loaded. hvm_pci_intx_assert() may succeed, but the
 * guest will never deal with the irq, then the physical interrupt line
 * will never be deasserted.
 */

Did this happen because the device was passed through in a bogus state
where it would generate interrupts without the guest requesting

Won't the guest face the same issues when booted on bare metal, and
thus would already have the means to deal with such issues?
---
 xen/drivers/passthrough/vtd/x86/hvm.c |  3 -
 xen/drivers/passthrough/x86/hvm.c     | 95 +--------------------------
 xen/include/asm-x86/hvm/irq.h         |  3 -
 xen/include/xen/iommu.h               |  5 --
 4 files changed, 2 insertions(+), 104 deletions(-)

Comments

Tian, Kevin Jan. 13, 2021, 6:21 a.m. UTC | #1
> From: Roger Pau Monne <roger.pau@citrix.com>
> Sent: Wednesday, January 13, 2021 1:33 AM
> 
> Current interrupt pass though code will setup a timer for each
> interrupt injected to the guest that requires an EOI from the guest.
> Such timer would perform two actions if the guest doesn't EOI the
> interrupt before a given period of time. The first one is deasserting
> the virtual line, the second is perform an EOI of the physical
> interrupt source if it requires such.
> 
> The deasserting of the guest virtual line is wrong, since it messes
> with the interrupt status of the guest. It's not clear why this was
> odne in the first place, it should be the guest the one to EOI the
> interrupt and thus deassert the line.
> 
> Performing an EOI of the physical interrupt source is redundant, since
> there's already a timer that takes care of this for all interrupts,
> not just the HVM dpci ones, see irq_guest_action_t struct eoi_timer
> field.
> 
> Since both of the actions performed by the dpci timer are not
> required, remove it altogether.
> 
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> ---
> As with previous patches, I'm having a hard time figuring out why this
> was required in the first place. I see no reason for Xen to be
> deasserting the guest virtual line. There's a comment:
> 
> /*
>  * Set a timer to see if the guest can finish the interrupt or not. For
>  * example, the guest OS may unmask the PIC during boot, before the
>  * guest driver is loaded. hvm_pci_intx_assert() may succeed, but the
>  * guest will never deal with the irq, then the physical interrupt line
>  * will never be deasserted.
>  */
> 
> Did this happen because the device was passed through in a bogus state
> where it would generate interrupts without the guest requesting

It could be a case where two devices share the same interrupt line and
are assigned to different domains. In this case, the interrupt activity of 
two devices interfere with each other.


> 
> Won't the guest face the same issues when booted on bare metal, and
> thus would already have the means to deal with such issues?

The original commit was added by me in ~13yrs ago (0f843ba00c95)
when enabling Xen in a client virtualization environment where interrupt
sharing is popular. I believe above comment was recorded for a real 
problem at the moment (deassert resets the intx line to unblock further
interrupts). But I'm not sure whether it is still the case after both Xen and 
guest OS have changed a lot. At least some test from people who
still use Xen in shared interrupt scenario would be helpful. Or, if such
usage is already niche, maybe we can consider disallow passing through
devices which share the same interrupt line to different domains and
then safely remove this dpci EOI trick.

Thanks
Kevin

> ---
>  xen/drivers/passthrough/vtd/x86/hvm.c |  3 -
>  xen/drivers/passthrough/x86/hvm.c     | 95 +--------------------------
>  xen/include/asm-x86/hvm/irq.h         |  3 -
>  xen/include/xen/iommu.h               |  5 --
>  4 files changed, 2 insertions(+), 104 deletions(-)
> 
> diff --git a/xen/drivers/passthrough/vtd/x86/hvm.c
> b/xen/drivers/passthrough/vtd/x86/hvm.c
> index f77b35815c..b531fe907a 100644
> --- a/xen/drivers/passthrough/vtd/x86/hvm.c
> +++ b/xen/drivers/passthrough/vtd/x86/hvm.c
> @@ -36,10 +36,7 @@ static int _hvm_dpci_isairq_eoi(struct domain *d,
>          {
>              hvm_pci_intx_deassert(d, digl->device, digl->intx);
>              if ( --pirq_dpci->pending == 0 )
> -            {
> -                stop_timer(&pirq_dpci->timer);
>                  pirq_guest_eoi(dpci_pirq(pirq_dpci));
> -            }
>          }
>      }
> 
> diff --git a/xen/drivers/passthrough/x86/hvm.c
> b/xen/drivers/passthrough/x86/hvm.c
> index edc8059518..5d901db50c 100644
> --- a/xen/drivers/passthrough/x86/hvm.c
> +++ b/xen/drivers/passthrough/x86/hvm.c
> @@ -136,77 +136,6 @@ static void pt_pirq_softirq_reset(struct
> hvm_pirq_dpci *pirq_dpci)
>      pirq_dpci->masked = 0;
>  }
> 
> -bool pt_irq_need_timer(uint32_t flags)
> -{
> -    return !(flags & (HVM_IRQ_DPCI_GUEST_MSI |
> HVM_IRQ_DPCI_TRANSLATE |
> -                      HVM_IRQ_DPCI_NO_EOI));
> -}
> -
> -static int pt_irq_guest_eoi(struct domain *d, struct hvm_pirq_dpci
> *pirq_dpci,
> -                            void *arg)
> -{
> -    if ( __test_and_clear_bit(_HVM_IRQ_DPCI_EOI_LATCH_SHIFT,
> -                              &pirq_dpci->flags) )
> -    {
> -        pirq_dpci->masked = 0;
> -        pirq_dpci->pending = 0;
> -        pirq_guest_eoi(dpci_pirq(pirq_dpci));
> -    }
> -
> -    return 0;
> -}
> -
> -static void pt_irq_time_out(void *data)
> -{
> -    struct hvm_pirq_dpci *irq_map = data;
> -    const struct hvm_irq_dpci *dpci;
> -    const struct dev_intx_gsi_link *digl;
> -
> -    spin_lock(&irq_map->dom->event_lock);
> -
> -    if ( irq_map->flags & HVM_IRQ_DPCI_IDENTITY_GSI )
> -    {
> -        ASSERT(is_hardware_domain(irq_map->dom));
> -        /*
> -         * Identity mapped, no need to iterate over the guest GSI list to find
> -         * other pirqs sharing the same guest GSI.
> -         *
> -         * In the identity mapped case the EOI can also be done now, this way
> -         * the iteration over the list of domain pirqs is avoided.
> -         */
> -        hvm_gsi_deassert(irq_map->dom, dpci_pirq(irq_map)->pirq);
> -        irq_map->flags |= HVM_IRQ_DPCI_EOI_LATCH;
> -        pt_irq_guest_eoi(irq_map->dom, irq_map, NULL);
> -        spin_unlock(&irq_map->dom->event_lock);
> -        return;
> -    }
> -
> -    dpci = domain_get_irq_dpci(irq_map->dom);
> -    if ( unlikely(!dpci) )
> -    {
> -        ASSERT_UNREACHABLE();
> -        spin_unlock(&irq_map->dom->event_lock);
> -        return;
> -    }
> -    list_for_each_entry ( digl, &irq_map->digl_list, list )
> -    {
> -        unsigned int guest_gsi = hvm_pci_intx_gsi(digl->device, digl->intx);
> -        const struct hvm_girq_dpci_mapping *girq;
> -
> -        list_for_each_entry ( girq, &dpci->girq[guest_gsi], list )
> -        {
> -            struct pirq *pirq = pirq_info(irq_map->dom, girq->machine_gsi);
> -
> -            pirq_dpci(pirq)->flags |= HVM_IRQ_DPCI_EOI_LATCH;
> -        }
> -        hvm_pci_intx_deassert(irq_map->dom, digl->device, digl->intx);
> -    }
> -
> -    pt_pirq_iterate(irq_map->dom, pt_irq_guest_eoi, NULL);
> -
> -    spin_unlock(&irq_map->dom->event_lock);
> -}
> -
>  struct hvm_irq_dpci *domain_get_irq_dpci(const struct domain *d)
>  {
>      if ( !d || !is_hvm_domain(d) )
> @@ -568,15 +497,10 @@ int pt_irq_create_bind(
>                  }
>              }
> 
> -            /* Init timer before binding */
> -            if ( pt_irq_need_timer(pirq_dpci->flags) )
> -                init_timer(&pirq_dpci->timer, pt_irq_time_out, pirq_dpci, 0);
>              /* Deal with gsi for legacy devices */
>              rc = pirq_guest_bind(d->vcpu[0], info, share);
>              if ( unlikely(rc) )
>              {
> -                if ( pt_irq_need_timer(pirq_dpci->flags) )
> -                    kill_timer(&pirq_dpci->timer);
>                  /*
>                   * There is no path for __do_IRQ to schedule softirq as
>                   * IRQ_GUEST is not set. As such we can reset 'dom' directly.
> @@ -743,8 +667,6 @@ int pt_irq_destroy_bind(
>      {
>          pirq_guest_unbind(d, pirq);
>          msixtbl_pt_unregister(d, pirq);
> -        if ( pt_irq_need_timer(pirq_dpci->flags) )
> -            kill_timer(&pirq_dpci->timer);
>          pirq_dpci->flags = 0;
>          /*
>           * See comment in pt_irq_create_bind's PT_IRQ_TYPE_MSI before the
> @@ -934,16 +856,6 @@ static void hvm_dirq_assist(struct domain *d, struct
> hvm_pirq_dpci *pirq_dpci)
>              __msi_pirq_eoi(pirq_dpci);
>              goto out;
>          }
> -
> -        /*
> -         * Set a timer to see if the guest can finish the interrupt or not. For
> -         * example, the guest OS may unmask the PIC during boot, before the
> -         * guest driver is loaded. hvm_pci_intx_assert() may succeed, but the
> -         * guest will never deal with the irq, then the physical interrupt line
> -         * will never be deasserted.
> -         */
> -        ASSERT(pt_irq_need_timer(pirq_dpci->flags));
> -        set_timer(&pirq_dpci->timer, NOW() + PT_IRQ_TIME_OUT);
>      }
> 
>   out:
> @@ -967,10 +879,10 @@ static void hvm_pirq_eoi(struct pirq *pirq)
>       * since interrupt is still not EOIed
>       */
>      if ( --pirq_dpci->pending ||
> -         !pt_irq_need_timer(pirq_dpci->flags) )
> +         /* When the interrupt source is MSI no Ack should be performed. */
> +         pirq_dpci->flags & HVM_IRQ_DPCI_TRANSLATE )
>          return;
> 
> -    stop_timer(&pirq_dpci->timer);
>      pirq_guest_eoi(pirq);
>  }
> 
> @@ -1038,9 +950,6 @@ static int pci_clean_dpci_irq(struct domain *d,
> 
>      pirq_guest_unbind(d, dpci_pirq(pirq_dpci));
> 
> -    if ( pt_irq_need_timer(pirq_dpci->flags) )
> -        kill_timer(&pirq_dpci->timer);
> -
>      list_for_each_entry_safe ( digl, tmp, &pirq_dpci->digl_list, list )
>      {
>          list_del(&digl->list);
> diff --git a/xen/include/asm-x86/hvm/irq.h b/xen/include/asm-
> x86/hvm/irq.h
> index 532880d497..d40e13de6e 100644
> --- a/xen/include/asm-x86/hvm/irq.h
> +++ b/xen/include/asm-x86/hvm/irq.h
> @@ -117,7 +117,6 @@ struct dev_intx_gsi_link {
>  #define _HVM_IRQ_DPCI_MACH_PCI_SHIFT            0
>  #define _HVM_IRQ_DPCI_MACH_MSI_SHIFT            1
>  #define _HVM_IRQ_DPCI_MAPPED_SHIFT              2
> -#define _HVM_IRQ_DPCI_EOI_LATCH_SHIFT           3
>  #define _HVM_IRQ_DPCI_GUEST_PCI_SHIFT           4
>  #define _HVM_IRQ_DPCI_GUEST_MSI_SHIFT           5
>  #define _HVM_IRQ_DPCI_IDENTITY_GSI_SHIFT        6
> @@ -126,7 +125,6 @@ struct dev_intx_gsi_link {
>  #define HVM_IRQ_DPCI_MACH_PCI        (1u <<
> _HVM_IRQ_DPCI_MACH_PCI_SHIFT)
>  #define HVM_IRQ_DPCI_MACH_MSI        (1u <<
> _HVM_IRQ_DPCI_MACH_MSI_SHIFT)
>  #define HVM_IRQ_DPCI_MAPPED          (1u <<
> _HVM_IRQ_DPCI_MAPPED_SHIFT)
> -#define HVM_IRQ_DPCI_EOI_LATCH       (1u <<
> _HVM_IRQ_DPCI_EOI_LATCH_SHIFT)
>  #define HVM_IRQ_DPCI_GUEST_PCI       (1u <<
> _HVM_IRQ_DPCI_GUEST_PCI_SHIFT)
>  #define HVM_IRQ_DPCI_GUEST_MSI       (1u <<
> _HVM_IRQ_DPCI_GUEST_MSI_SHIFT)
>  #define HVM_IRQ_DPCI_IDENTITY_GSI    (1u <<
> _HVM_IRQ_DPCI_IDENTITY_GSI_SHIFT)
> @@ -173,7 +171,6 @@ struct hvm_pirq_dpci {
>      struct list_head digl_list;
>      struct domain *dom;
>      struct hvm_gmsi_info gmsi;
> -    struct timer timer;
>      struct list_head softirq_list;
>  };
> 
> diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h
> index f0295fd6c3..4f3098b6ed 100644
> --- a/xen/include/xen/iommu.h
> +++ b/xen/include/xen/iommu.h
> @@ -184,11 +184,6 @@ int pt_irq_destroy_bind(struct domain *, const
> struct xen_domctl_bind_pt_irq *);
>  void hvm_dpci_isairq_eoi(struct domain *d, unsigned int isairq);
>  struct hvm_irq_dpci *domain_get_irq_dpci(const struct domain *);
>  void free_hvm_irq_dpci(struct hvm_irq_dpci *dpci);
> -#ifdef CONFIG_HVM
> -bool pt_irq_need_timer(uint32_t flags);
> -#else
> -static inline bool pt_irq_need_timer(unsigned int flags) { return false; }
> -#endif
> 
>  struct msi_desc;
>  struct msi_msg;
> --
> 2.29.2
Roger Pau Monné Jan. 13, 2021, 1:11 p.m. UTC | #2
On Wed, Jan 13, 2021 at 06:21:03AM +0000, Tian, Kevin wrote:
> > From: Roger Pau Monne <roger.pau@citrix.com>
> > Sent: Wednesday, January 13, 2021 1:33 AM
> > 
> > Current interrupt pass though code will setup a timer for each
> > interrupt injected to the guest that requires an EOI from the guest.
> > Such timer would perform two actions if the guest doesn't EOI the
> > interrupt before a given period of time. The first one is deasserting
> > the virtual line, the second is perform an EOI of the physical
> > interrupt source if it requires such.
> > 
> > The deasserting of the guest virtual line is wrong, since it messes
> > with the interrupt status of the guest. It's not clear why this was
> > odne in the first place, it should be the guest the one to EOI the
> > interrupt and thus deassert the line.
> > 
> > Performing an EOI of the physical interrupt source is redundant, since
> > there's already a timer that takes care of this for all interrupts,
> > not just the HVM dpci ones, see irq_guest_action_t struct eoi_timer
> > field.
> > 
> > Since both of the actions performed by the dpci timer are not
> > required, remove it altogether.
> > 
> > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> > ---
> > As with previous patches, I'm having a hard time figuring out why this
> > was required in the first place. I see no reason for Xen to be
> > deasserting the guest virtual line. There's a comment:
> > 
> > /*
> >  * Set a timer to see if the guest can finish the interrupt or not. For
> >  * example, the guest OS may unmask the PIC during boot, before the
> >  * guest driver is loaded. hvm_pci_intx_assert() may succeed, but the
> >  * guest will never deal with the irq, then the physical interrupt line
> >  * will never be deasserted.
> >  */
> > 
> > Did this happen because the device was passed through in a bogus state
> > where it would generate interrupts without the guest requesting
> 
> It could be a case where two devices share the same interrupt line and
> are assigned to different domains. In this case, the interrupt activity of 
> two devices interfere with each other.

This would also seem to be problematic if the device decides to use
MSI instead of INTx, but due to the shared nature of the INTx line we
would continue to inject INTx (generated from another device not
passed through to the guest) to the guest even if the device has
switched to MSI.

> > 
> > Won't the guest face the same issues when booted on bare metal, and
> > thus would already have the means to deal with such issues?
> 
> The original commit was added by me in ~13yrs ago (0f843ba00c95)
> when enabling Xen in a client virtualization environment where interrupt
> sharing is popular.

Thanks, the reference to the above commit is helpful, I wasn't able to
find it and it contains a comment that I think has been lost, which
provides the background on why this was added.

> I believe above comment was recorded for a real 
> problem at the moment (deassert resets the intx line to unblock further
> interrupts). But I'm not sure whether it is still the case after both Xen and 
> guest OS have changed a lot. At least some test from people who
> still use Xen in shared interrupt scenario would be helpful. Or, if such
> usage is already niche, maybe we can consider disallow passing through
> devices which share the same interrupt line to different domains and
> then safely remove this dpci EOI trick.

So the deassert done by timeout only deasserts the virtual line, but
doesn't for example clear the IRR bit from the vIO-APIC pin, which
will cause further interrupts to not be delivered anyway until a
proper EOI (or a switch to trigger mode) is done to the pin.

I think it's going to be complicated for me to find a system that has
two devices I can passthrough sharing the same GSI.

It also makes me wonder why do we need to keep all this internal state
about virtual lines and assert counts. Such virtual lines will get
de-asserted when the guest perform an EOI of the interrupt, which also
clear the PIC pending bits and allows further interrupts to be
injected. Wouldn't it be enough to simply try to assert the guest
irq/gsi, and let the interrupt controller decide whether it needs to
inject another interrupt, or there's one already pending?

vIO-APIC has the IRR field which I think it's fairly similar to what
Xen achieves with all the line logic: prevent another interrupt
injection until the current one has been EOI'ed. The i8259 doesn't
have such field, and might indeed end up with an extra interrupt
queued in IRR while there's one still pending in ISR, but I don't
think that would be harmful to the guest.

Thanks, Roger.
Jason Andryuk Jan. 13, 2021, 3:48 p.m. UTC | #3
On Wed, Jan 13, 2021 at 8:11 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
>
> On Wed, Jan 13, 2021 at 06:21:03AM +0000, Tian, Kevin wrote:
> > > From: Roger Pau Monne <roger.pau@citrix.com>
> > > Sent: Wednesday, January 13, 2021 1:33 AM
> > >
> > > Current interrupt pass though code will setup a timer for each
> > > interrupt injected to the guest that requires an EOI from the guest.
> > > Such timer would perform two actions if the guest doesn't EOI the
> > > interrupt before a given period of time. The first one is deasserting
> > > the virtual line, the second is perform an EOI of the physical
> > > interrupt source if it requires such.
> > >
> > > The deasserting of the guest virtual line is wrong, since it messes
> > > with the interrupt status of the guest. It's not clear why this was
> > > odne in the first place, it should be the guest the one to EOI the
> > > interrupt and thus deassert the line.
> > >
> > > Performing an EOI of the physical interrupt source is redundant, since
> > > there's already a timer that takes care of this for all interrupts,
> > > not just the HVM dpci ones, see irq_guest_action_t struct eoi_timer
> > > field.
> > >
> > > Since both of the actions performed by the dpci timer are not
> > > required, remove it altogether.
> > >
> > > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> > > ---
> > > As with previous patches, I'm having a hard time figuring out why this
> > > was required in the first place. I see no reason for Xen to be
> > > deasserting the guest virtual line. There's a comment:
> > >
> > > /*
> > >  * Set a timer to see if the guest can finish the interrupt or not. For
> > >  * example, the guest OS may unmask the PIC during boot, before the
> > >  * guest driver is loaded. hvm_pci_intx_assert() may succeed, but the
> > >  * guest will never deal with the irq, then the physical interrupt line
> > >  * will never be deasserted.
> > >  */
> > >
> > > Did this happen because the device was passed through in a bogus state
> > > where it would generate interrupts without the guest requesting
> >
> > It could be a case where two devices share the same interrupt line and
> > are assigned to different domains. In this case, the interrupt activity of
> > two devices interfere with each other.
>
> This would also seem to be problematic if the device decides to use
> MSI instead of INTx, but due to the shared nature of the INTx line we
> would continue to inject INTx (generated from another device not
> passed through to the guest) to the guest even if the device has
> switched to MSI.
>
> > >
> > > Won't the guest face the same issues when booted on bare metal, and
> > > thus would already have the means to deal with such issues?
> >
> > The original commit was added by me in ~13yrs ago (0f843ba00c95)
> > when enabling Xen in a client virtualization environment where interrupt
> > sharing is popular.
>
> Thanks, the reference to the above commit is helpful, I wasn't able to
> find it and it contains a comment that I think has been lost, which
> provides the background on why this was added.
>
> > I believe above comment was recorded for a real
> > problem at the moment (deassert resets the intx line to unblock further
> > interrupts). But I'm not sure whether it is still the case after both Xen and
> > guest OS have changed a lot. At least some test from people who
> > still use Xen in shared interrupt scenario would be helpful. Or, if such
> > usage is already niche, maybe we can consider disallow passing through
> > devices which share the same interrupt line to different domains and
> > then safely remove this dpci EOI trick.
>
> So the deassert done by timeout only deasserts the virtual line, but
> doesn't for example clear the IRR bit from the vIO-APIC pin, which
> will cause further interrupts to not be delivered anyway until a
> proper EOI (or a switch to trigger mode) is done to the pin.
>
> I think it's going to be complicated for me to find a system that has
> two devices I can passthrough sharing the same GSI.

I have some laptops running OpenXT where the USB controller and NIC
share an interrupt, and I assign them to different domains.  Qubes
would hit this as well.

(I hoped MSI translate would help me sidestep the need to XSM label
the shared PIRQ, but as the other thread mentions, qemu-upstream
doesn't support that.  I started using this instead:
https://lore.kernel.org/xen-devel/20201019200318.103781-1-jandryuk@gmail.com/)

Regards,
Jason
Roger Pau Monné Jan. 13, 2021, 6:05 p.m. UTC | #4
On Wed, Jan 13, 2021 at 10:48:52AM -0500, Jason Andryuk wrote:
> On Wed, Jan 13, 2021 at 8:11 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
> >
> > On Wed, Jan 13, 2021 at 06:21:03AM +0000, Tian, Kevin wrote:
> > > > From: Roger Pau Monne <roger.pau@citrix.com>
> > > > Sent: Wednesday, January 13, 2021 1:33 AM
> > > >
> > > > Current interrupt pass though code will setup a timer for each
> > > > interrupt injected to the guest that requires an EOI from the guest.
> > > > Such timer would perform two actions if the guest doesn't EOI the
> > > > interrupt before a given period of time. The first one is deasserting
> > > > the virtual line, the second is perform an EOI of the physical
> > > > interrupt source if it requires such.
> > > >
> > > > The deasserting of the guest virtual line is wrong, since it messes
> > > > with the interrupt status of the guest. It's not clear why this was
> > > > odne in the first place, it should be the guest the one to EOI the
> > > > interrupt and thus deassert the line.
> > > >
> > > > Performing an EOI of the physical interrupt source is redundant, since
> > > > there's already a timer that takes care of this for all interrupts,
> > > > not just the HVM dpci ones, see irq_guest_action_t struct eoi_timer
> > > > field.
> > > >
> > > > Since both of the actions performed by the dpci timer are not
> > > > required, remove it altogether.
> > > >
> > > > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> > > > ---
> > > > As with previous patches, I'm having a hard time figuring out why this
> > > > was required in the first place. I see no reason for Xen to be
> > > > deasserting the guest virtual line. There's a comment:
> > > >
> > > > /*
> > > >  * Set a timer to see if the guest can finish the interrupt or not. For
> > > >  * example, the guest OS may unmask the PIC during boot, before the
> > > >  * guest driver is loaded. hvm_pci_intx_assert() may succeed, but the
> > > >  * guest will never deal with the irq, then the physical interrupt line
> > > >  * will never be deasserted.
> > > >  */
> > > >
> > > > Did this happen because the device was passed through in a bogus state
> > > > where it would generate interrupts without the guest requesting
> > >
> > > It could be a case where two devices share the same interrupt line and
> > > are assigned to different domains. In this case, the interrupt activity of
> > > two devices interfere with each other.
> >
> > This would also seem to be problematic if the device decides to use
> > MSI instead of INTx, but due to the shared nature of the INTx line we
> > would continue to inject INTx (generated from another device not
> > passed through to the guest) to the guest even if the device has
> > switched to MSI.
> >
> > > >
> > > > Won't the guest face the same issues when booted on bare metal, and
> > > > thus would already have the means to deal with such issues?
> > >
> > > The original commit was added by me in ~13yrs ago (0f843ba00c95)
> > > when enabling Xen in a client virtualization environment where interrupt
> > > sharing is popular.
> >
> > Thanks, the reference to the above commit is helpful, I wasn't able to
> > find it and it contains a comment that I think has been lost, which
> > provides the background on why this was added.
> >
> > > I believe above comment was recorded for a real
> > > problem at the moment (deassert resets the intx line to unblock further
> > > interrupts). But I'm not sure whether it is still the case after both Xen and
> > > guest OS have changed a lot. At least some test from people who
> > > still use Xen in shared interrupt scenario would be helpful. Or, if such
> > > usage is already niche, maybe we can consider disallow passing through
> > > devices which share the same interrupt line to different domains and
> > > then safely remove this dpci EOI trick.
> >
> > So the deassert done by timeout only deasserts the virtual line, but
> > doesn't for example clear the IRR bit from the vIO-APIC pin, which
> > will cause further interrupts to not be delivered anyway until a
> > proper EOI (or a switch to trigger mode) is done to the pin.
> >
> > I think it's going to be complicated for me to find a system that has
> > two devices I can passthrough sharing the same GSI.
> 
> I have some laptops running OpenXT where the USB controller and NIC
> share an interrupt, and I assign them to different domains.  Qubes
> would hit this as well.

Is there any chance you could try the patch and see if you can hit the
issue it was trying to fix?

It would be good to be able to reproduce it so that I could work on an
alternative fix that doesn't require having two timers for each IRQ in
flight.

> (I hoped MSI translate would help me sidestep the need to XSM label
> the shared PIRQ, but as the other thread mentions, qemu-upstream
> doesn't support that.  I started using this instead:
> https://lore.kernel.org/xen-devel/20201019200318.103781-1-jandryuk@gmail.com/)

MSI translate is also on my sight for removal, as you have seen from
that other thread :).

Thanks, Roger.
Jason Andryuk Jan. 13, 2021, 6:34 p.m. UTC | #5
On Wed, Jan 13, 2021 at 1:06 PM Roger Pau Monné <roger.pau@citrix.com> wrote:
>
> On Wed, Jan 13, 2021 at 10:48:52AM -0500, Jason Andryuk wrote:
> > On Wed, Jan 13, 2021 at 8:11 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
> > >
> > > On Wed, Jan 13, 2021 at 06:21:03AM +0000, Tian, Kevin wrote:
> > > > > From: Roger Pau Monne <roger.pau@citrix.com>
> > > > > Sent: Wednesday, January 13, 2021 1:33 AM
> > > > >
> > > > > Current interrupt pass though code will setup a timer for each
> > > > > interrupt injected to the guest that requires an EOI from the guest.
> > > > > Such timer would perform two actions if the guest doesn't EOI the
> > > > > interrupt before a given period of time. The first one is deasserting
> > > > > the virtual line, the second is perform an EOI of the physical
> > > > > interrupt source if it requires such.
> > > > >
> > > > > The deasserting of the guest virtual line is wrong, since it messes
> > > > > with the interrupt status of the guest. It's not clear why this was
> > > > > odne in the first place, it should be the guest the one to EOI the
> > > > > interrupt and thus deassert the line.
> > > > >
> > > > > Performing an EOI of the physical interrupt source is redundant, since
> > > > > there's already a timer that takes care of this for all interrupts,
> > > > > not just the HVM dpci ones, see irq_guest_action_t struct eoi_timer
> > > > > field.
> > > > >
> > > > > Since both of the actions performed by the dpci timer are not
> > > > > required, remove it altogether.
> > > > >
> > > > > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> > > > > ---
> > > > > As with previous patches, I'm having a hard time figuring out why this
> > > > > was required in the first place. I see no reason for Xen to be
> > > > > deasserting the guest virtual line. There's a comment:
> > > > >
> > > > > /*
> > > > >  * Set a timer to see if the guest can finish the interrupt or not. For
> > > > >  * example, the guest OS may unmask the PIC during boot, before the
> > > > >  * guest driver is loaded. hvm_pci_intx_assert() may succeed, but the
> > > > >  * guest will never deal with the irq, then the physical interrupt line
> > > > >  * will never be deasserted.
> > > > >  */
> > > > >
> > > > > Did this happen because the device was passed through in a bogus state
> > > > > where it would generate interrupts without the guest requesting
> > > >
> > > > It could be a case where two devices share the same interrupt line and
> > > > are assigned to different domains. In this case, the interrupt activity of
> > > > two devices interfere with each other.
> > >
> > > This would also seem to be problematic if the device decides to use
> > > MSI instead of INTx, but due to the shared nature of the INTx line we
> > > would continue to inject INTx (generated from another device not
> > > passed through to the guest) to the guest even if the device has
> > > switched to MSI.
> > >
> > > > >
> > > > > Won't the guest face the same issues when booted on bare metal, and
> > > > > thus would already have the means to deal with such issues?
> > > >
> > > > The original commit was added by me in ~13yrs ago (0f843ba00c95)
> > > > when enabling Xen in a client virtualization environment where interrupt
> > > > sharing is popular.
> > >
> > > Thanks, the reference to the above commit is helpful, I wasn't able to
> > > find it and it contains a comment that I think has been lost, which
> > > provides the background on why this was added.
> > >
> > > > I believe above comment was recorded for a real
> > > > problem at the moment (deassert resets the intx line to unblock further
> > > > interrupts). But I'm not sure whether it is still the case after both Xen and
> > > > guest OS have changed a lot. At least some test from people who
> > > > still use Xen in shared interrupt scenario would be helpful. Or, if such
> > > > usage is already niche, maybe we can consider disallow passing through
> > > > devices which share the same interrupt line to different domains and
> > > > then safely remove this dpci EOI trick.
> > >
> > > So the deassert done by timeout only deasserts the virtual line, but
> > > doesn't for example clear the IRR bit from the vIO-APIC pin, which
> > > will cause further interrupts to not be delivered anyway until a
> > > proper EOI (or a switch to trigger mode) is done to the pin.
> > >
> > > I think it's going to be complicated for me to find a system that has
> > > two devices I can passthrough sharing the same GSI.
> >
> > I have some laptops running OpenXT where the USB controller and NIC
> > share an interrupt, and I assign them to different domains.  Qubes
> > would hit this as well.
>
> Is there any chance you could try the patch and see if you can hit the
> issue it was trying to fix?

Would testing a backport to 4.12 be useful?  There were some file
renames, but it looks to apply.  The only difference is the 4.12
hvm_pirq_eoi hunk still has `(ent && ent->fields.mask) || `.  Maybe
backport commit eb298f32fac5ac "x86/dpci: EOI interrupt regardless of
its masking status" as well?

Testing staging would take a little longer to make a machine available.

I guess I'd also need to disable MSI for the two devices to ensure
they are both using the GSI?

Regards,
Jason
Jason Andryuk Jan. 13, 2021, 9:31 p.m. UTC | #6
On Wed, Jan 13, 2021 at 1:34 PM Jason Andryuk <jandryuk@gmail.com> wrote:
>
> On Wed, Jan 13, 2021 at 1:06 PM Roger Pau Monné <roger.pau@citrix.com> wrote:
> >
> > On Wed, Jan 13, 2021 at 10:48:52AM -0500, Jason Andryuk wrote:

<snip>

> > > I have some laptops running OpenXT where the USB controller and NIC
> > > share an interrupt, and I assign them to different domains.  Qubes
> > > would hit this as well.
> >
> > Is there any chance you could try the patch and see if you can hit the
> > issue it was trying to fix?
>
> Would testing a backport to 4.12 be useful?  There were some file
> renames, but it looks to apply.  The only difference is the 4.12
> hvm_pirq_eoi hunk still has `(ent && ent->fields.mask) || `.  Maybe
> backport commit eb298f32fac5ac "x86/dpci: EOI interrupt regardless of
> its masking status" as well?

Ok, I added these two patches to OpenXT with Xen 4.12.

> Testing staging would take a little longer to make a machine available.
>
> I guess I'd also need to disable MSI for the two devices to ensure
> they are both using the GSI?

lspci in dom0 shows the USB xhci controller, iwlwifi, and e1000e
devices all with IRQ 16 and all with MSI disabled ("MSI: Enabled-").
The two linux HVMs run with (PV) linux stubdoms, and the HVM kernels
were started with pci=nosmi.  Networking through the iwlwifi device
works for that VM while a mouse attached to the xhci controller is
also working in the second VM.  Is there something else I should test?

Regards,
Jason
Roger Pau Monné Jan. 14, 2021, 10:22 a.m. UTC | #7
On Wed, Jan 13, 2021 at 04:31:33PM -0500, Jason Andryuk wrote:
> On Wed, Jan 13, 2021 at 1:34 PM Jason Andryuk <jandryuk@gmail.com> wrote:
> >
> > On Wed, Jan 13, 2021 at 1:06 PM Roger Pau Monné <roger.pau@citrix.com> wrote:
> > >
> > > On Wed, Jan 13, 2021 at 10:48:52AM -0500, Jason Andryuk wrote:
> 
> <snip>
> 
> > > > I have some laptops running OpenXT where the USB controller and NIC
> > > > share an interrupt, and I assign them to different domains.  Qubes
> > > > would hit this as well.
> > >
> > > Is there any chance you could try the patch and see if you can hit the
> > > issue it was trying to fix?
> >
> > Would testing a backport to 4.12 be useful?  There were some file
> > renames, but it looks to apply.  The only difference is the 4.12
> > hvm_pirq_eoi hunk still has `(ent && ent->fields.mask) || `.  Maybe
> > backport commit eb298f32fac5ac "x86/dpci: EOI interrupt regardless of
> > its masking status" as well?

Yes, backporting that one should be harmless.

> Ok, I added these two patches to OpenXT with Xen 4.12.

Thanks!

> > Testing staging would take a little longer to make a machine available.
> >
> > I guess I'd also need to disable MSI for the two devices to ensure
> > they are both using the GSI?
> 
> lspci in dom0 shows the USB xhci controller, iwlwifi, and e1000e
> devices all with IRQ 16 and all with MSI disabled ("MSI: Enabled-").
> The two linux HVMs run with (PV) linux stubdoms, and the HVM kernels
> were started with pci=nosmi.  Networking through the iwlwifi device
> works for that VM while a mouse attached to the xhci controller is
> also working in the second VM.  Is there something else I should test?

Not really, I think that test should be good enough, the issue is that
we don't know which OS was seeing the issues noted by Kevin.

To make sure you trigger the right scenario, could you start the
iwlwifi HVM guest first, and then stress test the network device
(using iperf for example) while you bring up the second VM that uses
the xhci device?

Thanks, Roger.
Jan Beulich Jan. 14, 2021, 11:42 a.m. UTC | #8
On 13.01.2021 22:31, Jason Andryuk wrote:
> On Wed, Jan 13, 2021 at 1:34 PM Jason Andryuk <jandryuk@gmail.com> wrote:
>> I guess I'd also need to disable MSI for the two devices to ensure
>> they are both using the GSI?
> 
> lspci in dom0 shows the USB xhci controller, iwlwifi, and e1000e
> devices all with IRQ 16 and all with MSI disabled ("MSI: Enabled-").
> The two linux HVMs run with (PV) linux stubdoms, and the HVM kernels
> were started with pci=nosmi.

Just to be sure (because "smi" isn't an entirely meaningless
acronym): You did actually test with "pci=nomsi"?

Jan
Jan Beulich Jan. 14, 2021, 11:45 a.m. UTC | #9
On 14.01.2021 11:22, Roger Pau Monné wrote:
> On Wed, Jan 13, 2021 at 04:31:33PM -0500, Jason Andryuk wrote:
>> On Wed, Jan 13, 2021 at 1:34 PM Jason Andryuk <jandryuk@gmail.com> wrote:
>>> I guess I'd also need to disable MSI for the two devices to ensure
>>> they are both using the GSI?
>>
>> lspci in dom0 shows the USB xhci controller, iwlwifi, and e1000e
>> devices all with IRQ 16 and all with MSI disabled ("MSI: Enabled-").
>> The two linux HVMs run with (PV) linux stubdoms, and the HVM kernels
>> were started with pci=nosmi.  Networking through the iwlwifi device
>> works for that VM while a mouse attached to the xhci controller is
>> also working in the second VM.  Is there something else I should test?
> 
> Not really, I think that test should be good enough, the issue is that
> we don't know which OS was seeing the issues noted by Kevin.

Why a specific OS? Isn't this also guarding against malice?

Jan
Jan Beulich Jan. 14, 2021, 11:48 a.m. UTC | #10
On 13.01.2021 14:11, Roger Pau Monné wrote:
> On Wed, Jan 13, 2021 at 06:21:03AM +0000, Tian, Kevin wrote:
>>> From: Roger Pau Monne <roger.pau@citrix.com>
>>> As with previous patches, I'm having a hard time figuring out why this
>>> was required in the first place. I see no reason for Xen to be
>>> deasserting the guest virtual line. There's a comment:
>>>
>>> /*
>>>  * Set a timer to see if the guest can finish the interrupt or not. For
>>>  * example, the guest OS may unmask the PIC during boot, before the
>>>  * guest driver is loaded. hvm_pci_intx_assert() may succeed, but the
>>>  * guest will never deal with the irq, then the physical interrupt line
>>>  * will never be deasserted.
>>>  */
>>>
>>> Did this happen because the device was passed through in a bogus state
>>> where it would generate interrupts without the guest requesting
>>
>> It could be a case where two devices share the same interrupt line and
>> are assigned to different domains. In this case, the interrupt activity of 
>> two devices interfere with each other.
> 
> This would also seem to be problematic if the device decides to use
> MSI instead of INTx, but due to the shared nature of the INTx line we
> would continue to inject INTx (generated from another device not
> passed through to the guest) to the guest even if the device has
> switched to MSI.

I'm having trouble with this: How does the INTx line matter when
a device is using MSI? I don't see why we should inject INTx when
the guest has configured a device for MSI; if we do, this would
indeed look like a bug (but aiui we bind either the MSI IRQ or
the pin based one, but not both).

Jan
Jan Beulich Jan. 14, 2021, 11:54 a.m. UTC | #11
On 12.01.2021 18:32, Roger Pau Monne wrote:
> @@ -967,10 +879,10 @@ static void hvm_pirq_eoi(struct pirq *pirq)
>       * since interrupt is still not EOIed
>       */
>      if ( --pirq_dpci->pending ||
> -         !pt_irq_need_timer(pirq_dpci->flags) )
> +         /* When the interrupt source is MSI no Ack should be performed. */
> +         pirq_dpci->flags & HVM_IRQ_DPCI_TRANSLATE )

If we settle on this timer being possible to drop, then there's
just one cosmetic issue here (which can be fixed while committing
I suppose) - there is a pair of parentheses missing here.

Otherwise we will want to at least keep

> --- a/xen/include/xen/iommu.h
> +++ b/xen/include/xen/iommu.h
> @@ -184,11 +184,6 @@ int pt_irq_destroy_bind(struct domain *, const struct xen_domctl_bind_pt_irq *);
>  void hvm_dpci_isairq_eoi(struct domain *d, unsigned int isairq);
>  struct hvm_irq_dpci *domain_get_irq_dpci(const struct domain *);
>  void free_hvm_irq_dpci(struct hvm_irq_dpci *dpci);
> -#ifdef CONFIG_HVM
> -bool pt_irq_need_timer(uint32_t flags);
> -#else
> -static inline bool pt_irq_need_timer(unsigned int flags) { return false; }
> -#endif

along with making the function static in its source file.

Jan
Andrew Cooper Jan. 14, 2021, 11:56 a.m. UTC | #12
On 14/01/2021 11:48, Jan Beulich wrote:
> On 13.01.2021 14:11, Roger Pau Monné wrote:
>> On Wed, Jan 13, 2021 at 06:21:03AM +0000, Tian, Kevin wrote:
>>>> From: Roger Pau Monne <roger.pau@citrix.com>
>>>> As with previous patches, I'm having a hard time figuring out why this
>>>> was required in the first place. I see no reason for Xen to be
>>>> deasserting the guest virtual line. There's a comment:
>>>>
>>>> /*
>>>>  * Set a timer to see if the guest can finish the interrupt or not. For
>>>>  * example, the guest OS may unmask the PIC during boot, before the
>>>>  * guest driver is loaded. hvm_pci_intx_assert() may succeed, but the
>>>>  * guest will never deal with the irq, then the physical interrupt line
>>>>  * will never be deasserted.
>>>>  */
>>>>
>>>> Did this happen because the device was passed through in a bogus state
>>>> where it would generate interrupts without the guest requesting
>>> It could be a case where two devices share the same interrupt line and
>>> are assigned to different domains. In this case, the interrupt activity of 
>>> two devices interfere with each other.
>> This would also seem to be problematic if the device decides to use
>> MSI instead of INTx, but due to the shared nature of the INTx line we
>> would continue to inject INTx (generated from another device not
>> passed through to the guest) to the guest even if the device has
>> switched to MSI.
> I'm having trouble with this: How does the INTx line matter when
> a device is using MSI? I don't see why we should inject INTx when
> the guest has configured a device for MSI; if we do, this would
> indeed look like a bug (but aiui we bind either the MSI IRQ or
> the pin based one, but not both).

When MSI is configured, a spec-complient device shouldn't raise INTx. 
But there are plenty of quirks in practice, and ample opportunity for
races, considering that in a Xen system, there are at least 3 entities
in the system fighting over control of the device.

I suspect the problem is "what happens when Xen gets an INTx".  We don't
know which device it came from, and therefore we can't even attempt to
filter out the devices we suspect are using MSI, in an attempt to avoid
raising the line with every domain.

~Andrew
Jan Beulich Jan. 14, 2021, 12:12 p.m. UTC | #13
On 14.01.2021 12:56, Andrew Cooper wrote:
> On 14/01/2021 11:48, Jan Beulich wrote:
>> On 13.01.2021 14:11, Roger Pau Monné wrote:
>>> On Wed, Jan 13, 2021 at 06:21:03AM +0000, Tian, Kevin wrote:
>>>>> From: Roger Pau Monne <roger.pau@citrix.com>
>>>>> As with previous patches, I'm having a hard time figuring out why this
>>>>> was required in the first place. I see no reason for Xen to be
>>>>> deasserting the guest virtual line. There's a comment:
>>>>>
>>>>> /*
>>>>>  * Set a timer to see if the guest can finish the interrupt or not. For
>>>>>  * example, the guest OS may unmask the PIC during boot, before the
>>>>>  * guest driver is loaded. hvm_pci_intx_assert() may succeed, but the
>>>>>  * guest will never deal with the irq, then the physical interrupt line
>>>>>  * will never be deasserted.
>>>>>  */
>>>>>
>>>>> Did this happen because the device was passed through in a bogus state
>>>>> where it would generate interrupts without the guest requesting
>>>> It could be a case where two devices share the same interrupt line and
>>>> are assigned to different domains. In this case, the interrupt activity of 
>>>> two devices interfere with each other.
>>> This would also seem to be problematic if the device decides to use
>>> MSI instead of INTx, but due to the shared nature of the INTx line we
>>> would continue to inject INTx (generated from another device not
>>> passed through to the guest) to the guest even if the device has
>>> switched to MSI.
>> I'm having trouble with this: How does the INTx line matter when
>> a device is using MSI? I don't see why we should inject INTx when
>> the guest has configured a device for MSI; if we do, this would
>> indeed look like a bug (but aiui we bind either the MSI IRQ or
>> the pin based one, but not both).
> 
> When MSI is configured, a spec-complient device shouldn't raise INTx. 
> But there are plenty of quirks in practice, and ample opportunity for
> races, considering that in a Xen system, there are at least 3 entities
> in the system fighting over control of the device.
> 
> I suspect the problem is "what happens when Xen gets an INTx".  We don't
> know which device it came from, and therefore we can't even attempt to
> filter out the devices we suspect are using MSI, in an attempt to avoid
> raising the line with every domain.

The domain using MSI won't have the INTx IRQ bound, so won't be
notified of the INTx instance at all. What is likely to happen
is that the other domain(s) sharing the same INTx IRQ would get
notified, but won't find anything to do. Which in turn may lead
to the EOI timer getting engaged, and hence IRQs over this line
getting throttled.

Jan
Roger Pau Monné Jan. 14, 2021, 12:33 p.m. UTC | #14
On Thu, Jan 14, 2021 at 12:45:27PM +0100, Jan Beulich wrote:
> On 14.01.2021 11:22, Roger Pau Monné wrote:
> > On Wed, Jan 13, 2021 at 04:31:33PM -0500, Jason Andryuk wrote:
> >> On Wed, Jan 13, 2021 at 1:34 PM Jason Andryuk <jandryuk@gmail.com> wrote:
> >>> I guess I'd also need to disable MSI for the two devices to ensure
> >>> they are both using the GSI?
> >>
> >> lspci in dom0 shows the USB xhci controller, iwlwifi, and e1000e
> >> devices all with IRQ 16 and all with MSI disabled ("MSI: Enabled-").
> >> The two linux HVMs run with (PV) linux stubdoms, and the HVM kernels
> >> were started with pci=nosmi.  Networking through the iwlwifi device
> >> works for that VM while a mouse attached to the xhci controller is
> >> also working in the second VM.  Is there something else I should test?
> > 
> > Not really, I think that test should be good enough, the issue is that
> > we don't know which OS was seeing the issues noted by Kevin.
> 
> Why a specific OS? Isn't this also guarding against malice?

No, I don't think this protects against any kind of malice (at least
that I can think of). It just deasserts the guest virtual line
periodically if the guest itself hasn't done so. It will also attempt
to EOI the physical interrupt, but that's already done by the
eoi_timer in irq_guest_action_t (which would be the part that protects
against malice IMO).

It's my understanding that according to what Kevin pointed out this
was done because when sharing a pirq amongst different guests a guest
can get interrupts delivered before it has properly setup the device,
and not deasserting those by Xen would get the guest into some kind of
stuck state, where it's not deasserting the line for itself.

TBH I'm still trying to figure out how that scenario would look like,
and why would just deasserting the line fix it. On the vIO-APIC case
you would need to forcefully clean the IRR bit in order to receive
further interrupts on that pin, so maybe the issue is that switching a
vIO-APIC pin from level to trigger mode (which clears the IRR bit)
should also deassert the line?

Thanks, Roger.
Roger Pau Monné Jan. 14, 2021, 12:54 p.m. UTC | #15
On Thu, Jan 14, 2021 at 12:48:59PM +0100, Jan Beulich wrote:
> On 13.01.2021 14:11, Roger Pau Monné wrote:
> > On Wed, Jan 13, 2021 at 06:21:03AM +0000, Tian, Kevin wrote:
> >>> From: Roger Pau Monne <roger.pau@citrix.com>
> >>> As with previous patches, I'm having a hard time figuring out why this
> >>> was required in the first place. I see no reason for Xen to be
> >>> deasserting the guest virtual line. There's a comment:
> >>>
> >>> /*
> >>>  * Set a timer to see if the guest can finish the interrupt or not. For
> >>>  * example, the guest OS may unmask the PIC during boot, before the
> >>>  * guest driver is loaded. hvm_pci_intx_assert() may succeed, but the
> >>>  * guest will never deal with the irq, then the physical interrupt line
> >>>  * will never be deasserted.
> >>>  */
> >>>
> >>> Did this happen because the device was passed through in a bogus state
> >>> where it would generate interrupts without the guest requesting
> >>
> >> It could be a case where two devices share the same interrupt line and
> >> are assigned to different domains. In this case, the interrupt activity of 
> >> two devices interfere with each other.
> > 
> > This would also seem to be problematic if the device decides to use
> > MSI instead of INTx, but due to the shared nature of the INTx line we
> > would continue to inject INTx (generated from another device not
> > passed through to the guest) to the guest even if the device has
> > switched to MSI.
> 
> I'm having trouble with this: How does the INTx line matter when
> a device is using MSI? I don't see why we should inject INTx when
> the guest has configured a device for MSI; if we do, this would
> indeed look like a bug (but aiui we bind either the MSI IRQ or
> the pin based one, but not both).

If the IRQ is shared between multiple devices Xen could continue to
receive interrupts on that IRQ, and thus inject them to the guest?
Even if the device passed through to that specific guest has switched
to use MSI.

Maybe I'm missing something, but I don't see QEMU removing the INTx
PIRQ binding when MSI(-X) is enabled for a guest device passed
through.

Thanks, Roger.
Roger Pau Monné Jan. 14, 2021, 12:58 p.m. UTC | #16
On Thu, Jan 14, 2021 at 01:12:00PM +0100, Jan Beulich wrote:
> On 14.01.2021 12:56, Andrew Cooper wrote:
> > On 14/01/2021 11:48, Jan Beulich wrote:
> >> On 13.01.2021 14:11, Roger Pau Monné wrote:
> >>> On Wed, Jan 13, 2021 at 06:21:03AM +0000, Tian, Kevin wrote:
> >>>>> From: Roger Pau Monne <roger.pau@citrix.com>
> >>>>> As with previous patches, I'm having a hard time figuring out why this
> >>>>> was required in the first place. I see no reason for Xen to be
> >>>>> deasserting the guest virtual line. There's a comment:
> >>>>>
> >>>>> /*
> >>>>>  * Set a timer to see if the guest can finish the interrupt or not. For
> >>>>>  * example, the guest OS may unmask the PIC during boot, before the
> >>>>>  * guest driver is loaded. hvm_pci_intx_assert() may succeed, but the
> >>>>>  * guest will never deal with the irq, then the physical interrupt line
> >>>>>  * will never be deasserted.
> >>>>>  */
> >>>>>
> >>>>> Did this happen because the device was passed through in a bogus state
> >>>>> where it would generate interrupts without the guest requesting
> >>>> It could be a case where two devices share the same interrupt line and
> >>>> are assigned to different domains. In this case, the interrupt activity of 
> >>>> two devices interfere with each other.
> >>> This would also seem to be problematic if the device decides to use
> >>> MSI instead of INTx, but due to the shared nature of the INTx line we
> >>> would continue to inject INTx (generated from another device not
> >>> passed through to the guest) to the guest even if the device has
> >>> switched to MSI.
> >> I'm having trouble with this: How does the INTx line matter when
> >> a device is using MSI? I don't see why we should inject INTx when
> >> the guest has configured a device for MSI; if we do, this would
> >> indeed look like a bug (but aiui we bind either the MSI IRQ or
> >> the pin based one, but not both).
> > 
> > When MSI is configured, a spec-complient device shouldn't raise INTx. 
> > But there are plenty of quirks in practice, and ample opportunity for
> > races, considering that in a Xen system, there are at least 3 entities
> > in the system fighting over control of the device.
> > 
> > I suspect the problem is "what happens when Xen gets an INTx".  We don't
> > know which device it came from, and therefore we can't even attempt to
> > filter out the devices we suspect are using MSI, in an attempt to avoid
> > raising the line with every domain.
> 
> The domain using MSI won't have the INTx IRQ bound, so won't be
> notified of the INTx instance at all.

Oh, so that's the bit I was missing. So we do actually remove the INTx
binding when a guest enables MSI on a passed through device?

Can you point me at when this is done, I don't seem to be able to spot
it.

Thanks, Roger.
Jan Beulich Jan. 14, 2021, 1:30 p.m. UTC | #17
On 14.01.2021 13:54, Roger Pau Monné wrote:
> On Thu, Jan 14, 2021 at 12:48:59PM +0100, Jan Beulich wrote:
>> On 13.01.2021 14:11, Roger Pau Monné wrote:
>>> On Wed, Jan 13, 2021 at 06:21:03AM +0000, Tian, Kevin wrote:
>>>>> From: Roger Pau Monne <roger.pau@citrix.com>
>>>>> As with previous patches, I'm having a hard time figuring out why this
>>>>> was required in the first place. I see no reason for Xen to be
>>>>> deasserting the guest virtual line. There's a comment:
>>>>>
>>>>> /*
>>>>>  * Set a timer to see if the guest can finish the interrupt or not. For
>>>>>  * example, the guest OS may unmask the PIC during boot, before the
>>>>>  * guest driver is loaded. hvm_pci_intx_assert() may succeed, but the
>>>>>  * guest will never deal with the irq, then the physical interrupt line
>>>>>  * will never be deasserted.
>>>>>  */
>>>>>
>>>>> Did this happen because the device was passed through in a bogus state
>>>>> where it would generate interrupts without the guest requesting
>>>>
>>>> It could be a case where two devices share the same interrupt line and
>>>> are assigned to different domains. In this case, the interrupt activity of 
>>>> two devices interfere with each other.
>>>
>>> This would also seem to be problematic if the device decides to use
>>> MSI instead of INTx, but due to the shared nature of the INTx line we
>>> would continue to inject INTx (generated from another device not
>>> passed through to the guest) to the guest even if the device has
>>> switched to MSI.
>>
>> I'm having trouble with this: How does the INTx line matter when
>> a device is using MSI? I don't see why we should inject INTx when
>> the guest has configured a device for MSI; if we do, this would
>> indeed look like a bug (but aiui we bind either the MSI IRQ or
>> the pin based one, but not both).
> 
> If the IRQ is shared between multiple devices Xen could continue to
> receive interrupts on that IRQ, and thus inject them to the guest?
> Even if the device passed through to that specific guest has switched
> to use MSI.
> 
> Maybe I'm missing something, but I don't see QEMU removing the INTx
> PIRQ binding when MSI(-X) is enabled for a guest device passed
> through.

This would then match my "would indeed look like a bug". I don't see
why this mapping would need keeping once MSI is in use. (In fact it
probably shouldn't be the act of enabling MSI where this gets
removed, but the setting of PCI_COMMAND_INTX_DISABLE by the guest.)

Jan
Jan Beulich Jan. 14, 2021, 1:32 p.m. UTC | #18
On 14.01.2021 13:58, Roger Pau Monné wrote:
> On Thu, Jan 14, 2021 at 01:12:00PM +0100, Jan Beulich wrote:
>> On 14.01.2021 12:56, Andrew Cooper wrote:
>>> On 14/01/2021 11:48, Jan Beulich wrote:
>>>> On 13.01.2021 14:11, Roger Pau Monné wrote:
>>>>> On Wed, Jan 13, 2021 at 06:21:03AM +0000, Tian, Kevin wrote:
>>>>>>> From: Roger Pau Monne <roger.pau@citrix.com>
>>>>>>> As with previous patches, I'm having a hard time figuring out why this
>>>>>>> was required in the first place. I see no reason for Xen to be
>>>>>>> deasserting the guest virtual line. There's a comment:
>>>>>>>
>>>>>>> /*
>>>>>>>  * Set a timer to see if the guest can finish the interrupt or not. For
>>>>>>>  * example, the guest OS may unmask the PIC during boot, before the
>>>>>>>  * guest driver is loaded. hvm_pci_intx_assert() may succeed, but the
>>>>>>>  * guest will never deal with the irq, then the physical interrupt line
>>>>>>>  * will never be deasserted.
>>>>>>>  */
>>>>>>>
>>>>>>> Did this happen because the device was passed through in a bogus state
>>>>>>> where it would generate interrupts without the guest requesting
>>>>>> It could be a case where two devices share the same interrupt line and
>>>>>> are assigned to different domains. In this case, the interrupt activity of 
>>>>>> two devices interfere with each other.
>>>>> This would also seem to be problematic if the device decides to use
>>>>> MSI instead of INTx, but due to the shared nature of the INTx line we
>>>>> would continue to inject INTx (generated from another device not
>>>>> passed through to the guest) to the guest even if the device has
>>>>> switched to MSI.
>>>> I'm having trouble with this: How does the INTx line matter when
>>>> a device is using MSI? I don't see why we should inject INTx when
>>>> the guest has configured a device for MSI; if we do, this would
>>>> indeed look like a bug (but aiui we bind either the MSI IRQ or
>>>> the pin based one, but not both).
>>>
>>> When MSI is configured, a spec-complient device shouldn't raise INTx. 
>>> But there are plenty of quirks in practice, and ample opportunity for
>>> races, considering that in a Xen system, there are at least 3 entities
>>> in the system fighting over control of the device.
>>>
>>> I suspect the problem is "what happens when Xen gets an INTx".  We don't
>>> know which device it came from, and therefore we can't even attempt to
>>> filter out the devices we suspect are using MSI, in an attempt to avoid
>>> raising the line with every domain.
>>
>> The domain using MSI won't have the INTx IRQ bound, so won't be
>> notified of the INTx instance at all.
> 
> Oh, so that's the bit I was missing. So we do actually remove the INTx
> binding when a guest enables MSI on a passed through device?
> 
> Can you point me at when this is done, I don't seem to be able to spot
> it.

I would better have written "shouldn't" rather than "won't" (1st
instance, "wouldn't" for the 2nd). See also the other reply just
sent.

Jan
Jan Beulich Jan. 14, 2021, 1:41 p.m. UTC | #19
On 14.01.2021 13:33, Roger Pau Monné wrote:
> On Thu, Jan 14, 2021 at 12:45:27PM +0100, Jan Beulich wrote:
>> On 14.01.2021 11:22, Roger Pau Monné wrote:
>>> On Wed, Jan 13, 2021 at 04:31:33PM -0500, Jason Andryuk wrote:
>>>> On Wed, Jan 13, 2021 at 1:34 PM Jason Andryuk <jandryuk@gmail.com> wrote:
>>>>> I guess I'd also need to disable MSI for the two devices to ensure
>>>>> they are both using the GSI?
>>>>
>>>> lspci in dom0 shows the USB xhci controller, iwlwifi, and e1000e
>>>> devices all with IRQ 16 and all with MSI disabled ("MSI: Enabled-").
>>>> The two linux HVMs run with (PV) linux stubdoms, and the HVM kernels
>>>> were started with pci=nosmi.  Networking through the iwlwifi device
>>>> works for that VM while a mouse attached to the xhci controller is
>>>> also working in the second VM.  Is there something else I should test?
>>>
>>> Not really, I think that test should be good enough, the issue is that
>>> we don't know which OS was seeing the issues noted by Kevin.
>>
>> Why a specific OS? Isn't this also guarding against malice?
> 
> No, I don't think this protects against any kind of malice (at least
> that I can think of). It just deasserts the guest virtual line
> periodically if the guest itself hasn't done so. It will also attempt
> to EOI the physical interrupt, but that's already done by the
> eoi_timer in irq_guest_action_t (which would be the part that protects
> against malice IMO).

Hmm, yes, there's certainly some overlap. And indeed the EOI
timer is set 1ms in the future, while the timer here allows
for 8ms to pass before taking action.

What I'm uncertain about is the interaction between both: It
would seem to me that the pirq_guest_eoi() invocation from
here could undermine the purpose of the EOI timer. In which
case it would in fact be a win to get rid of this timer here.

> It's my understanding that according to what Kevin pointed out this
> was done because when sharing a pirq amongst different guests a guest
> can get interrupts delivered before it has properly setup the device,
> and not deasserting those by Xen would get the guest into some kind of
> stuck state, where it's not deasserting the line for itself.
> 
> TBH I'm still trying to figure out how that scenario would look like,
> and why would just deasserting the line fix it. On the vIO-APIC case
> you would need to forcefully clean the IRR bit in order to receive
> further interrupts on that pin, so maybe the issue is that switching a
> vIO-APIC pin from level to trigger mode (which clears the IRR bit)
> should also deassert the line?

I suppose this was directed at Kevin - I'm struggling as well.

Jan
Roger Pau Monné Jan. 14, 2021, 5:20 p.m. UTC | #20
On Thu, Jan 14, 2021 at 02:41:29PM +0100, Jan Beulich wrote:
> On 14.01.2021 13:33, Roger Pau Monné wrote:
> > On Thu, Jan 14, 2021 at 12:45:27PM +0100, Jan Beulich wrote:
> >> On 14.01.2021 11:22, Roger Pau Monné wrote:
> >>> On Wed, Jan 13, 2021 at 04:31:33PM -0500, Jason Andryuk wrote:
> >>>> On Wed, Jan 13, 2021 at 1:34 PM Jason Andryuk <jandryuk@gmail.com> wrote:
> >>>>> I guess I'd also need to disable MSI for the two devices to ensure
> >>>>> they are both using the GSI?
> >>>>
> >>>> lspci in dom0 shows the USB xhci controller, iwlwifi, and e1000e
> >>>> devices all with IRQ 16 and all with MSI disabled ("MSI: Enabled-").
> >>>> The two linux HVMs run with (PV) linux stubdoms, and the HVM kernels
> >>>> were started with pci=nosmi.  Networking through the iwlwifi device
> >>>> works for that VM while a mouse attached to the xhci controller is
> >>>> also working in the second VM.  Is there something else I should test?
> >>>
> >>> Not really, I think that test should be good enough, the issue is that
> >>> we don't know which OS was seeing the issues noted by Kevin.
> >>
> >> Why a specific OS? Isn't this also guarding against malice?
> > 
> > No, I don't think this protects against any kind of malice (at least
> > that I can think of). It just deasserts the guest virtual line
> > periodically if the guest itself hasn't done so. It will also attempt
> > to EOI the physical interrupt, but that's already done by the
> > eoi_timer in irq_guest_action_t (which would be the part that protects
> > against malice IMO).
> 
> Hmm, yes, there's certainly some overlap. And indeed the EOI
> timer is set 1ms in the future, while the timer here allows
> for 8ms to pass before taking action.
> 
> What I'm uncertain about is the interaction between both: It
> would seem to me that the pirq_guest_eoi() invocation from
> here could undermine the purpose of the EOI timer. In which
> case it would in fact be a win to get rid of this timer here.

It's not clear to me either. In any case having two timers for the
same irq also seems like a waste of resources.

> > It's my understanding that according to what Kevin pointed out this
> > was done because when sharing a pirq amongst different guests a guest
> > can get interrupts delivered before it has properly setup the device,
> > and not deasserting those by Xen would get the guest into some kind of
> > stuck state, where it's not deasserting the line for itself.
> > 
> > TBH I'm still trying to figure out how that scenario would look like,
> > and why would just deasserting the line fix it. On the vIO-APIC case
> > you would need to forcefully clean the IRR bit in order to receive
> > further interrupts on that pin, so maybe the issue is that switching a
> > vIO-APIC pin from level to trigger mode (which clears the IRR bit)
> > should also deassert the line?
> 
> I suppose this was directed at Kevin - I'm struggling as well.

Right, was a question for anyone who might know the answer really. I
think I will prepare some more stuff to try to clean this up. Let's
see if Kevin has some input.

Thanks, Roger.
Roger Pau Monné Jan. 14, 2021, 5:22 p.m. UTC | #21
On Thu, Jan 14, 2021 at 02:30:15PM +0100, Jan Beulich wrote:
> On 14.01.2021 13:54, Roger Pau Monné wrote:
> > On Thu, Jan 14, 2021 at 12:48:59PM +0100, Jan Beulich wrote:
> >> On 13.01.2021 14:11, Roger Pau Monné wrote:
> >>> On Wed, Jan 13, 2021 at 06:21:03AM +0000, Tian, Kevin wrote:
> >>>>> From: Roger Pau Monne <roger.pau@citrix.com>
> >>>>> As with previous patches, I'm having a hard time figuring out why this
> >>>>> was required in the first place. I see no reason for Xen to be
> >>>>> deasserting the guest virtual line. There's a comment:
> >>>>>
> >>>>> /*
> >>>>>  * Set a timer to see if the guest can finish the interrupt or not. For
> >>>>>  * example, the guest OS may unmask the PIC during boot, before the
> >>>>>  * guest driver is loaded. hvm_pci_intx_assert() may succeed, but the
> >>>>>  * guest will never deal with the irq, then the physical interrupt line
> >>>>>  * will never be deasserted.
> >>>>>  */
> >>>>>
> >>>>> Did this happen because the device was passed through in a bogus state
> >>>>> where it would generate interrupts without the guest requesting
> >>>>
> >>>> It could be a case where two devices share the same interrupt line and
> >>>> are assigned to different domains. In this case, the interrupt activity of 
> >>>> two devices interfere with each other.
> >>>
> >>> This would also seem to be problematic if the device decides to use
> >>> MSI instead of INTx, but due to the shared nature of the INTx line we
> >>> would continue to inject INTx (generated from another device not
> >>> passed through to the guest) to the guest even if the device has
> >>> switched to MSI.
> >>
> >> I'm having trouble with this: How does the INTx line matter when
> >> a device is using MSI? I don't see why we should inject INTx when
> >> the guest has configured a device for MSI; if we do, this would
> >> indeed look like a bug (but aiui we bind either the MSI IRQ or
> >> the pin based one, but not both).
> > 
> > If the IRQ is shared between multiple devices Xen could continue to
> > receive interrupts on that IRQ, and thus inject them to the guest?
> > Even if the device passed through to that specific guest has switched
> > to use MSI.
> > 
> > Maybe I'm missing something, but I don't see QEMU removing the INTx
> > PIRQ binding when MSI(-X) is enabled for a guest device passed
> > through.
> 
> This would then match my "would indeed look like a bug". I don't see
> why this mapping would need keeping once MSI is in use. (In fact it
> probably shouldn't be the act of enabling MSI where this gets
> removed, but the setting of PCI_COMMAND_INTX_DISABLE by the guest.)

We would also need to assert that no other passthrough device on the
guest is using that same irq bounding, or else we would wedge
interrupts for that other device.

Thanks, Roger.
Jan Beulich Jan. 15, 2021, 10:24 a.m. UTC | #22
On 14.01.2021 18:22, Roger Pau Monné wrote:
> On Thu, Jan 14, 2021 at 02:30:15PM +0100, Jan Beulich wrote:
>> On 14.01.2021 13:54, Roger Pau Monné wrote:
>>> On Thu, Jan 14, 2021 at 12:48:59PM +0100, Jan Beulich wrote:
>>>> On 13.01.2021 14:11, Roger Pau Monné wrote:
>>>>> On Wed, Jan 13, 2021 at 06:21:03AM +0000, Tian, Kevin wrote:
>>>>>>> From: Roger Pau Monne <roger.pau@citrix.com>
>>>>>>> As with previous patches, I'm having a hard time figuring out why this
>>>>>>> was required in the first place. I see no reason for Xen to be
>>>>>>> deasserting the guest virtual line. There's a comment:
>>>>>>>
>>>>>>> /*
>>>>>>>  * Set a timer to see if the guest can finish the interrupt or not. For
>>>>>>>  * example, the guest OS may unmask the PIC during boot, before the
>>>>>>>  * guest driver is loaded. hvm_pci_intx_assert() may succeed, but the
>>>>>>>  * guest will never deal with the irq, then the physical interrupt line
>>>>>>>  * will never be deasserted.
>>>>>>>  */
>>>>>>>
>>>>>>> Did this happen because the device was passed through in a bogus state
>>>>>>> where it would generate interrupts without the guest requesting
>>>>>>
>>>>>> It could be a case where two devices share the same interrupt line and
>>>>>> are assigned to different domains. In this case, the interrupt activity of 
>>>>>> two devices interfere with each other.
>>>>>
>>>>> This would also seem to be problematic if the device decides to use
>>>>> MSI instead of INTx, but due to the shared nature of the INTx line we
>>>>> would continue to inject INTx (generated from another device not
>>>>> passed through to the guest) to the guest even if the device has
>>>>> switched to MSI.
>>>>
>>>> I'm having trouble with this: How does the INTx line matter when
>>>> a device is using MSI? I don't see why we should inject INTx when
>>>> the guest has configured a device for MSI; if we do, this would
>>>> indeed look like a bug (but aiui we bind either the MSI IRQ or
>>>> the pin based one, but not both).
>>>
>>> If the IRQ is shared between multiple devices Xen could continue to
>>> receive interrupts on that IRQ, and thus inject them to the guest?
>>> Even if the device passed through to that specific guest has switched
>>> to use MSI.
>>>
>>> Maybe I'm missing something, but I don't see QEMU removing the INTx
>>> PIRQ binding when MSI(-X) is enabled for a guest device passed
>>> through.
>>
>> This would then match my "would indeed look like a bug". I don't see
>> why this mapping would need keeping once MSI is in use. (In fact it
>> probably shouldn't be the act of enabling MSI where this gets
>> removed, but the setting of PCI_COMMAND_INTX_DISABLE by the guest.)
> 
> We would also need to assert that no other passthrough device on the
> guest is using that same irq bounding, or else we would wedge
> interrupts for that other device.

Oh, right - this sadly is one of the various things that aren't
properly ref-counted.

Jan
Roger Pau Monné Jan. 15, 2021, 1:19 p.m. UTC | #23
On Thu, Jan 14, 2021 at 12:54:22PM +0100, Jan Beulich wrote:
> On 12.01.2021 18:32, Roger Pau Monne wrote:
> > @@ -967,10 +879,10 @@ static void hvm_pirq_eoi(struct pirq *pirq)
> >       * since interrupt is still not EOIed
> >       */
> >      if ( --pirq_dpci->pending ||
> > -         !pt_irq_need_timer(pirq_dpci->flags) )
> > +         /* When the interrupt source is MSI no Ack should be performed. */
> > +         pirq_dpci->flags & HVM_IRQ_DPCI_TRANSLATE )
> 
> If we settle on this timer being possible to drop, then there's
> just one cosmetic issue here (which can be fixed while committing
> I suppose) - there is a pair of parentheses missing here.

I will be sending an updated version of this patch in a longer series.
I've added the parentheses instead of keeping pt_irq_need_timer, as I
would also like to get rid of HVM_IRQ_DPCI_TRANSLATE at some point and
then the change here is straightforward to perform.

Thanks, Roger.
Jason Andryuk Jan. 15, 2021, 1:40 p.m. UTC | #24
On Thu, Jan 14, 2021 at 5:22 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
>
> On Wed, Jan 13, 2021 at 04:31:33PM -0500, Jason Andryuk wrote:
> > On Wed, Jan 13, 2021 at 1:34 PM Jason Andryuk <jandryuk@gmail.com> wrote:
> > >
> > > On Wed, Jan 13, 2021 at 1:06 PM Roger Pau Monné <roger.pau@citrix.com> wrote:
> > > >
> > > > On Wed, Jan 13, 2021 at 10:48:52AM -0500, Jason Andryuk wrote:
> >
> > <snip>
> >
> > > > > I have some laptops running OpenXT where the USB controller and NIC
> > > > > share an interrupt, and I assign them to different domains.  Qubes
> > > > > would hit this as well.
> > > >
> > > > Is there any chance you could try the patch and see if you can hit the
> > > > issue it was trying to fix?
> > >
> > > Would testing a backport to 4.12 be useful?  There were some file
> > > renames, but it looks to apply.  The only difference is the 4.12
> > > hvm_pirq_eoi hunk still has `(ent && ent->fields.mask) || `.  Maybe
> > > backport commit eb298f32fac5ac "x86/dpci: EOI interrupt regardless of
> > > its masking status" as well?
>
> Yes, backporting that one should be harmless.
>
> > Ok, I added these two patches to OpenXT with Xen 4.12.
>
> Thanks!
>
> > > Testing staging would take a little longer to make a machine available.
> > >
> > > I guess I'd also need to disable MSI for the two devices to ensure
> > > they are both using the GSI?
> >
> > lspci in dom0 shows the USB xhci controller, iwlwifi, and e1000e
> > devices all with IRQ 16 and all with MSI disabled ("MSI: Enabled-").
> > The two linux HVMs run with (PV) linux stubdoms, and the HVM kernels
> > were started with pci=nosmi.  Networking through the iwlwifi device

Jan, the VM kernel command lines had pci=nomsi and the typo was only
in the email.

> > works for that VM while a mouse attached to the xhci controller is
> > also working in the second VM.  Is there something else I should test?
>
> Not really, I think that test should be good enough, the issue is that
> we don't know which OS was seeing the issues noted by Kevin.
>
> To make sure you trigger the right scenario, could you start the
> iwlwifi HVM guest first, and then stress test the network device
> (using iperf for example) while you bring up the second VM that uses
> the xhci device?

I ran `cat /dev/zero | ssh server 'cat > /dev/null'` which generates
~1000 interrupts per second.

The second VM starts up ok.  Both are Linux 5.4.x, so not a version
Kevin would have tested :)

Regards,
Jason
Tian, Kevin Jan. 19, 2021, 2:59 a.m. UTC | #25
> From: Roger Pau Monné <roger.pau@citrix.com>
> Sent: Friday, January 15, 2021 1:21 AM
> 
> On Thu, Jan 14, 2021 at 02:41:29PM +0100, Jan Beulich wrote:
> > On 14.01.2021 13:33, Roger Pau Monné wrote:
> > > On Thu, Jan 14, 2021 at 12:45:27PM +0100, Jan Beulich wrote:
> > >> On 14.01.2021 11:22, Roger Pau Monné wrote:
> > >>> On Wed, Jan 13, 2021 at 04:31:33PM -0500, Jason Andryuk wrote:
> > >>>> On Wed, Jan 13, 2021 at 1:34 PM Jason Andryuk
> <jandryuk@gmail.com> wrote:
> > >>>>> I guess I'd also need to disable MSI for the two devices to ensure
> > >>>>> they are both using the GSI?
> > >>>>
> > >>>> lspci in dom0 shows the USB xhci controller, iwlwifi, and e1000e
> > >>>> devices all with IRQ 16 and all with MSI disabled ("MSI: Enabled-").
> > >>>> The two linux HVMs run with (PV) linux stubdoms, and the HVM
> kernels
> > >>>> were started with pci=nosmi.  Networking through the iwlwifi device
> > >>>> works for that VM while a mouse attached to the xhci controller is
> > >>>> also working in the second VM.  Is there something else I should test?
> > >>>
> > >>> Not really, I think that test should be good enough, the issue is that
> > >>> we don't know which OS was seeing the issues noted by Kevin.
> > >>
> > >> Why a specific OS? Isn't this also guarding against malice?
> > >
> > > No, I don't think this protects against any kind of malice (at least
> > > that I can think of). It just deasserts the guest virtual line
> > > periodically if the guest itself hasn't done so. It will also attempt
> > > to EOI the physical interrupt, but that's already done by the
> > > eoi_timer in irq_guest_action_t (which would be the part that protects
> > > against malice IMO).
> >
> > Hmm, yes, there's certainly some overlap. And indeed the EOI
> > timer is set 1ms in the future, while the timer here allows
> > for 8ms to pass before taking action.
> >
> > What I'm uncertain about is the interaction between both: It
> > would seem to me that the pirq_guest_eoi() invocation from
> > here could undermine the purpose of the EOI timer. In which
> > case it would in fact be a win to get rid of this timer here.
> 
> It's not clear to me either. In any case having two timers for the
> same irq also seems like a waste of resources.
> 
> > > It's my understanding that according to what Kevin pointed out this
> > > was done because when sharing a pirq amongst different guests a guest
> > > can get interrupts delivered before it has properly setup the device,
> > > and not deasserting those by Xen would get the guest into some kind of
> > > stuck state, where it's not deasserting the line for itself.
> > >
> > > TBH I'm still trying to figure out how that scenario would look like,
> > > and why would just deasserting the line fix it. On the vIO-APIC case
> > > you would need to forcefully clean the IRR bit in order to receive
> > > further interrupts on that pin, so maybe the issue is that switching a
> > > vIO-APIC pin from level to trigger mode (which clears the IRR bit)
> > > should also deassert the line?
> >
> > I suppose this was directed at Kevin - I'm struggling as well.
> 
> Right, was a question for anyone who might know the answer really. I
> think I will prepare some more stuff to try to clean this up. Let's
> see if Kevin has some input.
> 

Honestly speaking I don't have a good memory of this old stuff and
also cannot figure out a reasonable scenario at this moment. I think
we could just go cleaning it up and see what might jump out later.

Thanks
Kevin
diff mbox series

Patch

diff --git a/xen/drivers/passthrough/vtd/x86/hvm.c b/xen/drivers/passthrough/vtd/x86/hvm.c
index f77b35815c..b531fe907a 100644
--- a/xen/drivers/passthrough/vtd/x86/hvm.c
+++ b/xen/drivers/passthrough/vtd/x86/hvm.c
@@ -36,10 +36,7 @@  static int _hvm_dpci_isairq_eoi(struct domain *d,
         {
             hvm_pci_intx_deassert(d, digl->device, digl->intx);
             if ( --pirq_dpci->pending == 0 )
-            {
-                stop_timer(&pirq_dpci->timer);
                 pirq_guest_eoi(dpci_pirq(pirq_dpci));
-            }
         }
     }
 
diff --git a/xen/drivers/passthrough/x86/hvm.c b/xen/drivers/passthrough/x86/hvm.c
index edc8059518..5d901db50c 100644
--- a/xen/drivers/passthrough/x86/hvm.c
+++ b/xen/drivers/passthrough/x86/hvm.c
@@ -136,77 +136,6 @@  static void pt_pirq_softirq_reset(struct hvm_pirq_dpci *pirq_dpci)
     pirq_dpci->masked = 0;
 }
 
-bool pt_irq_need_timer(uint32_t flags)
-{
-    return !(flags & (HVM_IRQ_DPCI_GUEST_MSI | HVM_IRQ_DPCI_TRANSLATE |
-                      HVM_IRQ_DPCI_NO_EOI));
-}
-
-static int pt_irq_guest_eoi(struct domain *d, struct hvm_pirq_dpci *pirq_dpci,
-                            void *arg)
-{
-    if ( __test_and_clear_bit(_HVM_IRQ_DPCI_EOI_LATCH_SHIFT,
-                              &pirq_dpci->flags) )
-    {
-        pirq_dpci->masked = 0;
-        pirq_dpci->pending = 0;
-        pirq_guest_eoi(dpci_pirq(pirq_dpci));
-    }
-
-    return 0;
-}
-
-static void pt_irq_time_out(void *data)
-{
-    struct hvm_pirq_dpci *irq_map = data;
-    const struct hvm_irq_dpci *dpci;
-    const struct dev_intx_gsi_link *digl;
-
-    spin_lock(&irq_map->dom->event_lock);
-
-    if ( irq_map->flags & HVM_IRQ_DPCI_IDENTITY_GSI )
-    {
-        ASSERT(is_hardware_domain(irq_map->dom));
-        /*
-         * Identity mapped, no need to iterate over the guest GSI list to find
-         * other pirqs sharing the same guest GSI.
-         *
-         * In the identity mapped case the EOI can also be done now, this way
-         * the iteration over the list of domain pirqs is avoided.
-         */
-        hvm_gsi_deassert(irq_map->dom, dpci_pirq(irq_map)->pirq);
-        irq_map->flags |= HVM_IRQ_DPCI_EOI_LATCH;
-        pt_irq_guest_eoi(irq_map->dom, irq_map, NULL);
-        spin_unlock(&irq_map->dom->event_lock);
-        return;
-    }
-
-    dpci = domain_get_irq_dpci(irq_map->dom);
-    if ( unlikely(!dpci) )
-    {
-        ASSERT_UNREACHABLE();
-        spin_unlock(&irq_map->dom->event_lock);
-        return;
-    }
-    list_for_each_entry ( digl, &irq_map->digl_list, list )
-    {
-        unsigned int guest_gsi = hvm_pci_intx_gsi(digl->device, digl->intx);
-        const struct hvm_girq_dpci_mapping *girq;
-
-        list_for_each_entry ( girq, &dpci->girq[guest_gsi], list )
-        {
-            struct pirq *pirq = pirq_info(irq_map->dom, girq->machine_gsi);
-
-            pirq_dpci(pirq)->flags |= HVM_IRQ_DPCI_EOI_LATCH;
-        }
-        hvm_pci_intx_deassert(irq_map->dom, digl->device, digl->intx);
-    }
-
-    pt_pirq_iterate(irq_map->dom, pt_irq_guest_eoi, NULL);
-
-    spin_unlock(&irq_map->dom->event_lock);
-}
-
 struct hvm_irq_dpci *domain_get_irq_dpci(const struct domain *d)
 {
     if ( !d || !is_hvm_domain(d) )
@@ -568,15 +497,10 @@  int pt_irq_create_bind(
                 }
             }
 
-            /* Init timer before binding */
-            if ( pt_irq_need_timer(pirq_dpci->flags) )
-                init_timer(&pirq_dpci->timer, pt_irq_time_out, pirq_dpci, 0);
             /* Deal with gsi for legacy devices */
             rc = pirq_guest_bind(d->vcpu[0], info, share);
             if ( unlikely(rc) )
             {
-                if ( pt_irq_need_timer(pirq_dpci->flags) )
-                    kill_timer(&pirq_dpci->timer);
                 /*
                  * There is no path for __do_IRQ to schedule softirq as
                  * IRQ_GUEST is not set. As such we can reset 'dom' directly.
@@ -743,8 +667,6 @@  int pt_irq_destroy_bind(
     {
         pirq_guest_unbind(d, pirq);
         msixtbl_pt_unregister(d, pirq);
-        if ( pt_irq_need_timer(pirq_dpci->flags) )
-            kill_timer(&pirq_dpci->timer);
         pirq_dpci->flags = 0;
         /*
          * See comment in pt_irq_create_bind's PT_IRQ_TYPE_MSI before the
@@ -934,16 +856,6 @@  static void hvm_dirq_assist(struct domain *d, struct hvm_pirq_dpci *pirq_dpci)
             __msi_pirq_eoi(pirq_dpci);
             goto out;
         }
-
-        /*
-         * Set a timer to see if the guest can finish the interrupt or not. For
-         * example, the guest OS may unmask the PIC during boot, before the
-         * guest driver is loaded. hvm_pci_intx_assert() may succeed, but the
-         * guest will never deal with the irq, then the physical interrupt line
-         * will never be deasserted.
-         */
-        ASSERT(pt_irq_need_timer(pirq_dpci->flags));
-        set_timer(&pirq_dpci->timer, NOW() + PT_IRQ_TIME_OUT);
     }
 
  out:
@@ -967,10 +879,10 @@  static void hvm_pirq_eoi(struct pirq *pirq)
      * since interrupt is still not EOIed
      */
     if ( --pirq_dpci->pending ||
-         !pt_irq_need_timer(pirq_dpci->flags) )
+         /* When the interrupt source is MSI no Ack should be performed. */
+         pirq_dpci->flags & HVM_IRQ_DPCI_TRANSLATE )
         return;
 
-    stop_timer(&pirq_dpci->timer);
     pirq_guest_eoi(pirq);
 }
 
@@ -1038,9 +950,6 @@  static int pci_clean_dpci_irq(struct domain *d,
 
     pirq_guest_unbind(d, dpci_pirq(pirq_dpci));
 
-    if ( pt_irq_need_timer(pirq_dpci->flags) )
-        kill_timer(&pirq_dpci->timer);
-
     list_for_each_entry_safe ( digl, tmp, &pirq_dpci->digl_list, list )
     {
         list_del(&digl->list);
diff --git a/xen/include/asm-x86/hvm/irq.h b/xen/include/asm-x86/hvm/irq.h
index 532880d497..d40e13de6e 100644
--- a/xen/include/asm-x86/hvm/irq.h
+++ b/xen/include/asm-x86/hvm/irq.h
@@ -117,7 +117,6 @@  struct dev_intx_gsi_link {
 #define _HVM_IRQ_DPCI_MACH_PCI_SHIFT            0
 #define _HVM_IRQ_DPCI_MACH_MSI_SHIFT            1
 #define _HVM_IRQ_DPCI_MAPPED_SHIFT              2
-#define _HVM_IRQ_DPCI_EOI_LATCH_SHIFT           3
 #define _HVM_IRQ_DPCI_GUEST_PCI_SHIFT           4
 #define _HVM_IRQ_DPCI_GUEST_MSI_SHIFT           5
 #define _HVM_IRQ_DPCI_IDENTITY_GSI_SHIFT        6
@@ -126,7 +125,6 @@  struct dev_intx_gsi_link {
 #define HVM_IRQ_DPCI_MACH_PCI        (1u << _HVM_IRQ_DPCI_MACH_PCI_SHIFT)
 #define HVM_IRQ_DPCI_MACH_MSI        (1u << _HVM_IRQ_DPCI_MACH_MSI_SHIFT)
 #define HVM_IRQ_DPCI_MAPPED          (1u << _HVM_IRQ_DPCI_MAPPED_SHIFT)
-#define HVM_IRQ_DPCI_EOI_LATCH       (1u << _HVM_IRQ_DPCI_EOI_LATCH_SHIFT)
 #define HVM_IRQ_DPCI_GUEST_PCI       (1u << _HVM_IRQ_DPCI_GUEST_PCI_SHIFT)
 #define HVM_IRQ_DPCI_GUEST_MSI       (1u << _HVM_IRQ_DPCI_GUEST_MSI_SHIFT)
 #define HVM_IRQ_DPCI_IDENTITY_GSI    (1u << _HVM_IRQ_DPCI_IDENTITY_GSI_SHIFT)
@@ -173,7 +171,6 @@  struct hvm_pirq_dpci {
     struct list_head digl_list;
     struct domain *dom;
     struct hvm_gmsi_info gmsi;
-    struct timer timer;
     struct list_head softirq_list;
 };
 
diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h
index f0295fd6c3..4f3098b6ed 100644
--- a/xen/include/xen/iommu.h
+++ b/xen/include/xen/iommu.h
@@ -184,11 +184,6 @@  int pt_irq_destroy_bind(struct domain *, const struct xen_domctl_bind_pt_irq *);
 void hvm_dpci_isairq_eoi(struct domain *d, unsigned int isairq);
 struct hvm_irq_dpci *domain_get_irq_dpci(const struct domain *);
 void free_hvm_irq_dpci(struct hvm_irq_dpci *dpci);
-#ifdef CONFIG_HVM
-bool pt_irq_need_timer(uint32_t flags);
-#else
-static inline bool pt_irq_need_timer(unsigned int flags) { return false; }
-#endif
 
 struct msi_desc;
 struct msi_msg;