[v5,4/6] xen/x86: Allow stubdom access to irq created for msi.
diff mbox series

Message ID bf64e46ec03145da1887cfff4c632c86784792f6.1563325215.git-series.marmarek@invisiblethingslab.com
State Superseded
Headers show
Series
  • Fix PCI passthrough for HVM with stubdomain
Related show

Commit Message

Marek Marczykowski-Górecki July 17, 2019, 1 a.m. UTC
Stubdomains need to be given sufficient privilege over the guest which it
provides emulation for in order for PCI passthrough to work correctly.
When a HVM domain try to enable MSI, QEMU in stubdomain calls
PHYSDEVOP_map_pirq, but later it needs to call XEN_DOMCTL_bind_pt_irq as
part of xc_domain_update_msi_irq. Allow for that as part of
PHYSDEVOP_map_pirq.

This is not needed for PCI INTx, because IRQ in that case is known
beforehand and the stubdomain is given permissions over this IRQ by
libxl__device_pci_add (there's a do_pci_add against the stubdomain).

create_irq() already grant IRQ access to hardware_domain, with
assumption the device model (something managing this IRQ) lives there.
Modify create_irq() to take additional parameter pointing at device
model domain - which may be dom0 or stubdomain.  Save ID of the domain
given permission, to revoke it in destroy_irq() - easier and cleaner
than replaying logic of create_irq() parameter. Use domid instead of
actual reference to the domain, because it might get destroyed before
destroying IRQ (stubdomain is destroyed before its target domain). And
it is not an issue, because IRQ permissions live within domain
structure, so destroying a domain also implicitly revoke the permission.
Potential domid reuse is detected by by checking if that domain does
have permission over the IRQ being destroyed.

Then, adjust all callers to provide the parameter. In case of calls not
related to stubdomain-initiated allocations, give it either
hardware_domain (so the behavior is unchanged there), or NULL for
interrupts used by Xen internally.

Inspired by https://github.com/OpenXT/xenclient-oe/blob/5e0e7304a5a3c75ef01240a1e3673665b2aaf05e/recipes-extended/xen/files/stubdomain-msi-irq-access.patch by Eric Chanudet <chanudete@ainfosec.com>.

Signed-off-by: Simon Gaiser <simon@invisiblethingslab.com>
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
---
Changes in v3:
 - extend commit message
Changes in v4:
 - add missing destroy_irq on error path
Changes in v5:
 - move irq_{grant,revoke}_access() to {create,destroy}_irq(), which
   basically make it a different patch
 - add get_dm_domain() helper
 - do not give hardware_domain permission over IRQs used in Xen
   internally
 - rename create_irq argument to just 'd', to avoid confusion
   when it's called by hardware domain
 - verify that device is de-assigned before pci_remove_device call
 - save ID of domain given permission in create_irq(), to revoke it in
 destroy_irq()
 - drop domain parameter from destroy_irq() and msi_free_irq()
 - do not give hardware domain permission over IRQ created in
 iommu_set_interrupt()
---
 xen/arch/x86/hpet.c                      |  3 +-
 xen/arch/x86/irq.c                       | 51 ++++++++++++++++++-------
 xen/common/irq.c                         |  1 +-
 xen/drivers/char/ns16550.c               |  2 +-
 xen/drivers/passthrough/amd/iommu_init.c |  2 +-
 xen/drivers/passthrough/pci.c            |  3 +-
 xen/drivers/passthrough/vtd/iommu.c      |  3 +-
 xen/include/asm-x86/irq.h                |  2 +-
 xen/include/xen/irq.h                    |  1 +-
 9 files changed, 50 insertions(+), 18 deletions(-)

Comments

Roger Pau Monné July 17, 2019, 9:54 a.m. UTC | #1
On Wed, Jul 17, 2019 at 03:00:42AM +0200, Marek Marczykowski-Górecki wrote:
> Stubdomains need to be given sufficient privilege over the guest which it
> provides emulation for in order for PCI passthrough to work correctly.
> When a HVM domain try to enable MSI, QEMU in stubdomain calls
> PHYSDEVOP_map_pirq, but later it needs to call XEN_DOMCTL_bind_pt_irq as
> part of xc_domain_update_msi_irq. Allow for that as part of
> PHYSDEVOP_map_pirq.
> 
> This is not needed for PCI INTx, because IRQ in that case is known
> beforehand and the stubdomain is given permissions over this IRQ by
> libxl__device_pci_add (there's a do_pci_add against the stubdomain).
> 
> create_irq() already grant IRQ access to hardware_domain, with
> assumption the device model (something managing this IRQ) lives there.
> Modify create_irq() to take additional parameter pointing at device
> model domain - which may be dom0 or stubdomain.  Save ID of the domain
> given permission, to revoke it in destroy_irq() - easier and cleaner
> than replaying logic of create_irq() parameter. Use domid instead of
> actual reference to the domain, because it might get destroyed before
> destroying IRQ (stubdomain is destroyed before its target domain). And
> it is not an issue, because IRQ permissions live within domain
> structure, so destroying a domain also implicitly revoke the permission.
> Potential domid reuse is detected by by checking if that domain does
> have permission over the IRQ being destroyed.
> 
> Then, adjust all callers to provide the parameter. In case of calls not
> related to stubdomain-initiated allocations, give it either
> hardware_domain (so the behavior is unchanged there), or NULL for
> interrupts used by Xen internally.
> 
> Inspired by https://github.com/OpenXT/xenclient-oe/blob/5e0e7304a5a3c75ef01240a1e3673665b2aaf05e/recipes-extended/xen/files/stubdomain-msi-irq-access.patch by Eric Chanudet <chanudete@ainfosec.com>.
> 
> Signed-off-by: Simon Gaiser <simon@invisiblethingslab.com>
> Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> ---
> Changes in v3:
>  - extend commit message
> Changes in v4:
>  - add missing destroy_irq on error path
> Changes in v5:
>  - move irq_{grant,revoke}_access() to {create,destroy}_irq(), which
>    basically make it a different patch
>  - add get_dm_domain() helper
>  - do not give hardware_domain permission over IRQs used in Xen
>    internally
>  - rename create_irq argument to just 'd', to avoid confusion
>    when it's called by hardware domain
>  - verify that device is de-assigned before pci_remove_device call
>  - save ID of domain given permission in create_irq(), to revoke it in
>  destroy_irq()
>  - drop domain parameter from destroy_irq() and msi_free_irq()
>  - do not give hardware domain permission over IRQ created in
>  iommu_set_interrupt()
> ---
>  xen/arch/x86/hpet.c                      |  3 +-
>  xen/arch/x86/irq.c                       | 51 ++++++++++++++++++-------
>  xen/common/irq.c                         |  1 +-
>  xen/drivers/char/ns16550.c               |  2 +-
>  xen/drivers/passthrough/amd/iommu_init.c |  2 +-
>  xen/drivers/passthrough/pci.c            |  3 +-
>  xen/drivers/passthrough/vtd/iommu.c      |  3 +-
>  xen/include/asm-x86/irq.h                |  2 +-
>  xen/include/xen/irq.h                    |  1 +-
>  9 files changed, 50 insertions(+), 18 deletions(-)
> 
> diff --git a/xen/arch/x86/hpet.c b/xen/arch/x86/hpet.c
> index 4b08488..b4854ff 100644
> --- a/xen/arch/x86/hpet.c
> +++ b/xen/arch/x86/hpet.c
> @@ -11,6 +11,7 @@
>  #include <xen/softirq.h>
>  #include <xen/irq.h>
>  #include <xen/numa.h>
> +#include <xen/sched.h>
>  #include <asm/fixmap.h>
>  #include <asm/div64.h>
>  #include <asm/hpet.h>
> @@ -368,7 +369,7 @@ static int __init hpet_assign_irq(struct hpet_event_channel *ch)
>  {
>      int irq;
>  
> -    if ( (irq = create_irq(NUMA_NO_NODE)) < 0 )
> +    if ( (irq = create_irq(NUMA_NO_NODE, hardware_domain)) < 0 )

Shouldn't this be NULL? I don't think the hardware domain should be
allowed to play with the HPET IRQs?

>          return irq;
>  
>      ch->msi.irq = irq;
> diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
> index 2cac28a..dc5d302 100644
> --- a/xen/arch/x86/irq.c
> +++ b/xen/arch/x86/irq.c
> @@ -164,10 +164,21 @@ int __init bind_irq_vector(int irq, int vector, const cpumask_t *cpu_mask)
>      return ret;
>  }
>  
> +static struct domain *get_dm_domain(struct domain *d)
                                       ^ const
> +{
> +    return current->domain->target == d ? current->domain : hardware_domain;
> +}
> +
>  /*
>   * Dynamic irq allocate and deallocation for MSI
>   */
> -int create_irq(nodeid_t node)
> +
> +/*
> + * create_irq - allocate irq for MSI
> + * @d domain that will get permission over the allocated irq; this permission
> + * will automatically be revoked on destroy_irq
> + */
> +int create_irq(nodeid_t node, struct domain *d)
>  {
>      int irq, ret;
>      struct irq_desc *desc;
> @@ -200,18 +211,24 @@ int create_irq(nodeid_t node)
>          desc->arch.used = IRQ_UNUSED;
>          irq = ret;
>      }

I would assert that desc->creator_domid == DOMID_INVALID here, since
in the failure case creator_domid is not overwritten.

> -    else if ( hardware_domain )
> +    else if ( d )
>      {
> -        ret = irq_permit_access(hardware_domain, irq);
> +        ASSERT(d == current->domain);
> +        ret = irq_permit_access(d, irq);
>          if ( ret )
>              printk(XENLOG_G_ERR
> -                   "Could not grant Dom0 access to IRQ%d (error %d)\n",
> -                   irq, ret);
> +                   "Could not grant Dom%u access to IRQ%d (error %d)\n",
> +                   d->domain_id, irq, ret);
> +        else
> +            desc->creator_domid = d->domain_id;
>      }
>  
>      return irq;
>  }
>  
> +/*
> + * destroy_irq - deallocate irq for MSI
> + */
>  void destroy_irq(unsigned int irq)
>  {
>      struct irq_desc *desc = irq_to_desc(irq);
> @@ -220,14 +237,22 @@ void destroy_irq(unsigned int irq)
>  
>      BUG_ON(!MSI_IRQ(irq));
>  
> -    if ( hardware_domain )
> +    if ( desc->creator_domid != DOMID_INVALID )
>      {
> -        int err = irq_deny_access(hardware_domain, irq);
> +        struct domain *d = get_domain_by_id(desc->creator_domid);
>  
> -        if ( err )
> -            printk(XENLOG_G_ERR
> -                   "Could not revoke Dom0 access to IRQ%u (error %d)\n",
> -                   irq, err);
> +        if ( d && irq_access_permitted(d, irq) ) {
> +            int err;
> +
> +            err = irq_deny_access(d, irq);
> +            if ( err )
> +                printk(XENLOG_G_ERR
> +                       "Could not revoke Dom%u access to IRQ%u (error %d)\n",
> +                       d->domain_id, irq, err);
> +        }
> +
> +        if ( d )
> +            put_domain(d);

Don't you need to set creator_domid = DOMID_INVALID in destroy_irq at
some point?

Or else a failure in create_irq could leak the irq to it's previous
owner. Note that init_one_irq_desc would only init the fields the
first time the IRQ is used, but not for subsequent usages AFAICT.

>      }
>  
>      spin_lock_irqsave(&desc->lock, flags);
> @@ -2058,7 +2083,7 @@ int map_domain_pirq(
>              spin_unlock_irqrestore(&desc->lock, flags);
>  
>              info = NULL;
> -            irq = create_irq(NUMA_NO_NODE);
> +            irq = create_irq(NUMA_NO_NODE, get_dm_domain(d));

Isn't it fine to just use current->domain here directly?

It's always going to be the current domain the one that calls
map_domain_pirq in order to get a PIRQ mapped for it's target
domain I think.

>              ret = irq >= 0 ? prepare_domain_irq_pirq(d, irq, pirq + nr, &info)
>                             : irq;
>              if ( ret < 0 )
> @@ -2691,7 +2716,7 @@ int allocate_and_map_msi_pirq(struct domain *d, int index, int *pirq_p,
>          if ( irq == -1 )
>          {
>      case MAP_PIRQ_TYPE_MULTI_MSI:
> -            irq = create_irq(NUMA_NO_NODE);
> +            irq = create_irq(NUMA_NO_NODE, get_dm_domain(d));
>          }
>  
>          if ( irq < nr_irqs_gsi || irq >= nr_irqs )
> diff --git a/xen/common/irq.c b/xen/common/irq.c
> index f42512d..42b27a9 100644
> --- a/xen/common/irq.c
> +++ b/xen/common/irq.c
> @@ -16,6 +16,7 @@ int init_one_irq_desc(struct irq_desc *desc)
>      spin_lock_init(&desc->lock);
>      cpumask_setall(desc->affinity);
>      INIT_LIST_HEAD(&desc->rl_link);
> +    desc->creator_domid = DOMID_INVALID;
>  
>      err = arch_init_one_irq_desc(desc);
>      if ( err )
> diff --git a/xen/drivers/char/ns16550.c b/xen/drivers/char/ns16550.c
> index 189e121..ccc8b04 100644
> --- a/xen/drivers/char/ns16550.c
> +++ b/xen/drivers/char/ns16550.c
> @@ -719,7 +719,7 @@ static void __init ns16550_init_irq(struct serial_port *port)
>      struct ns16550 *uart = port->uart;
>  
>      if ( uart->msi )
> -        uart->irq = create_irq(0);
> +        uart->irq = create_irq(0, NULL);
>  #endif
>  }
>  
> diff --git a/xen/drivers/passthrough/amd/iommu_init.c b/xen/drivers/passthrough/amd/iommu_init.c
> index 4e76b26..50785e0 100644
> --- a/xen/drivers/passthrough/amd/iommu_init.c
> +++ b/xen/drivers/passthrough/amd/iommu_init.c
> @@ -781,7 +781,7 @@ static bool_t __init set_iommu_interrupt_handler(struct amd_iommu *iommu)
>      hw_irq_controller *handler;
>      u16 control;
>  
> -    irq = create_irq(NUMA_NO_NODE);
> +    irq = create_irq(NUMA_NO_NODE, NULL);
>      if ( irq <= 0 )
>      {
>          dprintk(XENLOG_ERR, "IOMMU: no irqs\n");
> diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
> index e886894..507b3d1 100644
> --- a/xen/drivers/passthrough/pci.c
> +++ b/xen/drivers/passthrough/pci.c
> @@ -845,6 +845,9 @@ int pci_remove_device(u16 seg, u8 bus, u8 devfn)
>      list_for_each_entry ( pdev, &pseg->alldevs_list, alldevs_list )
>          if ( pdev->bus == bus && pdev->devfn == devfn )
>          {
> +            ret = -EBUSY;
> +            if ( pdev->domain && pdev->domain != hardware_domain )
> +                break;

This seems like an unlrelated fix?

ie: preventing device removal while in use by a domain different than
dom0?

Note that you don't need the pdev->domain != NULL check, just doing
pdev->domain != hardware_domain seems enough, since you don't
dereference the pdev->domain pointer in the expression (unless I'm
missing other usages below).

>              ret = iommu_remove_device(pdev);
>              if ( pdev->domain )
>                  list_del(&pdev->domain_list);
> diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
> index 8b27d7e..79f9682 100644
> --- a/xen/drivers/passthrough/vtd/iommu.c
> +++ b/xen/drivers/passthrough/vtd/iommu.c
> @@ -1138,7 +1138,8 @@ static int __init iommu_set_interrupt(struct acpi_drhd_unit *drhd)
>      struct irq_desc *desc;
>  
>      irq = create_irq(rhsa ? pxm_to_node(rhsa->proximity_domain)
> -                          : NUMA_NO_NODE);
> +                          : NUMA_NO_NODE,
> +                     NULL);
>      if ( irq <= 0 )
>      {
>          dprintk(XENLOG_ERR VTDPREFIX, "IOMMU: no irq available!\n");
> diff --git a/xen/include/asm-x86/irq.h b/xen/include/asm-x86/irq.h
> index c0c6e7c..5b24428 100644
> --- a/xen/include/asm-x86/irq.h
> +++ b/xen/include/asm-x86/irq.h
> @@ -155,7 +155,7 @@ int  init_irq_data(void);
>  void clear_irq_vector(int irq);
>  
>  int irq_to_vector(int irq);
> -int create_irq(nodeid_t node);
> +int create_irq(nodeid_t node, struct domain *d);
>  void destroy_irq(unsigned int irq);
>  int assign_irq_vector(int irq, const cpumask_t *);
>  
> diff --git a/xen/include/xen/irq.h b/xen/include/xen/irq.h
> index 586b783..c7a6a83 100644
> --- a/xen/include/xen/irq.h
> +++ b/xen/include/xen/irq.h
> @@ -91,6 +91,7 @@ typedef struct irq_desc {
>      spinlock_t lock;
>      struct arch_irq_desc arch;
>      cpumask_var_t affinity;
> +    domid_t creator_domid; /* weak reference to domain handling this IRQ */

I feel like handling is too vague here, but I'm not a native speaker
so I'm not sure. I would maybe write:

... domain having permissions over this IRQ (which can be different
from the domain actually having the IRQ assigned) */

Which I think is less ambiguous.

Thanks, Roger.
Marek Marczykowski-Górecki July 17, 2019, 3:09 p.m. UTC | #2
On Wed, Jul 17, 2019 at 11:54:35AM +0200, Roger Pau Monné wrote:
> On Wed, Jul 17, 2019 at 03:00:42AM +0200, Marek Marczykowski-Górecki wrote:
> > Stubdomains need to be given sufficient privilege over the guest which it
> > provides emulation for in order for PCI passthrough to work correctly.
> > When a HVM domain try to enable MSI, QEMU in stubdomain calls
> > PHYSDEVOP_map_pirq, but later it needs to call XEN_DOMCTL_bind_pt_irq as
> > part of xc_domain_update_msi_irq. Allow for that as part of
> > PHYSDEVOP_map_pirq.
> > 
> > This is not needed for PCI INTx, because IRQ in that case is known
> > beforehand and the stubdomain is given permissions over this IRQ by
> > libxl__device_pci_add (there's a do_pci_add against the stubdomain).
> > 
> > create_irq() already grant IRQ access to hardware_domain, with
> > assumption the device model (something managing this IRQ) lives there.
> > Modify create_irq() to take additional parameter pointing at device
> > model domain - which may be dom0 or stubdomain.  Save ID of the domain
> > given permission, to revoke it in destroy_irq() - easier and cleaner
> > than replaying logic of create_irq() parameter. Use domid instead of
> > actual reference to the domain, because it might get destroyed before
> > destroying IRQ (stubdomain is destroyed before its target domain). And
> > it is not an issue, because IRQ permissions live within domain
> > structure, so destroying a domain also implicitly revoke the permission.
> > Potential domid reuse is detected by by checking if that domain does
> > have permission over the IRQ being destroyed.
> > 
> > Then, adjust all callers to provide the parameter. In case of calls not
> > related to stubdomain-initiated allocations, give it either
> > hardware_domain (so the behavior is unchanged there), or NULL for
> > interrupts used by Xen internally.
> > 
> > Inspired by https://github.com/OpenXT/xenclient-oe/blob/5e0e7304a5a3c75ef01240a1e3673665b2aaf05e/recipes-extended/xen/files/stubdomain-msi-irq-access.patch by Eric Chanudet <chanudete@ainfosec.com>.
> > 
> > Signed-off-by: Simon Gaiser <simon@invisiblethingslab.com>
> > Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> > ---
> > Changes in v3:
> >  - extend commit message
> > Changes in v4:
> >  - add missing destroy_irq on error path
> > Changes in v5:
> >  - move irq_{grant,revoke}_access() to {create,destroy}_irq(), which
> >    basically make it a different patch
> >  - add get_dm_domain() helper
> >  - do not give hardware_domain permission over IRQs used in Xen
> >    internally
> >  - rename create_irq argument to just 'd', to avoid confusion
> >    when it's called by hardware domain
> >  - verify that device is de-assigned before pci_remove_device call
> >  - save ID of domain given permission in create_irq(), to revoke it in
> >  destroy_irq()
> >  - drop domain parameter from destroy_irq() and msi_free_irq()
> >  - do not give hardware domain permission over IRQ created in
> >  iommu_set_interrupt()
> > ---
> >  xen/arch/x86/hpet.c                      |  3 +-
> >  xen/arch/x86/irq.c                       | 51 ++++++++++++++++++-------
> >  xen/common/irq.c                         |  1 +-
> >  xen/drivers/char/ns16550.c               |  2 +-
> >  xen/drivers/passthrough/amd/iommu_init.c |  2 +-
> >  xen/drivers/passthrough/pci.c            |  3 +-
> >  xen/drivers/passthrough/vtd/iommu.c      |  3 +-
> >  xen/include/asm-x86/irq.h                |  2 +-
> >  xen/include/xen/irq.h                    |  1 +-
> >  9 files changed, 50 insertions(+), 18 deletions(-)
> > 
> > diff --git a/xen/arch/x86/hpet.c b/xen/arch/x86/hpet.c
> > index 4b08488..b4854ff 100644
> > --- a/xen/arch/x86/hpet.c
> > +++ b/xen/arch/x86/hpet.c
> > @@ -11,6 +11,7 @@
> >  #include <xen/softirq.h>
> >  #include <xen/irq.h>
> >  #include <xen/numa.h>
> > +#include <xen/sched.h>
> >  #include <asm/fixmap.h>
> >  #include <asm/div64.h>
> >  #include <asm/hpet.h>
> > @@ -368,7 +369,7 @@ static int __init hpet_assign_irq(struct hpet_event_channel *ch)
> >  {
> >      int irq;
> >  
> > -    if ( (irq = create_irq(NUMA_NO_NODE)) < 0 )
> > +    if ( (irq = create_irq(NUMA_NO_NODE, hardware_domain)) < 0 )
> 
> Shouldn't this be NULL? I don't think the hardware domain should be
> allowed to play with the HPET IRQs?

Good point.

> >          return irq;
> >  
> >      ch->msi.irq = irq;
> > diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
> > index 2cac28a..dc5d302 100644
> > --- a/xen/arch/x86/irq.c
> > +++ b/xen/arch/x86/irq.c
> > @@ -164,10 +164,21 @@ int __init bind_irq_vector(int irq, int vector, const cpumask_t *cpu_mask)
> >      return ret;
> >  }
> >  
> > +static struct domain *get_dm_domain(struct domain *d)
>                                        ^ const
> > +{
> > +    return current->domain->target == d ? current->domain : hardware_domain;
> > +}
> > +
> >  /*
> >   * Dynamic irq allocate and deallocation for MSI
> >   */
> > -int create_irq(nodeid_t node)
> > +
> > +/*
> > + * create_irq - allocate irq for MSI
> > + * @d domain that will get permission over the allocated irq; this permission
> > + * will automatically be revoked on destroy_irq
> > + */
> > +int create_irq(nodeid_t node, struct domain *d)
> >  {
> >      int irq, ret;
> >      struct irq_desc *desc;
> > @@ -200,18 +211,24 @@ int create_irq(nodeid_t node)
> >          desc->arch.used = IRQ_UNUSED;
> >          irq = ret;
> >      }
> 
> I would assert that desc->creator_domid == DOMID_INVALID here, since
> in the failure case creator_domid is not overwritten.

Yes, see below.

> > -    else if ( hardware_domain )
> > +    else if ( d )
> >      {
> > -        ret = irq_permit_access(hardware_domain, irq);
> > +        ASSERT(d == current->domain);
> > +        ret = irq_permit_access(d, irq);
> >          if ( ret )
> >              printk(XENLOG_G_ERR
> > -                   "Could not grant Dom0 access to IRQ%d (error %d)\n",
> > -                   irq, ret);
> > +                   "Could not grant Dom%u access to IRQ%d (error %d)\n",
> > +                   d->domain_id, irq, ret);
> > +        else
> > +            desc->creator_domid = d->domain_id;
> >      }
> >  
> >      return irq;
> >  }
> >  
> > +/*
> > + * destroy_irq - deallocate irq for MSI
> > + */
> >  void destroy_irq(unsigned int irq)
> >  {
> >      struct irq_desc *desc = irq_to_desc(irq);
> > @@ -220,14 +237,22 @@ void destroy_irq(unsigned int irq)
> >  
> >      BUG_ON(!MSI_IRQ(irq));
> >  
> > -    if ( hardware_domain )
> > +    if ( desc->creator_domid != DOMID_INVALID )
> >      {
> > -        int err = irq_deny_access(hardware_domain, irq);
> > +        struct domain *d = get_domain_by_id(desc->creator_domid);
> >  
> > -        if ( err )
> > -            printk(XENLOG_G_ERR
> > -                   "Could not revoke Dom0 access to IRQ%u (error %d)\n",
> > -                   irq, err);
> > +        if ( d && irq_access_permitted(d, irq) ) {
> > +            int err;
> > +
> > +            err = irq_deny_access(d, irq);
> > +            if ( err )
> > +                printk(XENLOG_G_ERR
> > +                       "Could not revoke Dom%u access to IRQ%u (error %d)\n",
> > +                       d->domain_id, irq, err);
> > +        }
> > +
> > +        if ( d )
> > +            put_domain(d);
> 
> Don't you need to set creator_domid = DOMID_INVALID in destroy_irq at
> some point?
> 
> Or else a failure in create_irq could leak the irq to it's previous
> owner. Note that init_one_irq_desc would only init the fields the
> first time the IRQ is used, but not for subsequent usages AFAICT.

I assumed init_one_irq_desc do the work on subsequent usages too. If not,
indeed I need to modify creator_domid in few more places.

> >      }
> >  
> >      spin_lock_irqsave(&desc->lock, flags);
> > @@ -2058,7 +2083,7 @@ int map_domain_pirq(
> >              spin_unlock_irqrestore(&desc->lock, flags);
> >  
> >              info = NULL;
> > -            irq = create_irq(NUMA_NO_NODE);
> > +            irq = create_irq(NUMA_NO_NODE, get_dm_domain(d));
> 
> Isn't it fine to just use current->domain here directly?
> 
> It's always going to be the current domain the one that calls
> map_domain_pirq in order to get a PIRQ mapped for it's target
> domain I think.

I wasn't sure if that's true if all the cases. Especially if hardware
domain != toolstack domain. How is it then? Is it hardware domain
calling map_domain_pirq in that case?

> >              ret = irq >= 0 ? prepare_domain_irq_pirq(d, irq, pirq + nr, &info)
> >                             : irq;
> >              if ( ret < 0 )
> > @@ -2691,7 +2716,7 @@ int allocate_and_map_msi_pirq(struct domain *d, int index, int *pirq_p,
> >          if ( irq == -1 )
> >          {
> >      case MAP_PIRQ_TYPE_MULTI_MSI:
> > -            irq = create_irq(NUMA_NO_NODE);
> > +            irq = create_irq(NUMA_NO_NODE, get_dm_domain(d));
> >          }
> >  
> >          if ( irq < nr_irqs_gsi || irq >= nr_irqs )
> > diff --git a/xen/common/irq.c b/xen/common/irq.c
> > index f42512d..42b27a9 100644
> > --- a/xen/common/irq.c
> > +++ b/xen/common/irq.c
> > @@ -16,6 +16,7 @@ int init_one_irq_desc(struct irq_desc *desc)
> >      spin_lock_init(&desc->lock);
> >      cpumask_setall(desc->affinity);
> >      INIT_LIST_HEAD(&desc->rl_link);
> > +    desc->creator_domid = DOMID_INVALID;
> >  
> >      err = arch_init_one_irq_desc(desc);
> >      if ( err )
> > diff --git a/xen/drivers/char/ns16550.c b/xen/drivers/char/ns16550.c
> > index 189e121..ccc8b04 100644
> > --- a/xen/drivers/char/ns16550.c
> > +++ b/xen/drivers/char/ns16550.c
> > @@ -719,7 +719,7 @@ static void __init ns16550_init_irq(struct serial_port *port)
> >      struct ns16550 *uart = port->uart;
> >  
> >      if ( uart->msi )
> > -        uart->irq = create_irq(0);
> > +        uart->irq = create_irq(0, NULL);
> >  #endif
> >  }
> >  
> > diff --git a/xen/drivers/passthrough/amd/iommu_init.c b/xen/drivers/passthrough/amd/iommu_init.c
> > index 4e76b26..50785e0 100644
> > --- a/xen/drivers/passthrough/amd/iommu_init.c
> > +++ b/xen/drivers/passthrough/amd/iommu_init.c
> > @@ -781,7 +781,7 @@ static bool_t __init set_iommu_interrupt_handler(struct amd_iommu *iommu)
> >      hw_irq_controller *handler;
> >      u16 control;
> >  
> > -    irq = create_irq(NUMA_NO_NODE);
> > +    irq = create_irq(NUMA_NO_NODE, NULL);
> >      if ( irq <= 0 )
> >      {
> >          dprintk(XENLOG_ERR, "IOMMU: no irqs\n");
> > diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
> > index e886894..507b3d1 100644
> > --- a/xen/drivers/passthrough/pci.c
> > +++ b/xen/drivers/passthrough/pci.c
> > @@ -845,6 +845,9 @@ int pci_remove_device(u16 seg, u8 bus, u8 devfn)
> >      list_for_each_entry ( pdev, &pseg->alldevs_list, alldevs_list )
> >          if ( pdev->bus == bus && pdev->devfn == devfn )
> >          {
> > +            ret = -EBUSY;
> > +            if ( pdev->domain && pdev->domain != hardware_domain )
> > +                break;
> 
> This seems like an unlrelated fix?
> 
> ie: preventing device removal while in use by a domain different than
> dom0?

Indeed it may warrant separate commit now.

> Note that you don't need the pdev->domain != NULL check, just doing
> pdev->domain != hardware_domain seems enough, since you don't
> dereference the pdev->domain pointer in the expression (unless I'm
> missing other usages below).

I don't want to prevent removal if pdev->domain is NULL (if that's even
possible).

> >              ret = iommu_remove_device(pdev);
> >              if ( pdev->domain )
> >                  list_del(&pdev->domain_list);
> > diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
> > index 8b27d7e..79f9682 100644
> > --- a/xen/drivers/passthrough/vtd/iommu.c
> > +++ b/xen/drivers/passthrough/vtd/iommu.c
> > @@ -1138,7 +1138,8 @@ static int __init iommu_set_interrupt(struct acpi_drhd_unit *drhd)
> >      struct irq_desc *desc;
> >  
> >      irq = create_irq(rhsa ? pxm_to_node(rhsa->proximity_domain)
> > -                          : NUMA_NO_NODE);
> > +                          : NUMA_NO_NODE,
> > +                     NULL);
> >      if ( irq <= 0 )
> >      {
> >          dprintk(XENLOG_ERR VTDPREFIX, "IOMMU: no irq available!\n");
> > diff --git a/xen/include/asm-x86/irq.h b/xen/include/asm-x86/irq.h
> > index c0c6e7c..5b24428 100644
> > --- a/xen/include/asm-x86/irq.h
> > +++ b/xen/include/asm-x86/irq.h
> > @@ -155,7 +155,7 @@ int  init_irq_data(void);
> >  void clear_irq_vector(int irq);
> >  
> >  int irq_to_vector(int irq);
> > -int create_irq(nodeid_t node);
> > +int create_irq(nodeid_t node, struct domain *d);
> >  void destroy_irq(unsigned int irq);
> >  int assign_irq_vector(int irq, const cpumask_t *);
> >  
> > diff --git a/xen/include/xen/irq.h b/xen/include/xen/irq.h
> > index 586b783..c7a6a83 100644
> > --- a/xen/include/xen/irq.h
> > +++ b/xen/include/xen/irq.h
> > @@ -91,6 +91,7 @@ typedef struct irq_desc {
> >      spinlock_t lock;
> >      struct arch_irq_desc arch;
> >      cpumask_var_t affinity;
> > +    domid_t creator_domid; /* weak reference to domain handling this IRQ */
> 
> I feel like handling is too vague here, but I'm not a native speaker
> so I'm not sure. I would maybe write:
> 
> ... domain having permissions over this IRQ (which can be different
> from the domain actually having the IRQ assigned) */
> 
> Which I think is less ambiguous.

I wanted to fit the comment in one line. But your version indeed may be
better.
Roger Pau Monné July 18, 2019, 9:29 a.m. UTC | #3
On Wed, Jul 17, 2019 at 05:09:12PM +0200, Marek Marczykowski-Górecki wrote:
> On Wed, Jul 17, 2019 at 11:54:35AM +0200, Roger Pau Monné wrote:
> > On Wed, Jul 17, 2019 at 03:00:42AM +0200, Marek Marczykowski-Górecki wrote:
> > > @@ -220,14 +237,22 @@ void destroy_irq(unsigned int irq)
> > >  
> > >      BUG_ON(!MSI_IRQ(irq));
> > >  
> > > -    if ( hardware_domain )
> > > +    if ( desc->creator_domid != DOMID_INVALID )
> > >      {
> > > -        int err = irq_deny_access(hardware_domain, irq);
> > > +        struct domain *d = get_domain_by_id(desc->creator_domid);
> > >  
> > > -        if ( err )
> > > -            printk(XENLOG_G_ERR
> > > -                   "Could not revoke Dom0 access to IRQ%u (error %d)\n",
> > > -                   irq, err);
> > > +        if ( d && irq_access_permitted(d, irq) ) {
> > > +            int err;
> > > +
> > > +            err = irq_deny_access(d, irq);
> > > +            if ( err )
> > > +                printk(XENLOG_G_ERR
> > > +                       "Could not revoke Dom%u access to IRQ%u (error %d)\n",
> > > +                       d->domain_id, irq, err);
> > > +        }
> > > +
> > > +        if ( d )
> > > +            put_domain(d);
> > 
> > Don't you need to set creator_domid = DOMID_INVALID in destroy_irq at
> > some point?
> > 
> > Or else a failure in create_irq could leak the irq to it's previous
> > owner. Note that init_one_irq_desc would only init the fields the
> > first time the IRQ is used, but not for subsequent usages AFAICT.
> 
> I assumed init_one_irq_desc do the work on subsequent usages too. If not,
> indeed I need to modify creator_domid in few more places.

I don't think so, init_one_irq_desc will only init the fields if
handler == NULL, which will only happen the first time the IRQ is
used, afterwards handler is set to &no_irq_type by destroy_irq.

Just setting creator_domid = DOMID_INVALID in destroy_irq and adding
the assert to create_irq should be enough AFAICT, since those
functions are used exclusively by non-shared IRQs (MSI and MSI-X).

> > >      }
> > >  
> > >      spin_lock_irqsave(&desc->lock, flags);
> > > @@ -2058,7 +2083,7 @@ int map_domain_pirq(
> > >              spin_unlock_irqrestore(&desc->lock, flags);
> > >  
> > >              info = NULL;
> > > -            irq = create_irq(NUMA_NO_NODE);
> > > +            irq = create_irq(NUMA_NO_NODE, get_dm_domain(d));
> > 
> > Isn't it fine to just use current->domain here directly?
> > 
> > It's always going to be the current domain the one that calls
> > map_domain_pirq in order to get a PIRQ mapped for it's target
> > domain I think.
> 
> I wasn't sure if that's true if all the cases. Especially if hardware
> domain != toolstack domain. How is it then? Is it hardware domain
> calling map_domain_pirq in that case?

But then it's going to be the hardware domain the one that runs the
QEMU instance, and hence the one that issues the hypercalls to
map/unmap PIRQs to a target domain?

ie: the PCI backend (either pciback or QEMU) is not going to run on
the toolstack domain.

I'm afraid I don't see a case where current->domain isn't the domain
also requiring permissions over the IRQ, but I could be wrong. Can you
come up with a detailed scenario where this might happen?

> 
> > >              ret = irq >= 0 ? prepare_domain_irq_pirq(d, irq, pirq + nr, &info)
> > >                             : irq;
> > >              if ( ret < 0 )
> > > @@ -2691,7 +2716,7 @@ int allocate_and_map_msi_pirq(struct domain *d, int index, int *pirq_p,
> > >          if ( irq == -1 )
> > >          {
> > >      case MAP_PIRQ_TYPE_MULTI_MSI:
> > > -            irq = create_irq(NUMA_NO_NODE);
> > > +            irq = create_irq(NUMA_NO_NODE, get_dm_domain(d));
> > >          }
> > >  
> > >          if ( irq < nr_irqs_gsi || irq >= nr_irqs )
> > > diff --git a/xen/common/irq.c b/xen/common/irq.c
> > > index f42512d..42b27a9 100644
> > > --- a/xen/common/irq.c
> > > +++ b/xen/common/irq.c
> > > @@ -16,6 +16,7 @@ int init_one_irq_desc(struct irq_desc *desc)
> > >      spin_lock_init(&desc->lock);
> > >      cpumask_setall(desc->affinity);
> > >      INIT_LIST_HEAD(&desc->rl_link);
> > > +    desc->creator_domid = DOMID_INVALID;
> > >  
> > >      err = arch_init_one_irq_desc(desc);
> > >      if ( err )
> > > diff --git a/xen/drivers/char/ns16550.c b/xen/drivers/char/ns16550.c
> > > index 189e121..ccc8b04 100644
> > > --- a/xen/drivers/char/ns16550.c
> > > +++ b/xen/drivers/char/ns16550.c
> > > @@ -719,7 +719,7 @@ static void __init ns16550_init_irq(struct serial_port *port)
> > >      struct ns16550 *uart = port->uart;
> > >  
> > >      if ( uart->msi )
> > > -        uart->irq = create_irq(0);
> > > +        uart->irq = create_irq(0, NULL);
> > >  #endif
> > >  }
> > >  
> > > diff --git a/xen/drivers/passthrough/amd/iommu_init.c b/xen/drivers/passthrough/amd/iommu_init.c
> > > index 4e76b26..50785e0 100644
> > > --- a/xen/drivers/passthrough/amd/iommu_init.c
> > > +++ b/xen/drivers/passthrough/amd/iommu_init.c
> > > @@ -781,7 +781,7 @@ static bool_t __init set_iommu_interrupt_handler(struct amd_iommu *iommu)
> > >      hw_irq_controller *handler;
> > >      u16 control;
> > >  
> > > -    irq = create_irq(NUMA_NO_NODE);
> > > +    irq = create_irq(NUMA_NO_NODE, NULL);
> > >      if ( irq <= 0 )
> > >      {
> > >          dprintk(XENLOG_ERR, "IOMMU: no irqs\n");
> > > diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
> > > index e886894..507b3d1 100644
> > > --- a/xen/drivers/passthrough/pci.c
> > > +++ b/xen/drivers/passthrough/pci.c
> > > @@ -845,6 +845,9 @@ int pci_remove_device(u16 seg, u8 bus, u8 devfn)
> > >      list_for_each_entry ( pdev, &pseg->alldevs_list, alldevs_list )
> > >          if ( pdev->bus == bus && pdev->devfn == devfn )
> > >          {
> > > +            ret = -EBUSY;
> > > +            if ( pdev->domain && pdev->domain != hardware_domain )
> > > +                break;
> > 
> > This seems like an unlrelated fix?
> > 
> > ie: preventing device removal while in use by a domain different than
> > dom0?
> 
> Indeed it may warrant separate commit now.
> 
> > Note that you don't need the pdev->domain != NULL check, just doing
> > pdev->domain != hardware_domain seems enough, since you don't
> > dereference the pdev->domain pointer in the expression (unless I'm
> > missing other usages below).
> 
> I don't want to prevent removal if pdev->domain is NULL (if that's even
> possible).

But if pdev->domain == NULL, then it's certainly going to be different
from hardware_domain, so just using pdev->domain != hardware_domain
achieves both.

> > >              ret = iommu_remove_device(pdev);
> > >              if ( pdev->domain )
> > >                  list_del(&pdev->domain_list);
> > > diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
> > > index 8b27d7e..79f9682 100644
> > > --- a/xen/drivers/passthrough/vtd/iommu.c
> > > +++ b/xen/drivers/passthrough/vtd/iommu.c
> > > @@ -1138,7 +1138,8 @@ static int __init iommu_set_interrupt(struct acpi_drhd_unit *drhd)
> > >      struct irq_desc *desc;
> > >  
> > >      irq = create_irq(rhsa ? pxm_to_node(rhsa->proximity_domain)
> > > -                          : NUMA_NO_NODE);
> > > +                          : NUMA_NO_NODE,
> > > +                     NULL);
> > >      if ( irq <= 0 )
> > >      {
> > >          dprintk(XENLOG_ERR VTDPREFIX, "IOMMU: no irq available!\n");
> > > diff --git a/xen/include/asm-x86/irq.h b/xen/include/asm-x86/irq.h
> > > index c0c6e7c..5b24428 100644
> > > --- a/xen/include/asm-x86/irq.h
> > > +++ b/xen/include/asm-x86/irq.h
> > > @@ -155,7 +155,7 @@ int  init_irq_data(void);
> > >  void clear_irq_vector(int irq);
> > >  
> > >  int irq_to_vector(int irq);
> > > -int create_irq(nodeid_t node);
> > > +int create_irq(nodeid_t node, struct domain *d);
> > >  void destroy_irq(unsigned int irq);
> > >  int assign_irq_vector(int irq, const cpumask_t *);
> > >  
> > > diff --git a/xen/include/xen/irq.h b/xen/include/xen/irq.h
> > > index 586b783..c7a6a83 100644
> > > --- a/xen/include/xen/irq.h
> > > +++ b/xen/include/xen/irq.h
> > > @@ -91,6 +91,7 @@ typedef struct irq_desc {
> > >      spinlock_t lock;
> > >      struct arch_irq_desc arch;
> > >      cpumask_var_t affinity;
> > > +    domid_t creator_domid; /* weak reference to domain handling this IRQ */
> > 
> > I feel like handling is too vague here, but I'm not a native speaker
> > so I'm not sure. I would maybe write:
> > 
> > ... domain having permissions over this IRQ (which can be different
> > from the domain actually having the IRQ assigned) */
> > 
> > Which I think is less ambiguous.
> 
> I wanted to fit the comment in one line. But your version indeed may be
> better.

I would always error towards being more verbose, even if that means
using more that one line. As said, I don't think the comment is wrong,
just feel it could be more detailed.

Thanks, Roger.
Marek Marczykowski-Górecki July 18, 2019, 3:12 p.m. UTC | #4
On Thu, Jul 18, 2019 at 11:29:39AM +0200, Roger Pau Monné wrote:
> On Wed, Jul 17, 2019 at 05:09:12PM +0200, Marek Marczykowski-Górecki wrote:
> > On Wed, Jul 17, 2019 at 11:54:35AM +0200, Roger Pau Monné wrote:
> > > On Wed, Jul 17, 2019 at 03:00:42AM +0200, Marek Marczykowski-Górecki wrote:
> > > > @@ -220,14 +237,22 @@ void destroy_irq(unsigned int irq)
> > > >  
> > > >      BUG_ON(!MSI_IRQ(irq));
> > > >  
> > > > -    if ( hardware_domain )
> > > > +    if ( desc->creator_domid != DOMID_INVALID )
> > > >      {
> > > > -        int err = irq_deny_access(hardware_domain, irq);
> > > > +        struct domain *d = get_domain_by_id(desc->creator_domid);
> > > >  
> > > > -        if ( err )
> > > > -            printk(XENLOG_G_ERR
> > > > -                   "Could not revoke Dom0 access to IRQ%u (error %d)\n",
> > > > -                   irq, err);
> > > > +        if ( d && irq_access_permitted(d, irq) ) {
> > > > +            int err;
> > > > +
> > > > +            err = irq_deny_access(d, irq);
> > > > +            if ( err )
> > > > +                printk(XENLOG_G_ERR
> > > > +                       "Could not revoke Dom%u access to IRQ%u (error %d)\n",
> > > > +                       d->domain_id, irq, err);
> > > > +        }
> > > > +
> > > > +        if ( d )
> > > > +            put_domain(d);
> > > 
> > > Don't you need to set creator_domid = DOMID_INVALID in destroy_irq at
> > > some point?
> > > 
> > > Or else a failure in create_irq could leak the irq to it's previous
> > > owner. Note that init_one_irq_desc would only init the fields the
> > > first time the IRQ is used, but not for subsequent usages AFAICT.
> > 
> > I assumed init_one_irq_desc do the work on subsequent usages too. If not,
> > indeed I need to modify creator_domid in few more places.
> 
> I don't think so, init_one_irq_desc will only init the fields if
> handler == NULL, which will only happen the first time the IRQ is
> used, afterwards handler is set to &no_irq_type by destroy_irq.
> 
> Just setting creator_domid = DOMID_INVALID in destroy_irq and adding
> the assert to create_irq should be enough AFAICT, since those
> functions are used exclusively by non-shared IRQs (MSI and MSI-X).

Ok.

> > > >      }
> > > >  
> > > >      spin_lock_irqsave(&desc->lock, flags);
> > > > @@ -2058,7 +2083,7 @@ int map_domain_pirq(
> > > >              spin_unlock_irqrestore(&desc->lock, flags);
> > > >  
> > > >              info = NULL;
> > > > -            irq = create_irq(NUMA_NO_NODE);
> > > > +            irq = create_irq(NUMA_NO_NODE, get_dm_domain(d));
> > > 
> > > Isn't it fine to just use current->domain here directly?
> > > 
> > > It's always going to be the current domain the one that calls
> > > map_domain_pirq in order to get a PIRQ mapped for it's target
> > > domain I think.
> > 
> > I wasn't sure if that's true if all the cases. Especially if hardware
> > domain != toolstack domain. How is it then? Is it hardware domain
> > calling map_domain_pirq in that case?
> 
> But then it's going to be the hardware domain the one that runs the
> QEMU instance, and hence the one that issues the hypercalls to
> map/unmap PIRQs to a target domain?
> 
> ie: the PCI backend (either pciback or QEMU) is not going to run on
> the toolstack domain.

Indeed, you're right. This also means get_dm_domain() helper wouldn't be
needed anymore.

> I'm afraid I don't see a case where current->domain isn't the domain
> also requiring permissions over the IRQ, but I could be wrong. Can you
> come up with a detailed scenario where this might happen?
> 
> > 
> > > >              ret = irq >= 0 ? prepare_domain_irq_pirq(d, irq, pirq + nr, &info)
> > > >                             : irq;
> > > >              if ( ret < 0 )
> > > > @@ -2691,7 +2716,7 @@ int allocate_and_map_msi_pirq(struct domain *d, int index, int *pirq_p,
> > > >          if ( irq == -1 )
> > > >          {
> > > >      case MAP_PIRQ_TYPE_MULTI_MSI:
> > > > -            irq = create_irq(NUMA_NO_NODE);
> > > > +            irq = create_irq(NUMA_NO_NODE, get_dm_domain(d));
> > > >          }
> > > >  
> > > >          if ( irq < nr_irqs_gsi || irq >= nr_irqs )
> > > > diff --git a/xen/common/irq.c b/xen/common/irq.c
> > > > index f42512d..42b27a9 100644
> > > > --- a/xen/common/irq.c
> > > > +++ b/xen/common/irq.c
> > > > @@ -16,6 +16,7 @@ int init_one_irq_desc(struct irq_desc *desc)
> > > >      spin_lock_init(&desc->lock);
> > > >      cpumask_setall(desc->affinity);
> > > >      INIT_LIST_HEAD(&desc->rl_link);
> > > > +    desc->creator_domid = DOMID_INVALID;
> > > >  
> > > >      err = arch_init_one_irq_desc(desc);
> > > >      if ( err )
> > > > diff --git a/xen/drivers/char/ns16550.c b/xen/drivers/char/ns16550.c
> > > > index 189e121..ccc8b04 100644
> > > > --- a/xen/drivers/char/ns16550.c
> > > > +++ b/xen/drivers/char/ns16550.c
> > > > @@ -719,7 +719,7 @@ static void __init ns16550_init_irq(struct serial_port *port)
> > > >      struct ns16550 *uart = port->uart;
> > > >  
> > > >      if ( uart->msi )
> > > > -        uart->irq = create_irq(0);
> > > > +        uart->irq = create_irq(0, NULL);
> > > >  #endif
> > > >  }
> > > >  
> > > > diff --git a/xen/drivers/passthrough/amd/iommu_init.c b/xen/drivers/passthrough/amd/iommu_init.c
> > > > index 4e76b26..50785e0 100644
> > > > --- a/xen/drivers/passthrough/amd/iommu_init.c
> > > > +++ b/xen/drivers/passthrough/amd/iommu_init.c
> > > > @@ -781,7 +781,7 @@ static bool_t __init set_iommu_interrupt_handler(struct amd_iommu *iommu)
> > > >      hw_irq_controller *handler;
> > > >      u16 control;
> > > >  
> > > > -    irq = create_irq(NUMA_NO_NODE);
> > > > +    irq = create_irq(NUMA_NO_NODE, NULL);
> > > >      if ( irq <= 0 )
> > > >      {
> > > >          dprintk(XENLOG_ERR, "IOMMU: no irqs\n");
> > > > diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
> > > > index e886894..507b3d1 100644
> > > > --- a/xen/drivers/passthrough/pci.c
> > > > +++ b/xen/drivers/passthrough/pci.c
> > > > @@ -845,6 +845,9 @@ int pci_remove_device(u16 seg, u8 bus, u8 devfn)
> > > >      list_for_each_entry ( pdev, &pseg->alldevs_list, alldevs_list )
> > > >          if ( pdev->bus == bus && pdev->devfn == devfn )
> > > >          {
> > > > +            ret = -EBUSY;
> > > > +            if ( pdev->domain && pdev->domain != hardware_domain )
> > > > +                break;
> > > 
> > > This seems like an unlrelated fix?
> > > 
> > > ie: preventing device removal while in use by a domain different than
> > > dom0?
> > 
> > Indeed it may warrant separate commit now.
> > 
> > > Note that you don't need the pdev->domain != NULL check, just doing
> > > pdev->domain != hardware_domain seems enough, since you don't
> > > dereference the pdev->domain pointer in the expression (unless I'm
> > > missing other usages below).
> > 
> > I don't want to prevent removal if pdev->domain is NULL (if that's even
> > possible).
> 
> But if pdev->domain == NULL, then it's certainly going to be different
> from hardware_domain, 

Exactly. And I do _not_ want to hit that break if pdev->domain == NULL.

> so just using pdev->domain != hardware_domain
> achieves both.
Roger Pau Monné July 18, 2019, 4:55 p.m. UTC | #5
On Thu, Jul 18, 2019 at 05:12:54PM +0200, Marek Marczykowski-Górecki wrote:
> On Thu, Jul 18, 2019 at 11:29:39AM +0200, Roger Pau Monné wrote:
> > On Wed, Jul 17, 2019 at 05:09:12PM +0200, Marek Marczykowski-Górecki wrote:
> > > On Wed, Jul 17, 2019 at 11:54:35AM +0200, Roger Pau Monné wrote:
> > > > On Wed, Jul 17, 2019 at 03:00:42AM +0200, Marek Marczykowski-Górecki wrote:
> > > > > diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
> > > > > index e886894..507b3d1 100644
> > > > > --- a/xen/drivers/passthrough/pci.c
> > > > > +++ b/xen/drivers/passthrough/pci.c
> > > > > @@ -845,6 +845,9 @@ int pci_remove_device(u16 seg, u8 bus, u8 devfn)
> > > > >      list_for_each_entry ( pdev, &pseg->alldevs_list, alldevs_list )
> > > > >          if ( pdev->bus == bus && pdev->devfn == devfn )
> > > > >          {
> > > > > +            ret = -EBUSY;
> > > > > +            if ( pdev->domain && pdev->domain != hardware_domain )
> > > > > +                break;
> > > > 
> > > > This seems like an unlrelated fix?
> > > > 
> > > > ie: preventing device removal while in use by a domain different than
> > > > dom0?
> > > 
> > > Indeed it may warrant separate commit now.
> > > 
> > > > Note that you don't need the pdev->domain != NULL check, just doing
> > > > pdev->domain != hardware_domain seems enough, since you don't
> > > > dereference the pdev->domain pointer in the expression (unless I'm
> > > > missing other usages below).
> > > 
> > > I don't want to prevent removal if pdev->domain is NULL (if that's even
> > > possible).
> > 
> > But if pdev->domain == NULL, then it's certainly going to be different
> > from hardware_domain, 
> 
> Exactly. And I do _not_ want to hit that break if pdev->domain == NULL.

Oh sorry for being obtuse, you are right. In any case, please send
this as a separate patch.

Thanks, Roger.
Julien Grall July 20, 2019, 4:48 p.m. UTC | #6
Hi,

Sorry for jumping late in the discussion.

On 7/17/19 2:00 AM, Marek Marczykowski-Górecki wrote:
> diff --git a/xen/include/xen/irq.h b/xen/include/xen/irq.h
> index 586b783..c7a6a83 100644
> --- a/xen/include/xen/irq.h
> +++ b/xen/include/xen/irq.h
> @@ -91,6 +91,7 @@ typedef struct irq_desc {
>       spinlock_t lock;
>       struct arch_irq_desc arch;
>       cpumask_var_t affinity;
> +    domid_t creator_domid; /* weak reference to domain handling this IRQ */

This x86 only. Can this be moved in arch_irq_desc to avoid increasing 
the structure on Arm?

Cheers,
Marek Marczykowski-Górecki July 20, 2019, 9:21 p.m. UTC | #7
On Sat, Jul 20, 2019 at 05:48:56PM +0100, Julien Grall wrote:
> Hi,
> 
> Sorry for jumping late in the discussion.
> 
> On 7/17/19 2:00 AM, Marek Marczykowski-Górecki wrote:
> > diff --git a/xen/include/xen/irq.h b/xen/include/xen/irq.h
> > index 586b783..c7a6a83 100644
> > --- a/xen/include/xen/irq.h
> > +++ b/xen/include/xen/irq.h
> > @@ -91,6 +91,7 @@ typedef struct irq_desc {
> >       spinlock_t lock;
> >       struct arch_irq_desc arch;
> >       cpumask_var_t affinity;
> > +    domid_t creator_domid; /* weak reference to domain handling this IRQ */
> 
> This x86 only. Can this be moved in arch_irq_desc to avoid increasing the
> structure on Arm?

How (if at all) PCI passthrough is supported on ARM? Is qemu involved?
If so, and if that qemu would be isolated in stubdomain, I think ARM
would need a similar feature. If it not the case right now, but it is
planned, do you think it's worth moving it to arch_irq_desc and possibly
move back later?
Julien Grall July 21, 2019, 6:05 p.m. UTC | #8
Hi,

On 7/20/19 10:21 PM, Marek Marczykowski-Górecki wrote:
> On Sat, Jul 20, 2019 at 05:48:56PM +0100, Julien Grall wrote:
>> Hi,
>>
>> Sorry for jumping late in the discussion.
>>
>> On 7/17/19 2:00 AM, Marek Marczykowski-Górecki wrote:
>>> diff --git a/xen/include/xen/irq.h b/xen/include/xen/irq.h
>>> index 586b783..c7a6a83 100644
>>> --- a/xen/include/xen/irq.h
>>> +++ b/xen/include/xen/irq.h
>>> @@ -91,6 +91,7 @@ typedef struct irq_desc {
>>>        spinlock_t lock;
>>>        struct arch_irq_desc arch;
>>>        cpumask_var_t affinity;
>>> +    domid_t creator_domid; /* weak reference to domain handling this IRQ */
>>
>> This x86 only. Can this be moved in arch_irq_desc to avoid increasing the
>> structure on Arm?
> 
> How (if at all) PCI passthrough is supported on ARM? Is qemu involved?
> If so, and if that qemu would be isolated in stubdomain, I think ARM
> would need a similar feature. If it not the case right now, but it is
> planned, do you think it's worth moving it to arch_irq_desc and possibly
> move back later?

PCI passthrough is not yet supported on Arm. However, I would not expect 
QEMU to be involved at all for Arm.

In any case, I would still prefer to keep field in arch_irq_desc until 
we see any usage on Arm.

Cheers,
Roger Pau Monné July 22, 2019, 8:45 a.m. UTC | #9
On Sun, Jul 21, 2019 at 07:05:16PM +0100, Julien Grall wrote:
> Hi,
> 
> On 7/20/19 10:21 PM, Marek Marczykowski-Górecki wrote:
> > On Sat, Jul 20, 2019 at 05:48:56PM +0100, Julien Grall wrote:
> > > Hi,
> > > 
> > > Sorry for jumping late in the discussion.
> > > 
> > > On 7/17/19 2:00 AM, Marek Marczykowski-Górecki wrote:
> > > > diff --git a/xen/include/xen/irq.h b/xen/include/xen/irq.h
> > > > index 586b783..c7a6a83 100644
> > > > --- a/xen/include/xen/irq.h
> > > > +++ b/xen/include/xen/irq.h
> > > > @@ -91,6 +91,7 @@ typedef struct irq_desc {
> > > >        spinlock_t lock;
> > > >        struct arch_irq_desc arch;
> > > >        cpumask_var_t affinity;
> > > > +    domid_t creator_domid; /* weak reference to domain handling this IRQ */
> > > 
> > > This x86 only. Can this be moved in arch_irq_desc to avoid increasing the
> > > structure on Arm?
> > 
> > How (if at all) PCI passthrough is supported on ARM? Is qemu involved?
> > If so, and if that qemu would be isolated in stubdomain, I think ARM
> > would need a similar feature. If it not the case right now, but it is
> > planned, do you think it's worth moving it to arch_irq_desc and possibly
> > move back later?
> 
> PCI passthrough is not yet supported on Arm. However, I would not expect
> QEMU to be involved at all for Arm.
> 
> In any case, I would still prefer to keep field in arch_irq_desc until we
> see any usage on Arm.

I'm fine with putting it inside of the arch struct, but there's
nothing x86 specific about this field. Regardless of whether you use
QEMU or something different, if you want to allow passthrough on ARM
from domains != dom0 you will need this field anyway, so that the
domain running the passthrough emulator can properly manage interrupts
on behalf of the guest.

If you only plan to support something like vPCI inside of Xen then I
guess there's no need for the field, since Xen is always going to be
the one doing the passthrough.

Roger.
Julien Grall July 22, 2019, 9:06 a.m. UTC | #10
Hi Roger,

On 22/07/2019 09:45, Roger Pau Monné wrote:
> On Sun, Jul 21, 2019 at 07:05:16PM +0100, Julien Grall wrote:
>> Hi,
>>
>> On 7/20/19 10:21 PM, Marek Marczykowski-Górecki wrote:
>>> On Sat, Jul 20, 2019 at 05:48:56PM +0100, Julien Grall wrote:
>>>> Hi,
>>>>
>>>> Sorry for jumping late in the discussion.
>>>>
>>>> On 7/17/19 2:00 AM, Marek Marczykowski-Górecki wrote:
>>>>> diff --git a/xen/include/xen/irq.h b/xen/include/xen/irq.h
>>>>> index 586b783..c7a6a83 100644
>>>>> --- a/xen/include/xen/irq.h
>>>>> +++ b/xen/include/xen/irq.h
>>>>> @@ -91,6 +91,7 @@ typedef struct irq_desc {
>>>>>         spinlock_t lock;
>>>>>         struct arch_irq_desc arch;
>>>>>         cpumask_var_t affinity;
>>>>> +    domid_t creator_domid; /* weak reference to domain handling this IRQ */
>>>>
>>>> This x86 only. Can this be moved in arch_irq_desc to avoid increasing the
>>>> structure on Arm?
>>>
>>> How (if at all) PCI passthrough is supported on ARM? Is qemu involved?
>>> If so, and if that qemu would be isolated in stubdomain, I think ARM
>>> would need a similar feature. If it not the case right now, but it is
>>> planned, do you think it's worth moving it to arch_irq_desc and possibly
>>> move back later?
>>
>> PCI passthrough is not yet supported on Arm. However, I would not expect
>> QEMU to be involved at all for Arm.
>>
>> In any case, I would still prefer to keep field in arch_irq_desc until we
>> see any usage on Arm.
> 
> I'm fine with putting it inside of the arch struct, but there's
> nothing x86 specific about this field.

In theory yes, in practice this is only used by x86 today. I don't want to 
increase the size of irq_desc for unused field.

We can move because the field in irq_desc if there are a need on Arm.

Cheers,
Julien Grall July 22, 2019, 9:16 a.m. UTC | #11
On 22/07/2019 10:06, Julien Grall wrote:
> Hi Roger,
> 
> On 22/07/2019 09:45, Roger Pau Monné wrote:
>> On Sun, Jul 21, 2019 at 07:05:16PM +0100, Julien Grall wrote:
>>> Hi,
>>>
>>> On 7/20/19 10:21 PM, Marek Marczykowski-Górecki wrote:
>>>> On Sat, Jul 20, 2019 at 05:48:56PM +0100, Julien Grall wrote:
>>>>> Hi,
>>>>>
>>>>> Sorry for jumping late in the discussion.
>>>>>
>>>>> On 7/17/19 2:00 AM, Marek Marczykowski-Górecki wrote:
>>>>>> diff --git a/xen/include/xen/irq.h b/xen/include/xen/irq.h
>>>>>> index 586b783..c7a6a83 100644
>>>>>> --- a/xen/include/xen/irq.h
>>>>>> +++ b/xen/include/xen/irq.h
>>>>>> @@ -91,6 +91,7 @@ typedef struct irq_desc {
>>>>>>         spinlock_t lock;
>>>>>>         struct arch_irq_desc arch;
>>>>>>         cpumask_var_t affinity;
>>>>>> +    domid_t creator_domid; /* weak reference to domain handling this IRQ */
>>>>>
>>>>> This x86 only. Can this be moved in arch_irq_desc to avoid increasing the
>>>>> structure on Arm?
>>>>
>>>> How (if at all) PCI passthrough is supported on ARM? Is qemu involved?
>>>> If so, and if that qemu would be isolated in stubdomain, I think ARM
>>>> would need a similar feature. If it not the case right now, but it is
>>>> planned, do you think it's worth moving it to arch_irq_desc and possibly
>>>> move back later?
>>>
>>> PCI passthrough is not yet supported on Arm. However, I would not expect
>>> QEMU to be involved at all for Arm.
>>>
>>> In any case, I would still prefer to keep field in arch_irq_desc until we
>>> see any usage on Arm.
>>
>> I'm fine with putting it inside of the arch struct, but there's
>> nothing x86 specific about this field.
> 
> In theory yes, in practice this is only used by x86 today. I don't want to 
> increase the size of irq_desc for unused field.
> 
> We can move because the field in irq_desc if there are a need on Arm.

s/because/later/


Cheers,

Patch
diff mbox series

diff --git a/xen/arch/x86/hpet.c b/xen/arch/x86/hpet.c
index 4b08488..b4854ff 100644
--- a/xen/arch/x86/hpet.c
+++ b/xen/arch/x86/hpet.c
@@ -11,6 +11,7 @@ 
 #include <xen/softirq.h>
 #include <xen/irq.h>
 #include <xen/numa.h>
+#include <xen/sched.h>
 #include <asm/fixmap.h>
 #include <asm/div64.h>
 #include <asm/hpet.h>
@@ -368,7 +369,7 @@  static int __init hpet_assign_irq(struct hpet_event_channel *ch)
 {
     int irq;
 
-    if ( (irq = create_irq(NUMA_NO_NODE)) < 0 )
+    if ( (irq = create_irq(NUMA_NO_NODE, hardware_domain)) < 0 )
         return irq;
 
     ch->msi.irq = irq;
diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
index 2cac28a..dc5d302 100644
--- a/xen/arch/x86/irq.c
+++ b/xen/arch/x86/irq.c
@@ -164,10 +164,21 @@  int __init bind_irq_vector(int irq, int vector, const cpumask_t *cpu_mask)
     return ret;
 }
 
+static struct domain *get_dm_domain(struct domain *d)
+{
+    return current->domain->target == d ? current->domain : hardware_domain;
+}
+
 /*
  * Dynamic irq allocate and deallocation for MSI
  */
-int create_irq(nodeid_t node)
+
+/*
+ * create_irq - allocate irq for MSI
+ * @d domain that will get permission over the allocated irq; this permission
+ * will automatically be revoked on destroy_irq
+ */
+int create_irq(nodeid_t node, struct domain *d)
 {
     int irq, ret;
     struct irq_desc *desc;
@@ -200,18 +211,24 @@  int create_irq(nodeid_t node)
         desc->arch.used = IRQ_UNUSED;
         irq = ret;
     }
-    else if ( hardware_domain )
+    else if ( d )
     {
-        ret = irq_permit_access(hardware_domain, irq);
+        ASSERT(d == current->domain);
+        ret = irq_permit_access(d, irq);
         if ( ret )
             printk(XENLOG_G_ERR
-                   "Could not grant Dom0 access to IRQ%d (error %d)\n",
-                   irq, ret);
+                   "Could not grant Dom%u access to IRQ%d (error %d)\n",
+                   d->domain_id, irq, ret);
+        else
+            desc->creator_domid = d->domain_id;
     }
 
     return irq;
 }
 
+/*
+ * destroy_irq - deallocate irq for MSI
+ */
 void destroy_irq(unsigned int irq)
 {
     struct irq_desc *desc = irq_to_desc(irq);
@@ -220,14 +237,22 @@  void destroy_irq(unsigned int irq)
 
     BUG_ON(!MSI_IRQ(irq));
 
-    if ( hardware_domain )
+    if ( desc->creator_domid != DOMID_INVALID )
     {
-        int err = irq_deny_access(hardware_domain, irq);
+        struct domain *d = get_domain_by_id(desc->creator_domid);
 
-        if ( err )
-            printk(XENLOG_G_ERR
-                   "Could not revoke Dom0 access to IRQ%u (error %d)\n",
-                   irq, err);
+        if ( d && irq_access_permitted(d, irq) ) {
+            int err;
+
+            err = irq_deny_access(d, irq);
+            if ( err )
+                printk(XENLOG_G_ERR
+                       "Could not revoke Dom%u access to IRQ%u (error %d)\n",
+                       d->domain_id, irq, err);
+        }
+
+        if ( d )
+            put_domain(d);
     }
 
     spin_lock_irqsave(&desc->lock, flags);
@@ -2058,7 +2083,7 @@  int map_domain_pirq(
             spin_unlock_irqrestore(&desc->lock, flags);
 
             info = NULL;
-            irq = create_irq(NUMA_NO_NODE);
+            irq = create_irq(NUMA_NO_NODE, get_dm_domain(d));
             ret = irq >= 0 ? prepare_domain_irq_pirq(d, irq, pirq + nr, &info)
                            : irq;
             if ( ret < 0 )
@@ -2691,7 +2716,7 @@  int allocate_and_map_msi_pirq(struct domain *d, int index, int *pirq_p,
         if ( irq == -1 )
         {
     case MAP_PIRQ_TYPE_MULTI_MSI:
-            irq = create_irq(NUMA_NO_NODE);
+            irq = create_irq(NUMA_NO_NODE, get_dm_domain(d));
         }
 
         if ( irq < nr_irqs_gsi || irq >= nr_irqs )
diff --git a/xen/common/irq.c b/xen/common/irq.c
index f42512d..42b27a9 100644
--- a/xen/common/irq.c
+++ b/xen/common/irq.c
@@ -16,6 +16,7 @@  int init_one_irq_desc(struct irq_desc *desc)
     spin_lock_init(&desc->lock);
     cpumask_setall(desc->affinity);
     INIT_LIST_HEAD(&desc->rl_link);
+    desc->creator_domid = DOMID_INVALID;
 
     err = arch_init_one_irq_desc(desc);
     if ( err )
diff --git a/xen/drivers/char/ns16550.c b/xen/drivers/char/ns16550.c
index 189e121..ccc8b04 100644
--- a/xen/drivers/char/ns16550.c
+++ b/xen/drivers/char/ns16550.c
@@ -719,7 +719,7 @@  static void __init ns16550_init_irq(struct serial_port *port)
     struct ns16550 *uart = port->uart;
 
     if ( uart->msi )
-        uart->irq = create_irq(0);
+        uart->irq = create_irq(0, NULL);
 #endif
 }
 
diff --git a/xen/drivers/passthrough/amd/iommu_init.c b/xen/drivers/passthrough/amd/iommu_init.c
index 4e76b26..50785e0 100644
--- a/xen/drivers/passthrough/amd/iommu_init.c
+++ b/xen/drivers/passthrough/amd/iommu_init.c
@@ -781,7 +781,7 @@  static bool_t __init set_iommu_interrupt_handler(struct amd_iommu *iommu)
     hw_irq_controller *handler;
     u16 control;
 
-    irq = create_irq(NUMA_NO_NODE);
+    irq = create_irq(NUMA_NO_NODE, NULL);
     if ( irq <= 0 )
     {
         dprintk(XENLOG_ERR, "IOMMU: no irqs\n");
diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
index e886894..507b3d1 100644
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -845,6 +845,9 @@  int pci_remove_device(u16 seg, u8 bus, u8 devfn)
     list_for_each_entry ( pdev, &pseg->alldevs_list, alldevs_list )
         if ( pdev->bus == bus && pdev->devfn == devfn )
         {
+            ret = -EBUSY;
+            if ( pdev->domain && pdev->domain != hardware_domain )
+                break;
             ret = iommu_remove_device(pdev);
             if ( pdev->domain )
                 list_del(&pdev->domain_list);
diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index 8b27d7e..79f9682 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -1138,7 +1138,8 @@  static int __init iommu_set_interrupt(struct acpi_drhd_unit *drhd)
     struct irq_desc *desc;
 
     irq = create_irq(rhsa ? pxm_to_node(rhsa->proximity_domain)
-                          : NUMA_NO_NODE);
+                          : NUMA_NO_NODE,
+                     NULL);
     if ( irq <= 0 )
     {
         dprintk(XENLOG_ERR VTDPREFIX, "IOMMU: no irq available!\n");
diff --git a/xen/include/asm-x86/irq.h b/xen/include/asm-x86/irq.h
index c0c6e7c..5b24428 100644
--- a/xen/include/asm-x86/irq.h
+++ b/xen/include/asm-x86/irq.h
@@ -155,7 +155,7 @@  int  init_irq_data(void);
 void clear_irq_vector(int irq);
 
 int irq_to_vector(int irq);
-int create_irq(nodeid_t node);
+int create_irq(nodeid_t node, struct domain *d);
 void destroy_irq(unsigned int irq);
 int assign_irq_vector(int irq, const cpumask_t *);
 
diff --git a/xen/include/xen/irq.h b/xen/include/xen/irq.h
index 586b783..c7a6a83 100644
--- a/xen/include/xen/irq.h
+++ b/xen/include/xen/irq.h
@@ -91,6 +91,7 @@  typedef struct irq_desc {
     spinlock_t lock;
     struct arch_irq_desc arch;
     cpumask_var_t affinity;
+    domid_t creator_domid; /* weak reference to domain handling this IRQ */
 
     /* irq ratelimit */
     s_time_t rl_quantum_start;