Message ID | 7d011094eed3f5c3cf6971cc8760874fd56ca443.1569379186.git-series.marmarek@invisiblethingslab.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v7,1/4] libxl: do not attach xen-pciback to HVM domain, if stubdomain is in use | expand |
On Wed, Sep 25, 2019 at 04:41:26AM +0200, Marek Marczykowski-Górecki wrote: > Stubdomains need to be given sufficient privilege over the guest which it > provides emulation for in order for PCI passthrough to work correctly. > When a HVM domain try to enable MSI, QEMU in stubdomain calls > PHYSDEVOP_map_pirq, but later it needs to call XEN_DOMCTL_bind_pt_irq as > part of xc_domain_update_msi_irq. Allow for that as part of > PHYSDEVOP_map_pirq. I would replace the last sentence with: "Give the stubdomain enough permissions over the mapped interrupt in order to bind it successfully to it's target domain." > > This is not needed for PCI INTx, because IRQ in that case is known > beforehand and the stubdomain is given permissions over this IRQ by > libxl__device_pci_add (there's a do_pci_add against the stubdomain). > > create_irq() already grant IRQ access to hardware_domain, with > assumption the device model (something managing this IRQ) lives there. I would remove the "(something managing this IRQ)", I think it makes the sentence harder to understand while not adding any valuable information. > Modify create_irq() to take additional parameter pointing at device > model domain - which may be dom0 or stubdomain. Save ID of the domain > given permission, to revoke it in destroy_irq() - easier and cleaner > than replaying logic of create_irq() parameter. Use domid instead of > actual reference to the domain, because it might get destroyed before > destroying IRQ (stubdomain is destroyed before its target domain). And > it is not an issue, because IRQ permissions live within domain > structure, so destroying a domain also implicitly revoke the permission. > Potential domid reuse is detected by by checking if that domain does ^ double by > have permission over the IRQ being destroyed. > > Then, adjust all callers to provide the parameter. In case of calls not > related to stubdomain-initiated allocations, give it either > hardware_domain (so the behavior is unchanged there), or NULL for > interrupts used by Xen internally. > > Inspired by https://github.com/OpenXT/xenclient-oe/blob/5e0e7304a5a3c75ef01240a1e3673665b2aaf05e/recipes-extended/xen/files/stubdomain-msi-irq-access.patch by Eric Chanudet <chanudete@ainfosec.com>. > > Signed-off-by: Simon Gaiser <simon@invisiblethingslab.com> > Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> I have some minor comments below, but the patch LGTM, so: Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> It would still be nice to get the missing bits (interrupt enabling), or else this patch is kind of pointless, since it still doesn't allow stubdomains to work correctly with passed through devices. > --- > Changes in v3: > - extend commit message > Changes in v4: > - add missing destroy_irq on error path > Changes in v5: > - move irq_{grant,revoke}_access() to {create,destroy}_irq(), which > basically make it a different patch > - add get_dm_domain() helper > - do not give hardware_domain permission over IRQs used in Xen > internally > - rename create_irq argument to just 'd', to avoid confusion > when it's called by hardware domain > - verify that device is de-assigned before pci_remove_device call > - save ID of domain given permission in create_irq(), to revoke it in > destroy_irq() > - drop domain parameter from destroy_irq() and msi_free_irq() > - do not give hardware domain permission over IRQ created in > iommu_set_interrupt() > Changes in v6: > - do not give permission over hpet irq to hardware_domain > - move creator_domid to arch_irq_desc > - fix creator_domid initialization > - always give current->domain permission instead of using > get_dm_domain() helper. Analysis of all its use cases tells that it is > the only value it returns. > - drop unrelated change > Changes in v7: > - Code style improvements (spaces, use %pd etc) > - use bool parameter to create_irq, as it's only getting > current->domain or NULL > - remove redundant irq_access_permitted() > --- > xen/arch/x86/hpet.c | 3 +- > xen/arch/x86/irq.c | 42 +++++++++++++++++-------- > xen/drivers/char/ns16550.c | 2 +- > xen/drivers/passthrough/amd/iommu_init.c | 2 +- > xen/drivers/passthrough/vtd/iommu.c | 3 +- > xen/include/asm-x86/irq.h | 7 +++- > 6 files changed, 41 insertions(+), 18 deletions(-) > > diff --git a/xen/arch/x86/hpet.c b/xen/arch/x86/hpet.c > index 4b08488..57f68fa 100644 > --- a/xen/arch/x86/hpet.c > +++ b/xen/arch/x86/hpet.c > @@ -11,6 +11,7 @@ > #include <xen/softirq.h> > #include <xen/irq.h> > #include <xen/numa.h> > +#include <xen/sched.h> > #include <asm/fixmap.h> > #include <asm/div64.h> > #include <asm/hpet.h> > @@ -368,7 +369,7 @@ static int __init hpet_assign_irq(struct hpet_event_channel *ch) > { > int irq; > > - if ( (irq = create_irq(NUMA_NO_NODE)) < 0 ) > + if ( (irq = create_irq(NUMA_NO_NODE, false)) < 0 ) > return irq; > > ch->msi.irq = irq; > diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c > index 0ee3346..256dd02 100644 > --- a/xen/arch/x86/irq.c > +++ b/xen/arch/x86/irq.c > @@ -254,7 +254,8 @@ void __init clear_irq_vector(int irq) > /* > * Dynamic irq allocate and deallocation for MSI > */ > -int create_irq(nodeid_t node) > + Extra newline. > +int create_irq(nodeid_t node, bool grant_access) > { > int irq, ret; > struct irq_desc *desc; > @@ -282,18 +283,23 @@ int create_irq(nodeid_t node) > } > ret = assign_irq_vector(irq, mask); > } > + > + ASSERT(desc->arch.creator_domid == DOMID_INVALID); > + > if (ret < 0) > { > desc->arch.used = IRQ_UNUSED; > irq = ret; > } > - else if ( hardware_domain ) > + else if ( grant_access ) > { > - ret = irq_permit_access(hardware_domain, irq); > + ret = irq_permit_access(current->domain, irq); > if ( ret ) > printk(XENLOG_G_ERR > - "Could not grant Dom0 access to IRQ%d (error %d)\n", > - irq, ret); > + "Could not grant %pd access to IRQ%d (error %d)\n", > + current->domain, irq, ret); > + else > + desc->arch.creator_domid = current->domain->domain_id; > } > > return irq; > @@ -307,14 +313,23 @@ void destroy_irq(unsigned int irq) > > BUG_ON(!MSI_IRQ(irq)); > > - if ( hardware_domain ) > + if ( desc->arch.creator_domid != DOMID_INVALID ) > { > - int err = irq_deny_access(hardware_domain, irq); > + struct domain *d = get_domain_by_id(desc->arch.creator_domid); > > - if ( err ) > - printk(XENLOG_G_ERR > - "Could not revoke Dom0 access to IRQ%u (error %d)\n", > - irq, err); > + if ( d ) > + { > + int err = irq_deny_access(d, irq); > + if ( err ) > + printk(XENLOG_G_ERR > + "Could not revoke %pd access to IRQ%u (error %d)\n", > + d, irq, err); > + } > + > + if ( d ) > + put_domain(d); You can place the put_domain inside the previous if AFAICT, since it's the same condition. > diff --git a/xen/include/asm-x86/irq.h b/xen/include/asm-x86/irq.h > index bc0c0c1..79853d0 100644 > --- a/xen/include/asm-x86/irq.h > +++ b/xen/include/asm-x86/irq.h > @@ -45,6 +45,11 @@ struct arch_irq_desc { > unsigned move_cleanup_count; > u8 move_in_progress : 1; > s8 used; > + /* > + * Weak reference to domain having permission over this IRQ (which can > + * be different from the domain actually havint the IRQ assigned) ^ having > + */ > + domid_t creator_domid; > }; > > /* For use with irq_desc.arch.used */ > @@ -161,7 +166,7 @@ int init_irq_data(void); > void clear_irq_vector(int irq); > > int irq_to_vector(int irq); > -int create_irq(nodeid_t node); I would add: /* * If grant_access is set the current domain is given permissions over * the created IRQ. */ > +int create_irq(nodeid_t node, bool grant_access); > void destroy_irq(unsigned int irq); > int assign_irq_vector(int irq, const cpumask_t *); Thanks, Roger.
On Wed, Sep 25, 2019 at 11:41:50AM +0200, Roger Pau Monné wrote: > It would still be nice to get the missing bits (interrupt enabling), > or else this patch is kind of pointless, since it still doesn't allow > stubdomains to work correctly with passed through devices. Well, that part, as discussed, doesn't need to be in Xen. For example the solution deployed in current Qubes stable version is based on pciback for this purpose.
On 25.09.2019 11:41, Roger Pau Monné wrote: > On Wed, Sep 25, 2019 at 04:41:26AM +0200, Marek Marczykowski-Górecki wrote: >> --- a/xen/arch/x86/irq.c >> +++ b/xen/arch/x86/irq.c >> @@ -254,7 +254,8 @@ void __init clear_irq_vector(int irq) >> /* >> * Dynamic irq allocate and deallocation for MSI >> */ >> -int create_irq(nodeid_t node) >> + > > Extra newline. > >> +int create_irq(nodeid_t node, bool grant_access) >> { >> int irq, ret; >> struct irq_desc *desc; I did notice this too (on an earlier version), and it was my understanding that the addition was deliberate - the comment is for more than just this one function. I wouldn't insist on either variant, i.e. I'm fine with the blank line added and I'm also fine with the addition dropped again. Jan
On Wed, Sep 25, 2019 at 02:29:41PM +0200, Marek Marczykowski-Górecki wrote: > On Wed, Sep 25, 2019 at 11:41:50AM +0200, Roger Pau Monné wrote: > > It would still be nice to get the missing bits (interrupt enabling), > > or else this patch is kind of pointless, since it still doesn't allow > > stubdomains to work correctly with passed through devices. > > Well, that part, as discussed, doesn't need to be in Xen. For example > the solution deployed in current Qubes stable version is based on > pciback for this purpose. Ack. Do you think it would make sense to submit that part to Linux then? It would be nice to have this feature working upstream IMO, and will also allow Qubes to get rid of those patches. Thanks, Roger.
On Wed, Sep 25, 2019 at 03:26:17PM +0200, Roger Pau Monné wrote: > On Wed, Sep 25, 2019 at 02:29:41PM +0200, Marek Marczykowski-Górecki wrote: > > On Wed, Sep 25, 2019 at 11:41:50AM +0200, Roger Pau Monné wrote: > > > It would still be nice to get the missing bits (interrupt enabling), > > > or else this patch is kind of pointless, since it still doesn't allow > > > stubdomains to work correctly with passed through devices. > > > > Well, that part, as discussed, doesn't need to be in Xen. For example > > the solution deployed in current Qubes stable version is based on > > pciback for this purpose. > > Ack. Do you think it would make sense to submit that part to Linux > then? How would an interface with toolstack (when to allow enabling MSI) should look like? Right now I have it as extra attribute in sysfs of pciback and libxl writes to it. Or rather should it be in xenstore? Or maybe pciback should somehow detect itself if it's talking to stubdomain while the device is assigned to a HVM domain, or to a target PV domain itself? The actual patch is here: https://github.com/QubesOS/qubes-linux-kernel/blob/master/0014-xen-pciback-add-attribute-to-allow-MSI-enable-flag-w.patch and the toolstack part: https://github.com/QubesOS/qubes-vmm-xen/blob/xen-4.8/patch-stubdom-allow-msi-enable.patch
On Thu, Sep 26, 2019 at 06:16:06AM +0200, Marek Marczykowski-Górecki wrote: > On Wed, Sep 25, 2019 at 03:26:17PM +0200, Roger Pau Monné wrote: > > On Wed, Sep 25, 2019 at 02:29:41PM +0200, Marek Marczykowski-Górecki wrote: > > > On Wed, Sep 25, 2019 at 11:41:50AM +0200, Roger Pau Monné wrote: > > > > It would still be nice to get the missing bits (interrupt enabling), > > > > or else this patch is kind of pointless, since it still doesn't allow > > > > stubdomains to work correctly with passed through devices. > > > > > > Well, that part, as discussed, doesn't need to be in Xen. For example > > > the solution deployed in current Qubes stable version is based on > > > pciback for this purpose. > > > > Ack. Do you think it would make sense to submit that part to Linux > > then? > > How would an interface with toolstack (when to allow enabling MSI) > should look like? Right now I have it as extra attribute in sysfs of > pciback and libxl writes to it. Or rather should it be in xenstore? I think xenstore would be more suitable for this. There are already some flags passed to pciback there: msitranslate, power_mgmt and permissive (see libxl_pci.c:63). > Or maybe pciback should somehow detect itself if it's talking to > stubdomain while the device is assigned to a HVM domain, or to a target > PV domain itself? I think doing it in the toolstack and just passing an option to pciback is likely easier than adding more logic to pciback. Thanks, Roger.
On Thu, Sep 26, 2019 at 09:10:17AM +0200, Roger Pau Monné wrote: > On Thu, Sep 26, 2019 at 06:16:06AM +0200, Marek Marczykowski-Górecki wrote: > > On Wed, Sep 25, 2019 at 03:26:17PM +0200, Roger Pau Monné wrote: > > > On Wed, Sep 25, 2019 at 02:29:41PM +0200, Marek Marczykowski-Górecki wrote: > > > > On Wed, Sep 25, 2019 at 11:41:50AM +0200, Roger Pau Monné wrote: > > > > > It would still be nice to get the missing bits (interrupt enabling), > > > > > or else this patch is kind of pointless, since it still doesn't allow > > > > > stubdomains to work correctly with passed through devices. > > > > > > > > Well, that part, as discussed, doesn't need to be in Xen. For example > > > > the solution deployed in current Qubes stable version is based on > > > > pciback for this purpose. > > > > > > Ack. Do you think it would make sense to submit that part to Linux > > > then? > > > > How would an interface with toolstack (when to allow enabling MSI) > > should look like? Right now I have it as extra attribute in sysfs of > > pciback and libxl writes to it. Or rather should it be in xenstore? > > I think xenstore would be more suitable for this. There are already > some flags passed to pciback there: msitranslate, power_mgmt and > permissive (see libxl_pci.c:63). Hmm, I see permissive is also set in sysfs (libxl_pci.c:pci_add_dm_done). And I think that is used by pciback, based on inspection of its source code. In fact, I don't see anything in pciback parsing opts-%d xenstore entry at all. It looks like it's only used by the toolstack to reconstruct libxl_device_pci struct from xenstore. > > Or maybe pciback should somehow detect itself if it's talking to > > stubdomain while the device is assigned to a HVM domain, or to a target > > PV domain itself? > > I think doing it in the toolstack and just passing an option to > pciback is likely easier than adding more logic to pciback. Agree.
On Sun, Sep 29, 2019 at 03:35:57AM +0200, Marek Marczykowski-Górecki wrote: > On Thu, Sep 26, 2019 at 09:10:17AM +0200, Roger Pau Monné wrote: > > On Thu, Sep 26, 2019 at 06:16:06AM +0200, Marek Marczykowski-Górecki wrote: > > > On Wed, Sep 25, 2019 at 03:26:17PM +0200, Roger Pau Monné wrote: > > > > On Wed, Sep 25, 2019 at 02:29:41PM +0200, Marek Marczykowski-Górecki wrote: > > > > > On Wed, Sep 25, 2019 at 11:41:50AM +0200, Roger Pau Monné wrote: > > > > > > It would still be nice to get the missing bits (interrupt enabling), > > > > > > or else this patch is kind of pointless, since it still doesn't allow > > > > > > stubdomains to work correctly with passed through devices. BTW it is useful with permissive mode enabled.
On Sun, Sep 29, 2019 at 03:35:54AM +0200, Marek Marczykowski-Górecki wrote: > On Thu, Sep 26, 2019 at 09:10:17AM +0200, Roger Pau Monné wrote: > > On Thu, Sep 26, 2019 at 06:16:06AM +0200, Marek Marczykowski-Górecki wrote: > > > On Wed, Sep 25, 2019 at 03:26:17PM +0200, Roger Pau Monné wrote: > > > > On Wed, Sep 25, 2019 at 02:29:41PM +0200, Marek Marczykowski-Górecki wrote: > > > > > On Wed, Sep 25, 2019 at 11:41:50AM +0200, Roger Pau Monné wrote: > > > > > > It would still be nice to get the missing bits (interrupt enabling), > > > > > > or else this patch is kind of pointless, since it still doesn't allow > > > > > > stubdomains to work correctly with passed through devices. > > > > > > > > > > Well, that part, as discussed, doesn't need to be in Xen. For example > > > > > the solution deployed in current Qubes stable version is based on > > > > > pciback for this purpose. > > > > > > > > Ack. Do you think it would make sense to submit that part to Linux > > > > then? > > > > > > How would an interface with toolstack (when to allow enabling MSI) > > > should look like? Right now I have it as extra attribute in sysfs of > > > pciback and libxl writes to it. Or rather should it be in xenstore? > > > > I think xenstore would be more suitable for this. There are already > > some flags passed to pciback there: msitranslate, power_mgmt and > > permissive (see libxl_pci.c:63). > > Hmm, I see permissive is also set in sysfs > (libxl_pci.c:pci_add_dm_done). And I think that is used by pciback, > based on inspection of its source code. > In fact, I don't see anything in pciback parsing opts-%d xenstore entry > at all. It looks like it's only used by the toolstack to reconstruct > libxl_device_pci struct from xenstore. Then please use sysfs, using whatever mechanism is used by other options seems fine to me. Thanks, Roger.
diff --git a/xen/arch/x86/hpet.c b/xen/arch/x86/hpet.c index 4b08488..57f68fa 100644 --- a/xen/arch/x86/hpet.c +++ b/xen/arch/x86/hpet.c @@ -11,6 +11,7 @@ #include <xen/softirq.h> #include <xen/irq.h> #include <xen/numa.h> +#include <xen/sched.h> #include <asm/fixmap.h> #include <asm/div64.h> #include <asm/hpet.h> @@ -368,7 +369,7 @@ static int __init hpet_assign_irq(struct hpet_event_channel *ch) { int irq; - if ( (irq = create_irq(NUMA_NO_NODE)) < 0 ) + if ( (irq = create_irq(NUMA_NO_NODE, false)) < 0 ) return irq; ch->msi.irq = irq; diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c index 0ee3346..256dd02 100644 --- a/xen/arch/x86/irq.c +++ b/xen/arch/x86/irq.c @@ -254,7 +254,8 @@ void __init clear_irq_vector(int irq) /* * Dynamic irq allocate and deallocation for MSI */ -int create_irq(nodeid_t node) + +int create_irq(nodeid_t node, bool grant_access) { int irq, ret; struct irq_desc *desc; @@ -282,18 +283,23 @@ int create_irq(nodeid_t node) } ret = assign_irq_vector(irq, mask); } + + ASSERT(desc->arch.creator_domid == DOMID_INVALID); + if (ret < 0) { desc->arch.used = IRQ_UNUSED; irq = ret; } - else if ( hardware_domain ) + else if ( grant_access ) { - ret = irq_permit_access(hardware_domain, irq); + ret = irq_permit_access(current->domain, irq); if ( ret ) printk(XENLOG_G_ERR - "Could not grant Dom0 access to IRQ%d (error %d)\n", - irq, ret); + "Could not grant %pd access to IRQ%d (error %d)\n", + current->domain, irq, ret); + else + desc->arch.creator_domid = current->domain->domain_id; } return irq; @@ -307,14 +313,23 @@ void destroy_irq(unsigned int irq) BUG_ON(!MSI_IRQ(irq)); - if ( hardware_domain ) + if ( desc->arch.creator_domid != DOMID_INVALID ) { - int err = irq_deny_access(hardware_domain, irq); + struct domain *d = get_domain_by_id(desc->arch.creator_domid); - if ( err ) - printk(XENLOG_G_ERR - "Could not revoke Dom0 access to IRQ%u (error %d)\n", - irq, err); + if ( d ) + { + int err = irq_deny_access(d, irq); + if ( err ) + printk(XENLOG_G_ERR + "Could not revoke %pd access to IRQ%u (error %d)\n", + d, irq, err); + } + + if ( d ) + put_domain(d); + + desc->arch.creator_domid = DOMID_INVALID; } spin_lock_irqsave(&desc->lock, flags); @@ -381,6 +396,7 @@ int arch_init_one_irq_desc(struct irq_desc *desc) desc->arch.vector = IRQ_VECTOR_UNASSIGNED; desc->arch.old_vector = IRQ_VECTOR_UNASSIGNED; + desc->arch.creator_domid = DOMID_INVALID; return 0; } @@ -2133,7 +2149,7 @@ int map_domain_pirq( spin_unlock_irqrestore(&desc->lock, flags); info = NULL; - irq = create_irq(NUMA_NO_NODE); + irq = create_irq(NUMA_NO_NODE, true); ret = irq >= 0 ? prepare_domain_irq_pirq(d, irq, pirq + nr, &info) : irq; if ( ret < 0 ) @@ -2818,7 +2834,7 @@ int allocate_and_map_msi_pirq(struct domain *d, int index, int *pirq_p, if ( irq == -1 ) { case MAP_PIRQ_TYPE_MULTI_MSI: - irq = create_irq(NUMA_NO_NODE); + irq = create_irq(NUMA_NO_NODE, true); } if ( irq < nr_irqs_gsi || irq >= nr_irqs ) diff --git a/xen/drivers/char/ns16550.c b/xen/drivers/char/ns16550.c index 8667de6..fcd3979 100644 --- a/xen/drivers/char/ns16550.c +++ b/xen/drivers/char/ns16550.c @@ -722,7 +722,7 @@ static void __init ns16550_init_irq(struct serial_port *port) struct ns16550 *uart = port->uart; if ( uart->msi ) - uart->irq = create_irq(0); + uart->irq = create_irq(0, false); #endif } diff --git a/xen/drivers/passthrough/amd/iommu_init.c b/xen/drivers/passthrough/amd/iommu_init.c index bb9f33e..233a8ae 100644 --- a/xen/drivers/passthrough/amd/iommu_init.c +++ b/xen/drivers/passthrough/amd/iommu_init.c @@ -765,7 +765,7 @@ static bool_t __init set_iommu_interrupt_handler(struct amd_iommu *iommu) { int irq, ret; - irq = create_irq(NUMA_NO_NODE); + irq = create_irq(NUMA_NO_NODE, false); if ( irq <= 0 ) { dprintk(XENLOG_ERR, "IOMMU: no irqs\n"); diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c index 5d72270..24a1e92 100644 --- a/xen/drivers/passthrough/vtd/iommu.c +++ b/xen/drivers/passthrough/vtd/iommu.c @@ -1138,7 +1138,8 @@ static int __init iommu_set_interrupt(struct acpi_drhd_unit *drhd) struct irq_desc *desc; irq = create_irq(rhsa ? pxm_to_node(rhsa->proximity_domain) - : NUMA_NO_NODE); + : NUMA_NO_NODE, + false); if ( irq <= 0 ) { dprintk(XENLOG_ERR VTDPREFIX, "IOMMU: no irq available!\n"); diff --git a/xen/include/asm-x86/irq.h b/xen/include/asm-x86/irq.h index bc0c0c1..79853d0 100644 --- a/xen/include/asm-x86/irq.h +++ b/xen/include/asm-x86/irq.h @@ -45,6 +45,11 @@ struct arch_irq_desc { unsigned move_cleanup_count; u8 move_in_progress : 1; s8 used; + /* + * Weak reference to domain having permission over this IRQ (which can + * be different from the domain actually havint the IRQ assigned) + */ + domid_t creator_domid; }; /* For use with irq_desc.arch.used */ @@ -161,7 +166,7 @@ int init_irq_data(void); void clear_irq_vector(int irq); int irq_to_vector(int irq); -int create_irq(nodeid_t node); +int create_irq(nodeid_t node, bool grant_access); void destroy_irq(unsigned int irq); int assign_irq_vector(int irq, const cpumask_t *);