Message ID | 20170419151128.87416-3-roger.pau@citrix.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
>>> On 19.04.17 at 17:11, <roger.pau@citrix.com> wrote: > Note that currently there's no support for unbinding this interrupts. Do you plan to deal with that before this changes goes in? Aiui this not working means you can't pass through devices with pin based interrupts once Dom0 chose to bind to them. Otoh hand you modify pt_irq_destroy_bind(), so I'm a little puzzled ... > --- a/xen/arch/x86/hvm/irq.c > +++ b/xen/arch/x86/hvm/irq.c > @@ -126,6 +126,34 @@ void hvm_pci_intx_deassert( > spin_unlock(&d->arch.hvm_domain.irq_lock); > } > > +void hvm_gsi_assert(struct domain *d, unsigned int gsi) > +{ > + struct hvm_irq *hvm_irq = hvm_domain_irq(d); > + > + ASSERT(gsi < hvm_irq->nr_gsis); This would probably better match the alternative construct in __hvm_pci_intx_assert(). > + ASSERT(!has_vpic(d)); This doesn't look to be relevant for the rest of the function. Is there a particular reason you've added it? If so, a brief comment might help. > + spin_lock(&d->arch.hvm_domain.irq_lock); > + if ( !hvm_irq->gsi_assert_count[gsi] ) > + { > + hvm_irq->gsi_assert_count[gsi]++; Why is this an increment instead of a simple write of 1? Or the other way around - why is this not always incrementing, just like __hvm_pci_intx_assert() does? (Symmetric questions then for hvm_gsi_deassert()). > @@ -274,10 +289,16 @@ int pt_irq_create_bind( > spin_lock(&d->event_lock); > > hvm_irq_dpci = domain_get_irq_dpci(d); > - if ( hvm_irq_dpci == NULL ) > + if ( hvm_irq_dpci == NULL && !is_hardware_domain(d) ) > { > unsigned int i; > > + /* > + * NB: the hardware domain doesn't use a hvm_irq_dpci struct because > + * it's only allowed to identity map GSIs, and so the data contained in > + * that struct (used to map guest GSIs into machine GSIs and perform > + * interrupt routing) it's completely useless for it. "...) is completely useless to it." > @@ -422,35 +443,51 @@ int pt_irq_create_bind( > case PT_IRQ_TYPE_PCI: > case PT_IRQ_TYPE_MSI_TRANSLATE: > { > - unsigned int bus = pt_irq_bind->u.pci.bus; > - unsigned int device = pt_irq_bind->u.pci.device; > - unsigned int intx = pt_irq_bind->u.pci.intx; > - unsigned int guest_gsi = hvm_pci_intx_gsi(device, intx); > - unsigned int link = hvm_pci_intx_link(device, intx); > - struct dev_intx_gsi_link *digl = xmalloc(struct dev_intx_gsi_link); > - struct hvm_girq_dpci_mapping *girq = > - xmalloc(struct hvm_girq_dpci_mapping); > + struct dev_intx_gsi_link *digl = NULL; > + struct hvm_girq_dpci_mapping *girq = NULL; > + unsigned int guest_gsi; > > - if ( !digl || !girq ) > + /* > + * Mapping GSIs for the hardware domain is different than doing it for > + * an unpriviledged guest, the hardware domain is only allowed to > + * identity map GSIs, and as such all the data in the u.pci union is > + * discarded. > + */ > + if ( !is_hardware_domain(d) ) > { > - spin_unlock(&d->event_lock); > - xfree(girq); > - xfree(digl); > - return -ENOMEM; > - } > + unsigned int link; > + > + digl = xmalloc(struct dev_intx_gsi_link); > + girq = xmalloc(struct hvm_girq_dpci_mapping); > + > + if ( !digl || !girq ) > + { > + spin_unlock(&d->event_lock); > + xfree(girq); > + xfree(digl); > + return -ENOMEM; > + } > + > + girq->bus = digl->bus = pt_irq_bind->u.pci.bus; > + girq->device = digl->device = pt_irq_bind->u.pci.device; > + girq->intx = digl->intx = pt_irq_bind->u.pci.intx; > + list_add_tail(&digl->list, &pirq_dpci->digl_list); > > - hvm_irq_dpci->link_cnt[link]++; > + guest_gsi = hvm_pci_intx_gsi(digl->device, digl->intx); > + link = hvm_pci_intx_link(digl->device, digl->intx); > > - digl->bus = bus; > - digl->device = device; > - digl->intx = intx; > - list_add_tail(&digl->list, &pirq_dpci->digl_list); > + hvm_irq_dpci->link_cnt[link]++; > > - girq->bus = bus; > - girq->device = device; > - girq->intx = intx; > - girq->machine_gsi = pirq; > - list_add_tail(&girq->list, &hvm_irq_dpci->girq[guest_gsi]); > + girq->machine_gsi = pirq; > + list_add_tail(&girq->list, &hvm_irq_dpci->girq[guest_gsi]); > + } > + else > + { > + /* MSI_TRANSLATE is not supported by the hardware domain. */ > + ASSERT(pt_irq_bind->irq_type == PT_IRQ_TYPE_PCI); > + guest_gsi = pirq; > + ASSERT(guest_gsi < hvm_domain_irq(d)->nr_gsis); > + } This is dangerous: For one it is impossible to judge the correctness of at least the first of these assertions for the hwdom case without looking at patch 3. And then the domctl path leading here does not exclude the subject domain equaling the calling one, i.e. you potentially assert guest input correctness here. Yes, we have XSA-77 in place, but no, we shouldn't introduce new issues anywhere unless that's entirely unavoidable. > @@ -504,10 +567,18 @@ int pt_irq_create_bind( > spin_unlock(&d->event_lock); > > if ( iommu_verbose ) > - printk(XENLOG_G_INFO > - "d%d: bind: m_gsi=%u g_gsi=%u dev=%02x.%02x.%u intx=%u\n", > - d->domain_id, pirq, guest_gsi, bus, > - PCI_SLOT(device), PCI_FUNC(device), intx); > + { > + char buf[50]; > + > + if ( !is_hardware_domain(d) ) > + snprintf(buf, ARRAY_SIZE(buf), " dev=%02x.%02x.%u intx=%u", > + digl->bus, PCI_SLOT(digl->device), > + PCI_FUNC(digl->device), digl->intx); The buffer array seems heavily over-sized - my counting gives at best slightly over 20 characters you actually need. > @@ -522,7 +593,6 @@ int pt_irq_create_bind( > int pt_irq_destroy_bind( > struct domain *d, xen_domctl_bind_pt_irq_t *pt_irq_bind) > { > - struct hvm_irq_dpci *hvm_irq_dpci; > struct hvm_pirq_dpci *pirq_dpci; > unsigned int machine_gsi = pt_irq_bind->machine_irq; > struct pirq *pirq; > @@ -552,17 +622,15 @@ int pt_irq_destroy_bind( > > spin_lock(&d->event_lock); > > - hvm_irq_dpci = domain_get_irq_dpci(d); > - > - if ( hvm_irq_dpci == NULL ) > + pirq = pirq_info(d, machine_gsi); > + pirq_dpci = pirq_dpci(pirq); > + if ( pirq_dpci->flags & HVM_IRQ_DPCI_IDENTITY_GSI ) I'm sure we've discussed aspects of this before: pirq_dpci may be NULL here, i.e. you can't blindly dereference it. All other uses in the function indeed have a NULL check first. > @@ -570,9 +638,16 @@ int pt_irq_destroy_bind( > unsigned int intx = pt_irq_bind->u.pci.intx; > unsigned int guest_gsi = hvm_pci_intx_gsi(device, intx); > unsigned int link = hvm_pci_intx_link(device, intx); > + struct hvm_irq_dpci *hvm_irq_dpci = domain_get_irq_dpci(d); > struct hvm_girq_dpci_mapping *girq; > struct dev_intx_gsi_link *digl, *tmp; > > + if ( hvm_irq_dpci == NULL ) > + { > + spin_unlock(&d->event_lock); > + return -EINVAL; > + } Moving this check here is a behavioral modification. Perhaps a good one, yet it doesn't belong into this patch. Instead it should be properly reasoned about in a separate patch, if it is a correct thing to do. > @@ -814,17 +896,12 @@ static void hvm_dirq_assist(struct domain *d, struct hvm_pirq_dpci *pirq_dpci) > spin_unlock(&d->event_lock); > } > > -static void __hvm_dpci_eoi(struct domain *d, > - const struct hvm_girq_dpci_mapping *girq, > +static void __hvm_pirq_eoi(struct pirq *pirq, Please drop the double leading underscores. > const union vioapic_redir_entry *ent) > { > - struct pirq *pirq = pirq_info(d, girq->machine_gsi); > - struct hvm_pirq_dpci *pirq_dpci; > - > - if ( !hvm_domain_use_pirq(d, pirq) ) > - hvm_pci_intx_deassert(d, girq->device, girq->intx); > + struct hvm_pirq_dpci *pirq_dpci = pirq_dpci(pirq); > > - pirq_dpci = pirq_dpci(pirq); > + ASSERT(pirq_dpci); Why is this useful / needed all of the sudden? > @@ -839,6 +916,32 @@ static void __hvm_dpci_eoi(struct domain *d, > pirq_guest_eoi(pirq); > } > > +static void __hvm_dpci_eoi(struct domain *d, > + const struct hvm_girq_dpci_mapping *girq, > + const union vioapic_redir_entry *ent) > +{ > + struct pirq *pirq = pirq_info(d, girq->machine_gsi); > + > + if ( !hvm_domain_use_pirq(d, pirq) ) > + hvm_pci_intx_deassert(d, girq->device, girq->intx); > + > + __hvm_pirq_eoi(pirq, ent); > +} > + > +static void __hvm_gsi_eoi(struct domain *d, unsigned int gsi, Same here for the double underscores. For the pre-existing function I'd leave it up to you whether to also drop them. What I care about is that we don't gain new non-conforming names. > + const union vioapic_redir_entry *ent) > +{ > + struct pirq *pirq = pirq_info(d, gsi); > + struct hvm_pirq_dpci *pirq_dpci = pirq_dpci(pirq); > + > + /* Check if GSI is actually mapped. */ > + if ( !pirq_dpci ) Please avoid the local variable when used just once here. Jan
On Fri, May 12, 2017 at 07:42:51AM -0600, Jan Beulich wrote: > >>> On 19.04.17 at 17:11, <roger.pau@citrix.com> wrote: > > Note that currently there's no support for unbinding this interrupts. > > Do you plan to deal with that before this changes goes in? Aiui this > not working means you can't pass through devices with pin based > interrupts once Dom0 chose to bind to them. Otoh hand you modify > pt_irq_destroy_bind(), so I'm a little puzzled ... Yes, I modify pt_irq_destroy_bind to return EOPNOTSUPP when trying to unbind such an interrupt. I can implement the unbind, but it's not going to be used ATM. > > --- a/xen/arch/x86/hvm/irq.c > > +++ b/xen/arch/x86/hvm/irq.c > > @@ -126,6 +126,34 @@ void hvm_pci_intx_deassert( > > spin_unlock(&d->arch.hvm_domain.irq_lock); > > } > > > > +void hvm_gsi_assert(struct domain *d, unsigned int gsi) > > +{ > > + struct hvm_irq *hvm_irq = hvm_domain_irq(d); > > + > > + ASSERT(gsi < hvm_irq->nr_gsis); > > This would probably better match the alternative construct in > __hvm_pci_intx_assert(). OK, changed. > > + ASSERT(!has_vpic(d)); > > This doesn't look to be relevant for the rest of the function. Is there > a particular reason you've added it? If so, a brief comment might > help. I've added this because hvm_gsi_assert doesn't call assert_irq, so a PIC would not work properly, but it's probably pointless. > > + spin_lock(&d->arch.hvm_domain.irq_lock); > > + if ( !hvm_irq->gsi_assert_count[gsi] ) > > + { > > + hvm_irq->gsi_assert_count[gsi]++; > > Why is this an increment instead of a simple write of 1? Or the > other way around - why is this not always incrementing, just like > __hvm_pci_intx_assert() does? (Symmetric questions then for > hvm_gsi_deassert()). __hvm_pci_intx_{de}assert has an array that tracks the status of each interrupt line, and Xen does the routing based on that (the __test_and_clear_bit at the top of __hvm_pci_intx_assert). That prevents the same line from triggering multiple times, which is not available here, and thus Xen needs to rely on gsi_assert_count in order to know if the GSI is pending or not. Switched to use a set to 1/0 instead of the increment, which I agree makes this clearer. > > > @@ -274,10 +289,16 @@ int pt_irq_create_bind( > > spin_lock(&d->event_lock); > > > > hvm_irq_dpci = domain_get_irq_dpci(d); > > - if ( hvm_irq_dpci == NULL ) > > + if ( hvm_irq_dpci == NULL && !is_hardware_domain(d) ) > > { > > unsigned int i; > > > > + /* > > + * NB: the hardware domain doesn't use a hvm_irq_dpci struct because > > + * it's only allowed to identity map GSIs, and so the data contained in > > + * that struct (used to map guest GSIs into machine GSIs and perform > > + * interrupt routing) it's completely useless for it. > > "...) is completely useless to it." Fixed, thanks. > > @@ -422,35 +443,51 @@ int pt_irq_create_bind( > > case PT_IRQ_TYPE_PCI: > > case PT_IRQ_TYPE_MSI_TRANSLATE: > > { > > - unsigned int bus = pt_irq_bind->u.pci.bus; > > - unsigned int device = pt_irq_bind->u.pci.device; > > - unsigned int intx = pt_irq_bind->u.pci.intx; > > - unsigned int guest_gsi = hvm_pci_intx_gsi(device, intx); > > - unsigned int link = hvm_pci_intx_link(device, intx); > > - struct dev_intx_gsi_link *digl = xmalloc(struct dev_intx_gsi_link); > > - struct hvm_girq_dpci_mapping *girq = > > - xmalloc(struct hvm_girq_dpci_mapping); > > + struct dev_intx_gsi_link *digl = NULL; > > + struct hvm_girq_dpci_mapping *girq = NULL; > > + unsigned int guest_gsi; > > > > - if ( !digl || !girq ) > > + /* > > + * Mapping GSIs for the hardware domain is different than doing it for > > + * an unpriviledged guest, the hardware domain is only allowed to > > + * identity map GSIs, and as such all the data in the u.pci union is > > + * discarded. > > + */ > > + if ( !is_hardware_domain(d) ) > > { > > - spin_unlock(&d->event_lock); > > - xfree(girq); > > - xfree(digl); > > - return -ENOMEM; > > - } > > + unsigned int link; > > + > > + digl = xmalloc(struct dev_intx_gsi_link); > > + girq = xmalloc(struct hvm_girq_dpci_mapping); > > + > > + if ( !digl || !girq ) > > + { > > + spin_unlock(&d->event_lock); > > + xfree(girq); > > + xfree(digl); > > + return -ENOMEM; > > + } > > + > > + girq->bus = digl->bus = pt_irq_bind->u.pci.bus; > > + girq->device = digl->device = pt_irq_bind->u.pci.device; > > + girq->intx = digl->intx = pt_irq_bind->u.pci.intx; > > + list_add_tail(&digl->list, &pirq_dpci->digl_list); > > > > - hvm_irq_dpci->link_cnt[link]++; > > + guest_gsi = hvm_pci_intx_gsi(digl->device, digl->intx); > > + link = hvm_pci_intx_link(digl->device, digl->intx); > > > > - digl->bus = bus; > > - digl->device = device; > > - digl->intx = intx; > > - list_add_tail(&digl->list, &pirq_dpci->digl_list); > > + hvm_irq_dpci->link_cnt[link]++; > > > > - girq->bus = bus; > > - girq->device = device; > > - girq->intx = intx; > > - girq->machine_gsi = pirq; > > - list_add_tail(&girq->list, &hvm_irq_dpci->girq[guest_gsi]); > > + girq->machine_gsi = pirq; > > + list_add_tail(&girq->list, &hvm_irq_dpci->girq[guest_gsi]); > > + } > > + else > > + { > > + /* MSI_TRANSLATE is not supported by the hardware domain. */ > > + ASSERT(pt_irq_bind->irq_type == PT_IRQ_TYPE_PCI); > > + guest_gsi = pirq; > > + ASSERT(guest_gsi < hvm_domain_irq(d)->nr_gsis); > > + } > > This is dangerous: For one it is impossible to judge the correctness > of at least the first of these assertions for the hwdom case without > looking at patch 3. And then the domctl path leading here does not > exclude the subject domain equaling the calling one, i.e. you > potentially assert guest input correctness here. Yes, we have XSA-77 > in place, but no, we shouldn't introduce new issues anywhere unless > that's entirely unavoidable. OK, let me return error instead to be on the safe side then. > > @@ -504,10 +567,18 @@ int pt_irq_create_bind( > > spin_unlock(&d->event_lock); > > > > if ( iommu_verbose ) > > - printk(XENLOG_G_INFO > > - "d%d: bind: m_gsi=%u g_gsi=%u dev=%02x.%02x.%u intx=%u\n", > > - d->domain_id, pirq, guest_gsi, bus, > > - PCI_SLOT(device), PCI_FUNC(device), intx); > > + { > > + char buf[50]; > > + > > + if ( !is_hardware_domain(d) ) > > + snprintf(buf, ARRAY_SIZE(buf), " dev=%02x.%02x.%u intx=%u", > > + digl->bus, PCI_SLOT(digl->device), > > + PCI_FUNC(digl->device), digl->intx); > > The buffer array seems heavily over-sized - my counting gives at best > slightly over 20 characters you actually need. AFAICT max length should be 21, would you be fine with me setting it to 24 to be safe? > > > @@ -522,7 +593,6 @@ int pt_irq_create_bind( > > int pt_irq_destroy_bind( > > struct domain *d, xen_domctl_bind_pt_irq_t *pt_irq_bind) > > { > > - struct hvm_irq_dpci *hvm_irq_dpci; > > struct hvm_pirq_dpci *pirq_dpci; > > unsigned int machine_gsi = pt_irq_bind->machine_irq; > > struct pirq *pirq; > > @@ -552,17 +622,15 @@ int pt_irq_destroy_bind( > > > > spin_lock(&d->event_lock); > > > > - hvm_irq_dpci = domain_get_irq_dpci(d); > > - > > - if ( hvm_irq_dpci == NULL ) > > + pirq = pirq_info(d, machine_gsi); > > + pirq_dpci = pirq_dpci(pirq); > > + if ( pirq_dpci->flags & HVM_IRQ_DPCI_IDENTITY_GSI ) > > I'm sure we've discussed aspects of this before: pirq_dpci may be > NULL here, i.e. you can't blindly dereference it. All other uses in > the function indeed have a NULL check first. OK, I've removed this and fixed the function so it can unbind HVM_IRQ_DPCI_IDENTITY_GSI. > > @@ -570,9 +638,16 @@ int pt_irq_destroy_bind( > > unsigned int intx = pt_irq_bind->u.pci.intx; > > unsigned int guest_gsi = hvm_pci_intx_gsi(device, intx); > > unsigned int link = hvm_pci_intx_link(device, intx); > > + struct hvm_irq_dpci *hvm_irq_dpci = domain_get_irq_dpci(d); > > struct hvm_girq_dpci_mapping *girq; > > struct dev_intx_gsi_link *digl, *tmp; > > > > + if ( hvm_irq_dpci == NULL ) > > + { > > + spin_unlock(&d->event_lock); > > + return -EINVAL; > > + } > > Moving this check here is a behavioral modification. Perhaps a > good one, yet it doesn't belong into this patch. Instead it should > be properly reasoned about in a separate patch, if it is a correct > thing to do. I've left it in it's previous place and added a !is_hardware_domain check. > > @@ -814,17 +896,12 @@ static void hvm_dirq_assist(struct domain *d, struct hvm_pirq_dpci *pirq_dpci) > > spin_unlock(&d->event_lock); > > } > > > > -static void __hvm_dpci_eoi(struct domain *d, > > - const struct hvm_girq_dpci_mapping *girq, > > +static void __hvm_pirq_eoi(struct pirq *pirq, > > Please drop the double leading underscores. Done. > > > const union vioapic_redir_entry *ent) > > { > > - struct pirq *pirq = pirq_info(d, girq->machine_gsi); > > - struct hvm_pirq_dpci *pirq_dpci; > > - > > - if ( !hvm_domain_use_pirq(d, pirq) ) > > - hvm_pci_intx_deassert(d, girq->device, girq->intx); > > + struct hvm_pirq_dpci *pirq_dpci = pirq_dpci(pirq); > > > > - pirq_dpci = pirq_dpci(pirq); > > + ASSERT(pirq_dpci); > > Why is this useful / needed all of the sudden? Hm, I don't think it's needed, probably just a leftover from when I was testing the patches. > > @@ -839,6 +916,32 @@ static void __hvm_dpci_eoi(struct domain *d, > > pirq_guest_eoi(pirq); > > } > > > > +static void __hvm_dpci_eoi(struct domain *d, > > + const struct hvm_girq_dpci_mapping *girq, > > + const union vioapic_redir_entry *ent) > > +{ > > + struct pirq *pirq = pirq_info(d, girq->machine_gsi); > > + > > + if ( !hvm_domain_use_pirq(d, pirq) ) > > + hvm_pci_intx_deassert(d, girq->device, girq->intx); > > + > > + __hvm_pirq_eoi(pirq, ent); > > +} > > + > > +static void __hvm_gsi_eoi(struct domain *d, unsigned int gsi, > > Same here for the double underscores. For the pre-existing > function I'd leave it up to you whether to also drop them. What > I care about is that we don't gain new non-conforming names. I will leave the others as they are, or else they should be changed in a separate patch. > > + const union vioapic_redir_entry *ent) > > +{ > > + struct pirq *pirq = pirq_info(d, gsi); > > + struct hvm_pirq_dpci *pirq_dpci = pirq_dpci(pirq); > > + > > + /* Check if GSI is actually mapped. */ > > + if ( !pirq_dpci ) > > Please avoid the local variable when used just once here. Done. Thanks for the review, Roger.
>>> On 16.05.17 at 17:55, <roger.pau@citrix.com> wrote: > On Fri, May 12, 2017 at 07:42:51AM -0600, Jan Beulich wrote: >> >>> On 19.04.17 at 17:11, <roger.pau@citrix.com> wrote: >> > Note that currently there's no support for unbinding this interrupts. >> >> Do you plan to deal with that before this changes goes in? Aiui this >> not working means you can't pass through devices with pin based >> interrupts once Dom0 chose to bind to them. Otoh hand you modify >> pt_irq_destroy_bind(), so I'm a little puzzled ... > > Yes, I modify pt_irq_destroy_bind to return EOPNOTSUPP when trying to unbind > such an interrupt. I can implement the unbind, but it's not going to be used > ATM. Is it not? I can see the mentioned pass-through case to be of no interest, but wouldn't a well behaved kernel perhaps unmap IRQs while shutting down? >> > + spin_lock(&d->arch.hvm_domain.irq_lock); >> > + if ( !hvm_irq->gsi_assert_count[gsi] ) >> > + { >> > + hvm_irq->gsi_assert_count[gsi]++; >> >> Why is this an increment instead of a simple write of 1? Or the >> other way around - why is this not always incrementing, just like >> __hvm_pci_intx_assert() does? (Symmetric questions then for >> hvm_gsi_deassert()). > > __hvm_pci_intx_{de}assert has an array that tracks the status of each interrupt > line, and Xen does the routing based on that (the __test_and_clear_bit at the > top of __hvm_pci_intx_assert). That prevents the same line from triggering > multiple times, which is not available here, and thus Xen needs to rely on > gsi_assert_count in order to know if the GSI is pending or not. > > Switched to use a set to 1/0 instead of the increment, which I agree makes this > clearer. And altogether this likely would benefit from a comment put somewhere. >> > @@ -504,10 +567,18 @@ int pt_irq_create_bind( >> > spin_unlock(&d->event_lock); >> > >> > if ( iommu_verbose ) >> > - printk(XENLOG_G_INFO >> > - "d%d: bind: m_gsi=%u g_gsi=%u dev=%02x.%02x.%u intx=%u\n", >> > - d->domain_id, pirq, guest_gsi, bus, >> > - PCI_SLOT(device), PCI_FUNC(device), intx); >> > + { >> > + char buf[50]; >> > + >> > + if ( !is_hardware_domain(d) ) >> > + snprintf(buf, ARRAY_SIZE(buf), " dev=%02x.%02x.%u intx=%u", >> > + digl->bus, PCI_SLOT(digl->device), >> > + PCI_FUNC(digl->device), digl->intx); >> >> The buffer array seems heavily over-sized - my counting gives at best >> slightly over 20 characters you actually need. > > AFAICT max length should be 21, would you be fine with me setting it to 24 to > be safe? Sure. Jan
On Tue, May 16, 2017 at 10:23:48AM -0600, Jan Beulich wrote: > >>> On 16.05.17 at 17:55, <roger.pau@citrix.com> wrote: > > On Fri, May 12, 2017 at 07:42:51AM -0600, Jan Beulich wrote: > >> >>> On 19.04.17 at 17:11, <roger.pau@citrix.com> wrote: > >> > Note that currently there's no support for unbinding this interrupts. > >> > >> Do you plan to deal with that before this changes goes in? Aiui this > >> not working means you can't pass through devices with pin based > >> interrupts once Dom0 chose to bind to them. Otoh hand you modify > >> pt_irq_destroy_bind(), so I'm a little puzzled ... > > > > Yes, I modify pt_irq_destroy_bind to return EOPNOTSUPP when trying to unbind > > such an interrupt. I can implement the unbind, but it's not going to be used > > ATM. > > Is it not? I can see the mentioned pass-through case to be of no > interest, but wouldn't a well behaved kernel perhaps unmap IRQs > while shutting down? I guess I haven't explained myself correctly, what I meant is that right now I don't have any use-case for this, I haven't started working on pci-passtrhough for guests, and the Dom0 implementation I have doesn't unbind interrupts on shutdown. I could unbind them when Dom0 masks the vIO APIC pin, but I think that's going to be awfully slow. > >> > + spin_lock(&d->arch.hvm_domain.irq_lock); > >> > + if ( !hvm_irq->gsi_assert_count[gsi] ) > >> > + { > >> > + hvm_irq->gsi_assert_count[gsi]++; > >> > >> Why is this an increment instead of a simple write of 1? Or the > >> other way around - why is this not always incrementing, just like > >> __hvm_pci_intx_assert() does? (Symmetric questions then for > >> hvm_gsi_deassert()). > > > > __hvm_pci_intx_{de}assert has an array that tracks the status of each interrupt > > line, and Xen does the routing based on that (the __test_and_clear_bit at the > > top of __hvm_pci_intx_assert). That prevents the same line from triggering > > multiple times, which is not available here, and thus Xen needs to rely on > > gsi_assert_count in order to know if the GSI is pending or not. > > > > Switched to use a set to 1/0 instead of the increment, which I agree makes this > > clearer. > > And altogether this likely would benefit from a comment put > somewhere. Done. Thanks.
>>> On 17.05.17 at 10:26, <roger.pau@citrix.com> wrote: > On Tue, May 16, 2017 at 10:23:48AM -0600, Jan Beulich wrote: >> >>> On 16.05.17 at 17:55, <roger.pau@citrix.com> wrote: >> > On Fri, May 12, 2017 at 07:42:51AM -0600, Jan Beulich wrote: >> >> >>> On 19.04.17 at 17:11, <roger.pau@citrix.com> wrote: >> >> > Note that currently there's no support for unbinding this interrupts. >> >> >> >> Do you plan to deal with that before this changes goes in? Aiui this >> >> not working means you can't pass through devices with pin based >> >> interrupts once Dom0 chose to bind to them. Otoh hand you modify >> >> pt_irq_destroy_bind(), so I'm a little puzzled ... >> > >> > Yes, I modify pt_irq_destroy_bind to return EOPNOTSUPP when trying to unbind >> > such an interrupt. I can implement the unbind, but it's not going to be used >> > ATM. >> >> Is it not? I can see the mentioned pass-through case to be of no >> interest, but wouldn't a well behaved kernel perhaps unmap IRQs >> while shutting down? > > I guess I haven't explained myself correctly, what I meant is that right now I > don't have any use-case for this, I haven't started working on pci-passtrhough > for guests, and the Dom0 implementation I have doesn't unbind interrupts on > shutdown. > > I could unbind them when Dom0 masks the vIO APIC pin, but I think that's > going to be awfully slow. Well, doesn't this point out another weakness of the no-physdevops model you advocate for? Implying an unbind from the mask bit being set in an RTE would certainly be undesirable (there are reasons to transiently mask an interrupt, after all). Hence there's no way Dom0 could indicate "I'm done with this interrupt", unless I'm missing something. Jan
On Wed, May 17, 2017 at 03:02:26AM -0600, Jan Beulich wrote: > >>> On 17.05.17 at 10:26, <roger.pau@citrix.com> wrote: > > On Tue, May 16, 2017 at 10:23:48AM -0600, Jan Beulich wrote: > >> >>> On 16.05.17 at 17:55, <roger.pau@citrix.com> wrote: > >> > On Fri, May 12, 2017 at 07:42:51AM -0600, Jan Beulich wrote: > >> >> >>> On 19.04.17 at 17:11, <roger.pau@citrix.com> wrote: > >> >> > Note that currently there's no support for unbinding this interrupts. > >> >> > >> >> Do you plan to deal with that before this changes goes in? Aiui this > >> >> not working means you can't pass through devices with pin based > >> >> interrupts once Dom0 chose to bind to them. Otoh hand you modify > >> >> pt_irq_destroy_bind(), so I'm a little puzzled ... > >> > > >> > Yes, I modify pt_irq_destroy_bind to return EOPNOTSUPP when trying to unbind > >> > such an interrupt. I can implement the unbind, but it's not going to be used > >> > ATM. > >> > >> Is it not? I can see the mentioned pass-through case to be of no > >> interest, but wouldn't a well behaved kernel perhaps unmap IRQs > >> while shutting down? > > > > I guess I haven't explained myself correctly, what I meant is that right now I > > don't have any use-case for this, I haven't started working on pci-passtrhough > > for guests, and the Dom0 implementation I have doesn't unbind interrupts on > > shutdown. > > > > I could unbind them when Dom0 masks the vIO APIC pin, but I think that's > > going to be awfully slow. > > Well, doesn't this point out another weakness of the no-physdevops > model you advocate for? Implying an unbind from the mask bit being > set in an RTE would certainly be undesirable (there are reasons to > transiently mask an interrupt, after all). Hence there's no way Dom0 > could indicate "I'm done with this interrupt", unless I'm missing > something. The only reason I could see for the hardware domain to unbind an interrupt is when doing pci-passthrough, and there the toolstack is involved, which could unbind the interrupt using the already existing hypercalls. Just to be clear, my no-physdevops thing is about how guests interact with physical devices when using them in a native way. Things like pci-passthrough (that are not part of how an OS operates) should indeed use hypercalls (because there's no native way to do this). Roger.
>>> On 17.05.17 at 11:34, <roger.pau@citrix.com> wrote: > On Wed, May 17, 2017 at 03:02:26AM -0600, Jan Beulich wrote: >> >>> On 17.05.17 at 10:26, <roger.pau@citrix.com> wrote: >> > On Tue, May 16, 2017 at 10:23:48AM -0600, Jan Beulich wrote: >> >> >>> On 16.05.17 at 17:55, <roger.pau@citrix.com> wrote: >> >> > On Fri, May 12, 2017 at 07:42:51AM -0600, Jan Beulich wrote: >> >> >> >>> On 19.04.17 at 17:11, <roger.pau@citrix.com> wrote: >> >> >> > Note that currently there's no support for unbinding this interrupts. >> >> >> >> >> >> Do you plan to deal with that before this changes goes in? Aiui this >> >> >> not working means you can't pass through devices with pin based >> >> >> interrupts once Dom0 chose to bind to them. Otoh hand you modify >> >> >> pt_irq_destroy_bind(), so I'm a little puzzled ... >> >> > >> >> > Yes, I modify pt_irq_destroy_bind to return EOPNOTSUPP when trying to > unbind >> >> > such an interrupt. I can implement the unbind, but it's not going to be > used >> >> > ATM. >> >> >> >> Is it not? I can see the mentioned pass-through case to be of no >> >> interest, but wouldn't a well behaved kernel perhaps unmap IRQs >> >> while shutting down? >> > >> > I guess I haven't explained myself correctly, what I meant is that right > now I >> > don't have any use-case for this, I haven't started working on > pci-passtrhough >> > for guests, and the Dom0 implementation I have doesn't unbind interrupts on >> > shutdown. >> > >> > I could unbind them when Dom0 masks the vIO APIC pin, but I think that's >> > going to be awfully slow. >> >> Well, doesn't this point out another weakness of the no-physdevops >> model you advocate for? Implying an unbind from the mask bit being >> set in an RTE would certainly be undesirable (there are reasons to >> transiently mask an interrupt, after all). Hence there's no way Dom0 >> could indicate "I'm done with this interrupt", unless I'm missing >> something. > > The only reason I could see for the hardware domain to unbind an interrupt is > when doing pci-passthrough, and there the toolstack is involved, which could > unbind the interrupt using the already existing hypercalls. I'm not convinced the tool stack doing this behind the back of the kernel is an acceptable thing to do, even more so when thinking of shared interrupts. Jan
On Wed, May 17, 2017 at 03:51:43AM -0600, Jan Beulich wrote: > >>> On 17.05.17 at 11:34, <roger.pau@citrix.com> wrote: > > On Wed, May 17, 2017 at 03:02:26AM -0600, Jan Beulich wrote: > >> >>> On 17.05.17 at 10:26, <roger.pau@citrix.com> wrote: > >> > On Tue, May 16, 2017 at 10:23:48AM -0600, Jan Beulich wrote: > >> >> >>> On 16.05.17 at 17:55, <roger.pau@citrix.com> wrote: > >> >> > On Fri, May 12, 2017 at 07:42:51AM -0600, Jan Beulich wrote: > >> >> >> >>> On 19.04.17 at 17:11, <roger.pau@citrix.com> wrote: > >> >> >> > Note that currently there's no support for unbinding this interrupts. > >> >> >> > >> >> >> Do you plan to deal with that before this changes goes in? Aiui this > >> >> >> not working means you can't pass through devices with pin based > >> >> >> interrupts once Dom0 chose to bind to them. Otoh hand you modify > >> >> >> pt_irq_destroy_bind(), so I'm a little puzzled ... > >> >> > > >> >> > Yes, I modify pt_irq_destroy_bind to return EOPNOTSUPP when trying to > > unbind > >> >> > such an interrupt. I can implement the unbind, but it's not going to be > > used > >> >> > ATM. > >> >> > >> >> Is it not? I can see the mentioned pass-through case to be of no > >> >> interest, but wouldn't a well behaved kernel perhaps unmap IRQs > >> >> while shutting down? > >> > > >> > I guess I haven't explained myself correctly, what I meant is that right > > now I > >> > don't have any use-case for this, I haven't started working on > > pci-passtrhough > >> > for guests, and the Dom0 implementation I have doesn't unbind interrupts on > >> > shutdown. > >> > > >> > I could unbind them when Dom0 masks the vIO APIC pin, but I think that's > >> > going to be awfully slow. > >> > >> Well, doesn't this point out another weakness of the no-physdevops > >> model you advocate for? Implying an unbind from the mask bit being > >> set in an RTE would certainly be undesirable (there are reasons to > >> transiently mask an interrupt, after all). Hence there's no way Dom0 > >> could indicate "I'm done with this interrupt", unless I'm missing > >> something. > > > > The only reason I could see for the hardware domain to unbind an interrupt is > > when doing pci-passthrough, and there the toolstack is involved, which could > > unbind the interrupt using the already existing hypercalls. > > I'm not convinced the tool stack doing this behind the back of the > kernel is an acceptable thing to do, even more so when thinking > of shared interrupts. The toolstack could figure out whether the interrupt is shared or not (edge or level triggered) and then decide whether it needs unbinding or not. The kernel must obviously not be using the device by that point. I'm again not opposed to have an in-kernel (for Dom0) driver for managing passthrough, but I would like to avoid it if possible. In any case, I think the interface for how Dom0 interacts with interrupts from it's assigned devices vs the interface used to detach devices for pci-passthrough is orthogonal (one can be completely separated from the other). Roger.
diff --git a/xen/arch/x86/hvm/irq.c b/xen/arch/x86/hvm/irq.c index 86255847a6..cb9084e3f3 100644 --- a/xen/arch/x86/hvm/irq.c +++ b/xen/arch/x86/hvm/irq.c @@ -126,6 +126,34 @@ void hvm_pci_intx_deassert( spin_unlock(&d->arch.hvm_domain.irq_lock); } +void hvm_gsi_assert(struct domain *d, unsigned int gsi) +{ + struct hvm_irq *hvm_irq = hvm_domain_irq(d); + + ASSERT(gsi < hvm_irq->nr_gsis); + ASSERT(!has_vpic(d)); + spin_lock(&d->arch.hvm_domain.irq_lock); + if ( !hvm_irq->gsi_assert_count[gsi] ) + { + hvm_irq->gsi_assert_count[gsi]++; + assert_gsi(d, gsi); + } + spin_unlock(&d->arch.hvm_domain.irq_lock); +} + +void hvm_gsi_deassert(struct domain *d, unsigned int gsi) +{ + struct hvm_irq *hvm_irq = hvm_domain_irq(d); + + ASSERT(gsi < hvm_irq->nr_gsis); + ASSERT(!has_vpic(d)); + spin_lock(&d->arch.hvm_domain.irq_lock); + if ( hvm_irq->gsi_assert_count[gsi] ) + hvm_irq->gsi_assert_count[gsi]--; + ASSERT(!hvm_irq->gsi_assert_count[gsi]); + spin_unlock(&d->arch.hvm_domain.irq_lock); +} + void hvm_isa_irq_assert( struct domain *d, unsigned int isa_irq) { diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c index e5a43e508f..1c89ae1642 100644 --- a/xen/drivers/passthrough/io.c +++ b/xen/drivers/passthrough/io.c @@ -164,6 +164,20 @@ static void pt_irq_time_out(void *data) spin_lock(&irq_map->dom->event_lock); + if ( irq_map->flags & HVM_IRQ_DPCI_IDENTITY_GSI ) + { + struct pirq *pirq = dpci_pirq(irq_map); + + ASSERT(is_hardware_domain(irq_map->dom)); + /* + * Identity mapped, no need to iterate over the guest GSI list to find + * other pirqs sharing the same guest GSI. + */ + irq_map->flags |= HVM_IRQ_DPCI_EOI_LATCH; + hvm_gsi_deassert(irq_map->dom, pirq->pirq); + goto out; + } + dpci = domain_get_irq_dpci(irq_map->dom); if ( unlikely(!dpci) ) { @@ -185,6 +199,7 @@ static void pt_irq_time_out(void *data) hvm_pci_intx_deassert(irq_map->dom, digl->device, digl->intx); } + out: pt_pirq_iterate(irq_map->dom, pt_irq_guest_eoi, NULL); spin_unlock(&irq_map->dom->event_lock); @@ -274,10 +289,16 @@ int pt_irq_create_bind( spin_lock(&d->event_lock); hvm_irq_dpci = domain_get_irq_dpci(d); - if ( hvm_irq_dpci == NULL ) + if ( hvm_irq_dpci == NULL && !is_hardware_domain(d) ) { unsigned int i; + /* + * NB: the hardware domain doesn't use a hvm_irq_dpci struct because + * it's only allowed to identity map GSIs, and so the data contained in + * that struct (used to map guest GSIs into machine GSIs and perform + * interrupt routing) it's completely useless for it. + */ hvm_irq_dpci = xzalloc(struct hvm_irq_dpci); if ( hvm_irq_dpci == NULL ) { @@ -422,35 +443,51 @@ int pt_irq_create_bind( case PT_IRQ_TYPE_PCI: case PT_IRQ_TYPE_MSI_TRANSLATE: { - unsigned int bus = pt_irq_bind->u.pci.bus; - unsigned int device = pt_irq_bind->u.pci.device; - unsigned int intx = pt_irq_bind->u.pci.intx; - unsigned int guest_gsi = hvm_pci_intx_gsi(device, intx); - unsigned int link = hvm_pci_intx_link(device, intx); - struct dev_intx_gsi_link *digl = xmalloc(struct dev_intx_gsi_link); - struct hvm_girq_dpci_mapping *girq = - xmalloc(struct hvm_girq_dpci_mapping); + struct dev_intx_gsi_link *digl = NULL; + struct hvm_girq_dpci_mapping *girq = NULL; + unsigned int guest_gsi; - if ( !digl || !girq ) + /* + * Mapping GSIs for the hardware domain is different than doing it for + * an unpriviledged guest, the hardware domain is only allowed to + * identity map GSIs, and as such all the data in the u.pci union is + * discarded. + */ + if ( !is_hardware_domain(d) ) { - spin_unlock(&d->event_lock); - xfree(girq); - xfree(digl); - return -ENOMEM; - } + unsigned int link; + + digl = xmalloc(struct dev_intx_gsi_link); + girq = xmalloc(struct hvm_girq_dpci_mapping); + + if ( !digl || !girq ) + { + spin_unlock(&d->event_lock); + xfree(girq); + xfree(digl); + return -ENOMEM; + } + + girq->bus = digl->bus = pt_irq_bind->u.pci.bus; + girq->device = digl->device = pt_irq_bind->u.pci.device; + girq->intx = digl->intx = pt_irq_bind->u.pci.intx; + list_add_tail(&digl->list, &pirq_dpci->digl_list); - hvm_irq_dpci->link_cnt[link]++; + guest_gsi = hvm_pci_intx_gsi(digl->device, digl->intx); + link = hvm_pci_intx_link(digl->device, digl->intx); - digl->bus = bus; - digl->device = device; - digl->intx = intx; - list_add_tail(&digl->list, &pirq_dpci->digl_list); + hvm_irq_dpci->link_cnt[link]++; - girq->bus = bus; - girq->device = device; - girq->intx = intx; - girq->machine_gsi = pirq; - list_add_tail(&girq->list, &hvm_irq_dpci->girq[guest_gsi]); + girq->machine_gsi = pirq; + list_add_tail(&girq->list, &hvm_irq_dpci->girq[guest_gsi]); + } + else + { + /* MSI_TRANSLATE is not supported by the hardware domain. */ + ASSERT(pt_irq_bind->irq_type == PT_IRQ_TYPE_PCI); + guest_gsi = pirq; + ASSERT(guest_gsi < hvm_domain_irq(d)->nr_gsis); + } /* Bind the same mirq once in the same domain */ if ( !(pirq_dpci->flags & HVM_IRQ_DPCI_MAPPED) ) @@ -472,7 +509,27 @@ int pt_irq_create_bind( pirq_dpci->flags = HVM_IRQ_DPCI_MAPPED | HVM_IRQ_DPCI_MACH_PCI | HVM_IRQ_DPCI_GUEST_PCI; - share = BIND_PIRQ__WILL_SHARE; + if ( !is_hardware_domain(d) ) + share = BIND_PIRQ__WILL_SHARE; + else + { + unsigned int pin; + struct hvm_vioapic *vioapic = gsi_vioapic(d, guest_gsi, + &pin); + + if ( !vioapic ) + { + ASSERT_UNREACHABLE(); + return -EINVAL; + } + pirq_dpci->flags |= HVM_IRQ_DPCI_IDENTITY_GSI; + /* + * Check if the corresponding vIO APIC pin is configured + * level or edge trigger, level triggered interrupts will + * be marked as shareable. + */ + share = vioapic->redirtbl[pin].fields.trig_mode; + } } /* Init timer before binding */ @@ -489,9 +546,15 @@ int pt_irq_create_bind( * IRQ_GUEST is not set. As such we can reset 'dom' directly. */ pirq_dpci->dom = NULL; - list_del(&girq->list); - list_del(&digl->list); - hvm_irq_dpci->link_cnt[link]--; + if ( !is_hardware_domain(d) ) + { + unsigned int link = hvm_pci_intx_link(digl->device, + digl->intx); + + list_del(&girq->list); + list_del(&digl->list); + hvm_irq_dpci->link_cnt[link]--; + } pirq_dpci->flags = 0; pirq_cleanup_check(info, d); spin_unlock(&d->event_lock); @@ -504,10 +567,18 @@ int pt_irq_create_bind( spin_unlock(&d->event_lock); if ( iommu_verbose ) - printk(XENLOG_G_INFO - "d%d: bind: m_gsi=%u g_gsi=%u dev=%02x.%02x.%u intx=%u\n", - d->domain_id, pirq, guest_gsi, bus, - PCI_SLOT(device), PCI_FUNC(device), intx); + { + char buf[50]; + + if ( !is_hardware_domain(d) ) + snprintf(buf, ARRAY_SIZE(buf), " dev=%02x.%02x.%u intx=%u", + digl->bus, PCI_SLOT(digl->device), + PCI_FUNC(digl->device), digl->intx); + + printk(XENLOG_G_INFO "d%d: bind: m_gsi=%u g_gsi=%u%s\n", + d->domain_id, pirq, guest_gsi, + !is_hardware_domain(d) ? buf : ""); + } break; } @@ -522,7 +593,6 @@ int pt_irq_create_bind( int pt_irq_destroy_bind( struct domain *d, xen_domctl_bind_pt_irq_t *pt_irq_bind) { - struct hvm_irq_dpci *hvm_irq_dpci; struct hvm_pirq_dpci *pirq_dpci; unsigned int machine_gsi = pt_irq_bind->machine_irq; struct pirq *pirq; @@ -552,17 +622,15 @@ int pt_irq_destroy_bind( spin_lock(&d->event_lock); - hvm_irq_dpci = domain_get_irq_dpci(d); - - if ( hvm_irq_dpci == NULL ) + pirq = pirq_info(d, machine_gsi); + pirq_dpci = pirq_dpci(pirq); + if ( pirq_dpci->flags & HVM_IRQ_DPCI_IDENTITY_GSI ) { + ASSERT(is_hardware_domain(d)); spin_unlock(&d->event_lock); - return -EINVAL; + return -EOPNOTSUPP; } - pirq = pirq_info(d, machine_gsi); - pirq_dpci = pirq_dpci(pirq); - if ( pt_irq_bind->irq_type != PT_IRQ_TYPE_MSI ) { unsigned int bus = pt_irq_bind->u.pci.bus; @@ -570,9 +638,16 @@ int pt_irq_destroy_bind( unsigned int intx = pt_irq_bind->u.pci.intx; unsigned int guest_gsi = hvm_pci_intx_gsi(device, intx); unsigned int link = hvm_pci_intx_link(device, intx); + struct hvm_irq_dpci *hvm_irq_dpci = domain_get_irq_dpci(d); struct hvm_girq_dpci_mapping *girq; struct dev_intx_gsi_link *digl, *tmp; + if ( hvm_irq_dpci == NULL ) + { + spin_unlock(&d->event_lock); + return -EINVAL; + } + list_for_each_entry ( girq, &hvm_irq_dpci->girq[guest_gsi], list ) { if ( girq->bus == bus && @@ -696,7 +771,7 @@ int hvm_do_IRQ_dpci(struct domain *d, struct pirq *pirq) struct hvm_irq_dpci *dpci = domain_get_irq_dpci(d); struct hvm_pirq_dpci *pirq_dpci = pirq_dpci(pirq); - if ( !iommu_enabled || !dpci || !pirq_dpci || + if ( !iommu_enabled || (!is_hardware_domain(d) && !dpci) || !pirq_dpci || !(pirq_dpci->flags & HVM_IRQ_DPCI_MAPPED) ) return 0; @@ -757,7 +832,7 @@ void hvm_dpci_msi_eoi(struct domain *d, int vector) static void hvm_dirq_assist(struct domain *d, struct hvm_pirq_dpci *pirq_dpci) { - if ( unlikely(!hvm_domain_irq(d)->dpci) ) + if ( unlikely(!hvm_domain_irq(d)->dpci) && !is_hardware_domain(d) ) { ASSERT_UNREACHABLE(); return; @@ -789,10 +864,17 @@ static void hvm_dirq_assist(struct domain *d, struct hvm_pirq_dpci *pirq_dpci) list_for_each_entry ( digl, &pirq_dpci->digl_list, list ) { + ASSERT(!(pirq_dpci->flags & HVM_IRQ_DPCI_IDENTITY_GSI)); hvm_pci_intx_assert(d, digl->device, digl->intx); pirq_dpci->pending++; } + if ( pirq_dpci->flags & HVM_IRQ_DPCI_IDENTITY_GSI ) + { + hvm_gsi_assert(d, pirq->pirq); + pirq_dpci->pending++; + } + if ( pirq_dpci->flags & HVM_IRQ_DPCI_TRANSLATE ) { /* for translated MSI to INTx interrupt, eoi as early as possible */ @@ -814,17 +896,12 @@ static void hvm_dirq_assist(struct domain *d, struct hvm_pirq_dpci *pirq_dpci) spin_unlock(&d->event_lock); } -static void __hvm_dpci_eoi(struct domain *d, - const struct hvm_girq_dpci_mapping *girq, +static void __hvm_pirq_eoi(struct pirq *pirq, const union vioapic_redir_entry *ent) { - struct pirq *pirq = pirq_info(d, girq->machine_gsi); - struct hvm_pirq_dpci *pirq_dpci; - - if ( !hvm_domain_use_pirq(d, pirq) ) - hvm_pci_intx_deassert(d, girq->device, girq->intx); + struct hvm_pirq_dpci *pirq_dpci = pirq_dpci(pirq); - pirq_dpci = pirq_dpci(pirq); + ASSERT(pirq_dpci); /* * No need to get vector lock for timer @@ -839,6 +916,32 @@ static void __hvm_dpci_eoi(struct domain *d, pirq_guest_eoi(pirq); } +static void __hvm_dpci_eoi(struct domain *d, + const struct hvm_girq_dpci_mapping *girq, + const union vioapic_redir_entry *ent) +{ + struct pirq *pirq = pirq_info(d, girq->machine_gsi); + + if ( !hvm_domain_use_pirq(d, pirq) ) + hvm_pci_intx_deassert(d, girq->device, girq->intx); + + __hvm_pirq_eoi(pirq, ent); +} + +static void __hvm_gsi_eoi(struct domain *d, unsigned int gsi, + const union vioapic_redir_entry *ent) +{ + struct pirq *pirq = pirq_info(d, gsi); + struct hvm_pirq_dpci *pirq_dpci = pirq_dpci(pirq); + + /* Check if GSI is actually mapped. */ + if ( !pirq_dpci ) + return; + + hvm_gsi_deassert(d, gsi); + __hvm_pirq_eoi(pirq, ent); +} + void hvm_dpci_eoi(struct domain *d, unsigned int guest_gsi, const union vioapic_redir_entry *ent) { @@ -848,6 +951,13 @@ void hvm_dpci_eoi(struct domain *d, unsigned int guest_gsi, if ( !iommu_enabled ) return; + if ( is_hardware_domain(d) ) + { + spin_lock(&d->event_lock); + __hvm_gsi_eoi(d, guest_gsi, ent); + goto unlock; + } + if ( guest_gsi < NR_ISAIRQS ) { hvm_dpci_isairq_eoi(d, guest_gsi); diff --git a/xen/include/xen/hvm/irq.h b/xen/include/xen/hvm/irq.h index 671a6f2e06..0d2c72c109 100644 --- a/xen/include/xen/hvm/irq.h +++ b/xen/include/xen/hvm/irq.h @@ -40,6 +40,7 @@ struct dev_intx_gsi_link { #define _HVM_IRQ_DPCI_EOI_LATCH_SHIFT 3 #define _HVM_IRQ_DPCI_GUEST_PCI_SHIFT 4 #define _HVM_IRQ_DPCI_GUEST_MSI_SHIFT 5 +#define _HVM_IRQ_DPCI_IDENTITY_GSI_SHIFT 6 #define _HVM_IRQ_DPCI_TRANSLATE_SHIFT 15 #define HVM_IRQ_DPCI_MACH_PCI (1 << _HVM_IRQ_DPCI_MACH_PCI_SHIFT) #define HVM_IRQ_DPCI_MACH_MSI (1 << _HVM_IRQ_DPCI_MACH_MSI_SHIFT) @@ -47,6 +48,7 @@ struct dev_intx_gsi_link { #define HVM_IRQ_DPCI_EOI_LATCH (1 << _HVM_IRQ_DPCI_EOI_LATCH_SHIFT) #define HVM_IRQ_DPCI_GUEST_PCI (1 << _HVM_IRQ_DPCI_GUEST_PCI_SHIFT) #define HVM_IRQ_DPCI_GUEST_MSI (1 << _HVM_IRQ_DPCI_GUEST_MSI_SHIFT) +#define HVM_IRQ_DPCI_IDENTITY_GSI (1 << _HVM_IRQ_DPCI_IDENTITY_GSI_SHIFT) #define HVM_IRQ_DPCI_TRANSLATE (1 << _HVM_IRQ_DPCI_TRANSLATE_SHIFT) #define VMSI_DEST_ID_MASK 0xff @@ -123,6 +125,10 @@ void hvm_isa_irq_assert( void hvm_isa_irq_deassert( struct domain *d, unsigned int isa_irq); +/* Modify state of GSIs. */ +void hvm_gsi_assert(struct domain *d, unsigned int gsi); +void hvm_gsi_deassert(struct domain *d, unsigned int gsi); + int hvm_set_pci_link_route(struct domain *d, u8 link, u8 isa_irq); int hvm_inject_msi(struct domain *d, uint64_t addr, uint32_t data);
Achieve this by expanding pt_irq_create_bind in order to support mapping interrupts of type PT_IRQ_TYPE_PCI to a PVH Dom0. GSIs bound to Dom0 are always identity bound, which means the all the fields inside of the u.pci sub-struct are ignored, and only the machine_irq is actually used in order to determine which GSI the caller wants to bind. Also, the hvm_irq_dpci struct is not used by a PVH Dom0, since that's used to route interrupts and allow different host to guest GSI mappings, which is not used by a PVH Dom0. This requires adding some specific handlers for such directly mapped GSIs, which bypass the PCI interrupt routing done by Xen for HVM guests. Note that currently there's no support for unbinding this interrupts. Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> --- Cc: Jan Beulich <jbeulich@suse.com> Cc: Andrew Cooper <andrew.cooper3@citrix.com> --- Changes since v1: - Remove the PT_IRQ_TYPE_GSI and instead just use PT_IRQ_TYPE_PCI with a hardware domain special casing. - Check the trigger mode of the Dom0 vIO APIC in order to set the shareable flags in pt_irq_create_bind. --- xen/arch/x86/hvm/irq.c | 28 ++++++ xen/drivers/passthrough/io.c | 212 ++++++++++++++++++++++++++++++++----------- xen/include/xen/hvm/irq.h | 6 ++ 3 files changed, 195 insertions(+), 51 deletions(-)