Message ID | 20170630052436.15212-5-aik@ozlabs.ru (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
[+cc Joerg, iommu] On Fri, Jun 30, 2017 at 12:24 AM, Alexey Kardashevskiy <aik@ozlabs.ru> wrote: > From: Yongji Xie <elohimes@gmail.com> > > Some iommu drivers would be initialized after PCI device > enumeration. So PCI_BUS_FLAGS_MSI_REMAP would not be set > when probing PCI devices although IOMMU enables capability > of IRQ remapping. This patch tests this capability and > set the flag when iommu driver is initialized. > > Signed-off-by: Yongji Xie <xyjxie@linux.vnet.ibm.com> > Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> > --- > drivers/iommu/iommu.c | 8 ++++++++ > drivers/pci/probe.c | 1 + > 2 files changed, 9 insertions(+) > > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c > index cf7ca7e70777..0b5881ddca09 100644 > --- a/drivers/iommu/iommu.c > +++ b/drivers/iommu/iommu.c > @@ -1063,6 +1063,14 @@ static int add_iommu_group(struct device *dev, void *data) > const struct iommu_ops *ops = cb->ops; > int ret; > > + /* > + * Set PCI_BUS_FLAGS_MSI_REMAP for all PCI buses when IOMMU > + * have capability of IRQ remapping. > + */ > + if (dev_is_pci(dev) && ops->capable && > + ops->capable(IOMMU_CAP_INTR_REMAP)) > + to_pci_dev(dev)->bus->bus_flags |= PCI_BUS_FLAGS_MSI_REMAP; This isn't my area, but this addition is really ugly. It doesn't look anything like the rest of the code in add_iommu_group(), so it just feels like a wart. > if (!ops->add_device) > return 0; > > diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c > index f2393b7d7ebf..14aac9df3d17 100644 > --- a/drivers/pci/probe.c > +++ b/drivers/pci/probe.c > @@ -17,6 +17,7 @@ > #include <linux/acpi.h> > #include <linux/irqdomain.h> > #include <linux/pm_runtime.h> > +#include <linux/iommu.h> This obviously belongs in another patch, as the compile error showed. > #include "pci.h" > > #define CARDBUS_LATENCY_TIMER 176 /* secondary latency timer */ > -- > 2.11.0 >
On 10/07/17 20:23, Bjorn Helgaas via iommu wrote: > [+cc Joerg, iommu] > > On Fri, Jun 30, 2017 at 12:24 AM, Alexey Kardashevskiy <aik@ozlabs.ru> wrote: >> From: Yongji Xie <elohimes@gmail.com> >> >> Some iommu drivers would be initialized after PCI device >> enumeration. So PCI_BUS_FLAGS_MSI_REMAP would not be set >> when probing PCI devices although IOMMU enables capability >> of IRQ remapping. This patch tests this capability and >> set the flag when iommu driver is initialized. >> >> Signed-off-by: Yongji Xie <xyjxie@linux.vnet.ibm.com> >> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> >> --- >> drivers/iommu/iommu.c | 8 ++++++++ >> drivers/pci/probe.c | 1 + >> 2 files changed, 9 insertions(+) >> >> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c >> index cf7ca7e70777..0b5881ddca09 100644 >> --- a/drivers/iommu/iommu.c >> +++ b/drivers/iommu/iommu.c >> @@ -1063,6 +1063,14 @@ static int add_iommu_group(struct device *dev, void *data) >> const struct iommu_ops *ops = cb->ops; >> int ret; >> >> + /* >> + * Set PCI_BUS_FLAGS_MSI_REMAP for all PCI buses when IOMMU >> + * have capability of IRQ remapping. >> + */ >> + if (dev_is_pci(dev) && ops->capable && >> + ops->capable(IOMMU_CAP_INTR_REMAP)) >> + to_pci_dev(dev)->bus->bus_flags |= PCI_BUS_FLAGS_MSI_REMAP; > > This isn't my area, but this addition is really ugly. It doesn't look > anything like the rest of the code in add_iommu_group(), so it just > feels like a wart. I have no idea what the context is here, but this flag looks wrong generally. IRQ remapping is a property of the irqchip and has nothing to do with PCI, so pretending it's a property of PCI buses looks like a massive hack around... something. Also, iommu_capable() is a fundamentally broken and unworkable interface anyway. The existing IRQ remapping code really wants updating to advertise IRQ_DOMAIN_FLAG_MSI_REMAP on the relevant MSI domains so that IOMMU_CAP_INTR_REMAP can go away for good. That way, all consumers who actually care about whether IRQ remapping is implemented (see e.g. VFIO) can use irq_domain_check_msi_remap() or check specific devices in a sane and scalable manner. Robin. >> if (!ops->add_device) >> return 0; >> >> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c >> index f2393b7d7ebf..14aac9df3d17 100644 >> --- a/drivers/pci/probe.c >> +++ b/drivers/pci/probe.c >> @@ -17,6 +17,7 @@ >> #include <linux/acpi.h> >> #include <linux/irqdomain.h> >> #include <linux/pm_runtime.h> >> +#include <linux/iommu.h> > > This obviously belongs in another patch, as the compile error showed. > >> #include "pci.h" >> >> #define CARDBUS_LATENCY_TIMER 176 /* secondary latency timer */ >> -- >> 2.11.0 >> > _______________________________________________ > iommu mailing list > iommu@lists.linux-foundation.org > https://lists.linuxfoundation.org/mailman/listinfo/iommu >
On 11/07/17 20:39, Robin Murphy wrote: > On 10/07/17 20:23, Bjorn Helgaas via iommu wrote: >> [+cc Joerg, iommu] >> >> On Fri, Jun 30, 2017 at 12:24 AM, Alexey Kardashevskiy <aik@ozlabs.ru> wrote: >>> From: Yongji Xie <elohimes@gmail.com> >>> >>> Some iommu drivers would be initialized after PCI device >>> enumeration. So PCI_BUS_FLAGS_MSI_REMAP would not be set >>> when probing PCI devices although IOMMU enables capability >>> of IRQ remapping. This patch tests this capability and >>> set the flag when iommu driver is initialized. >>> >>> Signed-off-by: Yongji Xie <xyjxie@linux.vnet.ibm.com> >>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> >>> --- >>> drivers/iommu/iommu.c | 8 ++++++++ >>> drivers/pci/probe.c | 1 + >>> 2 files changed, 9 insertions(+) >>> >>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c >>> index cf7ca7e70777..0b5881ddca09 100644 >>> --- a/drivers/iommu/iommu.c >>> +++ b/drivers/iommu/iommu.c >>> @@ -1063,6 +1063,14 @@ static int add_iommu_group(struct device *dev, void *data) >>> const struct iommu_ops *ops = cb->ops; >>> int ret; >>> >>> + /* >>> + * Set PCI_BUS_FLAGS_MSI_REMAP for all PCI buses when IOMMU >>> + * have capability of IRQ remapping. >>> + */ >>> + if (dev_is_pci(dev) && ops->capable && >>> + ops->capable(IOMMU_CAP_INTR_REMAP)) >>> + to_pci_dev(dev)->bus->bus_flags |= PCI_BUS_FLAGS_MSI_REMAP; >> >> This isn't my area, but this addition is really ugly. It doesn't look >> anything like the rest of the code in add_iommu_group(), so it just >> feels like a wart. > > I have no idea what the context is here, but this flag looks wrong > generally. IRQ remapping is a property of the irqchip and has nothing to > do with PCI, so pretending it's a property of PCI buses looks like a > massive hack around... something. The context here - should we allow mapping of MSIX BAR to a guest or not. Now it is disabled as the guest can write there anything and make device send MSIX messages to random addresses. However if a PCI bridge can somehow filter these writes, then it is safe. So it is not really remapping what we care about in this patchset, it is filtering/censoring. And the reason to allow MSIX mapping is that it may be adjacent to frequently used MMIO and also reside within same 64K page which happens to be the default page size on PPC64. So even though remapping/filtering is not a property of any PCI bus but it is a part on an IOMMU which is usually (always?) combined with the PCI host bridge adapter so the PCI bus flag makes sense. How else would we pass that flag from IOMMU to the actual device we want to mmap to the userspace? > Also, iommu_capable() is a fundamentally broken and unworkable interface > anyway. The existing IRQ remapping code really wants updating to > advertise IRQ_DOMAIN_FLAG_MSI_REMAP on the relevant MSI domains so that > IOMMU_CAP_INTR_REMAP can go away for good. This IRQ_DOMAIN_FLAG_MSI_REMAP is used only on ARM, is there a plan/desire to use it in other places instead of capable(IOMMU_CAP_INTR_REMAP)? > That way, all consumers who > actually care about whether IRQ remapping is implemented (see e.g. VFIO) > can use irq_domain_check_msi_remap() or check specific devices in a sane > and scalable manner. What specific devices are you talking about here? Actual PCI controllers do not have control over MSIX remapping/filtering... > > Robin. > >>> if (!ops->add_device) >>> return 0; >>> >>> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c >>> index f2393b7d7ebf..14aac9df3d17 100644 >>> --- a/drivers/pci/probe.c >>> +++ b/drivers/pci/probe.c >>> @@ -17,6 +17,7 @@ >>> #include <linux/acpi.h> >>> #include <linux/irqdomain.h> >>> #include <linux/pm_runtime.h> >>> +#include <linux/iommu.h> >> >> This obviously belongs in another patch, as the compile error showed. >> >>> #include "pci.h" >>> >>> #define CARDBUS_LATENCY_TIMER 176 /* secondary latency timer */ >>> -- >>> 2.11.0 >>> >> _______________________________________________ >> iommu mailing list >> iommu@lists.linux-foundation.org >> https://lists.linuxfoundation.org/mailman/listinfo/iommu >> >
On 06/30/2017 01:24 PM, Alexey Kardashevskiy wrote: > From: Yongji Xie <elohimes@gmail.com> > > Some iommu drivers would be initialized after PCI device > enumeration. So PCI_BUS_FLAGS_MSI_REMAP would not be set > when probing PCI devices although IOMMU enables capability > of IRQ remapping. This patch tests this capability and > set the flag when iommu driver is initialized. > > Signed-off-by: Yongji Xie <xyjxie@linux.vnet.ibm.com> > Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> > --- > drivers/iommu/iommu.c | 8 ++++++++ > drivers/pci/probe.c | 1 + > 2 files changed, 9 insertions(+) > > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c > index cf7ca7e70777..0b5881ddca09 100644 > --- a/drivers/iommu/iommu.c > +++ b/drivers/iommu/iommu.c > @@ -1063,6 +1063,14 @@ static int add_iommu_group(struct device *dev, void *data) > const struct iommu_ops *ops = cb->ops; > int ret; > > + /* > + * Set PCI_BUS_FLAGS_MSI_REMAP for all PCI buses when IOMMU > + * have capability of IRQ remapping. > + */ > + if (dev_is_pci(dev) && ops->capable && > + ops->capable(IOMMU_CAP_INTR_REMAP)) > + to_pci_dev(dev)->bus->bus_flags |= PCI_BUS_FLAGS_MSI_REMAP; [+Kevin] Hi Alexey, Just a reminder, you might want to check the proposed patch to intel-iommu by Alex, at: https://patchwork.kernel.org/patch/9092511/ Without that being the prerequisite, this patch will probably introduce security issues on x86. -- Thanks, Jike > + > if (!ops->add_device) > return 0; > > diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c > index f2393b7d7ebf..14aac9df3d17 100644 > --- a/drivers/pci/probe.c > +++ b/drivers/pci/probe.c > @@ -17,6 +17,7 @@ > #include <linux/acpi.h> > #include <linux/irqdomain.h> > #include <linux/pm_runtime.h> > +#include <linux/iommu.h> > #include "pci.h" > > #define CARDBUS_LATENCY_TIMER 176 /* secondary latency timer */ >
On 12/07/17 17:04, Jike Song wrote: > On 06/30/2017 01:24 PM, Alexey Kardashevskiy wrote: >> From: Yongji Xie <elohimes@gmail.com> >> >> Some iommu drivers would be initialized after PCI device >> enumeration. So PCI_BUS_FLAGS_MSI_REMAP would not be set >> when probing PCI devices although IOMMU enables capability >> of IRQ remapping. This patch tests this capability and >> set the flag when iommu driver is initialized. >> >> Signed-off-by: Yongji Xie <xyjxie@linux.vnet.ibm.com> >> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> >> --- >> drivers/iommu/iommu.c | 8 ++++++++ >> drivers/pci/probe.c | 1 + >> 2 files changed, 9 insertions(+) >> >> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c >> index cf7ca7e70777..0b5881ddca09 100644 >> --- a/drivers/iommu/iommu.c >> +++ b/drivers/iommu/iommu.c >> @@ -1063,6 +1063,14 @@ static int add_iommu_group(struct device *dev, void *data) >> const struct iommu_ops *ops = cb->ops; >> int ret; >> >> + /* >> + * Set PCI_BUS_FLAGS_MSI_REMAP for all PCI buses when IOMMU >> + * have capability of IRQ remapping. >> + */ >> + if (dev_is_pci(dev) && ops->capable && >> + ops->capable(IOMMU_CAP_INTR_REMAP)) >> + to_pci_dev(dev)->bus->bus_flags |= PCI_BUS_FLAGS_MSI_REMAP; > > [+Kevin] > > > Hi Alexey, > > Just a reminder, you might want to check the proposed patch to intel-iommu > by Alex, at: > > https://patchwork.kernel.org/patch/9092511/ > > Without that being the prerequisite, this patch will probably introduce > security issues on x86. Thanks, noted. Anything like this for AMD/s390? > > -- > Thanks, > Jike > >> + >> if (!ops->add_device) >> return 0; >> >> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c >> index f2393b7d7ebf..14aac9df3d17 100644 >> --- a/drivers/pci/probe.c >> +++ b/drivers/pci/probe.c >> @@ -17,6 +17,7 @@ >> #include <linux/acpi.h> >> #include <linux/irqdomain.h> >> #include <linux/pm_runtime.h> >> +#include <linux/iommu.h> >> #include "pci.h" >> >> #define CARDBUS_LATENCY_TIMER 176 /* secondary latency timer */ >>
On Tue, 2017-07-11 at 11:39 +0100, Robin Murphy wrote: > I have no idea what the context is here, but this flag looks wrong > generally. IRQ remapping is a property of the irqchip and has nothing to > do with PCI, so pretending it's a property of PCI buses looks like a > massive hack around... something. > > Also, iommu_capable() is a fundamentally broken and unworkable interface > anyway. The existing IRQ remapping code really wants updating to > advertise IRQ_DOMAIN_FLAG_MSI_REMAP on the relevant MSI domains so that > IOMMU_CAP_INTR_REMAP can go away for good. That way, all consumers who > actually care about whether IRQ remapping is implemented (see e.g. VFIO) > can use irq_domain_check_msi_remap() or check specific devices in a sane > and scalable manner. As you rightfully said, you have no idea what the context is :-) This is not an interrupt domain. On powerpc we have a single global unified domains that contains all our MSIs, IPIs, internally interrupts and what not, because of the way our interrupts infrastructure works. So there is no such thing as "a property of the MSI domain". The way the HW works is that the PCI Host Bridge has the ability to filter which device can access which MSIs. This is done by the IOMMU portion of the bridge. Thus it's a filtering capability, not a remapping capability per-se, and it's a property of the IOMMU domain. Sicne this is also a paravitualized interface, the "remapping" part is handled by the HV calls done by the guest to configure the MSIs. The HW filtering ensures that an evil guest cannot do bad things if it goes manually whack the MSI table. Now this issue have been discussed and patches floated around for *YEARS* now and there is still no upstream solution for what is a completely trivial problem: Don't bloody bloock MSI BAR table access on pseries platforms. It kills performance on a number of device due to our 64K pages. A 1-liner fix in qemu would have done it YEARS ago. But instead we have now painted about 1000 bike sheds and going without anything that actually works. Yay. Ben.
On 11/07/17 05:23, Bjorn Helgaas wrote: > [+cc Joerg, iommu] > > On Fri, Jun 30, 2017 at 12:24 AM, Alexey Kardashevskiy <aik@ozlabs.ru> wrote: >> From: Yongji Xie <elohimes@gmail.com> >> >> Some iommu drivers would be initialized after PCI device >> enumeration. So PCI_BUS_FLAGS_MSI_REMAP would not be set >> when probing PCI devices although IOMMU enables capability >> of IRQ remapping. This patch tests this capability and >> set the flag when iommu driver is initialized. >> >> Signed-off-by: Yongji Xie <xyjxie@linux.vnet.ibm.com> >> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> >> --- >> drivers/iommu/iommu.c | 8 ++++++++ >> drivers/pci/probe.c | 1 + >> 2 files changed, 9 insertions(+) >> >> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c >> index cf7ca7e70777..0b5881ddca09 100644 >> --- a/drivers/iommu/iommu.c >> +++ b/drivers/iommu/iommu.c >> @@ -1063,6 +1063,14 @@ static int add_iommu_group(struct device *dev, void *data) >> const struct iommu_ops *ops = cb->ops; >> int ret; >> >> + /* >> + * Set PCI_BUS_FLAGS_MSI_REMAP for all PCI buses when IOMMU >> + * have capability of IRQ remapping. >> + */ >> + if (dev_is_pci(dev) && ops->capable && >> + ops->capable(IOMMU_CAP_INTR_REMAP)) >> + to_pci_dev(dev)->bus->bus_flags |= PCI_BUS_FLAGS_MSI_REMAP; > > This isn't my area, but this addition is really ugly. It doesn't look > anything like the rest of the code in add_iommu_group(), so it just > feels like a wart. It does look like a wart, however it did not cause immediate rejection after first couple of revisions of this before I took it over... So. Here is the problem - deliver an MSIX isolation flag from IOMMU to VFIO-PCI driver. The options are: 1. PCI_BUS_FLAGS_MSI_REMAP on a PCI bus - MSIX isolation is not really a PCI bus property and it is "like a wart". 2. Introduce "flags" in iommu_group and define IOMMU_GROUP_MSIX_ISOLATED and set it when an IOMMU group is created; VFIO-PCI has ways to get to the group and read the flag. 3. Create IOMMU_DOMAIN_UNMANAGED IOMMU domains for PPC64/powernv IOMMU groups and only define capable() hook to report IOMMU_CAP_INTR_REMAP; others already use these IOMMU domains. VFIO-PCI's mmap() hook could then check the capability via iommu_capable(bus). The problems is as Robin said: "iommu_capable() is a fundamentally broken and unworkable interface anyway"; however it is still not clear to me why it is unworkable in this particular case of isolation checking. I am missing knowledge about ARM, is there a good overview of these MSIX domains vs. IOMMU on ARM (as MSIX remapping seems not to be an IOMMU property)? Which one to pick? Thanks. > >> if (!ops->add_device) >> return 0; >> >> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c >> index f2393b7d7ebf..14aac9df3d17 100644 >> --- a/drivers/pci/probe.c >> +++ b/drivers/pci/probe.c >> @@ -17,6 +17,7 @@ >> #include <linux/acpi.h> >> #include <linux/irqdomain.h> >> #include <linux/pm_runtime.h> >> +#include <linux/iommu.h> > > This obviously belongs in another patch, as the compile error showed. > >> #include "pci.h" >> >> #define CARDBUS_LATENCY_TIMER 176 /* secondary latency timer */ >> -- >> 2.11.0 >>
On 19/07/17 20:02, Alexey Kardashevskiy wrote: > On 11/07/17 05:23, Bjorn Helgaas wrote: >> [+cc Joerg, iommu] >> >> On Fri, Jun 30, 2017 at 12:24 AM, Alexey Kardashevskiy <aik@ozlabs.ru> wrote: >>> From: Yongji Xie <elohimes@gmail.com> >>> >>> Some iommu drivers would be initialized after PCI device >>> enumeration. So PCI_BUS_FLAGS_MSI_REMAP would not be set >>> when probing PCI devices although IOMMU enables capability >>> of IRQ remapping. This patch tests this capability and >>> set the flag when iommu driver is initialized. >>> >>> Signed-off-by: Yongji Xie <xyjxie@linux.vnet.ibm.com> >>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> >>> --- >>> drivers/iommu/iommu.c | 8 ++++++++ >>> drivers/pci/probe.c | 1 + >>> 2 files changed, 9 insertions(+) >>> >>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c >>> index cf7ca7e70777..0b5881ddca09 100644 >>> --- a/drivers/iommu/iommu.c >>> +++ b/drivers/iommu/iommu.c >>> @@ -1063,6 +1063,14 @@ static int add_iommu_group(struct device *dev, void *data) >>> const struct iommu_ops *ops = cb->ops; >>> int ret; >>> >>> + /* >>> + * Set PCI_BUS_FLAGS_MSI_REMAP for all PCI buses when IOMMU >>> + * have capability of IRQ remapping. >>> + */ >>> + if (dev_is_pci(dev) && ops->capable && >>> + ops->capable(IOMMU_CAP_INTR_REMAP)) >>> + to_pci_dev(dev)->bus->bus_flags |= PCI_BUS_FLAGS_MSI_REMAP; >> >> This isn't my area, but this addition is really ugly. It doesn't look >> anything like the rest of the code in add_iommu_group(), so it just >> feels like a wart. > > > It does look like a wart, however it did not cause immediate rejection > after first couple of revisions of this before I took it over... > > So. Here is the problem - deliver an MSIX isolation flag from IOMMU to > VFIO-PCI driver. The options are: > > 1. PCI_BUS_FLAGS_MSI_REMAP on a PCI bus - MSIX isolation is not really a > PCI bus property and it is "like a wart". > > 2. Introduce "flags" in iommu_group and define IOMMU_GROUP_MSIX_ISOLATED > and set it when an IOMMU group is created; VFIO-PCI has ways to get to the > group and read the flag. > > 3. Create IOMMU_DOMAIN_UNMANAGED IOMMU domains for PPC64/powernv IOMMU > groups and only define capable() hook to report IOMMU_CAP_INTR_REMAP; > others already use these IOMMU domains. VFIO-PCI's mmap() hook could then > check the capability via iommu_capable(bus). The problems is as Robin said: > "iommu_capable() is a fundamentally broken and unworkable interface > anyway"; however it is still not clear to me why it is unworkable in this > particular case of isolation checking. > > I am missing knowledge about ARM, is there a good overview of these MSIX > domains vs. IOMMU on ARM (as MSIX remapping seems not to be an IOMMU property)? > > > Which one to pick? Thanks. Anyone, please? > > > >> >>> if (!ops->add_device) >>> return 0; >>> >>> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c >>> index f2393b7d7ebf..14aac9df3d17 100644 >>> --- a/drivers/pci/probe.c >>> +++ b/drivers/pci/probe.c >>> @@ -17,6 +17,7 @@ >>> #include <linux/acpi.h> >>> #include <linux/irqdomain.h> >>> #include <linux/pm_runtime.h> >>> +#include <linux/iommu.h> >> >> This obviously belongs in another patch, as the compile error showed. >> >>> #include "pci.h" >>> >>> #define CARDBUS_LATENCY_TIMER 176 /* secondary latency timer */ >>> -- >>> 2.11.0 >>> > >
On Wed, Jul 19, 2017 at 08:02:04PM +1000, Alexey Kardashevskiy wrote: > On 11/07/17 05:23, Bjorn Helgaas wrote: > >> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c > >> index cf7ca7e70777..0b5881ddca09 100644 > >> --- a/drivers/iommu/iommu.c > >> +++ b/drivers/iommu/iommu.c > >> @@ -1063,6 +1063,14 @@ static int add_iommu_group(struct device *dev, void *data) > >> const struct iommu_ops *ops = cb->ops; > >> int ret; > >> > >> + /* > >> + * Set PCI_BUS_FLAGS_MSI_REMAP for all PCI buses when IOMMU > >> + * have capability of IRQ remapping. > >> + */ > >> + if (dev_is_pci(dev) && ops->capable && > >> + ops->capable(IOMMU_CAP_INTR_REMAP)) > >> + to_pci_dev(dev)->bus->bus_flags |= PCI_BUS_FLAGS_MSI_REMAP; We avoid bus-specific hacks in generic iommu code. This has to be done in bus-specific iommu-group setup code. > 1. PCI_BUS_FLAGS_MSI_REMAP on a PCI bus - MSIX isolation is not really a > PCI bus property and it is "like a wart". This one is at least debatable. It could be a property of the bus. > 2. Introduce "flags" in iommu_group and define IOMMU_GROUP_MSIX_ISOLATED > and set it when an IOMMU group is created; VFIO-PCI has ways to get to the > group and read the flag. That's the best option I see here. > 3. Create IOMMU_DOMAIN_UNMANAGED IOMMU domains for PPC64/powernv IOMMU > groups and only define capable() hook to report IOMMU_CAP_INTR_REMAP; > others already use these IOMMU domains. VFIO-PCI's mmap() hook could then > check the capability via iommu_capable(bus). The problems is as Robin said: > "iommu_capable() is a fundamentally broken and unworkable interface > anyway"; however it is still not clear to me why it is unworkable in this > particular case of isolation checking. That one is wrong, IRQ remapping is not a property of a domain. A domain is an abstraction for a device address space. Attaching IRQ information there is just wrong. Joerg
On Wed, 2017-07-26 at 11:50 +0200, Joerg Roedel wrote: > > 3. Create IOMMU_DOMAIN_UNMANAGED IOMMU domains for PPC64/powernv IOMMU > > groups and only define capable() hook to report IOMMU_CAP_INTR_REMAP; > > others already use these IOMMU domains. VFIO-PCI's mmap() hook could then > > check the capability via iommu_capable(bus). The problems is as Robin said: > > "iommu_capable() is a fundamentally broken and unworkable interface > > anyway"; however it is still not clear to me why it is unworkable in this > > particular case of isolation checking. > > That one is wrong, IRQ remapping is not a property of a domain. A domain > is an abstraction for a device address space. Attaching IRQ information > there is just wrong. Except it somewhat is ... an MSI is a store in the device address space, the way MSIs are handled and/or filtered can be considered a property of the domain. In our case, it's the exact same piece of HW that defines domains and which MSIs they are allowed to generate. Cheers, Ben.
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index cf7ca7e70777..0b5881ddca09 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -1063,6 +1063,14 @@ static int add_iommu_group(struct device *dev, void *data) const struct iommu_ops *ops = cb->ops; int ret; + /* + * Set PCI_BUS_FLAGS_MSI_REMAP for all PCI buses when IOMMU + * have capability of IRQ remapping. + */ + if (dev_is_pci(dev) && ops->capable && + ops->capable(IOMMU_CAP_INTR_REMAP)) + to_pci_dev(dev)->bus->bus_flags |= PCI_BUS_FLAGS_MSI_REMAP; + if (!ops->add_device) return 0; diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c index f2393b7d7ebf..14aac9df3d17 100644 --- a/drivers/pci/probe.c +++ b/drivers/pci/probe.c @@ -17,6 +17,7 @@ #include <linux/acpi.h> #include <linux/irqdomain.h> #include <linux/pm_runtime.h> +#include <linux/iommu.h> #include "pci.h" #define CARDBUS_LATENCY_TIMER 176 /* secondary latency timer */