diff mbox

[kernel,v4,4/6] iommu: Set PCI_BUS_FLAGS_MSI_REMAP on iommu driver initialization

Message ID 20170630052436.15212-5-aik@ozlabs.ru (mailing list archive)
State New, archived
Headers show

Commit Message

Alexey Kardashevskiy June 30, 2017, 5:24 a.m. UTC
From: Yongji Xie <elohimes@gmail.com>

Some iommu drivers would be initialized after PCI device
enumeration. So PCI_BUS_FLAGS_MSI_REMAP would not be set
when probing PCI devices although IOMMU enables capability
of IRQ remapping. This patch tests this capability and
set the flag when iommu driver is initialized.

Signed-off-by: Yongji Xie <xyjxie@linux.vnet.ibm.com>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 drivers/iommu/iommu.c | 8 ++++++++
 drivers/pci/probe.c   | 1 +
 2 files changed, 9 insertions(+)

Comments

Bjorn Helgaas July 10, 2017, 7:23 p.m. UTC | #1
[+cc Joerg, iommu]

On Fri, Jun 30, 2017 at 12:24 AM, Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
> From: Yongji Xie <elohimes@gmail.com>
>
> Some iommu drivers would be initialized after PCI device
> enumeration. So PCI_BUS_FLAGS_MSI_REMAP would not be set
> when probing PCI devices although IOMMU enables capability
> of IRQ remapping. This patch tests this capability and
> set the flag when iommu driver is initialized.
>
> Signed-off-by: Yongji Xie <xyjxie@linux.vnet.ibm.com>
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
>  drivers/iommu/iommu.c | 8 ++++++++
>  drivers/pci/probe.c   | 1 +
>  2 files changed, 9 insertions(+)
>
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index cf7ca7e70777..0b5881ddca09 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -1063,6 +1063,14 @@ static int add_iommu_group(struct device *dev, void *data)
>         const struct iommu_ops *ops = cb->ops;
>         int ret;
>
> +       /*
> +        * Set PCI_BUS_FLAGS_MSI_REMAP for all PCI buses when IOMMU
> +        * have capability of IRQ remapping.
> +        */
> +       if (dev_is_pci(dev) && ops->capable &&
> +                       ops->capable(IOMMU_CAP_INTR_REMAP))
> +               to_pci_dev(dev)->bus->bus_flags |= PCI_BUS_FLAGS_MSI_REMAP;

This isn't my area, but this addition is really ugly.  It doesn't look
anything like the rest of the code in add_iommu_group(), so it just
feels like a wart.

>         if (!ops->add_device)
>                 return 0;
>
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index f2393b7d7ebf..14aac9df3d17 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -17,6 +17,7 @@
>  #include <linux/acpi.h>
>  #include <linux/irqdomain.h>
>  #include <linux/pm_runtime.h>
> +#include <linux/iommu.h>

This obviously belongs in another patch, as the compile error showed.

>  #include "pci.h"
>
>  #define CARDBUS_LATENCY_TIMER  176     /* secondary latency timer */
> --
> 2.11.0
>
Robin Murphy July 11, 2017, 10:39 a.m. UTC | #2
On 10/07/17 20:23, Bjorn Helgaas via iommu wrote:
> [+cc Joerg, iommu]
> 
> On Fri, Jun 30, 2017 at 12:24 AM, Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
>> From: Yongji Xie <elohimes@gmail.com>
>>
>> Some iommu drivers would be initialized after PCI device
>> enumeration. So PCI_BUS_FLAGS_MSI_REMAP would not be set
>> when probing PCI devices although IOMMU enables capability
>> of IRQ remapping. This patch tests this capability and
>> set the flag when iommu driver is initialized.
>>
>> Signed-off-by: Yongji Xie <xyjxie@linux.vnet.ibm.com>
>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>> ---
>>  drivers/iommu/iommu.c | 8 ++++++++
>>  drivers/pci/probe.c   | 1 +
>>  2 files changed, 9 insertions(+)
>>
>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
>> index cf7ca7e70777..0b5881ddca09 100644
>> --- a/drivers/iommu/iommu.c
>> +++ b/drivers/iommu/iommu.c
>> @@ -1063,6 +1063,14 @@ static int add_iommu_group(struct device *dev, void *data)
>>         const struct iommu_ops *ops = cb->ops;
>>         int ret;
>>
>> +       /*
>> +        * Set PCI_BUS_FLAGS_MSI_REMAP for all PCI buses when IOMMU
>> +        * have capability of IRQ remapping.
>> +        */
>> +       if (dev_is_pci(dev) && ops->capable &&
>> +                       ops->capable(IOMMU_CAP_INTR_REMAP))
>> +               to_pci_dev(dev)->bus->bus_flags |= PCI_BUS_FLAGS_MSI_REMAP;
> 
> This isn't my area, but this addition is really ugly.  It doesn't look
> anything like the rest of the code in add_iommu_group(), so it just
> feels like a wart.

I have no idea what the context is here, but this flag looks wrong
generally. IRQ remapping is a property of the irqchip and has nothing to
do with PCI, so pretending it's a property of PCI buses looks like a
massive hack around... something.

Also, iommu_capable() is a fundamentally broken and unworkable interface
anyway. The existing IRQ remapping code really wants updating to
advertise IRQ_DOMAIN_FLAG_MSI_REMAP on the relevant MSI domains so that
IOMMU_CAP_INTR_REMAP can go away for good. That way, all consumers who
actually care about whether IRQ remapping is implemented (see e.g. VFIO)
can use irq_domain_check_msi_remap() or check specific devices in a sane
and scalable manner.

Robin.

>>         if (!ops->add_device)
>>                 return 0;
>>
>> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
>> index f2393b7d7ebf..14aac9df3d17 100644
>> --- a/drivers/pci/probe.c
>> +++ b/drivers/pci/probe.c
>> @@ -17,6 +17,7 @@
>>  #include <linux/acpi.h>
>>  #include <linux/irqdomain.h>
>>  #include <linux/pm_runtime.h>
>> +#include <linux/iommu.h>
> 
> This obviously belongs in another patch, as the compile error showed.
> 
>>  #include "pci.h"
>>
>>  #define CARDBUS_LATENCY_TIMER  176     /* secondary latency timer */
>> --
>> 2.11.0
>>
> _______________________________________________
> iommu mailing list
> iommu@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/iommu
>
Alexey Kardashevskiy July 12, 2017, 2:47 a.m. UTC | #3
On 11/07/17 20:39, Robin Murphy wrote:
> On 10/07/17 20:23, Bjorn Helgaas via iommu wrote:
>> [+cc Joerg, iommu]
>>
>> On Fri, Jun 30, 2017 at 12:24 AM, Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
>>> From: Yongji Xie <elohimes@gmail.com>
>>>
>>> Some iommu drivers would be initialized after PCI device
>>> enumeration. So PCI_BUS_FLAGS_MSI_REMAP would not be set
>>> when probing PCI devices although IOMMU enables capability
>>> of IRQ remapping. This patch tests this capability and
>>> set the flag when iommu driver is initialized.
>>>
>>> Signed-off-by: Yongji Xie <xyjxie@linux.vnet.ibm.com>
>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>> ---
>>>  drivers/iommu/iommu.c | 8 ++++++++
>>>  drivers/pci/probe.c   | 1 +
>>>  2 files changed, 9 insertions(+)
>>>
>>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
>>> index cf7ca7e70777..0b5881ddca09 100644
>>> --- a/drivers/iommu/iommu.c
>>> +++ b/drivers/iommu/iommu.c
>>> @@ -1063,6 +1063,14 @@ static int add_iommu_group(struct device *dev, void *data)
>>>         const struct iommu_ops *ops = cb->ops;
>>>         int ret;
>>>
>>> +       /*
>>> +        * Set PCI_BUS_FLAGS_MSI_REMAP for all PCI buses when IOMMU
>>> +        * have capability of IRQ remapping.
>>> +        */
>>> +       if (dev_is_pci(dev) && ops->capable &&
>>> +                       ops->capable(IOMMU_CAP_INTR_REMAP))
>>> +               to_pci_dev(dev)->bus->bus_flags |= PCI_BUS_FLAGS_MSI_REMAP;
>>
>> This isn't my area, but this addition is really ugly.  It doesn't look
>> anything like the rest of the code in add_iommu_group(), so it just
>> feels like a wart.
> 
> I have no idea what the context is here, but this flag looks wrong
> generally. IRQ remapping is a property of the irqchip and has nothing to
> do with PCI, so pretending it's a property of PCI buses looks like a
> massive hack around... something.

The context here - should we allow mapping of MSIX BAR to a guest or not.
Now it is disabled as the guest can write there anything and make device
send MSIX messages to random addresses. However if a PCI bridge can somehow
filter these writes, then it is safe.

So it is not really remapping what we care about in this patchset, it is
filtering/censoring.

And the reason to allow MSIX mapping is that it may be adjacent to
frequently used MMIO and also reside within same 64K page which happens to
be the default page size on PPC64.

So even though remapping/filtering is not a property of any PCI bus but it
is a part on an IOMMU which is usually (always?) combined with the PCI host
bridge adapter so the PCI bus flag makes sense. How else would we pass that
flag from IOMMU to the actual device we want to mmap to the userspace?


> Also, iommu_capable() is a fundamentally broken and unworkable interface
> anyway. The existing IRQ remapping code really wants updating to
> advertise IRQ_DOMAIN_FLAG_MSI_REMAP on the relevant MSI domains so that
> IOMMU_CAP_INTR_REMAP can go away for good.

This IRQ_DOMAIN_FLAG_MSI_REMAP is used only on ARM, is there a plan/desire
to use it in other places instead of capable(IOMMU_CAP_INTR_REMAP)?

> That way, all consumers who
> actually care about whether IRQ remapping is implemented (see e.g. VFIO)
> can use irq_domain_check_msi_remap() or check specific devices in a sane
> and scalable manner.

What specific devices are you talking about here? Actual PCI controllers do
not have control over MSIX remapping/filtering...




> 
> Robin.
> 
>>>         if (!ops->add_device)
>>>                 return 0;
>>>
>>> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
>>> index f2393b7d7ebf..14aac9df3d17 100644
>>> --- a/drivers/pci/probe.c
>>> +++ b/drivers/pci/probe.c
>>> @@ -17,6 +17,7 @@
>>>  #include <linux/acpi.h>
>>>  #include <linux/irqdomain.h>
>>>  #include <linux/pm_runtime.h>
>>> +#include <linux/iommu.h>
>>
>> This obviously belongs in another patch, as the compile error showed.
>>
>>>  #include "pci.h"
>>>
>>>  #define CARDBUS_LATENCY_TIMER  176     /* secondary latency timer */
>>> --
>>> 2.11.0
>>>
>> _______________________________________________
>> iommu mailing list
>> iommu@lists.linux-foundation.org
>> https://lists.linuxfoundation.org/mailman/listinfo/iommu
>>
>
Jike Song July 12, 2017, 7:04 a.m. UTC | #4
On 06/30/2017 01:24 PM, Alexey Kardashevskiy wrote:
> From: Yongji Xie <elohimes@gmail.com>
> 
> Some iommu drivers would be initialized after PCI device
> enumeration. So PCI_BUS_FLAGS_MSI_REMAP would not be set
> when probing PCI devices although IOMMU enables capability
> of IRQ remapping. This patch tests this capability and
> set the flag when iommu driver is initialized.
> 
> Signed-off-by: Yongji Xie <xyjxie@linux.vnet.ibm.com>
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
>  drivers/iommu/iommu.c | 8 ++++++++
>  drivers/pci/probe.c   | 1 +
>  2 files changed, 9 insertions(+)
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index cf7ca7e70777..0b5881ddca09 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -1063,6 +1063,14 @@ static int add_iommu_group(struct device *dev, void *data)
>  	const struct iommu_ops *ops = cb->ops;
>  	int ret;
>  
> +	/*
> +	 * Set PCI_BUS_FLAGS_MSI_REMAP for all PCI buses when IOMMU
> +	 * have capability of IRQ remapping.
> +	 */
> +	if (dev_is_pci(dev) && ops->capable &&
> +			ops->capable(IOMMU_CAP_INTR_REMAP))
> +		to_pci_dev(dev)->bus->bus_flags |= PCI_BUS_FLAGS_MSI_REMAP;

[+Kevin]


Hi Alexey,

Just a reminder, you might want to check the proposed patch to intel-iommu
by Alex, at:

	https://patchwork.kernel.org/patch/9092511/

Without that being the prerequisite, this patch will probably introduce
security issues on x86.

--
Thanks,
Jike

> +
>  	if (!ops->add_device)
>  		return 0;
>  
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index f2393b7d7ebf..14aac9df3d17 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -17,6 +17,7 @@
>  #include <linux/acpi.h>
>  #include <linux/irqdomain.h>
>  #include <linux/pm_runtime.h>
> +#include <linux/iommu.h>
>  #include "pci.h"
>  
>  #define CARDBUS_LATENCY_TIMER	176	/* secondary latency timer */
>
Alexey Kardashevskiy July 12, 2017, 7:28 a.m. UTC | #5
On 12/07/17 17:04, Jike Song wrote:
> On 06/30/2017 01:24 PM, Alexey Kardashevskiy wrote:
>> From: Yongji Xie <elohimes@gmail.com>
>>
>> Some iommu drivers would be initialized after PCI device
>> enumeration. So PCI_BUS_FLAGS_MSI_REMAP would not be set
>> when probing PCI devices although IOMMU enables capability
>> of IRQ remapping. This patch tests this capability and
>> set the flag when iommu driver is initialized.
>>
>> Signed-off-by: Yongji Xie <xyjxie@linux.vnet.ibm.com>
>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>> ---
>>  drivers/iommu/iommu.c | 8 ++++++++
>>  drivers/pci/probe.c   | 1 +
>>  2 files changed, 9 insertions(+)
>>
>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
>> index cf7ca7e70777..0b5881ddca09 100644
>> --- a/drivers/iommu/iommu.c
>> +++ b/drivers/iommu/iommu.c
>> @@ -1063,6 +1063,14 @@ static int add_iommu_group(struct device *dev, void *data)
>>  	const struct iommu_ops *ops = cb->ops;
>>  	int ret;
>>  
>> +	/*
>> +	 * Set PCI_BUS_FLAGS_MSI_REMAP for all PCI buses when IOMMU
>> +	 * have capability of IRQ remapping.
>> +	 */
>> +	if (dev_is_pci(dev) && ops->capable &&
>> +			ops->capable(IOMMU_CAP_INTR_REMAP))
>> +		to_pci_dev(dev)->bus->bus_flags |= PCI_BUS_FLAGS_MSI_REMAP;
> 
> [+Kevin]
> 
> 
> Hi Alexey,
> 
> Just a reminder, you might want to check the proposed patch to intel-iommu
> by Alex, at:
> 
> 	https://patchwork.kernel.org/patch/9092511/
> 
> Without that being the prerequisite, this patch will probably introduce
> security issues on x86.



Thanks, noted. Anything like this for AMD/s390?




> 
> --
> Thanks,
> Jike
> 
>> +
>>  	if (!ops->add_device)
>>  		return 0;
>>  
>> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
>> index f2393b7d7ebf..14aac9df3d17 100644
>> --- a/drivers/pci/probe.c
>> +++ b/drivers/pci/probe.c
>> @@ -17,6 +17,7 @@
>>  #include <linux/acpi.h>
>>  #include <linux/irqdomain.h>
>>  #include <linux/pm_runtime.h>
>> +#include <linux/iommu.h>
>>  #include "pci.h"
>>  
>>  #define CARDBUS_LATENCY_TIMER	176	/* secondary latency timer */
>>
Benjamin Herrenschmidt July 19, 2017, 5:12 a.m. UTC | #6
On Tue, 2017-07-11 at 11:39 +0100, Robin Murphy wrote:
> I have no idea what the context is here, but this flag looks wrong
> generally. IRQ remapping is a property of the irqchip and has nothing to
> do with PCI, so pretending it's a property of PCI buses looks like a
> massive hack around... something.
> 
> Also, iommu_capable() is a fundamentally broken and unworkable interface
> anyway. The existing IRQ remapping code really wants updating to
> advertise IRQ_DOMAIN_FLAG_MSI_REMAP on the relevant MSI domains so that
> IOMMU_CAP_INTR_REMAP can go away for good. That way, all consumers who
> actually care about whether IRQ remapping is implemented (see e.g. VFIO)
> can use irq_domain_check_msi_remap() or check specific devices in a sane
> and scalable manner.

As you rightfully said, you have no idea what the context is :-)

This is not an interrupt domain.

On powerpc we have a single global unified domains that contains all
our MSIs, IPIs, internally interrupts and what not, because of the way
our interrupts infrastructure works.

So there is no such thing as "a property of the MSI domain".

The way the HW works is that the PCI Host Bridge has the ability
to filter which device can access which MSIs. This is done by the IOMMU
portion of the bridge.

Thus it's a filtering capability, not a remapping capability per-se,
and it's a property of the IOMMU domain.

Sicne this is also a paravitualized interface, the "remapping" part
is handled by the HV calls done by the guest to configure the MSIs.

The HW filtering ensures that an evil guest cannot do bad things if
it goes manually whack the MSI table.

Now this issue have been discussed and patches floated around for
*YEARS* now and there is still no upstream solution for what is a
completely trivial problem: Don't bloody bloock MSI BAR table access on
pseries platforms. It kills performance on a number of device due to
our 64K pages.

A 1-liner fix in qemu would have done it YEARS ago. But instead we have
now painted about 1000 bike sheds and going without anything that
actually works. Yay.

Ben.
Alexey Kardashevskiy July 19, 2017, 10:02 a.m. UTC | #7
On 11/07/17 05:23, Bjorn Helgaas wrote:
> [+cc Joerg, iommu]
> 
> On Fri, Jun 30, 2017 at 12:24 AM, Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
>> From: Yongji Xie <elohimes@gmail.com>
>>
>> Some iommu drivers would be initialized after PCI device
>> enumeration. So PCI_BUS_FLAGS_MSI_REMAP would not be set
>> when probing PCI devices although IOMMU enables capability
>> of IRQ remapping. This patch tests this capability and
>> set the flag when iommu driver is initialized.
>>
>> Signed-off-by: Yongji Xie <xyjxie@linux.vnet.ibm.com>
>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>> ---
>>  drivers/iommu/iommu.c | 8 ++++++++
>>  drivers/pci/probe.c   | 1 +
>>  2 files changed, 9 insertions(+)
>>
>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
>> index cf7ca7e70777..0b5881ddca09 100644
>> --- a/drivers/iommu/iommu.c
>> +++ b/drivers/iommu/iommu.c
>> @@ -1063,6 +1063,14 @@ static int add_iommu_group(struct device *dev, void *data)
>>         const struct iommu_ops *ops = cb->ops;
>>         int ret;
>>
>> +       /*
>> +        * Set PCI_BUS_FLAGS_MSI_REMAP for all PCI buses when IOMMU
>> +        * have capability of IRQ remapping.
>> +        */
>> +       if (dev_is_pci(dev) && ops->capable &&
>> +                       ops->capable(IOMMU_CAP_INTR_REMAP))
>> +               to_pci_dev(dev)->bus->bus_flags |= PCI_BUS_FLAGS_MSI_REMAP;
> 
> This isn't my area, but this addition is really ugly.  It doesn't look
> anything like the rest of the code in add_iommu_group(), so it just
> feels like a wart.


It does look like a wart, however it did not cause immediate rejection
after first couple of revisions of this before I took it over...

So. Here is the problem - deliver an MSIX isolation flag from IOMMU to
VFIO-PCI driver. The options are:

1. PCI_BUS_FLAGS_MSI_REMAP on a PCI bus - MSIX isolation is not really a
PCI bus property and it is "like a wart".

2. Introduce "flags" in iommu_group and define IOMMU_GROUP_MSIX_ISOLATED
and set it when an IOMMU group is created; VFIO-PCI has ways to get to the
group and read the flag.

3. Create IOMMU_DOMAIN_UNMANAGED IOMMU domains for PPC64/powernv IOMMU
groups and only define capable() hook to report IOMMU_CAP_INTR_REMAP;
others already use these IOMMU domains. VFIO-PCI's mmap() hook could then
check the capability via iommu_capable(bus). The problems is as Robin said:
"iommu_capable() is a fundamentally broken and unworkable interface
anyway"; however it is still not clear to me why it is unworkable in this
particular case of isolation checking.

I am missing knowledge about ARM, is there a good overview of these MSIX
domains vs. IOMMU on ARM (as MSIX remapping seems not to be an IOMMU property)?


Which one to pick? Thanks.



> 
>>         if (!ops->add_device)
>>                 return 0;
>>
>> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
>> index f2393b7d7ebf..14aac9df3d17 100644
>> --- a/drivers/pci/probe.c
>> +++ b/drivers/pci/probe.c
>> @@ -17,6 +17,7 @@
>>  #include <linux/acpi.h>
>>  #include <linux/irqdomain.h>
>>  #include <linux/pm_runtime.h>
>> +#include <linux/iommu.h>
> 
> This obviously belongs in another patch, as the compile error showed.
> 
>>  #include "pci.h"
>>
>>  #define CARDBUS_LATENCY_TIMER  176     /* secondary latency timer */
>> --
>> 2.11.0
>>
Alexey Kardashevskiy July 25, 2017, 6:09 a.m. UTC | #8
On 19/07/17 20:02, Alexey Kardashevskiy wrote:
> On 11/07/17 05:23, Bjorn Helgaas wrote:
>> [+cc Joerg, iommu]
>>
>> On Fri, Jun 30, 2017 at 12:24 AM, Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
>>> From: Yongji Xie <elohimes@gmail.com>
>>>
>>> Some iommu drivers would be initialized after PCI device
>>> enumeration. So PCI_BUS_FLAGS_MSI_REMAP would not be set
>>> when probing PCI devices although IOMMU enables capability
>>> of IRQ remapping. This patch tests this capability and
>>> set the flag when iommu driver is initialized.
>>>
>>> Signed-off-by: Yongji Xie <xyjxie@linux.vnet.ibm.com>
>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>> ---
>>>  drivers/iommu/iommu.c | 8 ++++++++
>>>  drivers/pci/probe.c   | 1 +
>>>  2 files changed, 9 insertions(+)
>>>
>>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
>>> index cf7ca7e70777..0b5881ddca09 100644
>>> --- a/drivers/iommu/iommu.c
>>> +++ b/drivers/iommu/iommu.c
>>> @@ -1063,6 +1063,14 @@ static int add_iommu_group(struct device *dev, void *data)
>>>         const struct iommu_ops *ops = cb->ops;
>>>         int ret;
>>>
>>> +       /*
>>> +        * Set PCI_BUS_FLAGS_MSI_REMAP for all PCI buses when IOMMU
>>> +        * have capability of IRQ remapping.
>>> +        */
>>> +       if (dev_is_pci(dev) && ops->capable &&
>>> +                       ops->capable(IOMMU_CAP_INTR_REMAP))
>>> +               to_pci_dev(dev)->bus->bus_flags |= PCI_BUS_FLAGS_MSI_REMAP;
>>
>> This isn't my area, but this addition is really ugly.  It doesn't look
>> anything like the rest of the code in add_iommu_group(), so it just
>> feels like a wart.
> 
> 
> It does look like a wart, however it did not cause immediate rejection
> after first couple of revisions of this before I took it over...
> 
> So. Here is the problem - deliver an MSIX isolation flag from IOMMU to
> VFIO-PCI driver. The options are:
> 
> 1. PCI_BUS_FLAGS_MSI_REMAP on a PCI bus - MSIX isolation is not really a
> PCI bus property and it is "like a wart".
> 
> 2. Introduce "flags" in iommu_group and define IOMMU_GROUP_MSIX_ISOLATED
> and set it when an IOMMU group is created; VFIO-PCI has ways to get to the
> group and read the flag.
> 
> 3. Create IOMMU_DOMAIN_UNMANAGED IOMMU domains for PPC64/powernv IOMMU
> groups and only define capable() hook to report IOMMU_CAP_INTR_REMAP;
> others already use these IOMMU domains. VFIO-PCI's mmap() hook could then
> check the capability via iommu_capable(bus). The problems is as Robin said:
> "iommu_capable() is a fundamentally broken and unworkable interface
> anyway"; however it is still not clear to me why it is unworkable in this
> particular case of isolation checking.
> 
> I am missing knowledge about ARM, is there a good overview of these MSIX
> domains vs. IOMMU on ARM (as MSIX remapping seems not to be an IOMMU property)?
> 
> 
> Which one to pick? Thanks.



Anyone, please?



> 
> 
> 
>>
>>>         if (!ops->add_device)
>>>                 return 0;
>>>
>>> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
>>> index f2393b7d7ebf..14aac9df3d17 100644
>>> --- a/drivers/pci/probe.c
>>> +++ b/drivers/pci/probe.c
>>> @@ -17,6 +17,7 @@
>>>  #include <linux/acpi.h>
>>>  #include <linux/irqdomain.h>
>>>  #include <linux/pm_runtime.h>
>>> +#include <linux/iommu.h>
>>
>> This obviously belongs in another patch, as the compile error showed.
>>
>>>  #include "pci.h"
>>>
>>>  #define CARDBUS_LATENCY_TIMER  176     /* secondary latency timer */
>>> --
>>> 2.11.0
>>>
> 
>
Joerg Roedel July 26, 2017, 9:50 a.m. UTC | #9
On Wed, Jul 19, 2017 at 08:02:04PM +1000, Alexey Kardashevskiy wrote:
> On 11/07/17 05:23, Bjorn Helgaas wrote:
> >> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> >> index cf7ca7e70777..0b5881ddca09 100644
> >> --- a/drivers/iommu/iommu.c
> >> +++ b/drivers/iommu/iommu.c
> >> @@ -1063,6 +1063,14 @@ static int add_iommu_group(struct device *dev, void *data)
> >>         const struct iommu_ops *ops = cb->ops;
> >>         int ret;
> >>
> >> +       /*
> >> +        * Set PCI_BUS_FLAGS_MSI_REMAP for all PCI buses when IOMMU
> >> +        * have capability of IRQ remapping.
> >> +        */
> >> +       if (dev_is_pci(dev) && ops->capable &&
> >> +                       ops->capable(IOMMU_CAP_INTR_REMAP))
> >> +               to_pci_dev(dev)->bus->bus_flags |= PCI_BUS_FLAGS_MSI_REMAP;

We avoid bus-specific hacks in generic iommu code. This has to be done
in bus-specific iommu-group setup code.

> 1. PCI_BUS_FLAGS_MSI_REMAP on a PCI bus - MSIX isolation is not really a
> PCI bus property and it is "like a wart".

This one is at least debatable. It could be a property of the bus.

> 2. Introduce "flags" in iommu_group and define IOMMU_GROUP_MSIX_ISOLATED
> and set it when an IOMMU group is created; VFIO-PCI has ways to get to the
> group and read the flag.

That's the best option I see here.

> 3. Create IOMMU_DOMAIN_UNMANAGED IOMMU domains for PPC64/powernv IOMMU
> groups and only define capable() hook to report IOMMU_CAP_INTR_REMAP;
> others already use these IOMMU domains. VFIO-PCI's mmap() hook could then
> check the capability via iommu_capable(bus). The problems is as Robin said:
> "iommu_capable() is a fundamentally broken and unworkable interface
> anyway"; however it is still not clear to me why it is unworkable in this
> particular case of isolation checking.

That one is wrong, IRQ remapping is not a property of a domain. A domain
is an abstraction for a device address space. Attaching IRQ information
there is just wrong.



	Joerg
Benjamin Herrenschmidt July 26, 2017, 11:33 a.m. UTC | #10
On Wed, 2017-07-26 at 11:50 +0200, Joerg Roedel wrote:
> > 3. Create IOMMU_DOMAIN_UNMANAGED IOMMU domains for PPC64/powernv IOMMU
> > groups and only define capable() hook to report IOMMU_CAP_INTR_REMAP;
> > others already use these IOMMU domains. VFIO-PCI's mmap() hook could then
> > check the capability via iommu_capable(bus). The problems is as Robin said:
> > "iommu_capable() is a fundamentally broken and unworkable interface
> > anyway"; however it is still not clear to me why it is unworkable in this
> > particular case of isolation checking.
> 
> That one is wrong, IRQ remapping is not a property of a domain. A domain
> is an abstraction for a device address space. Attaching IRQ information
> there is just wrong.

Except it somewhat is ... an MSI is a store in the device address
space, the way MSIs are handled and/or filtered can be considered a
property of the domain. In our case, it's the exact same piece of HW
that defines domains and which MSIs they are allowed to generate.

Cheers,
Ben.
diff mbox

Patch

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index cf7ca7e70777..0b5881ddca09 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1063,6 +1063,14 @@  static int add_iommu_group(struct device *dev, void *data)
 	const struct iommu_ops *ops = cb->ops;
 	int ret;
 
+	/*
+	 * Set PCI_BUS_FLAGS_MSI_REMAP for all PCI buses when IOMMU
+	 * have capability of IRQ remapping.
+	 */
+	if (dev_is_pci(dev) && ops->capable &&
+			ops->capable(IOMMU_CAP_INTR_REMAP))
+		to_pci_dev(dev)->bus->bus_flags |= PCI_BUS_FLAGS_MSI_REMAP;
+
 	if (!ops->add_device)
 		return 0;
 
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index f2393b7d7ebf..14aac9df3d17 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -17,6 +17,7 @@ 
 #include <linux/acpi.h>
 #include <linux/irqdomain.h>
 #include <linux/pm_runtime.h>
+#include <linux/iommu.h>
 #include "pci.h"
 
 #define CARDBUS_LATENCY_TIMER	176	/* secondary latency timer */