[v3,3/4] iommu: introduce iommu_groups
diff mbox series

Message ID 20190716101657.23327-4-paul.durrant@citrix.com
State New
Headers show
Series
  • iommu groups + cleanup
Related show

Commit Message

Paul Durrant July 16, 2019, 10:16 a.m. UTC
Some devices may share a single PCIe initiator id, e.g. if they are actually
legacy PCI devices behind a bridge, and hence DMA from such devices will
be subject to the same address translation in the IOMMU. Hence these devices
should be treated as a unit for the purposes of assignment. There are also
other reasons why multiple devices should be treated as a unit, e.g. those
subject to a shared RMRR or those downstream of a bridge that does not
support ACS.

This patch introduces a new struct iommu_group to act as a container for
devices that should be treated as a unit, and builds a list of them as
PCI devices are scanned. The iommu_ops already implement a method,
get_device_group_id(), that is seemingly intended to return the initiator
id for a given SBDF so use this as the mechanism for group assignment in
the first instance. Assignment based on shared RMRR or lack of ACS will be
dealt with in subsequent patches, as will modifications to the device
assignment code.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: "Roger Pau Monné" <roger.pau@citrix.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: George Dunlap <George.Dunlap@eu.citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Julien Grall <julien.grall@arm.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Tim Deegan <tim@xen.org>
Cc: Wei Liu <wl@xen.org>

v3:
 - Fix signed vs. unsigned issue spotted by Roger.

v2:
 - Move code into new drivers/passthrough/groups.c
 - Drop the group index.
 - Handle failure to get group id.
 - Drop the group devs list.
---
 xen/drivers/passthrough/Makefile    |  1 +
 xen/drivers/passthrough/groups.c    | 91 +++++++++++++++++++++++++++++++++++++
 xen/drivers/passthrough/x86/iommu.c |  8 +++-
 xen/include/xen/iommu.h             |  7 +++
 xen/include/xen/pci.h               |  2 +
 5 files changed, 108 insertions(+), 1 deletion(-)
 create mode 100644 xen/drivers/passthrough/groups.c

Comments

Andrew Cooper July 16, 2019, 10:26 a.m. UTC | #1
On 16/07/2019 11:16, Paul Durrant wrote:
> +int iommu_group_assign(struct pci_dev *pdev, void *arg)
> +{
> +    const struct iommu_ops *ops = iommu_get_ops();
> +    int id;
> +    struct iommu_group *grp;
> +
> +    if ( !ops->get_device_group_id )
> +        return 0;
> +
> +    id = ops->get_device_group_id(pdev->seg, pdev->bus, pdev->devfn);
> +    if ( id < 0 )
> +        return -ENODATA;
> +
> +    grp = get_iommu_group(id);
> +    if ( !grp )
> +        return -ENOMEM;
> +
> +    if ( iommu_verbose )
> +        printk(XENLOG_INFO "Assign %04x:%02x:%02x.%u -> IOMMU group %x\n",

No unadorned hex numbers please.  This is a recipe for confusion during
debugging.

Either %#x, or %u, and needs to be fixed on commit if we go with that route.

~Andrew
Andrew Cooper July 16, 2019, 10:35 a.m. UTC | #2
On 16/07/2019 11:31, Paul Durrant wrote:
>> -----Original Message-----
>> From: Andrew Cooper <Andrew.Cooper3@citrix.com>
>> Sent: 16 July 2019 11:26
>> To: Paul Durrant <Paul.Durrant@citrix.com>; xen-devel@lists.xenproject.org
>> Cc: Jan Beulich <jbeulich@suse.com>; George Dunlap <George.Dunlap@citrix.com>; Ian Jackson
>> <Ian.Jackson@citrix.com>; Julien Grall <julien.grall@arm.com>; Konrad Rzeszutek Wilk
>> <konrad.wilk@oracle.com>; Stefano Stabellini <sstabellini@kernel.org>; Tim (Xen.org) <tim@xen.org>;
>> Wei Liu <wl@xen.org>
>> Subject: Re: [PATCH v3 3/4] iommu: introduce iommu_groups
>>
>> On 16/07/2019 11:16, Paul Durrant wrote:
>>> +int iommu_group_assign(struct pci_dev *pdev, void *arg)
>>> +{
>>> +    const struct iommu_ops *ops = iommu_get_ops();
>>> +    int id;
>>> +    struct iommu_group *grp;
>>> +
>>> +    if ( !ops->get_device_group_id )
>>> +        return 0;
>>> +
>>> +    id = ops->get_device_group_id(pdev->seg, pdev->bus, pdev->devfn);
>>> +    if ( id < 0 )
>>> +        return -ENODATA;
>>> +
>>> +    grp = get_iommu_group(id);
>>> +    if ( !grp )
>>> +        return -ENOMEM;
>>> +
>>> +    if ( iommu_verbose )
>>> +        printk(XENLOG_INFO "Assign %04x:%02x:%02x.%u -> IOMMU group %x\n",
>> No unadorned hex numbers please.  This is a recipe for confusion during
>> debugging.
>>
>> Either %#x, or %u, and needs to be fixed on commit if we go with that route.
> Personally I prefer the abstract iommu group index, more like Linux. I'd be happy to re-instate that and use %u. As for the %x here, I view it the same as the unadorned %x for seg, bus, and dev... but I'm not really fussy if you want to add a '#'.

I don't mind if it is rendered in hex or dec, but it must not be
ambiguous when printed.

Otherwise trying to debug issues to do with IOMMU group 12 is going to
be a lesson in pain and misery.

PCI coordinates are not ambiguous, even when they lack the 0x prefix. 
Plain numbers are.

~Andrew
Jan Beulich July 16, 2019, 10:37 a.m. UTC | #3
On 16.07.2019 12:26, Andrew Cooper wrote:
> On 16/07/2019 11:16, Paul Durrant wrote:
>> +int iommu_group_assign(struct pci_dev *pdev, void *arg)
>> +{
>> +    const struct iommu_ops *ops = iommu_get_ops();
>> +    int id;
>> +    struct iommu_group *grp;
>> +
>> +    if ( !ops->get_device_group_id )
>> +        return 0;
>> +
>> +    id = ops->get_device_group_id(pdev->seg, pdev->bus, pdev->devfn);
>> +    if ( id < 0 )
>> +        return -ENODATA;
>> +
>> +    grp = get_iommu_group(id);
>> +    if ( !grp )
>> +        return -ENOMEM;
>> +
>> +    if ( iommu_verbose )
>> +        printk(XENLOG_INFO "Assign %04x:%02x:%02x.%u -> IOMMU group %x\n",
> 
> No unadorned hex numbers please.  This is a recipe for confusion during
> debugging.
> 
> Either %#x, or %u, and needs to be fixed on commit if we go with that route.

I assume (hope) that you mean this for the right most number only; it's not
entirely unambiguous from your reply.

Jan
Andrew Cooper July 16, 2019, 10:41 a.m. UTC | #4
On 16/07/2019 11:37, Jan Beulich wrote:
> On 16.07.2019 12:26, Andrew Cooper wrote:
>> On 16/07/2019 11:16, Paul Durrant wrote:
>>> +int iommu_group_assign(struct pci_dev *pdev, void *arg)
>>> +{
>>> +    const struct iommu_ops *ops = iommu_get_ops();
>>> +    int id;
>>> +    struct iommu_group *grp;
>>> +
>>> +    if ( !ops->get_device_group_id )
>>> +        return 0;
>>> +
>>> +    id = ops->get_device_group_id(pdev->seg, pdev->bus, pdev->devfn);
>>> +    if ( id < 0 )
>>> +        return -ENODATA;
>>> +
>>> +    grp = get_iommu_group(id);
>>> +    if ( !grp )
>>> +        return -ENOMEM;
>>> +
>>> +    if ( iommu_verbose )
>>> +        printk(XENLOG_INFO "Assign %04x:%02x:%02x.%u -> IOMMU group %x\n",
>> No unadorned hex numbers please.  This is a recipe for confusion during
>> debugging.
>>
>> Either %#x, or %u, and needs to be fixed on commit if we go with that route.
> I assume (hope) that you mean this for the right most number only; it's not
> entirely unambiguous from your reply.

Oops yes - I only meant the IOMMU group formatter.

The PCI coordinates should get subsumed by %pp in fairly short order.

~Andrew
Jan Beulich July 16, 2019, 11:20 a.m. UTC | #5
On 16.07.2019 12:41, Andrew Cooper wrote:
> On 16/07/2019 11:37, Jan Beulich wrote:
>> On 16.07.2019 12:26, Andrew Cooper wrote:
>>> On 16/07/2019 11:16, Paul Durrant wrote:
>>>> +int iommu_group_assign(struct pci_dev *pdev, void *arg)
>>>> +{
>>>> +    const struct iommu_ops *ops = iommu_get_ops();
>>>> +    int id;
>>>> +    struct iommu_group *grp;
>>>> +
>>>> +    if ( !ops->get_device_group_id )
>>>> +        return 0;
>>>> +
>>>> +    id = ops->get_device_group_id(pdev->seg, pdev->bus, pdev->devfn);
>>>> +    if ( id < 0 )
>>>> +        return -ENODATA;
>>>> +
>>>> +    grp = get_iommu_group(id);
>>>> +    if ( !grp )
>>>> +        return -ENOMEM;
>>>> +
>>>> +    if ( iommu_verbose )
>>>> +        printk(XENLOG_INFO "Assign %04x:%02x:%02x.%u -> IOMMU group %x\n",
>>> No unadorned hex numbers please.  This is a recipe for confusion during
>>> debugging.
>>>
>>> Either %#x, or %u, and needs to be fixed on commit if we go with that route.
>> I assume (hope) that you mean this for the right most number only; it's not
>> entirely unambiguous from your reply.
> 
> Oops yes - I only meant the IOMMU group formatter.
> 
> The PCI coordinates should get subsumed by %pp in fairly short order.

%op I (still) hope ;-)

Jan
Jan Beulich July 24, 2019, 2:29 p.m. UTC | #6
On 16.07.2019 12:16, Paul Durrant wrote:
> --- a/xen/drivers/passthrough/Makefile
> +++ b/xen/drivers/passthrough/Makefile
> @@ -4,6 +4,7 @@ subdir-$(CONFIG_X86) += x86
>  subdir-$(CONFIG_ARM) += arm
>  
>  obj-y += iommu.o
> +obj-$(CONFIG_HAS_PCI) += groups.o

I assume this dependency on PCI is temporary, as there's nothing
inherently tying grouping of devices to PCI (afaict)?

> +int iommu_group_assign(struct pci_dev *pdev, void *arg)
> +{
> +    const struct iommu_ops *ops = iommu_get_ops();
> +    int id;
> +    struct iommu_group *grp;
> +
> +    if ( !ops->get_device_group_id )
> +        return 0;

With you making groups mandatory (i.e. even solitary devices getting
put in a group), shouldn't this be -EOPNOTSUPP, maybe accompanied by
ASSERT_UNREACHABLE()?

> +    id = ops->get_device_group_id(pdev->seg, pdev->bus, pdev->devfn);
> +    if ( id < 0 )
> +        return -ENODATA;
> +
> +    grp = get_iommu_group(id);
> +    if ( !grp )
> +        return -ENOMEM;
> +
> +    if ( iommu_verbose )
> +        printk(XENLOG_INFO "Assign %04x:%02x:%02x.%u -> IOMMU group %x\n",
> +               pdev->seg, pdev->bus, PCI_SLOT(pdev->devfn),
> +               PCI_FUNC(pdev->devfn), grp->id);

I'm not overly happy about this new logging: On modern systems a
debug level run is already rather verbose about PCI devices,
simply because there are so many. If my hope to not see individual
devices put in groups is not going to be fulfilled, can we at least
try to come to some agreement that certain devices which can't
sensibly be passed through won't be assigned groups (and hence
won't produce output here)? A group-less device then would
automatically be unable to have its owner changed.

Jan

Patch
diff mbox series

diff --git a/xen/drivers/passthrough/Makefile b/xen/drivers/passthrough/Makefile
index d50ab188c8..8a77110179 100644
--- a/xen/drivers/passthrough/Makefile
+++ b/xen/drivers/passthrough/Makefile
@@ -4,6 +4,7 @@  subdir-$(CONFIG_X86) += x86
 subdir-$(CONFIG_ARM) += arm
 
 obj-y += iommu.o
+obj-$(CONFIG_HAS_PCI) += groups.o
 obj-$(CONFIG_HAS_PCI) += pci.o
 obj-$(CONFIG_HAS_DEVICE_TREE) += device_tree.o
 
diff --git a/xen/drivers/passthrough/groups.c b/xen/drivers/passthrough/groups.c
new file mode 100644
index 0000000000..c6d00980b6
--- /dev/null
+++ b/xen/drivers/passthrough/groups.c
@@ -0,0 +1,91 @@ 
+/*
+ * Copyright (c) 2019 Citrix Systems Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include <xen/iommu.h>
+#include <xen/radix-tree.h>
+
+struct iommu_group {
+    unsigned int id;
+};
+
+static struct radix_tree_root iommu_groups;
+
+void __init iommu_groups_init(void)
+{
+    radix_tree_init(&iommu_groups);
+}
+
+static struct iommu_group *alloc_iommu_group(unsigned int id)
+{
+    struct iommu_group *grp = xzalloc(struct iommu_group);
+
+    if ( !grp )
+        return NULL;
+
+    grp->id = id;
+
+    if ( radix_tree_insert(&iommu_groups, id, grp) )
+    {
+        xfree(grp);
+        grp = NULL;
+    }
+
+    return grp;
+}
+
+static struct iommu_group *get_iommu_group(unsigned int id)
+{
+    struct iommu_group *grp = radix_tree_lookup(&iommu_groups, id);
+
+    if ( !grp )
+        grp = alloc_iommu_group(id);
+
+    return grp;
+}
+
+int iommu_group_assign(struct pci_dev *pdev, void *arg)
+{
+    const struct iommu_ops *ops = iommu_get_ops();
+    int id;
+    struct iommu_group *grp;
+
+    if ( !ops->get_device_group_id )
+        return 0;
+
+    id = ops->get_device_group_id(pdev->seg, pdev->bus, pdev->devfn);
+    if ( id < 0 )
+        return -ENODATA;
+
+    grp = get_iommu_group(id);
+    if ( !grp )
+        return -ENOMEM;
+
+    if ( iommu_verbose )
+        printk(XENLOG_INFO "Assign %04x:%02x:%02x.%u -> IOMMU group %x\n",
+               pdev->seg, pdev->bus, PCI_SLOT(pdev->devfn),
+               PCI_FUNC(pdev->devfn), grp->id);
+
+    pdev->grp = grp;
+
+    return 0;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/drivers/passthrough/x86/iommu.c b/xen/drivers/passthrough/x86/iommu.c
index a7438c9c25..90fc750456 100644
--- a/xen/drivers/passthrough/x86/iommu.c
+++ b/xen/drivers/passthrough/x86/iommu.c
@@ -43,7 +43,13 @@  int __init iommu_hardware_setup(void)
         /* x2apic setup may have previously initialised the struct. */
         ASSERT(iommu_ops.init == iommu_init_ops->ops->init);
 
-    return iommu_init_ops->setup();
+    rc = iommu_init_ops->setup();
+    if ( rc )
+        return rc;
+
+    iommu_groups_init();
+
+    return pci_pdevs_iterate(iommu_group_assign, NULL);
 }
 
 int iommu_enable_x2apic(void)
diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h
index 48f87480a7..c93f580fdc 100644
--- a/xen/include/xen/iommu.h
+++ b/xen/include/xen/iommu.h
@@ -317,6 +317,13 @@  DECLARE_PER_CPU(bool_t, iommu_dont_flush_iotlb);
 extern struct spinlock iommu_pt_cleanup_lock;
 extern struct page_list_head iommu_pt_cleanup_list;
 
+#ifdef CONFIG_HAS_PCI
+
+void iommu_groups_init(void);
+int iommu_group_assign(struct pci_dev *pdev, void *arg);
+
+#endif /* CONFIG_HAS_PCI */
+
 #endif /* _IOMMU_H_ */
 
 /*
diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
index 79eb25417b..e1f887af1c 100644
--- a/xen/include/xen/pci.h
+++ b/xen/include/xen/pci.h
@@ -79,6 +79,8 @@  struct pci_dev {
     struct list_head alldevs_list;
     struct list_head domain_list;
 
+    struct iommu_group *grp;
+
     struct list_head msi_list;
 
     struct arch_msix *msix;