vfio: avoid adding same iommu mr for notify

Message ID 1479171568-19175-1-git-send-email-peterx@redhat.com (mailing list archive)
State New, archived

Commit Message

Peter Xu Nov. 15, 2016, 12:59 a.m. UTC
When one IOMMU memory region is split into multiple memory sections,
vfio will register multiple identical notifiers with a vIOMMU for the
same region. That's not sensible. What we need is to register one
IOMMU notifier per IOMMU region, not per section.

The solution is simple: traverse container->giommu_list and skip
the registration if the memory region is already registered.

To keep vfio's region_add() short, vfio_listener_region_add_iommu() is
introduced.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 hw/vfio/common.c | 56 +++++++++++++++++++++++++++++++++++---------------------
 1 file changed, 35 insertions(+), 21 deletions(-)

Comments

Peter Xu Nov. 15, 2016, 1:05 a.m. UTC | #1
On Mon, Nov 14, 2016 at 07:59:28PM -0500, Peter Xu wrote:
> When one IOMMU memory region is split into multiple memory sections,
> vfio will register multiple identical notifiers with a vIOMMU for the
> same region. That's not sensible. What we need is to register one
> IOMMU notifier per IOMMU region, not per section.
> 
> The solution is simple: traverse container->giommu_list and skip
> the registration if the memory region is already registered.
> 
> To keep vfio's region_add() short, vfio_listener_region_add_iommu() is
> introduced.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>

This patch solves Aviv's issue where vfio_iommu_map_notify() dumps
many error messages when device assignment is enabled with the Intel
vIOMMU.

Alex, could you help verify whether this is what you suggested? I'd be
glad to add your Suggested-by if you don't mind.

Thanks,

-- peterx
Alexey Kardashevskiy Nov. 15, 2016, 5:11 a.m. UTC | #2
On 15/11/16 11:59, Peter Xu wrote:
> When one IOMMU memory region is split into multiple memory sections,


Out of curiosity - when does this happen?


> vfio will register multiple identical notifiers with a vIOMMU for the
> same region. That's not sensible. What we need is to register one
> IOMMU notifier per IOMMU region, not per section.
> 
> The solution is simple: traverse container->giommu_list and skip
> the registration if the memory region is already registered.
> 
> To keep vfio's region_add() short, vfio_listener_region_add_iommu() is
> introduced.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  hw/vfio/common.c | 56 +++++++++++++++++++++++++++++++++++---------------------
>  1 file changed, 35 insertions(+), 21 deletions(-)
> 
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 801578b..5279fd1 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -360,6 +360,40 @@ out:
>      rcu_read_unlock();
>  }
>  
> +static void vfio_listener_region_add_iommu(VFIOContainer *container,
> +                                           MemoryRegionSection *section,
> +                                           hwaddr iova,
> +                                           hwaddr end)
> +{
> +    VFIOGuestIOMMU *giommu;
> +
> +    QLIST_FOREACH(giommu, &container->giommu_list, giommu_next) {
> +        if (giommu->iommu == section->mr) {
> +            /* We have already registered with this MR, skip */
> +            return;
> +        }
> +    }
> +
> +    trace_vfio_listener_region_add_iommu(iova, end);
> +
> +    /*
> +     * FIXME: For VFIO iommu types which have KVM acceleration to
> +     * avoid bouncing all map/unmaps through qemu this way, this
> +     * would be the right place to wire that up (tell the KVM
> +     * device emulation the VFIO iommu handles to use).
> +     */
> +    giommu = g_malloc0(sizeof(*giommu));
> +    giommu->iommu = section->mr;
> +    giommu->iommu_offset = section->offset_within_address_space -
> +        section->offset_within_region;
> +    giommu->container = container;
> +    giommu->n.notify = vfio_iommu_map_notify;
> +    giommu->n.notifier_flags = IOMMU_NOTIFIER_ALL;
> +    QLIST_INSERT_HEAD(&container->giommu_list, giommu, giommu_next);
> +    memory_region_register_iommu_notifier(giommu->iommu, &giommu->n);
> +    memory_region_iommu_replay(giommu->iommu, &giommu->n, false);
> +}
> +
>  static void vfio_listener_region_add(MemoryListener *listener,
>                                       MemoryRegionSection *section)
>  {
> @@ -439,27 +473,7 @@ static void vfio_listener_region_add(MemoryListener *listener,
>      memory_region_ref(section->mr);
>  
>      if (memory_region_is_iommu(section->mr)) {
> -        VFIOGuestIOMMU *giommu;
> -
> -        trace_vfio_listener_region_add_iommu(iova, end);
> -        /*
> -         * FIXME: For VFIO iommu types which have KVM acceleration to
> -         * avoid bouncing all map/unmaps through qemu this way, this
> -         * would be the right place to wire that up (tell the KVM
> -         * device emulation the VFIO iommu handles to use).
> -         */
> -        giommu = g_malloc0(sizeof(*giommu));
> -        giommu->iommu = section->mr;
> -        giommu->iommu_offset = section->offset_within_address_space -
> -                               section->offset_within_region;
> -        giommu->container = container;
> -        giommu->n.notify = vfio_iommu_map_notify;
> -        giommu->n.notifier_flags = IOMMU_NOTIFIER_ALL;
> -        QLIST_INSERT_HEAD(&container->giommu_list, giommu, giommu_next);
> -
> -        memory_region_register_iommu_notifier(giommu->iommu, &giommu->n);
> -        memory_region_iommu_replay(giommu->iommu, &giommu->n, false);
> -
> +        vfio_listener_region_add_iommu(container, section, iova, end);
>          return;
>      }
>  
>
Peter Xu Nov. 15, 2016, 3:49 p.m. UTC | #3
On Tue, Nov 15, 2016 at 04:11:57PM +1100, Alexey Kardashevskiy wrote:
> On 15/11/16 11:59, Peter Xu wrote:
> > When one IOMMU memory region is split into multiple memory sections,
> 
> 
> Out of curiosity - when does this happen?

My understanding: each PCI device gets its own IOMMU region when the
vt-d vIOMMU is enabled. When interrupt remapping is enabled as well,
another 0xfeexxxxx region is overlaid on top of it to filter out the
interrupts (see vtd_find_add_as()). So the big Intel IOMMU region is
split into several sections.

Of course it looks like we could remove the interrupt region and use a
single IOMMU region for both DMAR and IR, but I don't think that's a
good idea, since one day another scenario may split a single memory
region again. In that case, I'd prefer that we take care of it during
registration.
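
For illustration, a standalone sketch (plain C, not QEMU code) of how
the overlaid interrupt window leaves two flat sections that both point
back at the same IOMMU MR. The 0xfee00000 base follows the 0xfeexxxxx
window mentioned above; the exact 0x100000 size and the 39-bit guest
address width are assumptions here:

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#define DMAR_SIZE      (1ULL << 39)   /* assumed 39-bit guest address width */
#define IR_WINDOW_BASE 0xfee00000ULL  /* interrupt window base (0xfeexxxxx) */
#define IR_WINDOW_SIZE 0x100000ULL    /* assumed window size */

int main(void)
{
    /* The overlapping interrupt window splits the single DMAR IOMMU MR
     * into two flat sections; each of them reaches vfio's region_add(),
     * which is why the same MR shows up more than once. */
    printf("DMAR MR section 1: [0x%" PRIx64 ", 0x%" PRIx64 ")\n",
           (uint64_t)0, (uint64_t)IR_WINDOW_BASE);
    printf("interrupt window:  [0x%" PRIx64 ", 0x%" PRIx64 ")\n",
           (uint64_t)IR_WINDOW_BASE,
           (uint64_t)(IR_WINDOW_BASE + IR_WINDOW_SIZE));
    printf("DMAR MR section 2: [0x%" PRIx64 ", 0x%" PRIx64 ")\n",
           (uint64_t)(IR_WINDOW_BASE + IR_WINDOW_SIZE),
           (uint64_t)DMAR_SIZE);
    return 0;
}

So even a single assigned device already produces at least two sections
for one IOMMU MR, and without the dedup check each of them would
register its own notifier.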

Thanks,

-- peterx
David Gibson Nov. 17, 2016, 3:24 a.m. UTC | #4
On Mon, Nov 14, 2016 at 07:59:28PM -0500, Peter Xu wrote:
> When one IOMMU memory region is split into multiple memory sections,
> vfio will register multiple identical notifiers with a vIOMMU for the
> same region. That's not sensible. What we need is to register one
> IOMMU notifier per IOMMU region, not per section.
> 
> The solution is simple: traverse container->giommu_list and skip
> the registration if the memory region is already registered.
> 
> To keep vfio's region_add() short, vfio_listener_region_add_iommu() is
> introduced.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>

This is wrong.  It will work on the register side, but when the first
section attached to the IOMMU is removed, the IOMMU will be removed
from the list (triggering an unmap of the whole vfio space), rather
than when the *last* section attached to the MR is removed.

You'll get away with it in the simple x86 case, because both sections
will be removed at basically the same time, but it's not correct in
general.

I really think a better approach is to add the section boundary
information to the IOMMUNotifier structure within VFIOGuestIOMMU, and
check that as well as the MR on remove.  That additionally means the
IOMMU notifier won't get called for portions of the MR outside the
mapped sections, which the notifier handler probably isn't going to
expect.
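
Roughly, a self-contained sketch of that idea (plain C, not QEMU code,
all names hypothetical): keep one entry per section, record the section
boundary next to the notifier, and on remove match both the MR and the
boundary, so dropping the first section does not tear down the entries
for the remaining ones:

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

typedef uint64_t hwaddr;

typedef struct GuestIOMMU {
    const void *mr;          /* stands in for the IOMMU MemoryRegion    */
    hwaddr start, end;       /* section boundary kept with the notifier */
    struct GuestIOMMU *next;
} GuestIOMMU;

static GuestIOMMU *giommu_list;

/* region_add: one entry per (MR, section) pair */
static void giommu_add(const void *mr, hwaddr start, hwaddr end)
{
    GuestIOMMU *g = calloc(1, sizeof(*g));
    g->mr = mr;
    g->start = start;
    g->end = end;
    g->next = giommu_list;
    giommu_list = g;
}

/* region_del: match the MR *and* the boundary, so removing the first
 * section leaves the entries for the other sections alone */
static void giommu_del(const void *mr, hwaddr start, hwaddr end)
{
    for (GuestIOMMU **p = &giommu_list; *p; p = &(*p)->next) {
        GuestIOMMU *g = *p;
        if (g->mr == mr && g->start == start && g->end == end) {
            *p = g->next;
            free(g);
            return;
        }
    }
}

int main(void)
{
    int dmar_mr;                                   /* dummy MR identity */
    giommu_add(&dmar_mr, 0x0, 0xfee00000);         /* section 1         */
    giommu_add(&dmar_mr, 0xfef00000, 1ULL << 39);  /* section 2         */
    giommu_del(&dmar_mr, 0x0, 0xfee00000);         /* only section 1    */
    for (GuestIOMMU *g = giommu_list; g; g = g->next) {
        printf("still registered: [0x%llx, 0x%llx)\n",
               (unsigned long long)g->start, (unsigned long long)g->end);
    }
    return 0;
}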

> ---
>  hw/vfio/common.c | 56 +++++++++++++++++++++++++++++++++++---------------------
>  1 file changed, 35 insertions(+), 21 deletions(-)
> 
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 801578b..5279fd1 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -360,6 +360,40 @@ out:
>      rcu_read_unlock();
>  }
>  
> +static void vfio_listener_region_add_iommu(VFIOContainer *container,
> +                                           MemoryRegionSection *section,
> +                                           hwaddr iova,
> +                                           hwaddr end)
> +{
> +    VFIOGuestIOMMU *giommu;
> +
> +    QLIST_FOREACH(giommu, &container->giommu_list, giommu_next) {
> +        if (giommu->iommu == section->mr) {
> +            /* We have already registered with this MR, skip */
> +            return;
> +        }
> +    }
> +
> +    trace_vfio_listener_region_add_iommu(iova, end);
> +
> +    /*
> +     * FIXME: For VFIO iommu types which have KVM acceleration to
> +     * avoid bouncing all map/unmaps through qemu this way, this
> +     * would be the right place to wire that up (tell the KVM
> +     * device emulation the VFIO iommu handles to use).
> +     */
> +    giommu = g_malloc0(sizeof(*giommu));
> +    giommu->iommu = section->mr;
> +    giommu->iommu_offset = section->offset_within_address_space -
> +        section->offset_within_region;
> +    giommu->container = container;
> +    giommu->n.notify = vfio_iommu_map_notify;
> +    giommu->n.notifier_flags = IOMMU_NOTIFIER_ALL;
> +    QLIST_INSERT_HEAD(&container->giommu_list, giommu, giommu_next);
> +    memory_region_register_iommu_notifier(giommu->iommu, &giommu->n);
> +    memory_region_iommu_replay(giommu->iommu, &giommu->n, false);
> +}
> +
>  static void vfio_listener_region_add(MemoryListener *listener,
>                                       MemoryRegionSection *section)
>  {
> @@ -439,27 +473,7 @@ static void vfio_listener_region_add(MemoryListener *listener,
>      memory_region_ref(section->mr);
>  
>      if (memory_region_is_iommu(section->mr)) {
> -        VFIOGuestIOMMU *giommu;
> -
> -        trace_vfio_listener_region_add_iommu(iova, end);
> -        /*
> -         * FIXME: For VFIO iommu types which have KVM acceleration to
> -         * avoid bouncing all map/unmaps through qemu this way, this
> -         * would be the right place to wire that up (tell the KVM
> -         * device emulation the VFIO iommu handles to use).
> -         */
> -        giommu = g_malloc0(sizeof(*giommu));
> -        giommu->iommu = section->mr;
> -        giommu->iommu_offset = section->offset_within_address_space -
> -                               section->offset_within_region;
> -        giommu->container = container;
> -        giommu->n.notify = vfio_iommu_map_notify;
> -        giommu->n.notifier_flags = IOMMU_NOTIFIER_ALL;
> -        QLIST_INSERT_HEAD(&container->giommu_list, giommu, giommu_next);
> -
> -        memory_region_register_iommu_notifier(giommu->iommu, &giommu->n);
> -        memory_region_iommu_replay(giommu->iommu, &giommu->n, false);
> -
> +        vfio_listener_region_add_iommu(container, section, iova, end);
>          return;
>      }
>
Peter Xu Nov. 21, 2016, 6:06 a.m. UTC | #5
On Thu, Nov 17, 2016 at 02:24:54PM +1100, David Gibson wrote:
> On Mon, Nov 14, 2016 at 07:59:28PM -0500, Peter Xu wrote:
> > When one IOMMU memory region is split into multiple memory sections,
> > vfio will register multiple identical notifiers with a vIOMMU for the
> > same region. That's not sensible. What we need is to register one
> > IOMMU notifier per IOMMU region, not per section.
> > 
> > The solution is simple: traverse container->giommu_list and skip
> > the registration if the memory region is already registered.
> > 
> > To keep vfio's region_add() short, vfio_listener_region_add_iommu() is
> > introduced.
> > 
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> 
> This is wrong.  It will work on the register side, but when the first
> section attached to the IOMMU is removed, the IOMMU will be removed
> from the list (triggering an unmap of the whole vfio space), rather
> than when the *last* section attached to the MR is removed.
> 
> You'll get away with it in the simple x86 case, because both sections
> will be removed at basically the same time, but it's not correct in
> general.

Hmm, I see your point. Instead of keeping multiple VFIOGuestIOMMUs as
you mentioned previously (and below), IMHO another choice is to keep a
reference count in each VFIOGuestIOMMU.

> 
> I really think a better approach is to add the section boundary
> information to the IOMMUNotifier structure within VFIOGuestIOMMU, and
> check that as well as the MR on remove.  That additionally means the
> IOMMU notifier won't get called for portions of the MR outside the
> mapped sections, which the notifier handler probably isn't going to
> expect.

Here, the question is what we really want to listen to. IMHO, what we
are listening to is the memory region, not the memory region section.
In that case, keeping multiple VFIOGuestIOMMU objects is awkward. I
would prefer a VFIOGuestIOMMU.refcount field as mentioned above, then
(see the sketch below):

- inc() every time a section of the IOMMU memory region is added,
- dec() every time a section of the IOMMU memory region is removed,
- free() the VFIOGuestIOMMU when the count drops to zero.
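
A quick self-contained sketch of that refcount idea (plain C, not a
patch, all names made up):

#include <assert.h>
#include <stdlib.h>

typedef struct GuestIOMMU {
    const void *mr;              /* the IOMMU MemoryRegion we listen to */
    unsigned refcount;           /* number of sections still attached   */
    struct GuestIOMMU *next;
} GuestIOMMU;

static GuestIOMMU *giommu_list;

/* region_add: take a reference; register the notifier and replay only
 * on the 0 -> 1 transition */
static void giommu_ref(const void *mr)
{
    GuestIOMMU *g;

    for (g = giommu_list; g; g = g->next) {
        if (g->mr == mr) {
            g->refcount++;
            return;
        }
    }
    g = calloc(1, sizeof(*g));
    g->mr = mr;
    g->refcount = 1;             /* would register + replay here */
    g->next = giommu_list;
    giommu_list = g;
}

/* region_del: drop a reference; unregister and free only when the last
 * section of the MR goes away */
static void giommu_unref(const void *mr)
{
    for (GuestIOMMU **p = &giommu_list; *p; p = &(*p)->next) {
        if ((*p)->mr != mr) {
            continue;
        }
        if (--(*p)->refcount == 0) {
            GuestIOMMU *g = *p;
            *p = g->next;        /* would unregister the notifier here */
            free(g);
        }
        return;
    }
}

int main(void)
{
    int dmar_mr;                 /* dummy MR identity */

    giommu_ref(&dmar_mr);        /* first section of the MR   */
    giommu_ref(&dmar_mr);        /* second section, same MR   */
    giommu_unref(&dmar_mr);      /* entry survives, count = 1 */
    assert(giommu_list && giommu_list->refcount == 1);
    giommu_unref(&dmar_mr);      /* last section, entry freed */
    assert(giommu_list == NULL);
    return 0;
}

That way the notifier registration and replay happen only on the
0 -> 1 transition, and the unregistration on the 1 -> 0 transition.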

Thanks,

-- peterx

Patch

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 801578b..5279fd1 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -360,6 +360,40 @@  out:
     rcu_read_unlock();
 }
 
+static void vfio_listener_region_add_iommu(VFIOContainer *container,
+                                           MemoryRegionSection *section,
+                                           hwaddr iova,
+                                           hwaddr end)
+{
+    VFIOGuestIOMMU *giommu;
+
+    QLIST_FOREACH(giommu, &container->giommu_list, giommu_next) {
+        if (giommu->iommu == section->mr) {
+            /* We have already registered with this MR, skip */
+            return;
+        }
+    }
+
+    trace_vfio_listener_region_add_iommu(iova, end);
+
+    /*
+     * FIXME: For VFIO iommu types which have KVM acceleration to
+     * avoid bouncing all map/unmaps through qemu this way, this
+     * would be the right place to wire that up (tell the KVM
+     * device emulation the VFIO iommu handles to use).
+     */
+    giommu = g_malloc0(sizeof(*giommu));
+    giommu->iommu = section->mr;
+    giommu->iommu_offset = section->offset_within_address_space -
+        section->offset_within_region;
+    giommu->container = container;
+    giommu->n.notify = vfio_iommu_map_notify;
+    giommu->n.notifier_flags = IOMMU_NOTIFIER_ALL;
+    QLIST_INSERT_HEAD(&container->giommu_list, giommu, giommu_next);
+    memory_region_register_iommu_notifier(giommu->iommu, &giommu->n);
+    memory_region_iommu_replay(giommu->iommu, &giommu->n, false);
+}
+
 static void vfio_listener_region_add(MemoryListener *listener,
                                      MemoryRegionSection *section)
 {
@@ -439,27 +473,7 @@  static void vfio_listener_region_add(MemoryListener *listener,
     memory_region_ref(section->mr);
 
     if (memory_region_is_iommu(section->mr)) {
-        VFIOGuestIOMMU *giommu;
-
-        trace_vfio_listener_region_add_iommu(iova, end);
-        /*
-         * FIXME: For VFIO iommu types which have KVM acceleration to
-         * avoid bouncing all map/unmaps through qemu this way, this
-         * would be the right place to wire that up (tell the KVM
-         * device emulation the VFIO iommu handles to use).
-         */
-        giommu = g_malloc0(sizeof(*giommu));
-        giommu->iommu = section->mr;
-        giommu->iommu_offset = section->offset_within_address_space -
-                               section->offset_within_region;
-        giommu->container = container;
-        giommu->n.notify = vfio_iommu_map_notify;
-        giommu->n.notifier_flags = IOMMU_NOTIFIER_ALL;
-        QLIST_INSERT_HEAD(&container->giommu_list, giommu, giommu_next);
-
-        memory_region_register_iommu_notifier(giommu->iommu, &giommu->n);
-        memory_region_iommu_replay(giommu->iommu, &giommu->n, false);
-
+        vfio_listener_region_add_iommu(container, section, iova, end);
         return;
     }