Message ID | 20190701093034.18873-4-eric.auger@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Series | ARM SMMUv3: Fix spurious notification errors and stall with vfio-pci |
On Mon, Jul 01, 2019 at 11:30:31AM +0200, Eric Auger wrote:
> In nested mode, the stage 1 translation tables are owned by
> the guest and there is no caching on host side. So there is
> no need to replay the mappings.
>
> As of today, the SMMUv3 nested mode is not yet implemented
> and there is no functional VFIO integration without. But
> keeping the replay call would execute the default implementation
> of memory_region_iommu_replay and attempt to translate the whole
> address range, completely stalling qemu. Keeping the MAP/UNMAP
> notifier registration allows to hit a warning message in the
> SMMUv3 device that tells the user which VFIO device will not
> function properly:
>
> "qemu-system-aarch64: -device vfio-pci,host=0000:89:00.0: warning:
> SMMUv3 does not support notification on MAP: device vfio-pci will not
> function properly"
>
> Besides, removing the replay call now allows the guest to boot.
>
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> ---
>  hw/vfio/common.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index a859298fda..9ea58df67a 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -604,6 +604,7 @@ static void vfio_listener_region_add(MemoryListener *listener,
>      if (memory_region_is_iommu(section->mr)) {
>          VFIOGuestIOMMU *giommu;
>          IOMMUMemoryRegion *iommu_mr = IOMMU_MEMORY_REGION(section->mr);
> +        bool nested = false;
>          int iommu_idx;
>
>          trace_vfio_listener_region_add_iommu(iova, end);
> @@ -631,8 +632,12 @@ static void vfio_listener_region_add(MemoryListener *listener,
>          QLIST_INSERT_HEAD(&container->giommu_list, giommu, giommu_next);
>
>          memory_region_register_iommu_notifier(section->mr, &giommu->n);
> -        memory_region_iommu_replay(giommu->iommu, &giommu->n);
>
> +        memory_region_iommu_get_attr(iommu_mr, IOMMU_ATTR_VFIO_NESTED,
> +                                     (void *)&nested);
> +        if (!nested) {
> +            memory_region_iommu_replay(iommu_mr, &giommu->n);
> +        }

For nested, do we need these
IOMMU notifiers after all?

I'm asking because the no-IOMMU case of vfio_listener_region_add() seems to suit nested page tables very well. For example, vfio no longer needs to listen to MAP events, because we'll simply share the guest IOMMU page table as the 1st level page table of the host SMMU, IIUC. And if the 2nd level page table changes (e.g. memory hotplug), then IMHO vfio_listener_region_add() will handle that for us as well, just as it does when there's no SMMU.

Another thing is that IOMMU_ATTR_VFIO_NESTED will be the same for all the memory regions, so it also seems a bit awkward to make it per memory region. If you look at the other real user of this flag (IOMMU_ATTR_SPAPR_TCE_FD), it's per memory region.

Regards,
Hi Peter, On 7/3/19 7:41 AM, Peter Xu wrote: > On Mon, Jul 01, 2019 at 11:30:31AM +0200, Eric Auger wrote: >> In nested mode, the stage 1 translation tables are owned by >> the guest and there is no caching on host side. So there is >> no need to replay the mappings. >> >> As of today, the SMMUv3 nested mode is not yet implemented >> and there is no functional VFIO integration without. But >> keeping the replay call would execute the default implementation >> of memory_region_iommu_replay and attempt to translate the whole >> address range, completely stalling qemu. Keeping the MAP/UNMAP >> notifier registration allows to hit a warning message in the >> SMMUv3 device that tells the user which VFIO device will not >> function properly: >> >> "qemu-system-aarch64: -device vfio-pci,host=0000:89:00.0: warning: >> SMMUv3 does not support notification on MAP: device vfio-pci will not >> function properly" >> >> Besides, removing the replay call now allows the guest to boot. >> >> Signed-off-by: Eric Auger <eric.auger@redhat.com> >> --- >> hw/vfio/common.c | 7 ++++++- >> 1 file changed, 6 insertions(+), 1 deletion(-) >> >> diff --git a/hw/vfio/common.c b/hw/vfio/common.c >> index a859298fda..9ea58df67a 100644 >> --- a/hw/vfio/common.c >> +++ b/hw/vfio/common.c >> @@ -604,6 +604,7 @@ static void vfio_listener_region_add(MemoryListener *listener, >> if (memory_region_is_iommu(section->mr)) { >> VFIOGuestIOMMU *giommu; >> IOMMUMemoryRegion *iommu_mr = IOMMU_MEMORY_REGION(section->mr); >> + bool nested = false; >> int iommu_idx; >> >> trace_vfio_listener_region_add_iommu(iova, end); >> @@ -631,8 +632,12 @@ static void vfio_listener_region_add(MemoryListener *listener, >> QLIST_INSERT_HEAD(&container->giommu_list, giommu, giommu_next); >> >> memory_region_register_iommu_notifier(section->mr, &giommu->n); >> - memory_region_iommu_replay(giommu->iommu, &giommu->n); >> >> + memory_region_iommu_get_attr(iommu_mr, IOMMU_ATTR_VFIO_NESTED, >> + (void *)&nested); >> + if (!nested) { 
>> +            memory_region_iommu_replay(iommu_mr, &giommu->n);
>> +        }
>
> For nested, do we need these IOMMU notifiers after all?
>
> I'm asking because the no-IOMMU case of vfio_listener_region_add()
> seems to suit very well for nested page tables to me. For example,
> vfio does not need to listen to MAP events any more because we'll
> simply share the guest IOMMU page table to be the 1st level page table
> of the host SMMU IIUC.

We don't need the MAP notifier but we do need the UNMAP notifier: when the guest invalidates an ASID/IOVA, we need to propagate this to the physical IOMMU.

As mentioned in the cover letter, at the moment I still register both MAP and UNMAP notifiers, because the MAP notifier registration produces an explicit warning message in the SMMUv3 device. If I remove the registration we will lose this message. I hope this code is just an intermediate state towards the actual nested stage support.

> And if we have 2nd page table changes (like
> memory hotplug) then IMHO vfio_listener_region_add() will do this for
> us as well just like when there's no SMMU.

In the current integration, see [RFC v4 20/27] hw/vfio/common: Setup nested stage mappings (https://patchwork.kernel.org/patch/10962721/), I use a prereg_listener for stage 2 mappings.

>
> Another thing is that IOMMU_ATTR_VFIO_NESTED will be the same for all
> the memory regions, so it also seems a bit awkward to make it per
> memory region. If you see the other real user of this flag (which is
> IOMMU_ATTR_SPAPR_TCE_FD) it's per memory region.

That's correct, all SMMUv3 regions will return this value. But what other API can be used to query IOMMU level attributes?

On the other hand, Alexey's commit f1334de60b2 ("memory/iommu: Add get_attr()") says:

    This adds get_attr() to IOMMUMemoryRegionClass, like
    iommu_ops::domain_get_attr in the Linux kernel.

and DOMAIN_ATTR_NESTING is part of enum iommu_attr at kernel level.

Thanks

Eric

>
> Regards,
>
On Wed, Jul 03, 2019 at 11:04:38AM +0200, Auger Eric wrote: > Hi Peter, Hi, Eric, > > On 7/3/19 7:41 AM, Peter Xu wrote: > > On Mon, Jul 01, 2019 at 11:30:31AM +0200, Eric Auger wrote: > >> In nested mode, the stage 1 translation tables are owned by > >> the guest and there is no caching on host side. So there is > >> no need to replay the mappings. > >> > >> As of today, the SMMUv3 nested mode is not yet implemented > >> and there is no functional VFIO integration without. But > >> keeping the replay call would execute the default implementation > >> of memory_region_iommu_replay and attempt to translate the whole > >> address range, completely stalling qemu. Keeping the MAP/UNMAP > >> notifier registration allows to hit a warning message in the > >> SMMUv3 device that tells the user which VFIO device will not > >> function properly: > >> > >> "qemu-system-aarch64: -device vfio-pci,host=0000:89:00.0: warning: > >> SMMUv3 does not support notification on MAP: device vfio-pci will not > >> function properly" > >> > >> Besides, removing the replay call now allows the guest to boot. 
> >> > >> Signed-off-by: Eric Auger <eric.auger@redhat.com> > >> --- > >> hw/vfio/common.c | 7 ++++++- > >> 1 file changed, 6 insertions(+), 1 deletion(-) > >> > >> diff --git a/hw/vfio/common.c b/hw/vfio/common.c > >> index a859298fda..9ea58df67a 100644 > >> --- a/hw/vfio/common.c > >> +++ b/hw/vfio/common.c > >> @@ -604,6 +604,7 @@ static void vfio_listener_region_add(MemoryListener *listener, > >> if (memory_region_is_iommu(section->mr)) { > >> VFIOGuestIOMMU *giommu; > >> IOMMUMemoryRegion *iommu_mr = IOMMU_MEMORY_REGION(section->mr); > >> + bool nested = false; > >> int iommu_idx; > >> > >> trace_vfio_listener_region_add_iommu(iova, end); > >> @@ -631,8 +632,12 @@ static void vfio_listener_region_add(MemoryListener *listener, > >> QLIST_INSERT_HEAD(&container->giommu_list, giommu, giommu_next); > >> > >> memory_region_register_iommu_notifier(section->mr, &giommu->n); > >> - memory_region_iommu_replay(giommu->iommu, &giommu->n); > >> > >> + memory_region_iommu_get_attr(iommu_mr, IOMMU_ATTR_VFIO_NESTED, > >> + (void *)&nested); > >> + if (!nested) { > >> + memory_region_iommu_replay(iommu_mr, &giommu->n); > >> + } > > > > For nested, do we need these IOMMU notifiers after all? > > > > I'm asking because the no-IOMMU case of vfio_listener_region_add() > > seems to suite very well for nested page tables to me. For example, > > vfio does not need to listen to MAP events any more because we'll > > simply share the guest IOMMU page table to be the 1st level page table > > of the host SMMU IIUC. > We don't need the MAP notifier but we need the UNMAP notifier: when the > guest invalidates an ASID/IOVA we need to propagate this to the physical > IOMMU. Indeed we need the unmaps. However I've got a major confusion here: With nested mode, we should need unmap events for the 1st level rather than the 2nd level, am I right? I mean, the invalidate request should be a GVA range rather than GPA range? 
While here IIUC vfio_listener_region_add() should be working on the GPA address space.

I don't know SMMU well enough, but for Intel there should be two different kinds of invalidation messages. Because we don't support nested mode on Intel yet, the 1st level invalidation (VTD_INV_DESC_PIOTLB) is not implemented yet either. And IMHO if it is going to be implemented, it should differ from the current IOMMU_NOTIFIER_UNMAP in that it should not even need to bind to a memory region; modules like vfio should simply deliver that exact message to the host IOMMU driver for the GVA range to be invalidated, just like what they will do with the root pointer of the guest 1st level page table.

>
> As mentioned in the cover letter, at the moment, I still register both
> MAP/UNMAP notifiers as the MAP notifier registration produces an
> explicit warning message in the SMMUv3 device. If I remove the
> registration we will lose this message. I hope this code is just an
> intermediate state towards the actual nested stage support.

I didn't see it in the cover letter. Would you please provide a link to the message?

> > And if we have 2nd page table changes (like
> > memory hotplug) then IMHO vfio_listener_region_add() will do this for
> > us as well just like when there's no SMMU.
>
> In the current integration, see [RFC v4 20/27] hw/vfio/common: Setup
> nested stage mappings (https://patchwork.kernel.org/patch/10962721/) I
> use a prereg_listener for stage 2 mappings.
> >
> > Another thing is that IOMMU_ATTR_VFIO_NESTED will be the same for all
> > the memory regions, so it also seems a bit awkward to make it per
> > memory region. If you see the other real user of this flag (which is
> > IOMMU_ATTR_SPAPR_TCE_FD) it's per memory region.
>
> That's correct all SMMUv3 regions will return this value. But what other
> API can be used to query IOMMU level attributes?
>
> On the other hand,
>
> Alexey's commit f1334de60b2 ("memory/iommu: Add get_attr()") says:
> This adds get_attr() to IOMMUMemoryRegionClass, like
> iommu_ops::domain_get_attr in the Linux kernel.
>
> and DOMAIN_ATTR_NESTING is part of enum iommu_attr at kernel level.

Yeah it's fine to me.

Thanks,
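To make the GVA versus GPA point in this message concrete, here is a toy two-stage lookup in C. It is only a sketch under stated assumptions: the ToyNested type, the single-level tables, the 4 KiB page size and every function name are hypothetical and are not taken from QEMU, the SMMUv3 or VT-d. It illustrates why a stage 1 invalidation has to name a GVA range: the cached translation is keyed by the input (guest virtual) address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12
#define TOY_PAGES 8

/* Toy single-level tables indexed by page number; entry 0 means "not mapped". */
typedef struct ToyNested {
    uint64_t stage1[TOY_PAGES];  /* guest-owned: GVA page -> GPA page */
    uint64_t stage2[TOY_PAGES];  /* host-owned:  GPA page -> HPA page */
    bool tlb_valid[TOY_PAGES];   /* cached combined result, keyed by GVA page */
    uint64_t tlb[TOY_PAGES];     /* cached HPA page */
} ToyNested;

/* Translate a GVA to an HPA, filling the toy TLB on a miss. */
static bool toy_translate(ToyNested *t, uint64_t gva, uint64_t *hpa)
{
    uint64_t vpn = gva >> TOY_PAGE_SHIFT;
    uint64_t off = gva & ((1ULL << TOY_PAGE_SHIFT) - 1);

    if (vpn >= TOY_PAGES) {
        return false;
    }
    if (!t->tlb_valid[vpn]) {
        uint64_t gpn = t->stage1[vpn];

        if (gpn == 0 || gpn >= TOY_PAGES || t->stage2[gpn] == 0) {
            return false;
        }
        t->tlb[vpn] = t->stage2[gpn];
        t->tlb_valid[vpn] = true;
    }
    *hpa = (t->tlb[vpn] << TOY_PAGE_SHIFT) | off;
    return true;
}

/* A stage 1 invalidation names a GVA range, not a GPA range. */
static void toy_invalidate_stage1(ToyNested *t, uint64_t gva, uint64_t len)
{
    assert(len != 0);
    uint64_t first = gva >> TOY_PAGE_SHIFT;
    uint64_t last = (gva + len - 1) >> TOY_PAGE_SHIFT;

    for (uint64_t vpn = first; vpn <= last && vpn < TOY_PAGES; vpn++) {
        t->tlb_valid[vpn] = false;
    }
}
```

In this toy model, if the guest rewrites a stage 1 entry and no invalidation is forwarded, toy_translate() keeps returning the stale cached page; only toy_invalidate_stage1() on the matching GVA range makes the new mapping visible.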
Hi Peter, On 7/3/19 12:21 PM, Peter Xu wrote: > On Wed, Jul 03, 2019 at 11:04:38AM +0200, Auger Eric wrote: >> Hi Peter, > > Hi, Eric, > >> >> On 7/3/19 7:41 AM, Peter Xu wrote: >>> On Mon, Jul 01, 2019 at 11:30:31AM +0200, Eric Auger wrote: >>>> In nested mode, the stage 1 translation tables are owned by >>>> the guest and there is no caching on host side. So there is >>>> no need to replay the mappings. >>>> >>>> As of today, the SMMUv3 nested mode is not yet implemented >>>> and there is no functional VFIO integration without. But >>>> keeping the replay call would execute the default implementation >>>> of memory_region_iommu_replay and attempt to translate the whole >>>> address range, completely stalling qemu. Keeping the MAP/UNMAP >>>> notifier registration allows to hit a warning message in the >>>> SMMUv3 device that tells the user which VFIO device will not >>>> function properly: >>>> >>>> "qemu-system-aarch64: -device vfio-pci,host=0000:89:00.0: warning: >>>> SMMUv3 does not support notification on MAP: device vfio-pci will not >>>> function properly" >>>> >>>> Besides, removing the replay call now allows the guest to boot. 
>>>> >>>> Signed-off-by: Eric Auger <eric.auger@redhat.com> >>>> --- >>>> hw/vfio/common.c | 7 ++++++- >>>> 1 file changed, 6 insertions(+), 1 deletion(-) >>>> >>>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c >>>> index a859298fda..9ea58df67a 100644 >>>> --- a/hw/vfio/common.c >>>> +++ b/hw/vfio/common.c >>>> @@ -604,6 +604,7 @@ static void vfio_listener_region_add(MemoryListener *listener, >>>> if (memory_region_is_iommu(section->mr)) { >>>> VFIOGuestIOMMU *giommu; >>>> IOMMUMemoryRegion *iommu_mr = IOMMU_MEMORY_REGION(section->mr); >>>> + bool nested = false; >>>> int iommu_idx; >>>> >>>> trace_vfio_listener_region_add_iommu(iova, end); >>>> @@ -631,8 +632,12 @@ static void vfio_listener_region_add(MemoryListener *listener, >>>> QLIST_INSERT_HEAD(&container->giommu_list, giommu, giommu_next); >>>> >>>> memory_region_register_iommu_notifier(section->mr, &giommu->n); >>>> - memory_region_iommu_replay(giommu->iommu, &giommu->n); >>>> >>>> + memory_region_iommu_get_attr(iommu_mr, IOMMU_ATTR_VFIO_NESTED, >>>> + (void *)&nested); >>>> + if (!nested) { >>>> + memory_region_iommu_replay(iommu_mr, &giommu->n); >>>> + } >>> >>> For nested, do we need these IOMMU notifiers after all? >>> >>> I'm asking because the no-IOMMU case of vfio_listener_region_add() >>> seems to suite very well for nested page tables to me. For example, >>> vfio does not need to listen to MAP events any more because we'll >>> simply share the guest IOMMU page table to be the 1st level page table >>> of the host SMMU IIUC. >> We don't need the MAP notifier but we need the UNMAP notifier: when the >> guest invalidates an ASID/IOVA we need to propagate this to the physical >> IOMMU. > > Indeed we need the unmaps. However I've got a major confusion here: > With nested mode, we should need unmap events for the 1st level rather > than the 2nd level, am I right? yes that's correct I mean, the invalidate request should > be a GVA range rather than GPA range? 
While here IIUC
> vfio_listener_region_add() should be working on GPA address space.

Sorry I don't get your point. My understanding is that in vfio_listener_region_add() we detect the addition of an IOMMU MR and init a notifier that covers the input AS it translates (GVA). When the guest sends an IOTLB invalidation on its first stage, this is trapped, we notify the UNMAP notifier, and this eventually produces a stage 1 invalidation at physical level (through the VFIO/IOMMU kernel path). This piece is not yet implemented: see below.

>
> I don't know SMMU enough, but for Intel there should have two
> different kinds of invalidation messages. Currently because we still
> don't support nested on Intel so the 1st level invalidation is still
> not yet implemented (VTD_INV_DESC_PIOTLB). And IMHO if it is going to
> be implemented, I think it should be different comparing to current
> IOMMU_NOTIFIER_UNMAP

Yes, the UNMAP notifier implementation is definitely different. It calls a VFIO ioctl that eventually produces a physical IOMMU stage 1 invalidation. See https://patchwork.kernel.org/patch/10962721/.

Maybe the confusion comes from the fact that this patch is *not* an integration for nested SMMUv3 with VFIO. SMMUv3/VFIO still does not work. The patch just allows the guest to boot by bypassing the replay function. If that makes things clearer, maybe I should simply assert() in case we detect a VFIO device protected by an SMMUv3.

> in that it should not even need to bind to a
> memory region, and modules like vfio should simply deliver that exact
> message to the host IOMMU driver for the GVA range to be invalidated,
> just like what it will do with the root pointer of guest 1st level
> page table.
>
>>
>> As mentioned in the cover letter, at the moment, I still register both
>> MAP/UNMAP notifiers as the MAP notifier registration produces an
>> explicit warning message in the SMMUv3 device. If I remove the
>> registration we will lose this message. I hope this code is just an
>> intermediate state towards the actual nested stage support.
>
> I didn't see it in the cover letter. Would you please provide a link
> to the message?

Sorry, it is in this commit message. Reference to

"qemu-system-aarch64: -device vfio-pci,host=0000:89:00.0: warning: SMMUv3 does not support notification on MAP: device vfio-pci will not function properly"

>
>>
>>> And if we have 2nd page table changes (like
>>> memory hotplug) then IMHO vfio_listener_region_add() will do this for
>>> us as well just like when there's no SMMU.
>>
>> In the current integration, see [RFC v4 20/27] hw/vfio/common: Setup
>> nested stage mappings (https://patchwork.kernel.org/patch/10962721/) I
>> use a prereg_listener for stage 2 mappings.
>>>
>>> Another thing is that IOMMU_ATTR_VFIO_NESTED will be the same for all
>>> the memory regions, so it also seems a bit awkward to make it per
>>> memory region. If you see the other real user of this flag (which is
>>> IOMMU_ATTR_SPAPR_TCE_FD) it's per memory region.
>>
>> That's correct all SMMUv3 regions will return this value. But what other
>> API can be used to query IOMMU level attributes?
>>
>> On the other hand,
>>
>> Alexey's commit f1334de60b2 ("memory/iommu: Add get_attr()") says:
>> This adds get_attr() to IOMMUMemoryRegionClass, like
>> iommu_ops::domain_get_attr in the Linux kernel.
>>
>> and DOMAIN_ATTR_NESTING is part of enum iommu_attr at kernel level.
>
> Yeah it's fine to me.

Thanks

Eric

>
> Thanks,
>
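The notifier flow described in this message (guest stage 1 invalidation is trapped, UNMAP notifiers fire, MAP registration only yields a warning) can be sketched roughly as follows. This is a mock under stated assumptions, not QEMU code: MockSmmu, MockNotifier, the MOCK_NOTIFIER_* flags and both functions are hypothetical names, and the counters merely stand in for the real warning message and for the host-side invalidation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flags, loosely modelled on the MAP/UNMAP notifier kinds. */
enum { MOCK_NOTIFIER_MAP = 1 << 0, MOCK_NOTIFIER_UNMAP = 1 << 1 };

typedef struct MockNotifier {
    unsigned flags;
    uint64_t start, end;    /* inclusive input-address (GVA/IOVA) window */
    unsigned unmap_events;  /* invalidations forwarded to this notifier */
} MockNotifier;

typedef struct MockSmmu {
    MockNotifier *notifiers[4];
    size_t count;
    unsigned map_warnings;  /* stands in for the "notification on MAP" warning */
} MockSmmu;

/* Register a notifier; a MAP request is accepted but flagged with a warning. */
static bool mock_register(MockSmmu *s, MockNotifier *n)
{
    assert(s != NULL && n != NULL);
    if (s->count == 4) {
        return false;
    }
    if (n->flags & MOCK_NOTIFIER_MAP) {
        s->map_warnings++;
    }
    s->notifiers[s->count++] = n;
    return true;
}

/* Trapped guest stage 1 invalidation: only overlapping UNMAP notifiers fire. */
static void mock_guest_invalidate(MockSmmu *s, uint64_t iova, uint64_t len)
{
    assert(len != 0);
    uint64_t last = iova + len - 1;

    for (size_t i = 0; i < s->count; i++) {
        MockNotifier *n = s->notifiers[i];

        if (!(n->flags & MOCK_NOTIFIER_UNMAP)) {
            continue;
        }
        if (last < n->start || iova > n->end) {
            continue;
        }
        /* A real path would issue the host stage 1 invalidation here. */
        n->unmap_events++;
    }
}
```

The point of keeping the MAP flag in this sketch is the same as in the patch: registration itself is the hook where the user-visible warning is produced, while only the UNMAP side carries events.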
On Wed, Jul 03, 2019 at 12:45:37PM +0200, Auger Eric wrote: > Hi Peter, Hi, Eric, > On 7/3/19 12:21 PM, Peter Xu wrote: > > On Wed, Jul 03, 2019 at 11:04:38AM +0200, Auger Eric wrote: > >> Hi Peter, > > > > Hi, Eric, > > > >> > >> On 7/3/19 7:41 AM, Peter Xu wrote: > >>> On Mon, Jul 01, 2019 at 11:30:31AM +0200, Eric Auger wrote: > >>>> In nested mode, the stage 1 translation tables are owned by > >>>> the guest and there is no caching on host side. So there is > >>>> no need to replay the mappings. > >>>> > >>>> As of today, the SMMUv3 nested mode is not yet implemented > >>>> and there is no functional VFIO integration without. But > >>>> keeping the replay call would execute the default implementation > >>>> of memory_region_iommu_replay and attempt to translate the whole > >>>> address range, completely stalling qemu. Keeping the MAP/UNMAP > >>>> notifier registration allows to hit a warning message in the > >>>> SMMUv3 device that tells the user which VFIO device will not > >>>> function properly: > >>>> > >>>> "qemu-system-aarch64: -device vfio-pci,host=0000:89:00.0: warning: > >>>> SMMUv3 does not support notification on MAP: device vfio-pci will not > >>>> function properly" > >>>> > >>>> Besides, removing the replay call now allows the guest to boot. 
> >>>> > >>>> Signed-off-by: Eric Auger <eric.auger@redhat.com> > >>>> --- > >>>> hw/vfio/common.c | 7 ++++++- > >>>> 1 file changed, 6 insertions(+), 1 deletion(-) > >>>> > >>>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c > >>>> index a859298fda..9ea58df67a 100644 > >>>> --- a/hw/vfio/common.c > >>>> +++ b/hw/vfio/common.c > >>>> @@ -604,6 +604,7 @@ static void vfio_listener_region_add(MemoryListener *listener, > >>>> if (memory_region_is_iommu(section->mr)) { > >>>> VFIOGuestIOMMU *giommu; > >>>> IOMMUMemoryRegion *iommu_mr = IOMMU_MEMORY_REGION(section->mr); > >>>> + bool nested = false; > >>>> int iommu_idx; > >>>> > >>>> trace_vfio_listener_region_add_iommu(iova, end); > >>>> @@ -631,8 +632,12 @@ static void vfio_listener_region_add(MemoryListener *listener, > >>>> QLIST_INSERT_HEAD(&container->giommu_list, giommu, giommu_next); > >>>> > >>>> memory_region_register_iommu_notifier(section->mr, &giommu->n); > >>>> - memory_region_iommu_replay(giommu->iommu, &giommu->n); > >>>> > >>>> + memory_region_iommu_get_attr(iommu_mr, IOMMU_ATTR_VFIO_NESTED, > >>>> + (void *)&nested); > >>>> + if (!nested) { > >>>> + memory_region_iommu_replay(iommu_mr, &giommu->n); > >>>> + } > >>> > >>> For nested, do we need these IOMMU notifiers after all? > >>> > >>> I'm asking because the no-IOMMU case of vfio_listener_region_add() > >>> seems to suite very well for nested page tables to me. For example, > >>> vfio does not need to listen to MAP events any more because we'll > >>> simply share the guest IOMMU page table to be the 1st level page table > >>> of the host SMMU IIUC. > >> We don't need the MAP notifier but we need the UNMAP notifier: when the > >> guest invalidates an ASID/IOVA we need to propagate this to the physical > >> IOMMU. > > > > Indeed we need the unmaps. However I've got a major confusion here: > > With nested mode, we should need unmap events for the 1st level rather > > than the 2nd level, am I right? 
>
> yes that's correct
>
> > I mean, the invalidate request should
> > be a GVA range rather than GPA range? While here IIUC
> > vfio_listener_region_add() should be working on GPA address space.
>
> Sorry I don't get your point. My understanding is in
> vfio_listener_region_add() we detect the addition of an IOMMU MR and
> init a notifier that covers the input AS it translates (GVA). When the
> guest sends an IOTLB invalidation on its first stage, this is trapped,
> we notify the UNMAP notifier and this eventually produces a stage1
> invalidation at physical level (through VFIO/IOMMU kernel path). This
> piece is not yet implemented: see below.
> >
> > I don't know SMMU enough, but for Intel there should have two
> > different kinds of invalidation messages. Currently because we still
> > don't support nested on Intel so the 1st level invalidation is still
> > not yet implemented (VTD_INV_DESC_PIOTLB). And IMHO if it is going to
> > be implemented, I think it should be different comparing to current
> > IOMMU_NOTIFIER_UNMAP
> Yes the UNMAP notifier implementation is definitely different. It
> calls a VFIO ioctl that eventually produces a physical IOMMU stage1
> invalidation. See https://patchwork.kernel.org/patch/10962721/.

[1]

>
> Maybe the confusion comes from the fact this patch is *not* an
> integration for nested SMMUv3 with VFIO. SMMUv3/VFIO still does not
> work. It just allows the guest to boot by bypassing the replay function.
> If things are clearer maybe I should simply assert() in case we detect a
> VFIO device protected by an SMMUv3.

Actually that's also my question to your other patch [1]:

+    if (container->iommu_type == VFIO_TYPE1_NESTING_IOMMU) {
+        /* Config notifier to propagate guest stage 1 config changes */
+        giommu = vfio_alloc_guest_iommu(container, iommu_mr, offset);
+        iommu_config_notifier_init(&giommu->n, vfio_iommu_nested_notify,
+                                   IOMMU_NOTIFIER_CONFIG_PASID, iommu_idx);
+        QLIST_INSERT_HEAD(&container->giommu_list, giommu, giommu_next);
+        memory_region_register_iommu_notifier(section->mr, &giommu->n);
+
+        /* IOTLB unmap notifier to propagate guest IOTLB invalidations */
+        giommu = vfio_alloc_guest_iommu(container, iommu_mr, offset);
+        iommu_iotlb_notifier_init(&giommu->n, vfio_iommu_unmap_notify,
+                                  IOMMU_NOTIFIER_IOTLB_UNMAP,
+                                  section->offset_within_region,
+                                  int128_get64(llend),
+                                  iommu_idx);
+        QLIST_INSERT_HEAD(&container->giommu_list, giommu, giommu_next);
+        memory_region_register_iommu_notifier(section->mr, &giommu->n);
+    } else {

It'll be fine if we finally want to do it this way, but it feels a bit confusing to me when we register these notifiers with the current IOMMU notifiers, because IMHO both of these kinds of events:

  - PASID root pointer
  - PASID-based IOTLB invalidations

should not bind to any memory region at all, and should not have a concept of "memory range to register". It would be easier for me to understand if vfio simply registered these two notifiers with the IOMMU directly (or maybe registering with the PCI layer could be a bit better from a code perspective?), since they seem to have nothing to do with the current memory region framework.

My vague memory is that Liu Yi has done some similar work (e.g., introducing some PCI level notifiers and letting VFIO register to those instead for the nested case; that's for Intel, but IMHO it suits ARM too), but I've totally forgotten the details.

Thanks,
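As a rough illustration of the alternative suggested in this message, the sketch below registers per-device hooks for the two event kinds with no memory region and no address window involved. Everything here is hypothetical: ToyDeviceHooks, ToyEvent, the TOY_EV_* kinds and the functions are made-up names for illustration and do not correspond to any existing QEMU or PCI layer API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical event kinds; the names are illustrative only. */
typedef enum {
    TOY_EV_PASID_CONFIG,     /* guest changed a PASID table root pointer */
    TOY_EV_PASID_IOTLB_INV,  /* guest issued a PASID-based IOTLB invalidation */
    TOY_EV_MAX
} ToyEventKind;

typedef struct ToyEvent {
    ToyEventKind kind;
    uint32_t pasid;
    uint64_t addr;  /* root pointer for CONFIG, GVA for IOTLB_INV */
    uint64_t len;   /* unused for CONFIG */
} ToyEvent;

typedef void (*ToyHandler)(void *opaque, const ToyEvent *ev);

/* Per-device hook table: no memory region, no "range to register". */
typedef struct ToyDeviceHooks {
    ToyHandler handler[TOY_EV_MAX];
    void *opaque[TOY_EV_MAX];
} ToyDeviceHooks;

/* Install one handler per event kind for a device. */
static void toy_hook_register(ToyDeviceHooks *h, ToyEventKind kind,
                              ToyHandler fn, void *opaque)
{
    assert(h != NULL && fn != NULL && kind < TOY_EV_MAX);
    h->handler[kind] = fn;
    h->opaque[kind] = opaque;
}

/* Called by the emulated IOMMU when it traps a guest command. */
static void toy_hook_fire(ToyDeviceHooks *h, const ToyEvent *ev)
{
    assert(h != NULL && ev != NULL && ev->kind < TOY_EV_MAX);
    if (h->handler[ev->kind] != NULL) {
        h->handler[ev->kind](h->opaque[ev->kind], ev);
    }
}

/* Example consumer: records the event it would forward to the host driver. */
static void toy_forward_to_host(void *opaque, const ToyEvent *ev)
{
    *(ToyEvent *)opaque = *ev;
}
```

The design choice being illustrated is only the shape of the interface: the consumer subscribes per device and per event kind, and the event itself carries the PASID and GVA range, so nothing needs to be expressed as a window inside a memory region.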
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index a859298fda..9ea58df67a 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -604,6 +604,7 @@ static void vfio_listener_region_add(MemoryListener *listener,
     if (memory_region_is_iommu(section->mr)) {
         VFIOGuestIOMMU *giommu;
         IOMMUMemoryRegion *iommu_mr = IOMMU_MEMORY_REGION(section->mr);
+        bool nested = false;
         int iommu_idx;
 
         trace_vfio_listener_region_add_iommu(iova, end);
@@ -631,8 +632,12 @@ static void vfio_listener_region_add(MemoryListener *listener,
         QLIST_INSERT_HEAD(&container->giommu_list, giommu, giommu_next);
 
         memory_region_register_iommu_notifier(section->mr, &giommu->n);
-        memory_region_iommu_replay(giommu->iommu, &giommu->n);
 
+        memory_region_iommu_get_attr(iommu_mr, IOMMU_ATTR_VFIO_NESTED,
+                                     (void *)&nested);
+        if (!nested) {
+            memory_region_iommu_replay(iommu_mr, &giommu->n);
+        }
         return;
     }
In nested mode, the stage 1 translation tables are owned by
the guest and there is no caching on the host side. So there is
no need to replay the mappings.

As of today, the SMMUv3 nested mode is not yet implemented
and there is no functional VFIO integration without it. But
keeping the replay call would execute the default implementation
of memory_region_iommu_replay and attempt to translate the whole
address range, completely stalling qemu. Keeping the MAP/UNMAP
notifier registration still triggers a warning message in the
SMMUv3 device that tells the user which VFIO device will not
function properly:

"qemu-system-aarch64: -device vfio-pci,host=0000:89:00.0: warning:
SMMUv3 does not support notification on MAP: device vfio-pci will not
function properly"

Besides, removing the replay call now allows the guest to boot.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
 hw/vfio/common.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)
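For a sense of scale behind "completely stalling qemu", here is a small self-contained C mock of the gating logic in the patch. It is a sketch under stated assumptions, not the QEMU implementation: MockIommuRegion and the mock_* functions are hypothetical, the walk is assumed to issue one translate call per minimum-size page, and the 48-bit range with a 4 KiB granule used in the note below is an illustrative choice that the commit message does not state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for an IOMMU memory region; not the QEMU type. */
typedef struct MockIommuRegion {
    uint64_t size;             /* bytes covered by the region */
    uint64_t granule;          /* minimum page size in bytes */
    bool nested;               /* what a "nested" attribute query would report */
    uint64_t translate_calls;  /* translate calls issued so far */
} MockIommuRegion;

/* Number of translate calls a page-by-page walk of the range needs. */
static uint64_t replay_walk_steps(uint64_t size, uint64_t granule)
{
    assert(granule != 0);
    return size / granule + (size % granule != 0);
}

/* Models a default replay: one translate call per granule over the range. */
static void mock_replay(MockIommuRegion *mr)
{
    assert(mr->granule != 0);
    for (uint64_t addr = 0; addr < mr->size; addr += mr->granule) {
        mr->translate_calls++;
    }
}

/* Models the patched path: notifiers stay registered, replay only if !nested. */
static void mock_region_add(MockIommuRegion *mr)
{
    if (!mr->nested) {
        mock_replay(mr);
    }
}
```

Under these assumptions, replay_walk_steps(1ULL << 48, 4096) evaluates to 2^36, i.e. roughly 68.7 billion translate calls for a single region, whereas a region that reports nested mode issues none.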