Message ID: 0-v2-f090ae795824+6ad-intel_no_snoop_jgg@nvidia.com (mailing list archive)
Series: Make the iommu driver no-snoop block feature consistent
On 2022-04-07 16:23, Jason Gunthorpe wrote:
> PCIe defines a 'no-snoop' bit in each TLP which is usually implemented
> by a platform as bypassing elements in the DMA coherent CPU cache
> hierarchy. A driver can command a device to set this bit on some of its
> transactions as a micro-optimization.
>
> However, the driver is now responsible to synchronize the CPU cache with
> the DMA that bypassed it. On x86 this may be done through the wbinvd
> instruction, and the i915 GPU driver is the only Linux DMA driver that
> calls it.
>
> The problem comes that KVM on x86 will normally disable the wbinvd
> instruction in the guest and render it a NOP. As the driver running in the
> guest is not aware the wbinvd doesn't work it may still cause the device
> to set the no-snoop bit and the platform will bypass the CPU cache.
> Without a working wbinvd there is no way to re-synchronize the CPU cache
> and the driver in the VM has data corruption.
>
> Thus, we see a general direction on x86 that the IOMMU HW is able to block
> the no-snoop bit in the TLP. This NOPs the optimization and allows KVM to
> NOP the wbinvd without causing any data corruption.
>
> This control for Intel IOMMU was exposed by using IOMMU_CACHE and
> IOMMU_CAP_CACHE_COHERENCY, however these two values now have multiple
> meanings and usages beyond blocking no-snoop and the whole thing has
> become confused. AMD IOMMU has the same feature and same IOPTE bits,
> however it unconditionally blocks no-snoop.
>
> Change it so that:
>  - IOMMU_CACHE is only about the DMA coherence of normal DMAs from a
>    device. It is used by the DMA API/VFIO/etc when the user of the
>    iommu_domain will not be doing manual cache coherency operations.
>
>  - IOMMU_CAP_CACHE_COHERENCY indicates if IOMMU_CACHE can be used with the
>    device.
>
>  - The new optional domain op enforce_cache_coherency() will cause the
>    entire domain to block no-snoop requests - ie there is no way for any
>    device attached to the domain to opt out of the IOMMU_CACHE behavior.
>    This is permanent on the domain and must apply to any future devices
>    attached to it.
>
> Ideally an iommu driver should implement enforce_cache_coherency() so
> that by default DMA API domains allow the no-snoop optimization. This
> leaves it available to kernel drivers like i915. VFIO will call
> enforce_cache_coherency() before establishing any mappings and the domain
> should then permanently block no-snoop.
>
> If enforce_cache_coherency() fails VFIO will communicate back through to
> KVM into the arch code via kvm_arch_register_noncoherent_dma()
> (only implemented by x86) which triggers a working wbinvd to be made
> available to the VM.
>
> While other iommu drivers are certainly welcome to implement
> enforce_cache_coherency(), it is not clear there is any benefit in doing
> so right now.
>
> This is on github: https://github.com/jgunthorpe/linux/commits/intel_no_snoop
>
> v2:
>  - Abandon removing IOMMU_CAP_CACHE_COHERENCY - instead make it the cap
>    flag that indicates IOMMU_CACHE is supported
>  - Put the VFIO tests for IOMMU_CACHE at VFIO device registration
>  - In the Intel driver remove the domain->iommu_snooping value, this is
>    global not per-domain

At a glance, this all looks about the right shape to me now, thanks!

Ideally I'd hope patch #4 could go straight to device_iommu_capable() from
my Thunderbolt series, but we can figure that out in a couple of weeks once
Joerg starts queueing 5.19 material. I've got another VFIO patch waiting for
the DMA ownership series to land anyway, so it's hardly the end of the world
if I have to come back to follow up on this one too.

For the series,

Acked-by: Robin Murphy <robin.murphy@arm.com>

> v1: https://lore.kernel.org/r/0-v1-ef02c60ddb76+12ca2-intel_no_snoop_jgg@nvidia.com
>
> Jason Gunthorpe (4):
>   iommu: Introduce the domain op enforce_cache_coherency()
>   vfio: Move the Intel no-snoop control off of IOMMU_CACHE
>   iommu: Redefine IOMMU_CAP_CACHE_COHERENCY as the cap flag for
>     IOMMU_CACHE
>   vfio: Require that devices support DMA cache coherence
>
>  drivers/iommu/amd/iommu.c       |  7 +++++++
>  drivers/iommu/intel/iommu.c     | 17 +++++++++++++----
>  drivers/vfio/vfio.c             |  7 +++++++
>  drivers/vfio/vfio_iommu_type1.c | 30 +++++++++++++++++-----------
>  include/linux/intel-iommu.h     |  2 +-
>  include/linux/iommu.h           |  7 +++++--
>  6 files changed, 52 insertions(+), 18 deletions(-)
>
> base-commit: 3123109284176b1532874591f7c81f3837bbdc17
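As a concrete anchor for the description above, here is a minimal sketch of how a domain user such as VFIO would consume the new op. The op name and bool return come from the series itself; the surrounding helper is illustrative only:

#include <linux/iommu.h>

/*
 * Sketch: ask the domain to permanently block no-snoop TLPs. Drivers
 * that cannot enforce this simply leave the op NULL.
 */
static bool user_enforce_cache_coherency(struct iommu_domain *domain)
{
	if (!domain->ops->enforce_cache_coherency)
		return false;
	/* Success is permanent and covers future attached devices too */
	return domain->ops->enforce_cache_coherency(domain);
}

When this returns false, VFIO reports non-coherent DMA and KVM falls back to kvm_arch_register_noncoherent_dma(), which keeps a working wbinvd available to the guest.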
On Thu, Apr 07, 2022 at 06:03:37PM +0100, Robin Murphy wrote:
> At a glance, this all looks about the right shape to me now, thanks!

Thanks!

> Ideally I'd hope patch #4 could go straight to device_iommu_capable() from
> my Thunderbolt series, but we can figure that out in a couple of weeks once

Yes, this does help that because now the only iommu_capable call is
in a context where a device is available :)

> Joerg starts queueing 5.19 material. I've got another VFIO patch waiting for
> the DMA ownership series to land anyway, so it's hardly the end of the world
> if I have to come back to follow up on this one too.

Hopefully Joerg will start soon; I also have patches written waiting
for the DMA ownership series.

Jason
On 2022-04-07 18:43, Jason Gunthorpe wrote:
> On Thu, Apr 07, 2022 at 06:03:37PM +0100, Robin Murphy wrote:
>> At a glance, this all looks about the right shape to me now, thanks!
>
> Thanks!
>
>> Ideally I'd hope patch #4 could go straight to device_iommu_capable() from
>> my Thunderbolt series, but we can figure that out in a couple of weeks once
>
> Yes, this does help that because now the only iommu_capable call is
> in a context where a device is available :)

Derp, of course I have *two* VFIO patches waiting, the other one touching
the iommu_capable() calls (there's still IOMMU_CAP_INTR_REMAP, which, much
as I hate it and would love to boot all that stuff over to drivers/irqchip,
it's not in my way so I'm leaving it be for now). I'll have to rebase that
anyway, so merging this as-is is absolutely fine!

Cheers,
Robin.
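To make the shape of that conversion concrete: the bus-scoped query would become a device-scoped one. device_iommu_capable() is from Robin's still-pending series, so its exact signature here is an assumption:

#include <linux/iommu.h>
#include <linux/vfio.h>

/* Sketch: capability check scoped to the device actually doing DMA */
static bool vfio_device_dma_coherent(struct vfio_device *vdev)
{
	/* was: iommu_capable(vdev->dev->bus, IOMMU_CAP_CACHE_COHERENCY) */
	return device_iommu_capable(vdev->dev, IOMMU_CAP_CACHE_COHERENCY);
}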
On Thu, Apr 07, 2022 at 07:02:03PM +0100, Robin Murphy wrote:
> On 2022-04-07 18:43, Jason Gunthorpe wrote:
> [...]
> > Yes, this does help that because now the only iommu_capable call is
> > in a context where a device is available :)
>
> Derp, of course I have *two* VFIO patches waiting, the other one touching
> the iommu_capable() calls (there's still IOMMU_CAP_INTR_REMAP, which, much
> as I hate it and would love to boot all that stuff over to
> drivers/irqchip,

Oh me too...

> it's not in my way so I'm leaving it be for now). I'll have to rebase that
> anyway, so merging this as-is is absolutely fine!

This might help your effort - after this series and this below there
are no 'bus' users of iommu_capable left at all.

From 55d72be40bc0a031711e318c8dd1cb60673d9eca Mon Sep 17 00:00:00 2001
From: Jason Gunthorpe <jgg@nvidia.com>
Date: Thu, 7 Apr 2022 16:00:50 -0300
Subject: [PATCH] vfio: Move the IOMMU_CAP_INTR_REMAP to a context with a
 struct device

This check is done to ensure that the device we want to use is fully
isolated and the platform does not allow the device's MemWr TLPs to
trigger unauthorized MSIs.

Instead of doing it in the container context where we only have a
group, move the check to open_device where we actually know the
device.

This is still security safe as userspace cannot trigger MemWr TLPs
without obtaining a device fd.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/vfio/vfio.c             |  9 +++++++++
 drivers/vfio/vfio.h             |  1 +
 drivers/vfio/vfio_iommu_type1.c | 28 +++++++++++++++++-----------
 3 files changed, 27 insertions(+), 11 deletions(-)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 9edad767cfdad3..8db5cea1dc1d75 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -1355,6 +1355,15 @@ static int vfio_group_get_device_fd(struct vfio_group *group, char *buf)
 	if (IS_ERR(device))
 		return PTR_ERR(device);
 
+	/* Confirm this device is compatible with the container */
+	if (group->type == VFIO_IOMMU &&
+	    group->container->iommu_driver->ops->device_ok) {
+		ret = group->container->iommu_driver->ops->device_ok(
+			group->container->iommu_data, device->dev);
+		if (ret)
+			goto err_device_put;
+	}
+
 	if (!try_module_get(device->dev->driver->owner)) {
 		ret = -ENODEV;
 		goto err_device_put;
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index a6713022115155..3db60de71d42eb 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -66,6 +66,7 @@ struct vfio_iommu_driver_ops {
 					  struct iommu_group *group);
 	void	(*notify)(void *iommu_data,
 			  enum vfio_iommu_notify_type event);
+	int	(*device_ok)(void *iommu_data, struct device *device);
 };
 
 int vfio_register_iommu_driver(const struct vfio_iommu_driver_ops *ops);
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index c13b9290e35759..5e966fb0ab9202 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -2153,6 +2153,21 @@ static void vfio_iommu_iova_insert_copy(struct vfio_iommu *iommu,
 	list_splice_tail(iova_copy, iova);
 }
 
+static int vfio_iommu_device_ok(void *iommu_data, struct device *device)
+{
+	bool msi_remap;
+
+	msi_remap = irq_domain_check_msi_remap() ||
+		    iommu_capable(device->bus, IOMMU_CAP_INTR_REMAP);
+
+	if (!allow_unsafe_interrupts && !msi_remap) {
+		pr_warn("%s: No interrupt remapping support. Use the module param \"allow_unsafe_interrupts\" to enable VFIO IOMMU support on this platform\n",
+			__func__);
+		return -EPERM;
+	}
+	return 0;
+}
+
 static int vfio_iommu_type1_attach_group(void *iommu_data,
 		struct iommu_group *iommu_group, enum vfio_group_type type)
 {
@@ -2160,7 +2175,7 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
 	struct vfio_iommu_group *group;
 	struct vfio_domain *domain, *d;
 	struct bus_type *bus = NULL;
-	bool resv_msi, msi_remap;
+	bool resv_msi;
 	phys_addr_t resv_msi_base = 0;
 	struct iommu_domain_geometry *geo;
 	LIST_HEAD(iova_copy);
@@ -2257,16 +2272,6 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
 	INIT_LIST_HEAD(&domain->group_list);
 	list_add(&group->next, &domain->group_list);
 
-	msi_remap = irq_domain_check_msi_remap() ||
-		    iommu_capable(bus, IOMMU_CAP_INTR_REMAP);
-
-	if (!allow_unsafe_interrupts && !msi_remap) {
-		pr_warn("%s: No interrupt remapping support. Use the module param \"allow_unsafe_interrupts\" to enable VFIO IOMMU support on this platform\n",
-			__func__);
-		ret = -EPERM;
-		goto out_detach;
-	}
-
 	/*
 	 * If the IOMMU can block non-coherent operations (ie PCIe TLPs with
 	 * no-snoop set) then VFIO always turns this feature on because on Intel
@@ -3159,6 +3164,7 @@ static const struct vfio_iommu_driver_ops vfio_iommu_driver_ops_type1 = {
 	.open			= vfio_iommu_type1_open,
 	.release		= vfio_iommu_type1_release,
 	.ioctl			= vfio_iommu_type1_ioctl,
+	.device_ok		= vfio_iommu_device_ok,
 	.attach_group		= vfio_iommu_type1_attach_group,
 	.detach_group		= vfio_iommu_type1_detach_group,
 	.pin_pages		= vfio_iommu_type1_pin_pages,
On 2022-04-07 20:08, Jason Gunthorpe wrote:
> On Thu, Apr 07, 2022 at 07:02:03PM +0100, Robin Murphy wrote:
> [...]
>> it's not in my way so I'm leaving it be for now). I'll have to rebase that
>> anyway, so merging this as-is is absolutely fine!
>
> This might help your effort - after this series and this below there
> are no 'bus' users of iommu_capable left at all.

Thanks, but I still need a device for the iommu_domain_alloc() as well, so
at that point the interrupt check is OK to stay where it is. I figured out
a locking strategy per my original idea that seems pretty clean, it just
needs vfio_group_viable() to go away first:

https://gitlab.arm.com/linux-arm/linux-rm/-/commit/c6057da9f6b5f4b0fb67c6e647d2f8f76a6177fc

Cheers,
Robin.
> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Friday, April 8, 2022 3:08 AM
>
> On Thu, Apr 07, 2022 at 07:02:03PM +0100, Robin Murphy wrote:
> [...]
> > it's not in my way so I'm leaving it be for now). I'll have to rebase that
> > anyway, so merging this as-is is absolutely fine!
>
> This might help your effort - after this series and this below there
> are no 'bus' users of iommu_capable left at all.

Out of curiosity, while iommu_capable is being moved to a per-device
interface what about irq_domain_check_msi_remap() below (which is also
a global check)?

> +static int vfio_iommu_device_ok(void *iommu_data, struct device *device)
> +{
> +	bool msi_remap;
> +
> +	msi_remap = irq_domain_check_msi_remap() ||
> +		    iommu_capable(device->bus, IOMMU_CAP_INTR_REMAP);
> +
On 2022-04-08 10:08, Tian, Kevin wrote:
>> From: Jason Gunthorpe <jgg@nvidia.com>
>> Sent: Friday, April 8, 2022 3:08 AM
>> [...]
>> This might help your effort - after this series and this below there
>> are no 'bus' users of iommu_capable left at all.
>
> Out of curiosity, while iommu_capable is being moved to a per-device
> interface what about irq_domain_check_msi_remap() below (which is also
> a global check)?

I suppose it could if anyone cared enough to make the effort - probably
a case of resolving specific MSI domains for every device in the group,
and potentially having to deal with hotplug later as well. Realistically,
though, I wouldn't expect systems to have mixed capabilities in that
regard (i.e. where the check would return false even though *some*
domains support remapping), so there doesn't seem to be any pressing
need to relax it.

Cheers,
Robin.

>> +static int vfio_iommu_device_ok(void *iommu_data, struct device *device)
>> +{
>> +	bool msi_remap;
>> +
>> +	msi_remap = irq_domain_check_msi_remap() ||
>> +		    iommu_capable(device->bus, IOMMU_CAP_INTR_REMAP);
>> +
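For illustration, a per-device version of that check might be composed from existing helpers roughly like this; the combination is hypothetical and ignores the hotplug complication Robin mentions:

#include <linux/device.h>
#include <linux/iommu.h>
#include <linux/irqdomain.h>

/* Hypothetical per-device replacement for the global msi_remap check */
static bool device_has_msi_remap(struct device *dev)
{
	struct irq_domain *d = dev_get_msi_domain(dev);

	/* Remapping may come from the MSI irq domain or from the IOMMU */
	return (d && irq_domain_hierarchical_is_msi_remap(d)) ||
	       iommu_capable(dev->bus, IOMMU_CAP_INTR_REMAP);
}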
On Thu, Apr 07, 2022 at 08:27:05PM +0100, Robin Murphy wrote:
> On 2022-04-07 20:08, Jason Gunthorpe wrote:
> [...]
> > This might help your effort - after this series and this below there
> > are no 'bus' users of iommu_capable left at all.
>
> Thanks, but I still need a device for the iommu_domain_alloc() as well, so
> at that point the interrupt check is OK to stay where it is.

It is a simple enough change that could avoid introducing the
device_iommu_capable() at all perhaps.

> I figured out a locking strategy per my original idea that seems
> pretty clean, it just needs vfio_group_viable() to go away first:

I think this should be more like:

	struct vfio_device *vdev;

	mutex_lock(&group->device_lock);
	vdev = list_first_entry(&group->device_list, struct vfio_device,
				group_next);
	ret = driver->ops->attach_group(data, group->iommu_group,
					group->type, vdev->dev);
	mutex_unlock(&group->device_lock);

Then don't do the iommu_group_for_each_dev() at all.

The problem with iommu_group_for_each_dev() is that it may select a
struct device that does not have a vfio_device bound to it, so we
would be using a random struct device that is not protected by any
VFIO device_driver.

However, this creates an oddball situation where the vfio_device and
its struct device could become unplugged from the system while the
domain that the struct device spawned continues to exist and remains
attached to other devices in the same group. ie the iommu driver has
to be careful not to retain the struct device input..

I suppose that is inevitable to have sharing of domains across
devices, so the iommu drivers will have to accommodate this.

Jason
On 2022-04-08 13:18, Jason Gunthorpe wrote:
> On Thu, Apr 07, 2022 at 08:27:05PM +0100, Robin Murphy wrote:
> [...]
>> Thanks, but I still need a device for the iommu_domain_alloc() as well, so
>> at that point the interrupt check is OK to stay where it is.
>
> It is a simple enough change that could avoid introducing the
> device_iommu_capable() at all perhaps.
>
>> I figured out a locking strategy per my original idea that seems
>> pretty clean, it just needs vfio_group_viable() to go away first:
>
> I think this should be more like:
>
>	struct vfio_device *vdev;
>
>	mutex_lock(&group->device_lock);
>	vdev = list_first_entry(&group->device_list, struct vfio_device,
>				group_next);
>	ret = driver->ops->attach_group(data, group->iommu_group,
>					group->type, vdev->dev);
>	mutex_unlock(&group->device_lock);
>
> Then don't do the iommu_group_for_each_dev() at all.
>
> The problem with iommu_group_for_each_dev() is that it may select a
> struct device that does not have a vfio_device bound to it, so we
> would be using a random struct device that is not protected by any
> VFIO device_driver.

Yeah, I was just looking at the final puzzle piece of making sure to nab
the actual VFIO device rather than some unbound device that's just along
for the ride... If I can't come up with anything more self-contained I'll
take this suggestion, thanks.

> However, this creates an oddball situation where the vfio_device and
> its struct device could become unplugged from the system while the
> domain that the struct device spawned continues to exist and remains
> attached to other devices in the same group. ie the iommu driver has
> to be careful not to retain the struct device input..

Oh, I rather assumed that VFIO might automatically tear down the
container/domain when the last real user disappears. I don't see there
being an obvious problem if the domain does hang around with residual
non-VFIO devices attached until userspace explicitly closes all the
relevant handles, as long as we take care not to release DMA ownership
until that point also. As you say, it just looks a bit funny.

> I suppose that is inevitable to have sharing of domains across
> devices, so the iommu drivers will have to accommodate this.

I think domain lifecycle management is already entirely up to the users
and not something that IOMMU drivers need to worry about. Drivers should
only need to look at per-device data in attach/detach (and, once I've
finished, alloc) from the device argument which can be assumed to be
valid at that point. Otherwise, all the relevant internal data for domain
ops should belong to the domain already.

Cheers,
Robin.
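The pattern Robin describes would look roughly like this in a driver; every my_iommu_* name here is invented for illustration:

#include <linux/iommu.h>
#include <linux/types.h>

struct my_iommu_master { u32 stream_id; };	/* filled in at probe time */
struct my_iommu_domain {
	struct iommu_domain domain;
	/* page table root, ASID, etc. - deliberately no struct device */
};

static int my_iommu_install_mapping(struct my_iommu_domain *md,
				    struct my_iommu_master *master)
{
	/* program master->stream_id into md's tables; elided */
	return 0;
}

static int my_iommu_attach_dev(struct iommu_domain *domain, struct device *dev)
{
	/* per-device data stashed via dev_iommu_priv_set() at probe */
	struct my_iommu_master *master = dev_iommu_priv_get(dev);
	struct my_iommu_domain *md =
		container_of(domain, struct my_iommu_domain, domain);

	/* use master/md here; retaining dev itself would be a bug */
	return my_iommu_install_mapping(md, master);
}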
On Fri, Apr 08, 2022 at 02:11:10PM +0100, Robin Murphy wrote:
> > However, this creates an oddball situation where the vfio_device and
> > its struct device could become unplugged from the system while the
> > domain that the struct device spawned continues to exist and remains
> > attached to other devices in the same group. ie the iommu driver has
> > to be careful not to retain the struct device input..
>
> Oh, I rather assumed that VFIO might automatically tear down the
> container/domain when the last real user disappears.

It does, that isn't quite what I mean..

Let's say a simple case with two groups and two devices.

Open a VFIO container FD.

We open group A and SET_CONTAINER it. This results in:

	domain_A = iommu_domain_alloc(device_A);
	iommu_attach_group(domain_A, device_A->group);

We open group B and SET_CONTAINER it. Using the sharing logic we end up
doing:

	iommu_attach_group(domain_A, device_B->group);

Now we close group A's FD, detach device_A->group from domain_A, and the
driver core hot-unplugs device A completely from the system.

However, domain_A remains in the system used by group B's open FD.

It is a bit funny at least.. I think it is just something to document
and be aware of for iommu driver writers that they probably shouldn't
try to store the allocation device in their domain struct.

IMHO the only purpose of the allocation device is to crystalize the
configuration of the iommu_domain at allocation time.

> as long as we take care not to release DMA ownership until that point
> also. As you say, it just looks a bit funny.

The DMA ownership should be OK as we take ownership on each group FD
open.

> > I suppose that is inevitable to have sharing of domains across
> > devices, so the iommu drivers will have to accommodate this.
>
> I think domain lifecycle management is already entirely up to the users
> and not something that IOMMU drivers need to worry about. Drivers should
> only need to look at per-device data in attach/detach (and, once I've
> finished, alloc) from the device argument which can be assumed to be
> valid at that point. Otherwise, all the relevant internal data for
> domain ops should belong to the domain already.

Making attach/detach take a struct device would be nice - but I would
expect the attach/detach to use a strictly paired struct device and I
don't think this trick of selecting an arbitrary vfio_device will
achieve that.

So, I suppose VFIO would want to attach/detach on every vfio_device
individually and it would iterate over the group instead of doing a
list_first_entry() like above. This would not be hard to do in VFIO.

Not sure what the iommu layer would have to do to accommodate this..

Jason
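That iteration might look like the following sketch. vfio_group's device_list/device_lock fields match this era's drivers/vfio/vfio.c, while iommu_attach_device_paired()/iommu_detach_device_paired() are hypothetical names for the strictly paired per-device API being discussed:

#include <linux/list.h>
#include <linux/mutex.h>

/* Sketch: attach every bound vfio_device rather than an arbitrary one */
static int vfio_attach_each_device(struct iommu_domain *domain,
				   struct vfio_group *group)
{
	struct vfio_device *vdev, *undo;
	int ret = 0;

	mutex_lock(&group->device_lock);
	list_for_each_entry(vdev, &group->device_list, group_next) {
		ret = iommu_attach_device_paired(domain, vdev->dev);
		if (ret)
			goto unwind;
	}
	mutex_unlock(&group->device_lock);
	return 0;

unwind:
	/* detach only the devices that were successfully attached */
	list_for_each_entry(undo, &group->device_list, group_next) {
		if (undo == vdev)
			break;
		iommu_detach_device_paired(domain, undo->dev);
	}
	mutex_unlock(&group->device_lock);
	return ret;
}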
On 2022-04-08 14:35, Jason Gunthorpe wrote:
> On Fri, Apr 08, 2022 at 02:11:10PM +0100, Robin Murphy wrote:
>>> However, this creates an oddball situation where the vfio_device and
>>> its struct device could become unplugged from the system while the
>>> domain that the struct device spawned continues to exist and remains
>>> attached to other devices in the same group. ie the iommu driver has
>>> to be careful not to retain the struct device input..
>>
>> Oh, I rather assumed that VFIO might automatically tear down the
>> container/domain when the last real user disappears.
>
> It does, that isn't quite what I mean..
>
> Let's say a simple case with two groups and two devices.
> [...]
> It is a bit funny at least.. I think it is just something to document
> and be aware of for iommu driver writers that they probably shouldn't
> try to store the allocation device in their domain struct.
>
> IMHO the only purpose of the allocation device is to crystalize the
> configuration of the iommu_domain at allocation time.

Oh, for sure. When I implement the API switch, I can certainly try to
document it as clearly as possible that the device argument is only for
resolving the correct IOMMU ops and target instance, and the resulting
domain is still not in any way tied to that specific device. I hadn't
thought about how it might look to future developers who aren't already
familiar with all the history here, so thanks for the nudge!

>> as long as we take care not to release DMA ownership until that point
>> also. As you say, it just looks a bit funny.
>
> The DMA ownership should be OK as we take ownership on each group FD
> open.
>
>>> I suppose that is inevitable to have sharing of domains across
>>> devices, so the iommu drivers will have to accommodate this.
>>
>> I think domain lifecycle management is already entirely up to the users
>> and not something that IOMMU drivers need to worry about. Drivers should
>> only need to look at per-device data in attach/detach (and, once I've
>> finished, alloc) from the device argument which can be assumed to be
>> valid at that point. Otherwise, all the relevant internal data for
>> domain ops should belong to the domain already.
>
> Making attach/detach take a struct device would be nice - but I would
> expect the attach/detach to use a strictly paired struct device and I
> don't think this trick of selecting an arbitrary vfio_device will
> achieve that.
>
> So, I suppose VFIO would want to attach/detach on every vfio_device
> individually and it would iterate over the group instead of doing a
> list_first_entry() like above. This would not be hard to do in VFIO.

It feels like we've already beaten that discussion to death in other
threads; regardless of what exact argument the iommu_attach/detach
operations end up taking, they have to operate on the whole (explicit or
implicit) iommu_group at once, because doing anything else would defeat
the point of isolation groups, and be impossible for alias groups.

> Not sure what the iommu layer would have to do to accommodate this..

If it's significantly easier for VFIO to just run through a whole list of
devices and attach each one without having to keep track of whether they
might share an iommu_group which has already been attached, then we can
probably relax the API a little such that attaching to a domain which is
already the current domain becomes a no-op instead of returning -EBUSY,
but I'd rather not create an expectation that anyone *has* to do that.
For any other callers that would be forcing *more* iommu_group
implementation details onto them, when we all want less.

Cheers,
Robin.
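For reference, that relaxation could be as small as a one-line early-out in the core attach path. This sketch is modelled on the 5.18-era __iommu_attach_group() in drivers/iommu/iommu.c; only the first check is new, and the final form is an assumption:

static int __iommu_attach_group(struct iommu_domain *domain,
				struct iommu_group *group)
{
	int ret;

	/* new: re-attaching the current domain is a no-op, not -EBUSY */
	if (group->domain == domain)
		return 0;

	if (group->domain && group->domain != group->default_domain)
		return -EBUSY;

	ret = __iommu_group_for_each_dev(group, domain,
					 iommu_group_do_attach_device);
	if (ret == 0)
		group->domain = domain;
	return ret;
}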
> From: Robin Murphy <robin.murphy@arm.com>
> Sent: Friday, April 8, 2022 6:12 PM
>
> On 2022-04-08 10:08, Tian, Kevin wrote:
> [...]
> > Out of curiosity, while iommu_capable is being moved to a per-device
> > interface what about irq_domain_check_msi_remap() below (which is also
> > a global check)?
>
> I suppose it could if anyone cared enough to make the effort - probably
> a case of resolving specific MSI domains for every device in the group,
> and potentially having to deal with hotplug later as well. Realistically,
> though, I wouldn't expect systems to have mixed capabilities in that
> regard (i.e. where the check would return false even though *some*
> domains support remapping), so there doesn't seem to be any pressing
> need to relax it.

Yes, that makes sense if there is no such example so far.
> From: Robin Murphy <robin.murphy@arm.com>
> Sent: Saturday, April 9, 2022 1:45 AM
>
> > So, I suppose VFIO would want to attach/detach on every vfio_device
> > individually and it would iterate over the group instead of doing a
> > list_first_entry() like above. This would not be hard to do in VFIO.
>
> It feels like we've already beaten that discussion to death in other
> threads; regardless of what exact argument the iommu_attach/detach
> operations end up taking, they have to operate on the whole (explicit
> or implicit) iommu_group at once, because doing anything else would
> defeat the point of isolation groups, and be impossible for alias
> groups.

Could you add a few words about what an 'alias group' is?