Message ID | 20190325013036.18400-8-baolu.lu@linux.intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | vfio/mdev: IOMMU aware mediated device | expand |
On 3/25/2019 7:00 AM, Lu Baolu wrote: > A parent device might create different types of mediated > devices. For example, a mediated device could be created > by the parent device with full isolation and protection > provided by the IOMMU. One usage case could be found on > Intel platforms where a mediated device is an assignable > subset of a PCI, the DMA requests on behalf of it are all > tagged with a PASID. Since IOMMU supports PASID-granular > translations (scalable mode in VT-d 3.0), this mediated > device could be individually protected and isolated by an > IOMMU. > > This patch adds a new member in the struct mdev_device to > indicate that the mediated device represented by mdev could > be isolated and protected by attaching a domain to a device > represented by mdev->iommu_device. It also adds a helper to > add or set the iommu device. > > * mdev_device->iommu_device > - This, if set, indicates that the mediated device could > be fully isolated and protected by IOMMU via attaching > an iommu domain to this device. If empty, it indicates > using vendor defined isolation, hence bypass IOMMU. > > * mdev_set/get_iommu_device(dev, iommu_device) > - Set or get the iommu device which represents this mdev > in IOMMU's device scope. Drivers don't need to set the > iommu device if it uses vendor defined isolation. > > Cc: Ashok Raj <ashok.raj@intel.com> > Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> > Cc: Kevin Tian <kevin.tian@intel.com> > Cc: Liu Yi L <yi.l.liu@intel.com> > Suggested-by: Kevin Tian <kevin.tian@intel.com> > Suggested-by: Alex Williamson <alex.williamson@redhat.com> > Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> > Reviewed-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> > --- > drivers/vfio/mdev/mdev_core.c | 18 ++++++++++++++++++ > drivers/vfio/mdev/mdev_private.h | 1 + > include/linux/mdev.h | 14 ++++++++++++++ > 3 files changed, 33 insertions(+) > > diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c > index b96fedc77ee5..1b6435529166 100644 > --- a/drivers/vfio/mdev/mdev_core.c > +++ b/drivers/vfio/mdev/mdev_core.c > @@ -390,6 +390,24 @@ int mdev_device_remove(struct device *dev, bool force_remove) > return 0; > } > > +int mdev_set_iommu_device(struct device *dev, struct device *iommu_device) > +{ > + struct mdev_device *mdev = to_mdev_device(dev); > + > + mdev->iommu_device = iommu_device; > + > + return 0; > +} > +EXPORT_SYMBOL(mdev_set_iommu_device); > + > +struct device *mdev_get_iommu_device(struct device *dev) > +{ > + struct mdev_device *mdev = to_mdev_device(dev); > + > + return mdev->iommu_device; > +} > +EXPORT_SYMBOL(mdev_get_iommu_device); > + > static int __init mdev_init(void) > { > return mdev_bus_register(); > diff --git a/drivers/vfio/mdev/mdev_private.h b/drivers/vfio/mdev/mdev_private.h > index 379758c52b1b..bfb7b22a7cb6 100644 > --- a/drivers/vfio/mdev/mdev_private.h > +++ b/drivers/vfio/mdev/mdev_private.h > @@ -34,6 +34,7 @@ struct mdev_device { > struct list_head next; > struct kobject *type_kobj; > bool active; > + struct device *iommu_device; > }; > > #define to_mdev_device(dev) container_of(dev, struct mdev_device, dev) > diff --git a/include/linux/mdev.h b/include/linux/mdev.h > index d7aee90e5da5..df2ea39f47ee 100644 > --- a/include/linux/mdev.h > +++ b/include/linux/mdev.h > @@ -15,6 +15,20 @@ > > struct mdev_device; > > +/* > + * Called by the parent device driver to set the device which represents > + * this mdev in iommu protection scope. By default, the iommu device is > + * NULL, that indicates using vendor defined isolation. > + * > + * @dev: the mediated device that iommu will isolate. > + * @iommu_device: a pci device which represents the iommu for @dev. > + * > + * Return 0 for success, otherwise negative error value. > + */ > +int mdev_set_iommu_device(struct device *dev, struct device *iommu_device); > + > +struct device *mdev_get_iommu_device(struct device *dev); > + > /** > * struct mdev_parent_ops - Structure to be registered for each parent device to > * register the device to mdev module. > Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com> Thanks, Kirti
On Mon, 25 Mar 2019 09:30:34 +0800 Lu Baolu <baolu.lu@linux.intel.com> wrote: > A parent device might create different types of mediated > devices. For example, a mediated device could be created > by the parent device with full isolation and protection > provided by the IOMMU. One usage case could be found on > Intel platforms where a mediated device is an assignable > subset of a PCI, the DMA requests on behalf of it are all > tagged with a PASID. Since IOMMU supports PASID-granular > translations (scalable mode in VT-d 3.0), this mediated > device could be individually protected and isolated by an > IOMMU. > > This patch adds a new member in the struct mdev_device to > indicate that the mediated device represented by mdev could > be isolated and protected by attaching a domain to a device > represented by mdev->iommu_device. It also adds a helper to > add or set the iommu device. > > * mdev_device->iommu_device > - This, if set, indicates that the mediated device could > be fully isolated and protected by IOMMU via attaching > an iommu domain to this device. If empty, it indicates > using vendor defined isolation, hence bypass IOMMU. > > * mdev_set/get_iommu_device(dev, iommu_device) > - Set or get the iommu device which represents this mdev > in IOMMU's device scope. Drivers don't need to set the > iommu device if it uses vendor defined isolation. > > Cc: Ashok Raj <ashok.raj@intel.com> > Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> > Cc: Kevin Tian <kevin.tian@intel.com> > Cc: Liu Yi L <yi.l.liu@intel.com> > Suggested-by: Kevin Tian <kevin.tian@intel.com> > Suggested-by: Alex Williamson <alex.williamson@redhat.com> > Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> > Reviewed-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> > --- > drivers/vfio/mdev/mdev_core.c | 18 ++++++++++++++++++ > drivers/vfio/mdev/mdev_private.h | 1 + > include/linux/mdev.h | 14 ++++++++++++++ > 3 files changed, 33 insertions(+) Acked-by: Alex Williamson <alex.williamson@redhat.com> > diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c > index b96fedc77ee5..1b6435529166 100644 > --- a/drivers/vfio/mdev/mdev_core.c > +++ b/drivers/vfio/mdev/mdev_core.c > @@ -390,6 +390,24 @@ int mdev_device_remove(struct device *dev, bool force_remove) > return 0; > } > > +int mdev_set_iommu_device(struct device *dev, struct device *iommu_device) > +{ > + struct mdev_device *mdev = to_mdev_device(dev); > + > + mdev->iommu_device = iommu_device; > + > + return 0; > +} > +EXPORT_SYMBOL(mdev_set_iommu_device); > + > +struct device *mdev_get_iommu_device(struct device *dev) > +{ > + struct mdev_device *mdev = to_mdev_device(dev); > + > + return mdev->iommu_device; > +} > +EXPORT_SYMBOL(mdev_get_iommu_device); > + > static int __init mdev_init(void) > { > return mdev_bus_register(); > diff --git a/drivers/vfio/mdev/mdev_private.h b/drivers/vfio/mdev/mdev_private.h > index 379758c52b1b..bfb7b22a7cb6 100644 > --- a/drivers/vfio/mdev/mdev_private.h > +++ b/drivers/vfio/mdev/mdev_private.h > @@ -34,6 +34,7 @@ struct mdev_device { > struct list_head next; > struct kobject *type_kobj; > bool active; > + struct device *iommu_device; > }; > > #define to_mdev_device(dev) container_of(dev, struct mdev_device, dev) > diff --git a/include/linux/mdev.h b/include/linux/mdev.h > index d7aee90e5da5..df2ea39f47ee 100644 > --- a/include/linux/mdev.h > +++ b/include/linux/mdev.h > @@ -15,6 +15,20 @@ > > struct mdev_device; > > +/* > + * Called by the parent device driver to set the device which represents > + * this mdev in iommu protection scope. By default, the iommu device is > + * NULL, that indicates using vendor defined isolation. > + * > + * @dev: the mediated device that iommu will isolate. > + * @iommu_device: a pci device which represents the iommu for @dev. > + * > + * Return 0 for success, otherwise negative error value. > + */ > +int mdev_set_iommu_device(struct device *dev, struct device *iommu_device); > + > +struct device *mdev_get_iommu_device(struct device *dev); > + > /** > * struct mdev_parent_ops - Structure to be registered for each parent device to > * register the device to mdev module.
> -----Original Message----- > From: linux-kernel-owner@vger.kernel.org <linux-kernel- > owner@vger.kernel.org> On Behalf Of Kirti Wankhede > Sent: Tuesday, March 26, 2019 4:33 AM > To: Lu Baolu <baolu.lu@linux.intel.com>; Joerg Roedel <joro@8bytes.org>; > David Woodhouse <dwmw2@infradead.org>; Alex Williamson > <alex.williamson@redhat.com> > Cc: ashok.raj@intel.com; sanjay.k.kumar@intel.com; > jacob.jun.pan@intel.com; kevin.tian@intel.com; Jean-Philippe Brucker <jean- > philippe.brucker@arm.com>; yi.l.liu@intel.com; yi.y.sun@intel.com; > peterx@redhat.com; tiwei.bie@intel.com; xin.zeng@intel.com; > iommu@lists.linux-foundation.org; kvm@vger.kernel.org; linux- > kernel@vger.kernel.org; Jacob Pan <jacob.jun.pan@linux.intel.com>; Neo Jia > <cjia@nvidia.com> > Subject: Re: [PATCH v8 7/9] vfio/mdev: Add iommu related member in > mdev_device > > > > On 3/25/2019 7:00 AM, Lu Baolu wrote: > > A parent device might create different types of mediated devices. For > > example, a mediated device could be created by the parent device with > > full isolation and protection provided by the IOMMU. One usage case > > could be found on Intel platforms where a mediated device is an > > assignable subset of a PCI, the DMA requests on behalf of it are all > > tagged with a PASID. Since IOMMU supports PASID-granular translations > > (scalable mode in VT-d 3.0), this mediated device could be > > individually protected and isolated by an IOMMU. > > > > This patch adds a new member in the struct mdev_device to indicate > > that the mediated device represented by mdev could be isolated and > > protected by attaching a domain to a device represented by > > mdev->iommu_device. It also adds a helper to add or set the iommu > > device. > > > > * mdev_device->iommu_device > > - This, if set, indicates that the mediated device could > > be fully isolated and protected by IOMMU via attaching > > an iommu domain to this device. If empty, it indicates > > using vendor defined isolation, hence bypass IOMMU. > > > > * mdev_set/get_iommu_device(dev, iommu_device) > > - Set or get the iommu device which represents this mdev > > in IOMMU's device scope. Drivers don't need to set the > > iommu device if it uses vendor defined isolation. > > > > Cc: Ashok Raj <ashok.raj@intel.com> > > Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> > > Cc: Kevin Tian <kevin.tian@intel.com> > > Cc: Liu Yi L <yi.l.liu@intel.com> > > Suggested-by: Kevin Tian <kevin.tian@intel.com> > > Suggested-by: Alex Williamson <alex.williamson@redhat.com> > > Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> > > Reviewed-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> > > --- > > drivers/vfio/mdev/mdev_core.c | 18 ++++++++++++++++++ > > drivers/vfio/mdev/mdev_private.h | 1 + > > include/linux/mdev.h | 14 ++++++++++++++ > > 3 files changed, 33 insertions(+) > > > > diff --git a/drivers/vfio/mdev/mdev_core.c > > b/drivers/vfio/mdev/mdev_core.c index b96fedc77ee5..1b6435529166 > > 100644 > > --- a/drivers/vfio/mdev/mdev_core.c > > +++ b/drivers/vfio/mdev/mdev_core.c > > @@ -390,6 +390,24 @@ int mdev_device_remove(struct device *dev, bool > force_remove) > > return 0; > > } > > > > +int mdev_set_iommu_device(struct device *dev, struct device > > +*iommu_device) { > > + struct mdev_device *mdev = to_mdev_device(dev); > > + > > + mdev->iommu_device = iommu_device; > > + > > + return 0; > > +} > > +EXPORT_SYMBOL(mdev_set_iommu_device); > > + > > +struct device *mdev_get_iommu_device(struct device *dev) { > > + struct mdev_device *mdev = to_mdev_device(dev); > > + > > + return mdev->iommu_device; > > +} > > +EXPORT_SYMBOL(mdev_get_iommu_device); > > + > > static int __init mdev_init(void) > > { > > return mdev_bus_register(); > > diff --git a/drivers/vfio/mdev/mdev_private.h > > b/drivers/vfio/mdev/mdev_private.h > > index 379758c52b1b..bfb7b22a7cb6 100644 > > --- a/drivers/vfio/mdev/mdev_private.h > > +++ b/drivers/vfio/mdev/mdev_private.h > > @@ -34,6 +34,7 @@ struct mdev_device { > > struct list_head next; > > struct kobject *type_kobj; > > bool active; > > + struct device *iommu_device; > > }; > > This is not a performance path, but it is a good practice to create naturally aligned/packed structures. You should define struct device *iommu_device; before bool active.
On Wed, 27 Mar 2019 14:17:57 +0000 Parav Pandit <parav@mellanox.com> wrote: > > -----Original Message----- > > From: linux-kernel-owner@vger.kernel.org <linux-kernel- > > owner@vger.kernel.org> On Behalf Of Kirti Wankhede > > Sent: Tuesday, March 26, 2019 4:33 AM > > To: Lu Baolu <baolu.lu@linux.intel.com>; Joerg Roedel <joro@8bytes.org>; > > David Woodhouse <dwmw2@infradead.org>; Alex Williamson > > <alex.williamson@redhat.com> > > Cc: ashok.raj@intel.com; sanjay.k.kumar@intel.com; > > jacob.jun.pan@intel.com; kevin.tian@intel.com; Jean-Philippe Brucker <jean- > > philippe.brucker@arm.com>; yi.l.liu@intel.com; yi.y.sun@intel.com; > > peterx@redhat.com; tiwei.bie@intel.com; xin.zeng@intel.com; > > iommu@lists.linux-foundation.org; kvm@vger.kernel.org; linux- > > kernel@vger.kernel.org; Jacob Pan <jacob.jun.pan@linux.intel.com>; Neo Jia > > <cjia@nvidia.com> > > Subject: Re: [PATCH v8 7/9] vfio/mdev: Add iommu related member in > > mdev_device > > > > > > > > On 3/25/2019 7:00 AM, Lu Baolu wrote: > > > A parent device might create different types of mediated devices. For > > > example, a mediated device could be created by the parent device with > > > full isolation and protection provided by the IOMMU. One usage case > > > could be found on Intel platforms where a mediated device is an > > > assignable subset of a PCI, the DMA requests on behalf of it are all > > > tagged with a PASID. Since IOMMU supports PASID-granular translations > > > (scalable mode in VT-d 3.0), this mediated device could be > > > individually protected and isolated by an IOMMU. > > > > > > This patch adds a new member in the struct mdev_device to indicate > > > that the mediated device represented by mdev could be isolated and > > > protected by attaching a domain to a device represented by > > > mdev->iommu_device. It also adds a helper to add or set the iommu > > > device. > > > > > > * mdev_device->iommu_device > > > - This, if set, indicates that the mediated device could > > > be fully isolated and protected by IOMMU via attaching > > > an iommu domain to this device. If empty, it indicates > > > using vendor defined isolation, hence bypass IOMMU. > > > > > > * mdev_set/get_iommu_device(dev, iommu_device) > > > - Set or get the iommu device which represents this mdev > > > in IOMMU's device scope. Drivers don't need to set the > > > iommu device if it uses vendor defined isolation. > > > > > > Cc: Ashok Raj <ashok.raj@intel.com> > > > Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> > > > Cc: Kevin Tian <kevin.tian@intel.com> > > > Cc: Liu Yi L <yi.l.liu@intel.com> > > > Suggested-by: Kevin Tian <kevin.tian@intel.com> > > > Suggested-by: Alex Williamson <alex.williamson@redhat.com> > > > Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> > > > Reviewed-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> > > > --- > > > drivers/vfio/mdev/mdev_core.c | 18 ++++++++++++++++++ > > > drivers/vfio/mdev/mdev_private.h | 1 + > > > include/linux/mdev.h | 14 ++++++++++++++ > > > 3 files changed, 33 insertions(+) > > > > > > diff --git a/drivers/vfio/mdev/mdev_core.c > > > b/drivers/vfio/mdev/mdev_core.c index b96fedc77ee5..1b6435529166 > > > 100644 > > > --- a/drivers/vfio/mdev/mdev_core.c > > > +++ b/drivers/vfio/mdev/mdev_core.c > > > @@ -390,6 +390,24 @@ int mdev_device_remove(struct device *dev, bool > > force_remove) > > > return 0; > > > } > > > > > > +int mdev_set_iommu_device(struct device *dev, struct device > > > +*iommu_device) { > > > + struct mdev_device *mdev = to_mdev_device(dev); > > > + > > > + mdev->iommu_device = iommu_device; > > > + > > > + return 0; > > > +} > > > +EXPORT_SYMBOL(mdev_set_iommu_device); > > > + > > > +struct device *mdev_get_iommu_device(struct device *dev) { > > > + struct mdev_device *mdev = to_mdev_device(dev); > > > + > > > + return mdev->iommu_device; > > > +} > > > +EXPORT_SYMBOL(mdev_get_iommu_device); > > > + > > > static int __init mdev_init(void) > > > { > > > return mdev_bus_register(); > > > diff --git a/drivers/vfio/mdev/mdev_private.h > > > b/drivers/vfio/mdev/mdev_private.h > > > index 379758c52b1b..bfb7b22a7cb6 100644 > > > --- a/drivers/vfio/mdev/mdev_private.h > > > +++ b/drivers/vfio/mdev/mdev_private.h > > > @@ -34,6 +34,7 @@ struct mdev_device { > > > struct list_head next; > > > struct kobject *type_kobj; > > > bool active; > > > + struct device *iommu_device; > > > }; > > > > This is not a performance path, but it is a good practice to create naturally aligned/packed structures. > You should define struct device *iommu_device; before bool active. Agreed, if someone wants to fixup before commit or if there's another spin please do so, otherwise we can adjust it in a trivial follow-up patch. Thanks, Alex
On Mon, Mar 25, 2019 at 09:30:34AM +0800, Lu Baolu wrote: > A parent device might create different types of mediated > devices. For example, a mediated device could be created > by the parent device with full isolation and protection > provided by the IOMMU. One usage case could be found on > Intel platforms where a mediated device is an assignable > subset of a PCI, the DMA requests on behalf of it are all > tagged with a PASID. Since IOMMU supports PASID-granular > translations (scalable mode in VT-d 3.0), this mediated > device could be individually protected and isolated by an > IOMMU. > > This patch adds a new member in the struct mdev_device to > indicate that the mediated device represented by mdev could > be isolated and protected by attaching a domain to a device > represented by mdev->iommu_device. It also adds a helper to > add or set the iommu device. > > * mdev_device->iommu_device > - This, if set, indicates that the mediated device could > be fully isolated and protected by IOMMU via attaching > an iommu domain to this device. If empty, it indicates > using vendor defined isolation, hence bypass IOMMU. > > * mdev_set/get_iommu_device(dev, iommu_device) > - Set or get the iommu device which represents this mdev > in IOMMU's device scope. Drivers don't need to set the > iommu device if it uses vendor defined isolation. > > Cc: Ashok Raj <ashok.raj@intel.com> > Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> > Cc: Kevin Tian <kevin.tian@intel.com> > Cc: Liu Yi L <yi.l.liu@intel.com> > Suggested-by: Kevin Tian <kevin.tian@intel.com> > Suggested-by: Alex Williamson <alex.williamson@redhat.com> > Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> > Reviewed-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> > --- > drivers/vfio/mdev/mdev_core.c | 18 ++++++++++++++++++ > drivers/vfio/mdev/mdev_private.h | 1 + > include/linux/mdev.h | 14 ++++++++++++++ > 3 files changed, 33 insertions(+) > > diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c > index b96fedc77ee5..1b6435529166 100644 > +++ b/drivers/vfio/mdev/mdev_core.c > @@ -390,6 +390,24 @@ int mdev_device_remove(struct device *dev, bool force_remove) > return 0; > } > > +int mdev_set_iommu_device(struct device *dev, struct device *iommu_device) > +{ > + struct mdev_device *mdev = to_mdev_device(dev); > + > + mdev->iommu_device = iommu_device; > + > + return 0; > +} > +EXPORT_SYMBOL(mdev_set_iommu_device); I was looking at these functions when touching the mdev stuff and I have some concerns. 1) Please don't merge dead code. It is a year later and there is still no in-tree user for any of this. This is not our process. Even worse it was exported so it looks like this dead code is supporting out of tree modules. 2) Why is this like this? Every struct device already has a connection to the iommu layer and every mdev has a struct device all its own. Why did we need to add special 'if (mdev)' stuff all over the place? This smells like the same abuse Thomas and I pointed out for the interrupt domains. After my next series the mdev drivers will have direct access to the vfio_device. So an alternative to using the struct device, or adding 'if mdev' is to add an API to the vfio_device world to inject what iommu configuration is needed from that direction instead of trying to discover it from a struct device. 3) The vfio_bus_is_mdev() and related symbol_get() nonsense in drivers/vfio/vfio_iommu_type1.c has to go, for the same reasons it was not acceptable to do this for the interrupt side either. 4) It seems pretty clear to me this will be heavily impacted by the /dev/ioasid discussion. Please consider removing the dead code now. Basically, please fix this before trying to get idxd mdev merged as the first user. Jason
Hi Jason, On 4/7/21 4:00 AM, Jason Gunthorpe wrote: > On Mon, Mar 25, 2019 at 09:30:34AM +0800, Lu Baolu wrote: >> A parent device might create different types of mediated >> devices. For example, a mediated device could be created >> by the parent device with full isolation and protection >> provided by the IOMMU. One usage case could be found on >> Intel platforms where a mediated device is an assignable >> subset of a PCI, the DMA requests on behalf of it are all >> tagged with a PASID. Since IOMMU supports PASID-granular >> translations (scalable mode in VT-d 3.0), this mediated >> device could be individually protected and isolated by an >> IOMMU. >> >> This patch adds a new member in the struct mdev_device to >> indicate that the mediated device represented by mdev could >> be isolated and protected by attaching a domain to a device >> represented by mdev->iommu_device. It also adds a helper to >> add or set the iommu device. >> >> * mdev_device->iommu_device >> - This, if set, indicates that the mediated device could >> be fully isolated and protected by IOMMU via attaching >> an iommu domain to this device. If empty, it indicates >> using vendor defined isolation, hence bypass IOMMU. >> >> * mdev_set/get_iommu_device(dev, iommu_device) >> - Set or get the iommu device which represents this mdev >> in IOMMU's device scope. Drivers don't need to set the >> iommu device if it uses vendor defined isolation. >> >> Cc: Ashok Raj <ashok.raj@intel.com> >> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> >> Cc: Kevin Tian <kevin.tian@intel.com> >> Cc: Liu Yi L <yi.l.liu@intel.com> >> Suggested-by: Kevin Tian <kevin.tian@intel.com> >> Suggested-by: Alex Williamson <alex.williamson@redhat.com> >> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> >> Reviewed-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> >> --- >> drivers/vfio/mdev/mdev_core.c | 18 ++++++++++++++++++ >> drivers/vfio/mdev/mdev_private.h | 1 + >> include/linux/mdev.h | 14 ++++++++++++++ >> 3 files changed, 33 insertions(+) >> >> diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c >> index b96fedc77ee5..1b6435529166 100644 >> +++ b/drivers/vfio/mdev/mdev_core.c >> @@ -390,6 +390,24 @@ int mdev_device_remove(struct device *dev, bool force_remove) >> return 0; >> } >> >> +int mdev_set_iommu_device(struct device *dev, struct device *iommu_device) >> +{ >> + struct mdev_device *mdev = to_mdev_device(dev); >> + >> + mdev->iommu_device = iommu_device; >> + >> + return 0; >> +} >> +EXPORT_SYMBOL(mdev_set_iommu_device); > > I was looking at these functions when touching the mdev stuff and I > have some concerns. > > 1) Please don't merge dead code. It is a year later and there is still > no in-tree user for any of this. This is not our process. Even > worse it was exported so it looks like this dead code is supporting > out of tree modules. > > 2) Why is this like this? Every struct device already has a connection > to the iommu layer and every mdev has a struct device all its own. > > Why did we need to add special 'if (mdev)' stuff all over the > place? This smells like the same abuse Thomas > and I pointed out for the interrupt domains. I've ever tried to implement a bus iommu_ops for mdev devices. https://lore.kernel.org/lkml/20201030045809.957927-1-baolu.lu@linux.intel.com/ Any comments? Best regards, baolu > > After my next series the mdev drivers will have direct access to > the vfio_device. So an alternative to using the struct device, or > adding 'if mdev' is to add an API to the vfio_device world to > inject what iommu configuration is needed from that direction > instead of trying to discover it from a struct device. > > 3) The vfio_bus_is_mdev() and related symbol_get() nonsense in > drivers/vfio/vfio_iommu_type1.c has to go, for the same reasons > it was not acceptable to do this for the interrupt side either. > > 4) It seems pretty clear to me this will be heavily impacted by the > /dev/ioasid discussion. Please consider removing the dead code now. > > Basically, please fix this before trying to get idxd mdev merged as > the first user. > > Jason >
On Wed, Apr 07, 2021 at 09:58:05AM +0800, Lu Baolu wrote: > I've ever tried to implement a bus iommu_ops for mdev devices. > > https://lore.kernel.org/lkml/20201030045809.957927-1-baolu.lu@linux.intel.com/ > > Any comments? You still have the symbol_get, so something continues to be wrong with that series: + mdev_bus = symbol_get(mdev_bus_type); + if (mdev_bus) { + if (bus == mdev_bus && !iommu_present(bus)) { + symbol_put(mdev_bus_type); Jason
Hi Jason, On 4/7/21 4:00 AM, Jason Gunthorpe wrote: > On Mon, Mar 25, 2019 at 09:30:34AM +0800, Lu Baolu wrote: >> A parent device might create different types of mediated >> devices. For example, a mediated device could be created >> by the parent device with full isolation and protection >> provided by the IOMMU. One usage case could be found on >> Intel platforms where a mediated device is an assignable >> subset of a PCI, the DMA requests on behalf of it are all >> tagged with a PASID. Since IOMMU supports PASID-granular >> translations (scalable mode in VT-d 3.0), this mediated >> device could be individually protected and isolated by an >> IOMMU. >> >> This patch adds a new member in the struct mdev_device to >> indicate that the mediated device represented by mdev could >> be isolated and protected by attaching a domain to a device >> represented by mdev->iommu_device. It also adds a helper to >> add or set the iommu device. >> >> * mdev_device->iommu_device >> - This, if set, indicates that the mediated device could >> be fully isolated and protected by IOMMU via attaching >> an iommu domain to this device. If empty, it indicates >> using vendor defined isolation, hence bypass IOMMU. >> >> * mdev_set/get_iommu_device(dev, iommu_device) >> - Set or get the iommu device which represents this mdev >> in IOMMU's device scope. Drivers don't need to set the >> iommu device if it uses vendor defined isolation. >> >> Cc: Ashok Raj <ashok.raj@intel.com> >> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> >> Cc: Kevin Tian <kevin.tian@intel.com> >> Cc: Liu Yi L <yi.l.liu@intel.com> >> Suggested-by: Kevin Tian <kevin.tian@intel.com> >> Suggested-by: Alex Williamson <alex.williamson@redhat.com> >> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> >> Reviewed-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> >> --- >> drivers/vfio/mdev/mdev_core.c | 18 ++++++++++++++++++ >> drivers/vfio/mdev/mdev_private.h | 1 + >> include/linux/mdev.h | 14 ++++++++++++++ >> 3 files changed, 33 insertions(+) >> >> diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c >> index b96fedc77ee5..1b6435529166 100644 >> +++ b/drivers/vfio/mdev/mdev_core.c >> @@ -390,6 +390,24 @@ int mdev_device_remove(struct device *dev, bool force_remove) >> return 0; >> } >> >> +int mdev_set_iommu_device(struct device *dev, struct device *iommu_device) >> +{ >> + struct mdev_device *mdev = to_mdev_device(dev); >> + >> + mdev->iommu_device = iommu_device; >> + >> + return 0; >> +} >> +EXPORT_SYMBOL(mdev_set_iommu_device); > > I was looking at these functions when touching the mdev stuff and I > have some concerns. > > 1) Please don't merge dead code. It is a year later and there is still > no in-tree user for any of this. This is not our process. Even > worse it was exported so it looks like this dead code is supporting > out of tree modules. > > 2) Why is this like this? Every struct device already has a connection > to the iommu layer and every mdev has a struct device all its own. > > Why did we need to add special 'if (mdev)' stuff all over the > place? This smells like the same abuse Thomas > and I pointed out for the interrupt domains. > > After my next series the mdev drivers will have direct access to > the vfio_device. So an alternative to using the struct device, or > adding 'if mdev' is to add an API to the vfio_device world to > inject what iommu configuration is needed from that direction > instead of trying to discover it from a struct device. Just want to make sure that I understand you correctly. We should use the existing IOMMU in-kernel APIs to connect mdev with the iommu subsystem, so that the upper lays don't need to use something like (if dev_is_mdev) to handle mdev differently. Do I get you correctly? > > 3) The vfio_bus_is_mdev() and related symbol_get() nonsense in > drivers/vfio/vfio_iommu_type1.c has to go, for the same reasons > it was not acceptable to do this for the interrupt side either. Yes. Agreed. I will look into it. > > 4) It seems pretty clear to me this will be heavily impacted by the > /dev/ioasid discussion. Please consider removing the dead code now. > > Basically, please fix this before trying to get idxd mdev merged as > the first user. > > Jason > Best regards, baolu
On Tue, May 11, 2021 at 02:56:05PM +0800, Lu Baolu wrote: > > After my next series the mdev drivers will have direct access to > > the vfio_device. So an alternative to using the struct device, or > > adding 'if mdev' is to add an API to the vfio_device world to > > inject what iommu configuration is needed from that direction > > instead of trying to discover it from a struct device. > > Just want to make sure that I understand you correctly. > > We should use the existing IOMMU in-kernel APIs to connect mdev with the > iommu subsystem, so that the upper lays don't need to use something > like (if dev_is_mdev) to handle mdev differently. Do I get you > correctly? After going through all the /dev/ioasid stuff I'm pretty convinced that none of the PASID use cases for mdev should need any iommu connection from the mdev_device - this is an artifact of trying to cram the vfio container and group model into the mdev world and is not good design. The PASID interfaces for /dev/ioasid should use the 'struct pci_device' for everything and never pass in a mdev_device to the iommu layer. /dev/ioasid should be designed to support this operation and is why I strongly want to see the actual vfio_device implementation handle the connection to the iommu layer and not keep trying to hack through building what is actually a vfio_device specific connection through the type1 container code. > > 3) The vfio_bus_is_mdev() and related symbol_get() nonsense in > > drivers/vfio/vfio_iommu_type1.c has to go, for the same reasons > > it was not acceptable to do this for the interrupt side either. > > Yes. Agreed. I will look into it. This will be harder, but the same logic applies - it serves to allow controlling an ioasid without involving the vfio_device. Jason
> From: Jason Gunthorpe <jgg@nvidia.com> > Sent: Wednesday, May 12, 2021 1:37 AM > > On Tue, May 11, 2021 at 02:56:05PM +0800, Lu Baolu wrote: > > > > After my next series the mdev drivers will have direct access to > > > the vfio_device. So an alternative to using the struct device, or > > > adding 'if mdev' is to add an API to the vfio_device world to > > > inject what iommu configuration is needed from that direction > > > instead of trying to discover it from a struct device. > > > > Just want to make sure that I understand you correctly. > > > > We should use the existing IOMMU in-kernel APIs to connect mdev with the > > iommu subsystem, so that the upper lays don't need to use something > > like (if dev_is_mdev) to handle mdev differently. Do I get you > > correctly? > > After going through all the /dev/ioasid stuff I'm pretty convinced > that none of the PASID use cases for mdev should need any iommu > connection from the mdev_device - this is an artifact of trying to > cram the vfio container and group model into the mdev world and is not > good design. > > The PASID interfaces for /dev/ioasid should use the 'struct > pci_device' for everything and never pass in a mdev_device to the > iommu layer. 'struct pci_device' -> 'struct device' since /dev/ioasid also needs to support non-pci devices? > > /dev/ioasid should be designed to support this operation and is why I > strongly want to see the actual vfio_device implementation handle the > connection to the iommu layer and not keep trying to hack through > building what is actually a vfio_device specific connection through > the type1 container code. > I assume the so-called connection here implies using iommu_attach_device to attach a device to an iommu domain. Did you suggest this connection must be done by the mdev driver which implements vfio_device and then passing iommu domain to /dev/ioasid when attaching the device to an IOASID? sort of like: ioctl(device_fd, VFIO_ATTACH_IOASID, ioasid, domain); If yes, this conflicts with one design in /dev/ioasid proposal that we're working on. In earlier discussion we agreed that each ioasid is associated to a singleton iommu domain and all devices that are attached to this ioasid with compatible iommu capabilities just share this domain. It implies that iommu domain is allocated at ATTACH_IOASID phase (e.g. when the 1st device is attached to an ioasid). Pre-allocating domain by vfio_device driver means that every device (SR-IOV or mdev) has its own domain thus cannot share ioasid then. Did I misunderstand your intention? Baolu and I discussed below draft proposal to avoid passing mdev_device to the iommu layer. Please check whether it makes sense: // for every device attached to an ioasid // mdev is represented by pasid (allocated by mdev driver) // pf/vf has INVALID_IOASID in pasid struct dev_info { struct list_head next; struct device *device; u32 pasid; } // for every allocated ioasid struct ioasid_info { // the handle to convey iommu operations struct iommu_domain *domain; // metadata for map/unmap struct rb_node dma_list; // the list of attached device struct dev_info *dev_list; ... } // called by VFIO/VDPA int ioasid_attach_device(struct *device, u32 ioasid, u32 pasid) { // allocate a new dev_info, filled with device/pasid // allocate iommu domain if it's the 1st attached device // check iommu compatibility if an domain already exists // attach the device to the iommu domain if (pasid == INVALID_IOASID) iommu_attach_device(domain, device); else iommu_aux_attach_device(domain, device, pasid); // add dev_info to the dev_list of ioasid_info } // when attaching PF/VF to an ioasid ioctl(device_fd, VFIO_ATTACH_IOASID, ioasid); -> get vfio_device of device_fd -> ioasid_attach_device(vfio_device->dev, ioasid, INVALID_IOASID); // when attaching a mdev to an ioasid ioctl(device_fd, VFIO_ATTACH_IOASID, ioasid); -> get vfio_device of device_fd -> find mdev_parent of vfio_device -> find pasid allocated to this mdev -> ioasid_attach_device(parent->dev, ioasid, pasid); starting from this point the vfio device has been connected to the iommu layer. /dev/ioasid can accept pgtable cmd on this ioasid and associated domain. Thanks Kevin
On Wed, May 12, 2021 at 07:46:05AM +0000, Tian, Kevin wrote: > > From: Jason Gunthorpe <jgg@nvidia.com> > > Sent: Wednesday, May 12, 2021 1:37 AM > > > > On Tue, May 11, 2021 at 02:56:05PM +0800, Lu Baolu wrote: > > > > > > After my next series the mdev drivers will have direct access to > > > > the vfio_device. So an alternative to using the struct device, or > > > > adding 'if mdev' is to add an API to the vfio_device world to > > > > inject what iommu configuration is needed from that direction > > > > instead of trying to discover it from a struct device. > > > > > > Just want to make sure that I understand you correctly. > > > > > > We should use the existing IOMMU in-kernel APIs to connect mdev with the > > > iommu subsystem, so that the upper lays don't need to use something > > > like (if dev_is_mdev) to handle mdev differently. Do I get you > > > correctly? > > > > After going through all the /dev/ioasid stuff I'm pretty convinced > > that none of the PASID use cases for mdev should need any iommu > > connection from the mdev_device - this is an artifact of trying to > > cram the vfio container and group model into the mdev world and is not > > good design. > > > > The PASID interfaces for /dev/ioasid should use the 'struct > > pci_device' for everything and never pass in a mdev_device to the > > iommu layer. > > 'struct pci_device' -> 'struct device' since /dev/ioasid also needs to support > non-pci devices? I don't know. PASID is a PCI concept, I half expect to have at least some wrappers for PCI specific IOMMU APIs so there is reasonable type safety possible. But maybe it is all general enough that isn't needed. > I assume the so-called connection here implies using iommu_attach_device > to attach a device to an iommu domain. yes > Did you suggest this connection must be done by the mdev driver > which implements vfio_device and then passing iommu domain to > /dev/ioasid when attaching the device to an IOASID? Why do we need iommu domain in a uAPI at all? It is an artifact of the current kernel implementation > sort of like: ioctl(device_fd, VFIO_ATTACH_IOASID, ioasid, domain); ioasid and device_fd completely describe the IOMMU parameters, don't they? > If yes, this conflicts with one design in /dev/ioasid proposal that we're > working on. In earlier discussion we agreed that each ioasid is associated > to a singleton iommu domain and all devices that are attached to this > ioasid with compatible iommu capabilities just share this domain. I think you need to stand back a bit more from the current detailed implementation of the iommu API. /dev/ioasid needs to be able to create IOASID objects that can be used with as many devices as reasonable without duplicating the page tables. If or how that maps to todays iommu layer I don't know. Remember the uAPI is forever, the kernel internals can change. > Baolu and I discussed below draft proposal to avoid passing mdev_device > to the iommu layer. Please check whether it makes sense: > > // for every device attached to an ioasid > // mdev is represented by pasid (allocated by mdev driver) > // pf/vf has INVALID_IOASID in pasid > struct dev_info { > struct list_head next; > struct device *device; > u32 pasid; > } This is a list of "attachments"? sure > // for every allocated ioasid > struct ioasid_info { > // the handle to convey iommu operations > struct iommu_domain *domain; > // metadata for map/unmap > struct rb_node dma_list; > // the list of attached device > struct dev_info *dev_list; > ... > } Yes, probably something basically like that > // called by VFIO/VDPA > int ioasid_attach_device(struct *device, u32 ioasid, u32 pasid) 'u32 ioasid' should be a 'struct ioasid_info *' and 'pasid' is not needed because ioasif_info->dev_list[..]->pasid already stores the value. Keep in mind at this API level the 'struct device *' here shuld be the PCI device never the mdev device. > { > // allocate a new dev_info, filled with device/pasid > // allocate iommu domain if it's the 1st attached device > // check iommu compatibility if an domain already exists > > // attach the device to the iommu domain > if (pasid == INVALID_IOASID) > iommu_attach_device(domain, device); > else > iommu_aux_attach_device(domain, device, pasid); And if device is the pci_device I don't really see how this works. This API layer would need to create some dummy struct device to attach the aux domain too if you want to keep re-using the stuff we have today. The idea that a PASID is 1:1 with a 'struct device' is completely VFIO centric thinking. > // when attaching PF/VF to an ioasid > ioctl(device_fd, VFIO_ATTACH_IOASID, ioasid); This would have to be a (ioasif_fd, ioasid) tuple as an ioasid is scoped within each fd. > -> get vfio_device of device_fd > -> ioasid_attach_device(vfio_device->dev, ioasid, INVALID_IOASID); The device knows if it is going to use a PASID or not. The API level here should always very explicitly indicate *device* intention. If the device knows it is going to form DMA's with a PASID tag then it absoultely must be completely explict and clear in the API to the layers below that is what is happening. 'INVALID_IOASID' does not communicate that ideal to me. > // when attaching a mdev to an ioasid > ioctl(device_fd, VFIO_ATTACH_IOASID, ioasid); > -> get vfio_device of device_fd > -> find mdev_parent of vfio_device > -> find pasid allocated to this mdev > -> ioasid_attach_device(parent->dev, ioasid, pasid); Again no, mdev shouldn't be involved, it is the wrong layering. Jason
diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c index b96fedc77ee5..1b6435529166 100644 --- a/drivers/vfio/mdev/mdev_core.c +++ b/drivers/vfio/mdev/mdev_core.c @@ -390,6 +390,24 @@ int mdev_device_remove(struct device *dev, bool force_remove) return 0; } +int mdev_set_iommu_device(struct device *dev, struct device *iommu_device) +{ + struct mdev_device *mdev = to_mdev_device(dev); + + mdev->iommu_device = iommu_device; + + return 0; +} +EXPORT_SYMBOL(mdev_set_iommu_device); + +struct device *mdev_get_iommu_device(struct device *dev) +{ + struct mdev_device *mdev = to_mdev_device(dev); + + return mdev->iommu_device; +} +EXPORT_SYMBOL(mdev_get_iommu_device); + static int __init mdev_init(void) { return mdev_bus_register(); diff --git a/drivers/vfio/mdev/mdev_private.h b/drivers/vfio/mdev/mdev_private.h index 379758c52b1b..bfb7b22a7cb6 100644 --- a/drivers/vfio/mdev/mdev_private.h +++ b/drivers/vfio/mdev/mdev_private.h @@ -34,6 +34,7 @@ struct mdev_device { struct list_head next; struct kobject *type_kobj; bool active; + struct device *iommu_device; }; #define to_mdev_device(dev) container_of(dev, struct mdev_device, dev) diff --git a/include/linux/mdev.h b/include/linux/mdev.h index d7aee90e5da5..df2ea39f47ee 100644 --- a/include/linux/mdev.h +++ b/include/linux/mdev.h @@ -15,6 +15,20 @@ struct mdev_device; +/* + * Called by the parent device driver to set the device which represents + * this mdev in iommu protection scope. By default, the iommu device is + * NULL, that indicates using vendor defined isolation. + * + * @dev: the mediated device that iommu will isolate. + * @iommu_device: a pci device which represents the iommu for @dev. + * + * Return 0 for success, otherwise negative error value. + */ +int mdev_set_iommu_device(struct device *dev, struct device *iommu_device); + +struct device *mdev_get_iommu_device(struct device *dev); + /** * struct mdev_parent_ops - Structure to be registered for each parent device to * register the device to mdev module.