Message ID | 20180511190641.23008-2-jean-philippe.brucker@arm.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Fri, 11 May 2018 20:06:02 +0100 Jean-Philippe Brucker <jean-philippe.brucker@arm.com> wrote: > Shared Virtual Addressing (SVA) provides a way for device drivers to > bind process address spaces to devices. This requires the IOMMU to > support page table format and features compatible with the CPUs, and > usually requires the system to support I/O Page Faults (IOPF) and > Process Address Space ID (PASID). When all of these are available, > DMA can access virtual addresses of a process. A PASID is allocated > for each process, and the device driver programs it into the device > in an implementation-specific way. > > Add a new API for sharing process page tables with devices. Introduce > two IOMMU operations, sva_device_init() and sva_device_shutdown(), > that prepare the IOMMU driver for SVA. For example allocate PASID > tables and fault queues. Subsequent patches will implement the bind() > and unbind() operations. > > Support for I/O Page Faults will be added in a later patch using a new > feature bit (IOMMU_SVA_FEAT_IOPF). With the current API users must pin > down all shared mappings. Other feature bits that may be added in the > future are IOMMU_SVA_FEAT_PRIVATE, to support private PASID address > spaces, and IOMMU_SVA_FEAT_NO_PASID, to bind the whole device address > space to a process. > > Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> > > --- > v1->v2: > * Add sva_param structure to iommu_param > * CONFIG option is only selectable by IOMMU drivers > --- > drivers/iommu/Kconfig | 4 ++ > drivers/iommu/Makefile | 1 + > drivers/iommu/iommu-sva.c | 110 > ++++++++++++++++++++++++++++++++++++++ include/linux/iommu.h | > 32 +++++++++++ 4 files changed, 147 insertions(+) > create mode 100644 drivers/iommu/iommu-sva.c > > diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig > index 7564237f788d..cca8e06903c7 100644 > --- a/drivers/iommu/Kconfig > +++ b/drivers/iommu/Kconfig > @@ -74,6 +74,10 @@ config IOMMU_DMA > select IOMMU_IOVA > select NEED_SG_DMA_LENGTH > > +config IOMMU_SVA > + bool > + select IOMMU_API > + > config FSL_PAMU > bool "Freescale IOMMU support" > depends on PCI > diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile > index 1fb695854809..1dbcc89ebe4c 100644 > --- a/drivers/iommu/Makefile > +++ b/drivers/iommu/Makefile > @@ -3,6 +3,7 @@ obj-$(CONFIG_IOMMU_API) += iommu.o > obj-$(CONFIG_IOMMU_API) += iommu-traces.o > obj-$(CONFIG_IOMMU_API) += iommu-sysfs.o > obj-$(CONFIG_IOMMU_DMA) += dma-iommu.o > +obj-$(CONFIG_IOMMU_SVA) += iommu-sva.o > obj-$(CONFIG_IOMMU_IO_PGTABLE) += io-pgtable.o > obj-$(CONFIG_IOMMU_IO_PGTABLE_ARMV7S) += io-pgtable-arm-v7s.o > obj-$(CONFIG_IOMMU_IO_PGTABLE_LPAE) += io-pgtable-arm.o > diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c > new file mode 100644 > index 000000000000..8b4afb7c63ae > --- /dev/null > +++ b/drivers/iommu/iommu-sva.c > @@ -0,0 +1,110 @@ > +// SPDX-License-Identifier: GPL-2.0 > +/* > + * Manage PASIDs and bind process address spaces to devices. > + * > + * Copyright (C) 2018 ARM Ltd. > + */ > + > +#include <linux/iommu.h> > +#include <linux/slab.h> > + > +/** > + * iommu_sva_device_init() - Initialize Shared Virtual Addressing > for a device > + * @dev: the device > + * @features: bitmask of features that need to be initialized > + * @max_pasid: max PASID value supported by the device > + * > + * Users of the bind()/unbind() API must call this function to > initialize all > + * features required for SVA. > + * > + * The device must support multiple address spaces (e.g. PCI PASID). > By default > + * the PASID allocated during bind() is limited by the IOMMU > capacity, and by > + * the device PASID width defined in the PCI capability or in the > firmware > + * description. Setting @max_pasid to a non-zero value smaller than > this limit > + * overrides it. > + * seems the min_pasid never gets used. do you really need it? > + * The device should not be performing any DMA while this function > is running, > + * otherwise the behavior is undefined. > + * > + * Return 0 if initialization succeeded, or an error. > + */ > +int iommu_sva_device_init(struct device *dev, unsigned long features, > + unsigned int max_pasid) > +{ > + int ret; > + struct iommu_sva_param *param; > + struct iommu_domain *domain = iommu_get_domain_for_dev(dev); > + > + if (!domain || !domain->ops->sva_device_init) > + return -ENODEV; > + > + if (features) > + return -EINVAL; should it be !features? > + > + param = kzalloc(sizeof(*param), GFP_KERNEL); > + if (!param) > + return -ENOMEM; > + > + param->features = features; > + param->max_pasid = max_pasid; > + > + /* > + * IOMMU driver updates the limits depending on the IOMMU > and device > + * capabilities. > + */ > + ret = domain->ops->sva_device_init(dev, param); > + if (ret) > + goto err_free_param; > + > + mutex_lock(&dev->iommu_param->lock); > + if (dev->iommu_param->sva_param) > + ret = -EEXIST; > + else > + dev->iommu_param->sva_param = param; > + mutex_unlock(&dev->iommu_param->lock); > + if (ret) > + goto err_device_shutdown; > + > + return 0; > + > +err_device_shutdown: > + if (domain->ops->sva_device_shutdown) > + domain->ops->sva_device_shutdown(dev, param); > + > +err_free_param: > + kfree(param); > + > + return ret; > +} > +EXPORT_SYMBOL_GPL(iommu_sva_device_init); > + > +/** > + * iommu_sva_device_shutdown() - Shutdown Shared Virtual Addressing > for a device > + * @dev: the device > + * > + * Disable SVA. Device driver should ensure that the device isn't > performing any > + * DMA while this function is running. > + */ > +int iommu_sva_device_shutdown(struct device *dev) > +{ > + struct iommu_sva_param *param; > + struct iommu_domain *domain = iommu_get_domain_for_dev(dev); > + > + if (!domain) > + return -ENODEV; > + > + mutex_lock(&dev->iommu_param->lock); > + param = dev->iommu_param->sva_param; > + dev->iommu_param->sva_param = NULL; > + mutex_unlock(&dev->iommu_param->lock); > + if (!param) > + return -ENODEV; > + > + if (domain->ops->sva_device_shutdown) > + domain->ops->sva_device_shutdown(dev, param); seems a little mismatch here, do you need pass the param. I don't think there is anything else model specific iommu driver need to do for the param. > + > + kfree(param); > + > + return 0; > +} > +EXPORT_SYMBOL_GPL(iommu_sva_device_shutdown); > diff --git a/include/linux/iommu.h b/include/linux/iommu.h > index 0933f726d2e6..2efe7738bedb 100644 > --- a/include/linux/iommu.h > +++ b/include/linux/iommu.h > @@ -212,6 +212,12 @@ struct page_response_msg { > u64 private_data; > }; > > +struct iommu_sva_param { > + unsigned long features; > + unsigned int min_pasid; > + unsigned int max_pasid; > +}; > + > /** > * struct iommu_ops - iommu ops and capabilities > * @capable: check capability > @@ -219,6 +225,8 @@ struct page_response_msg { > * @domain_free: free iommu domain > * @attach_dev: attach device to an iommu domain > * @detach_dev: detach device from an iommu domain > + * @sva_device_init: initialize Shared Virtual Adressing for a device > + * @sva_device_shutdown: shutdown Shared Virtual Adressing for a > device > * @map: map a physically contiguous memory region to an iommu domain > * @unmap: unmap a physically contiguous memory region from an iommu > domain > * @map_sg: map a scatter-gather list of physically contiguous > memory chunks @@ -256,6 +264,10 @@ struct iommu_ops { > > int (*attach_dev)(struct iommu_domain *domain, struct device > *dev); void (*detach_dev)(struct iommu_domain *domain, struct device > *dev); > + int (*sva_device_init)(struct device *dev, > + struct iommu_sva_param *param); > + void (*sva_device_shutdown)(struct device *dev, > + struct iommu_sva_param *param); > int (*map)(struct iommu_domain *domain, unsigned long iova, > phys_addr_t paddr, size_t size, int prot); > size_t (*unmap)(struct iommu_domain *domain, unsigned long > iova, @@ -413,6 +425,7 @@ struct iommu_fault_param { > * struct iommu_param - collection of per-device IOMMU data > * > * @fault_param: IOMMU detected device fault reporting data > + * @sva_param: SVA parameters > * > * TODO: migrate other per device data pointers under > iommu_dev_data, e.g. > * struct iommu_group *iommu_group; > @@ -421,6 +434,7 @@ struct iommu_fault_param { > struct iommu_param { > struct mutex lock; > struct iommu_fault_param *fault_param; > + struct iommu_sva_param *sva_param; > }; > > int iommu_device_register(struct iommu_device *iommu); > @@ -920,4 +934,22 @@ static inline int iommu_sva_invalidate(struct > iommu_domain *domain, > #endif /* CONFIG_IOMMU_API */ > > +#ifdef CONFIG_IOMMU_SVA > +extern int iommu_sva_device_init(struct device *dev, unsigned long > features, > + unsigned int max_pasid); > +extern int iommu_sva_device_shutdown(struct device *dev); > +#else /* CONFIG_IOMMU_SVA */ > +static inline int iommu_sva_device_init(struct device *dev, > + unsigned long features, > + unsigned int max_pasid) > +{ > + return -ENODEV; > +} > + > +static inline int iommu_sva_device_shutdown(struct device *dev) > +{ > + return -ENODEV; > +} > +#endif /* CONFIG_IOMMU_SVA */ > + > #endif /* __LINUX_IOMMU_H */ [Jacob Pan]
Hi Jacob, Thanks for reviewing this On 16/05/18 21:41, Jacob Pan wrote: >> + * The device must support multiple address spaces (e.g. PCI PASID). >> By default >> + * the PASID allocated during bind() is limited by the IOMMU >> capacity, and by >> + * the device PASID width defined in the PCI capability or in the >> firmware >> + * description. Setting @max_pasid to a non-zero value smaller than >> this limit >> + * overrides it. >> + * > seems the min_pasid never gets used. do you really need it? Yes, the SMMU sets it to 1 in patch 28/40, because it needs to reserve PASID 0 >> + * The device should not be performing any DMA while this function >> is running, >> + * otherwise the behavior is undefined. >> + * >> + * Return 0 if initialization succeeded, or an error. >> + */ >> +int iommu_sva_device_init(struct device *dev, unsigned long features, >> + unsigned int max_pasid) >> +{ >> + int ret; >> + struct iommu_sva_param *param; >> + struct iommu_domain *domain = iommu_get_domain_for_dev(dev); >> + >> + if (!domain || !domain->ops->sva_device_init) >> + return -ENODEV; >> + >> + if (features) >> + return -EINVAL; > should it be !features? This checks if the user sets any unsupported bit in features. No feature is supported right now, but patch 09/40 adds IOMMU_SVA_FEAT_IOPF, and changes this line to "features & ~IOMMU_SVA_FEAT_IOPF" >> + mutex_lock(&dev->iommu_param->lock); >> + param = dev->iommu_param->sva_param; >> + dev->iommu_param->sva_param = NULL; >> + mutex_unlock(&dev->iommu_param->lock); >> + if (!param) >> + return -ENODEV; >> + >> + if (domain->ops->sva_device_shutdown) >> + domain->ops->sva_device_shutdown(dev, param); > seems a little mismatch here, do you need pass the param. I don't think > there is anything else model specific iommu driver need to do for the > param. SMMU doesn't use it, but maybe it would remind other IOMMU driver which features were enabled, so they don't have to keep track of that themselves? I can remove it if it isn't useful Thanks, Jean
On Thu, 17 May 2018 11:02:02 +0100 Jean-Philippe Brucker <jean-philippe.brucker@arm.com> wrote: > Hi Jacob, > > Thanks for reviewing this > > On 16/05/18 21:41, Jacob Pan wrote: > [...] > > seems the min_pasid never gets used. do you really need it? > > Yes, the SMMU sets it to 1 in patch 28/40, because it needs to reserve > PASID 0 > > [...] > > should it be !features? > > This checks if the user sets any unsupported bit in features. No > feature is supported right now, but patch 09/40 adds > IOMMU_SVA_FEAT_IOPF, and changes this line to "features & > ~IOMMU_SVA_FEAT_IOPF" > > >> + mutex_lock(&dev->iommu_param->lock); > >> + param = dev->iommu_param->sva_param; > >> + dev->iommu_param->sva_param = NULL; > >> + mutex_unlock(&dev->iommu_param->lock); > >> + if (!param) > >> + return -ENODEV; > >> + > >> + if (domain->ops->sva_device_shutdown) > >> + domain->ops->sva_device_shutdown(dev, param); > > seems a little mismatch here, do you need pass the param. I don't > > think there is anything else model specific iommu driver need to do > > for the param. > > SMMU doesn't use it, but maybe it would remind other IOMMU driver > which features were enabled, so they don't have to keep track of that > themselves? I can remove it if it isn't useful > If there is a use case, I guess iommu driver can always retrieve the param from struct device. > Thanks, > Jean [Jacob Pan]
Hi Jean-Philippe, On 05/11/2018 09:06 PM, Jean-Philippe Brucker wrote: > Shared Virtual Addressing (SVA) provides a way for device drivers to bind > process address spaces to devices. This requires the IOMMU to support page > table format and features compatible with the CPUs, and usually requires > the system to support I/O Page Faults (IOPF) and Process Address Space ID > (PASID). When all of these are available, DMA can access virtual addresses > of a process. A PASID is allocated for each process, and the device driver > programs it into the device in an implementation-specific way. > > Add a new API for sharing process page tables with devices. Introduce two > IOMMU operations, sva_device_init() and sva_device_shutdown(), that > prepare the IOMMU driver for SVA. For example allocate PASID tables and > fault queues. Subsequent patches will implement the bind() and unbind() > operations. > > Support for I/O Page Faults will be added in a later patch using a new > feature bit (IOMMU_SVA_FEAT_IOPF). With the current API users must pin > down all shared mappings. Other feature bits that may be added in the > future are IOMMU_SVA_FEAT_PRIVATE, to support private PASID address > spaces, and IOMMU_SVA_FEAT_NO_PASID, to bind the whole device address > space to a process. > > Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> > > --- > v1->v2: > * Add sva_param structure to iommu_param > * CONFIG option is only selectable by IOMMU drivers > --- > drivers/iommu/Kconfig | 4 ++ > drivers/iommu/Makefile | 1 + > drivers/iommu/iommu-sva.c | 110 ++++++++++++++++++++++++++++++++++++++ > include/linux/iommu.h | 32 +++++++++++ > 4 files changed, 147 insertions(+) > create mode 100644 drivers/iommu/iommu-sva.c > > diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig > index 7564237f788d..cca8e06903c7 100644 > --- a/drivers/iommu/Kconfig > +++ b/drivers/iommu/Kconfig > @@ -74,6 +74,10 @@ config IOMMU_DMA > select IOMMU_IOVA > select NEED_SG_DMA_LENGTH > > +config IOMMU_SVA > + bool > + select IOMMU_API > + > config FSL_PAMU > bool "Freescale IOMMU support" > depends on PCI > diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile > index 1fb695854809..1dbcc89ebe4c 100644 > --- a/drivers/iommu/Makefile > +++ b/drivers/iommu/Makefile > @@ -3,6 +3,7 @@ obj-$(CONFIG_IOMMU_API) += iommu.o > obj-$(CONFIG_IOMMU_API) += iommu-traces.o > obj-$(CONFIG_IOMMU_API) += iommu-sysfs.o > obj-$(CONFIG_IOMMU_DMA) += dma-iommu.o > +obj-$(CONFIG_IOMMU_SVA) += iommu-sva.o > obj-$(CONFIG_IOMMU_IO_PGTABLE) += io-pgtable.o > obj-$(CONFIG_IOMMU_IO_PGTABLE_ARMV7S) += io-pgtable-arm-v7s.o > obj-$(CONFIG_IOMMU_IO_PGTABLE_LPAE) += io-pgtable-arm.o > diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c > new file mode 100644 > index 000000000000..8b4afb7c63ae > --- /dev/null > +++ b/drivers/iommu/iommu-sva.c > @@ -0,0 +1,110 @@ > +// SPDX-License-Identifier: GPL-2.0 > +/* > + * Manage PASIDs and bind process address spaces to devices. > + * > + * Copyright (C) 2018 ARM Ltd. > + */ > + > +#include <linux/iommu.h> > +#include <linux/slab.h> > + > +/** > + * iommu_sva_device_init() - Initialize Shared Virtual Addressing for a device > + * @dev: the device > + * @features: bitmask of features that need to be initialized > + * @max_pasid: max PASID value supported by the device > + * > + * Users of the bind()/unbind() API must call this function to initialize all > + * features required for SVA. > + * > + * The device must support multiple address spaces (e.g. PCI PASID). By default > + * the PASID allocated during bind() is limited by the IOMMU capacity, and by > + * the device PASID width defined in the PCI capability or in the firmware > + * description. Setting @max_pasid to a non-zero value smaller than this limit > + * overrides it. > + * > + * The device should not be performing any DMA while this function is running, > + * otherwise the behavior is undefined. > + * > + * Return 0 if initialization succeeded, or an error. > + */ > +int iommu_sva_device_init(struct device *dev, unsigned long features, > + unsigned int max_pasid) what about min_pasid? > +{ > + int ret; > + struct iommu_sva_param *param; > + struct iommu_domain *domain = iommu_get_domain_for_dev(dev); > + > + if (!domain || !domain->ops->sva_device_init) > + return -ENODEV; > + > + if (features) > + return -EINVAL; > + > + param = kzalloc(sizeof(*param), GFP_KERNEL); > + if (!param) > + return -ENOMEM; > + > + param->features = features; > + param->max_pasid = max_pasid; > + > + /* > + * IOMMU driver updates the limits depending on the IOMMU and device > + * capabilities. > + */ > + ret = domain->ops->sva_device_init(dev, param); > + if (ret) > + goto err_free_param; So you are likely to call sva_device_init even if it was already called (ie. dev->iommu_param->sva_param is already set). Can't you test whether dev->iommu_param->sva_param is already set first? > + > + mutex_lock(&dev->iommu_param->lock); > + if (dev->iommu_param->sva_param) > + ret = -EEXIST; > + else > + dev->iommu_param->sva_param = param; > + mutex_unlock(&dev->iommu_param->lock); > + if (ret) > + goto err_device_shutdown; > + > + return 0; > + > +err_device_shutdown: > + if (domain->ops->sva_device_shutdown) > + domain->ops->sva_device_shutdown(dev, param); > + > +err_free_param: > + kfree(param); > + > + return ret; > +} > +EXPORT_SYMBOL_GPL(iommu_sva_device_init); > + > +/** > + * iommu_sva_device_shutdown() - Shutdown Shared Virtual Addressing for a device > + * @dev: the device > + * > + * Disable SVA. Device driver should ensure that the device isn't performing any > + * DMA while this function is running. > + */ > +int iommu_sva_device_shutdown(struct device *dev) Not sure the returned value is required for a shutdown operation. > +{ > + struct iommu_sva_param *param; > + struct iommu_domain *domain = iommu_get_domain_for_dev(dev); > + > + if (!domain) > + return -ENODEV; > + > + mutex_lock(&dev->iommu_param->lock); > + param = dev->iommu_param->sva_param; > + dev->iommu_param->sva_param = NULL; > + mutex_unlock(&dev->iommu_param->lock); > + if (!param) > + return -ENODEV; > + > + if (domain->ops->sva_device_shutdown) > + domain->ops->sva_device_shutdown(dev, param); > + > + kfree(param); > + > + return 0; > +} > +EXPORT_SYMBOL_GPL(iommu_sva_device_shutdown); > diff --git a/include/linux/iommu.h b/include/linux/iommu.h > index 0933f726d2e6..2efe7738bedb 100644 > --- a/include/linux/iommu.h > +++ b/include/linux/iommu.h > @@ -212,6 +212,12 @@ struct page_response_msg { > u64 private_data; > }; > > +struct iommu_sva_param { What are the feature values? > + unsigned long features; > + unsigned int min_pasid; > + unsigned int max_pasid; > +}; > + > /** > * struct iommu_ops - iommu ops and capabilities > * @capable: check capability > @@ -219,6 +225,8 @@ struct page_response_msg { > * @domain_free: free iommu domain > * @attach_dev: attach device to an iommu domain > * @detach_dev: detach device from an iommu domain > + * @sva_device_init: initialize Shared Virtual Adressing for a device Addressing > + * @sva_device_shutdown: shutdown Shared Virtual Adressing for a device Addressing > * @map: map a physically contiguous memory region to an iommu domain > * @unmap: unmap a physically contiguous memory region from an iommu domain > * @map_sg: map a scatter-gather list of physically contiguous memory chunks > @@ -256,6 +264,10 @@ struct iommu_ops { > > int (*attach_dev)(struct iommu_domain *domain, struct device *dev); > void (*detach_dev)(struct iommu_domain *domain, struct device *dev); > + int (*sva_device_init)(struct device *dev, > + struct iommu_sva_param *param); > + void (*sva_device_shutdown)(struct device *dev, > + struct iommu_sva_param *param); > int (*map)(struct iommu_domain *domain, unsigned long iova, > phys_addr_t paddr, size_t size, int prot); > size_t (*unmap)(struct iommu_domain *domain, unsigned long iova, > @@ -413,6 +425,7 @@ struct iommu_fault_param { > * struct iommu_param - collection of per-device IOMMU data > * > * @fault_param: IOMMU detected device fault reporting data > + * @sva_param: SVA parameters > * > * TODO: migrate other per device data pointers under iommu_dev_data, e.g. > * struct iommu_group *iommu_group; > @@ -421,6 +434,7 @@ struct iommu_fault_param { > struct iommu_param { > struct mutex lock; > struct iommu_fault_param *fault_param; > + struct iommu_sva_param *sva_param; > }; > > int iommu_device_register(struct iommu_device *iommu); > @@ -920,4 +934,22 @@ static inline int iommu_sva_invalidate(struct iommu_domain *domain, > > #endif /* CONFIG_IOMMU_API */ > > +#ifdef CONFIG_IOMMU_SVA > +extern int iommu_sva_device_init(struct device *dev, unsigned long features, > + unsigned int max_pasid); > +extern int iommu_sva_device_shutdown(struct device *dev); > +#else /* CONFIG_IOMMU_SVA */ > +static inline int iommu_sva_device_init(struct device *dev, > + unsigned long features, > + unsigned int max_pasid) > +{ > + return -ENODEV; > +} > + > +static inline int iommu_sva_device_shutdown(struct device *dev) > +{ > + return -ENODEV; > +} > +#endif /* CONFIG_IOMMU_SVA */ > + > #endif /* __LINUX_IOMMU_H */ > Thanks Eric
Hi Eric, Thanks for reviewing On 05/09/2018 12:29, Auger Eric wrote: >> +int iommu_sva_device_init(struct device *dev, unsigned long features, >> + unsigned int max_pasid) > what about min_pasid? No one asked for it... The max_pasid parameter is here for drivers that have vendor-specific PASID size limits, such as AMD KFD (see kfd_iommu_device_init and https://patchwork.kernel.org/patch/9989307/#21389571). But in most cases the PASID size will only depend on the PCI PASID capability and the IOMMU limits, both known by the IOMMU driver, so device drivers won't have to set max_pasid. IOMMU drivers need to set min_pasid in the sva_device_init callback because it may be either 1 (e.g. Arm where PASID #0 is reserved) or 0 (Intel Vt-d rev2), but at the moment I can't see a reason for device drivers to override min_pasid >> + /* >> + * IOMMU driver updates the limits depending on the IOMMU and device >> + * capabilities. >> + */ >> + ret = domain->ops->sva_device_init(dev, param); >> + if (ret) >> + goto err_free_param; > So you are likely to call sva_device_init even if it was already called > (ie. dev->iommu_param->sva_param is already set). Can't you test whether > dev->iommu_param->sva_param is already set first? Right, that's probably better >> +/** >> + * iommu_sva_device_shutdown() - Shutdown Shared Virtual Addressing for a device >> + * @dev: the device >> + * >> + * Disable SVA. Device driver should ensure that the device isn't performing any >> + * DMA while this function is running. >> + */ >> +int iommu_sva_device_shutdown(struct device *dev) > Not sure the returned value is required for a shutdown operation. I don't know either. I like them because they help me debug, but are otherwise rather useless if we don't describe precise semantics. The caller cannot do anything with it. Given that the corresponding IOMMU op is already void, I can change this function to void as well >> +struct iommu_sva_param { > What are the feature values? At the moment only IOMMU_SVA_FEAT_IOPF, introduced by patch 09 >> + unsigned long features; >> + unsigned int min_pasid; >> + unsigned int max_pasid; >> +}; >> + >> /** >> * struct iommu_ops - iommu ops and capabilities >> * @capable: check capability >> @@ -219,6 +225,8 @@ struct page_response_msg { >> * @domain_free: free iommu domain >> * @attach_dev: attach device to an iommu domain >> * @detach_dev: detach device from an iommu domain >> + * @sva_device_init: initialize Shared Virtual Adressing for a device > Addressing >> + * @sva_device_shutdown: shutdown Shared Virtual Adressing for a device > Addressing Nice catch Thanks, Jean
Am 06.09.2018 um 13:09 schrieb Jean-Philippe Brucker: > Hi Eric, > > Thanks for reviewing > > On 05/09/2018 12:29, Auger Eric wrote: >>> +int iommu_sva_device_init(struct device *dev, unsigned long features, >>> + unsigned int max_pasid) >> what about min_pasid? > No one asked for it... The max_pasid parameter is here for drivers that > have vendor-specific PASID size limits, such as AMD KFD (see > kfd_iommu_device_init and > https://patchwork.kernel.org/patch/9989307/#21389571). But in most cases > the PASID size will only depend on the PCI PASID capability and the > IOMMU limits, both known by the IOMMU driver, so device drivers won't > have to set max_pasid. > > IOMMU drivers need to set min_pasid in the sva_device_init callback > because it may be either 1 (e.g. Arm where PASID #0 is reserved) or 0 > (Intel Vt-d rev2), but at the moment I can't see a reason for device > drivers to override min_pasid Sorry to ruin your day, but if I'm not completely mistaken PASID zero is reserved in the AMD KFD as well. Regards, Christian. > >>> + /* >>> + * IOMMU driver updates the limits depending on the IOMMU and device >>> + * capabilities. >>> + */ >>> + ret = domain->ops->sva_device_init(dev, param); >>> + if (ret) >>> + goto err_free_param; >> So you are likely to call sva_device_init even if it was already called >> (ie. dev->iommu_param->sva_param is already set). Can't you test whether >> dev->iommu_param->sva_param is already set first? > Right, that's probably better > >>> +/** >>> + * iommu_sva_device_shutdown() - Shutdown Shared Virtual Addressing for a device >>> + * @dev: the device >>> + * >>> + * Disable SVA. Device driver should ensure that the device isn't performing any >>> + * DMA while this function is running. >>> + */ >>> +int iommu_sva_device_shutdown(struct device *dev) >> Not sure the returned value is required for a shutdown operation. > I don't know either. I like them because they help me debug, but are > otherwise rather useless if we don't describe precise semantics. The > caller cannot do anything with it. Given that the corresponding IOMMU op > is already void, I can change this function to void as well > >>> +struct iommu_sva_param { >> What are the feature values? > At the moment only IOMMU_SVA_FEAT_IOPF, introduced by patch 09 > >>> + unsigned long features; >>> + unsigned int min_pasid; >>> + unsigned int max_pasid; >>> +}; >>> + >>> /** >>> * struct iommu_ops - iommu ops and capabilities >>> * @capable: check capability >>> @@ -219,6 +225,8 @@ struct page_response_msg { >>> * @domain_free: free iommu domain >>> * @attach_dev: attach device to an iommu domain >>> * @detach_dev: detach device from an iommu domain >>> + * @sva_device_init: initialize Shared Virtual Adressing for a device >> Addressing >>> + * @sva_device_shutdown: shutdown Shared Virtual Adressing for a device >> Addressing > Nice catch > > Thanks, > Jean
On 06/09/2018 12:12, Christian König wrote: > Am 06.09.2018 um 13:09 schrieb Jean-Philippe Brucker: >> Hi Eric, >> >> Thanks for reviewing >> >> On 05/09/2018 12:29, Auger Eric wrote: >>>> +int iommu_sva_device_init(struct device *dev, unsigned long features, >>>> + unsigned int max_pasid) >>> what about min_pasid? >> No one asked for it... The max_pasid parameter is here for drivers that >> have vendor-specific PASID size limits, such as AMD KFD (see >> kfd_iommu_device_init and >> https://patchwork.kernel.org/patch/9989307/#21389571). But in most cases >> the PASID size will only depend on the PCI PASID capability and the >> IOMMU limits, both known by the IOMMU driver, so device drivers won't >> have to set max_pasid. >> >> IOMMU drivers need to set min_pasid in the sva_device_init callback >> because it may be either 1 (e.g. Arm where PASID #0 is reserved) or 0 >> (Intel Vt-d rev2), but at the moment I can't see a reason for device >> drivers to override min_pasid > > Sorry to ruin your day, but if I'm not completely mistaken PASID zero is > reserved in the AMD KFD as well. Heh, fair enough. I'll add the min_pasid parameter Thanks, Jean
Am 06.09.2018 um 14:45 schrieb Jean-Philippe Brucker: > On 06/09/2018 12:12, Christian König wrote: >> Am 06.09.2018 um 13:09 schrieb Jean-Philippe Brucker: >>> Hi Eric, >>> >>> Thanks for reviewing >>> >>> On 05/09/2018 12:29, Auger Eric wrote: >>>>> +int iommu_sva_device_init(struct device *dev, unsigned long features, >>>>> + unsigned int max_pasid) >>>> what about min_pasid? >>> No one asked for it... The max_pasid parameter is here for drivers that >>> have vendor-specific PASID size limits, such as AMD KFD (see >>> kfd_iommu_device_init and >>> https://patchwork.kernel.org/patch/9989307/#21389571). But in most cases >>> the PASID size will only depend on the PCI PASID capability and the >>> IOMMU limits, both known by the IOMMU driver, so device drivers won't >>> have to set max_pasid. >>> >>> IOMMU drivers need to set min_pasid in the sva_device_init callback >>> because it may be either 1 (e.g. Arm where PASID #0 is reserved) or 0 >>> (Intel Vt-d rev2), but at the moment I can't see a reason for device >>> drivers to override min_pasid >> Sorry to ruin your day, but if I'm not completely mistaken PASID zero is >> reserved in the AMD KFD as well. > Heh, fair enough. I'll add the min_pasid parameter I will take this as an opportunity to summarize some of the requirements we have for PASID management from the amdgpu driver point of view: 1. We need to be able to allocate PASID between 1 and some maximum. Zero is reserved as far as I know, but we don't necessary need a minimum. 2. We need to be able to allocate PASIDs without a process address space backing it. E.g. our hardware uses PASIDs even without Shared Virtual Addressing enabled to distinct clients from each other. Would be a pity if we need to still have a separate PASID handling because the system wide is only available when IOMMU is turned on. 3. Even after destruction of a process address space we need some grace period before a PASID is reused because it can be that the specific PASID is still in some hardware queues etc... At bare minimum all device drivers using process binding need to explicitly note to the core when they are done with a PASID. 4. It would be nice to have to be able to set a "void *" for each PASID/device combination while binding to a process which then can be queried later on based on the PASID. E.g. when you have a per PASID/device structure around anyway, just add an extra field. 5. It would be nice to have to allocate multiple PASIDs for the same process address space. E.g. some teams at AMD want to use a separate GPU address space for their userspace client library. I'm still trying to avoid that, but it is perfectly possible that we are going to need that. Additional to that it is sometimes quite useful for debugging to isolate where exactly an incorrect access (segfault) is coming from. Let me know if there are some problems with that, especially I want to know if there is pushback on #5 so that I can forward that :) Thanks, Christian. > > Thanks, > Jean
On 07/09/2018 09:55, Christian König wrote: > I will take this as an opportunity to summarize some of the requirements > we have for PASID management from the amdgpu driver point of view: That's incredibly useful, thanks :) > 1. We need to be able to allocate PASID between 1 and some maximum. Zero > is reserved as far as I know, but we don't necessary need a minimum. Should be fine. The PASID range is restricted by the PCI PASID capability, firmware description (for non-PCI devices), the IOMMU capacity, and what the device driver passes to iommu_sva_device_init. Not all IOMMUs reserve PASID 0 (AMD IOMMU without GIoSup doesn't, if I'm not mistaken), so the KFD driver will need to pass min_pasid=1 to make sure that 0 isn't allocated. > 2. We need to be able to allocate PASIDs without a process address space > backing it. E.g. our hardware uses PASIDs even without Shared Virtual > Addressing enabled to distinct clients from each other. > Would be a pity if we need to still have a separate PASID > handling because the system wide is only available when IOMMU is turned on. I'm still not sure about this one. From my point of view we shouldn't add to the IOMMU subsystem helpers for devices without an IOMMU. iommu-sva expects everywhere that the device has an iommu_domain, it's the first thing we check on entry. Bypassing all of this would call idr_alloc() directly, and wouldn't have any code in common with the current iommu-sva. So it seems like you need a layer on top of iommu-sva calling idr_alloc() when an IOMMU isn't present, but I don't think it should be in drivers/iommu/ > 3. Even after destruction of a process address space we need some grace > period before a PASID is reused because it can be that the specific > PASID is still in some hardware queues etc... > At bare minimum all device drivers using process binding need > to explicitly note to the core when they are done with a PASID. Right, much of the horribleness in iommu-sva deals with this: The process dies, iommu-sva is notified and calls the mm_exit() function passed by the device driver to iommu_sva_device_init(). In mm_exit() the device driver needs to clear any reference to the PASID in hardware and in its own structures. When the device driver returns from mm_exit(), it effectively tells the core that it has finished using the PASID, and iommu-sva can reuse the PASID for another process. mm_exit() is allowed to block, so the device driver has time to clean up and flush the queues. If the device driver finishes using the PASID before the process exits, it just calls unbind(). > 4. It would be nice to have to be able to set a "void *" for each > PASID/device combination while binding to a process which then can be > queried later on based on the PASID. > E.g. when you have a per PASID/device structure around anyway, > just add an extra field. iommu_sva_bind_device() takes a "drvdata" pointer that is stored internally for the PASID/device combination (iommu_bond). It is passed to mm_exit(), but I haven't added anything for the device driver to query it back. > 5. It would be nice to have to allocate multiple PASIDs for the same > process address space. > E.g. some teams at AMD want to use a separate GPU address space > for their userspace client library. I'm still trying to avoid that, but > it is perfectly possible that we are going to need that. Two PASIDs pointing to the same process pgd? At first glance it seems feasible, maybe with a flag passed to bind() and a few changes to internal structures. It will duplicate ATC invalidation commands for each process address space change (munmap etc) so you might take a performance hit. Intel's SVM code has the SVM_FLAG_PRIVATE_PASID which seems similar to what you describe, but I don't plan to support it in this series (the io_mm model is already pretty complicated). I think it can be added without too much effort in a future series, though with a different flag name since we'd like to use "private PASID" for something else (https://www.spinics.net/lists/dri-devel/msg177007.html). Thanks, Jean > Additional to that it is sometimes quite useful for debugging > to isolate where exactly an incorrect access (segfault) is coming from. > > Let me know if there are some problems with that, especially I want to > know if there is pushback on #5 so that I can forward that :) > > Thanks, > Christian. > >> >> Thanks, >> Jean >
Am 07.09.2018 um 17:45 schrieb Jean-Philippe Brucker: > On 07/09/2018 09:55, Christian König wrote: >> I will take this as an opportunity to summarize some of the requirements >> we have for PASID management from the amdgpu driver point of view: > That's incredibly useful, thanks :) > >> 1. We need to be able to allocate PASID between 1 and some maximum. Zero >> is reserved as far as I know, but we don't necessary need a minimum. > Should be fine. The PASID range is restricted by the PCI PASID > capability, firmware description (for non-PCI devices), the IOMMU > capacity, and what the device driver passes to iommu_sva_device_init. > Not all IOMMUs reserve PASID 0 (AMD IOMMU without GIoSup doesn't, if I'm > not mistaken), so the KFD driver will need to pass min_pasid=1 to make > sure that 0 isn't allocated. > >> 2. We need to be able to allocate PASIDs without a process address space >> backing it. E.g. our hardware uses PASIDs even without Shared Virtual >> Addressing enabled to distinct clients from each other. >> Would be a pity if we need to still have a separate PASID >> handling because the system wide is only available when IOMMU is turned on. > I'm still not sure about this one. From my point of view we shouldn't > add to the IOMMU subsystem helpers for devices without an IOMMU. I agree on that. > iommu-sva expects everywhere that the device has an iommu_domain, it's > the first thing we check on entry. Bypassing all of this would call > idr_alloc() directly, and wouldn't have any code in common with the > current iommu-sva. So it seems like you need a layer on top of iommu-sva > calling idr_alloc() when an IOMMU isn't present, but I don't think it > should be in drivers/iommu/ In this case I question if the PASID handling should be under drivers/iommu at all. See I can have a mix of VM context which are bound to processes (some few) and VM contexts which are standalone and doesn't care for a process address space. But for each VM context I need a distinct PASID for the hardware to work. I can live if we say if IOMMU is completely disabled we use a simple ida to allocate them, but when IOMMU is enabled I certainly need a way to reserve a PASID without an associated process. >> 3. Even after destruction of a process address space we need some grace >> period before a PASID is reused because it can be that the specific >> PASID is still in some hardware queues etc... >> At bare minimum all device drivers using process binding need >> to explicitly note to the core when they are done with a PASID. > Right, much of the horribleness in iommu-sva deals with this: > > The process dies, iommu-sva is notified and calls the mm_exit() function > passed by the device driver to iommu_sva_device_init(). In mm_exit() the > device driver needs to clear any reference to the PASID in hardware and > in its own structures. When the device driver returns from mm_exit(), it > effectively tells the core that it has finished using the PASID, and > iommu-sva can reuse the PASID for another process. mm_exit() is allowed > to block, so the device driver has time to clean up and flush the queues. > > If the device driver finishes using the PASID before the process exits, > it just calls unbind(). Exactly that's what Michal Hocko is probably going to not like at all. Can we have a different approach where each driver is informed by the mm_exit(), but needs to explicitly call unbind() before a PASID is reused? During that teardown transition it would be ideal if that PASID only points to a dummy root page directory with only invalid entries. > >> 4. It would be nice to have to be able to set a "void *" for each >> PASID/device combination while binding to a process which then can be >> queried later on based on the PASID. >> E.g. when you have a per PASID/device structure around anyway, >> just add an extra field. > iommu_sva_bind_device() takes a "drvdata" pointer that is stored > internally for the PASID/device combination (iommu_bond). It is passed > to mm_exit(), but I haven't added anything for the device driver to > query it back. Nice! Looks like all we need additionally is a function to retrieve that based on the PASID. >> 5. It would be nice to have to allocate multiple PASIDs for the same >> process address space. >> E.g. some teams at AMD want to use a separate GPU address space >> for their userspace client library. I'm still trying to avoid that, but >> it is perfectly possible that we are going to need that. > Two PASIDs pointing to the same process pgd? At first glance it seems > feasible, maybe with a flag passed to bind() and a few changes to > internal structures. It will duplicate ATC invalidation commands for > each process address space change (munmap etc) so you might take a > performance hit. > > Intel's SVM code has the SVM_FLAG_PRIVATE_PASID which seems similar to > what you describe, but I don't plan to support it in this series (the > io_mm model is already pretty complicated). I think it can be added > without too much effort in a future series, though with a different flag > name since we'd like to use "private PASID" for something else > (https://www.spinics.net/lists/dri-devel/msg177007.html). To be honest I hoped that you would say: No never! So that I have a good argument to pushback on such requirements :) But if it's doable it would be at least nice to have for debugging. Thanks a lot for working on that, Christian. > > Thanks, > Jean > >> Additional to that it is sometimes quite useful for debugging >> to isolate where exactly an incorrect access (segfault) is coming from. >> >> Let me know if there are some problems with that, especially I want to >> know if there is pushback on #5 so that I can forward that :) >> >> Thanks, >> Christian. >> >>> Thanks, >>> Jean
On Fri, 7 Sep 2018 20:02:54 +0200 Christian König <christian.koenig@amd.com> wrote: > Am 07.09.2018 um 17:45 schrieb Jean-Philippe Brucker: > > On 07/09/2018 09:55, Christian König wrote: > >> I will take this as an opportunity to summarize some of the > >> requirements we have for PASID management from the amdgpu driver > >> point of view: > > That's incredibly useful, thanks :) > > > >> 1. We need to be able to allocate PASID between 1 and some > >> maximum. Zero is reserved as far as I know, but we don't necessary > >> need a minimum. > [...] > >> 2. We need to be able to allocate PASIDs without a process address > >> space backing it. E.g. our hardware uses PASIDs even without > >> Shared Virtual Addressing enabled to distinct clients from each > >> other. Would be a pity if we need to still have a separate PASID > >> handling because the system wide is only available when IOMMU is > >> turned on. > [...] > > I agree on that. > > > iommu-sva expects everywhere that the device has an iommu_domain, > > it's the first thing we check on entry. Bypassing all of this would > > call idr_alloc() directly, and wouldn't have any code in common > > with the current iommu-sva. So it seems like you need a layer on > > top of iommu-sva calling idr_alloc() when an IOMMU isn't present, > > but I don't think it should be in drivers/iommu/ > > In this case I question if the PASID handling should be under > drivers/iommu at all. > > See I can have a mix of VM context which are bound to processes (some > few) and VM contexts which are standalone and doesn't care for a > process address space. But for each VM context I need a distinct > PASID for the hardware to work. > > I can live if we say if IOMMU is completely disabled we use a simple > ida to allocate them, but when IOMMU is enabled I certainly need a > way to reserve a PASID without an associated process. > VT-d would also have such requirement. There is a virtual command register for allocate and free PASID for VM use. When that PASID allocation request gets propagated to the host IOMMU driver, we need to allocate PASID w/o mm. If the PASID allocation is done via VFIO, can we have FD to track PASID life cycle instead of mm_exit()? i.e. all FDs get closed before mm_exit, I assume? > >> 3. Even after destruction of a process address space we need some > >> grace period before a PASID is reused because it can be that the > >> specific PASID is still in some hardware queues etc... > >> At bare minimum all device drivers using process binding > >> need to explicitly note to the core when they are done with a > >> PASID. > > Right, much of the horribleness in iommu-sva deals with this: > > > > The process dies, iommu-sva is notified and calls the mm_exit() > > function passed by the device driver to iommu_sva_device_init(). In > > mm_exit() the device driver needs to clear any reference to the > > PASID in hardware and in its own structures. When the device driver > > returns from mm_exit(), it effectively tells the core that it has > > finished using the PASID, and iommu-sva can reuse the PASID for > > another process. mm_exit() is allowed to block, so the device > > driver has time to clean up and flush the queues. > > > > If the device driver finishes using the PASID before the process > > exits, it just calls unbind(). > > Exactly that's what Michal Hocko is probably going to not like at all. > > Can we have a different approach where each driver is informed by the > mm_exit(), but needs to explicitly call unbind() before a PASID is > reused? > > During that teardown transition it would be ideal if that PASID only > points to a dummy root page directory with only invalid entries. > I guess this can be vendor specific, In VT-d I plan to mark PASID entry not present and disable fault reporting while draining remaining activities. > > > >> 4. It would be nice to have to be able to set a "void *" for each > >> PASID/device combination while binding to a process which then can > >> be queried later on based on the PASID. > >> E.g. when you have a per PASID/device structure around > >> anyway, just add an extra field. > > iommu_sva_bind_device() takes a "drvdata" pointer that is stored > > internally for the PASID/device combination (iommu_bond). It is > > passed to mm_exit(), but I haven't added anything for the device > > driver to query it back. > > Nice! Looks like all we need additionally is a function to retrieve > that based on the PASID. > > >> 5. It would be nice to have to allocate multiple PASIDs for the > >> same process address space. > >> E.g. some teams at AMD want to use a separate GPU > >> address space for their userspace client library. I'm still trying > >> to avoid that, but it is perfectly possible that we are going to > >> need that. > > Two PASIDs pointing to the same process pgd? At first glance it > > seems feasible, maybe with a flag passed to bind() and a few > > changes to internal structures. It will duplicate ATC invalidation > > commands for each process address space change (munmap etc) so you > > might take a performance hit. > > > > Intel's SVM code has the SVM_FLAG_PRIVATE_PASID which seems similar > > to what you describe, but I don't plan to support it in this series > > (the io_mm model is already pretty complicated). I think it can be > > added without too much effort in a future series, though with a > > different flag name since we'd like to use "private PASID" for > > something else > > (https://www.spinics.net/lists/dri-devel/msg177007.html). > > To be honest I hoped that you would say: No never! So that I have a > good argument to pushback on such requirements :) > > But if it's doable it would be at least nice to have for debugging. > > Thanks a lot for working on that, > Christian. > > > > > Thanks, > > Jean > > > >> Additional to that it is sometimes quite useful for > >> debugging to isolate where exactly an incorrect access (segfault) > >> is coming from. > >> > >> Let me know if there are some problems with that, especially I > >> want to know if there is pushback on #5 so that I can forward > >> that :) > >> > >> Thanks, > >> Christian. > >> > >>> Thanks, > >>> Jean > [Jacob Pan]
Am 07.09.2018 um 23:25 schrieb Jacob Pan: > On Fri, 7 Sep 2018 20:02:54 +0200 > Christian König <christian.koenig@amd.com> wrote: >> [SNIP] >>> iommu-sva expects everywhere that the device has an iommu_domain, >>> it's the first thing we check on entry. Bypassing all of this would >>> call idr_alloc() directly, and wouldn't have any code in common >>> with the current iommu-sva. So it seems like you need a layer on >>> top of iommu-sva calling idr_alloc() when an IOMMU isn't present, >>> but I don't think it should be in drivers/iommu/ >> In this case I question if the PASID handling should be under >> drivers/iommu at all. >> >> See I can have a mix of VM context which are bound to processes (some >> few) and VM contexts which are standalone and doesn't care for a >> process address space. But for each VM context I need a distinct >> PASID for the hardware to work. >> >> I can live if we say if IOMMU is completely disabled we use a simple >> ida to allocate them, but when IOMMU is enabled I certainly need a >> way to reserve a PASID without an associated process. >> > VT-d would also have such requirement. There is a virtual command > register for allocate and free PASID for VM use. When that PASID > allocation request gets propagated to the host IOMMU driver, we need to > allocate PASID w/o mm. > > If the PASID allocation is done via VFIO, can we have FD to track PASID > life cycle instead of mm_exit()? i.e. all FDs get closed before > mm_exit, I assume? Yes, exactly. I just need a PASID which is never used by the OS for a process and we can easily give that back when the last FD reference is closed. >>>> 3. Even after destruction of a process address space we need some >>>> grace period before a PASID is reused because it can be that the >>>> specific PASID is still in some hardware queues etc... >>>> At bare minimum all device drivers using process binding >>>> need to explicitly note to the core when they are done with a >>>> PASID. >>> Right, much of the horribleness in iommu-sva deals with this: >>> >>> The process dies, iommu-sva is notified and calls the mm_exit() >>> function passed by the device driver to iommu_sva_device_init(). In >>> mm_exit() the device driver needs to clear any reference to the >>> PASID in hardware and in its own structures. When the device driver >>> returns from mm_exit(), it effectively tells the core that it has >>> finished using the PASID, and iommu-sva can reuse the PASID for >>> another process. mm_exit() is allowed to block, so the device >>> driver has time to clean up and flush the queues. >>> >>> If the device driver finishes using the PASID before the process >>> exits, it just calls unbind(). >> Exactly that's what Michal Hocko is probably going to not like at all. >> >> Can we have a different approach where each driver is informed by the >> mm_exit(), but needs to explicitly call unbind() before a PASID is >> reused? >> >> During that teardown transition it would be ideal if that PASID only >> points to a dummy root page directory with only invalid entries. >> > I guess this can be vendor specific, In VT-d I plan to mark PASID > entry not present and disable fault reporting while draining remaining > activities. Sounds good to me. Point is at least in the case where the process was killed by the OOM killer we should not block in mm_exit(). Instead operations issued by the process to a device driver which uses SVA needs to be terminated as soon as possible to make sure that the OOM killer can advance. Thanks, Christian.
On 08/09/2018 08:29, Christian König wrote: > Yes, exactly. I just need a PASID which is never used by the OS for a > process and we can easily give that back when the last FD reference is > closed. Alright, iommu-sva can get its PASID from this external allocator as well, as long as it has an interface similar to idr. Where would it go, drivers/base/, mm/, kernel/...? >>>> The process dies, iommu-sva is notified and calls the mm_exit() >>>> function passed by the device driver to iommu_sva_device_init(). In >>>> mm_exit() the device driver needs to clear any reference to the >>>> PASID in hardware and in its own structures. When the device driver >>>> returns from mm_exit(), it effectively tells the core that it has >>>> finished using the PASID, and iommu-sva can reuse the PASID for >>>> another process. mm_exit() is allowed to block, so the device >>>> driver has time to clean up and flush the queues. >>>> >>>> If the device driver finishes using the PASID before the process >>>> exits, it just calls unbind(). >>> Exactly that's what Michal Hocko is probably going to not like at all. >>> >>> Can we have a different approach where each driver is informed by the >>> mm_exit(), but needs to explicitly call unbind() before a PASID is >>> reused? It's awful from the IOMMU driver perspective. In addition to "enabled" and "disabled" PASID states, you add "disabled but DMA still running normally". Between that new state and "disabled", the IOMMU will be flooded by translation faults (non-recoverable ones), which it needs to ignore instead of reporting to the kernel. Not all IOMMUs can deal with this in hardware (SMMU and VT-d can quiesce translation faults per-PASID, but I don't think AMD IOMMU can.) Some drivers will have to filter fault events themselves, depending on the PASID state. >>> During that teardown transition it would be ideal if that PASID only >>> points to a dummy root page directory with only invalid entries. >>> >> I guess this can be vendor specific, In VT-d I plan to mark PASID >> entry not present and disable fault reporting while draining remaining >> activities. > > Sounds good to me. > > Point is at least in the case where the process was killed by the OOM > killer we should not block in mm_exit(). > > Instead operations issued by the process to a device driver which uses > SVA needs to be terminated as soon as possible to make sure that the OOM > killer can advance. I don't see how we're preventing the OOM killer from advancing, so I'm looking for a stronger argument that justifies adding this complexity to IOMMU drivers. Time limit of the release MMU notifier, locking requirement, a concrete example where things break, even a comment somewhere in mm/ would do... In my tests I can't manage to disturb the OOM killer, but I could be missing some special case. Even if the mm_exit() callback (unrealistically) sleeps for 60 seconds, the OOM killer isn't affected because oom_reap_task_mm() wipes the victim's address space in another thread, either before or while the release notifier is running. Thanks, Jean
Am 12.09.2018 um 14:40 schrieb Jean-Philippe Brucker: > On 08/09/2018 08:29, Christian König wrote: >> Yes, exactly. I just need a PASID which is never used by the OS for a >> process and we can easily give that back when the last FD reference is >> closed. > Alright, iommu-sva can get its PASID from this external allocator as > well, as long as it has an interface similar to idr. Where would it go, > drivers/base/, mm/, kernel/...? Good question, my initial instinct was to put it under drivers/pci. But AFAIKS now you are supporting SVA implementations which are not based on PCI. So drivers/base sounds like a good place to me. > >>>>> The process dies, iommu-sva is notified and calls the mm_exit() >>>>> function passed by the device driver to iommu_sva_device_init(). In >>>>> mm_exit() the device driver needs to clear any reference to the >>>>> PASID in hardware and in its own structures. When the device driver >>>>> returns from mm_exit(), it effectively tells the core that it has >>>>> finished using the PASID, and iommu-sva can reuse the PASID for >>>>> another process. mm_exit() is allowed to block, so the device >>>>> driver has time to clean up and flush the queues. >>>>> >>>>> If the device driver finishes using the PASID before the process >>>>> exits, it just calls unbind(). >>>> Exactly that's what Michal Hocko is probably going to not like at all. >>>> >>>> Can we have a different approach where each driver is informed by the >>>> mm_exit(), but needs to explicitly call unbind() before a PASID is >>>> reused? > It's awful from the IOMMU driver perspective. In addition to "enabled" > and "disabled" PASID states, you add "disabled but DMA still running > normally". Between that new state and "disabled", the IOMMU will be > flooded by translation faults (non-recoverable ones), which it needs to > ignore instead of reporting to the kernel. Not all IOMMUs can deal with > this in hardware (SMMU and VT-d can quiesce translation faults > per-PASID, but I don't think AMD IOMMU can.) Some drivers will have to > filter fault events themselves, depending on the PASID state. Puh, yeah that is probably true. Ok let us skip that for a moment, we just need to invest more work in killing DMA operations quickly when the process address space is teared down. >>>> During that teardown transition it would be ideal if that PASID only >>>> points to a dummy root page directory with only invalid entries. >>>> >>> I guess this can be vendor specific, In VT-d I plan to mark PASID >>> entry not present and disable fault reporting while draining remaining >>> activities. >> Sounds good to me. >> >> Point is at least in the case where the process was killed by the OOM >> killer we should not block in mm_exit(). >> >> Instead operations issued by the process to a device driver which uses >> SVA needs to be terminated as soon as possible to make sure that the OOM >> killer can advance. > I don't see how we're preventing the OOM killer from advancing, so I'm > looking for a stronger argument that justifies adding this complexity to > IOMMU drivers. Time limit of the release MMU notifier, locking > requirement, a concrete example where things break, even a comment > somewhere in mm/ would do... > > In my tests I can't manage to disturb the OOM killer, but I could be > missing some special case. Even if the mm_exit() callback > (unrealistically) sleeps for 60 seconds, Well you are *COMPLETELY* under estimating this. A compute task with a huge wave launch can take multiple minutes to tear down. That's why I'm so concerned about that, but to be honest I think that just the hardware needs to become better and we need to be able to block dead tasks from spawning threads again. Regards, Christian. > the OOM killer isn't affected > because oom_reap_task_mm() wipes the victim's address space in another > thread, either before or while the release notifier is running. > > Thanks, > Jean
> From: Jacob Pan [mailto:jacob.jun.pan@linux.intel.com] > Sent: Saturday, September 8, 2018 5:25 AM > > > iommu-sva expects everywhere that the device has an iommu_domain, > > > it's the first thing we check on entry. Bypassing all of this would > > > call idr_alloc() directly, and wouldn't have any code in common > > > with the current iommu-sva. So it seems like you need a layer on > > > top of iommu-sva calling idr_alloc() when an IOMMU isn't present, > > > but I don't think it should be in drivers/iommu/ > > > > In this case I question if the PASID handling should be under > > drivers/iommu at all. > > > > See I can have a mix of VM context which are bound to processes (some > > few) and VM contexts which are standalone and doesn't care for a > > process address space. But for each VM context I need a distinct > > PASID for the hardware to work. I'm confused about VM context vs. process. Is VM referring to Virtual Machine or something else? If yes, I don't understand the binding part - what VM context is bound to (host?) process? > > > > I can live if we say if IOMMU is completely disabled we use a simple > > ida to allocate them, but when IOMMU is enabled I certainly need a > > way to reserve a PASID without an associated process. > > > VT-d would also have such requirement. There is a virtual command > register for allocate and free PASID for VM use. When that PASID > allocation request gets propagated to the host IOMMU driver, we need to > allocate PASID w/o mm. > VT-d is a bit different. In host side, PASID allocation always happens in Qemu's context, so those PASIDs are recorded with Qemu process, though the entries may point to guest page tables instead of host mm of Qemu. Thanks Kevin
> From: Christian König > Sent: Friday, September 7, 2018 4:56 PM > > 5. It would be nice to have to allocate multiple PASIDs for the same > process address space. > E.g. some teams at AMD want to use a separate GPU address space > for their userspace client library. I'm still trying to avoid that, but > it is perfectly possible that we are going to need that. > Additional to that it is sometimes quite useful for debugging > to isolate where exactly an incorrect access (segfault) is coming from. > > Let me know if there are some problems with that, especially I want to > know if there is pushback on #5 so that I can forward that :) > We have similar requirement, except that it is "multiple PASIDs for same process" instead of "for same process address space". Intel VT-d goes to a 'true' system-wide PASID allocation policy, cross both host processes and guest processes. As Jacob explains, there will be a virtual cmd register on virtual vtd, through which guest IOMMU driver requests to get system-wide PASIDs allocated by host IOMMU driver. with that design, Qemu represents all guest processes in host side, thus will get "multiple PASIDs allocated for same process". However instead of binding all PASIDs to same host address space of Qemu, each of PASID entry points to guest address space if used by guest process. Thanks Kevin
diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig index 7564237f788d..cca8e06903c7 100644 --- a/drivers/iommu/Kconfig +++ b/drivers/iommu/Kconfig @@ -74,6 +74,10 @@ config IOMMU_DMA select IOMMU_IOVA select NEED_SG_DMA_LENGTH +config IOMMU_SVA + bool + select IOMMU_API + config FSL_PAMU bool "Freescale IOMMU support" depends on PCI diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile index 1fb695854809..1dbcc89ebe4c 100644 --- a/drivers/iommu/Makefile +++ b/drivers/iommu/Makefile @@ -3,6 +3,7 @@ obj-$(CONFIG_IOMMU_API) += iommu.o obj-$(CONFIG_IOMMU_API) += iommu-traces.o obj-$(CONFIG_IOMMU_API) += iommu-sysfs.o obj-$(CONFIG_IOMMU_DMA) += dma-iommu.o +obj-$(CONFIG_IOMMU_SVA) += iommu-sva.o obj-$(CONFIG_IOMMU_IO_PGTABLE) += io-pgtable.o obj-$(CONFIG_IOMMU_IO_PGTABLE_ARMV7S) += io-pgtable-arm-v7s.o obj-$(CONFIG_IOMMU_IO_PGTABLE_LPAE) += io-pgtable-arm.o diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c new file mode 100644 index 000000000000..8b4afb7c63ae --- /dev/null +++ b/drivers/iommu/iommu-sva.c @@ -0,0 +1,110 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Manage PASIDs and bind process address spaces to devices. + * + * Copyright (C) 2018 ARM Ltd. + */ + +#include <linux/iommu.h> +#include <linux/slab.h> + +/** + * iommu_sva_device_init() - Initialize Shared Virtual Addressing for a device + * @dev: the device + * @features: bitmask of features that need to be initialized + * @max_pasid: max PASID value supported by the device + * + * Users of the bind()/unbind() API must call this function to initialize all + * features required for SVA. + * + * The device must support multiple address spaces (e.g. PCI PASID). By default + * the PASID allocated during bind() is limited by the IOMMU capacity, and by + * the device PASID width defined in the PCI capability or in the firmware + * description. Setting @max_pasid to a non-zero value smaller than this limit + * overrides it. + * + * The device should not be performing any DMA while this function is running, + * otherwise the behavior is undefined. + * + * Return 0 if initialization succeeded, or an error. + */ +int iommu_sva_device_init(struct device *dev, unsigned long features, + unsigned int max_pasid) +{ + int ret; + struct iommu_sva_param *param; + struct iommu_domain *domain = iommu_get_domain_for_dev(dev); + + if (!domain || !domain->ops->sva_device_init) + return -ENODEV; + + if (features) + return -EINVAL; + + param = kzalloc(sizeof(*param), GFP_KERNEL); + if (!param) + return -ENOMEM; + + param->features = features; + param->max_pasid = max_pasid; + + /* + * IOMMU driver updates the limits depending on the IOMMU and device + * capabilities. + */ + ret = domain->ops->sva_device_init(dev, param); + if (ret) + goto err_free_param; + + mutex_lock(&dev->iommu_param->lock); + if (dev->iommu_param->sva_param) + ret = -EEXIST; + else + dev->iommu_param->sva_param = param; + mutex_unlock(&dev->iommu_param->lock); + if (ret) + goto err_device_shutdown; + + return 0; + +err_device_shutdown: + if (domain->ops->sva_device_shutdown) + domain->ops->sva_device_shutdown(dev, param); + +err_free_param: + kfree(param); + + return ret; +} +EXPORT_SYMBOL_GPL(iommu_sva_device_init); + +/** + * iommu_sva_device_shutdown() - Shutdown Shared Virtual Addressing for a device + * @dev: the device + * + * Disable SVA. Device driver should ensure that the device isn't performing any + * DMA while this function is running. + */ +int iommu_sva_device_shutdown(struct device *dev) +{ + struct iommu_sva_param *param; + struct iommu_domain *domain = iommu_get_domain_for_dev(dev); + + if (!domain) + return -ENODEV; + + mutex_lock(&dev->iommu_param->lock); + param = dev->iommu_param->sva_param; + dev->iommu_param->sva_param = NULL; + mutex_unlock(&dev->iommu_param->lock); + if (!param) + return -ENODEV; + + if (domain->ops->sva_device_shutdown) + domain->ops->sva_device_shutdown(dev, param); + + kfree(param); + + return 0; +} +EXPORT_SYMBOL_GPL(iommu_sva_device_shutdown); diff --git a/include/linux/iommu.h b/include/linux/iommu.h index 0933f726d2e6..2efe7738bedb 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -212,6 +212,12 @@ struct page_response_msg { u64 private_data; }; +struct iommu_sva_param { + unsigned long features; + unsigned int min_pasid; + unsigned int max_pasid; +}; + /** * struct iommu_ops - iommu ops and capabilities * @capable: check capability @@ -219,6 +225,8 @@ struct page_response_msg { * @domain_free: free iommu domain * @attach_dev: attach device to an iommu domain * @detach_dev: detach device from an iommu domain + * @sva_device_init: initialize Shared Virtual Adressing for a device + * @sva_device_shutdown: shutdown Shared Virtual Adressing for a device * @map: map a physically contiguous memory region to an iommu domain * @unmap: unmap a physically contiguous memory region from an iommu domain * @map_sg: map a scatter-gather list of physically contiguous memory chunks @@ -256,6 +264,10 @@ struct iommu_ops { int (*attach_dev)(struct iommu_domain *domain, struct device *dev); void (*detach_dev)(struct iommu_domain *domain, struct device *dev); + int (*sva_device_init)(struct device *dev, + struct iommu_sva_param *param); + void (*sva_device_shutdown)(struct device *dev, + struct iommu_sva_param *param); int (*map)(struct iommu_domain *domain, unsigned long iova, phys_addr_t paddr, size_t size, int prot); size_t (*unmap)(struct iommu_domain *domain, unsigned long iova, @@ -413,6 +425,7 @@ struct iommu_fault_param { * struct iommu_param - collection of per-device IOMMU data * * @fault_param: IOMMU detected device fault reporting data + * @sva_param: SVA parameters * * TODO: migrate other per device data pointers under iommu_dev_data, e.g. * struct iommu_group *iommu_group; @@ -421,6 +434,7 @@ struct iommu_fault_param { struct iommu_param { struct mutex lock; struct iommu_fault_param *fault_param; + struct iommu_sva_param *sva_param; }; int iommu_device_register(struct iommu_device *iommu); @@ -920,4 +934,22 @@ static inline int iommu_sva_invalidate(struct iommu_domain *domain, #endif /* CONFIG_IOMMU_API */ +#ifdef CONFIG_IOMMU_SVA +extern int iommu_sva_device_init(struct device *dev, unsigned long features, + unsigned int max_pasid); +extern int iommu_sva_device_shutdown(struct device *dev); +#else /* CONFIG_IOMMU_SVA */ +static inline int iommu_sva_device_init(struct device *dev, + unsigned long features, + unsigned int max_pasid) +{ + return -ENODEV; +} + +static inline int iommu_sva_device_shutdown(struct device *dev) +{ + return -ENODEV; +} +#endif /* CONFIG_IOMMU_SVA */ + #endif /* __LINUX_IOMMU_H */
Shared Virtual Addressing (SVA) provides a way for device drivers to bind process address spaces to devices. This requires the IOMMU to support page table format and features compatible with the CPUs, and usually requires the system to support I/O Page Faults (IOPF) and Process Address Space ID (PASID). When all of these are available, DMA can access virtual addresses of a process. A PASID is allocated for each process, and the device driver programs it into the device in an implementation-specific way. Add a new API for sharing process page tables with devices. Introduce two IOMMU operations, sva_device_init() and sva_device_shutdown(), that prepare the IOMMU driver for SVA. For example allocate PASID tables and fault queues. Subsequent patches will implement the bind() and unbind() operations. Support for I/O Page Faults will be added in a later patch using a new feature bit (IOMMU_SVA_FEAT_IOPF). With the current API users must pin down all shared mappings. Other feature bits that may be added in the future are IOMMU_SVA_FEAT_PRIVATE, to support private PASID address spaces, and IOMMU_SVA_FEAT_NO_PASID, to bind the whole device address space to a process. Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> --- v1->v2: * Add sva_param structure to iommu_param * CONFIG option is only selectable by IOMMU drivers --- drivers/iommu/Kconfig | 4 ++ drivers/iommu/Makefile | 1 + drivers/iommu/iommu-sva.c | 110 ++++++++++++++++++++++++++++++++++++++ include/linux/iommu.h | 32 +++++++++++ 4 files changed, 147 insertions(+) create mode 100644 drivers/iommu/iommu-sva.c