Message ID | 19e20e54d41a0c1ab7403264e1016c4b19293135.1730313494.git.nicolinc@nvidia.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | iommufd: Add vIOMMU infrastructure (Part-2: vDEVICE) | expand |
On Wed, Oct 30, 2024 at 02:35:27PM -0700, Nicolin Chen wrote: > +void iommufd_vdevice_destroy(struct iommufd_object *obj) > +{ > + struct iommufd_vdevice *vdev = > + container_of(obj, struct iommufd_vdevice, obj); > + struct iommufd_viommu *viommu = vdev->viommu; > + > + /* xa_cmpxchg is okay to fail if alloc returned -EEXIST previously */ > + xa_cmpxchg(&viommu->vdevs, vdev->id, vdev, NULL, GFP_KERNEL); There are crazy races that would cause this not to work. Another thread could have successfully destroyed whatever caused EEXIST and the successfully registered this same vdev to the same id. Then this will wrongly erase the other threads entry. It would be better to skip the erase directly if the EEXIST unwind is being taken. Jason
On Thu, Oct 31, 2024 at 10:29:41AM -0300, Jason Gunthorpe wrote: > On Wed, Oct 30, 2024 at 02:35:27PM -0700, Nicolin Chen wrote: > > +void iommufd_vdevice_destroy(struct iommufd_object *obj) > > +{ > > + struct iommufd_vdevice *vdev = > > + container_of(obj, struct iommufd_vdevice, obj); > > + struct iommufd_viommu *viommu = vdev->viommu; > > + > > + /* xa_cmpxchg is okay to fail if alloc returned -EEXIST previously */ > > + xa_cmpxchg(&viommu->vdevs, vdev->id, vdev, NULL, GFP_KERNEL); > > There are crazy races that would cause this not to work. Another > thread could have successfully destroyed whatever caused EEXIST and > the successfully registered this same vdev to the same id. Then this > will wrongly erase the other threads entry. > > It would be better to skip the erase directly if the EEXIST unwind is > being taken. Hmm, is the "another thread" an alloc() or a destroy()? It doesn't seem to me that there could be another destroy() on the same object since this current destroy() is the abort to an unfinalized object. And it doesn't seem that another alloc() will get the same vdev ptr since every vdev allocation in the alloc() will be different? That being said, I think we could play safer with the followings: ------------------------------------------------------------------- @@ -88,7 +88,6 @@ void iommufd_vdevice_destroy(struct iommufd_object *obj) container_of(obj, struct iommufd_vdevice, obj); struct iommufd_viommu *viommu = vdev->viommu; - /* xa_cmpxchg is okay to fail if alloc returned -EEXIST previously */ xa_cmpxchg(&viommu->vdevs, vdev->id, vdev, NULL, GFP_KERNEL); refcount_dec(&viommu->obj.users); put_device(vdev->dev); @@ -128,18 +127,19 @@ int iommufd_vdevice_alloc_ioctl(struct iommufd_ucmd *ucmd) goto out_put_idev; } + curr = xa_cmpxchg(&viommu->vdevs, virt_id, NULL, vdev, GFP_KERNEL); + if (curr) { + iommufd_object_abort(ucmd->ictx, &vdev->obj); + rc = xa_err(curr) ?: -EEXIST; + goto out_put_idev; + } + vdev->id = virt_id; vdev->dev = idev->dev; get_device(idev->dev); vdev->viommu = viommu; refcount_inc(&viommu->obj.users); - curr = xa_cmpxchg(&viommu->vdevs, virt_id, NULL, vdev, GFP_KERNEL); - if (curr) { - rc = xa_err(curr) ?: -EEXIST; - goto out_abort; - } - cmd->out_vdevice_id = vdev->obj.id; rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd)); if (rc) ------------------------------------------------------------------- Thanks Nicolin
On Thu, Oct 31, 2024 at 09:56:37AM -0700, Nicolin Chen wrote: > On Thu, Oct 31, 2024 at 10:29:41AM -0300, Jason Gunthorpe wrote: > > On Wed, Oct 30, 2024 at 02:35:27PM -0700, Nicolin Chen wrote: > > > +void iommufd_vdevice_destroy(struct iommufd_object *obj) > > > +{ > > > + struct iommufd_vdevice *vdev = > > > + container_of(obj, struct iommufd_vdevice, obj); > > > + struct iommufd_viommu *viommu = vdev->viommu; > > > + > > > + /* xa_cmpxchg is okay to fail if alloc returned -EEXIST previously */ > > > + xa_cmpxchg(&viommu->vdevs, vdev->id, vdev, NULL, GFP_KERNEL); > > > > There are crazy races that would cause this not to work. Another > > thread could have successfully destroyed whatever caused EEXIST and > > the successfully registered this same vdev to the same id. Then this > > will wrongly erase the other threads entry. > > > > It would be better to skip the erase directly if the EEXIST unwind is > > being taken. > > Hmm, is the "another thread" an alloc() or a destroy()? I was thinking both > It doesn't seem to me that there could be another destroy() on the > same object since this current destroy() is the abort to an > unfinalized object. And it doesn't seem that another alloc() will > get the same vdev ptr since every vdev allocation in the alloc() > will be different? Ah so you are saying that since the vdev 'old' is local to this thread it can't possibly by aliased by another? I was worried the id could be aliased, but yes, that seems right that the vdev cmpxchg would reject that. So lets leave it Jason
On Thu, Oct 31, 2024 at 02:04:46PM -0300, Jason Gunthorpe wrote: > On Thu, Oct 31, 2024 at 09:56:37AM -0700, Nicolin Chen wrote: > > On Thu, Oct 31, 2024 at 10:29:41AM -0300, Jason Gunthorpe wrote: > > > On Wed, Oct 30, 2024 at 02:35:27PM -0700, Nicolin Chen wrote: > > > > +void iommufd_vdevice_destroy(struct iommufd_object *obj) > > > > +{ > > > > + struct iommufd_vdevice *vdev = > > > > + container_of(obj, struct iommufd_vdevice, obj); > > > > + struct iommufd_viommu *viommu = vdev->viommu; > > > > + > > > > + /* xa_cmpxchg is okay to fail if alloc returned -EEXIST previously */ > > > > + xa_cmpxchg(&viommu->vdevs, vdev->id, vdev, NULL, GFP_KERNEL); > > > > > > There are crazy races that would cause this not to work. Another > > > thread could have successfully destroyed whatever caused EEXIST and > > > the successfully registered this same vdev to the same id. Then this > > > will wrongly erase the other threads entry. > > > > > > It would be better to skip the erase directly if the EEXIST unwind is > > > being taken. > > > > Hmm, is the "another thread" an alloc() or a destroy()? > > I was thinking both > > > It doesn't seem to me that there could be another destroy() on the > > same object since this current destroy() is the abort to an > > unfinalized object. And it doesn't seem that another alloc() will > > get the same vdev ptr since every vdev allocation in the alloc() > > will be different? > > Ah so you are saying that since the vdev 'old' is local to this thread > it can't possibly by aliased by another? > > I was worried the id could be aliased, but yes, that seems right that > the vdev cmpxchg would reject that. > > So lets leave it Ack. I'll still update this since xa_cmpxchg can give other errno: + /* xa_cmpxchg is okay to fail if alloc returned -EEXIST previously */ - /* xa_cmpxchg is okay to fail if alloc failed xa_cmpxchg previously */ Thanks Nicolin
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h index e8f5ef550cc9..062656c19a07 100644 --- a/drivers/iommu/iommufd/iommufd_private.h +++ b/drivers/iommu/iommufd/iommufd_private.h @@ -507,8 +507,26 @@ static inline int iommufd_hwpt_replace_device(struct iommufd_device *idev, return iommu_group_replace_domain(idev->igroup->group, hwpt->domain); } +static inline struct iommufd_viommu * +iommufd_get_viommu(struct iommufd_ucmd *ucmd, u32 id) +{ + return container_of(iommufd_get_object(ucmd->ictx, id, + IOMMUFD_OBJ_VIOMMU), + struct iommufd_viommu, obj); +} + int iommufd_viommu_alloc_ioctl(struct iommufd_ucmd *ucmd); void iommufd_viommu_destroy(struct iommufd_object *obj); +int iommufd_vdevice_alloc_ioctl(struct iommufd_ucmd *ucmd); +void iommufd_vdevice_destroy(struct iommufd_object *obj); + +struct iommufd_vdevice { + struct iommufd_object obj; + struct iommufd_ctx *ictx; + struct iommufd_viommu *viommu; + struct device *dev; + u64 id; /* per-vIOMMU virtual ID */ +}; #ifdef CONFIG_IOMMUFD_TEST int iommufd_test(struct iommufd_ucmd *ucmd); diff --git a/include/linux/iommufd.h b/include/linux/iommufd.h index f03c75410938..ee58c5c573ec 100644 --- a/include/linux/iommufd.h +++ b/include/linux/iommufd.h @@ -10,6 +10,7 @@ #include <linux/errno.h> #include <linux/refcount.h> #include <linux/types.h> +#include <linux/xarray.h> struct device; struct file; @@ -31,6 +32,7 @@ enum iommufd_object_type { IOMMUFD_OBJ_ACCESS, IOMMUFD_OBJ_FAULT, IOMMUFD_OBJ_VIOMMU, + IOMMUFD_OBJ_VDEVICE, #ifdef CONFIG_IOMMUFD_TEST IOMMUFD_OBJ_SELFTEST, #endif @@ -89,6 +91,8 @@ struct iommufd_viommu { const struct iommufd_viommu_ops *ops; + struct xarray vdevs; + unsigned int type; }; diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h index a498d4838f9a..9b5236004b8e 100644 --- a/include/uapi/linux/iommufd.h +++ b/include/uapi/linux/iommufd.h @@ -53,6 +53,7 @@ enum { IOMMUFD_CMD_FAULT_QUEUE_ALLOC = 0x8e, IOMMUFD_CMD_IOAS_MAP_FILE = 0x8f, IOMMUFD_CMD_VIOMMU_ALLOC = 0x90, + IOMMUFD_CMD_VDEVICE_ALLOC = 0x91, }; /** @@ -864,4 +865,25 @@ struct iommu_viommu_alloc { __u32 out_viommu_id; }; #define IOMMU_VIOMMU_ALLOC _IO(IOMMUFD_TYPE, IOMMUFD_CMD_VIOMMU_ALLOC) + +/** + * struct iommu_vdevice_alloc - ioctl(IOMMU_VDEVICE_ALLOC) + * @size: sizeof(struct iommu_vdevice_alloc) + * @viommu_id: vIOMMU ID to associate with the virtual device + * @dev_id: The physical device to allocate a virtual instance on the vIOMMU + * @out_vdevice_id: Object handle for the vDevice. Pass to IOMMU_DESTORY + * @virt_id: Virtual device ID per vIOMMU, e.g. vSID of ARM SMMUv3, vDeviceID + * of AMD IOMMU, and vRID of a nested Intel VT-d to a Context Table + * + * Allocate a virtual device instance (for a physical device) against a vIOMMU. + * This instance holds the device's information (related to its vIOMMU) in a VM. + */ +struct iommu_vdevice_alloc { + __u32 size; + __u32 viommu_id; + __u32 dev_id; + __u32 out_vdevice_id; + __aligned_u64 virt_id; +}; +#define IOMMU_VDEVICE_ALLOC _IO(IOMMUFD_TYPE, IOMMUFD_CMD_VDEVICE_ALLOC) #endif diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c index cc514f9bc3e6..d735fe04197f 100644 --- a/drivers/iommu/iommufd/main.c +++ b/drivers/iommu/iommufd/main.c @@ -308,6 +308,7 @@ union ucmd_buffer { struct iommu_option option; struct iommu_vfio_ioas vfio_ioas; struct iommu_viommu_alloc viommu; + struct iommu_vdevice_alloc vdev; #ifdef CONFIG_IOMMUFD_TEST struct iommu_test_cmd test; #endif @@ -363,6 +364,8 @@ static const struct iommufd_ioctl_op iommufd_ioctl_ops[] = { __reserved), IOCTL_OP(IOMMU_VIOMMU_ALLOC, iommufd_viommu_alloc_ioctl, struct iommu_viommu_alloc, out_viommu_id), + IOCTL_OP(IOMMU_VDEVICE_ALLOC, iommufd_vdevice_alloc_ioctl, + struct iommu_vdevice_alloc, virt_id), #ifdef CONFIG_IOMMUFD_TEST IOCTL_OP(IOMMU_TEST_CMD, iommufd_test, struct iommu_test_cmd, last), #endif @@ -501,6 +504,9 @@ static const struct iommufd_object_ops iommufd_object_ops[] = { [IOMMUFD_OBJ_VIOMMU] = { .destroy = iommufd_viommu_destroy, }, + [IOMMUFD_OBJ_VDEVICE] = { + .destroy = iommufd_vdevice_destroy, + }, #ifdef CONFIG_IOMMUFD_TEST [IOMMUFD_OBJ_SELFTEST] = { .destroy = iommufd_selftest_destroy, diff --git a/drivers/iommu/iommufd/viommu.c b/drivers/iommu/iommufd/viommu.c index 888239b78667..c82b4a07a4b1 100644 --- a/drivers/iommu/iommufd/viommu.c +++ b/drivers/iommu/iommufd/viommu.c @@ -11,6 +11,7 @@ void iommufd_viommu_destroy(struct iommufd_object *obj) if (viommu->ops && viommu->ops->destroy) viommu->ops->destroy(viommu); refcount_dec(&viommu->hwpt->common.obj.users); + xa_destroy(&viommu->vdevs); } int iommufd_viommu_alloc_ioctl(struct iommufd_ucmd *ucmd) @@ -53,6 +54,7 @@ int iommufd_viommu_alloc_ioctl(struct iommufd_ucmd *ucmd) goto out_put_hwpt; } + xa_init(&viommu->vdevs); viommu->type = cmd->type; viommu->ictx = ucmd->ictx; viommu->hwpt = hwpt_paging; @@ -79,3 +81,77 @@ int iommufd_viommu_alloc_ioctl(struct iommufd_ucmd *ucmd) iommufd_put_object(ucmd->ictx, &idev->obj); return rc; } + +void iommufd_vdevice_destroy(struct iommufd_object *obj) +{ + struct iommufd_vdevice *vdev = + container_of(obj, struct iommufd_vdevice, obj); + struct iommufd_viommu *viommu = vdev->viommu; + + /* xa_cmpxchg is okay to fail if alloc returned -EEXIST previously */ + xa_cmpxchg(&viommu->vdevs, vdev->id, vdev, NULL, GFP_KERNEL); + refcount_dec(&viommu->obj.users); + put_device(vdev->dev); +} + +int iommufd_vdevice_alloc_ioctl(struct iommufd_ucmd *ucmd) +{ + struct iommu_vdevice_alloc *cmd = ucmd->cmd; + struct iommufd_vdevice *vdev, *curr; + struct iommufd_viommu *viommu; + struct iommufd_device *idev; + u64 virt_id = cmd->virt_id; + int rc = 0; + + /* virt_id indexes an xarray */ + if (virt_id > ULONG_MAX) + return -EINVAL; + + viommu = iommufd_get_viommu(ucmd, cmd->viommu_id); + if (IS_ERR(viommu)) + return PTR_ERR(viommu); + + idev = iommufd_get_device(ucmd, cmd->dev_id); + if (IS_ERR(idev)) { + rc = PTR_ERR(idev); + goto out_put_viommu; + } + + if (viommu->iommu_dev != __iommu_get_iommu_dev(idev->dev)) { + rc = -EINVAL; + goto out_put_idev; + } + + vdev = iommufd_object_alloc(ucmd->ictx, vdev, IOMMUFD_OBJ_VDEVICE); + if (IS_ERR(vdev)) { + rc = PTR_ERR(vdev); + goto out_put_idev; + } + + vdev->id = virt_id; + vdev->dev = idev->dev; + get_device(idev->dev); + vdev->viommu = viommu; + refcount_inc(&viommu->obj.users); + + curr = xa_cmpxchg(&viommu->vdevs, virt_id, NULL, vdev, GFP_KERNEL); + if (curr) { + rc = xa_err(curr) ?: -EEXIST; + goto out_abort; + } + + cmd->out_vdevice_id = vdev->obj.id; + rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd)); + if (rc) + goto out_abort; + iommufd_object_finalize(ucmd->ictx, &vdev->obj); + goto out_put_idev; + +out_abort: + iommufd_object_abort_and_destroy(ucmd->ictx, &vdev->obj); +out_put_idev: + iommufd_put_object(ucmd->ictx, &idev->obj); +out_put_viommu: + iommufd_put_object(ucmd->ictx, &viommu->obj); + return rc; +}
Introduce a new IOMMUFD_OBJ_VDEVICE to represent a physical device (struct device) against a vIOMMU (struct iommufd_viommu) object in a VM. This vDEVICE object (and its structure) holds all the infos and attributes in the VM, regarding the device related to the vIOMMU. As an initial patch, add a per-vIOMMU virtual ID. This can be: - Virtual StreamID on a nested ARM SMMUv3, an index to a Stream Table - Virtual DeviceID on a nested AMD IOMMU, an index to a Device Table - Virtual RID on a nested Intel VT-D IOMMU, an index to a Context Table Potentially, this vDEVICE structure would hold some vData for Confidential Compute Architecture (CCA). Use this virtual ID to index an "vdevs" xarray that belongs to a vIOMMU object. Add a new ioctl for vDEVICE allocations. Since a vDEVICE is a connection of a device object and an iommufd_viommu object, take two refcounts in the ioctl handler. Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> --- drivers/iommu/iommufd/iommufd_private.h | 18 ++++++ include/linux/iommufd.h | 4 ++ include/uapi/linux/iommufd.h | 22 +++++++ drivers/iommu/iommufd/main.c | 6 ++ drivers/iommu/iommufd/viommu.c | 76 +++++++++++++++++++++++++ 5 files changed, 126 insertions(+)