[06/10] vfio-iommufd: Allow iommufd to be used in place of a container fd

Message ID 6-v1-4991695894d8+211-vfio_iommufd_jgg@nvidia.com (mailing list archive)
State New, archived
Series Connect VFIO to IOMMUFD

Commit Message

Jason Gunthorpe Oct. 25, 2022, 6:50 p.m. UTC
This makes VFIO_GROUP_SET_CONTAINER accept both a vfio container FD and an
iommufd.

In iommufd mode an IOAS will exist after the SET_CONTAINER, but it will
not be attached to any groups.

From a VFIO perspective this means that VFIO_GROUP_GET_STATUS and
VFIO_GROUP_FLAGS_VIABLE work subtly differently. With the container FD
the iommu_group_claim_dma_owner() is done during SET_CONTAINER but for
IOMMUFD this is done during VFIO_GROUP_GET_DEVICE_FD. This means that
VFIO_GROUP_FLAGS_VIABLE could be set but GET_DEVICE_FD will fail due to
viability.

As GET_DEVICE_FD can fail for many reasons already this is not expected to
be a meaningful difference.

Reorganize the tests for whether the group has an assigned container or
iommufd into a vfio_group_has_iommu() function and consolidate all the
duplicated WARN_ON's etc related to this.

Call container functions only if a container is actually present on the
group.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/vfio/Kconfig     |  1 +
 drivers/vfio/container.c |  7 ++--
 drivers/vfio/vfio.h      |  2 ++
 drivers/vfio/vfio_main.c | 76 ++++++++++++++++++++++++++++++++--------
 4 files changed, 69 insertions(+), 17 deletions(-)
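
As a reading aid for the reorganization described above, here is a small
userspace model of the combined check and of the UNSET_CONTAINER flow. It
is only a sketch: struct model_group and the model_* functions are
hypothetical stand-ins, the locking and WARN_ON checks are left out, and
resetting container_users on detach is an assumption about what
vfio_group_detach_container() does.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the relevant fields of struct vfio_group. */
struct model_group {
	bool container;		/* a vfio container is attached */
	int container_users;	/* users of that container */
	bool iommufd;		/* an iommufd context is attached */
};

/* Models vfio_group_has_iommu() without the WARN_ON consistency checks. */
static bool model_has_iommu(const struct model_group *g)
{
	return g->container || g->iommufd;
}

/* Models VFIO_GROUP_UNSET_CONTAINER after the patch. */
static int model_unset_container(struct model_group *g)
{
	if (!model_has_iommu(g))
		return -EINVAL;		/* nothing to unset */

	if (g->container) {
		if (g->container_users != 1)
			return -EBUSY;	/* devices still use the container */
		g->container = false;	/* assumed effect of the detach */
		g->container_users = 0;
	}
	if (g->iommufd)
		g->iommufd = false;	/* stands in for iommufd_ctx_put() */
	return 0;
}

/* Walks the four cases; returns 0 when every case behaves as described. */
static int model_demo(void)
{
	struct model_group empty = { 0 };
	struct model_group busy = { .container = true, .container_users = 2 };
	struct model_group idle = { .container = true, .container_users = 1 };
	struct model_group ifd = { .iommufd = true };

	assert(model_unset_container(&empty) == -EINVAL);
	assert(model_unset_container(&busy) == -EBUSY && busy.container);
	assert(model_unset_container(&idle) == 0 && !idle.container);
	assert(model_unset_container(&ifd) == 0 && !ifd.iommufd);
	return 0;
}
```

The -EBUSY test only applies on the container path, because
container_users is not used when the group holds an iommufd.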

Comments

Tian, Kevin Nov. 1, 2022, 8:09 a.m. UTC | #1
> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Wednesday, October 26, 2022 2:51 AM
>
>  menuconfig VFIO
>  	tristate "VFIO Non-Privileged userspace driver framework"
>  	select IOMMU_API
> +	depends on IOMMUFD || !IOMMUFD

Out of curiosity. What is the meaning of this dependency claim?

> @@ -717,12 +735,23 @@ static int vfio_group_ioctl_set_container(struct
> vfio_group *group,
>  	}
> 
>  	container = vfio_container_from_file(f.file);
> -	ret = -EINVAL;

this changes the errno from -EINVAL to -EBADF for the original container
path. Is it desired?

>  	if (container) {
>  		ret = vfio_container_attach_group(container, group);
>  		goto out_unlock;
>  	}
> 
> +	iommufd = iommufd_ctx_from_file(f.file);
> +	if (!IS_ERR(iommufd)) {

The only errno which iommufd_ctx_from_file() may return is -EBADFD
which duplicates with -EBADF assignment in following line.

What about having it return NULL pointer similar as the container
helper does?

> +		u32 ioas_id;
> +
> +		group->iommufd = iommufd;
> +		ret = iommufd_vfio_compat_ioas_id(iommufd, &ioas_id);

exchange the order of above two lines and only assign group->iommufd
when the compat call succeeds.

> @@ -900,7 +940,7 @@ static int vfio_group_ioctl_get_status(struct
> vfio_group *group,
>  		return -ENODEV;
>  	}
> 
> -	if (group->container)
> +	if (group->container || group->iommufd)
>  		status.flags |= VFIO_GROUP_FLAGS_CONTAINER_SET |
>  				VFIO_GROUP_FLAGS_VIABLE;

Copy some explanation from commit msg to explain the subtle difference
between container and iommufd.
Nicolin Chen Nov. 1, 2022, 9:19 a.m. UTC | #2
On Tue, Nov 01, 2022 at 08:09:52AM +0000, Tian, Kevin wrote:

> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Wednesday, October 26, 2022 2:51 AM
> >
> >  menuconfig VFIO
> >       tristate "VFIO Non-Privileged userspace driver framework"
> >       select IOMMU_API
> > +     depends on IOMMUFD || !IOMMUFD
> 
> Out of curiosity. What is the meaning of this dependency claim?

"is it a module or not" -- from https://lwn.net/Articles/683476/
Jason Gunthorpe Nov. 1, 2022, 11:51 a.m. UTC | #3
On Tue, Nov 01, 2022 at 02:19:04AM -0700, Nicolin Chen wrote:
> On Tue, Nov 01, 2022 at 08:09:52AM +0000, Tian, Kevin wrote:
> 
> > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > Sent: Wednesday, October 26, 2022 2:51 AM
> > >
> > >  menuconfig VFIO
> > >       tristate "VFIO Non-Privileged userspace driver framework"
> > >       select IOMMU_API
> > > +     depends on IOMMUFD || !IOMMUFD
> > 
> > Out of curiosity. What is the meaning of this dependency claim?
> 
> "is it a module or not" -- from https://lwn.net/Articles/683476/

Yes, it is the kconfig pattern for "This symbol optionally uses the
other symbol, and if the other symbol is turned on then it has to be
the right y/m value"

ie rejects vfio being built-in but iommufd being a module

Jason
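
The effect of "depends on IOMMUFD || !IOMMUFD" can be checked by hand with
kconfig's tristate rules, where n < m < y, "!" maps x to 2 - x and "||"
takes the maximum. The sketch below is a userspace model of that
arithmetic only; the enum and function names are made up for
illustration and are not part of kconfig.

```c
#include <assert.h>

/* Kconfig tristate values, ordered n < m < y. */
enum tristate { TRI_N = 0, TRI_M = 1, TRI_Y = 2 };

/* Kconfig "!": !n is y, !m is m, !y is n, which is 2 - x. */
static enum tristate tri_not(enum tristate x)
{
	return (enum tristate)(2 - x);
}

/* Kconfig "||" is the maximum of the two operands. */
static enum tristate tri_or(enum tristate a, enum tristate b)
{
	return a > b ? a : b;
}

/*
 * Upper bound that "depends on IOMMUFD || !IOMMUFD" places on VFIO for a
 * given IOMMUFD setting.
 */
static enum tristate vfio_upper_bound(enum tristate iommufd)
{
	return tri_or(iommufd, tri_not(iommufd));
}

/* Returns 0 when the three cases match the explanation above. */
static int tristate_demo(void)
{
	assert(vfio_upper_bound(TRI_N) == TRI_Y); /* IOMMUFD off: no limit */
	assert(vfio_upper_bound(TRI_M) == TRI_M); /* IOMMUFD=m: VFIO at most m */
	assert(vfio_upper_bound(TRI_Y) == TRI_Y); /* IOMMUFD=y: no limit */
	return 0;
}
```

Only the IOMMUFD=m row lowers the bound, which is the "vfio built-in but
iommufd a module" combination being rejected.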
Jason Gunthorpe Nov. 1, 2022, 12:40 p.m. UTC | #4
On Tue, Nov 01, 2022 at 08:09:52AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Wednesday, October 26, 2022 2:51 AM
> >
> >  menuconfig VFIO
> >  	tristate "VFIO Non-Privileged userspace driver framework"
> >  	select IOMMU_API
> > +	depends on IOMMUFD || !IOMMUFD
> 
> Out of curiosity. What is the meaning of this dependency claim?
> 
> > @@ -717,12 +735,23 @@ static int vfio_group_ioctl_set_container(struct
> > vfio_group *group,
> >  	}
> > 
> >  	container = vfio_container_from_file(f.file);
> > -	ret = -EINVAL;
> 
> this changes the errno from -EINVAL to -EBADF for the original container
> path. Is it desired?

Yes, EBADFD is the right error code (the EBADF is a typo)

> >  	if (container) {
> >  		ret = vfio_container_attach_group(container, group);
> >  		goto out_unlock;
> >  	}
> > 
> > +	iommufd = iommufd_ctx_from_file(f.file);
> > +	if (!IS_ERR(iommufd)) {
> 
> The only errno which iommufd_ctx_from_file() may return is -EBADFD
> which duplicates with -EBADF assignment in following line.

The concept is that other places using iommufd_ctx_from_file() should
forward the return code directly. vfio is probably the only thing that
is going to be multiplexing like this.

> > +		u32 ioas_id;
> > +
> > +		group->iommufd = iommufd;
> > +		ret = iommufd_vfio_compat_ioas_id(iommufd, &ioas_id);
> 
> exchange the order of above two lines and only assign group->iommufd
> when the compat call succeeds.

Yeah, that is probably a small bug:

-               group->iommufd = iommufd;
                ret = iommufd_vfio_compat_ioas_id(iommufd, &ioas_id);
+               if (ret) {
+                       iommufd_ctx_put(iommufd);
+                       goto out_unlock;
+               }
+
+               group->iommufd = iommufd;
                goto out_unlock;
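
The ordering discussed here (drop the reference on failure, assign
group->iommufd only on success) can be exercised with a small userspace
model. This is a sketch under stated assumptions: model_ctx, model_group
and model_set_container() are hypothetical stand-ins, the reference count
is a plain int, and -ENODEV is just an arbitrary failure value for the
compat call.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel objects, not the real definitions. */
struct model_ctx {
	int refs;			/* models the iommufd_ctx refcount */
};

struct model_group {
	struct model_ctx *iommufd;	/* models group->iommufd */
};

enum model_fd_kind { MODEL_FD_CONTAINER, MODEL_FD_IOMMUFD, MODEL_FD_OTHER };

static void model_ctx_put(struct model_ctx *ctx)
{
	ctx->refs--;
}

/*
 * Models VFIO_GROUP_SET_CONTAINER. compat_ret stands in for the return
 * value of iommufd_vfio_compat_ioas_id().
 */
static int model_set_container(struct model_group *group,
			       enum model_fd_kind kind,
			       struct model_ctx *ctx, int compat_ret)
{
	if (group->iommufd)
		return -EINVAL;		/* something is already assigned */

	if (kind == MODEL_FD_CONTAINER)
		return 0;		/* container attach is not modelled */

	if (kind == MODEL_FD_IOMMUFD) {
		ctx->refs++;		/* the lookup takes a reference */
		if (compat_ret) {
			/* group->iommufd is still NULL, so put the local */
			model_ctx_put(ctx);
			return compat_ret;
		}
		group->iommufd = ctx;	/* publish only after success */
		return 0;
	}

	return -EBADF;			/* FD not recognized, as in the patch */
}

/* Returns 0 when no reference leaks and no stale pointer is stored. */
static int model_demo(void)
{
	struct model_ctx ctx = { .refs = 1 };
	struct model_group group = { .iommufd = NULL };

	assert(model_set_container(&group, MODEL_FD_OTHER, &ctx, 0) == -EBADF);
	assert(model_set_container(&group, MODEL_FD_IOMMUFD, &ctx,
				   -ENODEV) == -ENODEV);
	assert(group.iommufd == NULL && ctx.refs == 1);
	assert(model_set_container(&group, MODEL_FD_IOMMUFD, &ctx, 0) == 0);
	assert(group.iommufd == &ctx && ctx.refs == 2);
	assert(model_set_container(&group, MODEL_FD_IOMMUFD, &ctx,
				   0) == -EINVAL);
	return 0;
}
```

On the failure path the put has to go through the local pointer, since
the group field has not been assigned at that point.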


> > @@ -900,7 +940,7 @@ static int vfio_group_ioctl_get_status(struct
> > vfio_group *group,
> >  		return -ENODEV;
> >  	}
> > 
> > -	if (group->container)
> > +	if (group->container || group->iommufd)
> >  		status.flags |= VFIO_GROUP_FLAGS_CONTAINER_SET |
> >  				VFIO_GROUP_FLAGS_VIABLE;
> 
> Copy some explanation from commit msg to explain the subtle difference
> between container and iommufd.

	/*
	 * With the container FD the iommu_group_claim_dma_owner() is done
	 * during SET_CONTAINER but for IOMMUFD this is done during
	 * VFIO_GROUP_GET_DEVICE_FD. Meaning that with iommufd
	 * VFIO_GROUP_FLAGS_VIABLE could be set but GET_DEVICE_FD will fail due
	 * to viability.
	 */

Thanks,
Jason
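
To make the viability difference concrete, here is a userspace model of
the two paths when some other driver already owns DMA for the
iommu_group. It is a sketch only: the model_* names are hypothetical, the
flag values mirror VFIO_GROUP_FLAGS_VIABLE and
VFIO_GROUP_FLAGS_CONTAINER_SET from the uAPI header, and -EPERM is an
assumed placeholder for the ownership failure.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MODEL_FLAGS_VIABLE		(1u << 0)
#define MODEL_FLAGS_CONTAINER_SET	(1u << 1)

/* Hypothetical stand-in for the state that matters here. */
struct model_group {
	bool has_container;
	bool has_iommufd;
	bool owner_claimed_elsewhere;	/* another owner holds the group */
};

/* Container FD: ownership is claimed here. iommufd: claim is deferred. */
static int model_set_container(struct model_group *g, bool fd_is_iommufd)
{
	if (g->has_container || g->has_iommufd)
		return -EINVAL;
	if (fd_is_iommufd) {
		g->has_iommufd = true;
		return 0;
	}
	if (g->owner_claimed_elsewhere)
		return -EPERM;		/* placeholder error value */
	g->has_container = true;
	return 0;
}

/* Mirrors the flag logic of vfio_group_ioctl_get_status() in the patch. */
static unsigned int model_get_status(const struct model_group *g)
{
	unsigned int flags = 0;

	if (g->has_container || g->has_iommufd)
		flags |= MODEL_FLAGS_CONTAINER_SET | MODEL_FLAGS_VIABLE;
	else if (!g->owner_claimed_elsewhere)
		flags |= MODEL_FLAGS_VIABLE;
	return flags;
}

/* With iommufd the ownership claim only happens at GET_DEVICE_FD time. */
static int model_get_device_fd(const struct model_group *g)
{
	if (!g->has_container && !g->has_iommufd)
		return -EINVAL;
	if (g->has_iommufd && g->owner_claimed_elsewhere)
		return -EPERM;		/* placeholder error value */
	return 0;
}

/* Returns 0 when both paths behave as the comment above describes. */
static int model_demo(void)
{
	struct model_group via_iommufd = { .owner_claimed_elsewhere = true };
	struct model_group via_container = { .owner_claimed_elsewhere = true };

	/* iommufd: SET_CONTAINER succeeds and VIABLE is reported ... */
	assert(model_set_container(&via_iommufd, true) == 0);
	assert(model_get_status(&via_iommufd) & MODEL_FLAGS_VIABLE);
	/* ... but the device FD request is where the failure shows up. */
	assert(model_get_device_fd(&via_iommufd) == -EPERM);

	/* container: the failure is already visible at SET_CONTAINER. */
	assert(model_set_container(&via_container, false) == -EPERM);
	assert(!(model_get_status(&via_container) & MODEL_FLAGS_VIABLE));
	return 0;
}
```

Userspace that treats VFIO_GROUP_FLAGS_VIABLE as a guarantee would see the
error one step later in iommufd mode, which is why the commit message
calls the difference subtle rather than breaking.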
Yi Liu Nov. 2, 2022, 7:28 a.m. UTC | #5
On 2022/10/26 02:50, Jason Gunthorpe wrote:
> This makes VFIO_GROUP_SET_CONTAINER accept both a vfio container FD and an
> iommufd.
> 
> In iommufd mode an IOAS will exist after the SET_CONTAINER, but it will
> not be attached to any groups.

is there any special reason that we cannot attach the IOAS in the SET
container phase or SET_IOMMU phase?

> 
>  From a VFIO perspective this means that the VFIO_GROUP_GET_STATUS and
> VFIO_GROUP_FLAGS_VIABLE works subtly differently. With the container FD
> the iommu_group_claim_dma_owner() is done during SET_CONTAINER but for
> IOMMFD this is done during VFIO_GROUP_GET_DEVICE_FD. Meaning that

s/IOMMFD/IOMMUFD

> VFIO_GROUP_FLAGS_VIABLE could be set but GET_DEVICE_FD will fail due to
> viability.
> 
> As GET_DEVICE_FD can fail for many reasons already this is not expected to
> be a meaningful difference.
> 
> Reorganize the tests for if the group has an assigned container or iommu
> into a vfio_group_has_iommu() function and consolidate all the duplicated
> WARN_ON's etc related to this.
> 
> Call container functions only if a container is actually present on the
> group.
> 
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
>   drivers/vfio/Kconfig     |  1 +
>   drivers/vfio/container.c |  7 ++--
>   drivers/vfio/vfio.h      |  2 ++
>   drivers/vfio/vfio_main.c | 76 ++++++++++++++++++++++++++++++++--------
>   4 files changed, 69 insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
> index 86c381ceb9a1e9..1118d322eec97d 100644
> --- a/drivers/vfio/Kconfig
> +++ b/drivers/vfio/Kconfig
> @@ -2,6 +2,7 @@
>   menuconfig VFIO
>   	tristate "VFIO Non-Privileged userspace driver framework"
>   	select IOMMU_API
> +	depends on IOMMUFD || !IOMMUFD
>   	select VFIO_IOMMU_TYPE1 if MMU && (X86 || S390 || ARM || ARM64)
>   	select INTERVAL_TREE
>   	help
> diff --git a/drivers/vfio/container.c b/drivers/vfio/container.c
> index d97747dfb05d02..8772dad6808539 100644
> --- a/drivers/vfio/container.c
> +++ b/drivers/vfio/container.c
> @@ -516,8 +516,11 @@ int vfio_group_use_container(struct vfio_group *group)
>   {
>   	lockdep_assert_held(&group->group_lock);
>   
> -	if (!group->container || !group->container->iommu_driver ||
> -	    WARN_ON(!group->container_users))
> +	/*
> +	 * The container fd has been assigned with VFIO_GROUP_SET_CONTAINER but
> +	 * VFIO_SET_IOMMU hasn't been done yet.
> +	 */
> +	if (!group->container->iommu_driver)
>   		return -EINVAL;
>   
>   	if (group->type == VFIO_NO_IOMMU && !capable(CAP_SYS_RAWIO))
> diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
> index 247590334e14b0..985e13d52989ca 100644
> --- a/drivers/vfio/vfio.h
> +++ b/drivers/vfio/vfio.h
> @@ -10,6 +10,7 @@
>   #include <linux/cdev.h>
>   #include <linux/module.h>
>   
> +struct iommufd_ctx;
>   struct iommu_group;
>   struct vfio_device;
>   struct vfio_container;
> @@ -60,6 +61,7 @@ struct vfio_group {
>   	struct kvm			*kvm;
>   	struct file			*opened_file;
>   	struct blocking_notifier_head	notifier;
> +	struct iommufd_ctx		*iommufd;
>   };
>   
>   /* events for the backend driver notify callback */
> diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
> index a8d1fbfcc3ddad..cf0ea744de931e 100644
> --- a/drivers/vfio/vfio_main.c
> +++ b/drivers/vfio/vfio_main.c
> @@ -35,6 +35,7 @@
>   #include <linux/pm_runtime.h>
>   #include <linux/interval_tree.h>
>   #include <linux/iova_bitmap.h>
> +#include <linux/iommufd.h>
>   #include "vfio.h"
>   
>   #define DRIVER_VERSION	"0.3"
> @@ -665,6 +666,16 @@ EXPORT_SYMBOL_GPL(vfio_unregister_group_dev);
>   /*
>    * VFIO Group fd, /dev/vfio/$GROUP
>    */
> +static bool vfio_group_has_iommu(struct vfio_group *group)
> +{
> +	lockdep_assert_held(&group->group_lock);
> +	if (!group->container)
> +		WARN_ON(group->container_users);
> +	else
> +		WARN_ON(!group->container_users);
> +	return group->container || group->iommufd;
> +}
> +
>   /*
>    * VFIO_GROUP_UNSET_CONTAINER should fail if there are other users or
>    * if there was no container to unset.  Since the ioctl is called on
> @@ -676,15 +687,21 @@ static int vfio_group_ioctl_unset_container(struct vfio_group *group)
>   	int ret = 0;
>   
>   	mutex_lock(&group->group_lock);
> -	if (!group->container) {
> +	if (!vfio_group_has_iommu(group)) {
>   		ret = -EINVAL;
>   		goto out_unlock;
>   	}
> -	if (group->container_users != 1) {
> -		ret = -EBUSY;
> -		goto out_unlock;
> +	if (group->container) {
> +		if (group->container_users != 1) {
> +			ret = -EBUSY;
> +			goto out_unlock;
> +		}
> +		vfio_group_detach_container(group);
> +	}
> +	if (group->iommufd) {
> +		iommufd_ctx_put(group->iommufd);
> +		group->iommufd = NULL;
>   	}
> -	vfio_group_detach_container(group);
>   
>   out_unlock:
>   	mutex_unlock(&group->group_lock);
> @@ -695,6 +712,7 @@ static int vfio_group_ioctl_set_container(struct vfio_group *group,
>   					  int __user *arg)
>   {
>   	struct vfio_container *container;
> +	struct iommufd_ctx *iommufd;
>   	struct fd f;
>   	int ret;
>   	int fd;
> @@ -707,7 +725,7 @@ static int vfio_group_ioctl_set_container(struct vfio_group *group,
>   		return -EBADF;
>   
>   	mutex_lock(&group->group_lock);
> -	if (group->container || WARN_ON(group->container_users)) {
> +	if (vfio_group_has_iommu(group)) {
>   		ret = -EINVAL;
>   		goto out_unlock;
>   	}
> @@ -717,12 +735,23 @@ static int vfio_group_ioctl_set_container(struct vfio_group *group,
>   	}
>   
>   	container = vfio_container_from_file(f.file);
> -	ret = -EINVAL;
>   	if (container) {
>   		ret = vfio_container_attach_group(container, group);
>   		goto out_unlock;
>   	}
>   
> +	iommufd = iommufd_ctx_from_file(f.file);
> +	if (!IS_ERR(iommufd)) {
> +		u32 ioas_id;
> +
> +		group->iommufd = iommufd;
> +		ret = iommufd_vfio_compat_ioas_id(iommufd, &ioas_id);
> +		goto out_unlock;
> +	}
> +
> +	/* The FD passed is not recognized. */
> +	ret = -EBADF;
> +
>   out_unlock:
>   	mutex_unlock(&group->group_lock);
>   	fdput(f);
> @@ -752,9 +781,16 @@ static int vfio_device_first_open(struct vfio_device *device)
>   	 * it during close_device.
>   	 */
>   	mutex_lock(&device->group->group_lock);
> -	ret = vfio_group_use_container(device->group);
> -	if (ret)
> +	if (!vfio_group_has_iommu(device->group)) {
> +		ret = -EINVAL;
>   		goto err_module_put;
> +	}
> +
> +	if (device->group->container) {
> +		ret = vfio_group_use_container(device->group);
> +		if (ret)
> +			goto err_module_put;
> +	}
>   
>   	device->kvm = device->group->kvm;
>   	if (device->ops->open_device) {
> @@ -762,14 +798,16 @@ static int vfio_device_first_open(struct vfio_device *device)
>   		if (ret)
>   			goto err_container;
>   	}
> -	vfio_device_container_register(device);
> +	if (device->group->container)
> +		vfio_device_container_register(device);
>   	mutex_unlock(&device->group->group_lock);
>   	return 0;
>   
>   err_container:
> -	vfio_group_unuse_container(device->group);
> -err_module_put:
> +	if (device->group->container)
> +		vfio_group_unuse_container(device->group);
>   	device->kvm = NULL;
> +err_module_put:
>   	mutex_unlock(&device->group->group_lock);
>   	module_put(device->dev->driver->owner);
>   	return ret;
> @@ -780,11 +818,13 @@ static void vfio_device_last_close(struct vfio_device *device)
>   	lockdep_assert_held(&device->dev_set->lock);
>   
>   	mutex_lock(&device->group->group_lock);
> -	vfio_device_container_unregister(device);
> +	if (device->group->container)
> +		vfio_device_container_unregister(device);
>   	if (device->ops->close_device)
>   		device->ops->close_device(device);
>   	device->kvm = NULL;
> -	vfio_group_unuse_container(device->group);
> +	if (device->group->container)
> +		vfio_group_unuse_container(device->group);
>   	mutex_unlock(&device->group->group_lock);
>   	module_put(device->dev->driver->owner);
>   }
> @@ -900,7 +940,7 @@ static int vfio_group_ioctl_get_status(struct vfio_group *group,
>   		return -ENODEV;
>   	}
>   
> -	if (group->container)
> +	if (group->container || group->iommufd)
>   		status.flags |= VFIO_GROUP_FLAGS_CONTAINER_SET |
>   				VFIO_GROUP_FLAGS_VIABLE;
>   	else if (!iommu_group_dma_owner_claimed(group->iommu_group))
> @@ -983,6 +1023,10 @@ static int vfio_group_fops_release(struct inode *inode, struct file *filep)
>   	WARN_ON(group->notifier.head);
>   	if (group->container)
>   		vfio_group_detach_container(group);
> +	if (group->iommufd) {
> +		iommufd_ctx_put(group->iommufd);
> +		group->iommufd = NULL;
> +	}
>   	group->opened_file = NULL;
>   	mutex_unlock(&group->group_lock);
>   	return 0;
> @@ -1879,6 +1923,8 @@ static void __exit vfio_cleanup(void)
>   module_init(vfio_init);
>   module_exit(vfio_cleanup);
>   
> +MODULE_IMPORT_NS(IOMMUFD);
> +MODULE_IMPORT_NS(IOMMUFD_VFIO);
>   MODULE_VERSION(DRIVER_VERSION);
>   MODULE_LICENSE("GPL v2");
>   MODULE_AUTHOR(DRIVER_AUTHOR);
Tian, Kevin Nov. 3, 2022, 4:39 a.m. UTC | #6
> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Tuesday, November 1, 2022 7:51 PM
> 
> On Tue, Nov 01, 2022 at 02:19:04AM -0700, Nicolin Chen wrote:
> > On Tue, Nov 01, 2022 at 08:09:52AM +0000, Tian, Kevin wrote:
> >
> > > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > > Sent: Wednesday, October 26, 2022 2:51 AM
> > > >
> > > >  menuconfig VFIO
> > > >       tristate "VFIO Non-Privileged userspace driver framework"
> > > >       select IOMMU_API
> > > > +     depends on IOMMUFD || !IOMMUFD
> > >
> > > Out of curiosity. What is the meaning of this dependency claim?
> >
> > "is it a module or not" -- from https://lwn.net/Articles/683476/
> 
> Yes, it is the kconfig pattern for "This symbol optionally uses the
> other symbol, and if the other symbol is turned on then it has to be
> the right y/m value"
> 
> ie rejects vfio being built-in but iommufd being a module
> 

Thanks. Good to learn.
Jason Gunthorpe Nov. 7, 2022, 11:45 p.m. UTC | #7
On Wed, Nov 02, 2022 at 03:28:20PM +0800, Yi Liu wrote:
> On 2022/10/26 02:50, Jason Gunthorpe wrote:
> > This makes VFIO_GROUP_SET_CONTAINER accept both a vfio container FD and an
> > iommufd.
> > 
> > In iommufd mode an IOAS will exist after the SET_CONTAINER, but it will
> > not be attached to any groups.
> 
> is there any special reason that we cannot attach the IOAS in the SET
> container phase or SET_IOMMU phase?

It is because iommufd has been deliberately made to work only on
struct device * not iommu_groups, and when we go to do the
SET_CONTAINER we have no idea what the device will be.

So deferring the operation is the cleanest approach.

> >  From a VFIO perspective this means that the VFIO_GROUP_GET_STATUS and
> > VFIO_GROUP_FLAGS_VIABLE works subtly differently. With the container FD
> > the iommu_group_claim_dma_owner() is done during SET_CONTAINER but for
> > IOMMFD this is done during VFIO_GROUP_GET_DEVICE_FD. Meaning that
> 
> s/IOMMFD/IOMMUFD

Done

Jason

Patch

diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
index 86c381ceb9a1e9..1118d322eec97d 100644
--- a/drivers/vfio/Kconfig
+++ b/drivers/vfio/Kconfig
@@ -2,6 +2,7 @@ 
 menuconfig VFIO
 	tristate "VFIO Non-Privileged userspace driver framework"
 	select IOMMU_API
+	depends on IOMMUFD || !IOMMUFD
 	select VFIO_IOMMU_TYPE1 if MMU && (X86 || S390 || ARM || ARM64)
 	select INTERVAL_TREE
 	help
diff --git a/drivers/vfio/container.c b/drivers/vfio/container.c
index d97747dfb05d02..8772dad6808539 100644
--- a/drivers/vfio/container.c
+++ b/drivers/vfio/container.c
@@ -516,8 +516,11 @@  int vfio_group_use_container(struct vfio_group *group)
 {
 	lockdep_assert_held(&group->group_lock);
 
-	if (!group->container || !group->container->iommu_driver ||
-	    WARN_ON(!group->container_users))
+	/*
+	 * The container fd has been assigned with VFIO_GROUP_SET_CONTAINER but
+	 * VFIO_SET_IOMMU hasn't been done yet.
+	 */
+	if (!group->container->iommu_driver)
 		return -EINVAL;
 
 	if (group->type == VFIO_NO_IOMMU && !capable(CAP_SYS_RAWIO))
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index 247590334e14b0..985e13d52989ca 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -10,6 +10,7 @@ 
 #include <linux/cdev.h>
 #include <linux/module.h>
 
+struct iommufd_ctx;
 struct iommu_group;
 struct vfio_device;
 struct vfio_container;
@@ -60,6 +61,7 @@  struct vfio_group {
 	struct kvm			*kvm;
 	struct file			*opened_file;
 	struct blocking_notifier_head	notifier;
+	struct iommufd_ctx		*iommufd;
 };
 
 /* events for the backend driver notify callback */
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index a8d1fbfcc3ddad..cf0ea744de931e 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -35,6 +35,7 @@ 
 #include <linux/pm_runtime.h>
 #include <linux/interval_tree.h>
 #include <linux/iova_bitmap.h>
+#include <linux/iommufd.h>
 #include "vfio.h"
 
 #define DRIVER_VERSION	"0.3"
@@ -665,6 +666,16 @@  EXPORT_SYMBOL_GPL(vfio_unregister_group_dev);
 /*
  * VFIO Group fd, /dev/vfio/$GROUP
  */
+static bool vfio_group_has_iommu(struct vfio_group *group)
+{
+	lockdep_assert_held(&group->group_lock);
+	if (!group->container)
+		WARN_ON(group->container_users);
+	else
+		WARN_ON(!group->container_users);
+	return group->container || group->iommufd;
+}
+
 /*
  * VFIO_GROUP_UNSET_CONTAINER should fail if there are other users or
  * if there was no container to unset.  Since the ioctl is called on
@@ -676,15 +687,21 @@  static int vfio_group_ioctl_unset_container(struct vfio_group *group)
 	int ret = 0;
 
 	mutex_lock(&group->group_lock);
-	if (!group->container) {
+	if (!vfio_group_has_iommu(group)) {
 		ret = -EINVAL;
 		goto out_unlock;
 	}
-	if (group->container_users != 1) {
-		ret = -EBUSY;
-		goto out_unlock;
+	if (group->container) {
+		if (group->container_users != 1) {
+			ret = -EBUSY;
+			goto out_unlock;
+		}
+		vfio_group_detach_container(group);
+	}
+	if (group->iommufd) {
+		iommufd_ctx_put(group->iommufd);
+		group->iommufd = NULL;
 	}
-	vfio_group_detach_container(group);
 
 out_unlock:
 	mutex_unlock(&group->group_lock);
@@ -695,6 +712,7 @@  static int vfio_group_ioctl_set_container(struct vfio_group *group,
 					  int __user *arg)
 {
 	struct vfio_container *container;
+	struct iommufd_ctx *iommufd;
 	struct fd f;
 	int ret;
 	int fd;
@@ -707,7 +725,7 @@  static int vfio_group_ioctl_set_container(struct vfio_group *group,
 		return -EBADF;
 
 	mutex_lock(&group->group_lock);
-	if (group->container || WARN_ON(group->container_users)) {
+	if (vfio_group_has_iommu(group)) {
 		ret = -EINVAL;
 		goto out_unlock;
 	}
@@ -717,12 +735,23 @@  static int vfio_group_ioctl_set_container(struct vfio_group *group,
 	}
 
 	container = vfio_container_from_file(f.file);
-	ret = -EINVAL;
 	if (container) {
 		ret = vfio_container_attach_group(container, group);
 		goto out_unlock;
 	}
 
+	iommufd = iommufd_ctx_from_file(f.file);
+	if (!IS_ERR(iommufd)) {
+		u32 ioas_id;
+
+		group->iommufd = iommufd;
+		ret = iommufd_vfio_compat_ioas_id(iommufd, &ioas_id);
+		goto out_unlock;
+	}
+
+	/* The FD passed is not recognized. */
+	ret = -EBADF;
+
 out_unlock:
 	mutex_unlock(&group->group_lock);
 	fdput(f);
@@ -752,9 +781,16 @@  static int vfio_device_first_open(struct vfio_device *device)
 	 * it during close_device.
 	 */
 	mutex_lock(&device->group->group_lock);
-	ret = vfio_group_use_container(device->group);
-	if (ret)
+	if (!vfio_group_has_iommu(device->group)) {
+		ret = -EINVAL;
 		goto err_module_put;
+	}
+
+	if (device->group->container) {
+		ret = vfio_group_use_container(device->group);
+		if (ret)
+			goto err_module_put;
+	}
 
 	device->kvm = device->group->kvm;
 	if (device->ops->open_device) {
@@ -762,14 +798,16 @@  static int vfio_device_first_open(struct vfio_device *device)
 		if (ret)
 			goto err_container;
 	}
-	vfio_device_container_register(device);
+	if (device->group->container)
+		vfio_device_container_register(device);
 	mutex_unlock(&device->group->group_lock);
 	return 0;
 
 err_container:
-	vfio_group_unuse_container(device->group);
-err_module_put:
+	if (device->group->container)
+		vfio_group_unuse_container(device->group);
 	device->kvm = NULL;
+err_module_put:
 	mutex_unlock(&device->group->group_lock);
 	module_put(device->dev->driver->owner);
 	return ret;
@@ -780,11 +818,13 @@  static void vfio_device_last_close(struct vfio_device *device)
 	lockdep_assert_held(&device->dev_set->lock);
 
 	mutex_lock(&device->group->group_lock);
-	vfio_device_container_unregister(device);
+	if (device->group->container)
+		vfio_device_container_unregister(device);
 	if (device->ops->close_device)
 		device->ops->close_device(device);
 	device->kvm = NULL;
-	vfio_group_unuse_container(device->group);
+	if (device->group->container)
+		vfio_group_unuse_container(device->group);
 	mutex_unlock(&device->group->group_lock);
 	module_put(device->dev->driver->owner);
 }
@@ -900,7 +940,7 @@  static int vfio_group_ioctl_get_status(struct vfio_group *group,
 		return -ENODEV;
 	}
 
-	if (group->container)
+	if (group->container || group->iommufd)
 		status.flags |= VFIO_GROUP_FLAGS_CONTAINER_SET |
 				VFIO_GROUP_FLAGS_VIABLE;
 	else if (!iommu_group_dma_owner_claimed(group->iommu_group))
@@ -983,6 +1023,10 @@  static int vfio_group_fops_release(struct inode *inode, struct file *filep)
 	WARN_ON(group->notifier.head);
 	if (group->container)
 		vfio_group_detach_container(group);
+	if (group->iommufd) {
+		iommufd_ctx_put(group->iommufd);
+		group->iommufd = NULL;
+	}
 	group->opened_file = NULL;
 	mutex_unlock(&group->group_lock);
 	return 0;
@@ -1879,6 +1923,8 @@  static void __exit vfio_cleanup(void)
 module_init(vfio_init);
 module_exit(vfio_cleanup);
 
+MODULE_IMPORT_NS(IOMMUFD);
+MODULE_IMPORT_NS(IOMMUFD_VFIO);
 MODULE_VERSION(DRIVER_VERSION);
 MODULE_LICENSE("GPL v2");
 MODULE_AUTHOR(DRIVER_AUTHOR);