diff mbox series

[09/10] kvm/vfio: Remove vfio_group from kvm

Message ID 9-v1-33906a626da1+16b0-vfio_kvm_no_group_jgg@nvidia.com (mailing list archive)
State New, archived
Series Remove vfio_group from the struct file facing VFIO API

Commit Message

Jason Gunthorpe April 14, 2022, 6:46 p.m. UTC
None of the VFIO APIs take in the vfio_group anymore, so we can remove it
completely.

This has a subtle side effect on the enforced coherency tracking. The
vfio_group_get_external_user() was holding on to the container_users which
would prevent the iommu_domain and thus the enforced coherency value from
changing while the group is registered with kvm.

It changes the security proof slightly into 'user must hold a group FD
that has a device that cannot enforce DMA coherence'. As opening the group
FD, not attaching the container, is the privileged operation this doesn't
change the security properties much.

On the flip side it paves the way to changing the iommu_domain/container
attached to a group at runtime which is something that will be required to
support nested translation.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 virt/kvm/vfio.c | 40 ----------------------------------------
 1 file changed, 40 deletions(-)

Comments

Tian, Kevin April 15, 2022, 4:21 a.m. UTC | #1
> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Friday, April 15, 2022 2:46 AM
> 
> None of the VFIO APIs take in the vfio_group anymore, so we can remove it
> completely.
> 
> This has a subtle side effect on the enforced coherency tracking. The
> vfio_group_get_external_user() was holding on to the container_users which
> would prevent the iommu_domain and thus the enforced coherency value from
> changing while the group is registered with kvm.
> 
> It changes the security proof slightly into 'user must hold a group FD
> that has a device that cannot enforce DMA coherence'. As opening the group
> FD, not attaching the container, is the privileged operation this doesn't
> change the security properties much.

If we allow vfio_file_enforced_coherent() to return error then the security
proof can be sustained? In this case kvm can simply reject adding a group
which is opened but not attached to a container. 

Thanks
Kevin
Jason Gunthorpe April 15, 2022, 9:56 p.m. UTC | #2
On Fri, Apr 15, 2022 at 04:21:45AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Friday, April 15, 2022 2:46 AM
> > 
> > None of the VFIO APIs take in the vfio_group anymore, so we can remove it
> > completely.
> > 
> > This has a subtle side effect on the enforced coherency tracking. The
> > vfio_group_get_external_user() was holding on to the container_users which
> > would prevent the iommu_domain and thus the enforced coherency value from
> > changing while the group is registered with kvm.
> > 
> > It changes the security proof slightly into 'user must hold a group FD
> > that has a device that cannot enforce DMA coherence'. As opening the group
> > FD, not attaching the container, is the privileged operation this doesn't
> > change the security properties much.
> 
> If we allow vfio_file_enforced_coherent() to return error then the security
> proof can be sustained? In this case kvm can simply reject adding a group
> which is opened but not attached to a container. 

The issue is that the user can detach the container from the group because
kvm no longer holds a refcount on the container.

Jason
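
The exchange above can be sketched in userspace C. This is only a toy model under stated assumptions: toy_container, toy_group, toy_kvm and every toy_* function are hypothetical stand-ins, not kernel types or APIs. It shows a coherency query that can fail for an unattached group, and why an add-time check alone does not keep the cached value current once the container can be detached afterwards.

```c
#include <stdbool.h>
#include <stddef.h>
#include <errno.h>

/* Toy stand-ins; none of these are the kernel's real types. */
struct toy_container {
	bool enforced_coherent;
};

struct toy_group {
	struct toy_container *container; /* NULL while detached */
};

struct toy_kvm {
	bool noncoherent; /* value cached when the group was added */
};

/* Hypothetical query that reports an error for an unattached group. */
static int toy_enforced_coherent(const struct toy_group *g, bool *out)
{
	if (!g->container)
		return -ENODEV;
	*out = g->container->enforced_coherent;
	return 0;
}

/* Reject a group with no container, otherwise cache the coherency state. */
static int toy_kvm_group_add(struct toy_kvm *kvm, const struct toy_group *g)
{
	bool coherent;
	int ret = toy_enforced_coherent(g, &coherent);

	if (ret)
		return ret;
	if (!coherent)
		kvm->noncoherent = true;
	return 0;
}

/* Nothing in toy_kvm pins the container, so a detach always succeeds. */
static void toy_group_detach(struct toy_group *g)
{
	g->container = NULL;
}
```

In this model toy_kvm_group_add() fails for a group that has no container, but after a successful add a later toy_group_detach() leaves kvm->noncoherent untouched, so the cached value can go stale.
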
Tian, Kevin April 16, 2022, 12:42 a.m. UTC | #3
> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Saturday, April 16, 2022 5:56 AM
> 
> On Fri, Apr 15, 2022 at 04:21:45AM +0000, Tian, Kevin wrote:
> > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > Sent: Friday, April 15, 2022 2:46 AM
> > >
> > > None of the VFIO APIs take in the vfio_group anymore, so we can remove it
> > > completely.
> > >
> > > This has a subtle side effect on the enforced coherency tracking. The
> > > vfio_group_get_external_user() was holding on to the container_users which
> > > would prevent the iommu_domain and thus the enforced coherency value from
> > > changing while the group is registered with kvm.
> > >
> > > It changes the security proof slightly into 'user must hold a group FD
> > > that has a device that cannot enforce DMA coherence'. As opening the group
> > > FD, not attaching the container, is the privileged operation this doesn't
> > > change the security properties much.
> >
> > If we allow vfio_file_enforced_coherent() to return error then the security
> > proof can be sustained? In this case kvm can simply reject adding a group
> > which is opened but not attached to a container.
> 
> The issue is that the user can detach the container from the group because
> kvm no longer holds a refcount on the container.
> 

I see your point. In this case the guest already loses access to the
device once the container is detached from the group, so using stale
coherency info on the KVM side is probably fine; it becomes more of a
performance issue.

Then what about PPC? w/o holding a reference to container is there
any impact on spapr_tce_table which is attached to the group? 
I don't know the relationship between this table and vfio container
and whether there is a lifetime dependency in between. 

Thanks
Kevin
Jason Gunthorpe April 16, 2022, 1:34 a.m. UTC | #4
On Sat, Apr 16, 2022 at 12:42:50AM +0000, Tian, Kevin wrote:

> Then what about PPC? w/o holding a reference to container is there
> any impact on spapr_tce_table which is attached to the group? 
> I don't know the relationship between this table and vfio container
> and whether there is a lifetime dependency in between. 

table seems to have its own FD so it should be refcounted
independently and not indirectly rely on the vfio container. It seemed
like there was enough protection there..

Jason
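
As a rough illustration of "refcounted independently", here is a minimal sketch assuming a toy object with a plain integer refcount. toy_obj, toy_get() and toy_put() are hypothetical names, not the kernel's fget()/fput(), and the model ignores locking entirely.

```c
#include <stdbool.h>

/* Toy refcounted object; a stand-in for anything backed by its own FD. */
struct toy_obj {
	int refs;
	bool freed;
};

/* Take an additional reference. */
static void toy_get(struct toy_obj *o)
{
	o->refs++;
}

/* Drop a reference; the object is released when the last one goes away. */
static void toy_put(struct toy_obj *o)
{
	if (--o->refs == 0)
		o->freed = true;
}
```

With two such objects, one standing in for the container and one for the table, dropping the last reference on the first leaves the second alive for as long as any holder still has a reference to it.
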
Tian, Kevin April 18, 2022, 6:09 a.m. UTC | #5
> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Saturday, April 16, 2022 9:34 AM
> 
> On Sat, Apr 16, 2022 at 12:42:50AM +0000, Tian, Kevin wrote:
> 
> > Then what about PPC? w/o holding a reference to container is there
> > any impact on spapr_tce_table which is attached to the group?
> > I don't know the relationship between this table and vfio container
> > and whether there is a lifetime dependency in between.
> 
> table seems to have its own FD so it should be refcounted
> independently and not indirectly rely on the vfio container. It seemed
> like there was enough protection there..
> 

The table in the PPC context refers to a window in the global IOVA address
space. From this angle the reference should be torn down when the
group is detached from the container, which represents the address
space. But given that the detach operation also prevents the user from
accessing the device, having KVM hold the reference for a while,
until KVM_DEV_VFIO_GROUP_DEL is called, is probably not harmful.

In any case this is also worth an explanation similar to what you
did for cache coherency.

Thanks
Kevin

Patch

diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
index 9baf04c5b0cc3d..39834a0653d83a 100644
--- a/virt/kvm/vfio.c
+++ b/virt/kvm/vfio.c
@@ -35,7 +35,6 @@  struct kvm_vfio_group {
 	struct list_head node;
 	struct file *filp;
 	const struct vfio_file_ops *ops;
-	struct vfio_group *vfio_group;
 };
 
 struct kvm_vfio {
@@ -44,35 +43,6 @@  struct kvm_vfio {
 	bool noncoherent;
 };
 
-static struct vfio_group *kvm_vfio_group_get_external_user(struct file *filep)
-{
-	struct vfio_group *vfio_group;
-	struct vfio_group *(*fn)(struct file *);
-
-	fn = symbol_get(vfio_group_get_external_user);
-	if (!fn)
-		return ERR_PTR(-EINVAL);
-
-	vfio_group = fn(filep);
-
-	symbol_put(vfio_group_get_external_user);
-
-	return vfio_group;
-}
-
-static void kvm_vfio_group_put_external_user(struct vfio_group *vfio_group)
-{
-	void (*fn)(struct vfio_group *);
-
-	fn = symbol_get(vfio_group_put_external_user);
-	if (!fn)
-		return;
-
-	fn(vfio_group);
-
-	symbol_put(vfio_group_put_external_user);
-}
-
 static void kvm_spapr_tce_release_vfio_group(struct kvm *kvm,
 					     struct kvm_vfio_group *kvg)
 {
@@ -121,7 +91,6 @@  static int kvm_vfio_group_add(struct kvm_device *dev, unsigned int fd)
 {
 	const struct vfio_file_ops *(*fn)(struct file *filep);
 	struct kvm_vfio *kv = dev->private;
-	struct vfio_group *vfio_group;
 	struct kvm_vfio_group *kvg;
 	struct file *filp;
 	int ret;
@@ -157,15 +126,8 @@  static int kvm_vfio_group_add(struct kvm_device *dev, unsigned int fd)
 		goto err_free;
 	}
 
-	vfio_group = kvm_vfio_group_get_external_user(filp);
-	if (IS_ERR(vfio_group)) {
-		ret = PTR_ERR(vfio_group);
-		goto err_free;
-	}
-
 	kvg->filp = filp;
 	list_add_tail(&kvg->node, &kv->group_list);
-	kvg->vfio_group = vfio_group;
 
 	kvm_arch_start_assignment(dev->kvm);
 
@@ -206,7 +168,6 @@  static int kvm_vfio_group_del(struct kvm_device *dev, unsigned int fd)
 		kvm_arch_end_assignment(dev->kvm);
 		kvm_spapr_tce_release_vfio_group(dev->kvm, kvg);
 		kvg->ops->set_kvm(kvg->filp, NULL);
-		kvm_vfio_group_put_external_user(kvg->vfio_group);
 		fput(kvg->filp);
 		kfree(kvg);
 		ret = 0;
@@ -322,7 +283,6 @@  static void kvm_vfio_destroy(struct kvm_device *dev)
 	list_for_each_entry_safe(kvg, tmp, &kv->group_list, node) {
 		kvm_spapr_tce_release_vfio_group(dev->kvm, kvg);
 		kvg->ops->set_kvm(kvg->filp, NULL);
-		kvm_vfio_group_put_external_user(kvg->vfio_group);
 		fput(kvg->filp);
 		list_del(&kvg->node);
 		kfree(kvg);