diff mbox series

[04/10] vfio: Use a struct of function pointers instead of a many symbol_get()'s

Message ID 4-v1-33906a626da1+16b0-vfio_kvm_no_group_jgg@nvidia.com (mailing list archive)
State New, archived
Headers show
Series Remove vfio_group from the struct file facing VFIO API | expand

Commit Message

Jason Gunthorpe April 14, 2022, 6:46 p.m. UTC
kvm and VFIO need to be coupled together however neither is willing to
tolerate a direct module dependency. Instead when kvm is given a VFIO FD
it uses many symbol_get()'s to access VFIO.

Provide a single VFIO function vfio_file_get_ops() which validates the
given struct file * is a VFIO file and then returns a struct of ops.

Following patches will redo each of the symbol_get() calls into an
indirection through this ops struct.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/vfio/vfio.c  | 19 +++++++++++++++++++
 include/linux/vfio.h |  3 +++
 virt/kvm/vfio.c      | 14 ++++++++++++++
 3 files changed, 36 insertions(+)

Comments

Tian, Kevin April 15, 2022, 3:57 a.m. UTC | #1
> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Friday, April 15, 2022 2:46 AM
> 
> kvm and VFIO need to be coupled together however neither is willing to
> tolerate a direct module dependency. Instead when kvm is given a VFIO FD
> it uses many symbol_get()'s to access VFIO.
> 
> Provide a single VFIO function vfio_file_get_ops() which validates the
> given struct file * is a VFIO file and then returns a struct of ops.

VFIO has multiple files (container, group, and device). Here and other
places seems to assume a VFIO file is just a group file. While it is correct
in this external facing context, probably calling it 'VFIO group file' is
clearer in various code comments and patch descriptions.

> 
> Following patches will redo each of the symbol_get() calls into an
> indirection through this ops struct.
> 
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>


Reviewed-by: Kevin Tian <kevin.tian@intel.com>

Out of curiosity, how do you envision when iommufd is introduced?
Will we need a generic ops abstraction so both vfio and iommufd
register their own ops to keep kvm side generic or a new protocol
will be introduced between iommufd and kvm?
Christoph Hellwig April 15, 2022, 4:45 a.m. UTC | #2
On Thu, Apr 14, 2022 at 03:46:03PM -0300, Jason Gunthorpe wrote:
> kvm and VFIO need to be coupled together however neither is willing to
> tolerate a direct module dependency. Instead when kvm is given a VFIO FD
> it uses many symbol_get()'s to access VFIO.
> 
> Provide a single VFIO function vfio_file_get_ops() which validates the
> given struct file * is a VFIO file and then returns a struct of ops.
> 
> Following patches will redo each of the symbol_get() calls into an
> indirection through this ops struct.

So I got anoyed at this as well a while ago and I still think this
is the wrong way around.

I'd much rather EXPORT_SYMBOL_GPL kvm_register_device_ops and
just let kvm_vfio_ops live in a module than all the symbol_get
crazyness.  We'll need to be careful to deal with unload races
or just not allow unloading, though.
Jason Gunthorpe April 15, 2022, 12:13 p.m. UTC | #3
On Fri, Apr 15, 2022 at 06:45:34AM +0200, Christoph Hellwig wrote:
> On Thu, Apr 14, 2022 at 03:46:03PM -0300, Jason Gunthorpe wrote:
> > kvm and VFIO need to be coupled together however neither is willing to
> > tolerate a direct module dependency. Instead when kvm is given a VFIO FD
> > it uses many symbol_get()'s to access VFIO.
> > 
> > Provide a single VFIO function vfio_file_get_ops() which validates the
> > given struct file * is a VFIO file and then returns a struct of ops.
> > 
> > Following patches will redo each of the symbol_get() calls into an
> > indirection through this ops struct.
> 
> So I got anoyed at this as well a while ago and I still think this
> is the wrong way around.

What I plan to do in future is to have differnt ops returned depending
on if the file is a struct vfio_group or a struct vfio_device, so it
is not entirely pointless like this.

> I'd much rather EXPORT_SYMBOL_GPL kvm_register_device_ops and
> just let kvm_vfio_ops live in a module than all the symbol_get
> crazyness.  We'll need to be careful to deal with unload races
> or just not allow unloading, though.

This is certainly more complicated - especially considering module
unload - than a single symbol_get(). How do you see the benefit?

Jason
Christoph Hellwig April 15, 2022, 2:36 p.m. UTC | #4
On Fri, Apr 15, 2022 at 09:13:01AM -0300, Jason Gunthorpe wrote:
> > So I got anoyed at this as well a while ago and I still think this
> > is the wrong way around.
> 
> What I plan to do in future is to have differnt ops returned depending
> on if the file is a struct vfio_group or a struct vfio_device, so it
> is not entirely pointless like this.

Uh, I think that is a rather ugly interface.  Why would kvm pass in
FDs to both into the same interface.

> 
> > I'd much rather EXPORT_SYMBOL_GPL kvm_register_device_ops and
> > just let kvm_vfio_ops live in a module than all the symbol_get
> > crazyness.  We'll need to be careful to deal with unload races
> > or just not allow unloading, though.
> 
> This is certainly more complicated - especially considering module
> unload - than a single symbol_get(). How do you see the benefit?

Because that is the sensible layering - kvm already has an abstract
interface for emulated devices.  So instead of doing symbol_get magic
of some kind we should leverage it.

But I can see how that is something you might not want to do for
this series.  So maybe stick to the individual symbol_gets for now
and I'll send a separate series to clean that up?  Especially as
I have a half-finished series for that from a while ago anyway.

> 
> Jason
---end quoted text---
Jason Gunthorpe April 15, 2022, 3:31 p.m. UTC | #5
On Fri, Apr 15, 2022 at 04:36:21PM +0200, Christoph Hellwig wrote:
> On Fri, Apr 15, 2022 at 09:13:01AM -0300, Jason Gunthorpe wrote:
> > > So I got anoyed at this as well a while ago and I still think this
> > > is the wrong way around.
> > 
> > What I plan to do in future is to have differnt ops returned depending
> > on if the file is a struct vfio_group or a struct vfio_device, so it
> > is not entirely pointless like this.
> 
> Uh, I think that is a rather ugly interface.  Why would kvm pass in
> FDs to both into the same interface.

We can do it either way, but IMHO, it is not very different than
passing a socket/file/pipe/etc FD to read() - the list of VFIO files
ops works identically on vfio device or vfio file FDs. The appeal to
multiplex at the file level means we don't need to build parallel
group/device uapi paths and parallel kAPI as well.

Ultimately none of these uses of the file care about what the file is,
these are all 'security proofs' and either FD type is fine to provide
the proof.

> Because that is the sensible layering - kvm already has an abstract
> interface for emulated devices.  So instead of doing symbol_get magic
> of some kind we should leverage it.

Hm, I don't know anthing about kvm's device interface
 
> But I can see how that is something you might not want to do for
> this series.  So maybe stick to the individual symbol_gets for now
> and I'll send a separate series to clean that up?  Especially as
> I have a half-finished series for that from a while ago anyway.

Sure, I only did this because I became sad while touching all the
symbol gets - it really is ugly. It will make the diffstat much worse,
but no problem.

If you have something already then lets avoid touching it too much
here.

Thanks,
Jason
Jason Gunthorpe April 15, 2022, 9:54 p.m. UTC | #6
On Fri, Apr 15, 2022 at 03:57:14AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Friday, April 15, 2022 2:46 AM
> > 
> > kvm and VFIO need to be coupled together however neither is willing to
> > tolerate a direct module dependency. Instead when kvm is given a VFIO FD
> > it uses many symbol_get()'s to access VFIO.
> > 
> > Provide a single VFIO function vfio_file_get_ops() which validates the
> > given struct file * is a VFIO file and then returns a struct of ops.
> 
> VFIO has multiple files (container, group, and device). Here and other
> places seems to assume a VFIO file is just a group file. While it is correct
> in this external facing context, probably calling it 'VFIO group file' is
> clearer in various code comments and patch descriptions.
> 
> > 
> > Following patches will redo each of the symbol_get() calls into an
> > indirection through this ops struct.
> > 
> > Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> 
> 
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> 
> Out of curiosity, how do you envision when iommufd is introduced?
> Will we need a generic ops abstraction so both vfio and iommufd
> register their own ops to keep kvm side generic or a new protocol
> will be introduced between iommufd and kvm? 

I imagine using the vfio_device in all these context where the vfio
group is used, not iommufd. This keeps everything internal to vfio.

Jason
Tian, Kevin April 16, 2022, midnight UTC | #7
> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Saturday, April 16, 2022 5:54 AM
> 
> On Fri, Apr 15, 2022 at 03:57:14AM +0000, Tian, Kevin wrote:
> > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > Sent: Friday, April 15, 2022 2:46 AM
> > >
> > > kvm and VFIO need to be coupled together however neither is willing to
> > > tolerate a direct module dependency. Instead when kvm is given a VFIO
> FD
> > > it uses many symbol_get()'s to access VFIO.
> > >
> > > Provide a single VFIO function vfio_file_get_ops() which validates the
> > > given struct file * is a VFIO file and then returns a struct of ops.
> >
> > VFIO has multiple files (container, group, and device). Here and other
> > places seems to assume a VFIO file is just a group file. While it is correct
> > in this external facing context, probably calling it 'VFIO group file' is
> > clearer in various code comments and patch descriptions.
> >
> > >
> > > Following patches will redo each of the symbol_get() calls into an
> > > indirection through this ops struct.
> > >
> > > Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> >
> >
> > Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> >
> > Out of curiosity, how do you envision when iommufd is introduced?
> > Will we need a generic ops abstraction so both vfio and iommufd
> > register their own ops to keep kvm side generic or a new protocol
> > will be introduced between iommufd and kvm?
> 
> I imagine using the vfio_device in all these context where the vfio
> group is used, not iommufd. This keeps everything internal to vfio.
> 

In this case although the uAPI is called KVM_DEV_VFIO_GROUP_ADD
Qemu will pass in a device fd and with this series KVM doesn't care
whether it's actually a device or group and just use struct file to call
vfio_file_ops. correct?

You probably remember there is one additional requirement when
adding ENQCMD virtualization on Intel platform. KVM is required to
setup a guest PASID to host PASID translation table in CPU vmcs
structure to support ENQCMD in the guest. Following above direction
I suppose KVM will provide a new interface to allow user pass in
 [devfd, iommufd, guest_pasid] and then call a new vfio ops e.g.
vfio_file_translate_guest_pasid(dev_file, iommufd, gpasid) to
retrieve the host pasid. This sounds correct in concept as iommufd
only knows host pasid and any g->h information is managed by
vfio device driver.

Does it also make sense to you? Just want to think forward a bit
to make the whole picture clearer.

Thanks
Kevin
Jason Gunthorpe April 16, 2022, 1:33 a.m. UTC | #8
On Sat, Apr 16, 2022 at 12:00:12AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Saturday, April 16, 2022 5:54 AM
> > 
> > On Fri, Apr 15, 2022 at 03:57:14AM +0000, Tian, Kevin wrote:
> > > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > > Sent: Friday, April 15, 2022 2:46 AM
> > > >
> > > > kvm and VFIO need to be coupled together however neither is willing to
> > > > tolerate a direct module dependency. Instead when kvm is given a VFIO
> > FD
> > > > it uses many symbol_get()'s to access VFIO.
> > > >
> > > > Provide a single VFIO function vfio_file_get_ops() which validates the
> > > > given struct file * is a VFIO file and then returns a struct of ops.
> > >
> > > VFIO has multiple files (container, group, and device). Here and other
> > > places seems to assume a VFIO file is just a group file. While it is correct
> > > in this external facing context, probably calling it 'VFIO group file' is
> > > clearer in various code comments and patch descriptions.
> > >
> > > >
> > > > Following patches will redo each of the symbol_get() calls into an
> > > > indirection through this ops struct.
> > > >
> > > > Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> > >
> > >
> > > Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> > >
> > > Out of curiosity, how do you envision when iommufd is introduced?
> > > Will we need a generic ops abstraction so both vfio and iommufd
> > > register their own ops to keep kvm side generic or a new protocol
> > > will be introduced between iommufd and kvm?
> > 
> > I imagine using the vfio_device in all these context where the vfio
> > group is used, not iommufd. This keeps everything internal to vfio.
> > 
> 
> In this case although the uAPI is called KVM_DEV_VFIO_GROUP_ADD

Yes, down this path we'd probably alias it to KVM_DEV_VFIO_ADD_FD or
something.

> Qemu will pass in a device fd and with this series KVM doesn't care
> whether it's actually a device or group and just use struct file to call
> vfio_file_ops. correct?

Yes

> You probably remember there is one additional requirement when
> adding ENQCMD virtualization on Intel platform. KVM is required to
> setup a guest PASID to host PASID translation table in CPU vmcs
> structure to support ENQCMD in the guest. Following above direction
> I suppose KVM will provide a new interface to allow user pass in
>  [devfd, iommufd, guest_pasid] and then call a new vfio ops e.g.
> vfio_file_translate_guest_pasid(dev_file, iommufd, gpasid) to
> retrieve the host pasid. This sounds correct in concept as iommufd
> only knows host pasid and any g->h information is managed by
> vfio device driver.

I think there is no direct linkage of KVM to iommufd or VFIO for
ENQCMD.

The security nature of KVM is that the VM world should never have more
privilege than the hypervisor process running the KVM.

Therefore, when VM does a vENQCMD it must be equviliant to a physical
ENQCMD that the KVM process could already execute anyhow. Yes, Intel
wired ENQCMD to a single PASID, but we could imagine a system call
that allowed the process to change the PASID that ENQCMD uses from an
authorized list of PASIDs that the process has access to.

So, the linkage from iommufd is indirect. When iommufd does whatever
to install a PASID in the process's ENQCMD authorization table KVM can
be instructed to link that PASID inside the ENQCMD to a vPASID in the
VM.

As long as the PASID is in the process table KVM can allow the VM to
use it.

And it explains how userspace can actually use ENQCMD in a VFIO
scenario with iommufd, where obviously it needs to be in direct
control of what PASID ENQCMD generates and not be tied only to the
PASID associated with the mm_struct.

Jason
Tian, Kevin April 18, 2022, 3:56 a.m. UTC | #9
> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Saturday, April 16, 2022 9:33 AM
> 
> On Sat, Apr 16, 2022 at 12:00:12AM +0000, Tian, Kevin wrote:
> > You probably remember there is one additional requirement when
> > adding ENQCMD virtualization on Intel platform. KVM is required to
> > setup a guest PASID to host PASID translation table in CPU vmcs
> > structure to support ENQCMD in the guest. Following above direction
> > I suppose KVM will provide a new interface to allow user pass in
> >  [devfd, iommufd, guest_pasid] and then call a new vfio ops e.g.
> > vfio_file_translate_guest_pasid(dev_file, iommufd, gpasid) to
> > retrieve the host pasid. This sounds correct in concept as iommufd
> > only knows host pasid and any g->h information is managed by
> > vfio device driver.
> 
> I think there is no direct linkage of KVM to iommufd or VFIO for
> ENQCMD.
> 
> The security nature of KVM is that the VM world should never have more
> privilege than the hypervisor process running the KVM.

Indeed.

> 
> Therefore, when VM does a vENQCMD it must be equviliant to a physical
> ENQCMD that the KVM process could already execute anyhow. Yes, Intel
> wired ENQCMD to a single PASID, but we could imagine a system call
> that allowed the process to change the PASID that ENQCMD uses from an
> authorized list of PASIDs that the process has access to.

Yes, this makes more sense in concept. Just one note that for vENQCMD
guest changes PASID via xsave/xrstor which is not trapped thus we don't
need such change-PASID syscall in practice. The kernel just need maintain
a list of authorized PASIDs and setup the PASID translation structure 
properly in CPU. Then the guest is allowed to access any PASID authorized
and translated by the CPU. 

> 
> So, the linkage from iommufd is indirect. When iommufd does whatever
> to install a PASID in the process's ENQCMD authorization table KVM can
> be instructed to link that PASID inside the ENQCMD to a vPASID in the
> VM.
> 
> As long as the PASID is in the process table KVM can allow the VM to
> use it.
> 
> And it explains how userspace can actually use ENQCMD in a VFIO
> scenario with iommufd, where obviously it needs to be in direct
> control of what PASID ENQCMD generates and not be tied only to the
> PASID associated with the mm_struct.
> 

This reminds me back to the previous ioasid_set concept introduced
by Jacob [1]. Let's ignore the implementation detail for a while as lots
of logic there don't hold now given the progress of iommufd. But just
very high level concept-wise:

- Each mm is associated with a set of authorized PASIDs (ioasid_set);
- VFIO driver provides a uAPI for userspace to attach a guest virtual
  PASID (vPASID) to a hw page table in iommufd. In the uAPI:
    - a physical PASID (pPASID) is allocated and added to mm's ioasid_set;
    - the pPASID is used to actually attach to the hw page table;
    - the pPASID is returned to userspace upon successful attach;
- KVM provides a uAPI for userspace to map/unmap vPASID to pPASID
  in CPU PASID translation structure. User-provided pPASID must be
  found in mm->ioasid_set;

In this case the linkage from vfio/iommufd does be indirect.

My earlier reply was probably based on a wrong memory that the
entire ioasid_set concept was killed when the lengthy discussion
in [1] led to the debut of iommufd.

Thanks
Kevin

[1] https://lore.kernel.org/all/1614463286-97618-1-git-send-email-jacob.jun.pan@linux.intel.com/
Jason Gunthorpe April 19, 2022, 12:16 p.m. UTC | #10
On Mon, Apr 18, 2022 at 03:56:01AM +0000, Tian, Kevin wrote:
 
> - Each mm is associated with a set of authorized PASIDs (ioasid_set);
> - VFIO driver provides a uAPI for userspace to attach a guest virtual
>   PASID (vPASID) to a hw page table in iommufd. In the uAPI:
>     - a physical PASID (pPASID) is allocated and added to mm's ioasid_set;
>     - the pPASID is used to actually attach to the hw page table;
>     - the pPASID is returned to userspace upon successful attach;
> - KVM provides a uAPI for userspace to map/unmap vPASID to pPASID
>   in CPU PASID translation structure. User-provided pPASID must be
>   found in mm->ioasid_set;

It is more logical to be current->ioasid_set, not mm

But yes, something like this. The mdev api may want to mirror the kvm
api and map/unmap vPASID to pPASID

Jason
diff mbox series

Patch

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index a4555014bd1e72..93508f6a8beda5 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -2013,6 +2013,25 @@  long vfio_external_check_extension(struct vfio_group *group, unsigned long arg)
 }
 EXPORT_SYMBOL_GPL(vfio_external_check_extension);
 
+static const struct vfio_file_ops vfio_file_group_ops = {
+};
+
+/**
+ * vfio_file_get_ops - Return a struct of function pointers to use with the file
+ * @filep: VFIO file to return pointers for
+ *
+ * This API exists to allow KVM to call into VFIO without tightly coupling the
+ * VFIO and KVM modules together. KVM will call this using symbol_get() and from
+ * then on will call VFIO through the returned function pointers.
+ */
+const struct vfio_file_ops *vfio_file_get_ops(struct file *filep)
+{
+	if (filep->f_op != &vfio_group_fops)
+		return ERR_PTR(-EINVAL);
+	return &vfio_file_group_ops;
+}
+EXPORT_SYMBOL_GPL(vfio_file_get_ops);
+
 /*
  * Sub-module support
  */
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 66dda06ec42d1b..409bbf817206cc 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -138,6 +138,8 @@  int vfio_mig_get_next_state(struct vfio_device *device,
 /*
  * External user API
  */
+struct vfio_file_ops {
+};
 extern struct vfio_group *vfio_group_get_external_user(struct file *filep);
 extern void vfio_group_put_external_user(struct vfio_group *group);
 extern struct vfio_group *vfio_group_get_external_user_from_dev(struct device
@@ -147,6 +149,7 @@  extern bool vfio_external_group_match_file(struct vfio_group *group,
 extern int vfio_external_user_iommu_id(struct vfio_group *group);
 extern long vfio_external_check_extension(struct vfio_group *group,
 					  unsigned long arg);
+const struct vfio_file_ops *vfio_file_get_ops(struct file *filep);
 
 #define VFIO_PIN_PAGES_MAX_ENTRIES	(PAGE_SIZE/sizeof(unsigned long))
 
diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
index 7e1793a1f5f1fd..254d8c18378163 100644
--- a/virt/kvm/vfio.c
+++ b/virt/kvm/vfio.c
@@ -34,6 +34,7 @@  static void kvm_spapr_tce_release_iommu_group(struct kvm *kvm,
 struct kvm_vfio_group {
 	struct list_head node;
 	struct file *filp;
+	const struct vfio_file_ops *ops;
 	struct vfio_group *vfio_group;
 };
 
@@ -196,6 +197,7 @@  static void kvm_vfio_update_coherency(struct kvm_device *dev)
 
 static int kvm_vfio_group_add(struct kvm_device *dev, unsigned int fd)
 {
+	const struct vfio_file_ops *(*fn)(struct file *filep);
 	struct kvm_vfio *kv = dev->private;
 	struct vfio_group *vfio_group;
 	struct kvm_vfio_group *kvg;
@@ -221,6 +223,18 @@  static int kvm_vfio_group_add(struct kvm_device *dev, unsigned int fd)
 		goto err_unlock;
 	}
 
+	fn = symbol_get(vfio_file_get_ops);
+	if (!fn) {
+		ret = -EINVAL;
+		goto err_free;
+	}
+	kvg->ops = fn(filp);
+	symbol_put(vfio_file_get_ops);
+	if (IS_ERR(kvg->ops)) {
+		ret = PTR_ERR(kvg->ops);
+		goto err_free;
+	}
+
 	vfio_group = kvm_vfio_group_get_external_user(filp);
 	if (IS_ERR(vfio_group)) {
 		ret = PTR_ERR(vfio_group);