Message ID | 4-v1-33906a626da1+16b0-vfio_kvm_no_group_jgg@nvidia.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Remove vfio_group from the struct file facing VFIO API | expand |
> From: Jason Gunthorpe <jgg@nvidia.com> > Sent: Friday, April 15, 2022 2:46 AM > > kvm and VFIO need to be coupled together however neither is willing to > tolerate a direct module dependency. Instead when kvm is given a VFIO FD > it uses many symbol_get()'s to access VFIO. > > Provide a single VFIO function vfio_file_get_ops() which validates the > given struct file * is a VFIO file and then returns a struct of ops. VFIO has multiple files (container, group, and device). Here and other places seems to assume a VFIO file is just a group file. While it is correct in this external facing context, probably calling it 'VFIO group file' is clearer in various code comments and patch descriptions. > > Following patches will redo each of the symbol_get() calls into an > indirection through this ops struct. > > Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Out of curiosity, how do you envision when iommufd is introduced? Will we need a generic ops abstraction so both vfio and iommufd register their own ops to keep kvm side generic or a new protocol will be introduced between iommufd and kvm?
On Thu, Apr 14, 2022 at 03:46:03PM -0300, Jason Gunthorpe wrote: > kvm and VFIO need to be coupled together however neither is willing to > tolerate a direct module dependency. Instead when kvm is given a VFIO FD > it uses many symbol_get()'s to access VFIO. > > Provide a single VFIO function vfio_file_get_ops() which validates the > given struct file * is a VFIO file and then returns a struct of ops. > > Following patches will redo each of the symbol_get() calls into an > indirection through this ops struct. So I got anoyed at this as well a while ago and I still think this is the wrong way around. I'd much rather EXPORT_SYMBOL_GPL kvm_register_device_ops and just let kvm_vfio_ops live in a module than all the symbol_get crazyness. We'll need to be careful to deal with unload races or just not allow unloading, though.
On Fri, Apr 15, 2022 at 06:45:34AM +0200, Christoph Hellwig wrote: > On Thu, Apr 14, 2022 at 03:46:03PM -0300, Jason Gunthorpe wrote: > > kvm and VFIO need to be coupled together however neither is willing to > > tolerate a direct module dependency. Instead when kvm is given a VFIO FD > > it uses many symbol_get()'s to access VFIO. > > > > Provide a single VFIO function vfio_file_get_ops() which validates the > > given struct file * is a VFIO file and then returns a struct of ops. > > > > Following patches will redo each of the symbol_get() calls into an > > indirection through this ops struct. > > So I got anoyed at this as well a while ago and I still think this > is the wrong way around. What I plan to do in future is to have differnt ops returned depending on if the file is a struct vfio_group or a struct vfio_device, so it is not entirely pointless like this. > I'd much rather EXPORT_SYMBOL_GPL kvm_register_device_ops and > just let kvm_vfio_ops live in a module than all the symbol_get > crazyness. We'll need to be careful to deal with unload races > or just not allow unloading, though. This is certainly more complicated - especially considering module unload - than a single symbol_get(). How do you see the benefit? Jason
On Fri, Apr 15, 2022 at 09:13:01AM -0300, Jason Gunthorpe wrote: > > So I got anoyed at this as well a while ago and I still think this > > is the wrong way around. > > What I plan to do in future is to have differnt ops returned depending > on if the file is a struct vfio_group or a struct vfio_device, so it > is not entirely pointless like this. Uh, I think that is a rather ugly interface. Why would kvm pass in FDs to both into the same interface. > > > I'd much rather EXPORT_SYMBOL_GPL kvm_register_device_ops and > > just let kvm_vfio_ops live in a module than all the symbol_get > > crazyness. We'll need to be careful to deal with unload races > > or just not allow unloading, though. > > This is certainly more complicated - especially considering module > unload - than a single symbol_get(). How do you see the benefit? Because that is the sensible layering - kvm already has an abstract interface for emulated devices. So instead of doing symbol_get magic of some kind we should leverage it. But I can see how that is something you might not want to do for this series. So maybe stick to the individual symbol_gets for now and I'll send a separate series to clean that up? Especially as I have a half-finished series for that from a while ago anyway. > > Jason ---end quoted text---
On Fri, Apr 15, 2022 at 04:36:21PM +0200, Christoph Hellwig wrote: > On Fri, Apr 15, 2022 at 09:13:01AM -0300, Jason Gunthorpe wrote: > > > So I got anoyed at this as well a while ago and I still think this > > > is the wrong way around. > > > > What I plan to do in future is to have differnt ops returned depending > > on if the file is a struct vfio_group or a struct vfio_device, so it > > is not entirely pointless like this. > > Uh, I think that is a rather ugly interface. Why would kvm pass in > FDs to both into the same interface. We can do it either way, but IMHO, it is not very different than passing a socket/file/pipe/etc FD to read() - the list of VFIO files ops works identically on vfio device or vfio file FDs. The appeal to multiplex at the file level means we don't need to build parallel group/device uapi paths and parallel kAPI as well. Ultimately none of these uses of the file care about what the file is, these are all 'security proofs' and either FD type is fine to provide the proof. > Because that is the sensible layering - kvm already has an abstract > interface for emulated devices. So instead of doing symbol_get magic > of some kind we should leverage it. Hm, I don't know anthing about kvm's device interface > But I can see how that is something you might not want to do for > this series. So maybe stick to the individual symbol_gets for now > and I'll send a separate series to clean that up? Especially as > I have a half-finished series for that from a while ago anyway. Sure, I only did this because I became sad while touching all the symbol gets - it really is ugly. It will make the diffstat much worse, but no problem. If you have something already then lets avoid touching it too much here. Thanks, Jason
On Fri, Apr 15, 2022 at 03:57:14AM +0000, Tian, Kevin wrote: > > From: Jason Gunthorpe <jgg@nvidia.com> > > Sent: Friday, April 15, 2022 2:46 AM > > > > kvm and VFIO need to be coupled together however neither is willing to > > tolerate a direct module dependency. Instead when kvm is given a VFIO FD > > it uses many symbol_get()'s to access VFIO. > > > > Provide a single VFIO function vfio_file_get_ops() which validates the > > given struct file * is a VFIO file and then returns a struct of ops. > > VFIO has multiple files (container, group, and device). Here and other > places seems to assume a VFIO file is just a group file. While it is correct > in this external facing context, probably calling it 'VFIO group file' is > clearer in various code comments and patch descriptions. > > > > > Following patches will redo each of the symbol_get() calls into an > > indirection through this ops struct. > > > > Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> > > > Reviewed-by: Kevin Tian <kevin.tian@intel.com> > > Out of curiosity, how do you envision when iommufd is introduced? > Will we need a generic ops abstraction so both vfio and iommufd > register their own ops to keep kvm side generic or a new protocol > will be introduced between iommufd and kvm? I imagine using the vfio_device in all these context where the vfio group is used, not iommufd. This keeps everything internal to vfio. Jason
> From: Jason Gunthorpe <jgg@nvidia.com> > Sent: Saturday, April 16, 2022 5:54 AM > > On Fri, Apr 15, 2022 at 03:57:14AM +0000, Tian, Kevin wrote: > > > From: Jason Gunthorpe <jgg@nvidia.com> > > > Sent: Friday, April 15, 2022 2:46 AM > > > > > > kvm and VFIO need to be coupled together however neither is willing to > > > tolerate a direct module dependency. Instead when kvm is given a VFIO > FD > > > it uses many symbol_get()'s to access VFIO. > > > > > > Provide a single VFIO function vfio_file_get_ops() which validates the > > > given struct file * is a VFIO file and then returns a struct of ops. > > > > VFIO has multiple files (container, group, and device). Here and other > > places seems to assume a VFIO file is just a group file. While it is correct > > in this external facing context, probably calling it 'VFIO group file' is > > clearer in various code comments and patch descriptions. > > > > > > > > Following patches will redo each of the symbol_get() calls into an > > > indirection through this ops struct. > > > > > > Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> > > > > > > Reviewed-by: Kevin Tian <kevin.tian@intel.com> > > > > Out of curiosity, how do you envision when iommufd is introduced? > > Will we need a generic ops abstraction so both vfio and iommufd > > register their own ops to keep kvm side generic or a new protocol > > will be introduced between iommufd and kvm? > > I imagine using the vfio_device in all these context where the vfio > group is used, not iommufd. This keeps everything internal to vfio. > In this case although the uAPI is called KVM_DEV_VFIO_GROUP_ADD Qemu will pass in a device fd and with this series KVM doesn't care whether it's actually a device or group and just use struct file to call vfio_file_ops. correct? You probably remember there is one additional requirement when adding ENQCMD virtualization on Intel platform. KVM is required to setup a guest PASID to host PASID translation table in CPU vmcs structure to support ENQCMD in the guest. Following above direction I suppose KVM will provide a new interface to allow user pass in [devfd, iommufd, guest_pasid] and then call a new vfio ops e.g. vfio_file_translate_guest_pasid(dev_file, iommufd, gpasid) to retrieve the host pasid. This sounds correct in concept as iommufd only knows host pasid and any g->h information is managed by vfio device driver. Does it also make sense to you? Just want to think forward a bit to make the whole picture clearer. Thanks Kevin
On Sat, Apr 16, 2022 at 12:00:12AM +0000, Tian, Kevin wrote: > > From: Jason Gunthorpe <jgg@nvidia.com> > > Sent: Saturday, April 16, 2022 5:54 AM > > > > On Fri, Apr 15, 2022 at 03:57:14AM +0000, Tian, Kevin wrote: > > > > From: Jason Gunthorpe <jgg@nvidia.com> > > > > Sent: Friday, April 15, 2022 2:46 AM > > > > > > > > kvm and VFIO need to be coupled together however neither is willing to > > > > tolerate a direct module dependency. Instead when kvm is given a VFIO > > FD > > > > it uses many symbol_get()'s to access VFIO. > > > > > > > > Provide a single VFIO function vfio_file_get_ops() which validates the > > > > given struct file * is a VFIO file and then returns a struct of ops. > > > > > > VFIO has multiple files (container, group, and device). Here and other > > > places seems to assume a VFIO file is just a group file. While it is correct > > > in this external facing context, probably calling it 'VFIO group file' is > > > clearer in various code comments and patch descriptions. > > > > > > > > > > > Following patches will redo each of the symbol_get() calls into an > > > > indirection through this ops struct. > > > > > > > > Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> > > > > > > > > > Reviewed-by: Kevin Tian <kevin.tian@intel.com> > > > > > > Out of curiosity, how do you envision when iommufd is introduced? > > > Will we need a generic ops abstraction so both vfio and iommufd > > > register their own ops to keep kvm side generic or a new protocol > > > will be introduced between iommufd and kvm? > > > > I imagine using the vfio_device in all these context where the vfio > > group is used, not iommufd. This keeps everything internal to vfio. > > > > In this case although the uAPI is called KVM_DEV_VFIO_GROUP_ADD Yes, down this path we'd probably alias it to KVM_DEV_VFIO_ADD_FD or something. > Qemu will pass in a device fd and with this series KVM doesn't care > whether it's actually a device or group and just use struct file to call > vfio_file_ops. correct? Yes > You probably remember there is one additional requirement when > adding ENQCMD virtualization on Intel platform. KVM is required to > setup a guest PASID to host PASID translation table in CPU vmcs > structure to support ENQCMD in the guest. Following above direction > I suppose KVM will provide a new interface to allow user pass in > [devfd, iommufd, guest_pasid] and then call a new vfio ops e.g. > vfio_file_translate_guest_pasid(dev_file, iommufd, gpasid) to > retrieve the host pasid. This sounds correct in concept as iommufd > only knows host pasid and any g->h information is managed by > vfio device driver. I think there is no direct linkage of KVM to iommufd or VFIO for ENQCMD. The security nature of KVM is that the VM world should never have more privilege than the hypervisor process running the KVM. Therefore, when VM does a vENQCMD it must be equviliant to a physical ENQCMD that the KVM process could already execute anyhow. Yes, Intel wired ENQCMD to a single PASID, but we could imagine a system call that allowed the process to change the PASID that ENQCMD uses from an authorized list of PASIDs that the process has access to. So, the linkage from iommufd is indirect. When iommufd does whatever to install a PASID in the process's ENQCMD authorization table KVM can be instructed to link that PASID inside the ENQCMD to a vPASID in the VM. As long as the PASID is in the process table KVM can allow the VM to use it. And it explains how userspace can actually use ENQCMD in a VFIO scenario with iommufd, where obviously it needs to be in direct control of what PASID ENQCMD generates and not be tied only to the PASID associated with the mm_struct. Jason
> From: Jason Gunthorpe <jgg@nvidia.com> > Sent: Saturday, April 16, 2022 9:33 AM > > On Sat, Apr 16, 2022 at 12:00:12AM +0000, Tian, Kevin wrote: > > You probably remember there is one additional requirement when > > adding ENQCMD virtualization on Intel platform. KVM is required to > > setup a guest PASID to host PASID translation table in CPU vmcs > > structure to support ENQCMD in the guest. Following above direction > > I suppose KVM will provide a new interface to allow user pass in > > [devfd, iommufd, guest_pasid] and then call a new vfio ops e.g. > > vfio_file_translate_guest_pasid(dev_file, iommufd, gpasid) to > > retrieve the host pasid. This sounds correct in concept as iommufd > > only knows host pasid and any g->h information is managed by > > vfio device driver. > > I think there is no direct linkage of KVM to iommufd or VFIO for > ENQCMD. > > The security nature of KVM is that the VM world should never have more > privilege than the hypervisor process running the KVM. Indeed. > > Therefore, when VM does a vENQCMD it must be equviliant to a physical > ENQCMD that the KVM process could already execute anyhow. Yes, Intel > wired ENQCMD to a single PASID, but we could imagine a system call > that allowed the process to change the PASID that ENQCMD uses from an > authorized list of PASIDs that the process has access to. Yes, this makes more sense in concept. Just one note that for vENQCMD guest changes PASID via xsave/xrstor which is not trapped thus we don't need such change-PASID syscall in practice. The kernel just need maintain a list of authorized PASIDs and setup the PASID translation structure properly in CPU. Then the guest is allowed to access any PASID authorized and translated by the CPU. > > So, the linkage from iommufd is indirect. When iommufd does whatever > to install a PASID in the process's ENQCMD authorization table KVM can > be instructed to link that PASID inside the ENQCMD to a vPASID in the > VM. > > As long as the PASID is in the process table KVM can allow the VM to > use it. > > And it explains how userspace can actually use ENQCMD in a VFIO > scenario with iommufd, where obviously it needs to be in direct > control of what PASID ENQCMD generates and not be tied only to the > PASID associated with the mm_struct. > This reminds me back to the previous ioasid_set concept introduced by Jacob [1]. Let's ignore the implementation detail for a while as lots of logic there don't hold now given the progress of iommufd. But just very high level concept-wise: - Each mm is associated with a set of authorized PASIDs (ioasid_set); - VFIO driver provides a uAPI for userspace to attach a guest virtual PASID (vPASID) to a hw page table in iommufd. In the uAPI: - a physical PASID (pPASID) is allocated and added to mm's ioasid_set; - the pPASID is used to actually attach to the hw page table; - the pPASID is returned to userspace upon successful attach; - KVM provides a uAPI for userspace to map/unmap vPASID to pPASID in CPU PASID translation structure. User-provided pPASID must be found in mm->ioasid_set; In this case the linkage from vfio/iommufd does be indirect. My earlier reply was probably based on a wrong memory that the entire ioasid_set concept was killed when the lengthy discussion in [1] led to the debut of iommufd. Thanks Kevin [1] https://lore.kernel.org/all/1614463286-97618-1-git-send-email-jacob.jun.pan@linux.intel.com/
On Mon, Apr 18, 2022 at 03:56:01AM +0000, Tian, Kevin wrote: > - Each mm is associated with a set of authorized PASIDs (ioasid_set); > - VFIO driver provides a uAPI for userspace to attach a guest virtual > PASID (vPASID) to a hw page table in iommufd. In the uAPI: > - a physical PASID (pPASID) is allocated and added to mm's ioasid_set; > - the pPASID is used to actually attach to the hw page table; > - the pPASID is returned to userspace upon successful attach; > - KVM provides a uAPI for userspace to map/unmap vPASID to pPASID > in CPU PASID translation structure. User-provided pPASID must be > found in mm->ioasid_set; It is more logical to be current->ioasid_set, not mm But yes, something like this. The mdev api may want to mirror the kvm api and map/unmap vPASID to pPASID Jason
diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c index a4555014bd1e72..93508f6a8beda5 100644 --- a/drivers/vfio/vfio.c +++ b/drivers/vfio/vfio.c @@ -2013,6 +2013,25 @@ long vfio_external_check_extension(struct vfio_group *group, unsigned long arg) } EXPORT_SYMBOL_GPL(vfio_external_check_extension); +static const struct vfio_file_ops vfio_file_group_ops = { +}; + +/** + * vfio_file_get_ops - Return a struct of function pointers to use with the file + * @filep: VFIO file to return pointers for + * + * This API exists to allow KVM to call into VFIO without tightly coupling the + * VFIO and KVM modules together. KVM will call this using symbol_get() and from + * then on will call VFIO through the returned function pointers. + */ +const struct vfio_file_ops *vfio_file_get_ops(struct file *filep) +{ + if (filep->f_op != &vfio_group_fops) + return ERR_PTR(-EINVAL); + return &vfio_file_group_ops; +} +EXPORT_SYMBOL_GPL(vfio_file_get_ops); + /* * Sub-module support */ diff --git a/include/linux/vfio.h b/include/linux/vfio.h index 66dda06ec42d1b..409bbf817206cc 100644 --- a/include/linux/vfio.h +++ b/include/linux/vfio.h @@ -138,6 +138,8 @@ int vfio_mig_get_next_state(struct vfio_device *device, /* * External user API */ +struct vfio_file_ops { +}; extern struct vfio_group *vfio_group_get_external_user(struct file *filep); extern void vfio_group_put_external_user(struct vfio_group *group); extern struct vfio_group *vfio_group_get_external_user_from_dev(struct device @@ -147,6 +149,7 @@ extern bool vfio_external_group_match_file(struct vfio_group *group, extern int vfio_external_user_iommu_id(struct vfio_group *group); extern long vfio_external_check_extension(struct vfio_group *group, unsigned long arg); +const struct vfio_file_ops *vfio_file_get_ops(struct file *filep); #define VFIO_PIN_PAGES_MAX_ENTRIES (PAGE_SIZE/sizeof(unsigned long)) diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c index 7e1793a1f5f1fd..254d8c18378163 100644 --- a/virt/kvm/vfio.c +++ b/virt/kvm/vfio.c @@ -34,6 +34,7 @@ static void kvm_spapr_tce_release_iommu_group(struct kvm *kvm, struct kvm_vfio_group { struct list_head node; struct file *filp; + const struct vfio_file_ops *ops; struct vfio_group *vfio_group; }; @@ -196,6 +197,7 @@ static void kvm_vfio_update_coherency(struct kvm_device *dev) static int kvm_vfio_group_add(struct kvm_device *dev, unsigned int fd) { + const struct vfio_file_ops *(*fn)(struct file *filep); struct kvm_vfio *kv = dev->private; struct vfio_group *vfio_group; struct kvm_vfio_group *kvg; @@ -221,6 +223,18 @@ static int kvm_vfio_group_add(struct kvm_device *dev, unsigned int fd) goto err_unlock; } + fn = symbol_get(vfio_file_get_ops); + if (!fn) { + ret = -EINVAL; + goto err_free; + } + kvg->ops = fn(filp); + symbol_put(vfio_file_get_ops); + if (IS_ERR(kvg->ops)) { + ret = PTR_ERR(kvg->ops); + goto err_free; + } + vfio_group = kvm_vfio_group_get_external_user(filp); if (IS_ERR(vfio_group)) { ret = PTR_ERR(vfio_group);
kvm and VFIO need to be coupled together however neither is willing to tolerate a direct module dependency. Instead when kvm is given a VFIO FD it uses many symbol_get()'s to access VFIO. Provide a single VFIO function vfio_file_get_ops() which validates the given struct file * is a VFIO file and then returns a struct of ops. Following patches will redo each of the symbol_get() calls into an indirection through this ops struct. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> --- drivers/vfio/vfio.c | 19 +++++++++++++++++++ include/linux/vfio.h | 3 +++ virt/kvm/vfio.c | 14 ++++++++++++++ 3 files changed, 36 insertions(+)