diff mbox series

[RFC,05/20] vfio/pci: Register device to /dev/vfio/devices

Message ID 20210919063848.1476776-6-yi.l.liu@intel.com (mailing list archive)
State New, archived
Headers show
Series Introduce /dev/iommu for userspace I/O address space management | expand

Commit Message

Yi Liu Sept. 19, 2021, 6:38 a.m. UTC
This patch exposes the device-centric interface for vfio-pci devices. To
be compatible with existing users, vfio-pci exposes both the legacy group
interface and the device-centric interface.

As explained in the previous patch, this change doesn't apply to devices which
cannot be forced to snoop cache by their upstream iommu. Such devices
are still expected to be opened via the legacy group interface.

When the device is opened via /dev/vfio/devices, vfio-pci should prevent
the user from accessing the assigned device because the device is still
attached to the default domain which may allow user-initiated DMAs to
touch arbitrary places. The user access must be blocked until the device
is later bound to an iommufd (see patch 08). The binding acts as the
contract for putting the device in a security context which ensures user-
initiated DMAs via this device cannot harm the rest of the system.

This patch introduces a vdev->block_access flag for this purpose. It's set
when the device is opened via /dev/vfio/devices and cleared after binding
to iommufd succeeds. mmap and r/w handlers check this flag to decide whether
user access should be blocked or not.

An alternative option is to use a dummy fops when the device is opened and
then switch to the real fops (via replace_fops()) after binding. Inputs on
which option is better are appreciated.
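
For illustration, a minimal sketch of the dummy-fops option (the fops and
helper names below are made up for this example; note that mainline documents
replace_fops() for use from ->open() instances only, so swapping at bind time
would need extra care):

/* Parked fops: no .read/.write/.mmap, so the VFS rejects all access. */
static const struct file_operations vfio_device_parked_fops = {
	.owner		= THIS_MODULE,
	.release	= vfio_device_fops_release,
};

/* Hypothetical helper, called once binding to an iommufd succeeds. */
static void vfio_device_unpark(struct file *filep)
{
	/* Swap in the real fops; the FD then behaves like a normal device FD. */
	replace_fops(filep, &vfio_device_fops);
}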

The legacy group interface doesn't have this problem. Its uAPI requires the
user to first put the device into a security context via container/group
attaching process, before opening the device through the groupfd.

Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
---
 drivers/vfio/pci/vfio_pci.c         | 25 +++++++++++++++++++++++--
 drivers/vfio/pci/vfio_pci_private.h |  1 +
 drivers/vfio/vfio.c                 |  3 ++-
 include/linux/vfio.h                |  1 +
 4 files changed, 27 insertions(+), 3 deletions(-)

Comments

Jason Gunthorpe Sept. 21, 2021, 4:40 p.m. UTC | #1
On Sun, Sep 19, 2021 at 02:38:33PM +0800, Liu Yi L wrote:
> This patch exposes the device-centric interface for vfio-pci devices. To
> be compatible with existing users, vfio-pci exposes both the legacy group
> interface and the device-centric interface.
> 
> As explained in the previous patch, this change doesn't apply to devices which
> cannot be forced to snoop cache by their upstream iommu. Such devices
> are still expected to be opened via the legacy group interface.
> 
> When the device is opened via /dev/vfio/devices, vfio-pci should prevent
> the user from accessing the assigned device because the device is still
> attached to the default domain which may allow user-initiated DMAs to
> touch arbitrary places. The user access must be blocked until the device
> is later bound to an iommufd (see patch 08). The binding acts as the
> contract for putting the device in a security context which ensures user-
> initiated DMAs via this device cannot harm the rest of the system.
> 
> This patch introduces a vdev->block_access flag for this purpose. It's set
> when the device is opened via /dev/vfio/devices and cleared after binding
> to iommufd succeeds. mmap and r/w handlers check this flag to decide whether
> user access should be blocked or not.

This should not be in vfio_pci.

AFAIK there is no condition where a vfio driver can work without being
connected to some kind of iommu back end, so the core code should
handle this interlock globally. A vfio driver's ops should not be
callable until the iommu is connected.

The only vfio_pci patch in this series should be adding a new callback
op to take in an iommufd and register the pci_device as an iommufd
device.

Jason
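
A hypothetical sketch of the callback shape suggested above (the op name and
signature are illustrative, nothing here is defined by this series):

struct vfio_device_ops {
	/* ... existing open/release/read/write/mmap/ioctl ops ... */

	/* Hand the physical device to an iommufd; for vfio-pci this would
	 * do little more than register vdev->pdev with the iommufd and
	 * leave the access interlock to the vfio core. */
	int (*bind_iommufd)(struct vfio_device *vdev,
			    struct iommufd_ctx *ictx, u32 *out_dev_id);
};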
Alex Williamson Sept. 21, 2021, 9:09 p.m. UTC | #2
On Tue, 21 Sep 2021 13:40:01 -0300
Jason Gunthorpe <jgg@nvidia.com> wrote:

> On Sun, Sep 19, 2021 at 02:38:33PM +0800, Liu Yi L wrote:
> > This patch exposes the device-centric interface for vfio-pci devices. To
> > be compatible with existing users, vfio-pci exposes both the legacy group
> > interface and the device-centric interface.
> > 
> > As explained in the previous patch, this change doesn't apply to devices which
> > cannot be forced to snoop cache by their upstream iommu. Such devices
> > are still expected to be opened via the legacy group interface.

This doesn't make much sense to me.  The previous patch indicates
there's work to be done in updating the kvm-vfio contract to understand
DMA coherency, so you're trying to limit use cases to those where the
IOMMU enforces coherency, but there's QEMU work to be done to support
the iommufd uAPI at all.  Isn't part of that work to understand how KVM
will be told about non-coherent devices rather than "meh, skip it in the
kernel"?  Also let's not forget that vfio is not only for KVM.
 
> > When the device is opened via /dev/vfio/devices, vfio-pci should prevent
> > the user from accessing the assigned device because the device is still
> > attached to the default domain which may allow user-initiated DMAs to
> > touch arbitrary places. The user access must be blocked until the device
> > is later bound to an iommufd (see patch 08). The binding acts as the
> > contract for putting the device in a security context which ensures user-
> > initiated DMAs via this device cannot harm the rest of the system.
> > 
> > This patch introduces a vdev->block_access flag for this purpose. It's set
> > when the device is opened via /dev/vfio/devices and cleared after binding
> > to iommufd succeeds. mmap and r/w handlers check this flag to decide whether
> > user access should be blocked or not.  
> 
> This should not be in vfio_pci.
> 
> AFAIK there is no condition where a vfio driver can work without being
> connected to some kind of iommu back end, so the core code should
> handle this interlock globally. A vfio driver's ops should not be
> callable until the iommu is connected.
> 
> The only vfio_pci patch in this series should be adding a new callback
> op to take in an iommufd and register the pci_device as an iommufd
> device.

Couldn't the same argument be made that registering a $bus device as an
iommufd device is a common interface that shouldn't be the
responsibility of the vfio device driver?  Is userspace opening the
non-group device anything more than a reservation of that device if
access is withheld until iommu isolation?  I also don't really want to
predict how ioctls might evolve to guess whether only blocking .read,
.write, and .mmap callbacks are sufficient.  Thanks,

Alex
Jason Gunthorpe Sept. 21, 2021, 9:58 p.m. UTC | #3
On Tue, Sep 21, 2021 at 03:09:29PM -0600, Alex Williamson wrote:

> the iommufd uAPI at all.  Isn't part of that work to understand how KVM
> will be told about non-coherent devices rather than "meh, skip it in the
> kernel"?  Also let's not forget that vfio is not only for KVM.

vfio is not only for KVM, but AFAICT the wbinvd stuff is only for
KVM... But yes, I agree this should be sorted out at this stage

> > > When the device is opened via /dev/vfio/devices, vfio-pci should prevent
> > > the user from accessing the assigned device because the device is still
> > > attached to the default domain which may allow user-initiated DMAs to
> > > touch arbitrary places. The user access must be blocked until the device
> > > is later bound to an iommufd (see patch 08). The binding acts as the
> > > contract for putting the device in a security context which ensures user-
> > > initiated DMAs via this device cannot harm the rest of the system.
> > > 
> > > This patch introduces a vdev->block_access flag for this purpose. It's set
> > > when the device is opened via /dev/vfio/devices and cleared after binding
> > > to iommufd succeeds. mmap and r/w handlers check this flag to decide whether
> > > user access should be blocked or not.  
> > 
> > This should not be in vfio_pci.
> > 
> > AFAIK there is no condition where a vfio driver can work without being
> > connected to some kind of iommu back end, so the core code should
> > handle this interlock globally. A vfio driver's ops should not be
> > callable until the iommu is connected.
> > 
> > The only vfio_pci patch in this series should be adding a new callback
> > op to take in an iommufd and register the pci_device as an iommufd
> > device.
> 
> Couldn't the same argument be made that registering a $bus device as an
> iommufd device is a common interface that shouldn't be the
> responsibility of the vfio device driver? 

The driver needs enough involvement to signal what kind of IOMMU
connection it wants, e.g. attaching to a physical device will use the
iofd_attach_device() path, but attaching to a SW page table should use
a different API call. PASID should also be different.

Possibly a good arrangement is to have the core provide some generic
ioctl ops functions 'vfio_all_device_iommufd_bind' that everything
except mdev drivers can use so the code isn't all duplicated.

> non-group device anything more than a reservation of that device if
> access is withheld until iommu isolation?  I also don't really want to
> predict how ioctls might evolve to guess whether only blocking .read,
> .write, and .mmap callbacks are sufficient.  Thanks,

This is why I said the entire fops should be blocked behind a dummy fops,
so the core code keeps the vfio_device FD parked and userspace is unable
to access the ops until device attachment, and thus IOMMU isolation, is
completed.

Simple and easy to reason about, a parked FD is very similar to a
closed FD.

Jason
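
A minimal sketch of such a core-level interlock (the iommu_attached field and
helper below are assumptions for this example, not code from the series):

/* vfio core: every device fop bails out until an iommu backend is
 * attached; the flag would be set with smp_store_release() on bind. */
static bool vfio_device_ready(struct vfio_device *device)
{
	return smp_load_acquire(&device->iommu_attached);
}

static ssize_t vfio_device_fops_read(struct file *filep, char __user *buf,
				     size_t count, loff_t *ppos)
{
	struct vfio_device *device = filep->private_data;

	if (!vfio_device_ready(device))
		return -ENODEV;	/* parked: not yet in a security context */
	return device->ops->read(device, buf, count, ppos);
}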
Tian, Kevin Sept. 22, 2021, 1:19 a.m. UTC | #4
> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Wednesday, September 22, 2021 5:09 AM
> 
> On Tue, 21 Sep 2021 13:40:01 -0300
> Jason Gunthorpe <jgg@nvidia.com> wrote:
> 
> > On Sun, Sep 19, 2021 at 02:38:33PM +0800, Liu Yi L wrote:
> > > This patch exposes the device-centric interface for vfio-pci devices. To
> > > be compatible with existing users, vfio-pci exposes both the legacy group
> > > interface and the device-centric interface.
> > >
> > > As explained in the previous patch, this change doesn't apply to devices which
> > > cannot be forced to snoop cache by their upstream iommu. Such devices
> > > are still expected to be opened via the legacy group interface.
> 
> This doesn't make much sense to me.  The previous patch indicates
> there's work to be done in updating the kvm-vfio contract to understand
> DMA coherency, so you're trying to limit use cases to those where the
> IOMMU enforces coherency, but there's QEMU work to be done to support
> the iommufd uAPI at all.  Isn't part of that work to understand how KVM
> will be told about non-coherent devices rather than "meh, skip it in the
> kernel"?  Also let's not forget that vfio is not only for KVM.

The policy here is that VFIO will not expose such devices (no enforce-snoop)
in the new device hierarchy at all. In this case QEMU will fall back to the
group interface automatically and then rely on the existing contract to connect
vfio and QEMU. It doesn't need to care about whatever new contract
until such devices are exposed in the new interface.

Yes, vfio is not only for KVM. But here it's more a task split based on staging
considerations. IMO it's not necessary to further split the task into supporting
non-snoop devices for userspace drivers and then for KVM.
Tian, Kevin Sept. 22, 2021, 1:24 a.m. UTC | #5
> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Wednesday, September 22, 2021 5:59 AM
> 
> On Tue, Sep 21, 2021 at 03:09:29PM -0600, Alex Williamson wrote:
> 
> > the iommufd uAPI at all.  Isn't part of that work to understand how KVM
> > will be told about non-coherent devices rather than "meh, skip it in the
> > kernel"?  Also let's not forget that vfio is not only for KVM.
> 
> vfio is not only for KVM, but AFAICT the wbinvd stuff is only for
> KVM... But yes, I agree this should be sorted out at this stage

If such devices are not even exposed in the new hierarchy at this stage,
I suppose sorting it out later should be fine?

> 
> > > > When the device is opened via /dev/vfio/devices, vfio-pci should prevent
> > > > the user from accessing the assigned device because the device is still
> > > > attached to the default domain which may allow user-initiated DMAs to
> > > > touch arbitrary places. The user access must be blocked until the device
> > > > is later bound to an iommufd (see patch 08). The binding acts as the
> > > > contract for putting the device in a security context which ensures user-
> > > > initiated DMAs via this device cannot harm the rest of the system.
> > > >
> > > > This patch introduces a vdev->block_access flag for this purpose. It's set
> > > > when the device is opened via /dev/vfio/devices and cleared after binding
> > > > to iommufd succeeds. mmap and r/w handlers check this flag to decide whether
> > > > user access should be blocked or not.
> > >
> > > This should not be in vfio_pci.
> > >
> > > AFAIK there is no condition where a vfio driver can work without being
> > > connected to some kind of iommu back end, so the core code should
> > > handle this interlock globally. A vfio driver's ops should not be
> > > callable until the iommu is connected.
> > >
> > > The only vfio_pci patch in this series should be adding a new callback
> > > op to take in an iommufd and register the pci_device as an iommufd
> > > device.
> >
> > Couldn't the same argument be made that registering a $bus device as an
> > iommufd device is a common interface that shouldn't be the
> > responsibility of the vfio device driver?
> 
> The driver needs enough involvement to signal what kind of IOMMU
> connection it wants, e.g. attaching to a physical device will use the
> iofd_attach_device() path, but attaching to a SW page table should use
> a different API call. PASID should also be different.

Exactly

> 
> Possibly a good arrangement is to have the core provide some generic
> ioctl ops functions 'vfio_all_device_iommufd_bind' that everything
> except mdev drivers can use so the code isn't all duplicated.

Could this be a future enhancement when we have multiple device
types supporting iommufd?

> 
> > non-group device anything more than a reservation of that device if
> > access is withheld until iommu isolation?  I also don't really want to
> > predict how ioctls might evolve to guess whether only blocking .read,
> > .write, and .mmap callbacks are sufficient.  Thanks,
> 
> This is why I said the entire fops should be blocked behind a dummy fops,
> so the core code keeps the vfio_device FD parked and userspace is unable
> to access the ops until device attachment, and thus IOMMU isolation, is
> completed.
> 
> Simple and easy to reason about, a parked FD is very similar to a
> closed FD.
> 

This rationale makes sense. Just the open question of how to handle
exclusive opens between the group and nongroup interfaces still needs
some clarification here, especially about what a parked FD means
for the group interface (where parking is unnecessary since the
security context is already established before the device is opened).

Thanks
Kevin
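
One possible shape for that exclusion, purely as a sketch (the lock and fields
below are assumptions, not from the series): the first open picks a path, and
the other interface is refused until the last close:

static int vfio_device_claim_path(struct vfio_device *device, bool group_path)
{
	int ret = 0;

	mutex_lock(&device->open_lock);		/* hypothetical lock */
	if (device->open_count && device->opened_via_group != group_path) {
		ret = -EBUSY;			/* held by the other interface */
	} else {
		device->opened_via_group = group_path;
		device->open_count++;
	}
	mutex_unlock(&device->open_lock);
	return ret;
}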
Alex Williamson Sept. 22, 2021, 9:17 p.m. UTC | #6
On Wed, 22 Sep 2021 01:19:08 +0000
"Tian, Kevin" <kevin.tian@intel.com> wrote:

> > From: Alex Williamson <alex.williamson@redhat.com>
> > Sent: Wednesday, September 22, 2021 5:09 AM
> > 
> > On Tue, 21 Sep 2021 13:40:01 -0300
> > Jason Gunthorpe <jgg@nvidia.com> wrote:
> >   
> > > On Sun, Sep 19, 2021 at 02:38:33PM +0800, Liu Yi L wrote:  
> > > > This patch exposes the device-centric interface for vfio-pci devices. To
> > > > be compatible with existing users, vfio-pci exposes both the legacy group
> > > > interface and the device-centric interface.
> > > >
> > > > As explained in the previous patch, this change doesn't apply to devices which
> > > > cannot be forced to snoop cache by their upstream iommu. Such devices
> > > > are still expected to be opened via the legacy group interface.  
> > 
> > This doesn't make much sense to me.  The previous patch indicates
> > there's work to be done in updating the kvm-vfio contract to understand
> > DMA coherency, so you're trying to limit use cases to those where the
> > IOMMU enforces coherency, but there's QEMU work to be done to support
> > the iommufd uAPI at all.  Isn't part of that work to understand how KVM
> > will be told about non-coherent devices rather than "meh, skip it in the
> > kernel"?  Also let's not forget that vfio is not only for KVM.  
> 
> The policy here is that VFIO will not expose such devices (no enforce-snoop)
> in the new device hierarchy at all. In this case QEMU will fall back to the
> group interface automatically and then rely on the existing contract to connect
> vfio and QEMU. It doesn't need to care about whatever new contract
> until such devices are exposed in the new interface.
>
> Yes, vfio is not only for KVM. But here it's more a task split based on staging
> considerations. IMO it's not necessary to further split the task into supporting
> non-snoop devices for userspace drivers and then for KVM.

Patch 10 introduces an iommufd interface for QEMU to learn whether the
IOMMU enforces DMA coherency; at that point QEMU could revert to the
legacy interface, or register the iommufd with KVM, or otherwise
establish non-coherent DMA with KVM as necessary.  We're adding cruft
to the kernel here to enforce an unnecessary limitation.

If there are reasons the kernel can't support the device interface,
that's a valid reason not to present the interface, but this seems like
picking a specific gap that userspace is already able to detect from
this series at the expense of other use cases.  Thanks,

Alex
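
For illustration, the userspace side of that decision could look roughly like
this (query_enforce_snoop() and the open helpers are placeholders for whatever
patch 10's uAPI ends up being, not real interfaces):

/* Userspace sketch: pick the interface based on reported coherency. */
static int vfio_pick_interface(int iommufd, unsigned int dev_id)
{
	bool enforce_snoop;

	if (query_enforce_snoop(iommufd, dev_id, &enforce_snoop) < 0)
		return -1;

	if (enforce_snoop)
		return open_device_centric(dev_id);	/* /dev/vfio/devices/X */

	/* Non-coherent DMA: fall back to the group path, or register the
	 * iommufd with KVM once that contract exists. */
	return open_legacy_group(dev_id);
}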
Tian, Kevin Sept. 22, 2021, 11:49 p.m. UTC | #7
> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Thursday, September 23, 2021 5:17 AM
> 
> On Wed, 22 Sep 2021 01:19:08 +0000
> "Tian, Kevin" <kevin.tian@intel.com> wrote:
> 
> > > From: Alex Williamson <alex.williamson@redhat.com>
> > > Sent: Wednesday, September 22, 2021 5:09 AM
> > >
> > > On Tue, 21 Sep 2021 13:40:01 -0300
> > > Jason Gunthorpe <jgg@nvidia.com> wrote:
> > >
> > > > On Sun, Sep 19, 2021 at 02:38:33PM +0800, Liu Yi L wrote:
> > > > > This patch exposes the device-centric interface for vfio-pci devices. To
> > > > > be compatible with existing users, vfio-pci exposes both the legacy group
> > > > > interface and the device-centric interface.
> > > > >
> > > > > As explained in the previous patch, this change doesn't apply to devices which
> > > > > cannot be forced to snoop cache by their upstream iommu. Such devices
> > > > > are still expected to be opened via the legacy group interface.
> > >
> > > This doesn't make much sense to me.  The previous patch indicates
> > > there's work to be done in updating the kvm-vfio contract to understand
> > > DMA coherency, so you're trying to limit use cases to those where the
> > > IOMMU enforces coherency, but there's QEMU work to be done to support
> > > the iommufd uAPI at all.  Isn't part of that work to understand how KVM
> > > will be told about non-coherent devices rather than "meh, skip it in the
> > > kernel"?  Also let's not forget that vfio is not only for KVM.
> >
> > The policy here is that VFIO will not expose such devices (no enforce-snoop)
> > in the new device hierarchy at all. In this case QEMU will fall back to the
> > group interface automatically and then rely on the existing contract to connect
> > vfio and QEMU. It doesn't need to care about whatever new contract
> > until such devices are exposed in the new interface.
> >
> > Yes, vfio is not only for KVM. But here it's more a task split based on staging
> > considerations. IMO it's not necessary to further split the task into supporting
> > non-snoop devices for userspace drivers and then for KVM.
> 
> Patch 10 introduces an iommufd interface for QEMU to learn whether the
> IOMMU enforces DMA coherency; at that point QEMU could revert to the
> legacy interface, or register the iommufd with KVM, or otherwise
> establish non-coherent DMA with KVM as necessary.  We're adding cruft
> to the kernel here to enforce an unnecessary limitation.
> 
> If there are reasons the kernel can't support the device interface,
> that's a valid reason not to present the interface, but this seems like
> picking a specific gap that userspace is already able to detect from
> this series at the expense of other use cases.  Thanks,
> 

I see your point now. Yes, I agree that the kernel cruft is an unnecessary
limitation here. The user should rely on the device/iommufd capability
to decide whether non-coherent DMA should go through the legacy or the
new interface.

Thanks
Kevin
diff mbox series

Patch

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 318864d52837..145addde983b 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -572,6 +572,10 @@  static int vfio_pci_open(struct vfio_device *core_vdev)
 
 		vfio_spapr_pci_eeh_open(vdev->pdev);
 		vfio_pci_vf_token_user_add(vdev, 1);
+		if (!vfio_device_in_container(core_vdev))
+			atomic_set(&vdev->block_access, 1);
+		else
+			atomic_set(&vdev->block_access, 0);
 	}
 	vdev->refcnt++;
 error:
@@ -1374,6 +1378,9 @@  static ssize_t vfio_pci_rw(struct vfio_pci_device *vdev, char __user *buf,
 {
 	unsigned int index = VFIO_PCI_OFFSET_TO_INDEX(*ppos);
 
+	if (atomic_read(&vdev->block_access))
+		return -ENODEV;
+
 	if (index >= VFIO_PCI_NUM_REGIONS + vdev->num_regions)
 		return -EINVAL;
 
@@ -1640,6 +1647,9 @@  static int vfio_pci_mmap(struct vfio_device *core_vdev, struct vm_area_struct *v
 	u64 phys_len, req_len, pgoff, req_start;
 	int ret;
 
+	if (atomic_read(&vdev->block_access))
+		return -ENODEV;
+
 	index = vma->vm_pgoff >> (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT);
 
 	if (index >= VFIO_PCI_NUM_REGIONS + vdev->num_regions)
@@ -1978,6 +1988,8 @@  static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	struct vfio_pci_device *vdev;
 	struct iommu_group *group;
 	int ret;
+	u32 flags;
+	bool snoop = false;
 
 	if (vfio_pci_is_denylisted(pdev))
 		return -EINVAL;
@@ -2046,9 +2058,18 @@  static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 		vfio_pci_set_power_state(vdev, PCI_D3hot);
 	}
 
-	ret = vfio_register_group_dev(&vdev->vdev);
-	if (ret)
+	flags = VFIO_DEVNODE_GROUP;
+	ret = iommu_device_get_info(&pdev->dev,
+				    IOMMU_DEV_INFO_FORCE_SNOOP, &snoop);
+	if (!ret && snoop)
+		flags |= VFIO_DEVNODE_NONGROUP;
+
+	ret = vfio_register_device(&vdev->vdev, flags);
+	if (ret) {
+		pr_debug("Failed to register device interface\n");
 		goto out_power;
+	}
+
 	dev_set_drvdata(&pdev->dev, vdev);
 	return 0;
 
diff --git a/drivers/vfio/pci/vfio_pci_private.h b/drivers/vfio/pci/vfio_pci_private.h
index 5a36272cecbf..f12012e30b53 100644
--- a/drivers/vfio/pci/vfio_pci_private.h
+++ b/drivers/vfio/pci/vfio_pci_private.h
@@ -143,6 +143,7 @@  struct vfio_pci_device {
 	struct mutex		vma_lock;
 	struct list_head	vma_list;
 	struct rw_semaphore	memory_lock;
+	atomic_t		block_access;
 };
 
 #define is_intx(vdev) (vdev->irq_type == VFIO_PCI_INTX_IRQ_INDEX)
diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 1e87b25962f1..22851747e92c 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -1789,10 +1789,11 @@  static int vfio_device_fops_open(struct inode *inode, struct file *filep)
 	return ret;
 }
 
-static bool vfio_device_in_container(struct vfio_device *device)
+bool vfio_device_in_container(struct vfio_device *device)
 {
 	return !!(device->group && device->group->container);
 }
+EXPORT_SYMBOL_GPL(vfio_device_in_container);
 
 static int vfio_device_fops_release(struct inode *inode, struct file *filep)
 {
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 9448b751b663..fd0629acb948 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -81,6 +81,7 @@  enum vfio_iommu_notify_type {
 
 extern int vfio_register_device(struct vfio_device *device, u32 flags);
 extern void vfio_unregister_device(struct vfio_device *device);
+extern bool vfio_device_in_container(struct vfio_device *device);
 
 /**
  * struct vfio_iommu_driver_ops - VFIO IOMMU driver callbacks