Message ID | 20250107142719.179636-9-yilun.xu@linux.intel.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | Private MMIO support for private assigned dev | expand |
On Tue, Jan 07, 2025 at 10:27:15PM +0800, Xu Yilun wrote: > Add a flag for ioctl(VFIO_DEVICE_BIND_IOMMUFD) to mark a device as > for private assignment. For these private assigned devices, disallow > host accessing their MMIO resources. Why? Shouldn't the VMM simply not call mmap? Why does the kernel have to enforce this? Jason
On Wed, Jan 08, 2025 at 09:30:26AM -0400, Jason Gunthorpe wrote: > On Tue, Jan 07, 2025 at 10:27:15PM +0800, Xu Yilun wrote: > > Add a flag for ioctl(VFIO_DEVICE_BIND_IOMMUFD) to mark a device as > > for private assignment. For these private assigned devices, disallow > > host accessing their MMIO resources. > > Why? Shouldn't the VMM simply not call mmap? Why does the kernel have > to enforce this? MM.. maybe I should not say 'host', instead 'userspace'. I think the kernel part VMM (KVM) has the responsibility to enforce the correct behavior of the userspace part VMM (QEMU). QEMU has no way to touch private memory/MMIO intentionally or accidently. IIUC that's one of the initiative guest_memfd is introduced for private memory. Private MMIO follows. Thanks, Yilun > > Jason
On Thu, Jan 09, 2025 at 12:57:58AM +0800, Xu Yilun wrote: > On Wed, Jan 08, 2025 at 09:30:26AM -0400, Jason Gunthorpe wrote: > > On Tue, Jan 07, 2025 at 10:27:15PM +0800, Xu Yilun wrote: > > > Add a flag for ioctl(VFIO_DEVICE_BIND_IOMMUFD) to mark a device as > > > for private assignment. For these private assigned devices, disallow > > > host accessing their MMIO resources. > > > > Why? Shouldn't the VMM simply not call mmap? Why does the kernel have > > to enforce this? > > MM.. maybe I should not say 'host', instead 'userspace'. > > I think the kernel part VMM (KVM) has the responsibility to enforce the > correct behavior of the userspace part VMM (QEMU). QEMU has no way to > touch private memory/MMIO intentionally or accidently. IIUC that's one > of the initiative guest_memfd is introduced for private memory. Private > MMIO follows. Okay, but then why is it a flag like that? I'm expecting a much broader system here to make the VFIO device into a confidential device (like setup the TDI) where we'd have to enforce the private things, communicate with some secure world to assign it, and so on. I want to see a fuller solution to the CC problem in VFIO before we can be sure what is the correct UAPI. In other words, make the VFIO device into a CC device should also prevent mmaping it and so on. So, I would take this out and defer VFIO enforcment to a series which does fuller CC enablement of VFIO. The precursor work should just be avoiding requiring a VMA when installing VFIO MMIO into the KVM and IOMMU stage 2 mappings. Ie by using a FD to get the CPU pfns into iommufd and kvm as you are showing. This works just fine for non-CC devices anyhow and is the necessary building block for making a TDI interface in VFIO. Jason
diff --git a/drivers/vfio/device_cdev.c b/drivers/vfio/device_cdev.c index bb1817bd4ff3..919285c1cd7a 100644 --- a/drivers/vfio/device_cdev.c +++ b/drivers/vfio/device_cdev.c @@ -75,7 +75,10 @@ long vfio_df_ioctl_bind_iommufd(struct vfio_device_file *df, if (copy_from_user(&bind, arg, minsz)) return -EFAULT; - if (bind.argsz < minsz || bind.flags || bind.iommufd < 0) + if (bind.argsz < minsz || bind.iommufd < 0) + return -EINVAL; + + if (bind.flags & ~(VFIO_DEVICE_BIND_IOMMUFD_PRIVATE)) return -EINVAL; /* BIND_IOMMUFD only allowed for cdev fds */ @@ -118,6 +121,9 @@ long vfio_df_ioctl_bind_iommufd(struct vfio_device_file *df, goto out_close_device; device->cdev_opened = true; + if (bind.flags & VFIO_DEVICE_BIND_IOMMUFD_PRIVATE) + device->is_private = true; + /* * Paired with smp_load_acquire() in vfio_device_fops::ioctl/ * read/write/mmap @@ -151,6 +157,7 @@ void vfio_df_unbind_iommufd(struct vfio_device_file *df) return; mutex_lock(&device->dev_set->lock); + device->is_private = false; vfio_df_close(df); vfio_device_put_kvm(device); iommufd_ctx_put(df->iommufd); diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c index f69eda5956ad..11c735dfe1f7 100644 --- a/drivers/vfio/pci/vfio_pci_core.c +++ b/drivers/vfio/pci/vfio_pci_core.c @@ -1005,6 +1005,12 @@ static int vfio_pci_ioctl_get_info(struct vfio_pci_core_device *vdev, return copy_to_user(arg, &info, minsz) ? -EFAULT : 0; } +bool is_vfio_pci_bar_private(struct vfio_pci_core_device *vdev, int bar) +{ + /* Any mmap supported bar can be used as vfio dmabuf */ + return vdev->bar_mmap_supported[bar] && vdev->vdev.is_private; +} + static int vfio_pci_ioctl_get_region_info(struct vfio_pci_core_device *vdev, struct vfio_region_info __user *arg) { @@ -1035,6 +1041,11 @@ static int vfio_pci_ioctl_get_region_info(struct vfio_pci_core_device *vdev, break; } + if (is_vfio_pci_bar_private(vdev, info.index)) { + info.flags = VFIO_REGION_INFO_FLAG_PRIVATE; + break; + } + info.flags = VFIO_REGION_INFO_FLAG_READ | VFIO_REGION_INFO_FLAG_WRITE; if (vdev->bar_mmap_supported[info.index]) { @@ -1735,6 +1746,9 @@ int vfio_pci_core_mmap(struct vfio_device *core_vdev, struct vm_area_struct *vma u64 phys_len, req_len, pgoff, req_start; int ret; + if (vdev->vdev.is_private) + return -EINVAL; + index = vma->vm_pgoff >> (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT); if (index >= VFIO_PCI_NUM_REGIONS + vdev->num_regions) diff --git a/drivers/vfio/pci/vfio_pci_priv.h b/drivers/vfio/pci/vfio_pci_priv.h index d27f383f3931..2b61e35145fd 100644 --- a/drivers/vfio/pci/vfio_pci_priv.h +++ b/drivers/vfio/pci/vfio_pci_priv.h @@ -126,4 +126,6 @@ static inline void vfio_pci_dma_buf_move(struct vfio_pci_core_device *vdev, } #endif +bool is_vfio_pci_bar_private(struct vfio_pci_core_device *vdev, int bar); + #endif diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c b/drivers/vfio/pci/vfio_pci_rdwr.c index 66b72c289284..e385f7f63414 100644 --- a/drivers/vfio/pci/vfio_pci_rdwr.c +++ b/drivers/vfio/pci/vfio_pci_rdwr.c @@ -242,6 +242,9 @@ ssize_t vfio_pci_bar_rw(struct vfio_pci_core_device *vdev, char __user *buf, struct resource *res = &vdev->pdev->resource[bar]; ssize_t done; + if (is_vfio_pci_bar_private(vdev, bar)) + return -EINVAL; + if (pci_resource_start(pdev, bar)) end = pci_resource_len(pdev, bar); else if (bar == PCI_ROM_RESOURCE && diff --git a/include/linux/vfio.h b/include/linux/vfio.h index 2258b0585330..e99d856c6cd8 100644 --- a/include/linux/vfio.h +++ b/include/linux/vfio.h @@ -69,6 +69,7 @@ struct vfio_device { struct iommufd_device *iommufd_device; u8 iommufd_attached:1; #endif + u8 is_private:1; u8 cdev_opened:1; #ifdef CONFIG_DEBUG_FS /* diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h index f43dfbde7352..6a1c703e3185 100644 --- a/include/uapi/linux/vfio.h +++ b/include/uapi/linux/vfio.h @@ -275,6 +275,7 @@ struct vfio_region_info { #define VFIO_REGION_INFO_FLAG_WRITE (1 << 1) /* Region supports write */ #define VFIO_REGION_INFO_FLAG_MMAP (1 << 2) /* Region supports mmap */ #define VFIO_REGION_INFO_FLAG_CAPS (1 << 3) /* Info supports caps */ +#define VFIO_REGION_INFO_FLAG_PRIVATE (1 << 4) /* Region supports private MMIO */ __u32 index; /* Region index */ __u32 cap_offset; /* Offset within info struct of first cap */ __aligned_u64 size; /* Region size (bytes) */ @@ -904,7 +905,8 @@ struct vfio_device_feature { * VFIO_DEVICE_BIND_IOMMUFD - _IOR(VFIO_TYPE, VFIO_BASE + 18, * struct vfio_device_bind_iommufd) * @argsz: User filled size of this data. - * @flags: Must be 0. + * @flags: Optional device initialization flags: + * VFIO_DEVICE_BIND_IOMMUFD_PRIVATE: for private assignment * @iommufd: iommufd to bind. * @out_devid: The device id generated by this bind. devid is a handle for * this device/iommufd bond and can be used in IOMMUFD commands. @@ -921,6 +923,7 @@ struct vfio_device_feature { struct vfio_device_bind_iommufd { __u32 argsz; __u32 flags; +#define VFIO_DEVICE_BIND_IOMMUFD_PRIVATE (1 << 0) __s32 iommufd; __u32 out_devid; };
Add a flag for ioctl(VFIO_DEVICE_BIND_IOMMUFD) to mark a device as for private assignment. For these private assigned devices, disallow host accessing their MMIO resources. Since the MMIO regions for private assignment are not accessible from host, remove the VFIO_REGION_INFO_FLAG_MMAP/READ/WRITE for these regions, instead add a new VFIO_REGION_INFO_FLAG_PRIVATE flag to indicate users should create dma-buf for MMIO mapping in KVM MMU. Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com> --- drivers/vfio/device_cdev.c | 9 ++++++++- drivers/vfio/pci/vfio_pci_core.c | 14 ++++++++++++++ drivers/vfio/pci/vfio_pci_priv.h | 2 ++ drivers/vfio/pci/vfio_pci_rdwr.c | 3 +++ include/linux/vfio.h | 1 + include/uapi/linux/vfio.h | 5 ++++- 6 files changed, 32 insertions(+), 2 deletions(-)