[RFC,00/12] IOMMUFD Generic interface

Message ID 0-v1-e79cd8d168e8+6-iommufd_jgg@nvidia.com (mailing list archive)
Jason Gunthorpe March 18, 2022, 5:27 p.m. UTC
iommufd is the user API to control the IOMMU subsystem as it relates to
managing IO page tables that point at user space memory.

It takes over from drivers/vfio/vfio_iommu_type1.c (aka the VFIO
container) which is the VFIO specific interface for a similar idea.

We see a broad need for extended features, some being highly IOMMU device
specific:
 - Binding iommu_domain's to PASID/SSID
 - Userspace page tables, for ARM, x86 and S390
 - Kernel bypass'd invalidation of user page tables
 - Re-use of the KVM page table in the IOMMU
 - Dirty page tracking in the IOMMU
 - Runtime Increase/Decrease of IOPTE size
 - PRI support with faults resolved in userspace

There is also a need to access these features beyond just VFIO: VDPA for
instance, but other classes of accelerator HW are touching on these areas
now too.

The v1 series proposed re-using the VFIO type 1 data structure, however it
was suggested that if we are doing this big update then we should also
come with a data structure that solves the limitations that VFIO type1
has. Notably this addresses:

 - Multiple IOAS/'containers' and multiple domains inside a single FD

 - Single-pin operation no matter how many domains and containers use
   a page

 - A fine grained locking scheme supporting user managed concurrency for
   multi-threaded map/unmap

 - A pre-registration mechanism to optimize vIOMMU use cases by
   pre-pinning pages

 - Extended ioctl API that can manage these new objects and exposes
   domains directly to user space

 - domains are sharable between subsystems, eg VFIO and VDPA
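
To illustrate the "single-pin" point in the list above, here is a minimal
sketch of the general idea: a page is pinned once when its first user
appears and unpinned when the last user goes away, however many domains or
containers reference it. The names toy_page, toy_page_get and toy_page_put
are hypothetical and are not taken from the series; the real code tracks
this per range of pages, not per page.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model, not the iommufd data structure. */
struct toy_page {
	unsigned int users;     /* domains/containers currently using the page */
	unsigned int pin_calls; /* how many times the page was actually pinned */
	int pinned;             /* 1 while the page is pinned */
};

/* Add one user; pin only on the 0 -> 1 transition. */
static void toy_page_get(struct toy_page *p)
{
	assert(p != NULL);
	if (p->users++ == 0) {
		p->pinned = 1;
		p->pin_calls++;
	}
}

/* Drop one user; unpin only when the last user goes away. */
static void toy_page_put(struct toy_page *p)
{
	assert(p != NULL && p->users > 0);
	if (--p->users == 0)
		p->pinned = 0;
}
```

With three users attached, pin_calls stays at 1, and the page remains
pinned until the third toy_page_put().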

The bulk of this code is a new data structure design to track how the
IOVAs are mapped to PFNs.
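
As a rough picture of what "IOVA to PFN" tracking means, the sketch below
looks up an IOVA in a sorted array of non-overlapping areas. It is an
assumption-laden toy: the series uses interval trees and a more elaborate
PFN storage scheme, and the names toy_area and toy_iova_to_pfn, the 4K page
size and the physically contiguous backing are all made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12 /* assumed 4K pages */

/* Hypothetical: one mapped IOVA range backed by contiguous PFNs. */
struct toy_area {
	uint64_t iova_start; /* first IOVA of the area, page aligned */
	uint64_t iova_last;  /* last IOVA of the area, inclusive */
	uint64_t first_pfn;  /* PFN backing iova_start */
};

/*
 * Binary search over areas sorted by iova_start with no overlaps.
 * Returns 0 and stores the PFN in *pfn if iova is mapped, -1 otherwise.
 */
static int toy_iova_to_pfn(const struct toy_area *areas, size_t n,
			   uint64_t iova, uint64_t *pfn)
{
	size_t lo = 0, hi = n;

	assert(pfn != NULL);
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		const struct toy_area *a = &areas[mid];

		if (iova < a->iova_start) {
			hi = mid;
		} else if (iova > a->iova_last) {
			lo = mid + 1;
		} else {
			*pfn = a->first_pfn +
			       ((iova - a->iova_start) >> TOY_PAGE_SHIFT);
			return 0;
		}
	}
	return -1;
}
```

An inclusive iova_last is used here so that an area ending at the top of
the address space can be described without overflow.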

iommufd intends to be general and consumable by any driver that wants to
DMA to userspace. From a driver perspective it can largely be dropped in
in place of iommu_attach_device() and provides a uniform full feature set
to all consumers.

As this is a larger project this series is the first step. This series
provides the iommufd "generic interface" which is designed to be suitable
for applications like DPDK and VMM flows that are not optimized to
specific HW scenarios. It is close to being a drop-in replacement for the
existing VFIO type 1.

This is part two of three for an initial sequence:
 - Move IOMMU Group security into the iommu layer
   https://lore.kernel.org/linux-iommu/20220218005521.172832-1-baolu.lu@linux.intel.com/
 * Generic IOMMUFD implementation
 - VFIO ability to consume IOMMUFD
   An early exploration of this is available here:
    https://github.com/luxis1999/iommufd/commits/iommufd-v5.17-rc6

Various parts of the above extended features are in WIP stages currently
to define how their IOCTL interface should work.

At this point, using the draft VFIO series, unmodified qemu has been
tested to operate using iommufd on x86 and ARM systems.

Several people have contributed directly to this work: Eric Auger, Kevin
Tian, Lu Baolu, Nicolin Chen, Yi L Liu. Many more have participated in the
discussions that led here, and provided ideas. Thanks to all!

This is on github: https://github.com/jgunthorpe/linux/commits/iommufd

# S390 in-kernel page table walker
Cc: Niklas Schnelle <schnelle@linux.ibm.com>
Cc: Matthew Rosato <mjrosato@linux.ibm.com>
# AMD Dirty page tracking
Cc: Joao Martins <joao.m.martins@oracle.com>
# ARM SMMU Dirty page tracking
Cc: Keqian Zhu <zhukeqian1@huawei.com>
Cc: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
# ARM SMMU nesting
Cc: Eric Auger <eric.auger@redhat.com>
Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
# Map/unmap performance
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
# VDPA
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
# Power
Cc: David Gibson <david@gibson.dropbear.id.au>
# vfio
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Cornelia Huck <cohuck@redhat.com>
Cc: kvm@vger.kernel.org
# iommu
Cc: iommu@lists.linux-foundation.org
# Collaborators
Cc: "Chaitanya Kulkarni" <chaitanyak@nvidia.com>
Cc: Nicolin Chen <nicolinc@nvidia.com>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

Jason Gunthorpe (11):
  interval-tree: Add a utility to iterate over spans in an interval tree
  iommufd: File descriptor, context, kconfig and makefiles
  kernel/user: Allow user::locked_vm to be usable for iommufd
  iommufd: PFN handling for iopt_pages
  iommufd: Algorithms for PFN storage
  iommufd: Data structure to provide IOVA to PFN mapping
  iommufd: IOCTLs for the io_pagetable
  iommufd: Add a HW pagetable object
  iommufd: Add kAPI toward external drivers
  iommufd: vfio container FD ioctl compatibility
  iommufd: Add a selftest

Kevin Tian (1):
  iommufd: Overview documentation

 Documentation/userspace-api/index.rst         |    1 +
 .../userspace-api/ioctl/ioctl-number.rst      |    1 +
 Documentation/userspace-api/iommufd.rst       |  224 +++
 MAINTAINERS                                   |   10 +
 drivers/iommu/Kconfig                         |    1 +
 drivers/iommu/Makefile                        |    2 +-
 drivers/iommu/iommufd/Kconfig                 |   22 +
 drivers/iommu/iommufd/Makefile                |   13 +
 drivers/iommu/iommufd/device.c                |  274 ++++
 drivers/iommu/iommufd/hw_pagetable.c          |  142 ++
 drivers/iommu/iommufd/io_pagetable.c          |  890 +++++++++++
 drivers/iommu/iommufd/io_pagetable.h          |  170 +++
 drivers/iommu/iommufd/ioas.c                  |  252 ++++
 drivers/iommu/iommufd/iommufd_private.h       |  231 +++
 drivers/iommu/iommufd/iommufd_test.h          |   65 +
 drivers/iommu/iommufd/main.c                  |  346 +++++
 drivers/iommu/iommufd/pages.c                 | 1321 +++++++++++++++++
 drivers/iommu/iommufd/selftest.c              |  495 ++++++
 drivers/iommu/iommufd/vfio_compat.c           |  401 +++++
 include/linux/interval_tree.h                 |   41 +
 include/linux/iommufd.h                       |   50 +
 include/linux/sched/user.h                    |    2 +-
 include/uapi/linux/iommufd.h                  |  223 +++
 kernel/user.c                                 |    1 +
 lib/interval_tree.c                           |   98 ++
 tools/testing/selftests/Makefile              |    1 +
 tools/testing/selftests/iommu/.gitignore      |    2 +
 tools/testing/selftests/iommu/Makefile        |   11 +
 tools/testing/selftests/iommu/config          |    2 +
 tools/testing/selftests/iommu/iommufd.c       | 1225 +++++++++++++++
 30 files changed, 6515 insertions(+), 2 deletions(-)
 create mode 100644 Documentation/userspace-api/iommufd.rst
 create mode 100644 drivers/iommu/iommufd/Kconfig
 create mode 100644 drivers/iommu/iommufd/Makefile
 create mode 100644 drivers/iommu/iommufd/device.c
 create mode 100644 drivers/iommu/iommufd/hw_pagetable.c
 create mode 100644 drivers/iommu/iommufd/io_pagetable.c
 create mode 100644 drivers/iommu/iommufd/io_pagetable.h
 create mode 100644 drivers/iommu/iommufd/ioas.c
 create mode 100644 drivers/iommu/iommufd/iommufd_private.h
 create mode 100644 drivers/iommu/iommufd/iommufd_test.h
 create mode 100644 drivers/iommu/iommufd/main.c
 create mode 100644 drivers/iommu/iommufd/pages.c
 create mode 100644 drivers/iommu/iommufd/selftest.c
 create mode 100644 drivers/iommu/iommufd/vfio_compat.c
 create mode 100644 include/linux/iommufd.h
 create mode 100644 include/uapi/linux/iommufd.h
 create mode 100644 tools/testing/selftests/iommu/.gitignore
 create mode 100644 tools/testing/selftests/iommu/Makefile
 create mode 100644 tools/testing/selftests/iommu/config
 create mode 100644 tools/testing/selftests/iommu/iommufd.c


base-commit: d1c716ed82a6bf4c35ba7be3741b9362e84cd722

Comments

Eric Auger April 12, 2022, 8:13 p.m. UTC | #1
Hi,

On 3/18/22 6:27 PM, Jason Gunthorpe wrote:
> iommufd is the user API to control the IOMMU subsystem as it relates to
> managing IO page tables that point at user space memory.
>
> It takes over from drivers/vfio/vfio_iommu_type1.c (aka the VFIO
> container) which is the VFIO specific interface for a similar idea.
>
> We see a broad need for extended features, some being highly IOMMU device
> specific:
>  - Binding iommu_domain's to PASID/SSID
>  - Userspace page tables, for ARM, x86 and S390
>  - Kernel bypass'd invalidation of user page tables
>  - Re-use of the KVM page table in the IOMMU
>  - Dirty page tracking in the IOMMU
>  - Runtime Increase/Decrease of IOPTE size
>  - PRI support with faults resolved in userspace

This series does not have any concept of group fds anymore and the API
is device oriented.
I have a question wrt pci bus reset capability.

8b27ee60bfd6 ("vfio-pci: PCI hot reset interface")
introduced VFIO_DEVICE_GET_PCI_HOT_RESET_INFO and VFIO_DEVICE_PCI_HOT_RESET.

Maybe we can reuse VFIO_DEVICE_GET_PCI_HOT_RESET_INFO to retrieve the
devices and iommu groups that need to be checked and involved in the bus
reset. If I understand correctly, we now need to make sure the devices are
handled in the same security context (bound to the same iommufd).

However, VFIO_DEVICE_PCI_HOT_RESET operates on a collection of group fds.
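
For readers less familiar with that interface, the sketch below shows
roughly how userspace builds the variable-length argument carrying the
group fds. struct toy_hot_reset is a local stand-in that, to the best of my
recollection, mirrors the layout of struct vfio_pci_hot_reset in
include/uapi/linux/vfio.h (argsz, flags, count, then the fd array); the
helper toy_hot_reset_alloc is hypothetical, and no ioctl is issued here.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Local stand-in for the uapi argument; layout assumed, see above. */
struct toy_hot_reset {
	uint32_t argsz;       /* total size of the argument in bytes */
	uint32_t flags;       /* left as zero in this sketch */
	uint32_t count;       /* number of entries in group_fds[] */
	int32_t group_fds[];  /* one fd per affected group */
};

/* Allocate and fill the argument; the caller frees it with free(). */
static struct toy_hot_reset *toy_hot_reset_alloc(const int32_t *fds,
						 uint32_t count)
{
	size_t sz = sizeof(struct toy_hot_reset) +
		    (size_t)count * sizeof(int32_t);
	struct toy_hot_reset *r;

	assert(fds != NULL || count == 0);
	r = calloc(1, sz);
	if (!r)
		return NULL;
	r->argsz = (uint32_t)sz;
	r->count = count;
	if (count)
		memcpy(r->group_fds, fds, (size_t)count * sizeof(int32_t));
	return r;
}
```

The point relevant to this thread is the last member: the argument is
defined in terms of group fds, which is exactly what a device-oriented API
no longer hands out.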

How do you see the porting of this functionality onto /dev/iommu?

Thanks

Eric




Jason Gunthorpe April 12, 2022, 8:22 p.m. UTC | #2
On Tue, Apr 12, 2022 at 10:13:32PM +0200, Eric Auger wrote:
> Hi,
> 
> On 3/18/22 6:27 PM, Jason Gunthorpe wrote:
> > iommufd is the user API to control the IOMMU subsystem as it relates to
> > managing IO page tables that point at user space memory.
> >
> > It takes over from drivers/vfio/vfio_iommu_type1.c (aka the VFIO
> > container) which is the VFIO specific interface for a similar idea.
> >
> > We see a broad need for extended features, some being highly IOMMU device
> > specific:
> >  - Binding iommu_domain's to PASID/SSID
> >  - Userspace page tables, for ARM, x86 and S390
> >  - Kernel bypass'd invalidation of user page tables
> >  - Re-use of the KVM page table in the IOMMU
> >  - Dirty page tracking in the IOMMU
> >  - Runtime Increase/Decrease of IOPTE size
> >  - PRI support with faults resolved in userspace
> 
> This series does not have any concept of group fds anymore and the API
> is device oriented.
> I have a question wrt pci bus reset capability.
> 
> 8b27ee60bfd6 ("vfio-pci: PCI hot reset interface")
> introduced VFIO_DEVICE_GET_PCI_HOT_RESET_INFO and VFIO_DEVICE_PCI_HOT_RESET.
> 
> Maybe we can reuse VFIO_DEVICE_GET_PCI_HOT_RESET_INFO to retrieve the
> devices and iommu groups that need to be checked and involved in the bus
> reset. If I understand correctly, we now need to make sure the devices are
> handled in the same security context (bound to the same iommufd).
> 
> However, VFIO_DEVICE_PCI_HOT_RESET operates on a collection of group fds.
> 
> How do you see the porting of this functionality onto /dev/iommu?

I already made a patch that converts VFIO_DEVICE_PCI_HOT_RESET to work
on a generic notion of a file and the underlying infrastructure to
allow it to accept either a device or group fd.

Same for the similar issue in KVM.

It is part of three VFIO series I will be posting. First is up here:

https://lore.kernel.org/kvm/0-v1-a8faf768d202+125dd-vfio_mdev_no_group_jgg@nvidia.com/

Overall the strategy is to contain the vfio_group as an internal detail
of vfio.ko and external interfaces use either a struct vfio_device *
or a struct file *

Jason
Eric Auger April 12, 2022, 8:50 p.m. UTC | #3
Hi Jason,

On 4/12/22 10:22 PM, Jason Gunthorpe wrote:
> On Tue, Apr 12, 2022 at 10:13:32PM +0200, Eric Auger wrote:
>> Hi,
>>
>> On 3/18/22 6:27 PM, Jason Gunthorpe wrote:
>>> iommufd is the user API to control the IOMMU subsystem as it relates to
>>> managing IO page tables that point at user space memory.
>>>
>>> It takes over from drivers/vfio/vfio_iommu_type1.c (aka the VFIO
>>> container) which is the VFIO specific interface for a similar idea.
>>>
>>> We see a broad need for extended features, some being highly IOMMU device
>>> specific:
>>>  - Binding iommu_domain's to PASID/SSID
>>>  - Userspace page tables, for ARM, x86 and S390
>>>  - Kernel bypass'd invalidation of user page tables
>>>  - Re-use of the KVM page table in the IOMMU
>>>  - Dirty page tracking in the IOMMU
>>>  - Runtime Increase/Decrease of IOPTE size
>>>  - PRI support with faults resolved in userspace
>> This series does not have any concept of group fds anymore and the API
>> is device oriented.
>> I have a question wrt pci bus reset capability.
>>
>> 8b27ee60bfd6 ("vfio-pci: PCI hot reset interface")
>> introduced VFIO_DEVICE_GET_PCI_HOT_RESET_INFO and VFIO_DEVICE_PCI_HOT_RESET.
>>
>> Maybe we can reuse VFIO_DEVICE_GET_PCI_HOT_RESET_INFO to retrieve the
>> devices and iommu groups that need to be checked and involved in the bus
>> reset. If I understand correctly, we now need to make sure the devices are
>> handled in the same security context (bound to the same iommufd).
>>
>> However, VFIO_DEVICE_PCI_HOT_RESET operates on a collection of group fds.
>>
>> How do you see the porting of this functionality onto /dev/iommu?
> I already made a patch that converts VFIO_DEVICE_PCI_HOT_RESET to work
> on a generic notion of a file and the underlying infrastructure to
> allow it to accept either a device or group fd.
>
> Same for the similar issue in KVM.
>
> It is part of three VFIO series I will be posting. First is up here:
>
> https://lore.kernel.org/kvm/0-v1-a8faf768d202+125dd-vfio_mdev_no_group_jgg@nvidia.com/
>
> Overall the strategy is to contain the vfio_group as an internal detail
> of vfio.ko and external interfaces use either a struct vfio_device *
> or a struct file *
Thank you for the quick reply. Yi and I will look at this series. I
guess we won't support the bus reset functionality in our first QEMU
porting onto /dev/iommu until that code stabilizes.

Eric
>
> Jason
>
Yi Liu April 14, 2022, 10:56 a.m. UTC | #4
On 2022/3/19 01:27, Jason Gunthorpe wrote:
> iommufd is the user API to control the IOMMU subsystem as it relates to
> managing IO page tables that point at user space memory.
> 
> It takes over from drivers/vfio/vfio_iommu_type1.c (aka the VFIO
> container) which is the VFIO specific interface for a similar idea.
> 
> We see a broad need for extended features, some being highly IOMMU device
> specific:
>   - Binding iommu_domain's to PASID/SSID
>   - Userspace page tables, for ARM, x86 and S390
>   - Kernel bypass'd invalidation of user page tables
>   - Re-use of the KVM page table in the IOMMU
>   - Dirty page tracking in the IOMMU
>   - Runtime Increase/Decrease of IOPTE size
>   - PRI support with faults resolved in userspace
> 
> As well as a need to access these features beyond just VFIO, VDPA for
> instance, but other classes of accelerator HW are touching on these areas
> now too.
> 
> The v1 series proposed re-using the VFIO type 1 data structure, however it
> was suggested that if we are doing this big update then we should also
> come with a data structure that solves the limitations that VFIO type1
> has. Notably this addresses:
> 
>   - Multiple IOAS/'containers' and multiple domains inside a single FD
> 
>   - Single-pin operation no matter how many domains and containers use
>     a page
> 
>   - A fine grained locking scheme supporting user managed concurrency for
>     multi-threaded map/unmap
> 
>   - A pre-registration mechanism to optimize vIOMMU use cases by
>     pre-pinning pages
> 
>   - Extended ioctl API that can manage these new objects and exposes
>     domains directly to user space
> 
>   - domains are sharable between subsystems, eg VFIO and VDPA
> 
> The bulk of this code is a new data structure design to track how the
> IOVAs are mapped to PFNs.
> 
> iommufd intends to be general and consumable by any driver that wants to
> DMA to userspace. From a driver perspective it can largely be dropped in
> in-place of iommu_attach_device() and provides a uniform full feature set
> to all consumers.
> 
> As this is a larger project this series is the first step. This series
> provides the iommufd "generic interface" which is designed to be suitable
> for applications like DPDK and VMM flows that are not optimized to
> specific HW scenarios. It is close to being a drop in replacement for the
> existing VFIO type 1.
> 
> This is part two of three for an initial sequence:
>   - Move IOMMU Group security into the iommu layer
>     https://lore.kernel.org/linux-iommu/20220218005521.172832-1-baolu.lu@linux.intel.com/
>   * Generic IOMMUFD implementation
>   - VFIO ability to consume IOMMUFD
>     An early exploration of this is available here:
>      https://github.com/luxis1999/iommufd/commits/iommufd-v5.17-rc6

Eric Auger and I have posted a QEMU RFC based on this branch.

https://lore.kernel.org/kvm/20220414104710.28534-1-yi.l.liu@intel.com/