
[v2,00/19] iommufd: Add VIOMMU infrastructure (Part-1)

Message ID cover.1724776335.git.nicolinc@nvidia.com
Series iommufd: Add VIOMMU infrastructure (Part-1)

Message

Nicolin Chen Aug. 27, 2024, 4:59 p.m. UTC
This series introduces a new VIOMMU infrastructure and related ioctls.

IOMMUFD has been using the HWPT infrastructure for all cases, including
nested IO page table support. Yet, there are limitations to an HWPT-based
structure when it comes to supporting some advanced HW-accelerated features,
such as CMDQV on NVIDIA Grace and HW-accelerated vIOMMU on AMD. Even in a
multi-IOMMU environment, it is not straightforward, with the HWPT
infrastructure alone, for nested HWPTs to share the same parent HWPT
(stage-2 IO pagetable).

The new VIOMMU object is an additional layer, between the nested HWPT and
its parent HWPT, giving both the IOMMUFD core and an IOMMU driver an
additional structure to support HW-accelerated features:
                     ----------------------------
 ----------------    |         |  paging_hwpt0  |
 | hwpt_nested0 |--->| viommu0 ------------------
 ----------------    |         | HW-accel feats |
                     ----------------------------

On a multi-IOMMU system, the VIOMMU object can be instantiated once per
vIOMMU in a guest VM, with each instance holding the same parent HWPT to
share the stage-2 IO pagetable. Each VIOMMU then only needs to allocate its
own VMID to attach the shared stage-2 IO pagetable to the physical IOMMU:
                     ----------------------------
 ----------------    |         |  paging_hwpt0  |
 | hwpt_nested0 |--->| viommu0 ------------------
 ----------------    |         |     VMID0      |
                     ----------------------------
                     ----------------------------
 ----------------    |         |  paging_hwpt0  |
 | hwpt_nested1 |--->| viommu1 ------------------
 ----------------    |         |     VMID1      |
                     ----------------------------

As an initial part-1, add ioctls to support VIOMMU-based invalidation (a
rough userspace usage sketch follows the list below):
    IOMMUFD_CMD_VIOMMU_ALLOC to allocate a VIOMMU object
    IOMMUFD_CMD_VIOMMU_SET/UNSET_VDEV_ID to set/clear a device's virtual ID
    (Reuse IOMMUFD_CMD_HWPT_INVALIDATE for a VIOMMU object to flush caches
     using driver-specific invalidation data)
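
For clarity, here is a hedged sketch of how userspace might drive the new
part-1 uAPI. The ioctl names and IOMMU_VIOMMU_TYPE_DEFAULT come from this
series; the struct and field names below are assumed from iommufd's usual
uAPI conventions and may differ from the final header. It presumes an
iommufd fd, a bound device ID, and a nesting-parent paging HWPT already
exist.

#include <sys/ioctl.h>
#include <linux/iommufd.h>

static int setup_viommu(int fd, __u32 dev_id, __u32 paging_hwpt_id,
			__u64 virt_dev_id, __u32 *out_viommu_id)
{
	/* 1. Allocate a core-managed vIOMMU on top of the nesting parent */
	struct iommu_viommu_alloc valloc = {
		.size = sizeof(valloc),
		.type = IOMMU_VIOMMU_TYPE_DEFAULT,
		.dev_id = dev_id,		/* selects the physical IOMMU */
		.hwpt_id = paging_hwpt_id,	/* shared stage-2 parent HWPT */
	};

	if (ioctl(fd, IOMMU_VIOMMU_ALLOC, &valloc))
		return -1;
	*out_viommu_id = valloc.out_viommu_id;

	/* 2. Record the device's virtual ID in the VM (e.g. a vSID on SMMUv3) */
	struct iommu_viommu_set_vdev_id set_vdev = {
		.size = sizeof(set_vdev),
		.viommu_id = *out_viommu_id,
		.dev_id = dev_id,
		.vdev_id = virt_dev_id,
	};

	return ioctl(fd, IOMMU_VIOMMU_SET_VDEV_ID, &set_vdev);
}

/*
 * 3. Cache invalidation reuses IOMMU_HWPT_INVALIDATE: userspace passes the
 *    viommu_id through the hwpt_id field and hands the guest's invalidation
 *    commands (which may carry virtual device IDs, e.g. for ATC invalidation)
 *    to the kernel as driver-specific data entries.
 */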

It is worth noting that the VDEV_ID backs a per-VIOMMU device list that
drivers use to look up a device's physical instance from its virtual ID in
a VM. This is essential for a VIOMMU-based invalidation where the request
carries a device's virtual ID for its device cache flush, e.g. an ATC
invalidation.

As for the implementation of the series, add an IOMMU_VIOMMU_TYPE_DEFAULT
type for a core-allocated and core-managed VIOMMU object, allowing drivers
to simply hook up a default set of viommu ops for viommu-based invalidation
alone. Also provide some viommu helpers to drivers for VDEV_ID translation
and parent domain lookup. Add VIOMMU invalidation support to the ARM SMMUv3
driver for a real-world use case. This adds support for arm-smmu-v3's
CMDQ_OP_ATC_INV and CMDQ_OP_CFGI_CD/ALL commands, supplementing HWPT-based
invalidations.
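
On the driver side, the flow described above roughly looks like the
following schematic (not the actual arm-smmu-v3 patch):
iommufd_viommu_to_parent_domain() is the helper named by this series, while
viommu_find_dev() stands in for the series' vdev_id lookup helper, whose
exact name and signature are not spelled out in this cover letter.

#include <linux/iommu.h>
#include <linux/iommufd.h>

static int sample_viommu_cache_invalidate(struct iommufd_viommu *viommu,
					  struct iommu_user_data_array *array)
{
	struct iommu_domain *s2_parent;
	struct device *dev;
	u64 vdev_id = 0;	/* decoded from a guest command in 'array' */

	/* The shared stage-2 parent domain held by this vIOMMU */
	s2_parent = iommufd_viommu_to_parent_domain(viommu);
	if (!s2_parent)
		return -EINVAL;

	/*
	 * For a device-cache command (e.g. an ATC invalidation carrying a
	 * virtual SID), translate the guest's virtual device ID into the
	 * physical device through the per-viommu vdev_id list.
	 */
	dev = viommu_find_dev(viommu, vdev_id);		/* assumed helper */
	if (!dev)
		return -EINVAL;

	/* ... build and submit CMDQ_OP_ATC_INV / CMDQ_OP_CFGI_CD for 'dev' ... */
	return 0;
}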

In the future, drivers will also be able to choose a driver-managed type to
hold their own structures, by adding a new type to enum iommu_viommu_type.
More VIOMMU-based structures and ioctls will be introduced in part-2/3 to
support a driver-managed VIOMMU, e.g. a VQUEUE object for a HW-accelerated
queue and a VIRQ (or VEVENT) object for IRQ injection. The VIOMMU object
was repurposed from an earlier RFC discussion; for reference:
https://lore.kernel.org/all/cover.1712978212.git.nicolinc@nvidia.com/

This series is on Github:
https://github.com/nicolinc/iommufd/commits/iommufd_viommu_p1-v2
Pairing QEMU branch for testing:
https://github.com/nicolinc/qemu/commits/wip/for_iommufd_viommu_p1-v2

Changelog
v2
 * Limited vdev_id to one per idev
 * Added a rw_sem to protect the vdev_id list
 * Reworked driver-level APIs with proper lockings
 * Added a new viommu_api file for IOMMUFD_DRIVER config
 * Dropped useless iommu_dev pointer from the viommu structure
 * Added missing index numbers to new types in the uAPI header
 * Dropped IOMMU_VIOMMU_INVALIDATE uAPI; Instead, reuse the HWPT one
 * Reworked mock_viommu_cache_invalidate() using the new iommu helper
 * Reordered details of set/unset_vdev_id handlers for proper lockings
 * Added arm_smmu_cache_invalidate_user patch from Jason's nesting series
v1
 https://lore.kernel.org/all/cover.1723061377.git.nicolinc@nvidia.com/

Thanks!
Nicolin

Jason Gunthorpe (3):
  iommu: Add iommu_copy_struct_from_full_user_array helper
  iommu/arm-smmu-v3: Allow ATS for IOMMU_DOMAIN_NESTED
  iommu/arm-smmu-v3: Update comments about ATS and bypass

Nicolin Chen (16):
  iommufd: Reorder struct forward declarations
  iommufd/viommu: Add IOMMUFD_OBJ_VIOMMU and IOMMU_VIOMMU_ALLOC ioctl
  iommu: Pass in a viommu pointer to domain_alloc_user op
  iommufd: Allow pt_id to carry viommu_id for IOMMU_HWPT_ALLOC
  iommufd/selftest: Add IOMMU_VIOMMU_ALLOC test coverage
  iommufd/viommu: Add IOMMU_VIOMMU_SET/UNSET_VDEV_ID ioctl
  iommufd/selftest: Add IOMMU_VIOMMU_SET/UNSET_VDEV_ID test coverage
  iommufd/viommu: Add cache_invalidate for IOMMU_VIOMMU_TYPE_DEFAULT
  iommufd: Allow hwpt_id to carry viommu_id for IOMMU_HWPT_INVALIDATE
  iommufd/viommu: Add vdev_id helpers for IOMMU drivers
  iommufd/selftest: Add mock_viommu_invalidate_user op
  iommufd/selftest: Add IOMMU_TEST_OP_DEV_CHECK_CACHE test command
  iommufd/selftest: Add VIOMMU coverage for IOMMU_HWPT_INVALIDATE ioctl
  iommufd/viommu: Add iommufd_viommu_to_parent_domain helper
  iommu/arm-smmu-v3: Add arm_smmu_cache_invalidate_user
  iommu/arm-smmu-v3: Add arm_smmu_viommu_cache_invalidate

 drivers/iommu/amd/iommu.c                     |   1 +
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   | 218 ++++++++++++++-
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h   |   3 +
 drivers/iommu/intel/iommu.c                   |   1 +
 drivers/iommu/iommufd/Makefile                |   5 +-
 drivers/iommu/iommufd/device.c                |  12 +
 drivers/iommu/iommufd/hw_pagetable.c          |  59 +++-
 drivers/iommu/iommufd/iommufd_private.h       |  37 +++
 drivers/iommu/iommufd/iommufd_test.h          |  30 ++
 drivers/iommu/iommufd/main.c                  |  12 +
 drivers/iommu/iommufd/selftest.c              | 101 ++++++-
 drivers/iommu/iommufd/viommu.c                | 196 +++++++++++++
 drivers/iommu/iommufd/viommu_api.c            |  53 ++++
 include/linux/iommu.h                         |  56 +++-
 include/linux/iommufd.h                       |  51 +++-
 include/uapi/linux/iommufd.h                  | 117 +++++++-
 tools/testing/selftests/iommu/iommufd.c       | 259 +++++++++++++++++-
 tools/testing/selftests/iommu/iommufd_utils.h | 126 +++++++++
 18 files changed, 1299 insertions(+), 38 deletions(-)
 create mode 100644 drivers/iommu/iommufd/viommu.c
 create mode 100644 drivers/iommu/iommufd/viommu_api.c

Comments

Tian, Kevin Sept. 11, 2024, 6:12 a.m. UTC | #1
> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Wednesday, August 28, 2024 1:00 AM
> 
[...]
> On a multi-IOMMU system, the VIOMMU object can be instanced to the
> number
> of vIOMMUs in a guest VM, while holding the same parent HWPT to share
> the

Is there restriction that multiple vIOMMU objects can be only created
on a multi-IOMMU system?

> stage-2 IO pagetable. Each VIOMMU then just need to only allocate its own
> VMID to attach the shared stage-2 IO pagetable to the physical IOMMU:

this reads like 'VMID' is a virtual ID allocated by vIOMMU. But from the
entire context it actually means the physical 'VMID' allocated on the
associated physical IOMMU, correct?
Nicolin Chen Sept. 11, 2024, 7:08 a.m. UTC | #2
On Wed, Sep 11, 2024 at 06:12:21AM +0000, Tian, Kevin wrote:
> > From: Nicolin Chen <nicolinc@nvidia.com>
> > Sent: Wednesday, August 28, 2024 1:00 AM
> >
> [...]
> > On a multi-IOMMU system, the VIOMMU object can be instanced to the
> > number
> > of vIOMMUs in a guest VM, while holding the same parent HWPT to share
> > the
> 
> Is there restriction that multiple vIOMMU objects can be only created
> on a multi-IOMMU system?

I think it should be generally restricted to the number of pIOMMUs,
although likely (not 100% sure) we could do multiple vIOMMUs on a
single-pIOMMU system. Any reason for doing that?

> > stage-2 IO pagetable. Each VIOMMU then just need to only allocate its own
> > VMID to attach the shared stage-2 IO pagetable to the physical IOMMU:
> 
> this reads like 'VMID' is a virtual ID allocated by vIOMMU. But from the
> entire context it actually means the physical 'VMID' allocated on the
> associated physical IOMMU, correct?

Quoting Jason's narratives, a VMID is a "Security namespace for
guest owned ID". The allocation, using SMMU as an example, should
be a part of vIOMMU instance allocation in the host SMMU driver.
Then, this VMID will be used to mark the cache tags. So, it is
still a software allocated ID, while HW would use it too.

Thanks
Nicolin
Tian, Kevin Sept. 11, 2024, 7:18 a.m. UTC | #3
> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Wednesday, September 11, 2024 3:08 PM
> 
> On Wed, Sep 11, 2024 at 06:12:21AM +0000, Tian, Kevin wrote:
> > > From: Nicolin Chen <nicolinc@nvidia.com>
> > > Sent: Wednesday, August 28, 2024 1:00 AM
> > >
> > [...]
> > > On a multi-IOMMU system, the VIOMMU object can be instanced to the
> > > number
> > > of vIOMMUs in a guest VM, while holding the same parent HWPT to
> share
> > > the
> >
> > Is there restriction that multiple vIOMMU objects can be only created
> > on a multi-IOMMU system?
> 
> I think it should be generally restricted to the number of pIOMMUs,
> although likely (not 100% sure) we could do multiple vIOMMUs on a
> single-pIOMMU system. Any reason for doing that?

No idea. But if you stated so then there will be code to enforce it e.g.
failing the attempt to create a vIOMMU object on a pIOMMU to which
another vIOMMU object is already linked?

> 
> > > stage-2 IO pagetable. Each VIOMMU then just need to only allocate its
> own
> > > VMID to attach the shared stage-2 IO pagetable to the physical IOMMU:
> >
> > this reads like 'VMID' is a virtual ID allocated by vIOMMU. But from the
> > entire context it actually means the physical 'VMID' allocated on the
> > associated physical IOMMU, correct?
> 
> Quoting Jason's narratives, a VMID is a "Security namespace for
> guest owned ID". The allocation, using SMMU as an example, should

the VMID alone is not a namespace. It's one ID to tag another namespace.

> be a part of vIOMMU instance allocation in the host SMMU driver.
> Then, this VMID will be used to mark the cache tags. So, it is
> still a software allocated ID, while HW would use it too.
> 

VMIDs are physical resource belonging to the host SMMU driver.

but I got your original point that it's each vIOMMU gets an unique VMID
from the host SMMU driver, not exactly that each vIOMMU maintains
its own VMID namespace. that'd be a different concept.
Nicolin Chen Sept. 11, 2024, 7:40 a.m. UTC | #4
On Wed, Sep 11, 2024 at 07:18:10AM +0000, Tian, Kevin wrote:
> > From: Nicolin Chen <nicolinc@nvidia.com>
> > Sent: Wednesday, September 11, 2024 3:08 PM
> >
> > On Wed, Sep 11, 2024 at 06:12:21AM +0000, Tian, Kevin wrote:
> > > > From: Nicolin Chen <nicolinc@nvidia.com>
> > > > Sent: Wednesday, August 28, 2024 1:00 AM
> > > >
> > > [...]
> > > > On a multi-IOMMU system, the VIOMMU object can be instanced to the
> > > > number
> > > > of vIOMMUs in a guest VM, while holding the same parent HWPT to
> > share
> > > > the
> > >
> > > Is there restriction that multiple vIOMMU objects can be only created
> > > on a multi-IOMMU system?
> >
> > I think it should be generally restricted to the number of pIOMMUs,
> > although likely (not 100% sure) we could do multiple vIOMMUs on a
> > single-pIOMMU system. Any reason for doing that?
> 
> No idea. But if you stated so then there will be code to enforce it e.g.
> failing the attempt to create a vIOMMU object on a pIOMMU to which
> another vIOMMU object is already linked?

Yea, I can do that.

> > > > stage-2 IO pagetable. Each VIOMMU then just need to only allocate its
> > own
> > > > VMID to attach the shared stage-2 IO pagetable to the physical IOMMU:
> > >
> > > this reads like 'VMID' is a virtual ID allocated by vIOMMU. But from the
> > > entire context it actually means the physical 'VMID' allocated on the
> > > associated physical IOMMU, correct?
> >
> > Quoting Jason's narratives, a VMID is a "Security namespace for
> > guest owned ID". The allocation, using SMMU as an example, should
> 
> the VMID alone is not a namespace. It's one ID to tag another namespace.
> 
> > be a part of vIOMMU instance allocation in the host SMMU driver.
> > Then, this VMID will be used to mark the cache tags. So, it is
> > still a software allocated ID, while HW would use it too.
> >
> 
> VMIDs are physical resource belonging to the host SMMU driver.

Yes. Just the lifecycle of a VMID is controlled by a vIOMMU, i.e.
the guest.

> but I got your original point that it's each vIOMMU gets an unique VMID
> from the host SMMU driver, not exactly that each vIOMMU maintains
> its own VMID namespace. that'd be a different concept.

What's a VMID namespace actually? Please educate me :)

Thanks
Nicolin
Tian, Kevin Sept. 11, 2024, 8:08 a.m. UTC | #5
> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Wednesday, September 11, 2024 3:41 PM
> 
> On Wed, Sep 11, 2024 at 07:18:10AM +0000, Tian, Kevin wrote:
> > > From: Nicolin Chen <nicolinc@nvidia.com>
> > > Sent: Wednesday, September 11, 2024 3:08 PM
> > >
> > > On Wed, Sep 11, 2024 at 06:12:21AM +0000, Tian, Kevin wrote:
> > > > > From: Nicolin Chen <nicolinc@nvidia.com>
> > > > > Sent: Wednesday, August 28, 2024 1:00 AM
> > > > >
> > > > > stage-2 IO pagetable. Each VIOMMU then just need to only allocate its
> > > own
> > > > > VMID to attach the shared stage-2 IO pagetable to the physical IOMMU:
> > > >
> > > > this reads like 'VMID' is a virtual ID allocated by vIOMMU. But from the
> > > > entire context it actually means the physical 'VMID' allocated on the
> > > > associated physical IOMMU, correct?
> > >
> > > Quoting Jason's narratives, a VMID is a "Security namespace for
> > > guest owned ID". The allocation, using SMMU as an example, should
> >
> > the VMID alone is not a namespace. It's one ID to tag another namespace.
> >
> > > be a part of vIOMMU instance allocation in the host SMMU driver.
> > > Then, this VMID will be used to mark the cache tags. So, it is
> > > still a software allocated ID, while HW would use it too.
> > >
> >
> > VMIDs are physical resource belonging to the host SMMU driver.
> 
> Yes. Just the lifecycle of a VMID is controlled by a vIOMMU, i.e.
> the guest.
> 
> > but I got your original point that it's each vIOMMU gets an unique VMID
> > from the host SMMU driver, not exactly that each vIOMMU maintains
> > its own VMID namespace. that'd be a different concept.
> 
> What's a VMID namespace actually? Please educate me :)
> 

I meant the 16bit VMID pool under each SMMU.
Nicolin Chen Sept. 11, 2024, 8:21 p.m. UTC | #6
On Wed, Sep 11, 2024 at 08:08:04AM +0000, Tian, Kevin wrote:
> > From: Nicolin Chen <nicolinc@nvidia.com>
> > Sent: Wednesday, September 11, 2024 3:41 PM
> >
> > On Wed, Sep 11, 2024 at 07:18:10AM +0000, Tian, Kevin wrote:
> > > > From: Nicolin Chen <nicolinc@nvidia.com>
> > > > Sent: Wednesday, September 11, 2024 3:08 PM
> > > >
> > > > On Wed, Sep 11, 2024 at 06:12:21AM +0000, Tian, Kevin wrote:
> > > > > > From: Nicolin Chen <nicolinc@nvidia.com>
> > > > > > Sent: Wednesday, August 28, 2024 1:00 AM
> > > > > >
> > > > > > stage-2 IO pagetable. Each VIOMMU then just need to only allocate its
> > > > own
> > > > > > VMID to attach the shared stage-2 IO pagetable to the physical IOMMU:
> > > > >
> > > > > this reads like 'VMID' is a virtual ID allocated by vIOMMU. But from the
> > > > > entire context it actually means the physical 'VMID' allocated on the
> > > > > associated physical IOMMU, correct?
> > > >
> > > > Quoting Jason's narratives, a VMID is a "Security namespace for
> > > > guest owned ID". The allocation, using SMMU as an example, should
> > >
> > > the VMID alone is not a namespace. It's one ID to tag another namespace.
> > >
> > > > be a part of vIOMMU instance allocation in the host SMMU driver.
> > > > Then, this VMID will be used to mark the cache tags. So, it is
> > > > still a software allocated ID, while HW would use it too.
> > > >
> > >
> > > VMIDs are physical resource belonging to the host SMMU driver.
> >
> > Yes. Just the lifecycle of a VMID is controlled by a vIOMMU, i.e.
> > the guest.
> >
> > > but I got your original point that it's each vIOMMU gets an unique VMID
> > > from the host SMMU driver, not exactly that each vIOMMU maintains
> > > its own VMID namespace. that'd be a different concept.
> >
> > What's a VMID namespace actually? Please educate me :)
> >
> 
> I meant the 16bit VMID pool under each SMMU.

I see. Makes sense now.

Thanks
Nicolin