
[v1,04/10] iommufd/viommu: Allow drivers to control vdev_id lifecycle

Message ID 3ab48d4978318938911d91833654b296947668ad.1724777091.git.nicolinc@nvidia.com (mailing list archive)
State New, archived
Series iommufd: Add VIOMMU infrastructure (Part-2 VIRQ)

Commit Message

Nicolin Chen Aug. 27, 2024, 5:02 p.m. UTC
The iommufd core provides a lookup helper for an IOMMU driver to find a
device pointer by a device's per-viommu virtual ID. Yet a driver may
need the inverted lookup, i.e. finding a device's per-viommu virtual ID
by its device pointer, e.g. when reporting virtual IRQs/events back to
user space. In this case, it'd be unsafe for the iommufd core to do the
inverted lookup, as the driver can't track the lifecycle of a viommu
object or a vdev_id object.

Meanwhile, some HW can even support virtual device ID lookup via a HW-
accelerated virtualization feature. E.g. Tegra241 CMDQV HW supports
executing vanilla guest-issued SMMU commands containing a virtual Stream
ID, but requires software to configure a link between the virtual Stream
ID and the physical Stream ID via HW registers. So not only does the
iommufd core need a vdev_id lookup table, drivers will want one too.

Given the two justifications above, the best practice is to provide a
pair of set_vdev_id/unset_vdev_id ops in the viommu ops, so that a
driver can implement them to control a vdev_id's lifecycle, and
configure the HW properly if required.

Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 drivers/iommu/iommufd/device.c          |  2 ++
 drivers/iommu/iommufd/iommufd_private.h |  6 ------
 drivers/iommu/iommufd/viommu.c          | 23 +++++++++++++++++++----
 include/linux/iommufd.h                 | 13 +++++++++++++
 4 files changed, 34 insertions(+), 10 deletions(-)
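
To illustrate what a driver-side implementation of these ops could look
like: a minimal sketch, assuming hypothetical names (my_vdev_id and
my_viommu_set_vdev_id are not part of the series). A driver embeds the
core's iommufd_vdev_id in its own structure, which lets it carry
HW-specific state such as a vSID->pSID link:

------------------------sketch--------------------------
/* Hypothetical driver-side implementation, for illustration only */
struct my_vdev_id {
	struct iommufd_vdev_id core;	/* must be first: core kfree()s it */
	u32 psid;			/* e.g. a physical Stream ID */
};

static struct iommufd_vdev_id *
my_viommu_set_vdev_id(struct iommufd_viommu *viommu, struct device *dev,
		      u64 id)
{
	struct my_vdev_id *my_vdev_id;

	my_vdev_id = kzalloc(sizeof(*my_vdev_id), GFP_KERNEL);
	if (!my_vdev_id)
		return ERR_PTR(-ENOMEM);
	/* Program the virtual->physical ID link in HW registers here */
	return &my_vdev_id->core;	/* core fills idev/viommu/id */
}

static void my_viommu_unset_vdev_id(struct iommufd_vdev_id *vdev_id)
{
	/* Tear down the HW link; the core kfree()s vdev_id afterwards */
}
--------------------------------------------------------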

Comments

Jason Gunthorpe Sept. 5, 2024, 6:01 p.m. UTC | #1
On Tue, Aug 27, 2024 at 10:02:06AM -0700, Nicolin Chen wrote:
> The iommufd core provides a lookup helper for an IOMMU driver to find a
> device pointer by a device's per-viommu virtual ID. Yet a driver may
> need the inverted lookup, i.e. finding a device's per-viommu virtual ID
> by its device pointer, e.g. when reporting virtual IRQs/events back to
> user space. In this case, it'd be unsafe for the iommufd core to do the
> inverted lookup, as the driver can't track the lifecycle of a viommu
> object or a vdev_id object.
> 
> Meanwhile, some HW can even support virtual device ID lookup via a HW-
> accelerated virtualization feature. E.g. Tegra241 CMDQV HW supports
> executing vanilla guest-issued SMMU commands containing a virtual Stream
> ID, but requires software to configure a link between the virtual Stream
> ID and the physical Stream ID via HW registers. So not only does the
> iommufd core need a vdev_id lookup table, drivers will want one too.
> 
> Given the two justifications above, the best practice is to provide a
> pair of set_vdev_id/unset_vdev_id ops in the viommu ops, so that a
> driver can implement them to control a vdev_id's lifecycle, and
> configure the HW properly if required.

I think the lifecycle rules should be much simpler.

If a nested domain is attached to a STE/RID/device then the vIOMMU
affiliated with that nested domain is pinned while the STE is in place

So the driver only needs to provide locking around attach changing the
STE's vIOMMU vs async operations translating from a STE to a
vIOMMU. This can be a simple driver lock of some kind, i.e. a rwlock
across the STE table.

Generally that is how all the async events should work: go from the
STE to the vIOMMU to an iommufd callback to the iommufd event
queue. iommufd will translate the struct device from the driver to an
idev_id (or maybe even a vdev_id) the same basic way the PRI code works.

Need to check the attach struct lifecycle carefully.

Jason
Nicolin Chen Oct. 8, 2024, 5:39 p.m. UTC | #2
Sorry for the late reply. Just sat down and started to look at
this series.

On Thu, Sep 05, 2024 at 03:01:19PM -0300, Jason Gunthorpe wrote:
> On Tue, Aug 27, 2024 at 10:02:06AM -0700, Nicolin Chen wrote:
> > The iommufd core provides a lookup helper for an IOMMU driver to find a
> > device pointer by a device's per-viommu virtual ID. Yet a driver may
> > need the inverted lookup, i.e. finding a device's per-viommu virtual ID
> > by its device pointer, e.g. when reporting virtual IRQs/events back to
> > user space. In this case, it'd be unsafe for the iommufd core to do the
> > inverted lookup, as the driver can't track the lifecycle of a viommu
> > object or a vdev_id object.
> > 
> > Meanwhile, some HW can even support virtual device ID lookup via a HW-
> > accelerated virtualization feature. E.g. Tegra241 CMDQV HW supports
> > executing vanilla guest-issued SMMU commands containing a virtual Stream
> > ID, but requires software to configure a link between the virtual Stream
> > ID and the physical Stream ID via HW registers. So not only does the
> > iommufd core need a vdev_id lookup table, drivers will want one too.
> > 
> > Given the two justifications above, the best practice is to provide a
> > pair of set_vdev_id/unset_vdev_id ops in the viommu ops, so that a
> > driver can implement them to control a vdev_id's lifecycle, and
> > configure the HW properly if required.
> 
> I think the lifecycle rules should be much simpler.
> 
> If a nested domain is attached to a STE/RID/device then the vIOMMU
> affiliated with that nested domain is pinned while the STE is in place
> 
> So the driver only needs to provide locking around attach changing the
> STE's vIOMMU vs async operations translating from a STE to a
> vIOMMU. This can be a simple driver lock of some kind, i.e. a rwlock
> across the STE table.

I was worried about the race between a vdev link (idev<=>vIOMMU)
and a regular attach link (idev<=>nested_domain).

Though practically the vdev link wouldn't break until the regular
attach link breaks first, there was still no guarantee of that. So,
having a master->lock to protect master->vdev could ensure it.

Now, with the vdev being an object refcounting idev/vIOMMU objs,
I think we can simply pin the vdev in iommufd_hw_pagetable_attach
to ensure the vdev won't disappear. Then, a driver wouldn't need
to worry about that. [1]

Meanwhile, a nested_domain pins the vIOMMU object at the iommufd
level upon allocation. [2]

So, what's missing seems to be the attach between the master->dev
and the nested_domain itself [3]:

      idev <----------- vdev [1] ---------> vIOMMU
      (dev) ---[3]--> nested_domain ----[2]-----^

A lock protecting the attach(), held in parallel with the iommu
core's group->mutex, could help I think?
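
A rough sketch of pin [1], assuming the vdev object uses the regular
iommufd_object refcount (names illustrative, not settled):

------------------------sketch--------------------------
/* in iommufd_hw_pagetable_attach(), once the attach is validated */
if (idev->vdev)
	refcount_inc(&idev->vdev->obj.users);	/* vdev can't be destroyed */

/* matching release in iommufd_hw_pagetable_detach() */
if (idev->vdev)
	refcount_dec(&idev->vdev->obj.users);
--------------------------------------------------------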

> Generally that is how all the async events should work, go from the
> STE to the VIOMMU to a iommufd callback to the iommufd event
> queue. iommufd will translate the struct device from the driver to an
> idev_id (or maybe even a vdevid) the same basic way the PRI code works

My first attempt was putting the vdev into the attach_handle that
was created for the PRI code. Yet later on I found the links were
too long and some of them weren't safe (perhaps with the new vdev
and those pins mentioned above, it's worth another try). So letting
the driver hold the vdev itself became much simpler.

Thanks
Nicolin
Nicolin Chen Oct. 23, 2024, 7:22 a.m. UTC | #3
On Thu, Sep 05, 2024 at 03:01:19PM -0300, Jason Gunthorpe wrote:
> On Tue, Aug 27, 2024 at 10:02:06AM -0700, Nicolin Chen wrote:
> > The iommufd core provides a lookup helper for an IOMMU driver to find a
> > device pointer by a device's per-viommu virtual ID. Yet a driver may
> > need the inverted lookup, i.e. finding a device's per-viommu virtual ID
> > by its device pointer, e.g. when reporting virtual IRQs/events back to
> > user space. In this case, it'd be unsafe for the iommufd core to do the
> > inverted lookup, as the driver can't track the lifecycle of a viommu
> > object or a vdev_id object.
> > 
> > Meanwhile, some HW can even support virtual device ID lookup via a HW-
> > accelerated virtualization feature. E.g. Tegra241 CMDQV HW supports
> > executing vanilla guest-issued SMMU commands containing a virtual Stream
> > ID, but requires software to configure a link between the virtual Stream
> > ID and the physical Stream ID via HW registers. So not only does the
> > iommufd core need a vdev_id lookup table, drivers will want one too.
> > 
> > Given the two justifications above, the best practice is to provide a
> > pair of set_vdev_id/unset_vdev_id ops in the viommu ops, so that a
> > driver can implement them to control a vdev_id's lifecycle, and
> > configure the HW properly if required.
> 
> I think the lifecycle rules should be much simpler.
> 
> If a nested domain is attached to a STE/RID/device then the vIOMMU
> affiliated with that nested domain is pinned while the STE is in place
> 
> So the driver only needs to provide locking around attach changing the
> STE's vIOMMU vs async operations translating from a STE to a
> vIOMMU. This can be a simple driver lock of some kind, i.e. a rwlock
> across the STE table.
> 
> Generally that is how all the async events should work: go from the
> STE to the vIOMMU to an iommufd callback to the iommufd event
> queue. iommufd will translate the struct device from the driver to an
> idev_id (or maybe even a vdev_id) the same basic way the PRI code works.

I am trying to draft something following this, and here is what
it would look like:

------------------------draft---------------------------
struct arm_smmu_master {
	....
+	struct rw_semaphore lock;
+	struct arm_vsmmu *vsmmu;
	....
};

->attach_dev() {
	down_write(&master->lock);
	if (domain->type == IOMMU_DOMAIN_NESTED)
		master->vsmmu = to_smmu_domain(domain)->vsmmu;
	else
		master->vsmmu = NULL;
	up_write(&master->lock);
}

isr() {
	struct iommufd_vdev_id *vdev;
	int rc = -ENOENT;

	down_read(&master->lock);
	if (master->vsmmu) {
		xa_lock(&master->vsmmu->core.vdevs);
		vdev = iommufd_viommu_dev_to_vdev(&master->vsmmu->core,
						  master->dev);
		if (vdev) {
			struct iommu_virq_arm_smmuv3 virq_data = evt;

			virq_data.evt[0] &= ~EVTQ_0_SID;
			virq_data.evt[0] |= FIELD_PREP(EVTQ_0_SID, vdev->id);
			/* don't return with the locks still held */
			rc = iommufd_viommu_report_irq(
					vdev->viommu,
					IOMMU_VIRQ_TYPE_ARM_SMMUV3, &virq_data,
					sizeof(virq_data));
		}
		xa_unlock(&master->vsmmu->core.vdevs);
	}
	up_read(&master->lock);
	return rc;
}
--------------------------------------------------------

[Comparison]      | This v1                 | Draft
1. Adds to master | A lock and vdev ptr     | A lock and viommu ptr
2. Set/unset ptr  | In ->vdevice_alloc/free | In all ->attach_dev
3. Do dev_to_vdev | master->vdev->id        | attach_handle?

Both solutions need a driver-level lock and an extra pointer in
the master structure. And both ISR routines require that driver-
level lock to avoid racing attach_dev vs. vdev alloc/free.
Overall, taking step 3 into consideration, it seems that letting
the master lock & hold the vdev pointer (i.e. this v1) is simpler?

As for the implementation of iommufd_viommu_dev_to_vdev(), I read
the attach_handle part in the PRI code, yet I don't see the lock
that protects the handle returned by iommu_attach_handle_get() in
iommu_report_device_fault/find_fault_handler().

And I see the kdoc of iommu_attach_handle_get() mentioning:
  "Callers are required to synchronize the call of
   iommu_attach_handle_get() with domain attachment
   and detachment. The attach handle can only be used
   during its life cycle."
But the caller iommu_report_device_fault() is an async event that
cannot guarantee the lifecycle. Would you please shed some light?

Thanks
Nicolin
Jason Gunthorpe Oct. 23, 2024, 4:59 p.m. UTC | #4
On Wed, Oct 23, 2024 at 12:22:15AM -0700, Nicolin Chen wrote:
> On Thu, Sep 05, 2024 at 03:01:19PM -0300, Jason Gunthorpe wrote:
> > On Tue, Aug 27, 2024 at 10:02:06AM -0700, Nicolin Chen wrote:
> > > The iommufd core provides a lookup helper for an IOMMU driver to find a
> > > device pointer by a device's per-viommu virtual ID. Yet a driver may
> > > need the inverted lookup, i.e. finding a device's per-viommu virtual ID
> > > by its device pointer, e.g. when reporting virtual IRQs/events back to
> > > user space. In this case, it'd be unsafe for the iommufd core to do the
> > > inverted lookup, as the driver can't track the lifecycle of a viommu
> > > object or a vdev_id object.
> > > 
> > > Meanwhile, some HW can even support virtual device ID lookup via a HW-
> > > accelerated virtualization feature. E.g. Tegra241 CMDQV HW supports
> > > executing vanilla guest-issued SMMU commands containing a virtual Stream
> > > ID, but requires software to configure a link between the virtual Stream
> > > ID and the physical Stream ID via HW registers. So not only does the
> > > iommufd core need a vdev_id lookup table, drivers will want one too.
> > > 
> > > Given the two justifications above, the best practice is to provide a
> > > pair of set_vdev_id/unset_vdev_id ops in the viommu ops, so that a
> > > driver can implement them to control a vdev_id's lifecycle, and
> > > configure the HW properly if required.
> > 
> > I think the lifecycle rules should be much simpler.
> > 
> > If a nested domain is attached to a STE/RID/device then the vIOMMU
> > affiliated with that nested domain is pinned while the STE is in place
> > 
> > So the driver only needs to provide locking around attach changing the
> > STE's vIOMMU vs async operations translating from a STE to a
> > vIOMMU. This can be a simple driver lock of some kind, i.e. a rwlock
> > across the STE table.
> > 
> > Generally that is how all the async events should work: go from the
> > STE to the vIOMMU to an iommufd callback to the iommufd event
> > queue. iommufd will translate the struct device from the driver to an
> > idev_id (or maybe even a vdev_id) the same basic way the PRI code works.
> 
> I am trying to draft something following this, and here is what
> it would look like:
> 
> ------------------------draft---------------------------
> struct arm_smmu_master {
> 	....
> +	struct rw_semaphore lock;

It would be called vsmmu_lock

> +	struct arm_vsmmu *vsmmu;
> 	....
> };
> 
> ->attach_dev() {
> 	down_write(&master->lock);
> 	if (domain->type == IOMMU_DOMAIN_NESTED)
> 		master->vsmmu = to_smmu_domain(domain)->vsmmu;
> 	else
> 		master->vsmmu = NULL;
> 	up_write(&master->lock);
> }
> 
> isr() {
> 	struct iommufd_vdev_id *vdev;
> 	int rc = -ENOENT;
> 
> 	down_read(&master->lock);
> 	if (master->vsmmu) {
> 		xa_lock(&master->vsmmu->core.vdevs);
> 		vdev = iommufd_viommu_dev_to_vdev(&master->vsmmu->core,
> 						  master->dev);
> 		if (vdev) {
> 			struct iommu_virq_arm_smmuv3 virq_data = evt;
> 
> 			virq_data.evt[0] &= ~EVTQ_0_SID;
> 			virq_data.evt[0] |= FIELD_PREP(EVTQ_0_SID, vdev->id);
> 			/* don't return with the locks still held */
> 			rc = iommufd_viommu_report_irq(
> 					vdev->viommu,
> 					IOMMU_VIRQ_TYPE_ARM_SMMUV3, &virq_data,
> 					sizeof(virq_data));
> 		}
> 		xa_unlock(&master->vsmmu->core.vdevs);
> 	}
> 	up_read(&master->lock);
> 	return rc;
> }
> --------------------------------------------------------

This looks reasonable

> [Comparison]      | This v1                 | Draft
> 1. Adds to master | A lock and vdev ptr     | A lock and viommu ptr
> 2. Set/unset ptr  | In ->vdevice_alloc/free | In all ->attach_dev
> 3. Do dev_to_vdev | master->vdev->id        | attach_handle?

The set/unset ops have the major issue that they can get out of sync
with the domain. The only time things should be routed to the viommu
is when a viommu related domain is attached.

The lock on attach can be reduced:

  iommu_group_mutex_assert(dev);
  if (domain->type == IOMMU_DOMAIN_NESTED)
  	new_vsmmu = to_smmu_domain(domain)->vsmmu;
  else
  	new_vsmmu = NULL;
  if (new_vsmmu != master->vsmmu) {
  	down_write(&master->lock);
  	master->vsmmu = new_vsmmu;
  	up_write(&master->lock);
  }

And you'd stick this in prepare or commit..

> Both solutions need a driver-level lock and an extra pointer in
> the master structure. And both ISR routines require that driver-
> level lock to avoid racing attach_dev vs. vdev alloc/free.
> Overall, taking step 3 into consideration, it seems that letting
> the master lock & hold the vdev pointer (i.e. this v1) is simpler?

I'm not sure the vdev pointer should even be visible to the drivers..

> As for the implementation of iommufd_viommu_dev_to_vdev(), I read
> the attach_handle part in the PRI code, yet I don't see the lock
> that protects the handle returned by iommu_attach_handle_get() in
> iommu_report_device_fault/find_fault_handler().

It is the xa_lock and some rules about flushing irqs and work queues
before allowing the dev to disappear:

>   "Callers are required to synchronize the call of
>    iommu_attach_handle_get() with domain attachment
>    and detachment. The attach handle can only be used
>    during its life cycle."

> But the caller iommu_report_device_fault() is an async event that
> cannot guarantee the lifecycle. Would you please shed some light?

The iopf detach function will act as a barrier to ensure that all
the async work has completed, sort of like how RCU works.

But here, I think it is pretty simple, isn't it?

When you update the master->vsmmu you can query the vsmmu to get the
vdev id of that master, then store it in the master struct and forward
it to the iommufd_viommu_report_irq(). That could even search the
xarray since attach is not a performance path.

Then it is locked under the master->lock
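
For instance (a hedged sketch; the master->vdev_id field is
hypothetical, reusing iommufd_viommu_dev_to_vdev() from the draft
above):

------------------------sketch--------------------------
/* in attach, under master->lock; not a performance path */
master->vsmmu = new_vsmmu;
master->vdev_id = 0;
if (new_vsmmu) {
	struct iommufd_vdev_id *vdev;

	xa_lock(&new_vsmmu->core.vdevs);
	vdev = iommufd_viommu_dev_to_vdev(&new_vsmmu->core, master->dev);
	if (vdev)
		master->vdev_id = vdev->id;
	xa_unlock(&new_vsmmu->core.vdevs);
}
/* isr() then reads master->vdev_id under down_read(&master->lock) */
--------------------------------------------------------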

Jason
Nicolin Chen Oct. 23, 2024, 6:54 p.m. UTC | #5
On Wed, Oct 23, 2024 at 01:59:05PM -0300, Jason Gunthorpe wrote:
> > [Comparison]      | This v1                 | Draft
> > 1. Adds to master | A lock and vdev ptr     | A lock and viommu ptr
> > 2. Set/unset ptr  | In ->vdevice_alloc/free | In all ->attach_dev
> > 3. Do dev_to_vdev | master->vdev->id        | attach_handle?
> 
> The set/unset ops have the major issue that they can get out of sync
> with the domain. The only time things should be routed to the viommu
> is when a viommu related domain is attached.

Ah, I missed that point.

> The lock on attach can be reduced:
> 
>   iommu_group_mutex_assert(dev);
>   if (domain->type == IOMMU_DOMAIN_NESTED)
>   	new_vsmmu = to_smmu_domain(domain)->vsmmu;
>   else
>   	new_vsmmu = NULL;
>   if (new_vsmmu != master->vsmmu) {
>   	down_write(&master->lock);
>   	master->vsmmu = new_vsmmu;
>   	up_write(&master->lock);
>   }
> 
> And you'd stick this in prepare or commit..

Ack.

> > Both solutions need a driver-level lock and an extra pointer in
> > the master structure. And both ISR routines require that driver-
> > level lock to avoid racing attach_dev vs. vdev alloc/free.
> > Overall, taking step 3 into consideration, it seems that letting
> > the master lock & hold the vdev pointer (i.e. this v1) is simpler?
> 
> I'm not sure the vdev pointer should even be visible to the drivers..

With the iommufd_vdevice_alloc allocator, we already expose the
vdevice structure to the drivers, along with the vdevice_alloc/
vdevice_free ops, which makes it easier for the vCMDQ driver to
allocate and hold its own pSID/vSID structure.

And vsid_to_psid() requires looking up the viommu->vdevs xarray.
If we hid the vDEVICE structure, we'd need something other than
vdev_to_dev(). Maybe iommufd_viommu_find_dev_by_virt_id()?
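
Such a helper could be a thin wrapper over the xarray; a minimal
sketch against this patch's vdev_ids table (the helper name is only a
proposal):

------------------------sketch--------------------------
/* caller must hold xa_lock(&viommu->vdev_ids) while using the result */
static struct device *
iommufd_viommu_find_dev_by_virt_id(struct iommufd_viommu *viommu, u64 vid)
{
	struct iommufd_vdev_id *vdev_id;

	vdev_id = xa_load(&viommu->vdev_ids, vid);
	return vdev_id ? vdev_id->idev->dev : NULL;
}
--------------------------------------------------------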

> > As for the implementation of iommufd_viommu_dev_to_vdev(), I read
> > the attach_handle part in the PRI code, yet I don't see the lock
> > that protects the handle returned by iommu_attach_handle_get() in
> > iommu_report_device_fault/find_fault_handler().
> 
> It is the xa_lock and some rules about flushing irqs and work queues
> before allowing the dev to disappear:
> 
> >   "Callers are required to synchronize the call of
> >    iommu_attach_handle_get() with domain attachment
> >    and detachment. The attach handle can only be used
> >    during its life cycle."
> 
> > But the caller iommu_report_device_fault() is an async event that
> > cannot guarantee the lifecycle. Would you please shed some light?
> 
> The iopf detach function will act as a barrier to ensure that all
> the async work has completed, sort of like how RCU works.

The xa_lock(&group->pasid_array) is released once the handle is
returned to the iommu_attach_handle_get callers, so it protects
only for a very short window (T0 below). What if:
   | detach()                       | isr=>iommu_report_device_fault()
T0 | Get attach_handle [xa_lock]    | Get attach_handle [xa_lock]
T1 | Clean deliver Q [fault->mutex] | Waiting for fault->mutex
T2 | iommufd_eventq_iopf_disable()  | Add new fault to the deliver Q
T3 | kfree(handle)                  | ?? 

> But here, I think it is pretty simple, isn't it?
> 
> When you update the master->vsmmu you can query the vsmmu to get the
> vdev id of that master, then store it in the master struct and forward
> it to the iommufd_viommu_report_irq(). That could even search the
> xarray since attach is not a performance path.
> 
> Then it is locked under the master->lock

Yes! I didn't see that coming. vdev->id must be set before the
attach to a nested domain, and can be cleared after the device
detaches. Maybe an attach to a vIOMMU-based nested domain should
just fail if idev->vdev isn't ready?

Then perhaps we can have a struct arm_smmu_vstream to hold all
the things, such as the vsmmu pointer and vdev->id. If vCMDQ needs
an extra structure, it can be stored there too.
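
E.g. a rough sketch (field names illustrative):

------------------------sketch--------------------------
struct arm_smmu_vstream {
	struct arm_vsmmu *vsmmu;	/* NULL unless a nested domain is attached */
	u32 vsid;			/* cached vdev->id for event reporting */
	/* vCMDQ per-device state could hang here too */
};

struct arm_smmu_master {
	....
	struct rw_semaphore vstream_lock;	/* attach vs ISR */
	struct arm_smmu_vstream vstream;
	....
};
--------------------------------------------------------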

Thanks!
Nicolin
Jason Gunthorpe Oct. 28, 2024, 12:58 p.m. UTC | #6
On Wed, Oct 23, 2024 at 11:54:54AM -0700, Nicolin Chen wrote:
> > The iopf detach function will act as a barrier to ensure that all
> > the async work has completed, sort of like how RCU works.
> 
> The xa_lock(&group->pasid_array) is released once the handle is
> returned to the iommu_attach_handle_get callers, so it protects
> only for a very short window (T0 below). What if:
>    | detach()                       | isr=>iommu_report_device_fault()
> T0 | Get attach_handle [xa_lock]    | Get attach_handle [xa_lock]
> T1 | Clean deliver Q [fault->mutex] | Waiting for fault->mutex
> T2 | iommufd_eventq_iopf_disable()  | Add new fault to the deliver Q
> T3 | kfree(handle)                  | ?? 

Prior to iommufd_eventq_iopf_disable() the driver has to ensure the
threads calling isr->iommu_report_device_fault() are fenced.

New threads that start running cannot see the attach_handle() because
it is not in the xarray anymore. Old threads are completed because of
the fence.
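
In other words, the teardown ordering is roughly (a pseudocode sketch;
the fencing calls are hypothetical and driver-specific):

------------------------sketch--------------------------
/* 1. detach removes the handle from group->pasid_array, so new
 *    isr->iommu_report_device_fault() calls find no attach_handle */
/* 2. fence threads that already hold the handle, e.g.: */
synchronize_irq(evtq_irq);
flush_workqueue(iopf_wq);
/* 3. only then drain and free */
iommufd_eventq_iopf_disable(eventq);
kfree(handle);
--------------------------------------------------------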

> > But here, I think it is pretty simple, isn't it?
> > 
> > When you update the master->vsmmu you can query the vsmmu to get the
> > vdev id of that master, then store it in the master struct and forward
> > it to the iommufd_viommu_report_irq(). That could even search the
> > xarray since attach is not a performance path.
> > 
> > Then it is locked under the master->lock
> 
> Yes! I didn't see that coming. vdev->id must be set before the
> attach to a nested domain, and can be cleared after the device
> detaches. Maybe an attach to a vIOMMU-based nested domain should
> just fail if idev->vdev isn't ready?

That would make sense

Jason

Patch

diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index 3ad759971b32..01bb5c9f415b 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -145,6 +145,8 @@  void iommufd_device_destroy(struct iommufd_object *obj)
 		old = xa_cmpxchg(&viommu->vdev_ids, vdev_id->id, vdev_id, NULL,
 				 GFP_KERNEL);
 		WARN_ON(old != vdev_id);
+		if (vdev_id->viommu->ops && vdev_id->viommu->ops->unset_vdev_id)
+			vdev_id->viommu->ops->unset_vdev_id(vdev_id);
 		kfree(vdev_id);
 		idev->vdev_id = NULL;
 	}
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index be1f1813672e..4cb1555991b8 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -621,12 +621,6 @@  struct iommufd_viommu {
 	unsigned int type;
 };
 
-struct iommufd_vdev_id {
-	struct iommufd_viommu *viommu;
-	struct iommufd_device *idev;
-	u64 id;
-};
-
 static inline struct iommufd_viommu *
 iommufd_get_viommu(struct iommufd_ucmd *ucmd, u32 id)
 {
diff --git a/drivers/iommu/iommufd/viommu.c b/drivers/iommu/iommufd/viommu.c
index 9adc9c62ada9..b1eb900b7fbf 100644
--- a/drivers/iommu/iommufd/viommu.c
+++ b/drivers/iommu/iommufd/viommu.c
@@ -13,6 +13,8 @@  void iommufd_viommu_destroy(struct iommufd_object *obj)
 
 	xa_for_each(&viommu->vdev_ids, index, vdev_id) {
 		/* Unlocked since there should be no race in a destroy() */
+		if (viommu->ops && viommu->ops->unset_vdev_id)
+			viommu->ops->unset_vdev_id(vdev_id);
 		vdev_id->idev->vdev_id = NULL;
 		kfree(vdev_id);
 	}
@@ -116,10 +118,18 @@  int iommufd_viommu_set_vdev_id(struct iommufd_ucmd *ucmd)
 		goto out_unlock_igroup;
 	}
 
-	vdev_id = kzalloc(sizeof(*vdev_id), GFP_KERNEL);
-	if (!vdev_id) {
-		rc = -ENOMEM;
-		goto out_unlock_igroup;
+	if (viommu->ops && viommu->ops->set_vdev_id) {
+		vdev_id = viommu->ops->set_vdev_id(viommu, idev->dev, cmd->vdev_id);
+		if (IS_ERR(vdev_id)) {
+			rc = PTR_ERR(vdev_id);
+			goto out_unlock_igroup;
+		}
+	} else {
+		vdev_id = kzalloc(sizeof(*vdev_id), GFP_KERNEL);
+		if (!vdev_id) {
+			rc = -ENOMEM;
+			goto out_unlock_igroup;
+		}
 	}
 
 	vdev_id->idev = idev;
@@ -137,6 +147,8 @@  int iommufd_viommu_set_vdev_id(struct iommufd_ucmd *ucmd)
 	goto out_unlock_igroup;
 
 out_free:
+	if (viommu->ops && viommu->ops->unset_vdev_id)
+		viommu->ops->unset_vdev_id(vdev_id);
 	kfree(vdev_id);
 out_unlock_igroup:
 	mutex_unlock(&idev->igroup->lock);
@@ -185,6 +197,9 @@  int iommufd_viommu_unset_vdev_id(struct iommufd_ucmd *ucmd)
 		rc = xa_err(old);
 		goto out_unlock_igroup;
 	}
+
+	if (viommu->ops && viommu->ops->unset_vdev_id)
+		viommu->ops->unset_vdev_id(old);
 	kfree(old);
 	idev->vdev_id = NULL;
 
diff --git a/include/linux/iommufd.h b/include/linux/iommufd.h
index f7c265c6de7c..c89583c7c792 100644
--- a/include/linux/iommufd.h
+++ b/include/linux/iommufd.h
@@ -56,8 +56,18 @@  void iommufd_access_detach(struct iommufd_access *access);
 
 void iommufd_ctx_get(struct iommufd_ctx *ictx);
 
+struct iommufd_vdev_id {
+	struct iommufd_viommu *viommu;
+	struct iommufd_device *idev;
+	u64 id;
+};
+
 /**
  * struct iommufd_viommu_ops - viommu specific operations
+ * @set_vdev_id: Set a virtual device id for a device assigned to a viommu.
+ *               Driver allocates an iommufd_vdev_id and returns its pointer.
+ * @unset_vdev_id: Unset a virtual device id for a device assigned to a viommu.
+ *                 iommufd core frees the memory pointed to by an iommufd_vdev_id.
  * @cache_invalidate: Flush hardware cache used by a viommu. It can be used for
  *                    any IOMMU hardware specific cache as long as a viommu has
  *                    enough information to identify it: for example, a VMID or
@@ -69,6 +79,9 @@  void iommufd_ctx_get(struct iommufd_ctx *ictx);
  *                    include/uapi/linux/iommufd.h
  */
 struct iommufd_viommu_ops {
+	struct iommufd_vdev_id *(*set_vdev_id)(struct iommufd_viommu *viommu,
+					       struct device *dev, u64 id);
+	void (*unset_vdev_id)(struct iommufd_vdev_id *vdev_id);
 	int (*cache_invalidate)(struct iommufd_viommu *viommu,
 				struct iommu_user_data_array *array);
 };
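
Wiring these ops up in a driver would then look like the following
(hedged; the my_viommu_* callbacks are the hypothetical ones sketched
under the commit message, plus a cache_invalidate implementation):

------------------------sketch--------------------------
static const struct iommufd_viommu_ops my_viommu_ops = {
	.set_vdev_id		= my_viommu_set_vdev_id,
	.unset_vdev_id		= my_viommu_unset_vdev_id,
	.cache_invalidate	= my_viommu_cache_invalidate,
};
--------------------------------------------------------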