diff mbox

[RFC,24/30] iommu: Specify PASID state when unbinding a task

Message ID 20170227195441.5170-25-jean-philippe.brucker@arm.com (mailing list archive)
State New, archived
Headers show

Commit Message

Jean-Philippe Brucker Feb. 27, 2017, 7:54 p.m. UTC
Provide a way for device drivers to tell the IOMMU driver about the state
of the PASID they are trying to decommission. When unbinding a task from a
device, the IOMMU driver needs to know whether it can immediately reuse
the PASID for another task, or if there is additional work to be done
before the PASID is safe to re-use.

One hard requirement when calling unbind is that the associated PASID is
not present in any transaction downstream of the IOMMU anymore. In other
words, any read, write, page requests referring to this PASID has
finished.

For PCIe, this means that the driver has successfully executed the
device-specific stop request mechanism described in 6.20.1 (Managing PASID
TLP Prefix Usage). In particular:

* device doesn't issue any new request for this PASID,
* all non-posted requests for this PASID have been completed,
* all posted requests for this PASID (addressing host memory) have been
  flushed to the host.

Address Translation Requests are non-posted, and PRI Page Requests (PPR)
are posted. In addition with PRI, device must implement one of the
following mechanism (ATS spec 4.1.2. - Managing PASID TLP Prefix Usage):

A. Finish transmitting any PPR affecting this PASID and wait for their
   response. In this case, the IOMMU driver can safely reuse the PASID and
   must not wait for a Stop Marker.

B. Finish transmitting any PPR affecting this PASID and send a Stop
   Marker. The driver must wait to receive a Stop Marker for this PASID
   before reusing it.

This patch lets the driver communicate the current state of the PASID with
either IOMMU_PASID_FLUSHED for case A, or IOMMU_PASID_CLEAN for case B.

It is an important distinction because, if the IOMMU driver reassigns a
PASID while the IOMMU still holds pending PPR targeting that PASID
internally, the PPR will trigger a fault in the wrong address space.

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 drivers/iommu/iommu.c |  8 ++++++++
 include/linux/iommu.h | 18 +++++++++++++++++-
 2 files changed, 25 insertions(+), 1 deletion(-)

Comments

Joerg Roedel March 22, 2017, 3:44 p.m. UTC | #1
On Mon, Feb 27, 2017 at 07:54:35PM +0000, Jean-Philippe Brucker wrote:
> It is an important distinction because, if the IOMMU driver reassigns a
> PASID while the IOMMU still holds pending PPR targeting that PASID
> internally, the PPR will trigger a fault in the wrong address space.

The IOMMU driver also controls a devices apbility to issue PPR requests
(at least on PCI), so it already knows whether a device has still
requests pending or if it even can create new ones.

Furhter, the IOMMU driver can already wait for all pending faults to be
processed before it shuts down a PASID. So it is not clear to me why the
device driver needs to be involved here.

When the device driver issues a PASID-unbind call the iommu driver
just waits until all pending faults are processed, answers new faults
with INVALID, then switch off the devices capability to issue new
faults, and then release the PASID.



	Joerg
Jean-Philippe Brucker March 22, 2017, 6:31 p.m. UTC | #2
On 22/03/17 15:44, Joerg Roedel wrote:
> On Mon, Feb 27, 2017 at 07:54:35PM +0000, Jean-Philippe Brucker wrote:
>> It is an important distinction because, if the IOMMU driver reassigns a
>> PASID while the IOMMU still holds pending PPR targeting that PASID
>> internally, the PPR will trigger a fault in the wrong address space.
> 
> The IOMMU driver also controls a devices apbility to issue PPR requests
> (at least on PCI), so it already knows whether a device has still
> requests pending or if it even can create new ones.

Apart from resetting the PRI capability, the SMMU doesn't have any control
over the device's PPR requests, so we simply mandate that the caller did
the required work to stop issuing them before calling iommu_unbind.

> Furhter, the IOMMU driver can already wait for all pending faults to be
> processed before it shuts down a PASID. So it is not clear to me why the
> device driver needs to be involved here.

The problem might be too tied to the specifics of the SMMU. As implemented
in this series, the normal flow for a PPR with the SMMU is the following:

(1) PCI device issues a PPR for PASID 1
(2) The PPR is queued by the SMMU in the (hardware) PRI queue
(3) The SMMU driver receives an interrupt, dequeues the PPR and moves it
    to a software work queue.
(4) The PPR is finally handled and a PRI response is sent to the device.

The case that worries me is if someone unbinds PASID 1 between (2) and
(3), while the PPR is still in the hardware queue, and immediately binds
it to a new address space.

Then (3) and (4) happen, the PPR is handled and the fault is for the new
address space. It's certainly undesirable, but I don't know if it could be
exploited. We don't kill the task for an unhandled fault at the moment,
simply report a failed PPR to the device, so I might be worrying for nothing.

Having the caller tell us if PPRs might still be pending in the hardware
PRI queue ensures that the SMMU driver waits until it's entirely safe:

* If the device has no outstanding PPR, PASID can be reallocated
* If the device has outstanding PPRs, wait for a Stop Marker, or drain
  the PRI queue after a while (if the Stop Marker was lost in a PRI queue
  overflow).

Draining the PRI queue is very costly, we need to block the PRI thread to
inspect the queue, risking an overflow. And with these PASID state flags
we avoid flushing any queue.

But since the problem seems too centered around the SMMU, I might just
drop this patch along with the CLEAN/FLUSHED flags in my next version, and
go with the full-drain solution. After all, unbind should be a fairly rare
event.

Thanks,
Jean-Philippe

> When the device driver issues a PASID-unbind call the iommu driver
> just waits until all pending faults are processed, answers new faults
> with INVALID, then switch off the devices capability to issue new
> faults, and then release the PASID.
>
Joerg Roedel March 22, 2017, 10:53 p.m. UTC | #3
On Wed, Mar 22, 2017 at 06:31:01PM +0000, Jean-Philippe Brucker wrote:
> The problem might be too tied to the specifics of the SMMU. As implemented
> in this series, the normal flow for a PPR with the SMMU is the following:
> 
> (1) PCI device issues a PPR for PASID 1
> (2) The PPR is queued by the SMMU in the (hardware) PRI queue
> (3) The SMMU driver receives an interrupt, dequeues the PPR and moves it
>     to a software work queue.
> (4) The PPR is finally handled and a PRI response is sent to the device.

There are two ways a PASID could get shut down:

	1) The device driver calls unbind()
	2) The mm_struct bound to that PASID is going away

Case 1) is the easy one, we can safely assume that the device driver did
anything to stop new PPR requests from being created for that PASID. In
this case we just shut down PPR processing by waiting until everything
is handled and reply INVALID to any further PPR request before we remove
the PASID from the per-device IOMMU data structures and flush caches.

In case 2) we have more work to do. The mm_struct is going away
(probably because the task segfaulted) and we can't assume that the
device driver shut everything down already. But for this case we have
the call-back into the device driver to tell it should clean everything
up for that PASID and stop the device from creating further requests.

After that call-back returns it is the same as in case 1), we drain the
queue and deny any further request that comes in.

> The case that worries me is if someone unbinds PASID 1 between (2) and
> (3), while the PPR is still in the hardware queue, and immediately binds
> it to a new address space.
> 
> Then (3) and (4) happen, the PPR is handled and the fault is for the new
> address space. It's certainly undesirable, but I don't know if it could be
> exploited. We don't kill the task for an unhandled fault at the moment,
> simply report a failed PPR to the device, so I might be worrying for nothing.

As I wrote above, when the device driver calls unbind() we should
assume that the device does not sent any further requests with that
PASID. If it does, we just answer with INVALID.

> Having the caller tell us if PPRs might still be pending in the hardware
> PRI queue ensures that the SMMU driver waits until it's entirely safe:
> 
> * If the device has no outstanding PPR, PASID can be reallocated
> * If the device has outstanding PPRs, wait for a Stop Marker, or drain
>   the PRI queue after a while (if the Stop Marker was lost in a PRI queue
>   overflow).

That can't happen, when the device driver does its job right. It has to
shut down the context which causes the PPR requests for the PASID on the
device. This includes stopping the context and waiting until all PPR
requests it sent are processed.

And the device driver has to do this either before it calls unbind() or
in the call-back it provided. Only after this the PASID should be freed.

> Draining the PRI queue is very costly, we need to block the PRI thread to
> inspect the queue, risking an overflow. And with these PASID state flags
> we avoid flushing any queue.

There is a configurable maximum of PPR requests a device can have
in-flight. If you take that into account when allocation the PPR queue
for the SMMU, there can't be any overflows. The AMD driver allocates a
queue for 512 entries and allows devices to have a maximum of 32
outstanding requests.

> But since the problem seems too centered around the SMMU, I might just
> drop this patch along with the CLEAN/FLUSHED flags in my next version, and
> go with the full-drain solution. After all, unbind should be a fairly rare
> event.

I don't think all this is SMMU specific, it is the same on all other
IOMMUs that have the ATS/PRI/PASID features.



	Joerg
Jean-Philippe Brucker March 23, 2017, 1:37 p.m. UTC | #4
On 22/03/17 22:53, Joerg Roedel wrote:
> On Wed, Mar 22, 2017 at 06:31:01PM +0000, Jean-Philippe Brucker wrote:
>> The problem might be too tied to the specifics of the SMMU. As implemented
>> in this series, the normal flow for a PPR with the SMMU is the following:
>>
>> (1) PCI device issues a PPR for PASID 1
>> (2) The PPR is queued by the SMMU in the (hardware) PRI queue
>> (3) The SMMU driver receives an interrupt, dequeues the PPR and moves it
>>     to a software work queue.
>> (4) The PPR is finally handled and a PRI response is sent to the device.
> 
> There are two ways a PASID could get shut down:
> 
> 	1) The device driver calls unbind()
> 	2) The mm_struct bound to that PASID is going away
> 
> Case 1) is the easy one, we can safely assume that the device driver did
> anything to stop new PPR requests from being created for that PASID. In
> this case we just shut down PPR processing by waiting until everything
> is handled and reply INVALID to any further PPR request before we remove
> the PASID from the per-device IOMMU data structures and flush caches.
> 
> In case 2) we have more work to do. The mm_struct is going away
> (probably because the task segfaulted) and we can't assume that the
> device driver shut everything down already. But for this case we have
> the call-back into the device driver to tell it should clean everything
> up for that PASID and stop the device from creating further requests.
> 
> After that call-back returns it is the same as in case 1), we drain the
> queue and deny any further request that comes in.

I agree on the semantics (I didn't implement case 2) here but I will have
to in the next version.)

>> The case that worries me is if someone unbinds PASID 1 between (2) and
>> (3), while the PPR is still in the hardware queue, and immediately binds
>> it to a new address space.
>>
>> Then (3) and (4) happen, the PPR is handled and the fault is for the new
>> address space. It's certainly undesirable, but I don't know if it could be
>> exploited. We don't kill the task for an unhandled fault at the moment,
>> simply report a failed PPR to the device, so I might be worrying for nothing.
> 
> As I wrote above, when the device driver calls unbind() we should
> assume that the device does not sent any further requests with that
> PASID. If it does, we just answer with INVALID.
> 
>> Having the caller tell us if PPRs might still be pending in the hardware
>> PRI queue ensures that the SMMU driver waits until it's entirely safe:
>>
>> * If the device has no outstanding PPR, PASID can be reallocated
>> * If the device has outstanding PPRs, wait for a Stop Marker, or drain
>>   the PRI queue after a while (if the Stop Marker was lost in a PRI queue
>>   overflow).
> 
> That can't happen, when the device driver does its job right. It has to
> shut down the context which causes the PPR requests for the PASID on the
> device. This includes stopping the context and waiting until all PPR
> requests it sent are processed.

By "processed", do you mean that they are committed to the IOMMU, or that
they came back with a PRG response?

The driver might not be able to do the latter, since PCI defines two ways
of shutting down a context:

* Either wait for all PPR requests to come back with a PRG response,
* Or send a Stop Marker PPR request and forget about it.

The second one is problematic, all the device says is "I've stopped
sending requests, some might still be in flight, but a Stop Marker ends
the flow".

Without having the device driver tell us which of the two models the
device is implementing, draining the hardware queue is our best bet to
ensure that no request is pending.

In any case, this is a property of the device, and passing flags to
unbind() as I suggested is probably excessive. The IOMMU driver could
store that info somewhere and use it whenever it has to unbind(), as an
indication of the minimum amount of work required to clean the context.
Without this hint the default should be to drain both queues.

My intent with passing flags to unbind() was to handle the case where VFIO
is unable to tell us whether PPRs are still being issued by the device.
But the issue seems moot to me now that I have a better understanding, as
there will be a detach_dev/attach_dev sequence before we start rebinding
PASIDs, and we can simply reset the PRI interface there (I'm currently
doing that in add_device, but I should move it.)

> And the device driver has to do this either before it calls unbind() or
> in the call-back it provided. Only after this the PASID should be freed.
>
>> Draining the PRI queue is very costly, we need to block the PRI thread to
>> inspect the queue, risking an overflow. And with these PASID state flags
>> we avoid flushing any queue.
> 
> There is a configurable maximum of PPR requests a device can have
> in-flight. If you take that into account when allocation the PPR queue
> for the SMMU, there can't be any overflows. The AMD driver allocates a
> queue for 512 entries and allows devices to have a maximum of 32
> outstanding requests.

Yes, the SMMU specification also tells that over-committing the PPR queue
is a programming error. I wonder how well it scales with hot-plugging of
devices however. At the time when we start allocating PPR credits, we
might not be sure that the number of devices with a PRI will be limited to
16. So even if overflow is a programming error, I'm not comfortable with
ruling it out just yet.

Thanks,
Jean-Philippe

>> But since the problem seems too centered around the SMMU, I might just
>> drop this patch along with the CLEAN/FLUSHED flags in my next version, and
>> go with the full-drain solution. After all, unbind should be a fairly rare
>> event.
> 
> I don't think all this is SMMU specific, it is the same on all other
> IOMMUs that have the ATS/PRI/PASID features.
> 
> 
> 
> 	Joerg
>
Joerg Roedel March 23, 2017, 2:30 p.m. UTC | #5
On Thu, Mar 23, 2017 at 01:37:41PM +0000, Jean-Philippe Brucker wrote:
> On 22/03/17 22:53, Joerg Roedel wrote:
> > That can't happen, when the device driver does its job right. It has to
> > shut down the context which causes the PPR requests for the PASID on the
> > device. This includes stopping the context and waiting until all PPR
> > requests it sent are processed.
> 
> By "processed", do you mean that they are committed to the IOMMU, or that
> they came back with a PRG response?

By processed I mean the device received a response to the request, or is
at least no longer waiting for responses.

> The driver might not be able to do the latter, since PCI defines two ways
> of shutting down a context:
> 
> * Either wait for all PPR requests to come back with a PRG response,
> * Or send a Stop Marker PPR request and forget about it.

Are you sure about the meaning of the stop-marker? Can you point me to
where it is specified?

I know about a different marker, which allows a device to issue several
PPR requests and only get one response for all of them. The last request
then has that marker set.

But I am not aware (yet) of a marker with which the device says it no
longer sends requests with this PASID.

> My intent with passing flags to unbind() was to handle the case where VFIO
> is unable to tell us whether PPRs are still being issued by the device.
> But the issue seems moot to me now that I have a better understanding, as
> there will be a detach_dev/attach_dev sequence before we start rebinding
> PASIDs, and we can simply reset the PRI interface there (I'm currently
> doing that in add_device, but I should move it.)

VFIO should completly reset the device before it is re-used, so I don't
think we run into problems there.



	Joerg
Jean-Philippe Brucker March 23, 2017, 3:52 p.m. UTC | #6
On 23/03/17 14:30, Joerg Roedel wrote:
> On Thu, Mar 23, 2017 at 01:37:41PM +0000, Jean-Philippe Brucker wrote:
>> On 22/03/17 22:53, Joerg Roedel wrote:
>>> That can't happen, when the device driver does its job right. It has to
>>> shut down the context which causes the PPR requests for the PASID on the
>>> device. This includes stopping the context and waiting until all PPR
>>> requests it sent are processed.
>>
>> By "processed", do you mean that they are committed to the IOMMU, or that
>> they came back with a PRG response?
> 
> By processed I mean the device received a response to the request, or is
> at least no longer waiting for responses.
> 
>> The driver might not be able to do the latter, since PCI defines two ways
>> of shutting down a context:
>>
>> * Either wait for all PPR requests to come back with a PRG response,
>> * Or send a Stop Marker PPR request and forget about it.
> 
> Are you sure about the meaning of the stop-marker? Can you point me to
> where it is specified?

The concept is introduced in the PCI ECN that adds PASIDs to the ATS
specification. I have the following link, which is probably behind a
wall:

https://pcisig.com/sites/default/files/specification_documents/ECN-PASID-ATS-2011-03-31.pdf

A Stop Marker is a PPR with flags Read=Write=0, Last=1, targeting the
PASID that is being decommissioned. In section 4.1.2, the specifications
details the two device-specific ways of stopping use of a PASID, with or
without a Stop Marker. When done with a Stop Marker, the function doesn't
have to wait for any outstanding PPR to return, the Stop Marker serves as
a PASID barrier.

Thanks,
Jean-Philippe

> I know about a different marker, which allows a device to issue several
> PPR requests and only get one response for all of them. The last request
> then has that marker set.
> 
> But I am not aware (yet) of a marker with which the device says it no
> longer sends requests with this PASID.

>> My intent with passing flags to unbind() was to handle the case where VFIO
>> is unable to tell us whether PPRs are still being issued by the device.
>> But the issue seems moot to me now that I have a better understanding, as
>> there will be a detach_dev/attach_dev sequence before we start rebinding
>> PASIDs, and we can simply reset the PRI interface there (I'm currently
>> doing that in add_device, but I should move it.)
> 
> VFIO should completly reset the device before it is re-used, so I don't
> think we run into problems there.
> 
> 
> 
> 	Joerg
>
Joerg Roedel March 23, 2017, 4:52 p.m. UTC | #7
On Thu, Mar 23, 2017 at 03:52:14PM +0000, Jean-Philippe Brucker wrote:
> On 23/03/17 14:30, Joerg Roedel wrote:
> > Are you sure about the meaning of the stop-marker? Can you point me to
> > where it is specified?
> 
> The concept is introduced in the PCI ECN that adds PASIDs to the ATS
> specification. I have the following link, which is probably behind a
> wall:
> 
> https://pcisig.com/sites/default/files/specification_documents/ECN-PASID-ATS-2011-03-31.pdf
> 
> A Stop Marker is a PPR with flags Read=Write=0, Last=1, targeting the
> PASID that is being decommissioned. In section 4.1.2, the specifications
> details the two device-specific ways of stopping use of a PASID, with or
> without a Stop Marker. When done with a Stop Marker, the function doesn't
> have to wait for any outstanding PPR to return, the Stop Marker serves as
> a PASID barrier.

Thanks for that, I have a closer look. Is that stopper packet visible to
software (e.g. is an entry created in the queue)?

I have to check whether the AMD IOMMU makes this visible.



	Joerg
Jean-Philippe Brucker March 23, 2017, 5:03 p.m. UTC | #8
On 23/03/17 16:52, Joerg Roedel wrote:
> On Thu, Mar 23, 2017 at 03:52:14PM +0000, Jean-Philippe Brucker wrote:
>> On 23/03/17 14:30, Joerg Roedel wrote:
>>> Are you sure about the meaning of the stop-marker? Can you point me to
>>> where it is specified?
>>
>> The concept is introduced in the PCI ECN that adds PASIDs to the ATS
>> specification. I have the following link, which is probably behind a
>> wall:
>>
>> https://pcisig.com/sites/default/files/specification_documents/ECN-PASID-ATS-2011-03-31.pdf
>>
>> A Stop Marker is a PPR with flags Read=Write=0, Last=1, targeting the
>> PASID that is being decommissioned. In section 4.1.2, the specifications
>> details the two device-specific ways of stopping use of a PASID, with or
>> without a Stop Marker. When done with a Stop Marker, the function doesn't
>> have to wait for any outstanding PPR to return, the Stop Marker serves as
>> a PASID barrier.
> 
> Thanks for that, I have a closer look. Is that stopper packet visible to
> software (e.g. is an entry created in the queue)?

As far as I understand, it should be added to the queue like a normal PPR,
and software should check the R/W/L flags combination. For SMMU at least,
it is the same. The only difference is that when the PRI queue overflows,
the SMMU would discard a Stop Marker instead of sending an automated
response to the device (as it would do with others PPR that have the L
flag.) Software shouldn't send a response to a Stop Marker either.

Thanks,
Jean-Philippe
Joerg Roedel March 24, 2017, 11 a.m. UTC | #9
On Thu, Mar 23, 2017 at 05:03:46PM +0000, Jean-Philippe Brucker wrote:
> On 23/03/17 16:52, Joerg Roedel wrote:
> > Thanks for that, I have a closer look. Is that stopper packet visible to
> > software (e.g. is an entry created in the queue)?
> 
> As far as I understand, it should be added to the queue like a normal PPR,
> and software should check the R/W/L flags combination. For SMMU at least,
> it is the same. The only difference is that when the PRI queue overflows,
> the SMMU would discard a Stop Marker instead of sending an automated
> response to the device (as it would do with others PPR that have the L
> flag.) Software shouldn't send a response to a Stop Marker either.

The document you posted is an addition to the spec, so we can't rely on
a stop marker being sent by a device when it shuts down a context.
Current AMD GPUs don't send one, afaik.

I think the best we can do is shutting down processing for this PASID
when and inform the device driver, when we receive a stop marker. An
additional, optional call-back would do the job.



	Joerg
Jean-Philippe Brucker March 24, 2017, 7:08 p.m. UTC | #10
On 24/03/17 11:00, Joerg Roedel wrote:
> On Thu, Mar 23, 2017 at 05:03:46PM +0000, Jean-Philippe Brucker wrote:
>> On 23/03/17 16:52, Joerg Roedel wrote:
>>> Thanks for that, I have a closer look. Is that stopper packet visible to
>>> software (e.g. is an entry created in the queue)?
>>
>> As far as I understand, it should be added to the queue like a normal PPR,
>> and software should check the R/W/L flags combination. For SMMU at least,
>> it is the same. The only difference is that when the PRI queue overflows,
>> the SMMU would discard a Stop Marker instead of sending an automated
>> response to the device (as it would do with others PPR that have the L
>> flag.) Software shouldn't send a response to a Stop Marker either.
> 
> The document you posted is an addition to the spec, so we can't rely on
> a stop marker being sent by a device when it shuts down a context.
> Current AMD GPUs don't send one, afaik.

Fair enough, though on the SMMU side we still don't know which shutdown
model hardware vendors are more likely to choose. Devices could use the
stop marker and never wait for completion of PPRs. In that case context
shutdown would only have to make sure that PPRs are outside the PCI
network, return immediately and allow the device driver to call unbind().

So to be on the safe side I think I will by default assume that PPRs are
in flight during unbind, and drain both software and hardware queue to
ensure we catch them all.

As an optimization, we can later add ways for device drivers or firmware
to inform the IOMMU which PASID shutdown model is implemented (maybe a
callback in svm_ops?), in which case we wouldn't have to flush the queue
and if necessary, we can wait for a stop marker before throwing away a PASID.

Thanks,
Jean-Philippe

> I think the best we can do is shutting down processing for this PASID
> when and inform the device driver, when we receive a stop marker. An
> additional, optional call-back would do the job.
> 
> 
> 
> 	Joerg
>
Joerg Roedel March 27, 2017, 3:33 p.m. UTC | #11
On Fri, Mar 24, 2017 at 07:08:47PM +0000, Jean-Philippe Brucker wrote:
> On 24/03/17 11:00, Joerg Roedel wrote:
> > The document you posted is an addition to the spec, so we can't rely on
> > a stop marker being sent by a device when it shuts down a context.
> > Current AMD GPUs don't send one, afaik.
> 
> Fair enough, though on the SMMU side we still don't know which shutdown
> model hardware vendors are more likely to choose. Devices could use the
> stop marker and never wait for completion of PPRs. In that case context
> shutdown would only have to make sure that PPRs are outside the PCI
> network, return immediately and allow the device driver to call unbind().

Yes, on a stop-marker we should immediatly remove the PASID from the
IOMMU and inform the driver. It might call unbind or do something with
the device.

> So to be on the safe side I think I will by default assume that PPRs are
> in flight during unbind, and drain both software and hardware queue to
> ensure we catch them all.

The AMD driver first removes the internal mapping of the PASID to its
managing data-structures, so that any follow-on faults will get
rejected.



	Joerg
diff mbox

Patch

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 26c5f6528c69..eed52500d469 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1517,6 +1517,14 @@  EXPORT_SYMBOL_GPL(iommu_bind_task);
  * @dev: device bound to the task
  * @pasid: identifier of the bond
  * @flags: state of the PASID and driver-specific flags
+ *
+ * The caller must informs the IOMMU driver whether the PASID is safe to reuse
+ * immediately or if it needs more invalidation steps, by setting flags to
+ * either IOMMU_PASID_FLUSHED, or IOMMU_PASID_CLEAN.
+ *
+ * Without one of these flags, the device driver must have provided an
+ * invalidate_pasid callback in iommu_svm_ops. Otherwise, iommu_unbind_task
+ * returns an error.
  */
 int iommu_unbind_task(struct device *dev, int pasid, int flags)
 {
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 9554f45d4305..204943ef38b2 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -50,6 +50,21 @@  struct notifier_block;
 #define IOMMU_FAULT_READ	0x0
 #define IOMMU_FAULT_WRITE	0x1
 
+/*
+ * State of a PASID in the system
+ *
+ * IOMMU_PASID_FLUSHED: the device does not generate any traffic for this PASID
+ *   anymore, and all references to the PASID have been flushed; in other words,
+ *   the IOMMU will not receive any transaction referring to this instance of
+ *   the PASID anymore.
+ *
+ * IOMMU_PASID_CLEAN: in addition to IOMMU_PASID_FLUSHED, the PASID isn't
+ *   present in the IOMMU either. For instance when using PRI, the device waited
+ *   for all of its page requests to come back with a response.
+ */
+#define IOMMU_PASID_FLUSHED	0x1
+#define IOMMU_PASID_CLEAN	0x2
+
 typedef int (*iommu_fault_handler_t)(struct iommu_domain *,
 			struct device *, unsigned long, int, void *);
 
@@ -147,7 +162,8 @@  struct iommu_resv_region {
 
 /*
  * @handle_fault: report or handle a fault from the device (FIXME: imprecise)
- * @invalidate_pasid: stop using a PASID.
+ * @invalidate_pasid: stop using a PASID. Returns one of IOMMU_PASID_FLUSHED or
+ *                    IOMMU_PASID_CLEAN when stopped successfully. 0 otherwise.
  */
 struct iommu_svm_ops {
 	int (*handle_fault)(struct device *dev, int pasid, u64 address,