Message ID | 20170227195441.5170-25-jean-philippe.brucker@arm.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Mon, Feb 27, 2017 at 07:54:35PM +0000, Jean-Philippe Brucker wrote: > It is an important distinction because, if the IOMMU driver reassigns a > PASID while the IOMMU still holds pending PPR targeting that PASID > internally, the PPR will trigger a fault in the wrong address space. The IOMMU driver also controls a devices apbility to issue PPR requests (at least on PCI), so it already knows whether a device has still requests pending or if it even can create new ones. Furhter, the IOMMU driver can already wait for all pending faults to be processed before it shuts down a PASID. So it is not clear to me why the device driver needs to be involved here. When the device driver issues a PASID-unbind call the iommu driver just waits until all pending faults are processed, answers new faults with INVALID, then switch off the devices capability to issue new faults, and then release the PASID. Joerg
On 22/03/17 15:44, Joerg Roedel wrote: > On Mon, Feb 27, 2017 at 07:54:35PM +0000, Jean-Philippe Brucker wrote: >> It is an important distinction because, if the IOMMU driver reassigns a >> PASID while the IOMMU still holds pending PPR targeting that PASID >> internally, the PPR will trigger a fault in the wrong address space. > > The IOMMU driver also controls a devices apbility to issue PPR requests > (at least on PCI), so it already knows whether a device has still > requests pending or if it even can create new ones. Apart from resetting the PRI capability, the SMMU doesn't have any control over the device's PPR requests, so we simply mandate that the caller did the required work to stop issuing them before calling iommu_unbind. > Furhter, the IOMMU driver can already wait for all pending faults to be > processed before it shuts down a PASID. So it is not clear to me why the > device driver needs to be involved here. The problem might be too tied to the specifics of the SMMU. As implemented in this series, the normal flow for a PPR with the SMMU is the following: (1) PCI device issues a PPR for PASID 1 (2) The PPR is queued by the SMMU in the (hardware) PRI queue (3) The SMMU driver receives an interrupt, dequeues the PPR and moves it to a software work queue. (4) The PPR is finally handled and a PRI response is sent to the device. The case that worries me is if someone unbinds PASID 1 between (2) and (3), while the PPR is still in the hardware queue, and immediately binds it to a new address space. Then (3) and (4) happen, the PPR is handled and the fault is for the new address space. It's certainly undesirable, but I don't know if it could be exploited. We don't kill the task for an unhandled fault at the moment, simply report a failed PPR to the device, so I might be worrying for nothing. Having the caller tell us if PPRs might still be pending in the hardware PRI queue ensures that the SMMU driver waits until it's entirely safe: * If the device has no outstanding PPR, PASID can be reallocated * If the device has outstanding PPRs, wait for a Stop Marker, or drain the PRI queue after a while (if the Stop Marker was lost in a PRI queue overflow). Draining the PRI queue is very costly, we need to block the PRI thread to inspect the queue, risking an overflow. And with these PASID state flags we avoid flushing any queue. But since the problem seems too centered around the SMMU, I might just drop this patch along with the CLEAN/FLUSHED flags in my next version, and go with the full-drain solution. After all, unbind should be a fairly rare event. Thanks, Jean-Philippe > When the device driver issues a PASID-unbind call the iommu driver > just waits until all pending faults are processed, answers new faults > with INVALID, then switch off the devices capability to issue new > faults, and then release the PASID. >
On Wed, Mar 22, 2017 at 06:31:01PM +0000, Jean-Philippe Brucker wrote: > The problem might be too tied to the specifics of the SMMU. As implemented > in this series, the normal flow for a PPR with the SMMU is the following: > > (1) PCI device issues a PPR for PASID 1 > (2) The PPR is queued by the SMMU in the (hardware) PRI queue > (3) The SMMU driver receives an interrupt, dequeues the PPR and moves it > to a software work queue. > (4) The PPR is finally handled and a PRI response is sent to the device. There are two ways a PASID could get shut down: 1) The device driver calls unbind() 2) The mm_struct bound to that PASID is going away Case 1) is the easy one, we can safely assume that the device driver did anything to stop new PPR requests from being created for that PASID. In this case we just shut down PPR processing by waiting until everything is handled and reply INVALID to any further PPR request before we remove the PASID from the per-device IOMMU data structures and flush caches. In case 2) we have more work to do. The mm_struct is going away (probably because the task segfaulted) and we can't assume that the device driver shut everything down already. But for this case we have the call-back into the device driver to tell it should clean everything up for that PASID and stop the device from creating further requests. After that call-back returns it is the same as in case 1), we drain the queue and deny any further request that comes in. > The case that worries me is if someone unbinds PASID 1 between (2) and > (3), while the PPR is still in the hardware queue, and immediately binds > it to a new address space. > > Then (3) and (4) happen, the PPR is handled and the fault is for the new > address space. It's certainly undesirable, but I don't know if it could be > exploited. We don't kill the task for an unhandled fault at the moment, > simply report a failed PPR to the device, so I might be worrying for nothing. As I wrote above, when the device driver calls unbind() we should assume that the device does not sent any further requests with that PASID. If it does, we just answer with INVALID. > Having the caller tell us if PPRs might still be pending in the hardware > PRI queue ensures that the SMMU driver waits until it's entirely safe: > > * If the device has no outstanding PPR, PASID can be reallocated > * If the device has outstanding PPRs, wait for a Stop Marker, or drain > the PRI queue after a while (if the Stop Marker was lost in a PRI queue > overflow). That can't happen, when the device driver does its job right. It has to shut down the context which causes the PPR requests for the PASID on the device. This includes stopping the context and waiting until all PPR requests it sent are processed. And the device driver has to do this either before it calls unbind() or in the call-back it provided. Only after this the PASID should be freed. > Draining the PRI queue is very costly, we need to block the PRI thread to > inspect the queue, risking an overflow. And with these PASID state flags > we avoid flushing any queue. There is a configurable maximum of PPR requests a device can have in-flight. If you take that into account when allocation the PPR queue for the SMMU, there can't be any overflows. The AMD driver allocates a queue for 512 entries and allows devices to have a maximum of 32 outstanding requests. > But since the problem seems too centered around the SMMU, I might just > drop this patch along with the CLEAN/FLUSHED flags in my next version, and > go with the full-drain solution. After all, unbind should be a fairly rare > event. I don't think all this is SMMU specific, it is the same on all other IOMMUs that have the ATS/PRI/PASID features. Joerg
On 22/03/17 22:53, Joerg Roedel wrote: > On Wed, Mar 22, 2017 at 06:31:01PM +0000, Jean-Philippe Brucker wrote: >> The problem might be too tied to the specifics of the SMMU. As implemented >> in this series, the normal flow for a PPR with the SMMU is the following: >> >> (1) PCI device issues a PPR for PASID 1 >> (2) The PPR is queued by the SMMU in the (hardware) PRI queue >> (3) The SMMU driver receives an interrupt, dequeues the PPR and moves it >> to a software work queue. >> (4) The PPR is finally handled and a PRI response is sent to the device. > > There are two ways a PASID could get shut down: > > 1) The device driver calls unbind() > 2) The mm_struct bound to that PASID is going away > > Case 1) is the easy one, we can safely assume that the device driver did > anything to stop new PPR requests from being created for that PASID. In > this case we just shut down PPR processing by waiting until everything > is handled and reply INVALID to any further PPR request before we remove > the PASID from the per-device IOMMU data structures and flush caches. > > In case 2) we have more work to do. The mm_struct is going away > (probably because the task segfaulted) and we can't assume that the > device driver shut everything down already. But for this case we have > the call-back into the device driver to tell it should clean everything > up for that PASID and stop the device from creating further requests. > > After that call-back returns it is the same as in case 1), we drain the > queue and deny any further request that comes in. I agree on the semantics (I didn't implement case 2) here but I will have to in the next version.) >> The case that worries me is if someone unbinds PASID 1 between (2) and >> (3), while the PPR is still in the hardware queue, and immediately binds >> it to a new address space. >> >> Then (3) and (4) happen, the PPR is handled and the fault is for the new >> address space. It's certainly undesirable, but I don't know if it could be >> exploited. We don't kill the task for an unhandled fault at the moment, >> simply report a failed PPR to the device, so I might be worrying for nothing. > > As I wrote above, when the device driver calls unbind() we should > assume that the device does not sent any further requests with that > PASID. If it does, we just answer with INVALID. > >> Having the caller tell us if PPRs might still be pending in the hardware >> PRI queue ensures that the SMMU driver waits until it's entirely safe: >> >> * If the device has no outstanding PPR, PASID can be reallocated >> * If the device has outstanding PPRs, wait for a Stop Marker, or drain >> the PRI queue after a while (if the Stop Marker was lost in a PRI queue >> overflow). > > That can't happen, when the device driver does its job right. It has to > shut down the context which causes the PPR requests for the PASID on the > device. This includes stopping the context and waiting until all PPR > requests it sent are processed. By "processed", do you mean that they are committed to the IOMMU, or that they came back with a PRG response? The driver might not be able to do the latter, since PCI defines two ways of shutting down a context: * Either wait for all PPR requests to come back with a PRG response, * Or send a Stop Marker PPR request and forget about it. The second one is problematic, all the device says is "I've stopped sending requests, some might still be in flight, but a Stop Marker ends the flow". Without having the device driver tell us which of the two models the device is implementing, draining the hardware queue is our best bet to ensure that no request is pending. In any case, this is a property of the device, and passing flags to unbind() as I suggested is probably excessive. The IOMMU driver could store that info somewhere and use it whenever it has to unbind(), as an indication of the minimum amount of work required to clean the context. Without this hint the default should be to drain both queues. My intent with passing flags to unbind() was to handle the case where VFIO is unable to tell us whether PPRs are still being issued by the device. But the issue seems moot to me now that I have a better understanding, as there will be a detach_dev/attach_dev sequence before we start rebinding PASIDs, and we can simply reset the PRI interface there (I'm currently doing that in add_device, but I should move it.) > And the device driver has to do this either before it calls unbind() or > in the call-back it provided. Only after this the PASID should be freed. > >> Draining the PRI queue is very costly, we need to block the PRI thread to >> inspect the queue, risking an overflow. And with these PASID state flags >> we avoid flushing any queue. > > There is a configurable maximum of PPR requests a device can have > in-flight. If you take that into account when allocation the PPR queue > for the SMMU, there can't be any overflows. The AMD driver allocates a > queue for 512 entries and allows devices to have a maximum of 32 > outstanding requests. Yes, the SMMU specification also tells that over-committing the PPR queue is a programming error. I wonder how well it scales with hot-plugging of devices however. At the time when we start allocating PPR credits, we might not be sure that the number of devices with a PRI will be limited to 16. So even if overflow is a programming error, I'm not comfortable with ruling it out just yet. Thanks, Jean-Philippe >> But since the problem seems too centered around the SMMU, I might just >> drop this patch along with the CLEAN/FLUSHED flags in my next version, and >> go with the full-drain solution. After all, unbind should be a fairly rare >> event. > > I don't think all this is SMMU specific, it is the same on all other > IOMMUs that have the ATS/PRI/PASID features. > > > > Joerg >
On Thu, Mar 23, 2017 at 01:37:41PM +0000, Jean-Philippe Brucker wrote: > On 22/03/17 22:53, Joerg Roedel wrote: > > That can't happen, when the device driver does its job right. It has to > > shut down the context which causes the PPR requests for the PASID on the > > device. This includes stopping the context and waiting until all PPR > > requests it sent are processed. > > By "processed", do you mean that they are committed to the IOMMU, or that > they came back with a PRG response? By processed I mean the device received a response to the request, or is at least no longer waiting for responses. > The driver might not be able to do the latter, since PCI defines two ways > of shutting down a context: > > * Either wait for all PPR requests to come back with a PRG response, > * Or send a Stop Marker PPR request and forget about it. Are you sure about the meaning of the stop-marker? Can you point me to where it is specified? I know about a different marker, which allows a device to issue several PPR requests and only get one response for all of them. The last request then has that marker set. But I am not aware (yet) of a marker with which the device says it no longer sends requests with this PASID. > My intent with passing flags to unbind() was to handle the case where VFIO > is unable to tell us whether PPRs are still being issued by the device. > But the issue seems moot to me now that I have a better understanding, as > there will be a detach_dev/attach_dev sequence before we start rebinding > PASIDs, and we can simply reset the PRI interface there (I'm currently > doing that in add_device, but I should move it.) VFIO should completly reset the device before it is re-used, so I don't think we run into problems there. Joerg
On 23/03/17 14:30, Joerg Roedel wrote: > On Thu, Mar 23, 2017 at 01:37:41PM +0000, Jean-Philippe Brucker wrote: >> On 22/03/17 22:53, Joerg Roedel wrote: >>> That can't happen, when the device driver does its job right. It has to >>> shut down the context which causes the PPR requests for the PASID on the >>> device. This includes stopping the context and waiting until all PPR >>> requests it sent are processed. >> >> By "processed", do you mean that they are committed to the IOMMU, or that >> they came back with a PRG response? > > By processed I mean the device received a response to the request, or is > at least no longer waiting for responses. > >> The driver might not be able to do the latter, since PCI defines two ways >> of shutting down a context: >> >> * Either wait for all PPR requests to come back with a PRG response, >> * Or send a Stop Marker PPR request and forget about it. > > Are you sure about the meaning of the stop-marker? Can you point me to > where it is specified? The concept is introduced in the PCI ECN that adds PASIDs to the ATS specification. I have the following link, which is probably behind a wall: https://pcisig.com/sites/default/files/specification_documents/ECN-PASID-ATS-2011-03-31.pdf A Stop Marker is a PPR with flags Read=Write=0, Last=1, targeting the PASID that is being decommissioned. In section 4.1.2, the specifications details the two device-specific ways of stopping use of a PASID, with or without a Stop Marker. When done with a Stop Marker, the function doesn't have to wait for any outstanding PPR to return, the Stop Marker serves as a PASID barrier. Thanks, Jean-Philippe > I know about a different marker, which allows a device to issue several > PPR requests and only get one response for all of them. The last request > then has that marker set. > > But I am not aware (yet) of a marker with which the device says it no > longer sends requests with this PASID. >> My intent with passing flags to unbind() was to handle the case where VFIO >> is unable to tell us whether PPRs are still being issued by the device. >> But the issue seems moot to me now that I have a better understanding, as >> there will be a detach_dev/attach_dev sequence before we start rebinding >> PASIDs, and we can simply reset the PRI interface there (I'm currently >> doing that in add_device, but I should move it.) > > VFIO should completly reset the device before it is re-used, so I don't > think we run into problems there. > > > > Joerg >
On Thu, Mar 23, 2017 at 03:52:14PM +0000, Jean-Philippe Brucker wrote: > On 23/03/17 14:30, Joerg Roedel wrote: > > Are you sure about the meaning of the stop-marker? Can you point me to > > where it is specified? > > The concept is introduced in the PCI ECN that adds PASIDs to the ATS > specification. I have the following link, which is probably behind a > wall: > > https://pcisig.com/sites/default/files/specification_documents/ECN-PASID-ATS-2011-03-31.pdf > > A Stop Marker is a PPR with flags Read=Write=0, Last=1, targeting the > PASID that is being decommissioned. In section 4.1.2, the specifications > details the two device-specific ways of stopping use of a PASID, with or > without a Stop Marker. When done with a Stop Marker, the function doesn't > have to wait for any outstanding PPR to return, the Stop Marker serves as > a PASID barrier. Thanks for that, I have a closer look. Is that stopper packet visible to software (e.g. is an entry created in the queue)? I have to check whether the AMD IOMMU makes this visible. Joerg
On 23/03/17 16:52, Joerg Roedel wrote: > On Thu, Mar 23, 2017 at 03:52:14PM +0000, Jean-Philippe Brucker wrote: >> On 23/03/17 14:30, Joerg Roedel wrote: >>> Are you sure about the meaning of the stop-marker? Can you point me to >>> where it is specified? >> >> The concept is introduced in the PCI ECN that adds PASIDs to the ATS >> specification. I have the following link, which is probably behind a >> wall: >> >> https://pcisig.com/sites/default/files/specification_documents/ECN-PASID-ATS-2011-03-31.pdf >> >> A Stop Marker is a PPR with flags Read=Write=0, Last=1, targeting the >> PASID that is being decommissioned. In section 4.1.2, the specifications >> details the two device-specific ways of stopping use of a PASID, with or >> without a Stop Marker. When done with a Stop Marker, the function doesn't >> have to wait for any outstanding PPR to return, the Stop Marker serves as >> a PASID barrier. > > Thanks for that, I have a closer look. Is that stopper packet visible to > software (e.g. is an entry created in the queue)? As far as I understand, it should be added to the queue like a normal PPR, and software should check the R/W/L flags combination. For SMMU at least, it is the same. The only difference is that when the PRI queue overflows, the SMMU would discard a Stop Marker instead of sending an automated response to the device (as it would do with others PPR that have the L flag.) Software shouldn't send a response to a Stop Marker either. Thanks, Jean-Philippe
On Thu, Mar 23, 2017 at 05:03:46PM +0000, Jean-Philippe Brucker wrote: > On 23/03/17 16:52, Joerg Roedel wrote: > > Thanks for that, I have a closer look. Is that stopper packet visible to > > software (e.g. is an entry created in the queue)? > > As far as I understand, it should be added to the queue like a normal PPR, > and software should check the R/W/L flags combination. For SMMU at least, > it is the same. The only difference is that when the PRI queue overflows, > the SMMU would discard a Stop Marker instead of sending an automated > response to the device (as it would do with others PPR that have the L > flag.) Software shouldn't send a response to a Stop Marker either. The document you posted is an addition to the spec, so we can't rely on a stop marker being sent by a device when it shuts down a context. Current AMD GPUs don't send one, afaik. I think the best we can do is shutting down processing for this PASID when and inform the device driver, when we receive a stop marker. An additional, optional call-back would do the job. Joerg
On 24/03/17 11:00, Joerg Roedel wrote: > On Thu, Mar 23, 2017 at 05:03:46PM +0000, Jean-Philippe Brucker wrote: >> On 23/03/17 16:52, Joerg Roedel wrote: >>> Thanks for that, I have a closer look. Is that stopper packet visible to >>> software (e.g. is an entry created in the queue)? >> >> As far as I understand, it should be added to the queue like a normal PPR, >> and software should check the R/W/L flags combination. For SMMU at least, >> it is the same. The only difference is that when the PRI queue overflows, >> the SMMU would discard a Stop Marker instead of sending an automated >> response to the device (as it would do with others PPR that have the L >> flag.) Software shouldn't send a response to a Stop Marker either. > > The document you posted is an addition to the spec, so we can't rely on > a stop marker being sent by a device when it shuts down a context. > Current AMD GPUs don't send one, afaik. Fair enough, though on the SMMU side we still don't know which shutdown model hardware vendors are more likely to choose. Devices could use the stop marker and never wait for completion of PPRs. In that case context shutdown would only have to make sure that PPRs are outside the PCI network, return immediately and allow the device driver to call unbind(). So to be on the safe side I think I will by default assume that PPRs are in flight during unbind, and drain both software and hardware queue to ensure we catch them all. As an optimization, we can later add ways for device drivers or firmware to inform the IOMMU which PASID shutdown model is implemented (maybe a callback in svm_ops?), in which case we wouldn't have to flush the queue and if necessary, we can wait for a stop marker before throwing away a PASID. Thanks, Jean-Philippe > I think the best we can do is shutting down processing for this PASID > when and inform the device driver, when we receive a stop marker. An > additional, optional call-back would do the job. > > > > Joerg >
On Fri, Mar 24, 2017 at 07:08:47PM +0000, Jean-Philippe Brucker wrote: > On 24/03/17 11:00, Joerg Roedel wrote: > > The document you posted is an addition to the spec, so we can't rely on > > a stop marker being sent by a device when it shuts down a context. > > Current AMD GPUs don't send one, afaik. > > Fair enough, though on the SMMU side we still don't know which shutdown > model hardware vendors are more likely to choose. Devices could use the > stop marker and never wait for completion of PPRs. In that case context > shutdown would only have to make sure that PPRs are outside the PCI > network, return immediately and allow the device driver to call unbind(). Yes, on a stop-marker we should immediatly remove the PASID from the IOMMU and inform the driver. It might call unbind or do something with the device. > So to be on the safe side I think I will by default assume that PPRs are > in flight during unbind, and drain both software and hardware queue to > ensure we catch them all. The AMD driver first removes the internal mapping of the PASID to its managing data-structures, so that any follow-on faults will get rejected. Joerg
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index 26c5f6528c69..eed52500d469 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -1517,6 +1517,14 @@ EXPORT_SYMBOL_GPL(iommu_bind_task); * @dev: device bound to the task * @pasid: identifier of the bond * @flags: state of the PASID and driver-specific flags + * + * The caller must informs the IOMMU driver whether the PASID is safe to reuse + * immediately or if it needs more invalidation steps, by setting flags to + * either IOMMU_PASID_FLUSHED, or IOMMU_PASID_CLEAN. + * + * Without one of these flags, the device driver must have provided an + * invalidate_pasid callback in iommu_svm_ops. Otherwise, iommu_unbind_task + * returns an error. */ int iommu_unbind_task(struct device *dev, int pasid, int flags) { diff --git a/include/linux/iommu.h b/include/linux/iommu.h index 9554f45d4305..204943ef38b2 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -50,6 +50,21 @@ struct notifier_block; #define IOMMU_FAULT_READ 0x0 #define IOMMU_FAULT_WRITE 0x1 +/* + * State of a PASID in the system + * + * IOMMU_PASID_FLUSHED: the device does not generate any traffic for this PASID + * anymore, and all references to the PASID have been flushed; in other words, + * the IOMMU will not receive any transaction referring to this instance of + * the PASID anymore. + * + * IOMMU_PASID_CLEAN: in addition to IOMMU_PASID_FLUSHED, the PASID isn't + * present in the IOMMU either. For instance when using PRI, the device waited + * for all of its page requests to come back with a response. + */ +#define IOMMU_PASID_FLUSHED 0x1 +#define IOMMU_PASID_CLEAN 0x2 + typedef int (*iommu_fault_handler_t)(struct iommu_domain *, struct device *, unsigned long, int, void *); @@ -147,7 +162,8 @@ struct iommu_resv_region { /* * @handle_fault: report or handle a fault from the device (FIXME: imprecise) - * @invalidate_pasid: stop using a PASID. + * @invalidate_pasid: stop using a PASID. Returns one of IOMMU_PASID_FLUSHED or + * IOMMU_PASID_CLEAN when stopped successfully. 0 otherwise. */ struct iommu_svm_ops { int (*handle_fault)(struct device *dev, int pasid, u64 address,
Provide a way for device drivers to tell the IOMMU driver about the state of the PASID they are trying to decommission. When unbinding a task from a device, the IOMMU driver needs to know whether it can immediately reuse the PASID for another task, or if there is additional work to be done before the PASID is safe to re-use. One hard requirement when calling unbind is that the associated PASID is not present in any transaction downstream of the IOMMU anymore. In other words, any read, write, page requests referring to this PASID has finished. For PCIe, this means that the driver has successfully executed the device-specific stop request mechanism described in 6.20.1 (Managing PASID TLP Prefix Usage). In particular: * device doesn't issue any new request for this PASID, * all non-posted requests for this PASID have been completed, * all posted requests for this PASID (addressing host memory) have been flushed to the host. Address Translation Requests are non-posted, and PRI Page Requests (PPR) are posted. In addition with PRI, device must implement one of the following mechanism (ATS spec 4.1.2. - Managing PASID TLP Prefix Usage): A. Finish transmitting any PPR affecting this PASID and wait for their response. In this case, the IOMMU driver can safely reuse the PASID and must not wait for a Stop Marker. B. Finish transmitting any PPR affecting this PASID and send a Stop Marker. The driver must wait to receive a Stop Marker for this PASID before reusing it. This patch lets the driver communicate the current state of the PASID with either IOMMU_PASID_FLUSHED for case A, or IOMMU_PASID_CLEAN for case B. It is an important distinction because, if the IOMMU driver reassigns a PASID while the IOMMU still holds pending PPR targeting that PASID internally, the PPR will trigger a fault in the wrong address space. Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> --- drivers/iommu/iommu.c | 8 ++++++++ include/linux/iommu.h | 18 +++++++++++++++++- 2 files changed, 25 insertions(+), 1 deletion(-)