[RFCv2,05/36] iommu/process: Bind and unbind process to and from devices

Message ID 20171006133203.22803-6-jean-philippe.brucker@arm.com (mailing list archive)
State New, archived

Commit Message

Jean-Philippe Brucker Oct. 6, 2017, 1:31 p.m. UTC
Add bind and unbind operations to the IOMMU API. Device drivers can use
them to share process page tables with their device.
iommu_process_bind_group is provided for VFIO's convenience, as it needs
to provide a coherent interface on containers. Device drivers will most
likely want to use iommu_process_bind_device, which doesn't bind the whole
group.

PASIDs are de facto shared between all devices in a group (because of
hardware weaknesses), but we don't do anything about it at the API level.
Making bind_device call bind_group is probably the wrong way around,
because it requires more work on our side for no benefit. We'd have to
replay all binds each time a device is hotplugged into a group. But when a
device is hotplugged into a group, the device driver will have to do a
bind before using its PASID anyway and we can reject inconsistencies at
that point.

Concurrent calls to iommu_process_bind_device for the same process are not
supported at the moment (they'll race on process_alloc which will only
succeed for the first one; the others will have to retry the bind). I also
don't support calling bind() on a dying process, not sure if it matters.

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 drivers/iommu/iommu-process.c | 165 ++++++++++++++++++++++++++++++++++++++++++
 drivers/iommu/iommu.c         |  64 ++++++++++++++++
 include/linux/iommu.h         |  41 +++++++++++
 3 files changed, 270 insertions(+)
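
For orientation, a minimal sketch (not part of the patch) of how a device
driver might consume this interface, going by the kernel-doc below; the
flags value and the quiescing step are placeholders:

	int pasid, ret;

	/* dev is already attached to an IOMMU domain by the IOMMU core */
	ret = iommu_process_bind_device(dev, current, &pasid, 0);
	if (ret)
		return ret;

	/*
	 * Program @pasid into the device and issue DMA on the process
	 * address space. Once the device has been told to stop using
	 * this PASID:
	 */
	iommu_process_unbind_device(dev, pasid);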

Comments

Joerg Roedel Oct. 11, 2017, 11:33 a.m. UTC | #1
Hi Jean-Philippe,

Thanks for your patches, this is definitely a step in the right direction
to get generic support for IO virtual memory into the IOMMU core code.

But I see an issue with the design around task_struct, please see
below.

On Fri, Oct 06, 2017 at 02:31:32PM +0100, Jean-Philippe Brucker wrote:
> +int iommu_process_bind_device(struct device *dev, struct task_struct *task,
> +			      int *pasid, int flags)

I just took this patch as an example, it is in the other patches as
well. The code is designed around 'struct task_struct' while it should
really be designed around 'struct mm_struct'. You are not attaching a
specific process to a device, but the address-space of one or more
processes. And that should be reflected in the design.

There are several bad implications of building it around task_struct,
one is the life-time of the binding. If the address space is detached
from the device when the process exits, only the thread that did the
setup can safely make use of the device, because if the device is
accessed from another thread it will crash when the setup-thread exits.

There are other benefits of using mm_struct, for example that you can
use mmu_notifiers for exit handling.

Here is how I think the base API should look like:

	* iommu_iovm_device_init(struct device *dev);
	  iommu_iovm_device_shutdown(struct device *dev);

	  These two functions do the device specific setup/shutdown. For
	  PCI this would include enabling the PRI, PASID, and ATS
	  capabilities and setting up a PASID table for the device.

	* iommu_iovm_bind_mm(struct device *dev, struct mm_struct *mm,
			     iovm_shutdown_cb *cb);
	  iommu_iovm_unbind_mm(struct device *dev, struct mm_struct *mm);

	  These functions add and delete the entries in the PASID table
	  for the device and setup mmu_notifiers for the mm_struct to
	  keep IOMMU TLBs in sync with the CPU TLBs.

	  The shutdown_cb is invoked by the IOMMU core code when the
	  mm_struct goes away, for example because the process
	  segfaults.

	The PASID handling is best done in these functions as well, unless
	there is a strong reason to allow device drivers to do the
	handling themselves.

The context data can be stored directly in mm_struct, including the
PASID for that mm.

There is of course more functionality needed, the above only outlines
the very basic needs.
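
To illustrate, a device driver would use this (so far hypothetical)
interface roughly as follows, my_shutdown_cb being the driver's callback
and error handling abridged:

	int ret;
	struct mm_struct *mm = get_task_mm(current);

	if (!mm)
		return -EINVAL;

	ret = iommu_iovm_device_init(dev);
	if (ret)
		goto out_mmput;

	ret = iommu_iovm_bind_mm(dev, mm, my_shutdown_cb);
	if (ret)
		goto out_shutdown;

	/* program the PASID of this mm into the device, issue DMA, ... */

	iommu_iovm_unbind_mm(dev, mm);
out_shutdown:
	iommu_iovm_device_shutdown(dev);
out_mmput:
	mmput(mm);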


Regards,

	Joerg
Jean-Philippe Brucker Oct. 12, 2017, 11:13 a.m. UTC | #2
Hi Joerg,

On 11/10/17 12:33, Joerg Roedel wrote:
> Hi Jean-Philippe,
> 
> Thanks for your patches, this is definitely a step in the right direction
> to get generic support for IO virtual memory into the IOMMU core code.
> 
> But I see an issue with the design around task_struct, please see
> below.
> 
> On Fri, Oct 06, 2017 at 02:31:32PM +0100, Jean-Philippe Brucker wrote:
>> +int iommu_process_bind_device(struct device *dev, struct task_struct *task,
>> +			      int *pasid, int flags)
> 
> I just took this patch as an example, it is in the other patches as
> well. The code is designed around 'struct task_struct' while it should
> really be designed around 'struct mm_struct'. You are not attaching a
> specific process to a device, but the address-space of one or more
> processes. And that should be reflected in the design.

Agreed. The task struct is only really needed for obtaining the mm in my
code. It also keeps hold of a pid, but that's wrong and easy to remove.

> There are several bad implications of building it around task_struct,
> one is the life-time of the binding. If the address space is detached
> from the device when the process exits, only the thread that did the
> setup can safely make use of the device, because if the device is
> accessed from another thread it will crash when the setup-thread exits.
> 
> There are other benefits of using mm_struct, for example that you can
> use mmu_notifiers for exit handling.
> 
> Here is how I think the base API should look like:
> 
> 	* iommu_iovm_device_init(struct device *dev);
> 	  iommu_iovm_device_shutdown(struct device *dev);
> 
> 	  These two functions do the device specific setup/shutdown. For
> 	  PCI this would include enabling the PRI, PASID, and ATS
> 	  capabilities and setting up a PASID table for the device.

Ok. On SMMU the PASID table also hosts the non-PASID page table pointer,
so the table and capability cannot be setup later than attach_device (and
we'll probably have to enable PASID in add_device). But I suppose it's an
implementation detail.

Some device drivers will want to use ATS alone, for accelerating IOVA
traffic. Should we enable it automatically, or provide device drivers with
a way to enable it manually? According to the PCI spec, PASID has to be
enabled before ATS, so device_init would have to first disable ATS in that
case.

> 	* iommu_iovm_bind_mm(struct device *dev, struct mm_struct *mm,
> 			     iovm_shutdown_cb *cb);
> 	  iommu_iovm_unbind_mm(struct device *dev, struct mm_struct *mm);
> 
> 	  These functions add and delete the entries in the PASID table
> 	  for the device and setup mmu_notifiers for the mm_struct to
> 	  keep IOMMU TLBs in sync with the CPU TLBs.
> 
> 	  The shutdown_cb is invoked by the IOMMU core code when the
> 	  mm_struct goes away, for example because the process
> 	  segfaults.
> 
> 	The PASID handling is best done in these functions as well, unless
> 	there is a strong reason to allow device drivers to do the
> 	handling themselves.
> 
> The context data can be stored directly in mm_struct, including the
> PASID for that mm.

Changing mm_struct probably isn't required at the moment, since the mm
subsystem won't use the context data or the PASID. Outside of
drivers/iommu/, only the caller of bind_mm needs the PASID in order to
program it into the device. The only advantage I see would be slightly
faster bind(), when finding out if a mm is already bound to devices. But
we don't really need fast bind(), so I don't think we'd have enough
material to argue for a change in mm_struct.

We do need to allocate a separate "iommu_mm_struct" wrapper to store the
mmu_notifier. Maybe bind() could return this structure (that contains the
PASID), and unbind() would take this iommu_mm_struct as argument?
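
Something like this, with all fields tentative:

	struct iommu_mm_struct {
		struct mmu_notifier	notifier;	/* ->release() tears down bonds */
		struct mm_struct	*mm;
		int			pasid;
		struct list_head	devices;	/* bonds created by bind() */
		struct kref		kref;
	};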

Thanks
Jean
Joerg Roedel Oct. 12, 2017, 12:47 p.m. UTC | #3
Hi Jean-Philippe,

On Thu, Oct 12, 2017 at 12:13:20PM +0100, Jean-Philippe Brucker wrote:
> On 11/10/17 12:33, Joerg Roedel wrote:
> > Here is how I think the base API should look like:
> > 
> > 	* iommu_iovm_device_init(struct device *dev);
> > 	  iommu_iovm_device_shutdown(struct device *dev);
> > 
> > 	  These two functions do the device specific setup/shutdown. For
> > 	  PCI this would include enabling the PRI, PASID, and ATS
> > 	  capabilities and setting up a PASID table for the device.
> 
> Ok. On SMMU the PASID table also hosts the non-PASID page table pointer,
> so the table and capability cannot be setup later than attach_device (and
> we'll probably have to enable PASID in add_device). But I suppose it's an
> implementation detail.

Right, when the capabilities are enabled is an implementation detail of
the iommu-drivers. 

> Some device drivers will want to use ATS alone, for accelerating IOVA
> traffic. Should we enable it automatically, or provide device drivers with
> a way to enable it manually? According to the PCI spec, PASID has to be
> enabled before ATS, so device_init would have to first disable ATS in that
> case.

Yes, the driver can enable ATS for normal use of a device, and
disable/re-enable it when the driver requests PASID/PRI functionality.
That is also an implementation detail. You should probably also document
that the init/shutdown functions may interrupt device operation, so that
driver writers are aware of that.
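
For PCI that sequence could look roughly like this, using the existing
helpers from drivers/pci/ats.c (function name, PRI credit count and error
handling are only illustrative):

	static int iovm_pci_device_init(struct pci_dev *pdev)
	{
		int ret;

		/*
		 * A driver that already enabled ATS for plain IOVA traffic
		 * has to disable it before this point, since PCIe requires
		 * PASID to be enabled before ATS.
		 */
		ret = pci_enable_pasid(pdev, 0);
		if (ret)
			return ret;

		ret = pci_enable_pri(pdev, 32);
		if (ret)
			goto err_disable_pasid;

		ret = pci_enable_ats(pdev, PAGE_SHIFT);
		if (ret)
			goto err_disable_pri;

		return 0;

	err_disable_pri:
		pci_disable_pri(pdev);
	err_disable_pasid:
		pci_disable_pasid(pdev);
		return ret;
	}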

> 
> > 	* iommu_iovm_bind_mm(struct device *dev, struct mm_struct *mm,
> > 			     iovm_shutdown_cb *cb);
> > 	  iommu_iovm_unbind_mm(struct device *dev, struct mm_struct *mm);
> > 
> > 	  These functions add and delete the entries in the PASID table
> > 	  for the device and setup mmu_notifiers for the mm_struct to
> > 	  keep IOMMU TLBs in sync with the CPU TLBs.
> > 
> > 	  The shutdown_cb is invoked by the IOMMU core code when the
> > 	  mm_struct goes away, for example because the process
> > 	  segfaults.
> > 
> > 	The PASID handling is best done in these functions as well, unless
> > 	there is a strong reason to allow device drivers to do the
> > 	handling themselves.
> > 
> > The context data can be stored directly in mm_struct, including the
> > PASID for that mm.
> 
> Changing mm_struct probably isn't required at the moment, since the mm
> subsystem won't use the context data or the PASID. Outside of
> drivers/iommu/, only the caller of bind_mm needs the PASID in order to
> program it into the device. The only advantage I see would be slightly
> faster bind(), when finding out if a mm is already bound to devices. But
> we don't really need fast bind(), so I don't think we'd have enough
> material to argue for a change in mm_struct.

The idea behind storing the PASID in mm_struct is that we have a
system-wide PASID-allocator and only one PASID per address space, even
when accessed from multiple devices.

There will be hardware implementations where this is required, afaik. It
doesn't mean that it _needs_ to be part of mm_struct, but it is
certainly easier than tracking this 1-1 relation separately.
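
Conceptually something like this, where mm->pasid is the new field this
would add (PASID 0 assumed to be reserved, so min >= 1):

	static DEFINE_IDR(iommu_pasid_idr);
	static DEFINE_MUTEX(iommu_pasid_lock);

	static int iommu_mm_alloc_pasid(struct mm_struct *mm, int min, int max)
	{
		int pasid;

		mutex_lock(&iommu_pasid_lock);
		if (mm->pasid) {
			/* This address space already has a PASID, reuse it */
			pasid = mm->pasid;
		} else {
			pasid = idr_alloc(&iommu_pasid_idr, mm, min, max,
					  GFP_KERNEL);
			if (pasid > 0)
				mm->pasid = pasid;
		}
		mutex_unlock(&iommu_pasid_lock);

		return pasid;
	}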

> We do need to allocate a separate "iommu_mm_struct" wrapper to store the
> mmu_notifier. Maybe bind() could return this structure (that contains the
> PASID), and unbind() would take this iommu_mm_struct as argument?

I'd like to track this iommu_mm_struct only in iommu-code, otherwise
drivers need to track the pointer somewhere. And we need to track the
mm_struct->iommu_mm_struct relation anyway in core-code to handle events
like segfaults, when the whole mm_struct goes away under us. So when we
track this in core code, there is no need to track this again in the
device drivers.


Regards,

	Joerg
Sinan Kaya Oct. 21, 2017, 3:47 p.m. UTC | #4
Just some improvement suggestions.

On 10/6/2017 9:31 AM, Jean-Philippe Brucker wrote:
> +	spin_lock(&iommu_process_lock);
> +	idr_for_each_entry(&iommu_process_idr, process, i) {
> +		if (process->pid != pid)
> +			continue;
if you see this construct a lot, this could be a for_each_iommu_process.

> +
> +		if (!iommu_process_get_locked(process)) {
> +			/* Process is defunct, create a new one */
> +			process = NULL;
> +			break;
> +		}
> +
> +		/* Great, is it also bound to this domain? */
> +		list_for_each_entry(cur_context, &process->domains,
> +				    process_head) {
> +			if (cur_context->domain != domain)
> +				continue;
if you see this construct a lot, this could be a for_each_iommu_process_domain.

> +
> +			context = cur_context;
> +			*pasid = process->pasid;
> +
> +			/* Splendid, tell the driver and increase the ref */
> +			err = iommu_process_attach_locked(context, dev);
> +			if (err)
> +				iommu_process_put_locked(process);
> +
> +			break;
> +		}
> +		break;
> +	}
> +	spin_unlock(&iommu_process_lock);
> +	put_pid(pid);
> +
> +	if (context)
> +		return err;

I think you should make the section above an independent function and return here when the
context is found.
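
For the iteration helpers, something like this (untested) would do:

	#define for_each_iommu_process(process, i) \
		idr_for_each_entry(&iommu_process_idr, process, i)

	#define for_each_process_context(context, process) \
		list_for_each_entry(context, &(process)->domains, process_head)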
Jean-Philippe Brucker Nov. 2, 2017, 4:21 p.m. UTC | #5
On 21/10/17 16:47, Sinan Kaya wrote:
> Just some improvement suggestions.
> 
> On 10/6/2017 9:31 AM, Jean-Philippe Brucker wrote:
>> +	spin_lock(&iommu_process_lock);
>> +	idr_for_each_entry(&iommu_process_idr, process, i) {
>> +		if (process->pid != pid)
>> +			continue;
> if you see this construct a lot, this could be a for_each_iommu_process.
> 
>> +
>> +		if (!iommu_process_get_locked(process)) {
>> +			/* Process is defunct, create a new one */
>> +			process = NULL;
>> +			break;
>> +		}
>> +
>> +		/* Great, is it also bound to this domain? */
>> +		list_for_each_entry(cur_context, &process->domains,
>> +				    process_head) {
>> +			if (cur_context->domain != domain)
>> +				continue;
> if you see this construct a lot, this could be a for_each_iommu_process_domain.
> 
>> +
>> +			context = cur_context;
>> +			*pasid = process->pasid;
>> +
>> +			/* Splendid, tell the driver and increase the ref */
>> +			err = iommu_process_attach_locked(context, dev);
>> +			if (err)
>> +				iommu_process_put_locked(process);
>> +
>> +			break;
>> +		}
>> +		break;
>> +	}
>> +	spin_unlock(&iommu_process_lock);
>> +	put_pid(pid);
>> +
>> +	if (context)
>> +		return err;
> 
> I think you should make the section above an independent function and return here when the
> context is found.

Hopefully this code will only be needed for bind(), but moving it to a
separate function should look better.

Thanks,
Jean
Xie Yisheng Nov. 29, 2017, 6:08 a.m. UTC | #6
On 2017/10/6 21:31, Jean-Philippe Brucker wrote:
> +int iommu_process_bind_device(struct device *dev, struct task_struct *task,
> +			      int *pasid, int flags)
> +{
[..]
> +			err = iommu_process_attach_locked(context, dev);
> +			if (err)
> +				iommu_process_put_locked(process);
One ref for a context is enough, right? So iommu_process_put_locked() also needs to be called
when attach succeeds, otherwise there will be a leak if the user calls bind twice for the same device and task.
Jean-Philippe Brucker Nov. 29, 2017, 3:01 p.m. UTC | #7
On 29/11/17 06:08, Yisheng Xie wrote:
> 
> 
> On 2017/10/6 21:31, Jean-Philippe Brucker wrote:
>> +int iommu_process_bind_device(struct device *dev, struct task_struct *task,
>> +			      int *pasid, int flags)
>> +{
> [..]
>> +			err = iommu_process_attach_locked(context, dev);
>> +			if (err)
>> +				iommu_process_put_locked(process);
> One ref for a context is enough, right? So iommu_process_put_locked() also needs to be called
> when attach succeeds, otherwise there will be a leak if the user calls bind twice for the same device and task.

I wasn't sure, I think I prefer taking one ref for each bind. If user
calls bind twice, it should call unbind twice as well (in case of leak we
free the context on process exit).

Also with this implementation, user can call bind for two devices in the
same domain, which will share the same context structure. So we have to
take as many refs as bind() calls.

Thanks,
Jean
Xie Yisheng Nov. 30, 2017, 1:11 a.m. UTC | #8
Hi, Jean,

On 2017/11/29 23:01, Jean-Philippe Brucker wrote:
> On 29/11/17 06:08, Yisheng Xie wrote:
>>
>>
>> On 2017/10/6 21:31, Jean-Philippe Brucker wrote:
>>> +int iommu_process_bind_device(struct device *dev, struct task_struct *task,
>>> +			      int *pasid, int flags)
>>> +{
>> [..]
>>> +			err = iommu_process_attach_locked(context, dev);
>>> +			if (err)
>>> +				iommu_process_put_locked(process);
>> One ref for a context is enough, right? So iommu_process_put_locked() also needs to be called
>> when attach succeeds, otherwise there will be a leak if the user calls bind twice for the same device and task.
> 
> I wasn't sure, I think I prefer taking one ref for each bind. If user
> calls bind twice, it should call unbind twice as well (in case of leak we
> free the context on process exit).
> 
> Also with this implementation, user can call bind for two devices in the
> same domain, which will share the same context structure. So we have to
> take as many refs as bind() calls.


Hmm, there are two refs, one on the _context_ and the other on the *process* (or maybe the mm in
your next version), right? Each bind takes a ref on the context in the present design, but why
does a process ref also need to be taken for each bind? Dropping it doesn't seem to break
_user can call bind for two devices in the same domain_.

And if you really want to take a ref on the *process* for each bind, you should put it on
unbind, right? I just can't find where the process ref is put on unbind; it is only put when
the context is freed.

Maybe I'm just missing something.

Thanks
Yisheng Xie
> 
> Thanks,
> Jean
> 
> .
>
Jean-Philippe Brucker Nov. 30, 2017, 1:39 p.m. UTC | #9
On 30/11/17 01:11, Yisheng Xie wrote:
> Hi, Jean,
> 
> On 2017/11/29 23:01, Jean-Philippe Brucker wrote:
>> On 29/11/17 06:08, Yisheng Xie wrote:
>>>
>>>
>>> On 2017/10/6 21:31, Jean-Philippe Brucker wrote:
>>>> +int iommu_process_bind_device(struct device *dev, struct task_struct *task,
>>>> +			      int *pasid, int flags)
>>>> +{
>>> [..]
>>>> +			err = iommu_process_attach_locked(context, dev);
>>>> +			if (err)
>>>> +				iommu_process_put_locked(process);
>>> One ref for a context is enough, right? So iommu_process_put_locked() also needs to be called
>>> when attach succeeds, otherwise there will be a leak if the user calls bind twice for the same device and task.
>>
>> I wasn't sure, I think I prefer taking one ref for each bind. If user
>> calls bind twice, it should call unbind twice as well (in case of leak we
>> free the context on process exit).
>>
>> Also with this implementation, user can call bind for two devices in the
>> same domain, which will share the same context structure. So we have to
>> take as many refs as bind() calls.
> 
> 
> Hmm, there are two refs, one on the _context_ and the other on the *process* (or maybe the mm in
> your next version), right? Each bind takes a ref on the context in the present design, but why
> does a process ref also need to be taken for each bind? Dropping it doesn't seem to break
> _user can call bind for two devices in the same domain_.
> 
> And if you really want to take a ref on the *process* for each bind, you should put it on
> unbind, right? I just can't find where the process ref is put on unbind; it is only put when
> the context is freed.
> 
> Maybe I'm just missing something.

No, you're right, I misunderstood, sorry about that. Each context has a
single ref to a process, so we do need to drop the process ref here as you
pointed out.

I thought I had exercised this path; I'll update my test suite. Also,
attach_locked shouldn't take a context ref if the attach fails...
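
So the existing-context path would become something like:

			err = iommu_process_attach_locked(context, dev);
			/*
			 * The context already holds its own reference to the
			 * process, so drop the ref taken by
			 * iommu_process_get_locked() above whether or not the
			 * attach succeeded.
			 */
			iommu_process_put_locked(process);
			break;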

Thanks a lot,
Jean
Sinan Kaya Jan. 19, 2018, 4:52 a.m. UTC | #10
Hi Jean-Philippe,

On 10/6/2017 9:31 AM, Jean-Philippe Brucker wrote:
>  /**
> + * iommu_process_bind_device - Bind a process address space to a device
> + * @dev: the device
> + * @task: the process to bind
> + * @pasid: valid address where the PASID will be stored
> + * @flags: bond properties (IOMMU_PROCESS_BIND_*)
> + *
> + * Create a bond between device and task, allowing the device to access the
> + * process address space using the returned PASID.
> + *
> + * On success, 0 is returned and @pasid contains a valid ID. Otherwise, an error
> + * is returned.
> + */
> +int iommu_process_bind_device(struct device *dev, struct task_struct *task,
> +			      int *pasid, int flags)

This API doesn't play nice with endpoint device drivers that have PASID limitations.

The AMD driver seems to have PASID limitations per product that are not being
advertised in the PCI capability.

device_iommu_pasid_init()
{
        pasid_limit = min_t(unsigned int,
                        (unsigned int)(1 << kfd->device_info->max_pasid_bits),
                        iommu_info.max_pasids);
        /*
         * last pasid is used for kernel queues doorbells
         * in the future the last pasid might be used for a kernel thread.
         */
        pasid_limit = min_t(unsigned int,
                                pasid_limit,
                                kfd->doorbell_process_limit - 1);
}

kfd->device_info->max_pasid_bits seems to contain per device limitations.

Would you be willing to extend the API so that the requester can impose some limit
on the PASID value that is getting allocated.

Sinan
Jean-Philippe Brucker Jan. 19, 2018, 10:27 a.m. UTC | #11
Hi Sinan,

On 19/01/18 04:52, Sinan Kaya wrote:
> Hi Jean-Philippe,
> 
> On 10/6/2017 9:31 AM, Jean-Philippe Brucker wrote:
>>  /**
>> + * iommu_process_bind_device - Bind a process address space to a device
>> + * @dev: the device
>> + * @task: the process to bind
>> + * @pasid: valid address where the PASID will be stored
>> + * @flags: bond properties (IOMMU_PROCESS_BIND_*)
>> + *
>> + * Create a bond between device and task, allowing the device to access the
>> + * process address space using the returned PASID.
>> + *
>> + * On success, 0 is returned and @pasid contains a valid ID. Otherwise, an error
>> + * is returned.
>> + */
>> +int iommu_process_bind_device(struct device *dev, struct task_struct *task,
>> +			      int *pasid, int flags)
> 
> This API doesn't play nice with endpoint device drivers that have PASID limitations.
> 
> The AMD driver seems to have PASID limitations per product that are not being
> advertised in the PCI capability.
> 
> device_iommu_pasid_init()
> {
>         pasid_limit = min_t(unsigned int,
>                         (unsigned int)(1 << kfd->device_info->max_pasid_bits),
>                         iommu_info.max_pasids);
>         /*
>          * last pasid is used for kernel queues doorbells
>          * in the future the last pasid might be used for a kernel thread.
>          */
>         pasid_limit = min_t(unsigned int,
>                                 pasid_limit,
>                                 kfd->doorbell_process_limit - 1);
> }
> 
> kfd->device_info->max_pasid_bits seems to contain per device limitations.
> 
> Would you be willing to extend the API so that the requester can impose some limit
> on the PASID value that is getting allocated.

Good point. Following the feedback for this series, next version adds
another public function:

int iommu_sva_device_init(struct device *dev, int features);

that has to be called by the device driver before any bind(). The intent
is to let some IOMMU drivers initialize PASID tables and other features
lazily, only if the device driver actually intends to use them. Maybe I
could change this function to:

int iommu_sva_device_init(struct device *dev, int features, unsigned int
max_pasid);

@features is a bitmask telling what the device driver needs (PASID and/or
page faults). If features has IOMMU_SVA_FEAT_PASID set, then device driver
can set a max_pasid limit, that we'd store in our private device-iommu
data. If max_pasid is 0, then we'd use the PCI limit.
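
An endpoint driver with a product-specific limit would then do something
like this (the 10-bit limit is just an example):

	ret = iommu_sva_device_init(dev, IOMMU_SVA_FEAT_PASID, 1 << 10);
	if (ret)
		return ret;

	/* subsequent bind() calls for dev only get PASIDs below 1 << 10 */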

Thanks,
Jean
Sinan Kaya Jan. 19, 2018, 1:07 p.m. UTC | #12
On 2018-01-19 05:27, Jean-Philippe Brucker wrote:
> Hi Sinan,
> 
> On 19/01/18 04:52, Sinan Kaya wrote:
>> Hi Jean-Philippe,
>> 
>> On 10/6/2017 9:31 AM, Jean-Philippe Brucker wrote:
>>>  /**
>>> + * iommu_process_bind_device - Bind a process address space to a device
>>> + * @dev: the device
>>> + * @task: the process to bind
>>> + * @pasid: valid address where the PASID will be stored
>>> + * @flags: bond properties (IOMMU_PROCESS_BIND_*)
>>> + *
>>> + * Create a bond between device and task, allowing the device to access the
>>> + * process address space using the returned PASID.
>>> + *
>>> + * On success, 0 is returned and @pasid contains a valid ID. Otherwise, an error
>>> + * is returned.
>>> + */
>>> +int iommu_process_bind_device(struct device *dev, struct task_struct *task,
>>> +			      int *pasid, int flags)
>> 
>> This API doesn't play nice with endpoint device drivers that have PASID limitations.
>> 
>> The AMD driver seems to have PASID limitations per product that are not being
>> advertised in the PCI capability.
>> 
>> device_iommu_pasid_init()
>> {
>>         pasid_limit = min_t(unsigned int,
>>                         (unsigned int)(1 << kfd->device_info->max_pasid_bits),
>>                         iommu_info.max_pasids);
>>         /*
>>          * last pasid is used for kernel queues doorbells
>>          * in the future the last pasid might be used for a kernel thread.
>>          */
>>         pasid_limit = min_t(unsigned int,
>>                                 pasid_limit,
>>                                 kfd->doorbell_process_limit - 1);
>> }
>> 
>> kfd->device_info->max_pasid_bits seems to contain per device limitations.
>> 
>> Would you be willing to extend the API so that the requester can impose some limit
>> on the PASID value that is getting allocated.
> 
> Good point. Following the feedback for this series, next version adds
> another public function:
> 
> int iommu_sva_device_init(struct device *dev, int features);
> 
> that has to be called by the device driver before any bind(). The intent
> is to let some IOMMU drivers initialize PASID tables and other features
> lazily, only if the device driver actually intends to use them. Maybe I
> could change this function to:
> 
> int iommu_sva_device_init(struct device *dev, int features, unsigned int
> max_pasid);
> 
> @features is a bitmask telling what the device driver needs (PASID and/or
> page faults). If features has IOMMU_SVA_FEAT_PASID set, then device driver
> can set a max_pasid limit, that we'd store in our private device-iommu
> data. If max_pasid is 0, then we'd use the PCI limit.

Yes, this should work.

> 
> Thanks,
> Jean

Patch

diff --git a/drivers/iommu/iommu-process.c b/drivers/iommu/iommu-process.c
index 1ef3f55b962b..dee7691e3791 100644
--- a/drivers/iommu/iommu-process.c
+++ b/drivers/iommu/iommu-process.c
@@ -411,6 +411,171 @@  static struct mmu_notifier_ops iommu_process_mmu_notfier = {
 };
 
 /**
+ * iommu_process_bind_device - Bind a process address space to a device
+ * @dev: the device
+ * @task: the process to bind
+ * @pasid: valid address where the PASID will be stored
+ * @flags: bond properties (IOMMU_PROCESS_BIND_*)
+ *
+ * Create a bond between device and task, allowing the device to access the
+ * process address space using the returned PASID.
+ *
+ * On success, 0 is returned and @pasid contains a valid ID. Otherwise, an error
+ * is returned.
+ */
+int iommu_process_bind_device(struct device *dev, struct task_struct *task,
+			      int *pasid, int flags)
+{
+	int err, i;
+	int nesting;
+	struct pid *pid;
+	struct iommu_domain *domain;
+	struct iommu_process *process;
+	struct iommu_context *cur_context;
+	struct iommu_context *context = NULL;
+
+	domain = iommu_get_domain_for_dev(dev);
+	if (WARN_ON(!domain))
+		return -EINVAL;
+
+	if (!iommu_domain_get_attr(domain, DOMAIN_ATTR_NESTING, &nesting) &&
+	    nesting)
+		return -EINVAL;
+
+	pid = get_task_pid(task, PIDTYPE_PID);
+	if (!pid)
+		return -EINVAL;
+
+	/* If an iommu_process already exists, use it */
+	spin_lock(&iommu_process_lock);
+	idr_for_each_entry(&iommu_process_idr, process, i) {
+		if (process->pid != pid)
+			continue;
+
+		if (!iommu_process_get_locked(process)) {
+			/* Process is defunct, create a new one */
+			process = NULL;
+			break;
+		}
+
+		/* Great, is it also bound to this domain? */
+		list_for_each_entry(cur_context, &process->domains,
+				    process_head) {
+			if (cur_context->domain != domain)
+				continue;
+
+			context = cur_context;
+			*pasid = process->pasid;
+
+			/* Splendid, tell the driver and increase the ref */
+			err = iommu_process_attach_locked(context, dev);
+			if (err)
+				iommu_process_put_locked(process);
+
+			break;
+		}
+		break;
+	}
+	spin_unlock(&iommu_process_lock);
+	put_pid(pid);
+
+	if (context)
+		return err;
+
+	if (!process) {
+		process = iommu_process_alloc(domain, task);
+		if (IS_ERR(process))
+			return PTR_ERR(process);
+	}
+
+	err = iommu_process_attach(domain, dev, process);
+	if (err) {
+		iommu_process_put(process);
+		return err;
+	}
+
+	*pasid = process->pasid;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(iommu_process_bind_device);
+
+/**
+ * iommu_process_unbind_device - Remove a bond created with
+ * iommu_process_bind_device.
+ *
+ * @dev: the device
+ * @pasid: the pasid returned by bind
+ */
+int iommu_process_unbind_device(struct device *dev, int pasid)
+{
+	struct iommu_domain *domain;
+	struct iommu_process *process;
+	struct iommu_context *cur_context;
+	struct iommu_context *context = NULL;
+
+	domain = iommu_get_domain_for_dev(dev);
+	if (WARN_ON(!domain))
+		return -EINVAL;
+
+	/*
+	 * Caller stopped the device from issuing PASIDs, now make sure they are
+	 * out of the fault queue.
+	 */
+	iommu_fault_queue_flush(dev);
+
+	spin_lock(&iommu_process_lock);
+	process = idr_find(&iommu_process_idr, pasid);
+	if (!process) {
+		spin_unlock(&iommu_process_lock);
+		return -ESRCH;
+	}
+
+	list_for_each_entry(cur_context, &process->domains, process_head) {
+		if (cur_context->domain == domain) {
+			context = cur_context;
+			break;
+		}
+	}
+
+	if (context)
+		iommu_process_detach_locked(context, dev);
+	spin_unlock(&iommu_process_lock);
+
+	return context ? 0 : -ESRCH;
+}
+EXPORT_SYMBOL_GPL(iommu_process_unbind_device);
+
+/*
+ * __iommu_process_unbind_dev_all - Detach all processes attached to this
+ * device.
+ *
+ * When detaching @device from @domain, IOMMU drivers have to use this function.
+ */
+void __iommu_process_unbind_dev_all(struct iommu_domain *domain, struct device *dev)
+{
+	struct iommu_context *context, *next;
+
+	/* Ask device driver to stop using all PASIDs */
+	spin_lock(&iommu_process_lock);
+	if (domain->process_exit) {
+		list_for_each_entry(context, &domain->processes, domain_head)
+			domain->process_exit(domain, dev,
+					     context->process->pasid,
+					     domain->process_exit_token);
+	}
+	spin_unlock(&iommu_process_lock);
+
+	iommu_fault_queue_flush(dev);
+
+	spin_lock(&iommu_process_lock);
+	list_for_each_entry_safe(context, next, &domain->processes, domain_head)
+		iommu_process_detach_locked(context, dev);
+	spin_unlock(&iommu_process_lock);
+}
+EXPORT_SYMBOL_GPL(__iommu_process_unbind_dev_all);
+
+/**
  * iommu_set_process_exit_handler() - set a callback for stopping the use of
  * PASID in a device.
  * @dev: the device
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index b2b34cf7c978..f9cb89dd28f5 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1460,6 +1460,70 @@  void iommu_detach_group(struct iommu_domain *domain, struct iommu_group *group)
 }
 EXPORT_SYMBOL_GPL(iommu_detach_group);
 
+/*
+ * iommu_process_bind_group - Share process address space with all devices in
+ * the group.
+ * @group: the iommu group
+ * @task: the process to bind
+ * @pasid: valid address where the PASID will be stored
+ * @flags: bond properties (IOMMU_PROCESS_BIND_*)
+ *
+ * Create a bond between group and process, allowing devices in the group to
+ * access the process address space using @pasid.
+ *
+ * On success, 0 is returned and @pasid contains a valid ID. Otherwise, an error
+ * is returned.
+ */
+int iommu_process_bind_group(struct iommu_group *group,
+			     struct task_struct *task, int *pasid, int flags)
+{
+	struct group_device *device;
+	int ret = -ENODEV;
+
+	if (!pasid)
+		return -EINVAL;
+
+	if (!group->domain)
+		return -EINVAL;
+
+	mutex_lock(&group->mutex);
+	list_for_each_entry(device, &group->devices, list) {
+		ret = iommu_process_bind_device(device->dev, task, pasid,
+						flags);
+		if (ret)
+			break;
+	}
+
+	if (ret) {
+		list_for_each_entry_continue_reverse(device, &group->devices, list)
+			iommu_process_unbind_device(device->dev, *pasid);
+	}
+	mutex_unlock(&group->mutex);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(iommu_process_bind_group);
+
+/**
+ * iommu_process_unbind_group - Remove a bond created with
+ * iommu_process_bind_group
+ *
+ * @group: the group
+ * @pasid: the pasid returned by bind
+ */
+int iommu_process_unbind_group(struct iommu_group *group, int pasid)
+{
+	struct group_device *device;
+
+	mutex_lock(&group->mutex);
+	list_for_each_entry(device, &group->devices, list)
+		iommu_process_unbind_device(device->dev, pasid);
+	mutex_unlock(&group->mutex);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(iommu_process_unbind_group);
+
 phys_addr_t iommu_iova_to_phys(struct iommu_domain *domain, dma_addr_t iova)
 {
 	if (unlikely(domain->ops->iova_to_phys == NULL))
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 42b818437fa1..e64c2711ea8d 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -454,6 +454,10 @@  int iommu_fwspec_init(struct device *dev, struct fwnode_handle *iommu_fwnode,
 void iommu_fwspec_free(struct device *dev);
 int iommu_fwspec_add_ids(struct device *dev, u32 *ids, int num_ids);
 const struct iommu_ops *iommu_ops_from_fwnode(struct fwnode_handle *fwnode);
+extern int iommu_process_bind_group(struct iommu_group *group,
+				    struct task_struct *task, int *pasid,
+				    int flags);
+extern int iommu_process_unbind_group(struct iommu_group *group, int pasid);
 
 #else /* CONFIG_IOMMU_API */
 
@@ -739,6 +743,19 @@  const struct iommu_ops *iommu_ops_from_fwnode(struct fwnode_handle *fwnode)
 	return NULL;
 }
 
+static inline int iommu_process_bind_group(struct iommu_group *group,
+					   struct task_struct *task, int *pasid,
+					   int flags)
+{
+	return -ENODEV;
+}
+
+static inline int iommu_process_unbind_group(struct iommu_group *group,
+					     int pasid)
+{
+	return -ENODEV;
+}
+
 #endif /* CONFIG_IOMMU_API */
 
 #ifdef CONFIG_IOMMU_PROCESS
@@ -747,6 +764,12 @@  extern void iommu_set_process_exit_handler(struct device *dev,
 					   void *token);
 extern struct iommu_process *iommu_process_find(int pasid);
 extern void iommu_process_put(struct iommu_process *process);
+extern int iommu_process_bind_device(struct device *dev,
+				     struct task_struct *task, int *pasid,
+				     int flags);
+extern int iommu_process_unbind_device(struct device *dev, int pasid);
+extern void __iommu_process_unbind_dev_all(struct iommu_domain *domain,
+					   struct device *dev);
 
 #else /* CONFIG_IOMMU_PROCESS */
 static inline void iommu_set_process_exit_handler(struct device *dev,
@@ -763,6 +786,24 @@  static inline struct iommu_process *iommu_process_find(int pasid)
 static inline void iommu_process_put(struct iommu_process *process)
 {
 }
+
+static inline int iommu_process_bind_device(struct device *dev,
+					    struct task_struct *task,
+					    int *pasid, int flags)
+{
+	return -ENODEV;
+}
+
+static inline int iommu_process_unbind_device(struct device *dev, int pasid)
+{
+	return -ENODEV;
+}
+
+static inline void __iommu_process_unbind_dev_all(struct iommu_domain *domain,
+						  struct device *dev)
+{
+}
+
 #endif /* CONFIG_IOMMU_PROCESS */
 
 #endif /* __LINUX_IOMMU_H */