Message ID | 20220106022053.2406748-4-baolu.lu@linux.intel.com (mailing list archive)
---|---
State | New, archived
Series | Scrap iommu_attach/detach_group() interfaces
On Thu, Jan 06, 2022 at 10:20:48AM +0800, Lu Baolu wrote:
> The iommu_attach/detach_device() interfaces were exposed for the device
> drivers to attach/detach their own domains. The commit <426a273834eae>
> ("iommu: Limit iommu_attach/detach_device to device with their own group")
> restricted them to singleton groups to avoid different devices in a group
> attaching different domains.
>
> As we've introduced device DMA ownership into the iommu core, we can now
> extend these interfaces to multiple-device groups, and "all devices are in
> the same address space" is still guaranteed.
>
> For multiple devices belonging to the same group, iommu_device_use_dma_api()
> and iommu_attach_device() are exclusive. Therefore, when drivers decide to
> use iommu_attach_device(), they cannot call iommu_device_use_dma_api() at
> the same time.
>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
>  drivers/iommu/iommu.c | 79 +++++++++++++++++++++++++++++++++----------
>  1 file changed, 62 insertions(+), 17 deletions(-)
>
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index ab8ab95969f5..2c9efd85e447 100644
> +++ b/drivers/iommu/iommu.c
> @@ -47,6 +47,7 @@ struct iommu_group {
>  	struct iommu_domain *domain;
>  	struct list_head entry;
>  	unsigned int owner_cnt;
> +	unsigned int attach_cnt;

Why did we suddenly need another counter? None of the prior versions
needed this. I suppose this is being used as a flag to indicate whether
owner_cnt == 1 or owner_cnt == 0 should restore the default domain?

Would rather a flag 'auto_no_kernel_dma_api_compat' or something.

> +/**
> + * iommu_attach_device() - attach external or UNMANAGED domain to device
> + * @domain: the domain about to attach
> + * @dev: the device about to be attached
> + *
> + * For devices belonging to the same group, iommu_device_use_dma_api() and
> + * iommu_attach_device() are exclusive. Therefore, when drivers decide to
> + * use iommu_attach_device(), they cannot call iommu_device_use_dma_api()
> + * at the same time.
> + */
>  int iommu_attach_device(struct iommu_domain *domain, struct device *dev)
>  {
>  	struct iommu_group *group;
> +	int ret = 0;
> +
> +	if (domain->type != IOMMU_DOMAIN_UNMANAGED)
> +		return -EINVAL;
>
>  	group = iommu_group_get(dev);
>  	if (!group)
>  		return -ENODEV;
>
> +	if (group->owner_cnt) {
> +		/*
> +		 * Group has been used for kernel-api dma or claimed explicitly
> +		 * for exclusive occupation. For backward compatibility, device
> +		 * in a singleton group is allowed to ignore setting the
> +		 * drv.no_kernel_api_dma field.

BTW, why is this called 'no kernel api dma'? That reads backwards - 'no
kernel dma api', right?

Another appeal of putting no_kernel_api_dma in the struct device_driver
is that this code could simply do 'dev->driver->no_kernel_api_dma' to
figure out how it is being called and avoid this messy implicitness.

Once we know our calling context we can always automatically switch from
DMA API mode to another domain without any trouble or special counters:

	if (!dev->driver->no_kernel_api_dma) {
		if (group->owner_cnt > 1 || group->owner)
			return -EBUSY;
		return __iommu_attach_group(domain, group);
	}

	if (!group->owner_cnt) {
		ret = __iommu_attach_group(domain, group);
		if (ret)
			return ret;
	} else if (group->owner || group->domain != domain)
		return -EBUSY;
	group->owner_cnt++;

Right?

> +	if (!group->attach_cnt) {
> +		ret = __iommu_attach_group(domain, group);

How come we don't have to detach the default domain here? Doesn't that
mean that iommu_replace_group could also just call attach directly
without going through detach?

Jason
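For contrast, below is a rough sketch of the flag-based bookkeeping Jason floats above, in place of the extra attach_cnt counter. Only the field name auto_no_kernel_dma_api_compat comes from his review comment; the struct layout and comments are illustrative assumptions, not code from the series.

	/*
	 * Illustrative sketch only: replace the attach_cnt counter with a
	 * flag. The field name is taken from the suggestion above;
	 * everything else is an assumption.
	 */
	struct iommu_group_sketch {
		struct iommu_domain *domain;
		struct iommu_domain *default_domain;
		unsigned int owner_cnt;
		/*
		 * Set when a legacy (singleton-group) caller auto-switched
		 * the group away from its default domain, so the detach
		 * path knows that owner_cnt dropping to zero must restore
		 * the default domain.
		 */
		bool auto_no_kernel_dma_api_compat;
	};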
Hi Jason,

On 1/7/22 1:22 AM, Jason Gunthorpe wrote:
> On Thu, Jan 06, 2022 at 10:20:48AM +0800, Lu Baolu wrote:
>> The iommu_attach/detach_device() interfaces were exposed for the device
>> drivers to attach/detach their own domains.
>> [...]
>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
>> index ab8ab95969f5..2c9efd85e447 100644
>> +++ b/drivers/iommu/iommu.c
>> @@ -47,6 +47,7 @@ struct iommu_group {
>>  	struct iommu_domain *domain;
>>  	struct list_head entry;
>>  	unsigned int owner_cnt;
>> +	unsigned int attach_cnt;
>
> Why did we suddenly need another counter? None of the prior versions
> needed this. I suppose this is being used as a flag to indicate whether
> owner_cnt == 1 or owner_cnt == 0 should restore the default domain?

Yes, exactly.

> Would rather a flag 'auto_no_kernel_dma_api_compat' or something

Adding a flag also works.

>> +/**
>> + * iommu_attach_device() - attach external or UNMANAGED domain to device
>> + * @domain: the domain about to attach
>> + * @dev: the device about to be attached
>> + *
>> + * For devices belonging to the same group, iommu_device_use_dma_api() and
>> + * iommu_attach_device() are exclusive. Therefore, when drivers decide to
>> + * use iommu_attach_device(), they cannot call iommu_device_use_dma_api()
>> + * at the same time.
>> + */
>>  int iommu_attach_device(struct iommu_domain *domain, struct device *dev)
>>  {
>>  	struct iommu_group *group;
>> +	int ret = 0;
>> +
>> +	if (domain->type != IOMMU_DOMAIN_UNMANAGED)
>> +		return -EINVAL;
>>
>>  	group = iommu_group_get(dev);
>>  	if (!group)
>>  		return -ENODEV;
>>
>> +	if (group->owner_cnt) {
>> +		/*
>> +		 * Group has been used for kernel-api dma or claimed explicitly
>> +		 * for exclusive occupation. For backward compatibility, device
>> +		 * in a singleton group is allowed to ignore setting the
>> +		 * drv.no_kernel_api_dma field.
>
> BTW, why is this called 'no kernel api dma'? That reads backwards - 'no
> kernel dma api', right?

Yes. I need to rephrase this wording.

> Another appeal of putting no_kernel_api_dma in the struct device_driver
> is that this code could simply do 'dev->driver->no_kernel_api_dma' to
> figure out how it is being called and avoid this messy implicitness.

Yes.

> Once we know our calling context we can always automatically switch from
> DMA API mode to another domain without any trouble or special counters:
>
> 	if (!dev->driver->no_kernel_api_dma) {
> 		if (group->owner_cnt > 1 || group->owner)
> 			return -EBUSY;
> 		return __iommu_attach_group(domain, group);
> 	}

Is there any lock issue when referencing dev->driver here? I guess this
requires iommu_attach_device() only being called during the driver life
(a.k.a. between driver .probe and .release).

> 	if (!group->owner_cnt) {
> 		ret = __iommu_attach_group(domain, group);
> 		if (ret)
> 			return ret;
> 	} else if (group->owner || group->domain != domain)
> 		return -EBUSY;
> 	group->owner_cnt++;
>
> Right?

Yes. It's more straightforward if there's no issue around referencing
dev->driver.

>> +	if (!group->attach_cnt) {
>> +		ret = __iommu_attach_group(domain, group);
>
> How come we don't have to detach the default domain here? Doesn't
> that mean that iommu_replace_group could also just call attach
> directly without going through detach?

__iommu_attach_group() allows replacing the default domain with a
private domain. The corresponding __iommu_detach_group() automatically
replaces the private domain with the default domain.

The auto-switch logic should not apply to iommu_group_replace_domain(),
which is designed for components that have called iommu_set_dma_owner().

> Jason

Best regards,
baolu
On Fri, Jan 07, 2022 at 09:14:38AM +0800, Lu Baolu wrote:
> > Once we know our calling context we can always automatically switch from
> > DMA API mode to another domain without any trouble or special counters:
> >
> > 	if (!dev->driver->no_kernel_api_dma) {
> > 		if (group->owner_cnt > 1 || group->owner)
> > 			return -EBUSY;
> > 		return __iommu_attach_group(domain, group);
> > 	}
>
> Is there any lock issue when referencing dev->driver here? I guess this
> requires iommu_attach_device() only being called during the driver life
> (a.k.a. between driver .probe and .release).

Yes, that is correct. That would need to be documented.

It is the same reason the routine was able to get the group from the
dev. The dev's group must be stable so long as a driver is attached, or
everything is broken :)

Much of the group refcounting code is useless for this reason. The
group simply cannot be concurrently destroyed in these contexts.

Jason
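To make the lifetime rule above concrete, here is a minimal sketch of a driver that only calls the attach/detach pair between .probe() and .remove(), while dev->driver and the group are guaranteed stable. All my_* names are hypothetical; only the iommu_* and platform_* calls are real kernel API.

	#include <linux/iommu.h>
	#include <linux/platform_device.h>

	struct my_ctx {
		struct iommu_domain *domain;
	};

	static int my_probe(struct platform_device *pdev)
	{
		struct my_ctx *ctx;
		int ret;

		ctx = devm_kzalloc(&pdev->dev, sizeof(*ctx), GFP_KERNEL);
		if (!ctx)
			return -ENOMEM;

		/* Allocate a private unmanaged domain for this device's group. */
		ctx->domain = iommu_domain_alloc(pdev->dev.bus);
		if (!ctx->domain)
			return -ENOMEM;

		/* Safe here: the driver is bound, so dev->driver is stable. */
		ret = iommu_attach_device(ctx->domain, &pdev->dev);
		if (ret) {
			iommu_domain_free(ctx->domain);
			return ret;
		}

		platform_set_drvdata(pdev, ctx);
		return 0;
	}

	static int my_remove(struct platform_device *pdev)
	{
		struct my_ctx *ctx = platform_get_drvdata(pdev);

		/* Must be undone before the driver unbinds. */
		iommu_detach_device(ctx->domain, &pdev->dev);
		iommu_domain_free(ctx->domain);
		return 0;
	}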
On Thu, Jan 06, 2022 at 10:20:48AM +0800, Lu Baolu wrote:
>  int iommu_attach_device(struct iommu_domain *domain, struct device *dev)
>  {
>  	struct iommu_group *group;
> -	int ret;
> +	int ret = 0;
> +
> +	if (domain->type != IOMMU_DOMAIN_UNMANAGED)
> +		return -EINVAL;
>
>  	group = iommu_group_get(dev);
>  	if (!group)
>  		return -ENODEV;
>
> -	/*
> -	 * Lock the group to make sure the device-count doesn't
> -	 * change while we are attaching
> -	 */
>  	mutex_lock(&group->mutex);
> -	ret = -EINVAL;
> -	if (iommu_group_device_count(group) != 1)
> -		goto out_unlock;
> +	if (group->owner_cnt) {
> +		/*
> +		 * Group has been used for kernel-api dma or claimed explicitly
> +		 * for exclusive occupation. For backward compatibility, device
> +		 * in a singleton group is allowed to ignore setting the
> +		 * drv.no_kernel_api_dma field.
> +		 */
> +		if ((group->domain == group->default_domain &&
> +		     iommu_group_device_count(group) != 1) ||
> +		    group->owner) {
> +			ret = -EBUSY;
> +			goto unlock_out;
> +		}
> +	}
>
> -	ret = __iommu_attach_group(domain, group);
> +	if (!group->attach_cnt) {
> +		ret = __iommu_attach_group(domain, group);
> +		if (ret)
> +			goto unlock_out;
> +	} else {
> +		if (group->domain != domain) {
> +			ret = -EPERM;
> +			goto unlock_out;
> +		}
> +	}
>
> -out_unlock:
> +	group->owner_cnt++;
> +	group->attach_cnt++;
> +
> +unlock_out:
>  	mutex_unlock(&group->mutex);
>  	iommu_group_put(group);

This extends iommu_attach_device() to behave as iommu_attach_group(),
changing the domain for the whole group. Wouldn't it be better to scrap
the iommu_attach_device() interface instead and only rely on
iommu_attach_group()? This way it is clear that a call changes the
whole group.

IIUC this work is heading towards allowing multiple domains in one
group as long as the group is owned by one entity. That is a valid
requirement, but the way to get there is in my eyes:

	1) Introduce a concept of a sub-group (or whatever we want to
	   call it), which groups devices together which must be in the
	   same domain because they use the same request ID and thus
	   look all the same to the IOMMU.

	2) Keep today's IOMMU groups to group devices together which can
	   bypass the IOMMU when talking to each other, like
	   multi-function devices and devices behind a no-ACS bridge.

	3) Rework group->domain and group->default_domain, eventually
	   moving them to sub-groups.

This is an important distinction to make and also the reason the
iommu_attach/detach_device() interface will always be misleading.

Item 1) in this list will also be beneficial to other parts of the
iommu code, namely the iommu-dma code, which can have finer-grained
DMA-API domains with sub-groups instead of groups.

Regards,

	Joerg
On Mon, Feb 14, 2022 at 12:39:36PM +0100, Joerg Roedel wrote:
> This extends iommu_attach_device() to behave as iommu_attach_group(),
> changing the domain for the whole group.

Of course, the only action to take is to change the domain of a group..

> Wouldn't it be better to scrap the iommu_attach_device() interface
> instead and only rely on iommu_attach_group()? This way it is clear
> that a call changes the whole group.

From an API design perspective drivers should never touch groups - they
have struct devices, so they should have a clean struct device based
API.

Groups should disappear into an internal implementation detail, not be
so prominent in the API.

> IIUC this work is heading towards allowing multiple domains in one group
> as long as the group is owned by one entity.

No, it isn't. This work is only about properly arbitrating which single
domain is attached to an entire group.

> 	1) Introduce a concept of a sub-group (or whatever we want to
> 	   call it), which groups devices together which must be in the
> 	   same domain because they use the same request ID and thus
> 	   look all the same to the IOMMU.
>
> 	2) Keep today's IOMMU groups to group devices together which can
> 	   bypass the IOMMU when talking to each other, like
> 	   multi-function devices and devices behind a no-ACS bridge.

We've talked about all these details before and nobody has thought they
are important enough to implement. This distinction is not the goal of
this series.

I think if someone did want to do this there is room in the API to
allow the distinction between 1 (must share) and 2 (sharing is
insecure), e.g. by checking the owner and blocking mixing of
user/kernel.

This is another reason to stick with the device-centric API: if we did
someday want multi-domain groups, then the device input is still the
correct input, and the iommu code can figure out what sub-groups or
whatever transparently.

Jason
On Mon, Feb 14, 2022 at 09:03:13AM -0400, Jason Gunthorpe wrote:
> Groups should disappear into an internal implementation detail, not be
> so prominent in the API.

Not going to happen, IOMMU groups are ABI and today's device assignment
code, including user-space, relies on them.

Groups implement an important aspect of hardware IOMMUs that the API
cannot abstract away: that there are devices which share the same
request-id.

This is not an issue for devices concerned by iommufd, but for legacy
device assignment it is. The IOMMU-API needs to handle both in a clean
API, even if it means that drivers need to look up the sub-group of a
device first.

And I don't see how a per-device API can handle both in a device-centric
way. For sure it is not making it 'device centric but operate on groups
under the hood'.

Regards,

	Joerg
On 2022-02-14 14:39, Joerg Roedel wrote:
> On Mon, Feb 14, 2022 at 09:03:13AM -0400, Jason Gunthorpe wrote:
>> Groups should disappear into an internal implementation detail, not be
>> so prominent in the API.
>
> Not going to happen, IOMMU groups are ABI and today's device assignment
> code, including user-space, relies on them.
> [...]
> And I don't see how a per-device API can handle both in a device-centric
> way. For sure it is not making it 'device centric but operate on groups
> under the hood'.

Arguably, iommu_attach_device() could be renamed something like
iommu_attach_group_for_dev(), since that's effectively the semantic that
all the existing API users want anyway (even VFIO at the high level -
the group is the means for the user to assign their GPU/NIC/whatever
device to their process, not the end in itself). That's just a lot more
churn.

It's not that callers should be blind to the entire concept of groups
altogether - they remain a significant reason why iommu_attach_device()
might fail, for one thing - however what callers really shouldn't need
to be bothered with is the exact *implementation* of groups.

I do actually quite like the idea of refining the group abstraction into
isolation groups as a superset of alias groups, but if anything that's a
further argument for not having the guts of the current abstraction
exposed in places that don't need to care - otherwise that would be
liable to be a microcosm of this series in itself: widespread churn vs.
"same name, new meaning" compromises.

Robin.
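If the rename Robin suggests were done mechanically, it could start as a trivial wrapper that makes the group-wide effect explicit at call sites while the behaviour stays identical. This is purely an illustration of the naming point, not a patch from the thread.

	/* Hypothetical rename sketch: same semantics, clearer name. */
	static inline int
	iommu_attach_group_for_dev(struct iommu_domain *domain,
				   struct device *dev)
	{
		/* Attaches the domain to dev's whole IOMMU group. */
		return iommu_attach_device(domain, dev);
	}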
On Mon, Feb 14, 2022 at 03:18:31PM +0000, Robin Murphy wrote:
> Arguably, iommu_attach_device() could be renamed something like
> iommu_attach_group_for_dev(), since that's effectively the semantic that all
> the existing API users want anyway (even VFIO at the high level - the group
> is the means for the user to assign their GPU/NIC/whatever device to their
> process, not the end in itself). That's just a lot more churn.

Right

> It's not that callers should be blind to the entire concept of groups
> altogether - they remain a significant reason why iommu_attach_device()
> might fail, for one thing - however what callers really shouldn't need to be
> bothered with is the exact *implementation* of groups. I do actually quite
> like the idea of refining the group abstraction into isolation groups as a
> superset of alias groups, but if anything that's a further argument for not
> having the guts of the current abstraction exposed in places that don't need
> to care - otherwise that would be liable to be a microcosm of this series in
> itself: widespread churn vs. "same name, new meaning" compromises.

Exactly, groups should not leak out through the abstraction more than
necessary. If the caller can't do anything with the group information,
then it shouldn't touch it.

VFIO needs them because its uAPI is tied, but even so we keep talking
about ways to narrow the amount of group API it consumes.

We should not set the recommended/good kAPI based on VFIO's uAPI
design.

Jason
On Mon, Feb 14, 2022 at 11:46:26AM -0400, Jason Gunthorpe wrote:
> On Mon, Feb 14, 2022 at 03:18:31PM +0000, Robin Murphy wrote:
>
>> Arguably, iommu_attach_device() could be renamed something like
>> iommu_attach_group_for_dev(), since that's effectively the semantic that all
>> the existing API users want anyway (even VFIO at the high level - the group
>> is the means for the user to assign their GPU/NIC/whatever device to their
>> process, not the end in itself). That's just a lot more churn.
>
> Right

Okay, good point. I can live with an iommu_attach_group_for_dev()
interface, it is still better than making iommu_attach_device() silently
operate on whole groups.

> VFIO needs them because its uAPI is tied, but even so we keep talking
> about ways to narrow the amount of group API it consumes.
>
> We should not set the recommended/good kAPI based on VFIO's uAPI
> design.

Agree here too. The current way groups are implemented can be turned
into a VFIO-specific interface to keep its semantics and kABI. But the
IOMMU core code still needs the concept of alias groups.

Regards,

	Joerg
On Tue, Feb 15, 2022 at 09:58:13AM +0100, Joerg Roedel wrote:
> On Mon, Feb 14, 2022 at 11:46:26AM -0400, Jason Gunthorpe wrote:
>> On Mon, Feb 14, 2022 at 03:18:31PM +0000, Robin Murphy wrote:
>>
>>> Arguably, iommu_attach_device() could be renamed something like
>>> iommu_attach_group_for_dev(), since that's effectively the semantic that all
>>> the existing API users want anyway (even VFIO at the high level - the group
>>> is the means for the user to assign their GPU/NIC/whatever device to their
>>> process, not the end in itself). That's just a lot more churn.
>>
>> Right
>
> Okay, good point. I can live with an iommu_attach_group_for_dev()
> interface, it is still better than making iommu_attach_device() silently
> operate on whole groups.

I think this is what Lu's series currently does, it just doesn't do
the rename churn as Robin noted. Lu, why not add a note like Robin
explained to the kdoc so it is clear this API impacts the whole group?

There is no argument that the internal operation of the iommu layer
should always be using groups - we are just presenting a simplified
API toward drivers.

Jason
On 2/15/22 9:47 PM, Jason Gunthorpe via iommu wrote:
> On Tue, Feb 15, 2022 at 09:58:13AM +0100, Joerg Roedel wrote:
>> On Mon, Feb 14, 2022 at 11:46:26AM -0400, Jason Gunthorpe wrote:
>>> On Mon, Feb 14, 2022 at 03:18:31PM +0000, Robin Murphy wrote:
>>>
>>>> Arguably, iommu_attach_device() could be renamed something like
>>>> iommu_attach_group_for_dev(), since that's effectively the semantic that all
>>>> the existing API users want anyway (even VFIO at the high level - the group
>>>> is the means for the user to assign their GPU/NIC/whatever device to their
>>>> process, not the end in itself). That's just a lot more churn.
>>>
>>> Right
>>
>> Okay, good point. I can live with an iommu_attach_group_for_dev()
>> interface, it is still better than making iommu_attach_device() silently
>> operate on whole groups.
>
> I think this is what Lu's series currently does, it just doesn't do
> the rename churn as Robin noted. Lu, why not add a note like Robin
> explained to the kdoc so it is clear this API impacts the whole group?

I feel that the debate here is not about the API name, but about how
iommu_attach/detach_device() should be implemented and used.

It seems everyone agrees that for device assignment (where the I/O
address space is owned by the user-space application), the
iommu_group-based APIs should always be used. Otherwise, the isolation
and protection are not guaranteed.

For kernel DMA (where the I/O address space is owned by the kernel
drivers), the device-driver-oriented interface should meet the
expectations below:

- the concept of iommu_group should be transparent to the device
  drivers;
- but internally, the iommu core only allows a single domain to be
  attached to a group.

If the device driver uses the default domain, the above expectations
are naturally met. But when the driver wants to attach its own domain,
the problem arises. This series tries to use the DMA ownership
mechanism to solve this. The device drivers explicitly declare that:

- I know that the device I am driving shares the iommu_group with
  others;
- other device drivers with the same awareness can only be bound to
  the devices in the shared group;
- we can sync with each other so that only a shared domain could be
  attached to the devices in the group.

Another proposal (as suggested by Joerg) is to introduce the concept of
a "sub-group". An iommu group could have one or multiple sub-groups,
with non-aliased devices sitting in different sub-groups and using
different domains.

The above is what I have gathered so far. If there's any
misunderstanding, please help to correct it.

Best regards,
baolu
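Assuming the driver-level opt-out flag discussed earlier in this thread, the declaration pattern Baolu lists could look roughly like the sketch below. The no_kernel_api_dma field is the name debated above (not a merged interface), and the my_* names are hypothetical; my_probe/my_remove would attach and detach the driver's own unmanaged domain as in the earlier probe/remove sketch.

	static struct platform_driver my_driver = {
		.driver = {
			.name = "my-accel",
			/*
			 * Declare: this driver does not use the kernel DMA
			 * API, so the iommu core must not keep the group on
			 * its default domain for this device's sake. Every
			 * driver bound to a device in the shared group must
			 * make the same declaration.
			 */
			.no_kernel_api_dma = true,
		},
		.probe = my_probe,	/* attaches its own unmanaged domain */
		.remove = my_remove,	/* detaches it again */
	};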
On Wed, Feb 16, 2022 at 02:28:09PM +0800, Lu Baolu wrote:
> It seems everyone agrees that for device assignment (where the I/O
> address space is owned by the user-space application), the
> iommu_group-based APIs should always be used. Otherwise, the isolation
> and protection are not guaranteed.

This group/device split is all just driven by VFIO. There is nothing
preventing a struct device * API from being used with user-space, and
Robin has been pushing that way. With enough fixing of VFIO we can do
it; e.g. the device-centric VFIO patches should be able to eventually
work entirely on an iommu device API.

> Another proposal (as suggested by Joerg) is to introduce the concept of
> a "sub-group". An iommu group could have one or multiple sub-groups,
> with non-aliased devices sitting in different sub-groups and using
> different domains.

I still don't see how sub-groups help or really change anything here.

The API already has the concept of 'ownership' separated from the
concept of 'attach a domain to a device'. Ownership works on the ACS
group and attach works on the 'same RID' group.

The API can take in the struct device and select which internal group
to use based on which action is being done.

Jason
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index ab8ab95969f5..2c9efd85e447 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -47,6 +47,7 @@ struct iommu_group {
 	struct iommu_domain *domain;
 	struct list_head entry;
 	unsigned int owner_cnt;
+	unsigned int attach_cnt;
 	void *owner;
 };
 
@@ -1921,27 +1922,59 @@ static int __iommu_attach_device(struct iommu_domain *domain,
 	return ret;
 }
 
+/**
+ * iommu_attach_device() - attach external or UNMANAGED domain to device
+ * @domain: the domain about to attach
+ * @dev: the device about to be attached
+ *
+ * For devices belonging to the same group, iommu_device_use_dma_api() and
+ * iommu_attach_device() are exclusive. Therefore, when drivers decide to
+ * use iommu_attach_device(), they cannot call iommu_device_use_dma_api()
+ * at the same time.
+ */
 int iommu_attach_device(struct iommu_domain *domain, struct device *dev)
 {
 	struct iommu_group *group;
-	int ret;
+	int ret = 0;
+
+	if (domain->type != IOMMU_DOMAIN_UNMANAGED)
+		return -EINVAL;
 
 	group = iommu_group_get(dev);
 	if (!group)
 		return -ENODEV;
 
-	/*
-	 * Lock the group to make sure the device-count doesn't
-	 * change while we are attaching
-	 */
 	mutex_lock(&group->mutex);
-	ret = -EINVAL;
-	if (iommu_group_device_count(group) != 1)
-		goto out_unlock;
+	if (group->owner_cnt) {
+		/*
+		 * Group has been used for kernel-api dma or claimed explicitly
+		 * for exclusive occupation. For backward compatibility, device
+		 * in a singleton group is allowed to ignore setting the
+		 * drv.no_kernel_api_dma field.
+		 */
+		if ((group->domain == group->default_domain &&
+		     iommu_group_device_count(group) != 1) ||
+		    group->owner) {
+			ret = -EBUSY;
+			goto unlock_out;
+		}
+	}
 
-	ret = __iommu_attach_group(domain, group);
+	if (!group->attach_cnt) {
+		ret = __iommu_attach_group(domain, group);
+		if (ret)
+			goto unlock_out;
+	} else {
+		if (group->domain != domain) {
+			ret = -EPERM;
+			goto unlock_out;
+		}
+	}
 
-out_unlock:
+	group->owner_cnt++;
+	group->attach_cnt++;
+
+unlock_out:
 	mutex_unlock(&group->mutex);
 	iommu_group_put(group);
 
@@ -2182,23 +2215,35 @@ static void __iommu_detach_device(struct iommu_domain *domain,
 	trace_detach_device_from_domain(dev);
 }
 
+/**
+ * iommu_detach_device() - detach external or UNMANAGED domain from device
+ * @domain: the domain about to detach
+ * @dev: the device about to be detached
+ *
+ * Paired with iommu_attach_device(), it detaches the domain from the device.
+ */
 void iommu_detach_device(struct iommu_domain *domain, struct device *dev)
 {
 	struct iommu_group *group;
 
+	if (WARN_ON(domain->type != IOMMU_DOMAIN_UNMANAGED))
+		return;
+
 	group = iommu_group_get(dev);
-	if (!group)
+	if (WARN_ON(!group))
 		return;
 
 	mutex_lock(&group->mutex);
-	if (iommu_group_device_count(group) != 1) {
-		WARN_ON(1);
-		goto out_unlock;
-	}
+	if (WARN_ON(!group->attach_cnt || !group->owner_cnt ||
+		    group->domain != domain))
+		goto unlock_out;
 
-	__iommu_detach_group(domain, group);
+	group->attach_cnt--;
+	group->owner_cnt--;
+	if (!group->attach_cnt)
+		__iommu_detach_group(domain, group);
 
-out_unlock:
+unlock_out:
 	mutex_unlock(&group->mutex);
 	iommu_group_put(group);
 }
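To close, a hedged usage sketch of the semantics this patch implements for a multi-device group: attaching the same unmanaged domain to a second device in the group only bumps the counters, while a different domain is refused. The function and device names are illustrative, not from the series.

	/* Illustrative only: two devices known to share one iommu_group. */
	static int attach_both(struct device *dev_a, struct device *dev_b,
			       struct iommu_domain *dom)
	{
		int ret;

		/* First attach moves the whole group off its default domain. */
		ret = iommu_attach_device(dom, dev_a);
		if (ret)
			return ret;

		/*
		 * Second attach of the *same* domain only increments
		 * attach_cnt/owner_cnt; a different domain would fail with
		 * -EPERM, preserving "one address space per group".
		 */
		ret = iommu_attach_device(dom, dev_b);
		if (ret)
			iommu_detach_device(dom, dev_a);
		return ret;
	}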