Message ID | 3-v3-61d41fd9e13e+1f5-iommufd_alloc_jgg@nvidia.com (mailing list archive)
---|---
State | New
Series | Add iommufd physical device operations for replace and alloc hwpt
> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Wednesday, March 22, 2023 3:15 AM
>
> 	/*
> -	 * FIXME: Hack around missing a device-centric iommu api, only attach to
> -	 * the group once for the first device that is in the group.
> +	 * Only attach to the group once for the first device that is in the
> +	 * group. All the other devices will follow this attachment. The user
> +	 * should attach every device individually to as the per-device reserved

"individually to the hwpt"

> -	mutex_lock(&hwpt->devices_lock);
> +	mutex_lock(&idev->igroup->lock);
>
> 	/*
> 	 * immediate_attach exists only to accommodate iommu drivers that cannot
> 	 * directly allocate a domain. These drivers do not finish creating the
> 	 * domain until attach is completed. Thus we must have this call
> 	 * sequence. Once those drivers are fixed this should be removed.
> +	 *
> +	 * Note we hold the igroup->lock here which prevents any other thread
> +	 * from observing igroup->hwpt until we finish setting it up.
> 	 */
> 	if (immediate_attach) {
> 		rc = iommufd_hw_pagetable_attach(hwpt, idev);

I thought about whether holding igroup->lock is necessary here.

The caller should avoid racing attach/detach/replace on the same device.
Then the only possible race would come from other devices in the group.
But if the above attach call succeeds, it implies that no other device
can have been attached yet, so the only allowed operation is to attach
them to the same ioas as the calling device does (replace cannot happen
w/o attach first). That is already protected by ioas->mutex in
iommufd_device_auto_get_domain().

If there is no oversight here, then we can directly put the lock in
iommufd_hw_pagetable_attach/detach(), which would also simplify its
callers in device.c a bit.
On Thu, Mar 23, 2023 at 07:21:42AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Wednesday, March 22, 2023 3:15 AM
> >
> > +	 * group. All the other devices will follow this attachment. The user
> > +	 * should attach every device individually to as the per-device reserved
>
> "individually to the hwpt"

Done

> I thought about whether holding igroup->lock is necessary here.
>
> The caller should avoid racing attach/detach/replace on the same device.

I think even if the caller races we should be fine.

The point of the lock scope was to capture these lines:

	rc = iopt_table_add_domain(&hwpt->ioas->iopt, hwpt->domain);
	if (rc)
		goto out_detach;
	list_add_tail(&hwpt->hwpt_item, &hwpt->ioas->hwpt_list);

But based on the current arrangement none of them rely on the igroup
mutex, so it does seem we can narrow it.

Thanks,
Jason
> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Thursday, March 23, 2023 10:24 PM
>
> On Thu, Mar 23, 2023 at 07:21:42AM +0000, Tian, Kevin wrote:
> > I thought about whether holding igroup->lock is necessary here.
> >
> > The caller should avoid racing attach/detach/replace on the same device.
>
> I think even if the caller races we should be fine.

If vfio races attach/detach then lots of things are messed up.

e.g. iommufd_device_detach() directly calls list_del(&idev->group_item)
w/o checking whether the device has been attached.

And with that race a UAF could occur if we narrow the lock scope down to
iommufd_hw_pagetable_attach():

  cpu0                                  cpu1
  vfio_iommufd_attach()
    iommufd_device_attach()
      iommufd_device_auto_get_domain()
        mutex_lock(&ioas->mutex);
        iommufd_hw_pagetable_alloc()
          hwpt = iommufd_object_alloc()           // hwpt.users = 1
          hwpt->domain = iommu_domain_alloc(idev->dev->bus);
          iommufd_hw_pagetable_attach()           // hwpt.users = 2
                                        vfio_iommufd_detach()
                                          iommufd_device_detach()
                                            mutex_lock(&idev->igroup->lock);
                                            hwpt = iommufd_hw_pagetable_detach()
                                            mutex_unlock(&idev->igroup->lock);
                                            iommufd_hw_pagetable_put(hwpt)
                                              iommufd_object_destroy_user(hwpt)
                                                // hwpt.users = 0
                                                iommufd_hw_pagetable_destroy(hwpt)
                                                  iommu_domain_free(hwpt->domain);
        iopt_table_add_domain(&hwpt->ioas->iopt, hwpt->domain);   // UAF

From past discussion we assumed the calling driver, i.e. vfio, should do
the right thing, e.g. by holding dev_set->lock; otherwise it is already
messed up itself.

igroup->lock is really for protection across devices in the group. But
as pointed out below, we can narrow its scope in this function, as
another device cannot detach from this hwpt w/o first attaching to it,
which is already protected by ioas->mutex.

> The point of the lock scope was to capture these lines:
>
> 	rc = iopt_table_add_domain(&hwpt->ioas->iopt, hwpt->domain);
> 	if (rc)
> 		goto out_detach;
> 	list_add_tail(&hwpt->hwpt_item, &hwpt->ioas->hwpt_list);
>
> But based on the current arrangement none of them rely on the igroup
> mutex, so it does seem we can narrow it.
On Fri, Mar 24, 2023 at 01:37:51AM +0000, Tian, Kevin wrote:
> If vfio races attach/detach then lots of things are messed up.
>
> e.g. iommufd_device_detach() directly calls list_del(&idev->group_item)
> w/o checking whether the device has been attached.

Yeah, you obviously can't race attach/detach or detach/replace.

> And with that race a UAF could occur if we narrow the lock scope down to
> iommufd_hw_pagetable_attach():
>
>   cpu0                                  cpu1
>   vfio_iommufd_attach()
>     iommufd_device_attach()
>       iommufd_device_auto_get_domain()
>         mutex_lock(&ioas->mutex);
>         iommufd_hw_pagetable_alloc()
>           hwpt = iommufd_object_alloc()           // hwpt.users = 1
>           [...]
>           iommufd_hw_pagetable_attach()           // hwpt.users = 2
>                                         vfio_iommufd_detach()
>                                           [...]
>                                             iommufd_object_destroy_user(hwpt)
>                                               // hwpt.users = 0
>                                               iommufd_hw_pagetable_destroy(hwpt)
>                                                 iommu_domain_free(hwpt->domain);
>         iopt_table_add_domain(&hwpt->ioas->iopt, hwpt->domain);   // UAF

You didn't balance the refcounts properly; the cpu1 put will only get to
hwpt.users = 1.

There is a refcount_inc() in iommufd_hw_pagetable_attach(), and
iommufd_hw_pagetable_alloc() retains its own reference, so the domain is
guaranteed valid.

Jason
> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Friday, March 24, 2023 11:03 PM
>
> On Fri, Mar 24, 2023 at 01:37:51AM +0000, Tian, Kevin wrote:
> > And with that race a UAF could occur if we narrow the lock scope down
> > to iommufd_hw_pagetable_attach():
> > [...]
>
> You didn't balance the refcounts properly; the cpu1 put will only get
> to hwpt.users = 1.

iommufd_object_destroy_user() decrements the count twice if the value
is two:

	refcount_dec(&obj->users);
	if (!refcount_dec_if_one(&obj->users)) {
On Tue, Mar 28, 2023 at 02:32:22AM +0000, Tian, Kevin wrote:
> iommufd_object_destroy_user() decrements the count twice if the value
> is two:
>
> 	refcount_dec(&obj->users);
> 	if (!refcount_dec_if_one(&obj->users)) {

Ohh, it isn't allowed to call iommufd_object_destroy_user() until
finalize has passed..

Jason
> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Tuesday, March 28, 2023 7:39 PM
>
> On Tue, Mar 28, 2023 at 02:32:22AM +0000, Tian, Kevin wrote:
> > iommufd_object_destroy_user() decrements the count twice if the value
> > is two:
> >
> > 	refcount_dec(&obj->users);
> > 	if (!refcount_dec_if_one(&obj->users)) {
>
> Ohh, it isn't allowed to call iommufd_object_destroy_user() until
> finalize has passed..

Ah, you are right. In that case iommufd_get_object() will fail in the
first place.
On Thu, Mar 23, 2023 at 07:21:42AM +0000, Tian, Kevin wrote:
> If no oversight then we can directly put the lock in
> iommufd_hw_pagetable_attach/detach() which can also simplify a bit on
> its callers in device.c.

So, I did this, and syzkaller explains why this can't be done:

https://lore.kernel.org/r/0000000000006e66d605f83e09bc@google.com

We can't allow the hwpt to be discovered by a parallel
iommufd_hw_pagetable_attach() until it is done being set up, otherwise
if we fail to set it up we can't destroy the hwpt:

	if (immediate_attach) {
		rc = iommufd_hw_pagetable_attach(hwpt, idev);
		if (rc)
			goto out_abort;
	}

	rc = iopt_table_add_domain(&hwpt->ioas->iopt, hwpt->domain);
	if (rc)
		goto out_detach;
	list_add_tail(&hwpt->hwpt_item, &hwpt->ioas->hwpt_list);
	return hwpt;

out_detach:
	if (immediate_attach)
		iommufd_hw_pagetable_detach(idev);
out_abort:
	iommufd_object_abort_and_destroy(ictx, &hwpt->obj);

As some other idev could be pointing at it too by now.

So the lock has to come back out..

Jason
> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Tuesday, April 11, 2023 10:31 PM
>
> On Thu, Mar 23, 2023 at 07:21:42AM +0000, Tian, Kevin wrote:
> > If no oversight then we can directly put the lock in
> > iommufd_hw_pagetable_attach/detach() which can also simplify a bit on
> > its callers in device.c.
>
> So, I did this, and syzkaller explains why this can't be done:
>
> https://lore.kernel.org/r/0000000000006e66d605f83e09bc@google.com
>
> We can't allow the hwpt to be discovered by a parallel
> iommufd_hw_pagetable_attach() until it is done being set up, otherwise
> if we fail to set it up we can't destroy the hwpt.
> [...]
> As some other idev could be pointing at it too by now.

How could this happen before the object is finalized? iirc you pointed
this fact out to me in a previous discussion.

For this specific lockdep issue, isn't the simple fix to move the group
lock into iommufd_hw_pagetable_detach(), just like is done in attach()?

> So the lock has to come back out..
On Wed, Apr 12, 2023 at 08:27:36AM +0000, Tian, Kevin wrote:
> How could this happen before the object is finalized? iirc you pointed
> this fact out to me in a previous discussion.

It is only unavailable through the xarray, but we've added it to at
least one internal list on the group already. It is kind of sketchy to
work like this; it should all be atomic..

Jason
> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Wednesday, April 12, 2023 7:18 PM
>
> On Wed, Apr 12, 2023 at 08:27:36AM +0000, Tian, Kevin wrote:
> > How could this happen before the object is finalized? iirc you pointed
> > this fact out to me in a previous discussion.
>
> It is only unavailable through the xarray, but we've added it to at
> least one internal list on the group already. It is kind of sketchy to
> work like this; it should all be atomic..

Which internal list? The group has a list for attached devices, but
regarding the hwpt it's stored in a single field, igroup->hwpt.

With that being set in iommufd_hw_pagetable_attach(), another device
cannot race attaching to a different ioas/hwpt (mismatching
igroup->hwpt) or to the same hwpt being created (not finalized, and we
hold ioas->mutex). So although the hwpt is recorded in the group,
nothing will reference its state before it's finalized.

btw removing this lock in this file also makes it easier to support a
siov device, which doesn't have a group. We can have internal group
attach and pasid attach wrappers within device.c and leave igroup->lock
held in the group attach path. Otherwise we'll have to create a locking
wrapper used in this file to touch igroup->lock, in particular for an
iommufd_device which has an igroup object.
On Thu, Apr 13, 2023 at 02:52:54AM +0000, Tian, Kevin wrote:
> > It is only unavailable through the xarray, but we've added it to at
> > least one internal list on the group already. It is kind of sketchy to
> > work like this; it should all be atomic..
>
> Which internal list? The group has a list for attached devices, but
> regarding the hwpt it's stored in a single field, igroup->hwpt.

It is added to

	list_add_tail(&hwpt->hwpt_item, &hwpt->ioas->hwpt_list);

Which can be observed from

	mutex_lock(&ioas->mutex);
	list_for_each_entry(hwpt, &ioas->hwpt_list, hwpt_item) {
		if (!hwpt->auto_domain)
			continue;

		if (!iommufd_lock_obj(&hwpt->obj))
			continue;

If iommufd_lock_obj() has happened then
iommufd_object_abort_and_destroy() is in trouble. Thus we need to hold
the ioas->mutex right up until we know we can't call
iommufd_object_abort_and_destroy(), or lift out the hwpt list_add.

This could maybe also be fixed by holding the destroy_rw_sem right up
until finalize. Though I think I looked at this once and decided
against it for some reason..

> btw removing this lock in this file also makes it easier to support a
> siov device, which doesn't have a group. We can have internal group
> attach and pasid attach wrappers within device.c and leave igroup->lock
> held in the group attach path.

Yeah, I expect this will need more work when we get to PASID support.

Most likely the resolution will be something like: PASID domains can't
be used as PF/VF domains because they don't have the right reserved
regions, so they shouldn't be in the hwpt_list at all, and we can use
more relaxed locking for them.

Jason
> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Friday, April 14, 2023 9:32 PM
>
> On Thu, Apr 13, 2023 at 02:52:54AM +0000, Tian, Kevin wrote:
> > Which internal list? The group has a list for attached devices, but
> > regarding the hwpt it's stored in a single field, igroup->hwpt.
>
> It is added to
>
> 	list_add_tail(&hwpt->hwpt_item, &hwpt->ioas->hwpt_list);

this is called under ioas->mutex.

> Which can be observed from
>
> 	mutex_lock(&ioas->mutex);
> 	list_for_each_entry(hwpt, &ioas->hwpt_list, hwpt_item) {
> 		if (!hwpt->auto_domain)
> 			continue;
>
> 		if (!iommufd_lock_obj(&hwpt->obj))
> 			continue;

this is also called under ioas->mutex. So no race.
On Thu, Apr 20, 2023 at 06:15:16AM +0000, Tian, Kevin wrote:
> > > Which internal list? The group has a list for attached devices, but
> > > regarding the hwpt it's stored in a single field, igroup->hwpt.
> >
> > It is added to
> >
> > 	list_add_tail(&hwpt->hwpt_item, &hwpt->ioas->hwpt_list);
>
> this is called under ioas->mutex.

Yes.. But.. that is troubled too: we are calling destroy under the same
mutex, there is a missing fault point to catch it in the test, and
hwpt_alloc doesn't hold the lock widely enough :\

So you want to argue that it is safe to do this:

	mutex_lock(&ioas->mutex);
	alloc
	attach
	detach
	abort
	mutex_unlock(&ioas->mutex);

Even if attach/detach lock/unlock the group mutex during their cycle?

It seems OK.. I don't see any places that conflict.

Though I don't much like the locking pattern where we succeed attach,
drop all the locks, then fail, and then relock and do the error
unwind.. Sketchy..

Jason
diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index 3fd623208c691f..66de0274d65629 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -20,9 +20,12 @@ static void iommufd_group_release(struct kref *kref)
 	struct iommufd_group *igroup =
 		container_of(kref, struct iommufd_group, ref);
 
+	WARN_ON(igroup->hwpt || !list_empty(&igroup->device_list));
+
 	xa_cmpxchg(&igroup->ictx->groups, iommu_group_id(igroup->group), igroup,
 		   NULL, GFP_KERNEL);
 	iommu_group_put(igroup->group);
+	mutex_destroy(&igroup->lock);
 	kfree(igroup);
 }
 
@@ -83,6 +86,8 @@ static struct iommufd_group *iommufd_get_group(struct iommufd_ctx *ictx,
 	}
 
 	kref_init(&new_igroup->ref);
+	mutex_init(&new_igroup->lock);
+	INIT_LIST_HEAD(&new_igroup->device_list);
 	/* group reference moves into new_igroup */
 	new_igroup->group = group;
 
@@ -277,28 +282,15 @@ static int iommufd_device_setup_msi(struct iommufd_device *idev,
 	return 0;
 }
 
-static bool iommufd_hw_pagetable_has_group(struct iommufd_hw_pagetable *hwpt,
-					   struct iommufd_group *igroup)
-{
-	struct iommufd_device *cur_dev;
-
-	lockdep_assert_held(&hwpt->devices_lock);
-
-	list_for_each_entry(cur_dev, &hwpt->devices, devices_item)
-		if (cur_dev->igroup->group == igroup->group)
-			return true;
-	return false;
-}
-
 int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt,
 				struct iommufd_device *idev)
 {
 	phys_addr_t sw_msi_start = PHYS_ADDR_MAX;
 	int rc;
 
-	lockdep_assert_held(&hwpt->devices_lock);
+	lockdep_assert_held(&idev->igroup->lock);
 
-	if (WARN_ON(idev->hwpt))
+	if (idev->igroup->hwpt != NULL && idev->igroup->hwpt != hwpt)
 		return -EINVAL;
 
 	/*
@@ -313,7 +305,7 @@ int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt,
 			hwpt->domain->ops->enforce_cache_coherency(
 				hwpt->domain);
 		if (!hwpt->enforce_cache_coherency) {
-			WARN_ON(list_empty(&hwpt->devices));
+			WARN_ON(list_empty(&idev->igroup->device_list));
 			return -EINVAL;
 		}
 	}
@@ -329,26 +321,40 @@ int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt,
 		goto err_unresv;
 
 	/*
-	 * FIXME: Hack around missing a device-centric iommu api, only attach to
-	 * the group once for the first device that is in the group.
+	 * Only attach to the group once for the first device that is in the
+	 * group. All the other devices will follow this attachment. The user
+	 * should attach every device individually to as the per-device reserved
+	 * regions are only updated during individual device attachment.
 	 */
-	if (!iommufd_hw_pagetable_has_group(hwpt, idev->igroup)) {
+	if (list_empty(&idev->igroup->device_list)) {
 		rc = iommu_attach_group(hwpt->domain, idev->igroup->group);
 		if (rc)
 			goto err_unresv;
+		idev->igroup->hwpt = hwpt;
 	}
+	refcount_inc(&hwpt->obj.users);
+	list_add_tail(&idev->group_item, &idev->igroup->device_list);
 	return 0;
 err_unresv:
 	iopt_remove_reserved_iova(&hwpt->ioas->iopt, idev->dev);
 	return rc;
 }
 
-void iommufd_hw_pagetable_detach(struct iommufd_hw_pagetable *hwpt,
-				 struct iommufd_device *idev)
+struct iommufd_hw_pagetable *
+iommufd_hw_pagetable_detach(struct iommufd_device *idev)
 {
-	if (!iommufd_hw_pagetable_has_group(hwpt, idev->igroup))
+	struct iommufd_hw_pagetable *hwpt = idev->igroup->hwpt;
+
+	lockdep_assert_held(&idev->igroup->lock);
+
+	list_del(&idev->group_item);
+	if (list_empty(&idev->igroup->device_list)) {
 		iommu_detach_group(hwpt->domain, idev->igroup->group);
+		idev->igroup->hwpt = NULL;
+	}
 	iopt_remove_reserved_iova(&hwpt->ioas->iopt, idev->dev);
+	/* Caller must destroy hwpt */
+	return hwpt;
 }
 
 static int iommufd_device_do_attach(struct iommufd_device *idev,
@@ -356,16 +362,9 @@ static int iommufd_device_do_attach(struct iommufd_device *idev,
 {
 	int rc;
 
-	mutex_lock(&hwpt->devices_lock);
+	mutex_lock(&idev->igroup->lock);
 	rc = iommufd_hw_pagetable_attach(hwpt, idev);
-	if (rc)
-		goto out_unlock;
-
-	idev->hwpt = hwpt;
-	refcount_inc(&hwpt->obj.users);
-	list_add(&idev->devices_item, &hwpt->devices);
-out_unlock:
-	mutex_unlock(&hwpt->devices_lock);
+	mutex_unlock(&idev->igroup->lock);
 	return rc;
 }
 
@@ -375,7 +374,7 @@ static int iommufd_device_do_attach(struct iommufd_device *idev,
  * Automatic domain selection will never pick a manually created domain.
  */
 static int iommufd_device_auto_get_domain(struct iommufd_device *idev,
-					  struct iommufd_ioas *ioas)
+					  struct iommufd_ioas *ioas, u32 *pt_id)
 {
 	struct iommufd_hw_pagetable *hwpt;
 	int rc;
@@ -402,6 +401,7 @@ static int iommufd_device_auto_get_domain(struct iommufd_device *idev,
 		 */
 		if (rc == -EINVAL)
 			continue;
+		*pt_id = hwpt->obj.id;
 		goto out_unlock;
 	}
 
@@ -411,6 +411,7 @@ static int iommufd_device_auto_get_domain(struct iommufd_device *idev,
 		goto out_unlock;
 	}
 	hwpt->auto_domain = true;
+	*pt_id = hwpt->obj.id;
 
 	mutex_unlock(&ioas->mutex);
 	iommufd_object_finalize(idev->ictx, &hwpt->obj);
@@ -455,7 +456,7 @@ int iommufd_device_attach(struct iommufd_device *idev, u32 *pt_id)
 		struct iommufd_ioas *ioas =
 			container_of(pt_obj, struct iommufd_ioas, obj);
 
-		rc = iommufd_device_auto_get_domain(idev, ioas);
+		rc = iommufd_device_auto_get_domain(idev, ioas, pt_id);
 		if (rc)
 			goto out_put_pt_obj;
 		break;
@@ -466,7 +467,6 @@ int iommufd_device_attach(struct iommufd_device *idev, u32 *pt_id)
 	}
 
 	refcount_inc(&idev->obj.users);
-	*pt_id = idev->hwpt->obj.id;
 	rc = 0;
 
 out_put_pt_obj:
@@ -484,13 +484,11 @@ EXPORT_SYMBOL_NS_GPL(iommufd_device_attach, IOMMUFD);
  */
 void iommufd_device_detach(struct iommufd_device *idev)
 {
-	struct iommufd_hw_pagetable *hwpt = idev->hwpt;
+	struct iommufd_hw_pagetable *hwpt;
 
-	mutex_lock(&hwpt->devices_lock);
-	list_del(&idev->devices_item);
-	idev->hwpt = NULL;
-	iommufd_hw_pagetable_detach(hwpt, idev);
-	mutex_unlock(&hwpt->devices_lock);
+	mutex_lock(&idev->igroup->lock);
+	hwpt = iommufd_hw_pagetable_detach(idev);
+	mutex_unlock(&idev->igroup->lock);
 
 	if (hwpt->auto_domain)
 		iommufd_object_destroy_user(idev->ictx, &hwpt->obj);
diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c
index 6cdb6749d359f3..566eba0cd9b917 100644
--- a/drivers/iommu/iommufd/hw_pagetable.c
+++ b/drivers/iommu/iommufd/hw_pagetable.c
@@ -11,8 +11,6 @@ void iommufd_hw_pagetable_destroy(struct iommufd_object *obj)
 	struct iommufd_hw_pagetable *hwpt =
 		container_of(obj, struct iommufd_hw_pagetable, obj);
 
-	WARN_ON(!list_empty(&hwpt->devices));
-
 	if (!list_empty(&hwpt->hwpt_item)) {
 		mutex_lock(&hwpt->ioas->mutex);
 		list_del(&hwpt->hwpt_item);
@@ -25,7 +23,6 @@ void iommufd_hw_pagetable_destroy(struct iommufd_object *obj)
 		iommu_domain_free(hwpt->domain);
 
 	refcount_dec(&hwpt->ioas->obj.users);
-	mutex_destroy(&hwpt->devices_lock);
 }
 
 /**
@@ -52,9 +49,7 @@ iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
 	if (IS_ERR(hwpt))
 		return hwpt;
 
-	INIT_LIST_HEAD(&hwpt->devices);
 	INIT_LIST_HEAD(&hwpt->hwpt_item);
-	mutex_init(&hwpt->devices_lock);
 	/* Pairs with iommufd_hw_pagetable_destroy() */
 	refcount_inc(&ioas->obj.users);
 	hwpt->ioas = ioas;
@@ -65,13 +60,16 @@ iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
 		goto out_abort;
 	}
 
-	mutex_lock(&hwpt->devices_lock);
+	mutex_lock(&idev->igroup->lock);
 
 	/*
 	 * immediate_attach exists only to accommodate iommu drivers that cannot
 	 * directly allocate a domain. These drivers do not finish creating the
 	 * domain until attach is completed. Thus we must have this call
 	 * sequence. Once those drivers are fixed this should be removed.
+	 *
+	 * Note we hold the igroup->lock here which prevents any other thread
+	 * from observing igroup->hwpt until we finish setting it up.
 	 */
 	if (immediate_attach) {
 		rc = iommufd_hw_pagetable_attach(hwpt, idev);
@@ -84,21 +82,14 @@ iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
 		goto out_detach;
 	list_add_tail(&hwpt->hwpt_item, &hwpt->ioas->hwpt_list);
 
-	if (immediate_attach) {
-		/* See iommufd_device_do_attach() */
-		refcount_inc(&hwpt->obj.users);
-		idev->hwpt = hwpt;
-		list_add(&idev->devices_item, &hwpt->devices);
-	}
-
-	mutex_unlock(&hwpt->devices_lock);
+	mutex_unlock(&idev->igroup->lock);
 	return hwpt;
 
 out_detach:
 	if (immediate_attach)
-		iommufd_hw_pagetable_detach(hwpt, idev);
+		iommufd_hw_pagetable_detach(idev);
 out_unlock:
-	mutex_unlock(&hwpt->devices_lock);
+	mutex_unlock(&idev->igroup->lock);
 out_abort:
 	iommufd_object_abort_and_destroy(ictx, &hwpt->obj);
 	return ERR_PTR(rc);
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index 2544f10dae9aef..2ff192777f27d3 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -250,8 +250,6 @@ struct iommufd_hw_pagetable {
 	bool msi_cookie : 1;
 	/* Head at iommufd_ioas::hwpt_list */
 	struct list_head hwpt_item;
-	struct mutex devices_lock;
-	struct list_head devices;
 };
 
 struct iommufd_hw_pagetable *
@@ -259,14 +257,17 @@ iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
 			   struct iommufd_device *idev, bool immediate_attach);
 int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt,
 				struct iommufd_device *idev);
-void iommufd_hw_pagetable_detach(struct iommufd_hw_pagetable *hwpt,
-				 struct iommufd_device *idev);
+struct iommufd_hw_pagetable *
+iommufd_hw_pagetable_detach(struct iommufd_device *idev);
 void iommufd_hw_pagetable_destroy(struct iommufd_object *obj);
 
 struct iommufd_group {
 	struct kref ref;
+	struct mutex lock;
 	struct iommufd_ctx *ictx;
 	struct iommu_group *group;
+	struct iommufd_hw_pagetable *hwpt;
+	struct list_head device_list;
 };
 
 /*
@@ -278,9 +279,7 @@ struct iommufd_device {
 	struct iommufd_object obj;
 	struct iommufd_ctx *ictx;
 	struct iommufd_group *igroup;
-	struct iommufd_hw_pagetable *hwpt;
-	/* Head at iommufd_hw_pagetable::devices */
-	struct list_head devices_item;
+	struct list_head group_item;
 	/* always the physical device */
 	struct device *dev;
 	bool enforce_cache_coherency;