diff mbox series

[09/14] iommufd: Add iommufd_device_replace()

Message ID 9-v1-7612f88c19f5+2f21-iommufd_alloc_jgg@nvidia.com (mailing list archive)
State New
Headers show
Series Add iommufd physical device operations for replace and alloc hwpt | expand

Commit Message

Jason Gunthorpe Feb. 25, 2023, 12:27 a.m. UTC
Replace allows all the devices in a group to move in one step to a new
HWPT. Further, the HWPT move is done without going through a blocking
domain so that the IOMMU driver can implement some level of
non-distruption to ongoing DMA if that has meaning for it (eg for future
special driver domains)

Replace uses a lot of the same logic as normal attach, except the actual
domain change over has different restrictions, and we are careful to
sequence things so that failure is going to leave everything the way it
was, and not get trapped in a blocking domain or something if there is
ENOMEM.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/iommufd/device.c | 216 ++++++++++++++++++++++++++-------
 drivers/iommu/iommufd/main.c   |   1 +
 2 files changed, 175 insertions(+), 42 deletions(-)

Comments

Baolu Lu Feb. 26, 2023, 3:01 a.m. UTC | #1
On 2/25/23 8:27 AM, Jason Gunthorpe wrote:
> @@ -437,25 +517,77 @@ int iommufd_device_attach(struct iommufd_device *idev, u32 *pt_id)
>   		struct iommufd_ioas *ioas =
>   			container_of(pt_obj, struct iommufd_ioas, obj);
>   
> -		rc = iommufd_device_auto_get_domain(idev, ioas, pt_id);
> -		if (rc)
> +		destroy_hwpt = iommufd_device_auto_get_domain(idev, ioas, pt_id,
> +							      do_attach);
> +		if (IS_ERR(destroy_hwpt))
>   			goto out_put_pt_obj;
>   		break;
>   	}
>   	default:
> -		rc = -EINVAL;
> +		destroy_hwpt = ERR_PTR(-EINVAL);
>   		goto out_put_pt_obj;
>   	}
> +	iommufd_put_object(pt_obj);
>   
> -	refcount_inc(&idev->obj.users);
> -	rc = 0;
> +	/* This destruction has to be after we unlock everything */
> +	if (destroy_hwpt)

Should this be

	if (!IS_ERR_OR_NULL(destroy_hwpt))

?

> +		iommufd_hw_pagetable_put(idev->ictx, destroy_hwpt);
> +	return 0;
>   
>   out_put_pt_obj:
>   	iommufd_put_object(pt_obj);
> -	return rc;
> +	return PTR_ERR(destroy_hwpt);
> +}

Best regards,
baolu
Baolu Lu Feb. 26, 2023, 3:13 a.m. UTC | #2
On 2/25/23 8:27 AM, Jason Gunthorpe wrote:
> +/**
> + * iommufd_device_attach - Connect a device to an iommu_domain
> + * @idev: device to attach
> + * @pt_id: Input a IOMMUFD_OBJ_IOAS, or IOMMUFD_OBJ_HW_PAGETABLE
> + *         Output the IOMMUFD_OBJ_HW_PAGETABLE ID

"Output the hwpt ID" only happens when the caller input an IOAS object
and an auto domain was selected or created for the device.

Do I understand it right?

> + *
> + * This connects the device to an iommu_domain, either automatically or manually
> + * selected. Once this completes the device could do DMA.
> + *
> + * The caller should return the resulting pt_id back to userspace.
> + * This function is undone by calling iommufd_device_detach().
> + */
> +int iommufd_device_attach(struct iommufd_device *idev, u32 *pt_id)
> +{
> +	int rc;
> +
> +	rc = iommufd_device_change_pt(idev, pt_id, &iommufd_device_do_attach);
> +	if (rc)
> +		return rc;
> +
> +	/*
> +	 * Pairs with iommufd_device_detach() - catches caller bugs attempting
> +	 * to destroy a device with an attachment.
> +	 */
> +	refcount_inc(&idev->obj.users);
> +	return 0;
>   }
>   EXPORT_SYMBOL_NS_GPL(iommufd_device_attach, IOMMUFD);
>   
> +/**
> + * iommufd_device_replace - Change the device's iommu_domain
> + * @idev: device to change
> + * @pt_id: Input a IOMMUFD_OBJ_IOAS, or IOMMUFD_OBJ_HW_PAGETABLE
> + *         Output the IOMMUFD_OBJ_HW_PAGETABLE ID

If my above understanding is correct, then replace will never output a
hwpt id as it only happens after a successful attach.

> + *
> + * This is the same as:
> + *   iommufd_device_detach();
> + *   iommufd_device_attach()
> + * If it fails then no change is made to the attachment. The iommu driver may
> + * implement this so there is no disruption in translation. This can only be
> + * called if iommufd_device_attach() has already succeeded.
> + */
> +int iommufd_device_replace(struct iommufd_device *idev, u32 *pt_id)
> +{
> +	return iommufd_device_change_pt(idev, pt_id,
> +					&iommufd_device_do_replace);
> +}
> +EXPORT_SYMBOL_NS_GPL(iommufd_device_replace, IOMMUFD);

Best regards,
baolu
Jason Gunthorpe Feb. 27, 2023, 1:58 p.m. UTC | #3
On Sun, Feb 26, 2023 at 11:01:59AM +0800, Baolu Lu wrote:
> On 2/25/23 8:27 AM, Jason Gunthorpe wrote:
> > @@ -437,25 +517,77 @@ int iommufd_device_attach(struct iommufd_device *idev, u32 *pt_id)
> >   		struct iommufd_ioas *ioas =
> >   			container_of(pt_obj, struct iommufd_ioas, obj);
> > -		rc = iommufd_device_auto_get_domain(idev, ioas, pt_id);
> > -		if (rc)
> > +		destroy_hwpt = iommufd_device_auto_get_domain(idev, ioas, pt_id,
> > +							      do_attach);
> > +		if (IS_ERR(destroy_hwpt))
> >   			goto out_put_pt_obj;
> >   		break;
> >   	}
> >   	default:
> > -		rc = -EINVAL;
> > +		destroy_hwpt = ERR_PTR(-EINVAL);
> >   		goto out_put_pt_obj;
> >   	}
> > +	iommufd_put_object(pt_obj);
> > -	refcount_inc(&idev->obj.users);
> > -	rc = 0;
> > +	/* This destruction has to be after we unlock everything */
> > +	if (destroy_hwpt)
> 
> Should this be
> 
> 	if (!IS_ERR_OR_NULL(destroy_hwpt))
> 
> ?

Never use IS_ERR_OR_NULL ..

What am I missing? all the flows that could possibly have err_ptr here
do goto_out_put_pt_obj ?

Jason
Jason Gunthorpe Feb. 27, 2023, 2 p.m. UTC | #4
On Sun, Feb 26, 2023 at 11:13:16AM +0800, Baolu Lu wrote:
> On 2/25/23 8:27 AM, Jason Gunthorpe wrote:
> > +/**
> > + * iommufd_device_attach - Connect a device to an iommu_domain
> > + * @idev: device to attach
> > + * @pt_id: Input a IOMMUFD_OBJ_IOAS, or IOMMUFD_OBJ_HW_PAGETABLE
> > + *         Output the IOMMUFD_OBJ_HW_PAGETABLE ID
> 
> "Output the hwpt ID" only happens when the caller input an IOAS object
> and an auto domain was selected or created for the device.
> 
> Do I understand it right?

Technically it always outputs the hwpt, if a hwpt is in put then the
same hwpt is output.

> >   EXPORT_SYMBOL_NS_GPL(iommufd_device_attach, IOMMUFD);
> > +/**
> > + * iommufd_device_replace - Change the device's iommu_domain
> > + * @idev: device to change
> > + * @pt_id: Input a IOMMUFD_OBJ_IOAS, or IOMMUFD_OBJ_HW_PAGETABLE
> > + *         Output the IOMMUFD_OBJ_HW_PAGETABLE ID
> 
> If my above understanding is correct, then replace will never output a
> hwpt id as it only happens after a successful attach.

Replace calls iommufd_device_auto_get_domain() which always sets pt_id
on success?

If a HWPT was passed in then it just leaves it unchanged which is also
correct.

Jason
Baolu Lu Feb. 28, 2023, 1:50 a.m. UTC | #5
On 2/27/23 9:58 PM, Jason Gunthorpe wrote:
> On Sun, Feb 26, 2023 at 11:01:59AM +0800, Baolu Lu wrote:
>> On 2/25/23 8:27 AM, Jason Gunthorpe wrote:
>>> @@ -437,25 +517,77 @@ int iommufd_device_attach(struct iommufd_device *idev, u32 *pt_id)
>>>    		struct iommufd_ioas *ioas =
>>>    			container_of(pt_obj, struct iommufd_ioas, obj);
>>> -		rc = iommufd_device_auto_get_domain(idev, ioas, pt_id);
>>> -		if (rc)
>>> +		destroy_hwpt = iommufd_device_auto_get_domain(idev, ioas, pt_id,
>>> +							      do_attach);
>>> +		if (IS_ERR(destroy_hwpt))
>>>    			goto out_put_pt_obj;
>>>    		break;
>>>    	}
>>>    	default:
>>> -		rc = -EINVAL;
>>> +		destroy_hwpt = ERR_PTR(-EINVAL);
>>>    		goto out_put_pt_obj;
>>>    	}
>>> +	iommufd_put_object(pt_obj);
>>> -	refcount_inc(&idev->obj.users);
>>> -	rc = 0;
>>> +	/* This destruction has to be after we unlock everything */
>>> +	if (destroy_hwpt)
>> Should this be
>>
>> 	if (!IS_ERR_OR_NULL(destroy_hwpt))
>>
>> ?
> Never use IS_ERR_OR_NULL ..

Can you please elaborate a bit on this? I can still see a lot of use of
it in the tree.

> 
> What am I missing? all the flows that could possibly have err_ptr here
> do goto_out_put_pt_obj ?

Oh yes! You are right. All err_ptr's have gone to an error handling
path.

Best regards,
baolu
Baolu Lu Feb. 28, 2023, 2:10 a.m. UTC | #6
On 2/27/23 10:00 PM, Jason Gunthorpe wrote:
> On Sun, Feb 26, 2023 at 11:13:16AM +0800, Baolu Lu wrote:
>> On 2/25/23 8:27 AM, Jason Gunthorpe wrote:
>>> +/**
>>> + * iommufd_device_attach - Connect a device to an iommu_domain
>>> + * @idev: device to attach
>>> + * @pt_id: Input a IOMMUFD_OBJ_IOAS, or IOMMUFD_OBJ_HW_PAGETABLE
>>> + *         Output the IOMMUFD_OBJ_HW_PAGETABLE ID
>>
>> "Output the hwpt ID" only happens when the caller input an IOAS object
>> and an auto domain was selected or created for the device.
>>
>> Do I understand it right?
> 
> Technically it always outputs the hwpt, if a hwpt is in put then the
> same hwpt is output.

 From the code point of view, the pt_id is set only when an auto domain
is selected. Otherwise, it is untouched. Hence, probably we could
describe it more accurately in the comments. That is, if auto domain is
selected, its hwpt id will be returned in pt_id and the caller could
return it to userspace.

> 
>>>    EXPORT_SYMBOL_NS_GPL(iommufd_device_attach, IOMMUFD);
>>> +/**
>>> + * iommufd_device_replace - Change the device's iommu_domain
>>> + * @idev: device to change
>>> + * @pt_id: Input a IOMMUFD_OBJ_IOAS, or IOMMUFD_OBJ_HW_PAGETABLE
>>> + *         Output the IOMMUFD_OBJ_HW_PAGETABLE ID
>>
>> If my above understanding is correct, then replace will never output a
>> hwpt id as it only happens after a successful attach.
> 
> Replace calls iommufd_device_auto_get_domain() which always sets pt_id
> on success?

Yes. replace also calls iommufd_device_auto_get_domain().

> 
> If a HWPT was passed in then it just leaves it unchanged which is also
> correct.

Functionally right. Above I just want to make the comment matches what
the real code does.

Best regards,
baolu
Jason Gunthorpe Feb. 28, 2023, 1:51 p.m. UTC | #7
On Tue, Feb 28, 2023 at 09:50:52AM +0800, Baolu Lu wrote:
> On 2/27/23 9:58 PM, Jason Gunthorpe wrote:
> > On Sun, Feb 26, 2023 at 11:01:59AM +0800, Baolu Lu wrote:
> > > On 2/25/23 8:27 AM, Jason Gunthorpe wrote:
> > > > @@ -437,25 +517,77 @@ int iommufd_device_attach(struct iommufd_device *idev, u32 *pt_id)
> > > >    		struct iommufd_ioas *ioas =
> > > >    			container_of(pt_obj, struct iommufd_ioas, obj);
> > > > -		rc = iommufd_device_auto_get_domain(idev, ioas, pt_id);
> > > > -		if (rc)
> > > > +		destroy_hwpt = iommufd_device_auto_get_domain(idev, ioas, pt_id,
> > > > +							      do_attach);
> > > > +		if (IS_ERR(destroy_hwpt))
> > > >    			goto out_put_pt_obj;
> > > >    		break;
> > > >    	}
> > > >    	default:
> > > > -		rc = -EINVAL;
> > > > +		destroy_hwpt = ERR_PTR(-EINVAL);
> > > >    		goto out_put_pt_obj;
> > > >    	}
> > > > +	iommufd_put_object(pt_obj);
> > > > -	refcount_inc(&idev->obj.users);
> > > > -	rc = 0;
> > > > +	/* This destruction has to be after we unlock everything */
> > > > +	if (destroy_hwpt)
> > > Should this be
> > > 
> > > 	if (!IS_ERR_OR_NULL(destroy_hwpt))
> > > 
> > > ?
> > Never use IS_ERR_OR_NULL ..
> 
> Can you please elaborate a bit on this? I can still see a lot of use of
> it in the tree.

Yes, sadly. It is usually some signal of toxic confusion about what
things mean.

A function that returns an ERR_PTR should very rarely return NULL, and
if it does return NULL then NULL wasn't an error.

Further you should never store an ERR_PTR in some structure and then
later try to test it, that is madness.

So with properly structured code the need should not exist.

https://lore.kernel.org/all/20130109150427.GL3931@n2100.arm.linux.org.uk/

Jason
Jason Gunthorpe Feb. 28, 2023, 1:52 p.m. UTC | #8
On Tue, Feb 28, 2023 at 10:10:41AM +0800, Baolu Lu wrote:

> > If a HWPT was passed in then it just leaves it unchanged which is also
> > correct.
> 
> Functionally right. Above I just want to make the comment matches what
> the real code does.

It does, on output the pt_id is always the IOMMUFD_OBJ_HW_PAGETABLE ID
of the attached HWPT.

Jason
Baolu Lu March 1, 2023, 1:55 a.m. UTC | #9
On 2/28/23 9:51 PM, Jason Gunthorpe wrote:
> On Tue, Feb 28, 2023 at 09:50:52AM +0800, Baolu Lu wrote:
>> On 2/27/23 9:58 PM, Jason Gunthorpe wrote:
>>> On Sun, Feb 26, 2023 at 11:01:59AM +0800, Baolu Lu wrote:
>>>> On 2/25/23 8:27 AM, Jason Gunthorpe wrote:
>>>>> @@ -437,25 +517,77 @@ int iommufd_device_attach(struct iommufd_device *idev, u32 *pt_id)
>>>>>     		struct iommufd_ioas *ioas =
>>>>>     			container_of(pt_obj, struct iommufd_ioas, obj);
>>>>> -		rc = iommufd_device_auto_get_domain(idev, ioas, pt_id);
>>>>> -		if (rc)
>>>>> +		destroy_hwpt = iommufd_device_auto_get_domain(idev, ioas, pt_id,
>>>>> +							      do_attach);
>>>>> +		if (IS_ERR(destroy_hwpt))
>>>>>     			goto out_put_pt_obj;
>>>>>     		break;
>>>>>     	}
>>>>>     	default:
>>>>> -		rc = -EINVAL;
>>>>> +		destroy_hwpt = ERR_PTR(-EINVAL);
>>>>>     		goto out_put_pt_obj;
>>>>>     	}
>>>>> +	iommufd_put_object(pt_obj);
>>>>> -	refcount_inc(&idev->obj.users);
>>>>> -	rc = 0;
>>>>> +	/* This destruction has to be after we unlock everything */
>>>>> +	if (destroy_hwpt)
>>>> Should this be
>>>>
>>>> 	if (!IS_ERR_OR_NULL(destroy_hwpt))
>>>>
>>>> ?
>>> Never use IS_ERR_OR_NULL ..
>> Can you please elaborate a bit on this? I can still see a lot of use of
>> it in the tree.
> Yes, sadly. It is usually some signal of toxic confusion about what
> things mean.
> 
> A function that returns an ERR_PTR should very rarely return NULL, and
> if it does return NULL then NULL wasn't an error.

That's true.

> Further you should never store an ERR_PTR in some structure and then
> later try to test it, that is madness.
> 
> So with properly structured code the need should not exist.
> 
> https://lore.kernel.org/all/20130109150427.GL3931@n2100.arm.linux.org.uk/

It's clear to me now. Thanks a lot for the explanation.

Best regards,
baolu
Baolu Lu March 1, 2023, 2:23 a.m. UTC | #10
On 2/28/23 9:52 PM, Jason Gunthorpe wrote:
> On Tue, Feb 28, 2023 at 10:10:41AM +0800, Baolu Lu wrote:
> 
>>> If a HWPT was passed in then it just leaves it unchanged which is also
>>> correct.
>> Functionally right. Above I just want to make the comment matches what
>> the real code does.
> It does, on output the pt_id is always the IOMMUFD_OBJ_HW_PAGETABLE ID
> of the attached HWPT.

OK. That's fine.

Best regards,
baolu
Tian, Kevin March 2, 2023, 8:20 a.m. UTC | #11
> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Saturday, February 25, 2023 8:28 AM
>
> +static struct iommufd_hw_pagetable *
> +iommufd_device_do_replace_locked(struct iommufd_device *idev,
> +				 struct iommufd_hw_pagetable *hwpt)
> +{
> +	struct iommufd_hw_pagetable *old_hwpt;
> +	int rc;
> +
> +	lockdep_assert_held(&idev->igroup->lock);
> +
> +	/* Try to upgrade the domain we have */
> +	if (idev->enforce_cache_coherency) {
> +		rc = iommufd_hw_pagetable_enforce_cc(hwpt);
> +		if (rc)
> +			return ERR_PTR(-EINVAL);
> +	}
> +
> +	rc = iommufd_device_setup_msi(idev, hwpt);
> +	if (rc)
> +		return ERR_PTR(rc);
> +
> +	old_hwpt = idev->igroup->hwpt;
> +	if (hwpt->ioas != old_hwpt->ioas) {
> +		rc = iopt_table_enforce_group_resv_regions(
> +			&hwpt->ioas->iopt, idev->igroup->group, NULL);
> +		if (rc)
> +			return ERR_PTR(rc);
> +	}

This is inconsistent with the earlier cleanup in the attach path
where setup_msi/enforce_group_resv_region are done only
once per group (if that is the right thing to do).
Jason Gunthorpe March 6, 2023, 8:44 p.m. UTC | #12
On Thu, Mar 02, 2023 at 08:20:05AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Saturday, February 25, 2023 8:28 AM
> >
> > +static struct iommufd_hw_pagetable *
> > +iommufd_device_do_replace_locked(struct iommufd_device *idev,
> > +				 struct iommufd_hw_pagetable *hwpt)
> > +{
> > +	struct iommufd_hw_pagetable *old_hwpt;
> > +	int rc;
> > +
> > +	lockdep_assert_held(&idev->igroup->lock);
> > +
> > +	/* Try to upgrade the domain we have */
> > +	if (idev->enforce_cache_coherency) {
> > +		rc = iommufd_hw_pagetable_enforce_cc(hwpt);
> > +		if (rc)
> > +			return ERR_PTR(-EINVAL);
> > +	}
> > +
> > +	rc = iommufd_device_setup_msi(idev, hwpt);
> > +	if (rc)
> > +		return ERR_PTR(rc);
> > +
> > +	old_hwpt = idev->igroup->hwpt;
> > +	if (hwpt->ioas != old_hwpt->ioas) {
> > +		rc = iopt_table_enforce_group_resv_regions(
> > +			&hwpt->ioas->iopt, idev->igroup->group, NULL);
> > +		if (rc)
> > +			return ERR_PTR(rc);
> > +	}
> 
> This is inconsistent with the earlier cleanup in the attach path
> where setup_msi/enforce_group_resv_region are done only
> once per group (if that is the right thing to do).

Logically replace is 'detach every device in the group' - which makes
devices 0 - then 'reattach them all' to the new ioas.

So at this point it is still being done only once per group.

The 2nd idevs to call this function will see hwpt->ioas ==
old_hwpt->ioas

Jason
Tian, Kevin March 7, 2023, 2:42 a.m. UTC | #13
> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Tuesday, March 7, 2023 4:45 AM
> 
> On Thu, Mar 02, 2023 at 08:20:05AM +0000, Tian, Kevin wrote:
> > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > Sent: Saturday, February 25, 2023 8:28 AM
> > >
> > > +static struct iommufd_hw_pagetable *
> > > +iommufd_device_do_replace_locked(struct iommufd_device *idev,
> > > +				 struct iommufd_hw_pagetable *hwpt)
> > > +{
> > > +	struct iommufd_hw_pagetable *old_hwpt;
> > > +	int rc;
> > > +
> > > +	lockdep_assert_held(&idev->igroup->lock);
> > > +
> > > +	/* Try to upgrade the domain we have */
> > > +	if (idev->enforce_cache_coherency) {
> > > +		rc = iommufd_hw_pagetable_enforce_cc(hwpt);
> > > +		if (rc)
> > > +			return ERR_PTR(-EINVAL);
> > > +	}
> > > +
> > > +	rc = iommufd_device_setup_msi(idev, hwpt);
> > > +	if (rc)
> > > +		return ERR_PTR(rc);
> > > +
> > > +	old_hwpt = idev->igroup->hwpt;
> > > +	if (hwpt->ioas != old_hwpt->ioas) {
> > > +		rc = iopt_table_enforce_group_resv_regions(
> > > +			&hwpt->ioas->iopt, idev->igroup->group, NULL);
> > > +		if (rc)
> > > +			return ERR_PTR(rc);
> > > +	}
> >
> > This is inconsistent with the earlier cleanup in the attach path
> > where setup_msi/enforce_group_resv_region are done only
> > once per group (if that is the right thing to do).
> 
> Logically replace is 'detach every device in the group' - which makes
> devices 0 - then 'reattach them all' to the new ioas.
> 
> So at this point it is still being done only once per group.
> 
> The 2nd idevs to call this function will see hwpt->ioas ==
> old_hwpt->ioas
> 

but setup_msi() is still done for every device which is inconsistent
with what patch5 does.
Jason Gunthorpe March 7, 2023, 1:54 p.m. UTC | #14
On Tue, Mar 07, 2023 at 02:42:23AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Tuesday, March 7, 2023 4:45 AM
> > 
> > On Thu, Mar 02, 2023 at 08:20:05AM +0000, Tian, Kevin wrote:
> > > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > > Sent: Saturday, February 25, 2023 8:28 AM
> > > >
> > > > +static struct iommufd_hw_pagetable *
> > > > +iommufd_device_do_replace_locked(struct iommufd_device *idev,
> > > > +				 struct iommufd_hw_pagetable *hwpt)
> > > > +{
> > > > +	struct iommufd_hw_pagetable *old_hwpt;
> > > > +	int rc;
> > > > +
> > > > +	lockdep_assert_held(&idev->igroup->lock);
> > > > +
> > > > +	/* Try to upgrade the domain we have */
> > > > +	if (idev->enforce_cache_coherency) {
> > > > +		rc = iommufd_hw_pagetable_enforce_cc(hwpt);
> > > > +		if (rc)
> > > > +			return ERR_PTR(-EINVAL);
> > > > +	}
> > > > +
> > > > +	rc = iommufd_device_setup_msi(idev, hwpt);
> > > > +	if (rc)
> > > > +		return ERR_PTR(rc);
> > > > +
> > > > +	old_hwpt = idev->igroup->hwpt;
> > > > +	if (hwpt->ioas != old_hwpt->ioas) {
> > > > +		rc = iopt_table_enforce_group_resv_regions(
> > > > +			&hwpt->ioas->iopt, idev->igroup->group, NULL);
> > > > +		if (rc)
> > > > +			return ERR_PTR(rc);
> > > > +	}
> > >
> > > This is inconsistent with the earlier cleanup in the attach path
> > > where setup_msi/enforce_group_resv_region are done only
> > > once per group (if that is the right thing to do).
> > 
> > Logically replace is 'detach every device in the group' - which makes
> > devices 0 - then 'reattach them all' to the new ioas.
> > 
> > So at this point it is still being done only once per group.
> > 
> > The 2nd idevs to call this function will see hwpt->ioas ==
> > old_hwpt->ioas
> > 
> 
> but setup_msi() is still done for every device which is inconsistent
> with what patch5 does.

There is a missing check to do nothing if the hwpt is already set

If the hwpt is not set and the ioas is the same then we still have to
do the setup_msi

Jason
diff mbox series

Patch

diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index 7f95876d142d7f..913de911361115 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -4,6 +4,7 @@ 
 #include <linux/iommufd.h>
 #include <linux/slab.h>
 #include <linux/iommu.h>
+#include "../iommu-priv.h"
 
 #include "io_pagetable.h"
 #include "iommufd_private.h"
@@ -338,27 +339,101 @@  iommufd_hw_pagetable_detach(struct iommufd_device *idev)
 	return hwpt;
 }
 
-static int iommufd_device_do_attach(struct iommufd_device *idev,
-				    struct iommufd_hw_pagetable *hwpt)
+static struct iommufd_hw_pagetable *
+iommufd_device_do_attach(struct iommufd_device *idev,
+			 struct iommufd_hw_pagetable *hwpt)
 {
 	int rc;
 
 	mutex_lock(&idev->igroup->lock);
 	rc = iommufd_hw_pagetable_attach(hwpt, idev);
 	mutex_unlock(&idev->igroup->lock);
-	return rc;
+	if (rc)
+		return ERR_PTR(rc);
+	return NULL;
+}
+
+static struct iommufd_hw_pagetable *
+iommufd_device_do_replace_locked(struct iommufd_device *idev,
+				 struct iommufd_hw_pagetable *hwpt)
+{
+	struct iommufd_hw_pagetable *old_hwpt;
+	int rc;
+
+	lockdep_assert_held(&idev->igroup->lock);
+
+	/* Try to upgrade the domain we have */
+	if (idev->enforce_cache_coherency) {
+		rc = iommufd_hw_pagetable_enforce_cc(hwpt);
+		if (rc)
+			return ERR_PTR(-EINVAL);
+	}
+
+	rc = iommufd_device_setup_msi(idev, hwpt);
+	if (rc)
+		return ERR_PTR(rc);
+
+	old_hwpt = idev->igroup->hwpt;
+	if (hwpt->ioas != old_hwpt->ioas) {
+		rc = iopt_table_enforce_group_resv_regions(
+			&hwpt->ioas->iopt, idev->igroup->group, NULL);
+		if (rc)
+			return ERR_PTR(rc);
+	}
+
+	rc = iommu_group_replace_domain(idev->igroup->group, hwpt->domain);
+	if (rc)
+		goto err_unresv;
+
+	if (hwpt->ioas != old_hwpt->ioas)
+		iopt_remove_reserved_iova(&old_hwpt->ioas->iopt,
+					  idev->igroup->group);
+
+	idev->igroup->hwpt = hwpt;
+	refcount_inc(&hwpt->obj.users);
+	return old_hwpt;
+err_unresv:
+	iopt_remove_reserved_iova(&hwpt->ioas->iopt, idev->igroup->group);
+	return ERR_PTR(rc);
+}
+
+static struct iommufd_hw_pagetable *
+iommufd_device_do_replace(struct iommufd_device *idev,
+			  struct iommufd_hw_pagetable *hwpt)
+{
+	struct iommufd_hw_pagetable *destroy_hwpt = NULL;
+	int rc;
+
+	mutex_lock(&idev->igroup->lock);
+	destroy_hwpt = iommufd_device_do_replace_locked(idev, hwpt);
+	if (IS_ERR(destroy_hwpt)) {
+		rc = PTR_ERR(destroy_hwpt);
+		goto out_unlock;
+	}
+	mutex_unlock(&idev->igroup->lock);
+	return destroy_hwpt;
+
+out_unlock:
+	mutex_unlock(&idev->igroup->lock);
+	return ERR_PTR(rc);
 }
 
+typedef struct iommufd_hw_pagetable *(*attach_fn)(
+	struct iommufd_device *idev, struct iommufd_hw_pagetable *hwpt);
+
 /*
  * When automatically managing the domains we search for a compatible domain in
  * the iopt and if one is found use it, otherwise create a new domain.
  * Automatic domain selection will never pick a manually created domain.
  */
-static int iommufd_device_auto_get_domain(struct iommufd_device *idev,
-					  struct iommufd_ioas *ioas, u32 *pt_id)
+static struct iommufd_hw_pagetable *
+iommufd_device_auto_get_domain(struct iommufd_device *idev,
+			       struct iommufd_ioas *ioas, u32 *pt_id,
+			       attach_fn do_attach)
 {
+	bool immediate_attach = do_attach == iommufd_device_do_attach;
+	struct iommufd_hw_pagetable *destroy_hwpt;
 	struct iommufd_hw_pagetable *hwpt;
-	int rc;
 
 	/*
 	 * There is no differentiation when domains are allocated, so any domain
@@ -372,52 +447,57 @@  static int iommufd_device_auto_get_domain(struct iommufd_device *idev,
 
 		if (!iommufd_lock_obj(&hwpt->obj))
 			continue;
-		rc = iommufd_device_do_attach(idev, hwpt);
-		iommufd_put_object(&hwpt->obj);
-
-		/*
-		 * -EINVAL means the domain is incompatible with the device.
-		 * Other error codes should propagate to userspace as failure.
-		 * Success means the domain is attached.
-		 */
-		if (rc == -EINVAL)
-			continue;
+		destroy_hwpt = (*do_attach)(idev, hwpt);
 		*pt_id = hwpt->obj.id;
+		iommufd_put_object(&hwpt->obj);
+		if (IS_ERR(destroy_hwpt)) {
+			/*
+			 * -EINVAL means the domain is incompatible with the
+			 * device. Other error codes should propagate to
+			 * userspace as failure. Success means the domain is
+			 * attached.
+			 */
+			if (PTR_ERR(destroy_hwpt) == -EINVAL)
+				continue;
+			goto out_unlock;
+		}
 		goto out_unlock;
 	}
 
-	hwpt = iommufd_hw_pagetable_alloc(idev->ictx, ioas, idev, true);
+	hwpt = iommufd_hw_pagetable_alloc(idev->ictx, ioas, idev,
+					  immediate_attach);
 	if (IS_ERR(hwpt)) {
-		rc = PTR_ERR(hwpt);
+		destroy_hwpt = ERR_CAST(hwpt);
 		goto out_unlock;
 	}
+
+	if (!immediate_attach) {
+		destroy_hwpt = (*do_attach)(idev, hwpt);
+		if (IS_ERR(destroy_hwpt))
+			goto out_abort;
+	} else {
+		destroy_hwpt = NULL;
+	}
+
 	hwpt->auto_domain = true;
 	*pt_id = hwpt->obj.id;
 
 	mutex_unlock(&ioas->mutex);
 	iommufd_object_finalize(idev->ictx, &hwpt->obj);
-	return 0;
+	return destroy_hwpt;
+
+out_abort:
+	iommufd_object_abort_and_destroy(idev->ictx, &hwpt->obj);
 out_unlock:
 	mutex_unlock(&ioas->mutex);
-	return rc;
+	return destroy_hwpt;
 }
 
-/**
- * iommufd_device_attach - Connect a device from an iommu_domain
- * @idev: device to attach
- * @pt_id: Input a IOMMUFD_OBJ_IOAS, or IOMMUFD_OBJ_HW_PAGETABLE
- *         Output the IOMMUFD_OBJ_HW_PAGETABLE ID
- *
- * This connects the device to an iommu_domain, either automatically or manually
- * selected. Once this completes the device could do DMA.
- *
- * The caller should return the resulting pt_id back to userspace.
- * This function is undone by calling iommufd_device_detach().
- */
-int iommufd_device_attach(struct iommufd_device *idev, u32 *pt_id)
+static int iommufd_device_change_pt(struct iommufd_device *idev, u32 *pt_id,
+				    attach_fn do_attach)
 {
+	struct iommufd_hw_pagetable *destroy_hwpt;
 	struct iommufd_object *pt_obj;
-	int rc;
 
 	pt_obj = iommufd_get_object(idev->ictx, *pt_id, IOMMUFD_OBJ_ANY);
 	if (IS_ERR(pt_obj))
@@ -428,8 +508,8 @@  int iommufd_device_attach(struct iommufd_device *idev, u32 *pt_id)
 		struct iommufd_hw_pagetable *hwpt =
 			container_of(pt_obj, struct iommufd_hw_pagetable, obj);
 
-		rc = iommufd_device_do_attach(idev, hwpt);
-		if (rc)
+		destroy_hwpt = (*do_attach)(idev, hwpt);
+		if (IS_ERR(destroy_hwpt))
 			goto out_put_pt_obj;
 		break;
 	}
@@ -437,25 +517,77 @@  int iommufd_device_attach(struct iommufd_device *idev, u32 *pt_id)
 		struct iommufd_ioas *ioas =
 			container_of(pt_obj, struct iommufd_ioas, obj);
 
-		rc = iommufd_device_auto_get_domain(idev, ioas, pt_id);
-		if (rc)
+		destroy_hwpt = iommufd_device_auto_get_domain(idev, ioas, pt_id,
+							      do_attach);
+		if (IS_ERR(destroy_hwpt))
 			goto out_put_pt_obj;
 		break;
 	}
 	default:
-		rc = -EINVAL;
+		destroy_hwpt = ERR_PTR(-EINVAL);
 		goto out_put_pt_obj;
 	}
+	iommufd_put_object(pt_obj);
 
-	refcount_inc(&idev->obj.users);
-	rc = 0;
+	/* This destruction has to be after we unlock everything */
+	if (destroy_hwpt)
+		iommufd_hw_pagetable_put(idev->ictx, destroy_hwpt);
+	return 0;
 
 out_put_pt_obj:
 	iommufd_put_object(pt_obj);
-	return rc;
+	return PTR_ERR(destroy_hwpt);
+}
+
+/**
+ * iommufd_device_attach - Connect a device to an iommu_domain
+ * @idev: device to attach
+ * @pt_id: Input a IOMMUFD_OBJ_IOAS, or IOMMUFD_OBJ_HW_PAGETABLE
+ *         Output the IOMMUFD_OBJ_HW_PAGETABLE ID
+ *
+ * This connects the device to an iommu_domain, either automatically or manually
+ * selected. Once this completes the device could do DMA.
+ *
+ * The caller should return the resulting pt_id back to userspace.
+ * This function is undone by calling iommufd_device_detach().
+ */
+int iommufd_device_attach(struct iommufd_device *idev, u32 *pt_id)
+{
+	int rc;
+
+	rc = iommufd_device_change_pt(idev, pt_id, &iommufd_device_do_attach);
+	if (rc)
+		return rc;
+
+	/*
+	 * Pairs with iommufd_device_detach() - catches caller bugs attempting
+	 * to destroy a device with an attachment.
+	 */
+	refcount_inc(&idev->obj.users);
+	return 0;
 }
 EXPORT_SYMBOL_NS_GPL(iommufd_device_attach, IOMMUFD);
 
+/**
+ * iommufd_device_replace - Change the device's iommu_domain
+ * @idev: device to change
+ * @pt_id: Input a IOMMUFD_OBJ_IOAS, or IOMMUFD_OBJ_HW_PAGETABLE
+ *         Output the IOMMUFD_OBJ_HW_PAGETABLE ID
+ *
+ * This is the same as:
+ *   iommufd_device_detach();
+ *   iommufd_device_attach()
+ * If it fails then no change is made to the attachment. The iommu driver may
+ * implement this so there is no disruption in translation. This can only be
+ * called if iommufd_device_attach() has already succeeded.
+ */
+int iommufd_device_replace(struct iommufd_device *idev, u32 *pt_id)
+{
+	return iommufd_device_change_pt(idev, pt_id,
+					&iommufd_device_do_replace);
+}
+EXPORT_SYMBOL_NS_GPL(iommufd_device_replace, IOMMUFD);
+
 /**
  * iommufd_device_detach - Disconnect a device to an iommu_domain
  * @idev: device to detach
diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
index e5ed5dfa91a0b5..8597f2fb88da3a 100644
--- a/drivers/iommu/iommufd/main.c
+++ b/drivers/iommu/iommufd/main.c
@@ -461,5 +461,6 @@  module_exit(iommufd_exit);
 MODULE_ALIAS_MISCDEV(VFIO_MINOR);
 MODULE_ALIAS("devname:vfio/vfio");
 #endif
+MODULE_IMPORT_NS(IOMMUFD_INTERNAL);
 MODULE_DESCRIPTION("I/O Address Space Management for passthrough devices");
 MODULE_LICENSE("GPL");