
[v1,5/8] iommu/amd: Use iommu_attach/detach_device()

Message ID 20220106022053.2406748-6-baolu.lu@linux.intel.com
State Handled Elsewhere
Series Scrap iommu_attach/detach_group() interfaces

Commit Message

Baolu Lu Jan. 6, 2022, 2:20 a.m. UTC
The individual device driver should use iommu_attach/detach_device()
for domain attachment/detachment.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 drivers/iommu/amd/iommu_v2.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Comments

Jason Gunthorpe Jan. 6, 2022, 2:33 p.m. UTC | #1
On Thu, Jan 06, 2022 at 10:20:50AM +0800, Lu Baolu wrote:
> The individual device driver should use iommu_attach/detach_device()
> for domain attachment/detachment.
> 
> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
>  drivers/iommu/amd/iommu_v2.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/iommu/amd/iommu_v2.c b/drivers/iommu/amd/iommu_v2.c
> index 58da08cc3d01..7d9d0fe89064 100644
> +++ b/drivers/iommu/amd/iommu_v2.c
> @@ -133,7 +133,7 @@ static void free_device_state(struct device_state *dev_state)
>  	if (WARN_ON(!group))
>  		return;
>  
> -	iommu_detach_group(dev_state->domain, group);
> +	iommu_detach_device(dev_state->domain, &dev_state->pdev->dev);
>  
>  	iommu_group_put(group);

This is the only user of the group in the function, so all the
group_get/put calls should be deleted too.
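
Roughly something like this, as a sketch based only on the hunk quoted
above (the remaining teardown is elided; this is not the full upstream
function):

static void free_device_state(struct device_state *dev_state)
{
	/* Detach directly by device; no iommu_group_get()/put() needed */
	iommu_detach_device(dev_state->domain, &dev_state->pdev->dev);

	/* ... the rest of the teardown stays as it is ... */
	iommu_domain_free(dev_state->domain);
	kfree(dev_state);
}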

Joerg said in commit 55c99a4dc50f ("iommu/amd: Use
iommu_attach_group()") that the device API doesn't work here because
there are multi-device groups?

But I'm not sure how this can work with multi-device groups - this
seems to assign a domain set up for direct map, so perhaps this only
works if all devices are set up for direct map?

> @@ -791,7 +791,7 @@ int amd_iommu_init_device(struct pci_dev *pdev, int pasids)
>  		goto out_free_domain;
>  	}
>  
> -	ret = iommu_attach_group(dev_state->domain, group);
> +	ret = iommu_attach_device(dev_state->domain, &pdev->dev);
>  	if (ret != 0)
>  		goto out_drop_group;

Same comment here

Jason
Baolu Lu Jan. 7, 2022, 12:23 a.m. UTC | #2
Hi Jason,

On 1/6/22 10:33 PM, Jason Gunthorpe wrote:
> On Thu, Jan 06, 2022 at 10:20:50AM +0800, Lu Baolu wrote:
>> The individual device driver should use iommu_attach/detach_device()
>> for domain attachment/detachment.
>>
>> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
>>   drivers/iommu/amd/iommu_v2.c | 4 ++--
>>   1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/iommu/amd/iommu_v2.c b/drivers/iommu/amd/iommu_v2.c
>> index 58da08cc3d01..7d9d0fe89064 100644
>> +++ b/drivers/iommu/amd/iommu_v2.c
>> @@ -133,7 +133,7 @@ static void free_device_state(struct device_state *dev_state)
>>   	if (WARN_ON(!group))
>>   		return;
>>   
>> -	iommu_detach_group(dev_state->domain, group);
>> +	iommu_detach_device(dev_state->domain, &dev_state->pdev->dev);
>>   
>>   	iommu_group_put(group);
> 
> This is the only user of the group in the function, so all the
> group_get/put calls should be deleted too.
> 
> Joerg said in commit 55c99a4dc50f ("iommu/amd: Use
> iommu_attach_group()") that the device API doesn't work here because
> there are multi-device groups?
> 
> But I'm not sure how this can work with multi-device groups - this
> seems to assign a domain set up for direct map, so perhaps this only
> works if all devices are set up for direct map?

It's also difficult for me to understand how this can work with a
multi-device group. iommu_attach_group() returns -EBUSY if
_init_device() is called for the second device in the group. That's
the reason why I didn't set no_kernel_dma.
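
For example, with two devices A and B sharing one iommu_group, the flow
looks roughly like this (informal sketch; pdev_A/pdev_B and
domain_A/domain_B are just placeholder names):

	amd_iommu_init_device(pdev_A, pasids)
	    iommu_attach_group(domain_A, group)   /* ok, group leaves its
	                                             default domain */
	amd_iommu_init_device(pdev_B, pasids)
	    iommu_attach_group(domain_B, group)   /* -EBUSY, group is no
	                                             longer attached to its
	                                             default domain */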

> 
>> @@ -791,7 +791,7 @@ int amd_iommu_init_device(struct pci_dev *pdev, int pasids)
>>   		goto out_free_domain;
>>   	}
>>   
>> -	ret = iommu_attach_group(dev_state->domain, group);
>> +	ret = iommu_attach_device(dev_state->domain, &pdev->dev);
>>   	if (ret != 0)
>>   		goto out_drop_group;
> 
> Same comment here

Yes.

> 
> Jason
> 

Best regards,
baolu
Joerg Roedel Feb. 14, 2022, 11:27 a.m. UTC | #3
On Thu, Jan 06, 2022 at 10:33:45AM -0400, Jason Gunthorpe wrote:
> But I'm not sure how this can work with multi-device groups - this
> seems to assign a domain set up for direct map, so perhaps this only
> works if all devices are set up for direct map?

Right, at boot all devices in this group are already setup for using a
direct map. There are usually two devices in those groups, the GPU
itself and the sound device for HDMI sound output.

Regards,

	Joerg
Jason Gunthorpe Feb. 14, 2022, 1:15 p.m. UTC | #4
On Mon, Feb 14, 2022 at 12:27:05PM +0100, Joerg Roedel wrote:
> On Thu, Jan 06, 2022 at 10:33:45AM -0400, Jason Gunthorpe wrote:
> > But I'm not sure how this can work with multi-device groups - this
> > seems to assign a domain set up for direct map, so perhaps this only
> > works if all devices are set up for direct map?
> 
> Right, at boot all devices in this group are already setup for using a
> direct map. There are usually two devices in those groups, the GPU
> itself and the sound device for HDMI sound output.

But how does the sound device know that this has been done to it?

eg how do we know the sound device hasn't been bound to VFIO or
something at this point?

Jason
Joerg Roedel Feb. 14, 2022, 1:40 p.m. UTC | #5
On Mon, Feb 14, 2022 at 09:15:44AM -0400, Jason Gunthorpe wrote:
> But how does the sound device know that this has been done to it?
> 
> eg how do we know the sound device hasn't been bound to VFIO or
> something at this point?

The iommu_attach_group() call will fail when the group (which includes
the GPU and the sound device) is not in its default domain. So if VFIO
attached the group to its own domain, there will be a failure in this
init function.

Note that this function is intended to be called by the driver currently
controlling this device, so there should also be no race with VFIO trying
to grab the device in parallel.

Regards,

	Joerg
Jason Gunthorpe Feb. 14, 2022, 2:02 p.m. UTC | #6
On Mon, Feb 14, 2022 at 02:40:56PM +0100, Joerg Roedel wrote:
> On Mon, Feb 14, 2022 at 09:15:44AM -0400, Jason Gunthorpe wrote:
> > But how does the sound device know that this has been done to it?
> > 
> > eg how do we know the sound device hasn't been bound to VFIO or
> > something at this point?
> 
> The iommu_attach_group() call will fail when the group (which includes
> the GPU and the sound device) is not in its default domain. So if VFIO
> attached the group to its own domain, there will be a failure in this
> init function.

That works for VFIO, but it doesn't work for other in-kernel
drivers. Is there something ensuring the group is only the GPU and
the sound device? Is the GPU never an add-in card?

I'd say the right way to code this after Lu's series is to have both
the GPU and sound drivers call iommu_attach_device() during their
probe()s and specify the identity domain as the attaching domain.
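
Something along these lines, as a rough sketch only (how the identity
domain is obtained below is hypothetical; there is no driver-facing
helper for that today):

static int example_probe(struct pci_dev *pdev)	/* hypothetical driver */
{
	struct iommu_domain *identity;
	int ret;

	/* hypothetical helper: get hold of an identity/passthrough domain */
	identity = example_get_identity_domain(&pdev->dev);
	if (!identity)
		return -ENODEV;

	ret = iommu_attach_device(identity, &pdev->dev);
	if (ret)
		return ret;

	/* ... normal probe continues; DMA-API keeps the direct mapping ... */
	return 0;
}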

That would be quite similar to how the Tegra drivers got arranged.

And then maybe someone could better guess what the "sound driver" is
since it would be marked with an iommu_attach_device() call.

Jason
Joerg Roedel Feb. 14, 2022, 2:23 p.m. UTC | #7
On Mon, Feb 14, 2022 at 10:02:36AM -0400, Jason Gunthorpe wrote:
> That works for VFIO, but it doesn't work for other in-kernel
> drivers. Is there something ensuring the group is only the GPU and
> the sound device? Is the GPU never an add-in card?

GPUs supporting this functionality are always iGPUs, AFAIK.

> I'd say the right way to code this after Lu's series is to have both
> the GPU and sound drivers call iommu_attach_device() during their
> probe()s and specify the identity domain as the attaching domain.
> 
> That would be quite similar to how the Tegra drivers got arranged.
> 
> And then maybe someone could better guess what the "sound driver" is
> since it would be marked with an iommu_attach_device() call.

Device drivers calling into iommu_attach_device() is seldom a good
idea.  In this case the sound device has some generic hardware
interface so that an existing sound driver can be re-used. Making this
driver call iommu-specific functions for some devices is something hard
to justify.

With sub-groups on the other hand it would be a no-brainer, because the
sound device would be in a separate sub-group. Basically any device in
the same group as the GPU would be in a separate sub-group.

Regards,

	Joerg
Jason Gunthorpe Feb. 14, 2022, 3 p.m. UTC | #8
On Mon, Feb 14, 2022 at 03:23:07PM +0100, Joerg Roedel wrote:

> Device drivers calling into iommu_attach_device() is seldom a good
> idea.  In this case the sound device has some generic hardware
> interface so that an existing sound driver can be re-used. Making this
> driver call iommu-specific functions for some devices is something hard
> to justify.

Er, so this is transparent to the generic sound device? I guess
something fixed up the dma_api on that device to keep working?

But, then, the requirement is that nobody is using the dma API when we
make this change?

> With sub-groups on the other hand it would be a no-brainer, because the
> sound device would be in a separate sub-group. Basically any device in
> the same group as the GPU would be in a separate sub-group.

I don't think it matters how big/small the group is, only that when we
change the domain we know everything flowing through the domain is
still happy.

Jason
Joerg Roedel Feb. 15, 2022, 9:11 a.m. UTC | #9
On Mon, Feb 14, 2022 at 11:00:59AM -0400, Jason Gunthorpe wrote:
> On Mon, Feb 14, 2022 at 03:23:07PM +0100, Joerg Roedel wrote:
> 
> > Device drivers calling into iommu_attach_device() is seldom a good
> > idea.  In this case the sound device has some generic hardware
> > interface so that an existing sound driver can be re-used. Making this
> > driver call iommu-specific functions for some devices is something hard
> > to justify.
> 
> Er, so this is transparent to the generic sound device? I guess
> something fixed up the dma_api on that device to keep working?

Right, this is completely transparent to the sound device. The IOMMU code
will not set dma_ops on the device because it uses a direct mapping, and
so the standard implementation will be used.

> But, then, the requirement is that nobody is using the dma API when we
> make this change?

That is the tricky part. DMA-API keeps working after the change is made,
because the new domain is also direct mapped. The new domain just has
the ability to assign host page-tables to device PASIDs, so that DMA
requests with a PASID TLP will be remapped.

It was actually a requirement for this code that when it jumps in, the
DMA-API mappings stay live. And the reason a direct mapping is used at
all is that the page-table walker of the IOMMU is a two-dimensional
walker, which will treat the addresses found in the host page-tables as
IO-virtual and translate them through the underlying page-table. So to
use host page-tables, the underlying mapping must be direct mapped.
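
Informally, the two-dimensional walk is:

	VA ---(host page-table bound to the PASID)---> intermediate address
	intermediate address ---(IOMMU page-table of the domain)---> PA

With a direct-mapped domain the second step is the identity, which is
what lets the addresses from the host page-table pass through unchanged.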


> I don't think it matters how big/small the group is, only that when we
> change the domain we know everything flowing through the domain is
> still happy.

Yes, that matters. The group size matters too for DMA-API performance.
If two devices compete for the same lock in the allocator and/or the
same cached magazines, things will slow down. That only matters for
high-throughput devices, but still...

Regards,

	Joerg
Robin Murphy Feb. 15, 2022, 1:02 p.m. UTC | #10
On 2022-02-15 09:11, Joerg Roedel wrote:
> On Mon, Feb 14, 2022 at 11:00:59AM -0400, Jason Gunthorpe wrote:
>> On Mon, Feb 14, 2022 at 03:23:07PM +0100, Joerg Roedel wrote:
>>
>>> Device drivers calling into iommu_attach_device() is seldom a good
>>> idea.  In this case the sound device has some generic hardware
>>> interface so that an existing sound driver can be re-used. Making this
>>> driver call iommu-specific functions for some devices is something hard
>>> to justify.
>>
>> Er, so this is transparent to the generic sound device? I guess
>> something fixed up the dma_api on that device to keep working?
> 
> Right, this is completely transparent to the sound device. The IOMMU code
> will not set dma_ops on the device because it uses a direct mapping, and
> so the standard implementation will be used.
> 
>> But, then, the requirement is that nobody is using the dma API when we
>> make this change?
> 
> That is the tricky part. DMA-API keeps working after the change is made,
> because the new domain is also direct mapped. The new domain just has
> the ability to assign host page-tables to device PASIDs, so that DMA
> requests with a PASID TLP will be remapped.
> 
> It was actually a requirement for this code that when it jumps in, the
> DMA-API mappings stay live. And the reason a direct mapping is used at
> all is that the page-table walker of the IOMMU is a two-dimensional
> walker, which will treat the addresses found in the host page-tables as
> IO-virtual and translate them through the underlying page-table. So to
> use host page-tables, the underlying mapping must be direct mapped.

Given how things have evolved since that code was originally written, 
and that we seemingly now have the def_domain_type override kicking in 
as soon as we first see an IOMMUv2-capable device, do we even need to 
then subsequently switch to this special unmanaged domain with its 
pagetable sucked out, or could we just install the PASID table in the 
default domain itself?

Robin.

>> I don't think it matters how big/small the group is, only that when we
>> change the domain we know everything flowing through the domain is
>> still happy.
> 
> Yes, that matters. The group size matters too for DMA-API performance.
> If two devices compete for the same lock in the allocator and/or the
> same cached magazines, things will slow down. That only matters for
> high-throughput devices, but still...
> 
> Regards,
> 
> 	Joerg
>
Jason Gunthorpe Feb. 15, 2022, 2:37 p.m. UTC | #11
On Tue, Feb 15, 2022 at 10:11:01AM +0100, Joerg Roedel wrote:

> > But, then, the requirement is that nobody is using the dma API when we
> > make this change?
> 
> That is the tricky part. DMA-API keeps working after the change is made,
> because the new domain is also direct mapped. The new domain just has
> the ability to assign host page-tables to device PASIDs, so that DMA
> requests with a PASID TLP will be remapped.

From what you've described, this is also a good use case for the
replace_group idea.

Using Lu's series we'd:
 - Require the group to be in DMA-API mode. We know this is the case
   because of how we got here from probe()
 - Use the 'replace group's domain' API to switch the group from one
   identity translation domain to another identity translation domain
 - Rely on the DMA-API refcount to block VFIO/etc instead of checking
   for the default domain
 - Restore the default domain when all the DMA-API mode drivers unprobe

That is, if Robin's idea to just get the right domain in the first
place doesn't work.
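
In rough pseudo-code, with the replace API being purely hypothetical at
this point (the name below is only for illustration):

	/* driver is bound, so the group is known to be in DMA-API mode */
	old = iommu_get_domain_for_dev(&pdev->dev);

	/*
	 * hypothetical: atomically switch the whole group to the new
	 * identity-mapped IOMMUv2 domain, without a detached window
	 */
	ret = iommu_group_replace_domain(group, dev_state->domain);

	/* when the last DMA-API mode driver unprobes, switch back to old */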

Anyhow, given all this I think this patch is not OK as is.

Thanks,
Jason

Patch

diff --git a/drivers/iommu/amd/iommu_v2.c b/drivers/iommu/amd/iommu_v2.c
index 58da08cc3d01..7d9d0fe89064 100644
--- a/drivers/iommu/amd/iommu_v2.c
+++ b/drivers/iommu/amd/iommu_v2.c
@@ -133,7 +133,7 @@  static void free_device_state(struct device_state *dev_state)
 	if (WARN_ON(!group))
 		return;
 
-	iommu_detach_group(dev_state->domain, group);
+	iommu_detach_device(dev_state->domain, &dev_state->pdev->dev);
 
 	iommu_group_put(group);
 
@@ -791,7 +791,7 @@  int amd_iommu_init_device(struct pci_dev *pdev, int pasids)
 		goto out_free_domain;
 	}
 
-	ret = iommu_attach_group(dev_state->domain, group);
+	ret = iommu_attach_device(dev_state->domain, &pdev->dev);
 	if (ret != 0)
 		goto out_drop_group;