
[1/5] iommu: Replace uses of IOMMU_CAP_CACHE_COHERENCY with dev_is_dma_coherent()

Message ID 1-v1-ef02c60ddb76+12ca2-intel_no_snoop_jgg@nvidia.com (mailing list archive)
State Not Applicable
Series Make the iommu driver no-snoop block feature consistent

Checks

Context Check Description
netdev/tree_selection success Not a local patch

Commit Message

Jason Gunthorpe April 5, 2022, 4:16 p.m. UTC
vdpa and usnic are trying to test if IOMMU_CACHE is supported. The correct
way to do this is via dev_is_dma_coherent() like the DMA API does. If
IOMMU_CACHE is not supported then these drivers won't work as they don't
call any coherency-restoring routines around their DMAs.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/infiniband/hw/usnic/usnic_uiom.c | 16 +++++++---------
 drivers/vhost/vdpa.c                     |  3 ++-
 2 files changed, 9 insertions(+), 10 deletions(-)

Comments

Christoph Hellwig April 6, 2022, 5:30 a.m. UTC | #1
On Tue, Apr 05, 2022 at 01:16:00PM -0300, Jason Gunthorpe wrote:
> diff --git a/drivers/infiniband/hw/usnic/usnic_uiom.c b/drivers/infiniband/hw/usnic/usnic_uiom.c
> index 760b254ba42d6b..24d118198ac756 100644
> --- a/drivers/infiniband/hw/usnic/usnic_uiom.c
> +++ b/drivers/infiniband/hw/usnic/usnic_uiom.c
> @@ -42,6 +42,7 @@
>  #include <linux/list.h>
>  #include <linux/pci.h>
>  #include <rdma/ib_verbs.h>
> +#include <linux/dma-map-ops.h>
>  
>  #include "usnic_log.h"
>  #include "usnic_uiom.h"
> @@ -474,6 +475,12 @@ int usnic_uiom_attach_dev_to_pd(struct usnic_uiom_pd *pd, struct device *dev)
>  	struct usnic_uiom_dev *uiom_dev;
>  	int err;
>  
> +	if (!dev_is_dma_coherent(dev)) {

Which part of the comment at the top of dma-map-ops.h is not clear
enough to you?
Jason Gunthorpe April 6, 2022, 12:07 p.m. UTC | #2
On Wed, Apr 06, 2022 at 07:30:39AM +0200, Christoph Hellwig wrote:
> On Tue, Apr 05, 2022 at 01:16:00PM -0300, Jason Gunthorpe wrote:
> > diff --git a/drivers/infiniband/hw/usnic/usnic_uiom.c b/drivers/infiniband/hw/usnic/usnic_uiom.c
> > index 760b254ba42d6b..24d118198ac756 100644
> > +++ b/drivers/infiniband/hw/usnic/usnic_uiom.c
> > @@ -42,6 +42,7 @@
> >  #include <linux/list.h>
> >  #include <linux/pci.h>
> >  #include <rdma/ib_verbs.h>
> > +#include <linux/dma-map-ops.h>
> >  
> >  #include "usnic_log.h"
> >  #include "usnic_uiom.h"
> > @@ -474,6 +475,12 @@ int usnic_uiom_attach_dev_to_pd(struct usnic_uiom_pd *pd, struct device *dev)
> >  	struct usnic_uiom_dev *uiom_dev;
> >  	int err;
> >  
> > +	if (!dev_is_dma_coherent(dev)) {
> 
> Which part of the comment at the top of dma-map-ops.h is not clear
> enough to you?

Didn't see it

I'll move dev_is_dma_coherent to device.h along with
device_iommu_mapped() and others then

Thanks,
Jason
Christoph Hellwig April 6, 2022, 1:51 p.m. UTC | #3
On Wed, Apr 06, 2022 at 09:07:30AM -0300, Jason Gunthorpe wrote:
> Didn't see it
> 
> I'll move dev_is_dma_coherent to device.h along with
> device_iommu_mapped() and others then

No.  It is internal for a reason.  It also doesn't actually work
outside of the dma core.  E.g. for non-swiotlb ARM configs it will
not actually work.
Robin Murphy April 6, 2022, 1:56 p.m. UTC | #4
On 2022-04-05 17:16, Jason Gunthorpe wrote:
> vdpa and usnic are trying to test if IOMMU_CACHE is supported. The correct
> way to do this is via dev_is_dma_coherent()

Not necessarily...

Disregarding the complete disaster of PCIe No Snoop on Arm-Based 
systems, there's the more interesting effectively-opposite scenario 
where an SMMU bridges non-coherent devices to a coherent interconnect. 
It's not something we take advantage of yet in Linux, and it can only be 
properly described in ACPI, but there do exist situations where 
IOMMU_CACHE is capable of making the device's traffic snoop, but 
dev_is_dma_coherent() - and device_get_dma_attr() for external users - 
would still say non-coherent because they can't assume that the SMMU is 
enabled and programmed in just the right way.

I've also not thought too much about how things might look with S2FWB 
thrown into the mix in future...

Robin.

> like the DMA API does. If
> IOMMU_CACHE is not supported then these drivers won't work as they don't
> call any coherency-restoring routines around their DMAs.
> 
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
>   drivers/infiniband/hw/usnic/usnic_uiom.c | 16 +++++++---------
>   drivers/vhost/vdpa.c                     |  3 ++-
>   2 files changed, 9 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/infiniband/hw/usnic/usnic_uiom.c b/drivers/infiniband/hw/usnic/usnic_uiom.c
> index 760b254ba42d6b..24d118198ac756 100644
> --- a/drivers/infiniband/hw/usnic/usnic_uiom.c
> +++ b/drivers/infiniband/hw/usnic/usnic_uiom.c
> @@ -42,6 +42,7 @@
>   #include <linux/list.h>
>   #include <linux/pci.h>
>   #include <rdma/ib_verbs.h>
> +#include <linux/dma-map-ops.h>
>   
>   #include "usnic_log.h"
>   #include "usnic_uiom.h"
> @@ -474,6 +475,12 @@ int usnic_uiom_attach_dev_to_pd(struct usnic_uiom_pd *pd, struct device *dev)
>   	struct usnic_uiom_dev *uiom_dev;
>   	int err;
>   
> +	if (!dev_is_dma_coherent(dev)) {
> +		usnic_err("IOMMU of %s does not support cache coherency\n",
> +				dev_name(dev));
> +		return -EINVAL;
> +	}
> +
>   	uiom_dev = kzalloc(sizeof(*uiom_dev), GFP_ATOMIC);
>   	if (!uiom_dev)
>   		return -ENOMEM;
> @@ -483,13 +490,6 @@ int usnic_uiom_attach_dev_to_pd(struct usnic_uiom_pd *pd, struct device *dev)
>   	if (err)
>   		goto out_free_dev;
>   
> -	if (!iommu_capable(dev->bus, IOMMU_CAP_CACHE_COHERENCY)) {
> -		usnic_err("IOMMU of %s does not support cache coherency\n",
> -				dev_name(dev));
> -		err = -EINVAL;
> -		goto out_detach_device;
> -	}
> -
>   	spin_lock(&pd->lock);
>   	list_add_tail(&uiom_dev->link, &pd->devs);
>   	pd->dev_cnt++;
> @@ -497,8 +497,6 @@ int usnic_uiom_attach_dev_to_pd(struct usnic_uiom_pd *pd, struct device *dev)
>   
>   	return 0;
>   
> -out_detach_device:
> -	iommu_detach_device(pd->domain, dev);
>   out_free_dev:
>   	kfree(uiom_dev);
>   	return err;
> diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
> index 4c2f0bd062856a..05ea5800febc37 100644
> --- a/drivers/vhost/vdpa.c
> +++ b/drivers/vhost/vdpa.c
> @@ -22,6 +22,7 @@
>   #include <linux/vdpa.h>
>   #include <linux/nospec.h>
>   #include <linux/vhost.h>
> +#include <linux/dma-map-ops.h>
>   
>   #include "vhost.h"
>   
> @@ -929,7 +930,7 @@ static int vhost_vdpa_alloc_domain(struct vhost_vdpa *v)
>   	if (!bus)
>   		return -EFAULT;
>   
> -	if (!iommu_capable(bus, IOMMU_CAP_CACHE_COHERENCY))
> +	if (!dev_is_dma_coherent(dma_dev))
>   		return -ENOTSUPP;
>   
>   	v->domain = iommu_domain_alloc(bus);
Jason Gunthorpe April 6, 2022, 2:14 p.m. UTC | #5
On Wed, Apr 06, 2022 at 03:51:50PM +0200, Christoph Hellwig wrote:
> On Wed, Apr 06, 2022 at 09:07:30AM -0300, Jason Gunthorpe wrote:
> > Didn't see it
> > 
> > I'll move dev_is_dma_coherent to device.h along with
> > device_iommu_mapped() and others then
> 
> No.  It is internal for a reason.  It also doesn't actually work
> outside of the dma core.  E.g. for non-swiotlb ARM configs it will
> not actually work.

Really? It is the only condition that dma_info_to_prot() tests to
decide if IOMMU_CACHE is used or not, so you are saying that there is
a condition where a device can be attached to an iommu_domain and
dev_is_dma_coherent() returns the wrong information? How does
dma-iommu.c safely use it then?

In any case I still need to do something about the places checking
IOMMU_CAP_CACHE_COHERENCY and thinking that means IOMMU_CACHE
works. Any idea?

Jason
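
For reference, the dma-iommu.c decision being referred to is roughly the
following (a simplified sketch of dma_info_to_prot() with the attrs
handling elided; not part of this patch):

static int dma_info_to_prot(enum dma_data_direction dir, bool coherent,
		unsigned long attrs)
{
	/* "coherent" is dev_is_dma_coherent(dev) at the call sites */
	int prot = coherent ? IOMMU_CACHE : 0;

	switch (dir) {
	case DMA_BIDIRECTIONAL:
		return prot | IOMMU_READ | IOMMU_WRITE;
	case DMA_TO_DEVICE:
		return prot | IOMMU_READ;
	case DMA_FROM_DEVICE:
		return prot | IOMMU_WRITE;
	default:
		return 0;
	}
}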
Jason Gunthorpe April 6, 2022, 2:24 p.m. UTC | #6
On Wed, Apr 06, 2022 at 02:56:56PM +0100, Robin Murphy wrote:
> On 2022-04-05 17:16, Jason Gunthorpe wrote:
> > vdpa and usnic are trying to test if IOMMU_CACHE is supported. The correct
> > way to do this is via dev_is_dma_coherent()
> 
> Not necessarily...
> 
> Disregarding the complete disaster of PCIe No Snoop on Arm-Based systems,
> there's the more interesting effectively-opposite scenario where an SMMU
> bridges non-coherent devices to a coherent interconnect. It's not something
> we take advantage of yet in Linux, and it can only be properly described in
> ACPI, but there do exist situations where IOMMU_CACHE is capable of making
> the device's traffic snoop, but dev_is_dma_coherent() - and
> device_get_dma_attr() for external users - would still say non-coherent
> because they can't assume that the SMMU is enabled and programmed in just
> the right way.

Oh, I didn't know about device_get_dma_attr()..

Considering your future issue, maybe this:

/*
 * true if the given domain supports IOMMU_CACHE and when dev is attached to
 * that domain it will have coherent DMA and require no cache
 * maintenance when IOMMU_CACHE is used.
 */
bool iommu_domain_supports_coherent_dma(struct iommu_domain *domain, struct device *dev)
{
	return device_get_dma_attr(dev) == DEV_DMA_COHERENT;
}

? In future it could become a domain op and the SMMU driver could
figure out the situation you described?

Jason
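
For illustration, with a helper along those lines vhost_vdpa_alloc_domain()
could do the check after allocating the domain instead of calling
dev_is_dma_coherent() (sketch only; the helper name is the suggestion above
and the error codes are assumptions):

	v->domain = iommu_domain_alloc(bus);
	if (!v->domain)
		return -EIO;

	if (!iommu_domain_supports_coherent_dma(v->domain, dma_dev)) {
		iommu_domain_free(v->domain);
		v->domain = NULL;
		return -ENOTSUPP;
	}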
Jason Gunthorpe April 6, 2022, 3:18 p.m. UTC | #7
On Wed, Apr 06, 2022 at 11:24:32AM -0300, Jason Gunthorpe wrote:
> On Wed, Apr 06, 2022 at 02:56:56PM +0100, Robin Murphy wrote:
> > On 2022-04-05 17:16, Jason Gunthorpe wrote:
> > > vdpa and usnic are trying to test if IOMMU_CACHE is supported. The correct
> > > way to do this is via dev_is_dma_coherent()
> > 
> > Not necessarily...
> > 
> > Disregarding the complete disaster of PCIe No Snoop on Arm-Based systems,
> > there's the more interesting effectively-opposite scenario where an SMMU
> > bridges non-coherent devices to a coherent interconnect. It's not something
> > we take advantage of yet in Linux, and it can only be properly described in
> > ACPI, but there do exist situations where IOMMU_CACHE is capable of making
> > the device's traffic snoop, but dev_is_dma_coherent() - and
> > device_get_dma_attr() for external users - would still say non-coherent
> > because they can't assume that the SMMU is enabled and programmed in just
> > the right way.
> 
> Oh, I didn't know about device_get_dma_attr()..
> 
> Considering your future issue, maybe this:
> 
> /*
>  * true if the given domain supports IOMMU_CACHE and when dev is attached to
>  * that domain it will have coherent DMA and require no cache
>  * maintenance when IOMMU_CACHE is used.
>  */
> bool iommu_domain_supports_coherent_dma(struct iommu_domain *domain, struct device *dev)
> {
> 	return device_get_dma_attr(dev) == DEV_DMA_COHERENT;
> }
> 
> ? In future it could become a domain op and the SMMU driver could
> figure out the situation you described?

I also spent some time looking at something like this:

struct iommu_domain *iommu_domain_alloc_coherent(struct device *device)
{
	if (device_get_dma_attr(device) != DEV_DMA_COHERENT)
		return NULL;
	return __iommu_domain_alloc(device->bus, IOMMU_DOMAIN_UNMANAGED);
}
EXPORT_SYMBOL_GPL(iommu_domain_alloc_coherent);

Which could evolve into passing the flag down to the iommu driver
and then it could ensure SMMU is "programmed in just the right way"
or fail?

Could also go like this:

#define IOMMU_DOMAIN_FLAG_COHERENT 1
struct iommu_domain *iommu_device_alloc_domain(struct device *device,
                                               unsigned int flags)

A new alloc option is not so easy to fit VFIO into right now though.

Any advice?

Thanks,
Jason
Christoph Hellwig April 6, 2022, 3:47 p.m. UTC | #8
On Wed, Apr 06, 2022 at 11:14:46AM -0300, Jason Gunthorpe wrote:
> Really? It is the only condition that dma_info_to_prot() tests to
> decide if IOMMU_CACHE is used or not, so you are saying that there is
> a condition where a device can be attached to an iommu_domain and
> dev_is_dma_coherent() returns the wrong information? How does
> dma-iommu.c safely use it then?

arm does not use dma-iommu.c
Robin Murphy April 6, 2022, 3:48 p.m. UTC | #9
On 2022-04-06 15:14, Jason Gunthorpe wrote:
> On Wed, Apr 06, 2022 at 03:51:50PM +0200, Christoph Hellwig wrote:
>> On Wed, Apr 06, 2022 at 09:07:30AM -0300, Jason Gunthorpe wrote:
>>> Didn't see it
>>>
>>> I'll move dev_is_dma_coherent to device.h along with
>>> device_iommu_mapped() and others then
>>
>> No.  It is internal for a reason.  It also doesn't actually work
>> outside of the dma core.  E.g. for non-swiotlb ARM configs it will
>> not actually work.
> 
> Really? It is the only condition that dma_info_to_prot() tests to
> decide if IOMMU_CACHE is used or not, so you are saying that there is
> a condition where a device can be attached to an iommu_domain and
> dev_is_dma_coherent() returns the wrong information? How does
> dma-iommu.c safely use it then?

The common iommu-dma layer happens to be part of the subset of the DMA 
core which *does* play the dev->dma_coherent game. 32-bit Arm has its 
own IOMMU DMA ops which do not. I don't know if the set of PowerPCs with 
CONFIG_NOT_COHERENT_CACHE intersects the set of PowerPCs that can do 
VFIO, but that would be another example if so.

> In any case I still need to do something about the places checking
> IOMMU_CAP_CACHE_COHERENCY and thinking that means IOMMU_CACHE
> works. Any idea?

Can we improve the IOMMU drivers such that that *can* be the case 
(within a reasonable margin of error)? That's kind of where I was hoping 
to head with device_iommu_capable(), e.g. [1].

Robin.

[1] 
https://gitlab.arm.com/linux-arm/linux-rm/-/commit/53390e9505b3791adedc0974e251e5c7360e402e
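
A minimal sketch of such a per-device query, assuming the answer comes from
the IOMMU instance registered for the device (the dev->iommu plumbing below
is an assumption; Robin's linked commit is the authoritative version):

bool device_iommu_capable(struct device *dev, enum iommu_cap cap)
{
	const struct iommu_ops *ops;

	/* Assumes the IOMMU core has populated dev->iommu at probe time */
	if (!dev->iommu || !dev->iommu->iommu_dev)
		return false;

	ops = dev->iommu->iommu_dev->ops;
	return ops->capable ? ops->capable(cap) : false;
}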
Jason Gunthorpe April 6, 2022, 3:48 p.m. UTC | #10
On Wed, Apr 06, 2022 at 05:47:23PM +0200, Christoph Hellwig wrote:
> On Wed, Apr 06, 2022 at 11:14:46AM -0300, Jason Gunthorpe wrote:
> > Really? It is the only condition that dma_info_to_prot() tests to
> > decide if IOMMU_CACHE is used or not, so you are saying that there is
> > a condition where a device can be attached to an iommu_domain and
> > dev_is_dma_coherent() returns the wrong information? How does
> > dma-iommu.c safely use it then?
> 
> arm does not use dma-iommu.c

Oh. I did not know that. Thanks

Jason
Christoph Hellwig April 6, 2022, 3:50 p.m. UTC | #11
On Wed, Apr 06, 2022 at 12:18:23PM -0300, Jason Gunthorpe wrote:
> > Oh, I didn't know about device_get_dma_attr()..

Which is completely broken for any non-OF, non-ACPI platform.
Jason Gunthorpe April 6, 2022, 4:06 p.m. UTC | #12
On Wed, Apr 06, 2022 at 05:50:56PM +0200, Christoph Hellwig wrote:
> On Wed, Apr 06, 2022 at 12:18:23PM -0300, Jason Gunthorpe wrote:
> > > Oh, I didn't know about device_get_dma_attr()..
> 
> Which is completely broken for any non-OF, non-ACPI platform.

I saw that, but I spent some time searching and could not find an
iommu driver that would load independently of OF or ACPI. ie no IOMMU
platform drivers are created by board files. Things like Intel/AMD
discover only from ACPI, etc.

So it might be OK in the constrained case of having an iommu_domain
attached to the device being queried?

Jason
Christoph Hellwig April 6, 2022, 4:10 p.m. UTC | #13
On Wed, Apr 06, 2022 at 01:06:23PM -0300, Jason Gunthorpe wrote:
> On Wed, Apr 06, 2022 at 05:50:56PM +0200, Christoph Hellwig wrote:
> > On Wed, Apr 06, 2022 at 12:18:23PM -0300, Jason Gunthorpe wrote:
> > > > Oh, I didn't know about device_get_dma_attr()..
> > 
> > Which is completely broken for any non-OF, non-ACPI platform.
> 
> I saw that, but I spent some time searching and could not find an
> iommu driver that would load independently of OF or ACPI. ie no IOMMU
> platform drivers are created by board files. Things like Intel/AMD
> discover only from ACPI, etc.

s390?
Jason Gunthorpe April 6, 2022, 5:17 p.m. UTC | #14
On Wed, Apr 06, 2022 at 06:10:31PM +0200, Christoph Hellwig wrote:
> On Wed, Apr 06, 2022 at 01:06:23PM -0300, Jason Gunthorpe wrote:
> > On Wed, Apr 06, 2022 at 05:50:56PM +0200, Christoph Hellwig wrote:
> > > On Wed, Apr 06, 2022 at 12:18:23PM -0300, Jason Gunthorpe wrote:
> > > > > Oh, I didn't know about device_get_dma_attr()..
> > > 
> > > Which is completely broken for any non-OF, non-ACPI platform.
> > 
> > I saw that, but I spent some time searching and could not find an
> > iommu driver that would load independently of OF or ACPI. ie no IOMMU
> > platform drivers are created by board files. Things like Intel/AMD
> > discover only from ACPI, etc.
> 
> s390?

Ah, I missed looking in s390, hyperv and virtio.. 

hyperv is not creating iommu_domains, just IRQ remapping

virtio is using OF

And s390 indeed doesn't obviously have OF or ACPI parts..

This seems like it would be consistent with other things:

enum dev_dma_attr device_get_dma_attr(struct device *dev)
{
	const struct fwnode_handle *fwnode = dev_fwnode(dev);
	struct acpi_device *adev = to_acpi_device_node(fwnode);

	if (is_of_node(fwnode)) {
		if (of_dma_is_coherent(to_of_node(fwnode)))
			return DEV_DMA_COHERENT;
		return DEV_DMA_NON_COHERENT;
	} else if (adev) {
		return acpi_get_dma_attr(adev);
	}

	/* Platform is always DMA coherent */
	if (!IS_ENABLED(CONFIG_ARCH_HAS_SYNC_DMA_FOR_DEVICE) &&
	    !IS_ENABLED(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) &&
	    !IS_ENABLED(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU_ALL) &&
	    device_iommu_mapped(dev))
		return DEV_DMA_COHERENT;
	return DEV_DMA_NOT_SUPPORTED;
}
EXPORT_SYMBOL_GPL(device_get_dma_attr);

i.e. s390 has no OF or ACPI but the entire platform is known DMA
coherent at config time so allow it. Not sure we need the
device_iommu_mapped() or not.

We could alternatively use existing device_get_dma_attr() as a default
with an iommu wrapper and push the exception down through the iommu
driver and s390 can override it.

Thanks,
Jason
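
A sketch of that alternative wrapper; the .get_dma_attr op below does not
exist, it only illustrates where an s390 (or other IOMMU driver) override
could live:

enum dev_dma_attr iommu_device_get_dma_attr(struct device *dev)
{
	const struct iommu_ops *ops = dev->bus->iommu_ops;

	/* Hypothetical per-driver override, e.g. s390 returning DEV_DMA_COHERENT */
	if (ops && ops->get_dma_attr)
		return ops->get_dma_attr(dev);

	/* Otherwise fall back to the OF/ACPI firmware description */
	return device_get_dma_attr(dev);
}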
Tian, Kevin April 7, 2022, 7:18 a.m. UTC | #15
> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Thursday, April 7, 2022 1:17 AM
> 
> On Wed, Apr 06, 2022 at 06:10:31PM +0200, Christoph Hellwig wrote:
> > On Wed, Apr 06, 2022 at 01:06:23PM -0300, Jason Gunthorpe wrote:
> > > On Wed, Apr 06, 2022 at 05:50:56PM +0200, Christoph Hellwig wrote:
> > > > On Wed, Apr 06, 2022 at 12:18:23PM -0300, Jason Gunthorpe wrote:
> > > > > > Oh, I didn't know about device_get_dma_attr()..
> > > >
> > > > Which is completely broken for any non-OF, non-ACPI platform.
> > >
> > > I saw that, but I spent some time searching and could not find an
> > > iommu driver that would load independently of OF or ACPI. ie no IOMMU
> > > platform drivers are created by board files. Things like Intel/AMD
> > > discover only from ACPI, etc.

Intel discovers IOMMUs (and optionally ACPI namespace devices) from
ACPI, but there is no ACPI description for PCI devices i.e. the current
logic of device_get_dma_attr() cannot be used on PCI devices. 

> >
> > s390?
> 
> Ah, I missed looking in s390, hyperv and virtio..
> 
> hyperv is not creating iommu_domains, just IRQ remapping
> 
> virtio is using OF
> 
> And s390 indeed doesn't obviously have OF or ACPI parts..
> 
> This seems like it would be consistent with other things:
> 
> enum dev_dma_attr device_get_dma_attr(struct device *dev)
> {
> 	const struct fwnode_handle *fwnode = dev_fwnode(dev);
> 	struct acpi_device *adev = to_acpi_device_node(fwnode);
> 
> 	if (is_of_node(fwnode)) {
> 		if (of_dma_is_coherent(to_of_node(fwnode)))
> 			return DEV_DMA_COHERENT;
> 		return DEV_DMA_NON_COHERENT;
> 	} else if (adev) {
> 		return acpi_get_dma_attr(adev);
> 	}
> 
> 	/* Platform is always DMA coherent */
> 	if (!IS_ENABLED(CONFIG_ARCH_HAS_SYNC_DMA_FOR_DEVICE) &&
> 	    !IS_ENABLED(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) &&
> 	    !IS_ENABLED(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU_ALL) &&
> 	    device_iommu_mapped(dev))
> 		return DEV_DMA_COHERENT;
> 	return DEV_DMA_NOT_SUPPORTED;
> }
> EXPORT_SYMBOL_GPL(device_get_dma_attr);
> 
> i.e. s390 has no OF or ACPI but the entire platform is known DMA
> coherent at config time so allow it. Not sure we need the
> device_iommu_mapped() or not.

Probably not. If adding an IOMMU may change the coherency, that would
come from specific IOPTE attributes for a device. Whether the device is
mapped by an IOMMU doesn't by itself convey that information.

> 
> We could alternatively use existing device_get_dma_attr() as a default
> with an iommu wrapper and push the exception down through the iommu
> driver and s390 can override it.
> 

if going this way probably device_get_dma_attr() should be renamed to
device_fwnode_get_dma_attr() instead to make it clearer?

Thanks
Kevin
Niklas Schnelle April 7, 2022, 8:53 a.m. UTC | #16
On Wed, 2022-04-06 at 14:17 -0300, Jason Gunthorpe wrote:
> On Wed, Apr 06, 2022 at 06:10:31PM +0200, Christoph Hellwig wrote:
> > On Wed, Apr 06, 2022 at 01:06:23PM -0300, Jason Gunthorpe wrote:
> > > On Wed, Apr 06, 2022 at 05:50:56PM +0200, Christoph Hellwig wrote:
> > > > On Wed, Apr 06, 2022 at 12:18:23PM -0300, Jason Gunthorpe wrote:
> > > > > > Oh, I didn't know about device_get_dma_attr()..
> > > > 
> > > > Which is completely broken for any non-OF, non-ACPI platform.
> > > 
> > > I saw that, but I spent some time searching and could not find an
> > > iommu driver that would load independently of OF or ACPI. ie no IOMMU
> > > platform drivers are created by board files. Things like Intel/AMD
> > > discover only from ACPI, etc.
> > 
> > s390?
> 
> Ah, I missed looking in s390, hyperv and virtio.. 
> 
> hyperv is not creating iommu_domains, just IRQ remapping
> 
> virtio is using OF
> 
> And s390 indeed doesn't obviously have OF or ACPI parts..
> 
> This seems like it would be consistent with other things:
> 
> enum dev_dma_attr device_get_dma_attr(struct device *dev)
> {
> 	const struct fwnode_handle *fwnode = dev_fwnode(dev);
> 	struct acpi_device *adev = to_acpi_device_node(fwnode);
> 
> 	if (is_of_node(fwnode)) {
> 		if (of_dma_is_coherent(to_of_node(fwnode)))
> 			return DEV_DMA_COHERENT;
> 		return DEV_DMA_NON_COHERENT;
> 	} else if (adev) {
> 		return acpi_get_dma_attr(adev);
> 	}
> 
> 	/* Platform is always DMA coherent */
> 	if (!IS_ENABLED(CONFIG_ARCH_HAS_SYNC_DMA_FOR_DEVICE) &&
> 	    !IS_ENABLED(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) &&
> 	    !IS_ENABLED(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU_ALL) &&
> 	    device_iommu_mapped(dev))
> 		return DEV_DMA_COHERENT;
> 	return DEV_DMA_NOT_SUPPORTED;
> }
> EXPORT_SYMBOL_GPL(device_get_dma_attr);
> 
> i.e. s390 has no OF or ACPI but the entire platform is known DMA
> coherent at config time so allow it. Not sure we need the
> device_iommu_mapped() or not.

I only took a short look but I think the device_iommu_mapped() call in
there is wrong. On s390 PCI always goes through IOMMU hardware both
when using the IOMMU API and when using the DMA API and this hardware
is always coherent. This is even true for s390 guests in QEMU/KVM and
under the z/VM hypervisor. As far as I can see device_iommu_mapped()'s
check for dev->iommu_group would only work while the device is under
IOMMU API control, not DMA API, no?

Also, while it is true that s390 *hardware* devices are always cache
coherent there is also the case that SWIOTLB is used for protected
virtualization and then cache flushing APIs must be used.
Jason Gunthorpe April 7, 2022, 1:59 p.m. UTC | #17
On Thu, Apr 07, 2022 at 07:18:48AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Thursday, April 7, 2022 1:17 AM
> > 
> > On Wed, Apr 06, 2022 at 06:10:31PM +0200, Christoph Hellwig wrote:
> > > On Wed, Apr 06, 2022 at 01:06:23PM -0300, Jason Gunthorpe wrote:
> > > > On Wed, Apr 06, 2022 at 05:50:56PM +0200, Christoph Hellwig wrote:
> > > > > On Wed, Apr 06, 2022 at 12:18:23PM -0300, Jason Gunthorpe wrote:
> > > > > > > Oh, I didn't know about device_get_dma_attr()..
> > > > >
> > > > > Which is completely broken for any non-OF, non-ACPI platform.
> > > >
> > > > I saw that, but I spent some time searching and could not find an
> > > > iommu driver that would load independently of OF or ACPI. ie no IOMMU
> > > > platform drivers are created by board files. Things like Intel/AMD
> > > > discover only from ACPI, etc.
> 
> Intel discovers IOMMUs (and optionally ACPI namespace devices) from
> ACPI, but there is no ACPI description for PCI devices i.e. the current
> logic of device_get_dma_attr() cannot be used on PCI devices. 

Oh? So on x86 acpi_get_dma_attr() returns DEV_DMA_NON_COHERENT or
DEV_DMA_NOT_SUPPORTED?

I think I should give up on this and just redefine the existing iommu
cap flag to IOMMU_CAP_CACHE_SUPPORTED or something.

> > We could alternatively use existing device_get_dma_attr() as a default
> > with an iommu wrapper and push the exception down through the iommu
> > driver and s390 can override it.
> > 
> 
> if going this way probably device_get_dma_attr() should be renamed to
> device_fwnode_get_dma_attr() instead to make it clearer?

I'm looking at the few users:

drivers/ata/ahci_ceva.c
drivers/ata/ahci_qoriq.c
 - These are ARM only drivers. They are trying to copy the dma-coherent
   property from its DT/ACPI definition to internal register settings
   which look like they tune how the AXI bus transactions are created.

   I'm guessing the SATA IP block's AXI interface can be configured to
   generate coherent or non-coherent requests and it has to be set
   in a way that is consistent with the SOC architecture and match
   what the DMA API expects the device will do.

drivers/crypto/ccp/sp-platform.c
 - Only used on ARM64 and also programs a HW register similar to the
   sata drivers. Refuses to work if the FW property is not present.

drivers/net/ethernet/amd/xgbe/xgbe-platform.c
 - Seems to be configuring another ARM AXI block

drivers/gpu/drm/panfrost/panfrost_drv.c
 - Robin's commit comment here is good, and one of the things this
   controls is if the coherent_walk is set for the io-pgtable-arm.c
   code which avoids DMA API calls

drivers/gpu/drm/tegra/uapi.c
 - Returns DRM_TEGRA_CHANNEL_CAP_CACHE_COHERENT to userspace. No idea.

My take is that the drivers using this API are doing it to make sure
their HW blocks are setup in a way that is consistent with the DMA API
they are also using, and run in constrained embedded-style
environments that know the firmware support is present.

So in the end it does not seem suitable right now for linking to
IOMMU_CACHE..

Jason
Robin Murphy April 7, 2022, 3:17 p.m. UTC | #18
On 2022-04-07 14:59, Jason Gunthorpe wrote:
> On Thu, Apr 07, 2022 at 07:18:48AM +0000, Tian, Kevin wrote:
>>> From: Jason Gunthorpe <jgg@nvidia.com>
>>> Sent: Thursday, April 7, 2022 1:17 AM
>>>
>>> On Wed, Apr 06, 2022 at 06:10:31PM +0200, Christoph Hellwig wrote:
>>>> On Wed, Apr 06, 2022 at 01:06:23PM -0300, Jason Gunthorpe wrote:
>>>>> On Wed, Apr 06, 2022 at 05:50:56PM +0200, Christoph Hellwig wrote:
>>>>>> On Wed, Apr 06, 2022 at 12:18:23PM -0300, Jason Gunthorpe wrote:
>>>>>>>> Oh, I didn't know about device_get_dma_attr()..
>>>>>>
>>>>>> Which is completely broken for any non-OF, non-ACPI platform.
>>>>>
>>>>> I saw that, but I spent some time searching and could not find an
>>>>> iommu driver that would load independently of OF or ACPI. ie no IOMMU
>>>>> platform drivers are created by board files. Things like Intel/AMD
>>>>> discover only from ACPI, etc.
>>
>> Intel discovers IOMMUs (and optionally ACPI namespace devices) from
>> ACPI, but there is no ACPI description for PCI devices i.e. the current
>> logic of device_get_dma_attr() cannot be used on PCI devices.
> 
> Oh? So on x86 acpi_get_dma_attr() returns DEV_DMA_NON_COHERENT or
> DEV_DMA_NOT_SUPPORTED?

I think it _should_ return DEV_DMA_COHERENT on x86/IA-64 (unless a _CCA 
method was actually present to say otherwise), based on 
acpi_init_coherency(), but I only know for sure what happens on arm64.

> I think I should give up on this and just redefine the existing iommu
> cap flag to IOMMU_CAP_CACHE_SUPPORTED or something.

TBH I don't see any issue with the current name, but I'd certainly be happy 
to nail down a specific definition for it, along the lines of "this 
means that IOMMU_CACHE mappings are generally coherent". That works for 
things like Arm's S2FWB making it OK to assign an otherwise-non-coherent 
device without extra hassle.

For the specific case of overriding PCIe No Snoop (which is more 
problematic from an Arm SMMU PoV) when assigning to a VM, would that not 
be easier solved by just having vfio-pci clear the "Enable No Snoop" 
control bit in the endpoint's PCIe capability?
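
For scale, forcing that from vfio-pci could be as small as the sketch below
(the helper name is made up, and whether/where vfio-pci should do this is
exactly the open question):

static void vfio_pci_block_nosnoop(struct pci_dev *pdev)
{
	/* Clear "Enable No Snoop" so the endpoint must issue snooped TLPs */
	pcie_capability_clear_word(pdev, PCI_EXP_DEVCTL,
				   PCI_EXP_DEVCTL_NOSNOOP_EN);
}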

>>> We could alternatively use existing device_get_dma_attr() as a default
>>> with an iommu wrapper and push the exception down through the iommu
>>> driver and s390 can override it.
>>>
>>
>> if going this way probably device_get_dma_attr() should be renamed to
>> device_fwnode_get_dma_attr() instead to make it clearer?
> 
> I'm looking at the few users:
> 
> drivers/ata/ahci_ceva.c
> drivers/ata/ahci_qoriq.c
>   - These are ARM only drivers. They are trying to copy the dma-coherent
>     property from its DT/ACPI definition to internal register settings
>     which look like they tune how the AXI bus transactions are created.
> 
>     I'm guessing the SATA IP block's AXI interface can be configured to
>     generate coherent or non-coherent requests and it has to be set
>     in a way that is consistent with the SOC architecture and match
>     what the DMA API expects the device will do.
> 
> drivers/crypto/ccp/sp-platform.c
>   - Only used on ARM64 and also programs a HW register similar to the
>     sata drivers. Refuses to work if the FW property is not present.
> 
> drivers/net/ethernet/amd/xgbe/xgbe-platform.c
>   - Seems to be configuring another ARM AXI block
> 
> drivers/gpu/drm/panfrost/panfrost_drv.c
>   - Robin's commit comment here is good, and one of the things this
>     controls is if the coherent_walk is set for the io-pgtable-arm.c
>     code which avoids DMA API calls
> 
> drivers/gpu/drm/tegra/uapi.c
>   - Returns DRM_TEGRA_CHANNEL_CAP_CACHE_COHERENT to userspace. No idea.
> 
> My take is that the drivers using this API are doing it to make sure
> their HW blocks are setup in a way that is consistent with the DMA API
> they are also using, and run in constrained embedded-style
> environments that know the firmware support is present.
> 
> So in the end it does not seem suitable right now for linking to
> IOMMU_CACHE..

That seems a pretty good summary - I think they're basically all 
"firmware told Linux I'm coherent so I'd better act coherent" cases, but 
that still doesn't necessarily mean that they're *forced* to respect 
that. One of the things on my to-do list is to try adding a 
DMA_ATTR_NO_SNOOP that can force DMA cache maintenance for coherent 
devices, primarily to hook up in Panfrost (where there is a bit of 
performance to claw back on the coherent AmLogic SoCs by leaving certain 
buffers non-cacheable).

Cheers,
Robin.
Jason Gunthorpe April 7, 2022, 3:23 p.m. UTC | #19
On Thu, Apr 07, 2022 at 04:17:11PM +0100, Robin Murphy wrote:

> For the specific case of overriding PCIe No Snoop (which is more problematic
> from an Arm SMMU PoV) when assigning to a VM, would that not be easier
> solved by just having vfio-pci clear the "Enable No Snoop" control bit in
> the endpoint's PCIe capability?

Ideally.

That was rediscussed recently, apparently there are non-compliant
devices and drivers that just ignore the bit. 

Presumably this is why x86 had to move to an IOMMU enforced feature..

> That seems a pretty good summary - I think they're basically all "firmware
> told Linux I'm coherent so I'd better act coherent" cases, but that still
> doesn't necessarily mean that they're *forced* to respect that. One of the
> things on my to-do list is to try adding a DMA_ATTR_NO_SNOOP that can force
> DMA cache maintenance for coherent devices, primarily to hook up in Panfrost
> (where there is a bit of performance to claw back on the coherent AmLogic
> SoCs by leaving certain buffers non-cacheable).

It would be great to see that in a way that could bring in the few other
GPU drivers doing no-snoop to a formal DMA API instead of hacking
their own stuff with wbinvd calls or whatever.

Thanks,
Jason
Christoph Hellwig April 7, 2022, 3:31 p.m. UTC | #20
On Thu, Apr 07, 2022 at 04:17:11PM +0100, Robin Murphy wrote:
>> My take is that the drivers using this API are doing it to make sure
>> their HW blocks are setup in a way that is consistent with the DMA API
>> they are also using, and run in constrained embedded-style
>> environments that know the firmware support is present.
>>
>> So in the end it does not seem suitable right now for linking to
>> IOMMU_CACHE..
>
> That seems a pretty good summary - I think they're basically all "firmware 
> told Linux I'm coherent so I'd better act coherent" cases, but that still 
> doesn't necessarily mean that they're *forced* to respect that.

Yes. And the interface is horribly misnamed for that.  I'll see what
I can do to clean this up as I've noticed various other not very
nice things in that area.

> One of the 
> things on my to-do list is to try adding a DMA_ATTR_NO_SNOOP that can force 
> DMA cache maintenance for coherent devices, primarily to hook up in 
> Panfrost (where there is a bit of performance to claw back on the 
> coherent AmLogic SoCs by leaving certain buffers non-cacheable).

This has been an explicit request from the amdgpu folks and thus been
on my TODO list for quite a while as well.  Note that I don't think it
should be a flag to dma_alloc_attrs, but rather for dma_alloc_pages
as the drivers that want non-snoop generally also want to actually
be able to deal with pages.
Alex Williamson April 7, 2022, 10:37 p.m. UTC | #21
On Thu, 7 Apr 2022 12:23:31 -0300
Jason Gunthorpe <jgg@nvidia.com> wrote:

> On Thu, Apr 07, 2022 at 04:17:11PM +0100, Robin Murphy wrote:
> 
> > For the specific case of overriding PCIe No Snoop (which is more problematic
> > from an Arm SMMU PoV) when assigning to a VM, would that not be easier
> > solved by just having vfio-pci clear the "Enable No Snoop" control bit in
> > the endpoint's PCIe capability?  
> 
> Ideally.
> 
> That was rediscussed recently, apparently there are non-compliant
> devices and drivers that just ignore the bit. 
> 
> Presumably this is why x86 had to move to an IOMMU enforced feature..

I considered this option when implementing the current solution, but
ultimately I didn't have confidence in being able to prevent drivers
from using device-specific means to effect the change anyway.  GPUs
especially have various back channels to config space.  Thanks,

Alex

Patch

diff --git a/drivers/infiniband/hw/usnic/usnic_uiom.c b/drivers/infiniband/hw/usnic/usnic_uiom.c
index 760b254ba42d6b..24d118198ac756 100644
--- a/drivers/infiniband/hw/usnic/usnic_uiom.c
+++ b/drivers/infiniband/hw/usnic/usnic_uiom.c
@@ -42,6 +42,7 @@ 
 #include <linux/list.h>
 #include <linux/pci.h>
 #include <rdma/ib_verbs.h>
+#include <linux/dma-map-ops.h>
 
 #include "usnic_log.h"
 #include "usnic_uiom.h"
@@ -474,6 +475,12 @@  int usnic_uiom_attach_dev_to_pd(struct usnic_uiom_pd *pd, struct device *dev)
 	struct usnic_uiom_dev *uiom_dev;
 	int err;
 
+	if (!dev_is_dma_coherent(dev)) {
+		usnic_err("IOMMU of %s does not support cache coherency\n",
+				dev_name(dev));
+		return -EINVAL;
+	}
+
 	uiom_dev = kzalloc(sizeof(*uiom_dev), GFP_ATOMIC);
 	if (!uiom_dev)
 		return -ENOMEM;
@@ -483,13 +490,6 @@  int usnic_uiom_attach_dev_to_pd(struct usnic_uiom_pd *pd, struct device *dev)
 	if (err)
 		goto out_free_dev;
 
-	if (!iommu_capable(dev->bus, IOMMU_CAP_CACHE_COHERENCY)) {
-		usnic_err("IOMMU of %s does not support cache coherency\n",
-				dev_name(dev));
-		err = -EINVAL;
-		goto out_detach_device;
-	}
-
 	spin_lock(&pd->lock);
 	list_add_tail(&uiom_dev->link, &pd->devs);
 	pd->dev_cnt++;
@@ -497,8 +497,6 @@  int usnic_uiom_attach_dev_to_pd(struct usnic_uiom_pd *pd, struct device *dev)
 
 	return 0;
 
-out_detach_device:
-	iommu_detach_device(pd->domain, dev);
 out_free_dev:
 	kfree(uiom_dev);
 	return err;
diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index 4c2f0bd062856a..05ea5800febc37 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -22,6 +22,7 @@ 
 #include <linux/vdpa.h>
 #include <linux/nospec.h>
 #include <linux/vhost.h>
+#include <linux/dma-map-ops.h>
 
 #include "vhost.h"
 
@@ -929,7 +930,7 @@  static int vhost_vdpa_alloc_domain(struct vhost_vdpa *v)
 	if (!bus)
 		return -EFAULT;
 
-	if (!iommu_capable(bus, IOMMU_CAP_CACHE_COHERENCY))
+	if (!dev_is_dma_coherent(dma_dev))
 		return -ENOTSUPP;
 
 	v->domain = iommu_domain_alloc(bus);