Message ID | 1-v1-ef02c60ddb76+12ca2-intel_no_snoop_jgg@nvidia.com (mailing list archive) |
---|---|
State | Not Applicable |
Headers | show |
Series | Make the iommu driver no-snoop block feature consistent | expand |
Context | Check | Description |
---|---|---|
netdev/tree_selection | success | Not a local patch |
On Tue, Apr 05, 2022 at 01:16:00PM -0300, Jason Gunthorpe wrote: > diff --git a/drivers/infiniband/hw/usnic/usnic_uiom.c b/drivers/infiniband/hw/usnic/usnic_uiom.c > index 760b254ba42d6b..24d118198ac756 100644 > --- a/drivers/infiniband/hw/usnic/usnic_uiom.c > +++ b/drivers/infiniband/hw/usnic/usnic_uiom.c > @@ -42,6 +42,7 @@ > #include <linux/list.h> > #include <linux/pci.h> > #include <rdma/ib_verbs.h> > +#include <linux/dma-map-ops.h> > > #include "usnic_log.h" > #include "usnic_uiom.h" > @@ -474,6 +475,12 @@ int usnic_uiom_attach_dev_to_pd(struct usnic_uiom_pd *pd, struct device *dev) > struct usnic_uiom_dev *uiom_dev; > int err; > > + if (!dev_is_dma_coherent(dev)) { Which part of the comment at the top of dma-map-ops.h is not clear enough to you?
On Wed, Apr 06, 2022 at 07:30:39AM +0200, Christoph Hellwig wrote: > On Tue, Apr 05, 2022 at 01:16:00PM -0300, Jason Gunthorpe wrote: > > diff --git a/drivers/infiniband/hw/usnic/usnic_uiom.c b/drivers/infiniband/hw/usnic/usnic_uiom.c > > index 760b254ba42d6b..24d118198ac756 100644 > > +++ b/drivers/infiniband/hw/usnic/usnic_uiom.c > > @@ -42,6 +42,7 @@ > > #include <linux/list.h> > > #include <linux/pci.h> > > #include <rdma/ib_verbs.h> > > +#include <linux/dma-map-ops.h> > > > > #include "usnic_log.h" > > #include "usnic_uiom.h" > > @@ -474,6 +475,12 @@ int usnic_uiom_attach_dev_to_pd(struct usnic_uiom_pd *pd, struct device *dev) > > struct usnic_uiom_dev *uiom_dev; > > int err; > > > > + if (!dev_is_dma_coherent(dev)) { > > Which part of the comment at the top of dma-map-ops.h is not clear > enough to you? Didn't see it I'll move dev_is_dma_coherent to device.h along with device_iommu_mapped() and others then Thanks, Jason
On Wed, Apr 06, 2022 at 09:07:30AM -0300, Jason Gunthorpe wrote: > Didn't see it > > I'll move dev_is_dma_coherent to device.h along with > device_iommu_mapped() and others then No. It it is internal for a reason. It also doesn't actually work outside of the dma core. E.g. for non-swiotlb ARM configs it will not actually work.
On 2022-04-05 17:16, Jason Gunthorpe wrote: > vdpa and usnic are trying to test if IOMMU_CACHE is supported. The correct > way to do this is via dev_is_dma_coherent() Not necessarily... Disregarding the complete disaster of PCIe No Snoop on Arm-Based systems, there's the more interesting effectively-opposite scenario where an SMMU bridges non-coherent devices to a coherent interconnect. It's not something we take advantage of yet in Linux, and it can only be properly described in ACPI, but there do exist situations where IOMMU_CACHE is capable of making the device's traffic snoop, but dev_is_dma_coherent() - and device_get_dma_attr() for external users - would still say non-coherent because they can't assume that the SMMU is enabled and programmed in just the right way. I've also not thought too much about how things might look with S2FWB thrown into the mix in future... Robin. > like the DMA API does. If > IOMMU_CACHE is not supported then these drivers won't work as they don't > call any coherency-restoring routines around their DMAs. > > Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> > --- > drivers/infiniband/hw/usnic/usnic_uiom.c | 16 +++++++--------- > drivers/vhost/vdpa.c | 3 ++- > 2 files changed, 9 insertions(+), 10 deletions(-) > > diff --git a/drivers/infiniband/hw/usnic/usnic_uiom.c b/drivers/infiniband/hw/usnic/usnic_uiom.c > index 760b254ba42d6b..24d118198ac756 100644 > --- a/drivers/infiniband/hw/usnic/usnic_uiom.c > +++ b/drivers/infiniband/hw/usnic/usnic_uiom.c > @@ -42,6 +42,7 @@ > #include <linux/list.h> > #include <linux/pci.h> > #include <rdma/ib_verbs.h> > +#include <linux/dma-map-ops.h> > > #include "usnic_log.h" > #include "usnic_uiom.h" > @@ -474,6 +475,12 @@ int usnic_uiom_attach_dev_to_pd(struct usnic_uiom_pd *pd, struct device *dev) > struct usnic_uiom_dev *uiom_dev; > int err; > > + if (!dev_is_dma_coherent(dev)) { > + usnic_err("IOMMU of %s does not support cache coherency\n", > + dev_name(dev)); > + return -EINVAL; > + } > + > uiom_dev = kzalloc(sizeof(*uiom_dev), GFP_ATOMIC); > if (!uiom_dev) > return -ENOMEM; > @@ -483,13 +490,6 @@ int usnic_uiom_attach_dev_to_pd(struct usnic_uiom_pd *pd, struct device *dev) > if (err) > goto out_free_dev; > > - if (!iommu_capable(dev->bus, IOMMU_CAP_CACHE_COHERENCY)) { > - usnic_err("IOMMU of %s does not support cache coherency\n", > - dev_name(dev)); > - err = -EINVAL; > - goto out_detach_device; > - } > - > spin_lock(&pd->lock); > list_add_tail(&uiom_dev->link, &pd->devs); > pd->dev_cnt++; > @@ -497,8 +497,6 @@ int usnic_uiom_attach_dev_to_pd(struct usnic_uiom_pd *pd, struct device *dev) > > return 0; > > -out_detach_device: > - iommu_detach_device(pd->domain, dev); > out_free_dev: > kfree(uiom_dev); > return err; > diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c > index 4c2f0bd062856a..05ea5800febc37 100644 > --- a/drivers/vhost/vdpa.c > +++ b/drivers/vhost/vdpa.c > @@ -22,6 +22,7 @@ > #include <linux/vdpa.h> > #include <linux/nospec.h> > #include <linux/vhost.h> > +#include <linux/dma-map-ops.h> > > #include "vhost.h" > > @@ -929,7 +930,7 @@ static int vhost_vdpa_alloc_domain(struct vhost_vdpa *v) > if (!bus) > return -EFAULT; > > - if (!iommu_capable(bus, IOMMU_CAP_CACHE_COHERENCY)) > + if (!dev_is_dma_coherent(dma_dev)) > return -ENOTSUPP; > > v->domain = iommu_domain_alloc(bus);
On Wed, Apr 06, 2022 at 03:51:50PM +0200, Christoph Hellwig wrote: > On Wed, Apr 06, 2022 at 09:07:30AM -0300, Jason Gunthorpe wrote: > > Didn't see it > > > > I'll move dev_is_dma_coherent to device.h along with > > device_iommu_mapped() and others then > > No. It it is internal for a reason. It also doesn't actually work > outside of the dma core. E.g. for non-swiotlb ARM configs it will > not actually work. Really? It is the only condition that dma_info_to_prot() tests to decide of IOMMU_CACHE is used or not, so you are saying that there is a condition where a device can be attached to an iommu_domain and dev_is_dma_coherent() returns the wrong information? How does dma-iommu.c safely use it then? In any case I still need to do something about the places checking IOMMU_CAP_CACHE_COHERENCY and thinking that means IOMMU_CACHE works. Any idea? Jason
On Wed, Apr 06, 2022 at 02:56:56PM +0100, Robin Murphy wrote: > On 2022-04-05 17:16, Jason Gunthorpe wrote: > > vdpa and usnic are trying to test if IOMMU_CACHE is supported. The correct > > way to do this is via dev_is_dma_coherent() > > Not necessarily... > > Disregarding the complete disaster of PCIe No Snoop on Arm-Based systems, > there's the more interesting effectively-opposite scenario where an SMMU > bridges non-coherent devices to a coherent interconnect. It's not something > we take advantage of yet in Linux, and it can only be properly described in > ACPI, but there do exist situations where IOMMU_CACHE is capable of making > the device's traffic snoop, but dev_is_dma_coherent() - and > device_get_dma_attr() for external users - would still say non-coherent > because they can't assume that the SMMU is enabled and programmed in just > the right way. Oh, I didn't know about device_get_dma_attr().. Considering your future issue, maybe this: /* * true if the given domain supports IOMMU_CACHE and when dev is attached to * that domain it will have coherent DMA and require no cache * maintenance when IOMMU_CACHE is used. */ bool iommu_domain_supports_coherent_dma(struct iommu_domain *domain, struct device *dev) { return device_get_dma_attr(dev) == DEV_DMA_COHERENT; } ? In future it could become a domain op and the SMMU driver could figure out the situation you described? Jason
On Wed, Apr 06, 2022 at 11:24:32AM -0300, Jason Gunthorpe wrote: > On Wed, Apr 06, 2022 at 02:56:56PM +0100, Robin Murphy wrote: > > On 2022-04-05 17:16, Jason Gunthorpe wrote: > > > vdpa and usnic are trying to test if IOMMU_CACHE is supported. The correct > > > way to do this is via dev_is_dma_coherent() > > > > Not necessarily... > > > > Disregarding the complete disaster of PCIe No Snoop on Arm-Based systems, > > there's the more interesting effectively-opposite scenario where an SMMU > > bridges non-coherent devices to a coherent interconnect. It's not something > > we take advantage of yet in Linux, and it can only be properly described in > > ACPI, but there do exist situations where IOMMU_CACHE is capable of making > > the device's traffic snoop, but dev_is_dma_coherent() - and > > device_get_dma_attr() for external users - would still say non-coherent > > because they can't assume that the SMMU is enabled and programmed in just > > the right way. > > Oh, I didn't know about device_get_dma_attr().. > > Considering your future issue, maybe this: > > /* > * true if the given domain supports IOMMU_CACHE and when dev is attached to > * that domain it will have coherent DMA and require no cache > * maintenance when IOMMU_CACHE is used. > */ > bool iommu_domain_supports_coherent_dma(struct iommu_domain *domain, struct device *dev) > { > return device_get_dma_attr(dev) == DEV_DMA_COHERENT; > } > > ? In future it could become a domain op and the SMMU driver could > figure out the situation you described? I also spent some time looking at something like this: struct iommu_domain *iommu_domain_alloc_coherent(struct device *device) { if (device_get_dma_attr(device) == DEV_DMA_COHERENT) return NULL; return __iommu_domain_alloc(device->bus, IOMMU_DOMAIN_UNMANAGED); } EXPORT_SYMBOL_GPL(iommu_domain_alloc_coherent); Which could evolve into to passing the flag down to the iommu driver and then it could ensure SMMU is "programmed in just the right way" or fail? Could also go like this: #define IOMMU_DOMAIN_FLAG_COHERENT 1 struct iommu_domain *iommu_device_alloc_domain(struct device *device, unsigned int flags) A new alloc option is not so easy to fit VFIO into right now though. Advices? Thanks, Jason
On Wed, Apr 06, 2022 at 11:14:46AM -0300, Jason Gunthorpe wrote: > Really? It is the only condition that dma_info_to_prot() tests to > decide of IOMMU_CACHE is used or not, so you are saying that there is > a condition where a device can be attached to an iommu_domain and > dev_is_dma_coherent() returns the wrong information? How does > dma-iommu.c safely use it then? arm does not use dma-iommu.c
On 2022-04-06 15:14, Jason Gunthorpe wrote: > On Wed, Apr 06, 2022 at 03:51:50PM +0200, Christoph Hellwig wrote: >> On Wed, Apr 06, 2022 at 09:07:30AM -0300, Jason Gunthorpe wrote: >>> Didn't see it >>> >>> I'll move dev_is_dma_coherent to device.h along with >>> device_iommu_mapped() and others then >> >> No. It it is internal for a reason. It also doesn't actually work >> outside of the dma core. E.g. for non-swiotlb ARM configs it will >> not actually work. > > Really? It is the only condition that dma_info_to_prot() tests to > decide of IOMMU_CACHE is used or not, so you are saying that there is > a condition where a device can be attached to an iommu_domain and > dev_is_dma_coherent() returns the wrong information? How does > dma-iommu.c safely use it then? The common iommu-dma layer happens to be part of the subset of the DMA core which *does* play the dev->dma_coherent game. 32-bit Arm has its own IOMMU DMA ops which do not. I don't know if the set of PowerPCs with CONFIG_NOT_COHERENT_CACHE intersects the set of PowerPCs that can do VFIO, but that would be another example if so. > In any case I still need to do something about the places checking > IOMMU_CAP_CACHE_COHERENCY and thinking that means IOMMU_CACHE > works. Any idea? Can we improve the IOMMU drivers such that that *can* be the case (within a reasonable margin of error)? That's kind of where I was hoping to head with device_iommu_capable(), e.g. [1]. Robin. [1] https://gitlab.arm.com/linux-arm/linux-rm/-/commit/53390e9505b3791adedc0974e251e5c7360e402e
On Wed, Apr 06, 2022 at 05:47:23PM +0200, Christoph Hellwig wrote: > On Wed, Apr 06, 2022 at 11:14:46AM -0300, Jason Gunthorpe wrote: > > Really? It is the only condition that dma_info_to_prot() tests to > > decide of IOMMU_CACHE is used or not, so you are saying that there is > > a condition where a device can be attached to an iommu_domain and > > dev_is_dma_coherent() returns the wrong information? How does > > dma-iommu.c safely use it then? > > arm does not use dma-iommu.c Oh. I did not know that. Thanks Jason
On Wed, Apr 06, 2022 at 12:18:23PM -0300, Jason Gunthorpe wrote:
> > Oh, I didn't know about device_get_dma_attr()..
Which is completely broken for any non-OF, non-ACPI plaform.
On Wed, Apr 06, 2022 at 05:50:56PM +0200, Christoph Hellwig wrote: > On Wed, Apr 06, 2022 at 12:18:23PM -0300, Jason Gunthorpe wrote: > > > Oh, I didn't know about device_get_dma_attr().. > > Which is completely broken for any non-OF, non-ACPI plaform. I saw that, but I spent some time searching and could not find an iommu driver that would load independently of OF or ACPI. ie no IOMMU platform drivers are created by board files. Things like Intel/AMD discover only from ACPI, etc. So it might be OK in the constrained case of having an iommu_domain attached to the device being queried? Jason
On Wed, Apr 06, 2022 at 01:06:23PM -0300, Jason Gunthorpe wrote: > On Wed, Apr 06, 2022 at 05:50:56PM +0200, Christoph Hellwig wrote: > > On Wed, Apr 06, 2022 at 12:18:23PM -0300, Jason Gunthorpe wrote: > > > > Oh, I didn't know about device_get_dma_attr().. > > > > Which is completely broken for any non-OF, non-ACPI plaform. > > I saw that, but I spent some time searching and could not find an > iommu driver that would load independently of OF or ACPI. ie no IOMMU > platform drivers are created by board files. Things like Intel/AMD > discover only from ACPI, etc. s390?
On Wed, Apr 06, 2022 at 06:10:31PM +0200, Christoph Hellwig wrote: > On Wed, Apr 06, 2022 at 01:06:23PM -0300, Jason Gunthorpe wrote: > > On Wed, Apr 06, 2022 at 05:50:56PM +0200, Christoph Hellwig wrote: > > > On Wed, Apr 06, 2022 at 12:18:23PM -0300, Jason Gunthorpe wrote: > > > > > Oh, I didn't know about device_get_dma_attr().. > > > > > > Which is completely broken for any non-OF, non-ACPI plaform. > > > > I saw that, but I spent some time searching and could not find an > > iommu driver that would load independently of OF or ACPI. ie no IOMMU > > platform drivers are created by board files. Things like Intel/AMD > > discover only from ACPI, etc. > > s390? Ah, I missed looking in s390, hyperv and virtio.. hyperv is not creating iommu_domains, just IRQ remapping virtio is using OF And s390 indeed doesn't obviously have OF or ACPI parts.. This seems like it would be consistent with other things: enum dev_dma_attr device_get_dma_attr(struct device *dev) { const struct fwnode_handle *fwnode = dev_fwnode(dev); struct acpi_device *adev = to_acpi_device_node(fwnode); if (is_of_node(fwnode)) { if (of_dma_is_coherent(to_of_node(fwnode))) return DEV_DMA_COHERENT; return DEV_DMA_NON_COHERENT; } else if (adev) { return acpi_get_dma_attr(adev); } /* Platform is always DMA coherent */ if (!IS_ENABLED(CONFIG_ARCH_HAS_SYNC_DMA_FOR_DEVICE) && !IS_ENABLED(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) && !IS_ENABLED(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU_ALL) && device_iommu_mapped(dev)) return DEV_DMA_COHERENT; return DEV_DMA_NOT_SUPPORTED; } EXPORT_SYMBOL_GPL(device_get_dma_attr); ie s390 has no of or acpi but the entire platform is known DMA coherent at config time so allow it. Not sure we need the device_iommu_mapped() or not. We could alternatively use existing device_get_dma_attr() as a default with an iommu wrapper and push the exception down through the iommu driver and s390 can override it. Thanks, Jason
> From: Jason Gunthorpe <jgg@nvidia.com> > Sent: Thursday, April 7, 2022 1:17 AM > > On Wed, Apr 06, 2022 at 06:10:31PM +0200, Christoph Hellwig wrote: > > On Wed, Apr 06, 2022 at 01:06:23PM -0300, Jason Gunthorpe wrote: > > > On Wed, Apr 06, 2022 at 05:50:56PM +0200, Christoph Hellwig wrote: > > > > On Wed, Apr 06, 2022 at 12:18:23PM -0300, Jason Gunthorpe wrote: > > > > > > Oh, I didn't know about device_get_dma_attr().. > > > > > > > > Which is completely broken for any non-OF, non-ACPI plaform. > > > > > > I saw that, but I spent some time searching and could not find an > > > iommu driver that would load independently of OF or ACPI. ie no IOMMU > > > platform drivers are created by board files. Things like Intel/AMD > > > discover only from ACPI, etc. Intel discovers IOMMUs (and optionally ACPI namespace devices) from ACPI, but there is no ACPI description for PCI devices i.e. the current logic of device_get_dma_attr() cannot be used on PCI devices. > > > > s390? > > Ah, I missed looking in s390, hyperv and virtio.. > > hyperv is not creating iommu_domains, just IRQ remapping > > virtio is using OF > > And s390 indeed doesn't obviously have OF or ACPI parts.. > > This seems like it would be consistent with other things: > > enum dev_dma_attr device_get_dma_attr(struct device *dev) > { > const struct fwnode_handle *fwnode = dev_fwnode(dev); > struct acpi_device *adev = to_acpi_device_node(fwnode); > > if (is_of_node(fwnode)) { > if (of_dma_is_coherent(to_of_node(fwnode))) > return DEV_DMA_COHERENT; > return DEV_DMA_NON_COHERENT; > } else if (adev) { > return acpi_get_dma_attr(adev); > } > > /* Platform is always DMA coherent */ > if (!IS_ENABLED(CONFIG_ARCH_HAS_SYNC_DMA_FOR_DEVICE) && > !IS_ENABLED(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) && > !IS_ENABLED(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU_ALL) && > device_iommu_mapped(dev)) > return DEV_DMA_COHERENT; > return DEV_DMA_NOT_SUPPORTED; > } > EXPORT_SYMBOL_GPL(device_get_dma_attr); > > ie s390 has no of or acpi but the entire platform is known DMA > coherent at config time so allow it. Not sure we need the > device_iommu_mapped() or not. Probably not. If adding an iommu may change the coherency it would come from specific IOPTE attributes for a device. The fact whether the device is mapped by iommu doesn't tell that information. > > We could alternatively use existing device_get_dma_attr() as a default > with an iommu wrapper and push the exception down through the iommu > driver and s390 can override it. > if going this way probably device_get_dma_attr() should be renamed to device_fwnode_get_dma_attr() instead to make it clearer? Thanks Kevin
On Wed, 2022-04-06 at 14:17 -0300, Jason Gunthorpe wrote: > On Wed, Apr 06, 2022 at 06:10:31PM +0200, Christoph Hellwig wrote: > > On Wed, Apr 06, 2022 at 01:06:23PM -0300, Jason Gunthorpe wrote: > > > On Wed, Apr 06, 2022 at 05:50:56PM +0200, Christoph Hellwig wrote: > > > > On Wed, Apr 06, 2022 at 12:18:23PM -0300, Jason Gunthorpe wrote: > > > > > > Oh, I didn't know about device_get_dma_attr().. > > > > > > > > Which is completely broken for any non-OF, non-ACPI plaform. > > > > > > I saw that, but I spent some time searching and could not find an > > > iommu driver that would load independently of OF or ACPI. ie no IOMMU > > > platform drivers are created by board files. Things like Intel/AMD > > > discover only from ACPI, etc. > > > > s390? > > Ah, I missed looking in s390, hyperv and virtio.. > > hyperv is not creating iommu_domains, just IRQ remapping > > virtio is using OF > > And s390 indeed doesn't obviously have OF or ACPI parts.. > > This seems like it would be consistent with other things: > > enum dev_dma_attr device_get_dma_attr(struct device *dev) > { > const struct fwnode_handle *fwnode = dev_fwnode(dev); > struct acpi_device *adev = to_acpi_device_node(fwnode); > > if (is_of_node(fwnode)) { > if (of_dma_is_coherent(to_of_node(fwnode))) > return DEV_DMA_COHERENT; > return DEV_DMA_NON_COHERENT; > } else if (adev) { > return acpi_get_dma_attr(adev); > } > > /* Platform is always DMA coherent */ > if (!IS_ENABLED(CONFIG_ARCH_HAS_SYNC_DMA_FOR_DEVICE) && > !IS_ENABLED(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) && > !IS_ENABLED(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU_ALL) && > device_iommu_mapped(dev)) > return DEV_DMA_COHERENT; > return DEV_DMA_NOT_SUPPORTED; > } > EXPORT_SYMBOL_GPL(device_get_dma_attr); > > ie s390 has no of or acpi but the entire platform is known DMA > coherent at config time so allow it. Not sure we need the > device_iommu_mapped() or not. I only took a short look but I think the device_iommu_mapped() call in there is wrong. On s390 PCI always goes through IOMMU hardware both when using the IOMMU API and when using the DMA API and this hardware is always coherent. This is even true for s390 guests in QEMU/KVM and under the z/VM hypervisor. As far as I can see device_iommu_mapped()'s check for dev->iommu_group would only work while the device is under IOMMU API control not DMA API, no? Also, while it is true that s390 *hardware* devices are always cache coherent there is also the case that SWIOTLB is used for protected virtualization and then cache flushing APIs must be used.
On Thu, Apr 07, 2022 at 07:18:48AM +0000, Tian, Kevin wrote: > > From: Jason Gunthorpe <jgg@nvidia.com> > > Sent: Thursday, April 7, 2022 1:17 AM > > > > On Wed, Apr 06, 2022 at 06:10:31PM +0200, Christoph Hellwig wrote: > > > On Wed, Apr 06, 2022 at 01:06:23PM -0300, Jason Gunthorpe wrote: > > > > On Wed, Apr 06, 2022 at 05:50:56PM +0200, Christoph Hellwig wrote: > > > > > On Wed, Apr 06, 2022 at 12:18:23PM -0300, Jason Gunthorpe wrote: > > > > > > > Oh, I didn't know about device_get_dma_attr().. > > > > > > > > > > Which is completely broken for any non-OF, non-ACPI plaform. > > > > > > > > I saw that, but I spent some time searching and could not find an > > > > iommu driver that would load independently of OF or ACPI. ie no IOMMU > > > > platform drivers are created by board files. Things like Intel/AMD > > > > discover only from ACPI, etc. > > Intel discovers IOMMUs (and optionally ACPI namespace devices) from > ACPI, but there is no ACPI description for PCI devices i.e. the current > logic of device_get_dma_attr() cannot be used on PCI devices. Oh? So on x86 acpi_get_dma_attr() returns DEV_DMA_NON_COHERENT or DEV_DMA_NOT_SUPPORTED? I think I should give up on this and just redefine the existing iommu cap flag to IOMMU_CAP_CACHE_SUPPORTED or something. > > We could alternatively use existing device_get_dma_attr() as a default > > with an iommu wrapper and push the exception down through the iommu > > driver and s390 can override it. > > > > if going this way probably device_get_dma_attr() should be renamed to > device_fwnode_get_dma_attr() instead to make it clearer? I'm looking at the few users: drivers/ata/ahci_ceva.c drivers/ata/ahci_qoriq.c - These are ARM only drivers. They are trying to copy the dma-coherent property from its DT/ACPI definition to internal register settings which look like they tune how the AXI bus transactions are created. I'm guessing the SATA IP block's AXI interface can be configured to generate coherent or non-coherent requests and it has to be set in a way that is consistent with the SOC architecture and match what the DMA API expects the device will do. drivers/crypto/ccp/sp-platform.c - Only used on ARM64 and also programs a HW register similar to the sata drivers. Refuses to work if the FW property is not present. drivers/net/ethernet/amd/xgbe/xgbe-platform.c - Seems to be configuring another ARM AXI block drivers/gpu/drm/panfrost/panfrost_drv.c - Robin's commit comment here is good, and one of the things this controls is if the coherent_walk is set for the io-pgtable-arm.c code which avoids DMA API calls drivers/gpu/drm/tegra/uapi.c - Returns DRM_TEGRA_CHANNEL_CAP_CACHE_COHERENT to userspace. No idea. My take is that the drivers using this API are doing it to make sure their HW blocks are setup in a way that is consistent with the DMA API they are also using, and run in constrained embedded-style environments that know the firmware support is present. So in the end it does not seem suitable right now for linking to IOMMU_CACHE.. Jason
On 2022-04-07 14:59, Jason Gunthorpe wrote: > On Thu, Apr 07, 2022 at 07:18:48AM +0000, Tian, Kevin wrote: >>> From: Jason Gunthorpe <jgg@nvidia.com> >>> Sent: Thursday, April 7, 2022 1:17 AM >>> >>> On Wed, Apr 06, 2022 at 06:10:31PM +0200, Christoph Hellwig wrote: >>>> On Wed, Apr 06, 2022 at 01:06:23PM -0300, Jason Gunthorpe wrote: >>>>> On Wed, Apr 06, 2022 at 05:50:56PM +0200, Christoph Hellwig wrote: >>>>>> On Wed, Apr 06, 2022 at 12:18:23PM -0300, Jason Gunthorpe wrote: >>>>>>>> Oh, I didn't know about device_get_dma_attr().. >>>>>> >>>>>> Which is completely broken for any non-OF, non-ACPI plaform. >>>>> >>>>> I saw that, but I spent some time searching and could not find an >>>>> iommu driver that would load independently of OF or ACPI. ie no IOMMU >>>>> platform drivers are created by board files. Things like Intel/AMD >>>>> discover only from ACPI, etc. >> >> Intel discovers IOMMUs (and optionally ACPI namespace devices) from >> ACPI, but there is no ACPI description for PCI devices i.e. the current >> logic of device_get_dma_attr() cannot be used on PCI devices. > > Oh? So on x86 acpi_get_dma_attr() returns DEV_DMA_NON_COHERENT or > DEV_DMA_NOT_SUPPORTED? I think it _should_ return DEV_DMA_COHERENT on x86/IA-64 (unless a _CCA method was actually present to say otherwise), based on acpi_init_coherency(), but I only know for sure what happens on arm64. > I think I should give up on this and just redefine the existing iommu > cap flag to IOMMU_CAP_CACHE_SUPPORTED or something. TBH I don't see any issue with current name, but I'd certainly be happy to nail down a specific definition for it, along the lines of "this means that IOMMU_CACHE mappings are generally coherent". That works for things like Arm's S2FWB making it OK to assign an otherwise-non-coherent device without extra hassle. For the specific case of overriding PCIe No Snoop (which is more problematic from an Arm SMMU PoV) when assigning to a VM, would that not be easier solved by just having vfio-pci clear the "Enable No Snoop" control bit in the endpoint's PCIe capability? >>> We could alternatively use existing device_get_dma_attr() as a default >>> with an iommu wrapper and push the exception down through the iommu >>> driver and s390 can override it. >>> >> >> if going this way probably device_get_dma_attr() should be renamed to >> device_fwnode_get_dma_attr() instead to make it clearer? > > I'm looking at the few users: > > drivers/ata/ahci_ceva.c > drivers/ata/ahci_qoriq.c > - These are ARM only drivers. They are trying to copy the dma-coherent > property from its DT/ACPI definition to internal register settings > which look like they tune how the AXI bus transactions are created. > > I'm guessing the SATA IP block's AXI interface can be configured to > generate coherent or non-coherent requests and it has to be set > in a way that is consistent with the SOC architecture and match > what the DMA API expects the device will do. > > drivers/crypto/ccp/sp-platform.c > - Only used on ARM64 and also programs a HW register similar to the > sata drivers. Refuses to work if the FW property is not present. > > drivers/net/ethernet/amd/xgbe/xgbe-platform.c > - Seems to be configuring another ARM AXI block > > drivers/gpu/drm/panfrost/panfrost_drv.c > - Robin's commit comment here is good, and one of the things this > controls is if the coherent_walk is set for the io-pgtable-arm.c > code which avoids DMA API calls > > drivers/gpu/drm/tegra/uapi.c > - Returns DRM_TEGRA_CHANNEL_CAP_CACHE_COHERENT to userspace. No idea. > > My take is that the drivers using this API are doing it to make sure > their HW blocks are setup in a way that is consistent with the DMA API > they are also using, and run in constrained embedded-style > environments that know the firmware support is present. > > So in the end it does not seem suitable right now for linking to > IOMMU_CACHE.. That seems a pretty good summary - I think they're basically all "firmware told Linux I'm coherent so I'd better act coherent" cases, but that still doesn't necessarily mean that they're *forced* to respect that. One of the things on my to-do list is to try adding a DMA_ATTR_NO_SNOOP that can force DMA cache maintenance for coherent devices, primarily to hook up in Panfrost (where there is a bit of a performance to claw back on the coherent AmLogic SoCs by leaving certain buffers non-cacheable). Cheers, Robin.
On Thu, Apr 07, 2022 at 04:17:11PM +0100, Robin Murphy wrote: > For the specific case of overriding PCIe No Snoop (which is more problematic > from an Arm SMMU PoV) when assigning to a VM, would that not be easier > solved by just having vfio-pci clear the "Enable No Snoop" control bit in > the endpoint's PCIe capability? Ideally. That was rediscussed recently, apparently there are non-compliant devices and drivers that just ignore the bit. Presumably this is why x86 had to move to an IOMMU enforced feature.. > That seems a pretty good summary - I think they're basically all "firmware > told Linux I'm coherent so I'd better act coherent" cases, but that still > doesn't necessarily mean that they're *forced* to respect that. One of the > things on my to-do list is to try adding a DMA_ATTR_NO_SNOOP that can force > DMA cache maintenance for coherent devices, primarily to hook up in Panfrost > (where there is a bit of a performance to claw back on the coherent AmLogic > SoCs by leaving certain buffers non-cacheable). It would be great to see that in a way that could bring in the few other GPU drivers doing no-snoop to a formal DMA API instead of hacking their own stuff with wbinvd calls or whatever. Thanks, Jason
On Thu, Apr 07, 2022 at 04:17:11PM +0100, Robin Murphy wrote: >> My take is that the drivers using this API are doing it to make sure >> their HW blocks are setup in a way that is consistent with the DMA API >> they are also using, and run in constrained embedded-style >> environments that know the firmware support is present. >> >> So in the end it does not seem suitable right now for linking to >> IOMMU_CACHE.. > > That seems a pretty good summary - I think they're basically all "firmware > told Linux I'm coherent so I'd better act coherent" cases, but that still > doesn't necessarily mean that they're *forced* to respect that. Yes. And the interface is horribly misnamed for that. I'll see what I can do to clean this up as I've noticed various other not very nice things in that area. > One of the > things on my to-do list is to try adding a DMA_ATTR_NO_SNOOP that can force > DMA cache maintenance for coherent devices, primarily to hook up in > Panfrost (where there is a bit of a performance to claw back on the > coherent AmLogic SoCs by leaving certain buffers non-cacheable). This has been an explicit request from the amdgpu folks and thus been on my TODO list for quite a while as well. Note that I don't think it should be a flag to dma_alloc_attrs, but rather for dma_alloc_pages as the drivers that want non-snoop generally also want to actually be able to deal with pages.
On Thu, 7 Apr 2022 12:23:31 -0300 Jason Gunthorpe <jgg@nvidia.com> wrote: > On Thu, Apr 07, 2022 at 04:17:11PM +0100, Robin Murphy wrote: > > > For the specific case of overriding PCIe No Snoop (which is more problematic > > from an Arm SMMU PoV) when assigning to a VM, would that not be easier > > solved by just having vfio-pci clear the "Enable No Snoop" control bit in > > the endpoint's PCIe capability? > > Ideally. > > That was rediscussed recently, apparently there are non-compliant > devices and drivers that just ignore the bit. > > Presumably this is why x86 had to move to an IOMMU enforced feature.. I considered this option when implementing the current solution, but ultimately I didn't have confidence in being able to prevent drivers from using device specific means to effect the change anyway. GPUs especially have various back channels to config space. Thanks, Alex
diff --git a/drivers/infiniband/hw/usnic/usnic_uiom.c b/drivers/infiniband/hw/usnic/usnic_uiom.c index 760b254ba42d6b..24d118198ac756 100644 --- a/drivers/infiniband/hw/usnic/usnic_uiom.c +++ b/drivers/infiniband/hw/usnic/usnic_uiom.c @@ -42,6 +42,7 @@ #include <linux/list.h> #include <linux/pci.h> #include <rdma/ib_verbs.h> +#include <linux/dma-map-ops.h> #include "usnic_log.h" #include "usnic_uiom.h" @@ -474,6 +475,12 @@ int usnic_uiom_attach_dev_to_pd(struct usnic_uiom_pd *pd, struct device *dev) struct usnic_uiom_dev *uiom_dev; int err; + if (!dev_is_dma_coherent(dev)) { + usnic_err("IOMMU of %s does not support cache coherency\n", + dev_name(dev)); + return -EINVAL; + } + uiom_dev = kzalloc(sizeof(*uiom_dev), GFP_ATOMIC); if (!uiom_dev) return -ENOMEM; @@ -483,13 +490,6 @@ int usnic_uiom_attach_dev_to_pd(struct usnic_uiom_pd *pd, struct device *dev) if (err) goto out_free_dev; - if (!iommu_capable(dev->bus, IOMMU_CAP_CACHE_COHERENCY)) { - usnic_err("IOMMU of %s does not support cache coherency\n", - dev_name(dev)); - err = -EINVAL; - goto out_detach_device; - } - spin_lock(&pd->lock); list_add_tail(&uiom_dev->link, &pd->devs); pd->dev_cnt++; @@ -497,8 +497,6 @@ int usnic_uiom_attach_dev_to_pd(struct usnic_uiom_pd *pd, struct device *dev) return 0; -out_detach_device: - iommu_detach_device(pd->domain, dev); out_free_dev: kfree(uiom_dev); return err; diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c index 4c2f0bd062856a..05ea5800febc37 100644 --- a/drivers/vhost/vdpa.c +++ b/drivers/vhost/vdpa.c @@ -22,6 +22,7 @@ #include <linux/vdpa.h> #include <linux/nospec.h> #include <linux/vhost.h> +#include <linux/dma-map-ops.h> #include "vhost.h" @@ -929,7 +930,7 @@ static int vhost_vdpa_alloc_domain(struct vhost_vdpa *v) if (!bus) return -EFAULT; - if (!iommu_capable(bus, IOMMU_CAP_CACHE_COHERENCY)) + if (!dev_is_dma_coherent(dma_dev)) return -ENOTSUPP; v->domain = iommu_domain_alloc(bus);
vdpa and usnic are trying to test if IOMMU_CACHE is supported. The correct way to do this is via dev_is_dma_coherent() like the DMA API does. If IOMMU_CACHE is not supported then these drivers won't work as they don't call any coherency-restoring routines around their DMAs. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> --- drivers/infiniband/hw/usnic/usnic_uiom.c | 16 +++++++--------- drivers/vhost/vdpa.c | 3 ++- 2 files changed, 9 insertions(+), 10 deletions(-)