
iommu/dma: Explicitly sort PCI DMA windows

Message ID 65657c5370fa0161739ba094ea948afdfa711b8a.1647967875.git.robin.murphy@arm.com (mailing list archive)
State Superseded
Series iommu/dma: Explicitly sort PCI DMA windows

Commit Message

Robin Murphy March 22, 2022, 5:27 p.m. UTC
Originally, creating the dma_ranges resource list in pre-sorted fashion
was the simplest and most efficient way to enforce the order required by
iova_reserve_pci_windows(). However, since then at least one PCI host
driver has begun re-sorting the list for its own probe-time processing,
which doesn't seem entirely unreasonable, so that basic assumption no
longer holds. Make iommu-dma robust and get the sort order it needs by
explicitly sorting, which means we can also save the effort at creation
time and just build the list in whatever natural order the DT had.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---

Looking at this area off the back of the XGene thread[1] made me realise
that we need to do it anyway, regardless of whether it might also happen
to restore the previous XGene behaviour or not. Presumably nobody's
tried to use pcie-cadence-host behind an IOMMU yet...

Boot-tested on Juno to make sure I hadn't got the sort comparison
backwards.

Robin.

[1] https://lore.kernel.org/linux-pci/20220321104843.949645-1-maz@kernel.org/

 drivers/iommu/dma-iommu.c | 13 ++++++++++++-
 drivers/pci/of.c          |  7 +------
 2 files changed, 13 insertions(+), 7 deletions(-)

Comments

Marc Zyngier March 23, 2022, 9:49 a.m. UTC | #1
On Tue, 22 Mar 2022 17:27:36 +0000,
Robin Murphy <robin.murphy@arm.com> wrote:
> 
> Originally, creating the dma_ranges resource list in pre-sorted fashion
> was the simplest and most efficient way to enforce the order required by
> iova_reserve_pci_windows(). However since then at least one PCI host
> driver is now re-sorting the list for its own probe-time processing,
> which doesn't seem entirely unreasonable, so that basic assumption no
> longer holds. Make iommu-dma robust and get the sort order it needs by
> explicitly sorting, which means we can also save the effort at creation
> time and just build the list in whatever natural order the DT had.
> 
> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> ---
> 
> Looking at this area off the back of the XGene thread[1] made me realise
> that we need to do it anyway, regardless of whether it might also happen
> to restore the previous XGene behaviour or not. Presumably nobody's
> tried to use pcie-cadence-host behind an IOMMU yet...

This definitely restores PCIe functionality on my Mustang (XGene-1).
Hopefully dann can comment on whether this addresses his own issue, as
his firmware is significantly different.

Thanks,

	M.
dann frazier March 23, 2022, 10:15 p.m. UTC | #2
On Wed, Mar 23, 2022 at 09:49:04AM +0000, Marc Zyngier wrote:
> On Tue, 22 Mar 2022 17:27:36 +0000,
> Robin Murphy <robin.murphy@arm.com> wrote:
> > 
> > Originally, creating the dma_ranges resource list in pre-sorted fashion
> > was the simplest and most efficient way to enforce the order required by
> > iova_reserve_pci_windows(). However since then at least one PCI host
> > driver is now re-sorting the list for its own probe-time processing,
> > which doesn't seem entirely unreasonable, so that basic assumption no
> > longer holds. Make iommu-dma robust and get the sort order it needs by
> > explicitly sorting, which means we can also save the effort at creation
> > time and just build the list in whatever natural order the DT had.
> > 
> > Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> > ---
> > 
> > Looking at this area off the back of the XGene thread[1] made me realise
> > that we need to do it anyway, regardless of whether it might also happen
> > to restore the previous XGene behaviour or not. Presumably nobody's
> > tried to use pcie-cadence-host behind an IOMMU yet...
> 
> This definitely restores PCIe functionality on my Mustang (XGene-1).
> Hopefully dann can comment on whether this addresses his own issue, as
> his firmware is significantly different.

Robin, Marc,

Adding just this patch on top of v5.17 (w/ vendor dtb) isn't enough to
fix m400 networking:

  https://paste.ubuntu.com/p/H5ZNbRvP8V/

  -dann
Rob Herring March 24, 2022, 12:55 a.m. UTC | #3
On Wed, Mar 23, 2022 at 5:15 PM dann frazier <dann.frazier@canonical.com> wrote:
>
> On Wed, Mar 23, 2022 at 09:49:04AM +0000, Marc Zyngier wrote:
> > On Tue, 22 Mar 2022 17:27:36 +0000,
> > Robin Murphy <robin.murphy@arm.com> wrote:
> > >
> > > Originally, creating the dma_ranges resource list in pre-sorted fashion
> > > was the simplest and most efficient way to enforce the order required by
> > > iova_reserve_pci_windows(). However since then at least one PCI host
> > > driver is now re-sorting the list for its own probe-time processing,
> > > which doesn't seem entirely unreasonable, so that basic assumption no
> > > longer holds. Make iommu-dma robust and get the sort order it needs by
> > > explicitly sorting, which means we can also save the effort at creation
> > > time and just build the list in whatever natural order the DT had.
> > >
> > > Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> > > ---
> > >
> > > Looking at this area off the back of the XGene thread[1] made me realise
> > > that we need to do it anyway, regardless of whether it might also happen
> > > to restore the previous XGene behaviour or not. Presumably nobody's
> > > tried to use pcie-cadence-host behind an IOMMU yet...
> >
> > This definitely restores PCIe functionality on my Mustang (XGene-1).
> > Hopefully dann can comment on whether this addresses his own issue, as
> > his firmware is significantly different.
>
> Robin, Marc,
>
> Adding just this patch on top of v5.17 (w/ vendor dtb) isn't enough to
> fix m400 networking:

I wouldn't expect it to, given that both the IB register selection changed
and the 2nd dma-ranges entry is ignored.

Can you (and others) try out this branch:

git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux.git xgene-pci-fix

It should maintain the same IB register usage for both cases and
handle the error in 'dma-ranges'.

Rob
Rob Herring March 24, 2022, 12:56 a.m. UTC | #4
On Tue, Mar 22, 2022 at 12:27 PM Robin Murphy <robin.murphy@arm.com> wrote:
>
> Originally, creating the dma_ranges resource list in pre-sorted fashion
> was the simplest and most efficient way to enforce the order required by
> iova_reserve_pci_windows(). However since then at least one PCI host
> driver is now re-sorting the list for its own probe-time processing,
> which doesn't seem entirely unreasonable, so that basic assumption no
> longer holds. Make iommu-dma robust and get the sort order it needs by
> explicitly sorting, which means we can also save the effort at creation
> time and just build the list in whatever natural order the DT had.
>
> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> ---
>
> Looking at this area off the back of the XGene thread[1] made me realise
> that we need to do it anyway, regardless of whether it might also happen
> to restore the previous XGene behaviour or not. Presumably nobody's
> tried to use pcie-cadence-host behind an IOMMU yet...
>
> Boot-tested on Juno to make sure I hadn't got the sort comparison
> backwards.
>
> Robin.
>
> [1] https://lore.kernel.org/linux-pci/20220321104843.949645-1-maz@kernel.org/
>
>  drivers/iommu/dma-iommu.c | 13 ++++++++++++-
>  drivers/pci/of.c          |  7 +------
>  2 files changed, 13 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index b22034975301..91d134c0c9b1 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -20,6 +20,7 @@
>  #include <linux/iommu.h>
>  #include <linux/iova.h>
>  #include <linux/irq.h>
> +#include <linux/list_sort.h>
>  #include <linux/mm.h>
>  #include <linux/mutex.h>
>  #include <linux/pci.h>
> @@ -414,6 +415,15 @@ static int cookie_init_hw_msi_region(struct iommu_dma_cookie *cookie,
>         return 0;
>  }
>
> +static int iommu_dma_ranges_sort(void *priv, const struct list_head *a,
> +               const struct list_head *b)
> +{
> +       struct resource_entry *res_a = list_entry(a, typeof(*res_a), node);
> +       struct resource_entry *res_b = list_entry(b, typeof(*res_b), node);
> +
> +       return res_a->res->start > res_b->res->start;
> +}
> +
>  static int iova_reserve_pci_windows(struct pci_dev *dev,
>                 struct iova_domain *iovad)
>  {
> @@ -432,6 +442,7 @@ static int iova_reserve_pci_windows(struct pci_dev *dev,
>         }
>
>         /* Get reserved DMA windows from host bridge */
> +       list_sort(NULL, &bridge->dma_ranges, iommu_dma_ranges_sort);
>         resource_list_for_each_entry(window, &bridge->dma_ranges) {
>                 end = window->res->start - window->offset;
>  resv_iova:
> @@ -440,7 +451,7 @@ static int iova_reserve_pci_windows(struct pci_dev *dev,
>                         hi = iova_pfn(iovad, end);
>                         reserve_iova(iovad, lo, hi);
>                 } else if (end < start) {
> -                       /* dma_ranges list should be sorted */
> +                       /* DMA ranges should be non-overlapping */
>                         dev_err(&dev->dev,
>                                 "Failed to reserve IOVA [%pa-%pa]\n",
>                                 &start, &end);
> diff --git a/drivers/pci/of.c b/drivers/pci/of.c
> index cb2e8351c2cc..d176b4bc6193 100644
> --- a/drivers/pci/of.c
> +++ b/drivers/pci/of.c
> @@ -393,12 +393,7 @@ static int devm_of_pci_get_host_bridge_resources(struct device *dev,
>                         goto failed;
>                 }
>
> -               /* Keep the resource list sorted */
> -               resource_list_for_each_entry(entry, ib_resources)
> -                       if (entry->res->start > res->start)
> -                               break;
> -
> -               pci_add_resource_offset(&entry->node, res,

entry is now unused and causes a warning.

> +               pci_add_resource_offset(ib_resources, res,
>                                         res->start - range.pci_addr);
>         }
>
> --
> 2.28.0.dirty
>
dann frazier March 24, 2022, 3:09 a.m. UTC | #5
On Wed, Mar 23, 2022 at 07:55:23PM -0500, Rob Herring wrote:
> On Wed, Mar 23, 2022 at 5:15 PM dann frazier <dann.frazier@canonical.com> wrote:
> >
> > On Wed, Mar 23, 2022 at 09:49:04AM +0000, Marc Zyngier wrote:
> > > On Tue, 22 Mar 2022 17:27:36 +0000,
> > > Robin Murphy <robin.murphy@arm.com> wrote:
> > > >
> > > > Originally, creating the dma_ranges resource list in pre-sorted fashion
> > > > was the simplest and most efficient way to enforce the order required by
> > > > iova_reserve_pci_windows(). However since then at least one PCI host
> > > > driver is now re-sorting the list for its own probe-time processing,
> > > > which doesn't seem entirely unreasonable, so that basic assumption no
> > > > longer holds. Make iommu-dma robust and get the sort order it needs by
> > > > explicitly sorting, which means we can also save the effort at creation
> > > > time and just build the list in whatever natural order the DT had.
> > > >
> > > > Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> > > > ---
> > > >
> > > > Looking at this area off the back of the XGene thread[1] made me realise
> > > > that we need to do it anyway, regardless of whether it might also happen
> > > > to restore the previous XGene behaviour or not. Presumably nobody's
> > > > tried to use pcie-cadence-host behind an IOMMU yet...
> > >
> > > This definitely restores PCIe functionality on my Mustang (XGene-1).
> > > Hopefully dann can comment on whether this addresses his own issue, as
> > > his firmware is significantly different.
> >
> > Robin, Marc,
> >
> > Adding just this patch on top of v5.17 (w/ vendor dtb) isn't enough to
> > fix m400 networking:
> 
> I wouldn't expect it to given both the IB register selection changed
> and the 2nd dma-ranges entry is ignored.
> 
> Can you (and others) try out this branch:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux.git xgene-pci-fix
> 
> It should maintain the same IB register usage for both cases and
> handle the error in 'dma-ranges'.

Looks good Rob :)

https://paste.ubuntu.com/p/zJF9PKhQpS/


  -dann
Robin Murphy March 24, 2022, 10:08 a.m. UTC | #6
On 2022-03-24 00:56, Rob Herring wrote:
> On Tue, Mar 22, 2022 at 12:27 PM Robin Murphy <robin.murphy@arm.com> wrote:
>>
>> Originally, creating the dma_ranges resource list in pre-sorted fashion
>> was the simplest and most efficient way to enforce the order required by
>> iova_reserve_pci_windows(). However since then at least one PCI host
>> driver is now re-sorting the list for its own probe-time processing,
>> which doesn't seem entirely unreasonable, so that basic assumption no
>> longer holds. Make iommu-dma robust and get the sort order it needs by
>> explicitly sorting, which means we can also save the effort at creation
>> time and just build the list in whatever natural order the DT had.
>>
>> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
>> ---
>>
>> Looking at this area off the back of the XGene thread[1] made me realise
>> that we need to do it anyway, regardless of whether it might also happen
>> to restore the previous XGene behaviour or not. Presumably nobody's
>> tried to use pcie-cadence-host behind an IOMMU yet...
>>
>> Boot-tested on Juno to make sure I hadn't got the sort comparison
>> backwards.
>>
>> Robin.
>>
>> [1] https://lore.kernel.org/linux-pci/20220321104843.949645-1-maz@kernel.org/
>>
>>   drivers/iommu/dma-iommu.c | 13 ++++++++++++-
>>   drivers/pci/of.c          |  7 +------
>>   2 files changed, 13 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
>> index b22034975301..91d134c0c9b1 100644
>> --- a/drivers/iommu/dma-iommu.c
>> +++ b/drivers/iommu/dma-iommu.c
>> @@ -20,6 +20,7 @@
>>   #include <linux/iommu.h>
>>   #include <linux/iova.h>
>>   #include <linux/irq.h>
>> +#include <linux/list_sort.h>
>>   #include <linux/mm.h>
>>   #include <linux/mutex.h>
>>   #include <linux/pci.h>
>> @@ -414,6 +415,15 @@ static int cookie_init_hw_msi_region(struct iommu_dma_cookie *cookie,
>>          return 0;
>>   }
>>
>> +static int iommu_dma_ranges_sort(void *priv, const struct list_head *a,
>> +               const struct list_head *b)
>> +{
>> +       struct resource_entry *res_a = list_entry(a, typeof(*res_a), node);
>> +       struct resource_entry *res_b = list_entry(b, typeof(*res_b), node);
>> +
>> +       return res_a->res->start > res_b->res->start;
>> +}
>> +
>>   static int iova_reserve_pci_windows(struct pci_dev *dev,
>>                  struct iova_domain *iovad)
>>   {
>> @@ -432,6 +442,7 @@ static int iova_reserve_pci_windows(struct pci_dev *dev,
>>          }
>>
>>          /* Get reserved DMA windows from host bridge */
>> +       list_sort(NULL, &bridge->dma_ranges, iommu_dma_ranges_sort);
>>          resource_list_for_each_entry(window, &bridge->dma_ranges) {
>>                  end = window->res->start - window->offset;
>>   resv_iova:
>> @@ -440,7 +451,7 @@ static int iova_reserve_pci_windows(struct pci_dev *dev,
>>                          hi = iova_pfn(iovad, end);
>>                          reserve_iova(iovad, lo, hi);
>>                  } else if (end < start) {
>> -                       /* dma_ranges list should be sorted */
>> +                       /* DMA ranges should be non-overlapping */
>>                          dev_err(&dev->dev,
>>                                  "Failed to reserve IOVA [%pa-%pa]\n",
>>                                  &start, &end);
>> diff --git a/drivers/pci/of.c b/drivers/pci/of.c
>> index cb2e8351c2cc..d176b4bc6193 100644
>> --- a/drivers/pci/of.c
>> +++ b/drivers/pci/of.c
>> @@ -393,12 +393,7 @@ static int devm_of_pci_get_host_bridge_resources(struct device *dev,
>>                          goto failed;
>>                  }
>>
>> -               /* Keep the resource list sorted */
>> -               resource_list_for_each_entry(entry, ib_resources)
>> -                       if (entry->res->start > res->start)
>> -                               break;
>> -
>> -               pci_add_resource_offset(&entry->node, res,
> 
> entry is now unused and causes a warning.

Sigh, seems the problem with CONFIG_WERROR is that once you think it's 
enabled, you then stop paying much attention to the build log...

Thanks for the catch,
Robin.

> 
>> +               pci_add_resource_offset(ib_resources, res,
>>                                          res->start - range.pci_addr);
>>          }
>>
>> --
>> 2.28.0.dirty
>>

Patch

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index b22034975301..91d134c0c9b1 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -20,6 +20,7 @@ 
 #include <linux/iommu.h>
 #include <linux/iova.h>
 #include <linux/irq.h>
+#include <linux/list_sort.h>
 #include <linux/mm.h>
 #include <linux/mutex.h>
 #include <linux/pci.h>
@@ -414,6 +415,15 @@ static int cookie_init_hw_msi_region(struct iommu_dma_cookie *cookie,
 	return 0;
 }
 
+static int iommu_dma_ranges_sort(void *priv, const struct list_head *a,
+		const struct list_head *b)
+{
+	struct resource_entry *res_a = list_entry(a, typeof(*res_a), node);
+	struct resource_entry *res_b = list_entry(b, typeof(*res_b), node);
+
+	return res_a->res->start > res_b->res->start;
+}
+
 static int iova_reserve_pci_windows(struct pci_dev *dev,
 		struct iova_domain *iovad)
 {
@@ -432,6 +442,7 @@ static int iova_reserve_pci_windows(struct pci_dev *dev,
 	}
 
 	/* Get reserved DMA windows from host bridge */
+	list_sort(NULL, &bridge->dma_ranges, iommu_dma_ranges_sort);
 	resource_list_for_each_entry(window, &bridge->dma_ranges) {
 		end = window->res->start - window->offset;
 resv_iova:
@@ -440,7 +451,7 @@ static int iova_reserve_pci_windows(struct pci_dev *dev,
 			hi = iova_pfn(iovad, end);
 			reserve_iova(iovad, lo, hi);
 		} else if (end < start) {
-			/* dma_ranges list should be sorted */
+			/* DMA ranges should be non-overlapping */
 			dev_err(&dev->dev,
 				"Failed to reserve IOVA [%pa-%pa]\n",
 				&start, &end);
diff --git a/drivers/pci/of.c b/drivers/pci/of.c
index cb2e8351c2cc..d176b4bc6193 100644
--- a/drivers/pci/of.c
+++ b/drivers/pci/of.c
@@ -393,12 +393,7 @@ static int devm_of_pci_get_host_bridge_resources(struct device *dev,
 			goto failed;
 		}
 
-		/* Keep the resource list sorted */
-		resource_list_for_each_entry(entry, ib_resources)
-			if (entry->res->start > res->start)
-				break;
-
-		pci_add_resource_offset(&entry->node, res,
+		pci_add_resource_offset(ib_resources, res,
 					res->start - range.pci_addr);
 	}