diff mbox series

xen/arm: domain_build: Ignore device nodes with invalid addresses

Message ID YBmQQ3Tzu++AadKx@mattapan.m5p.com (mailing list archive)
State New
Headers show
Series xen/arm: domain_build: Ignore device nodes with invalid addresses | expand

Commit Message

Elliott Mitchell Feb. 2, 2021, 5:47 p.m. UTC
The handle_device() function has been returning failure upon
encountering a device address which was invalid.  A device tree which
had such an entry has now been seen in the wild.  As it causes no
failures to simply ignore the entries, ignore them.

Signed-off-by: Elliott Mitchell <ehem+xenn@m5p.com>

---
I'm starting to suspect there are an awful lot of places in the various
domain_build.c files which should simply ignore errors.  This is now the
second place I've encountered in 2 months where ignoring errors was the
correct action.  I know failing in case of error is an engineer's
favorite approach, but there seem an awful lot of harmless failures
causing panics.

This started as the thread "[RFC PATCH] xen/arm: domain_build: Ignore
empty memory bank".  Now it seems clear the correct approach is to simply
ignore these entries.

This seems a good candidate for backport to 4.14 and certainly should be
in 4.15.
---
 xen/arch/arm/domain_build.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

Comments

Julien Grall Feb. 2, 2021, 6:12 p.m. UTC | #1
Hi,

On 02/02/2021 17:47, Elliott Mitchell wrote:
> The handle_device() function has been returning failure upon
> encountering a device address which was invalid.  A device tree which
> had such an entry has now been seen in the wild.  As it causes no
> failures to simply ignore the entries, ignore them. >
> Signed-off-by: Elliott Mitchell <ehem+xenn@m5p.com>
> 
> ---
> I'm starting to suspect there are an awful lot of places in the various
> domain_build.c files which should simply ignore errors.  This is now the
> second place I've encountered in 2 months where ignoring errors was the
> correct action.

Right, as a counterpoint, we run Xen on Arm HW for several years now and 
this is the first time I heard about issue parsing the DT. So while I 
appreciate that you are eager to run Xen on the RPI...

>  I know failing in case of error is an engineer's
> favorite approach, but there seem an awful lot of harmless failures
> causing panics.
> 
> This started as the thread "[RFC PATCH] xen/arm: domain_build: Ignore
> empty memory bank".  Now it seems clear the correct approach is to simply
> ignore these entries.

... we first need to fully understand the issues. Here a few questions:
    1) Can you provide more information why you believe the address is 
invalid?
    2) How does Linux use the node?
    3) Is it happening with all the RPI DT? If not, what are the 
differences?

> 
> This seems a good candidate for backport to 4.14 and certainly should be
> in 4.15.
> ---
>   xen/arch/arm/domain_build.c | 6 +++---
>   1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
> index 374bf655ee..c0568b7579 100644
> --- a/xen/arch/arm/domain_build.c
> +++ b/xen/arch/arm/domain_build.c
> @@ -1407,9 +1407,9 @@ static int __init handle_device(struct domain *d, struct dt_device_node *dev,
>           res = dt_device_get_address(dev, i, &addr, &size);
>           if ( res )
>           {
> -            printk(XENLOG_ERR "Unable to retrieve address %u for %s\n",
> -                   i, dt_node_full_name(dev));
> -            return res;
> +            printk(XENLOG_ERR "Unable to retrieve address of %s, index %u\n",
> +                   dt_node_full_name(dev), i);
> +            continue;
>           }
>   
>           res = map_range_to_domain(dev, addr, size, &mr_data);
> 

Cheers,
Julien Grall Feb. 2, 2021, 7:21 p.m. UTC | #2
On 02/02/2021 18:12, Julien Grall wrote:
> Hi,
> 
> On 02/02/2021 17:47, Elliott Mitchell wrote:
>> The handle_device() function has been returning failure upon
>> encountering a device address which was invalid.  A device tree which
>> had such an entry has now been seen in the wild.  As it causes no
>> failures to simply ignore the entries, ignore them. >
>> Signed-off-by: Elliott Mitchell <ehem+xenn@m5p.com>
>>
>> ---
>> I'm starting to suspect there are an awful lot of places in the various
>> domain_build.c files which should simply ignore errors.  This is now the
>> second place I've encountered in 2 months where ignoring errors was the
>> correct action.
> 
> Right, as a counterpoint, we run Xen on Arm HW for several years now and 
> this is the first time I heard about issue parsing the DT. So while I 
> appreciate that you are eager to run Xen on the RPI...
> 
>>  I know failing in case of error is an engineer's
>> favorite approach, but there seem an awful lot of harmless failures
>> causing panics.
>>
>> This started as the thread "[RFC PATCH] xen/arm: domain_build: Ignore
>> empty memory bank".  Now it seems clear the correct approach is to simply
>> ignore these entries.
> 
> ... we first need to fully understand the issues. Here a few questions:
>     1) Can you provide more information why you believe the address is 
> invalid?
>     2) How does Linux use the node?
>     3) Is it happening with all the RPI DT? If not, what are the 
> differences?

So I had another look at the device-tree you provided earlier on. The 
node is the following (copied directly from the DTS):

&pcie0 {
         pci@1,0 {
                 #address-cells = <3>;
                 #size-cells = <2>;
                 ranges;

                 reg = <0 0 0 0 0>;

                 usb@1,0 {
                         reg = <0x10000 0 0 0 0>;
                         resets = <&reset 
RASPBERRYPI_FIRMWARE_RESET_ID_USB>;
                 };
         };
};

pcie0: pcie@7d500000 {
    compatible = "brcm,bcm2711-pcie";
    reg = <0x0 0x7d500000  0x0 0x9310>;
    device_type = "pci";
    #address-cells = <3>;
    #interrupt-cells = <1>;
    #size-cells = <2>;
    interrupts = <GIC_SPI 148 IRQ_TYPE_LEVEL_HIGH>,
                 <GIC_SPI 148 IRQ_TYPE_LEVEL_HIGH>;
    interrupt-names = "pcie", "msi";
    interrupt-map-mask = <0x0 0x0 0x0 0x7>;
    interrupt-map = <0 0 0 1 &gicv2 GIC_SPI 143
                                                      IRQ_TYPE_LEVEL_HIGH>;
    msi-controller;
    msi-parent = <&pcie0>;

    ranges = <0x02000000 0x0 0xc0000000 0x6 0x00000000
              0x0 0x40000000>;
    /*
     * The wrapper around the PCIe block has a bug
     * preventing it from accessing beyond the first 3GB of
     * memory.
     */
    dma-ranges = <0x02000000 0x0 0x00000000 0x0 0x00000000
                  0x0 0xc0000000>;
    brcm,enable-ssc;
};

The interpretation of "reg" depends on the context. In this case, we are 
trying to interpret as a memory address from the CPU PoV when it has a 
different meaning (I am not exactly sure what).

In fact, you are lucky that Xen doesn't manage to interpret it. Xen 
should really stop trying to look region to map when it discover a PCI 
bus. I wrote a quick hack patch that should ignore it:

diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
index 374bf655ee34..937fd1e387b7 100644
--- a/xen/arch/arm/domain_build.c
+++ b/xen/arch/arm/domain_build.c
@@ -1426,7 +1426,7 @@ static int __init handle_device(struct domain *d, 
struct dt_device_node *dev,

  static int __init handle_node(struct domain *d, struct kernel_info *kinfo,
                                struct dt_device_node *node,
-                              p2m_type_t p2mt)
+                              p2m_type_t p2mt, bool pci_bus)
  {
      static const struct dt_device_match skip_matches[] __initconst =
      {
@@ -1532,9 +1532,14 @@ static int __init handle_node(struct domain *d, 
struct kernel_info *kinfo,
                 "WARNING: Path %s is reserved, skip the node as we may 
re-use the path.\n",
                 path);

-    res = handle_device(d, node, p2mt);
-    if ( res)
-        return res;
+    if ( !pci_bus )
+    {
+        res = handle_device(d, node, p2mt);
+        if ( res)
+           return res;
+
+        pci_bus = dt_device_type_is_equal(node, "pci");
+    }

      /*
       * The property "name" is used to have a different name on older FDT
@@ -1554,7 +1559,7 @@ static int __init handle_node(struct domain *d, 
struct kernel_info *kinfo,

      for ( child = node->child; child != NULL; child = child->sibling )
      {
-        res = handle_node(d, kinfo, child, p2mt);
+        res = handle_node(d, kinfo, child, p2mt, pci_bus);
          if ( res )
              return res;
      }
@@ -2192,7 +2197,7 @@ static int __init prepare_dtb_hwdom(struct domain 
*d, struct kernel_info *kinfo)

      fdt_finish_reservemap(kinfo->fdt);

-    ret = handle_node(d, kinfo, dt_host, default_p2mt);
+    ret = handle_node(d, kinfo, dt_host, default_p2mt, false);
      if ( ret )
          goto err;

A less hackish possibility would be to modify dt_number_of_address() and 
return 0 when the device is a child of a PCI below.

Stefano, do you have any opinions?

Cheers,
Stefano Stabellini Feb. 3, 2021, 12:18 a.m. UTC | #3
On Tue, 2 Feb 2021, Julien Grall wrote:
> On 02/02/2021 18:12, Julien Grall wrote:
> > On 02/02/2021 17:47, Elliott Mitchell wrote:
> > > The handle_device() function has been returning failure upon
> > > encountering a device address which was invalid.  A device tree which
> > > had such an entry has now been seen in the wild.  As it causes no
> > > failures to simply ignore the entries, ignore them. >
> > > Signed-off-by: Elliott Mitchell <ehem+xenn@m5p.com>
> > > 
> > > ---
> > > I'm starting to suspect there are an awful lot of places in the various
> > > domain_build.c files which should simply ignore errors.  This is now the
> > > second place I've encountered in 2 months where ignoring errors was the
> > > correct action.
> > 
> > Right, as a counterpoint, we run Xen on Arm HW for several years now and
> > this is the first time I heard about issue parsing the DT. So while I
> > appreciate that you are eager to run Xen on the RPI...
> > 
> > >  I know failing in case of error is an engineer's
> > > favorite approach, but there seem an awful lot of harmless failures
> > > causing panics.
> > > 
> > > This started as the thread "[RFC PATCH] xen/arm: domain_build: Ignore
> > > empty memory bank".  Now it seems clear the correct approach is to simply
> > > ignore these entries.
> > 
> > ... we first need to fully understand the issues. Here a few questions:
> >     1) Can you provide more information why you believe the address is
> > invalid?
> >     2) How does Linux use the node?
> >     3) Is it happening with all the RPI DT? If not, what are the
> > differences?
> 
> So I had another look at the device-tree you provided earlier on. The node is
> the following (copied directly from the DTS):
> 
> &pcie0 {
>         pci@1,0 {
>                 #address-cells = <3>;
>                 #size-cells = <2>;
>                 ranges;
> 
>                 reg = <0 0 0 0 0>;
> 
>                 usb@1,0 {
>                         reg = <0x10000 0 0 0 0>;
>                         resets = <&reset RASPBERRYPI_FIRMWARE_RESET_ID_USB>;
>                 };
>         };
> };
> 
> pcie0: pcie@7d500000 {
>    compatible = "brcm,bcm2711-pcie";
>    reg = <0x0 0x7d500000  0x0 0x9310>;
>    device_type = "pci";
>    #address-cells = <3>;
>    #interrupt-cells = <1>;
>    #size-cells = <2>;
>    interrupts = <GIC_SPI 148 IRQ_TYPE_LEVEL_HIGH>,
>                 <GIC_SPI 148 IRQ_TYPE_LEVEL_HIGH>;
>    interrupt-names = "pcie", "msi";
>    interrupt-map-mask = <0x0 0x0 0x0 0x7>;
>    interrupt-map = <0 0 0 1 &gicv2 GIC_SPI 143
>                                                      IRQ_TYPE_LEVEL_HIGH>;
>    msi-controller;
>    msi-parent = <&pcie0>;
> 
>    ranges = <0x02000000 0x0 0xc0000000 0x6 0x00000000
>              0x0 0x40000000>;
>    /*
>     * The wrapper around the PCIe block has a bug
>     * preventing it from accessing beyond the first 3GB of
>     * memory.
>     */
>    dma-ranges = <0x02000000 0x0 0x00000000 0x0 0x00000000
>                  0x0 0xc0000000>;
>    brcm,enable-ssc;
> };
> 
> The interpretation of "reg" depends on the context. In this case, we are
> trying to interpret as a memory address from the CPU PoV when it has a
> different meaning (I am not exactly sure what).
> 
> In fact, you are lucky that Xen doesn't manage to interpret it. Xen should
> really stop trying to look region to map when it discover a PCI bus. I wrote a
> quick hack patch that should ignore it:

Yes, I think you are right. There are a few instances where "reg" is not
a address ready to be remapped. It is not just PCI, although that's the
most common.  Maybe we need a list, like skip_matches in handle_node.


> diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
> index 374bf655ee34..937fd1e387b7 100644
> --- a/xen/arch/arm/domain_build.c
> +++ b/xen/arch/arm/domain_build.c
> @@ -1426,7 +1426,7 @@ static int __init handle_device(struct domain *d, struct
> dt_device_node *dev,
> 
>  static int __init handle_node(struct domain *d, struct kernel_info *kinfo,
>                                struct dt_device_node *node,
> -                              p2m_type_t p2mt)
> +                              p2m_type_t p2mt, bool pci_bus)
>  {
>      static const struct dt_device_match skip_matches[] __initconst =
>      {
> @@ -1532,9 +1532,14 @@ static int __init handle_node(struct domain *d, struct
> kernel_info *kinfo,
>                 "WARNING: Path %s is reserved, skip the node as we may re-use
> the path.\n",
>                 path);
> 
> -    res = handle_device(d, node, p2mt);
> -    if ( res)
> -        return res;
> +    if ( !pci_bus )
> +    {
> +        res = handle_device(d, node, p2mt);
> +        if ( res)
> +           return res;
> +
> +        pci_bus = dt_device_type_is_equal(node, "pci");
> +    }
> 
>      /*
>       * The property "name" is used to have a different name on older FDT
> @@ -1554,7 +1559,7 @@ static int __init handle_node(struct domain *d, struct
> kernel_info *kinfo,
> 
>      for ( child = node->child; child != NULL; child = child->sibling )
>      {
> -        res = handle_node(d, kinfo, child, p2mt);
> +        res = handle_node(d, kinfo, child, p2mt, pci_bus);
>          if ( res )
>              return res;
>      }
> @@ -2192,7 +2197,7 @@ static int __init prepare_dtb_hwdom(struct domain *d,
> struct kernel_info *kinfo)
> 
>      fdt_finish_reservemap(kinfo->fdt);
> 
> -    ret = handle_node(d, kinfo, dt_host, default_p2mt);
> +    ret = handle_node(d, kinfo, dt_host, default_p2mt, false);
>      if ( ret )
>          goto err;
> 
> A less hackish possibility would be to modify dt_number_of_address() and
> return 0 when the device is a child of a PCI below.
> 
> Stefano, do you have any opinions?

Would PCIe even work today? Because if it doesn't, we could just add it
to skip_matches until we get PCI passthrough properly supported.

But aside from PCIe, let's say that we know of a few nodes for which
"reg" needs a special treatment. I am not sure it makes sense to proceed
with parsing those nodes without knowing how to deal with that. So maybe
we should add those nodes to skip_matches until we know what to do with
them. At that point, I would imagine we would introduce a special
handle_device function that knows what to do. In the case of PCIe,
something like "handle_device_pcie".
Julien Grall Feb. 3, 2021, 5:44 p.m. UTC | #4
On 03/02/2021 00:18, Stefano Stabellini wrote:
> On Tue, 2 Feb 2021, Julien Grall wrote:
>> On 02/02/2021 18:12, Julien Grall wrote:
>>> On 02/02/2021 17:47, Elliott Mitchell wrote:
>>>> The handle_device() function has been returning failure upon
>>>> encountering a device address which was invalid.  A device tree which
>>>> had such an entry has now been seen in the wild.  As it causes no
>>>> failures to simply ignore the entries, ignore them. >
>>>> Signed-off-by: Elliott Mitchell <ehem+xenn@m5p.com>
>>>>
>>>> ---
>>>> I'm starting to suspect there are an awful lot of places in the various
>>>> domain_build.c files which should simply ignore errors.  This is now the
>>>> second place I've encountered in 2 months where ignoring errors was the
>>>> correct action.
>>>
>>> Right, as a counterpoint, we run Xen on Arm HW for several years now and
>>> this is the first time I heard about issue parsing the DT. So while I
>>> appreciate that you are eager to run Xen on the RPI...
>>>
>>>>   I know failing in case of error is an engineer's
>>>> favorite approach, but there seem an awful lot of harmless failures
>>>> causing panics.
>>>>
>>>> This started as the thread "[RFC PATCH] xen/arm: domain_build: Ignore
>>>> empty memory bank".  Now it seems clear the correct approach is to simply
>>>> ignore these entries.
>>>
>>> ... we first need to fully understand the issues. Here a few questions:
>>>      1) Can you provide more information why you believe the address is
>>> invalid?
>>>      2) How does Linux use the node?
>>>      3) Is it happening with all the RPI DT? If not, what are the
>>> differences?
>>
>> So I had another look at the device-tree you provided earlier on. The node is
>> the following (copied directly from the DTS):
>>
>> &pcie0 {
>>          pci@1,0 {
>>                  #address-cells = <3>;
>>                  #size-cells = <2>;
>>                  ranges;
>>
>>                  reg = <0 0 0 0 0>;
>>
>>                  usb@1,0 {
>>                          reg = <0x10000 0 0 0 0>;
>>                          resets = <&reset RASPBERRYPI_FIRMWARE_RESET_ID_USB>;
>>                  };
>>          };
>> };
>>
>> pcie0: pcie@7d500000 {
>>     compatible = "brcm,bcm2711-pcie";
>>     reg = <0x0 0x7d500000  0x0 0x9310>;
>>     device_type = "pci";
>>     #address-cells = <3>;
>>     #interrupt-cells = <1>;
>>     #size-cells = <2>;
>>     interrupts = <GIC_SPI 148 IRQ_TYPE_LEVEL_HIGH>,
>>                  <GIC_SPI 148 IRQ_TYPE_LEVEL_HIGH>;
>>     interrupt-names = "pcie", "msi";
>>     interrupt-map-mask = <0x0 0x0 0x0 0x7>;
>>     interrupt-map = <0 0 0 1 &gicv2 GIC_SPI 143
>>                                                       IRQ_TYPE_LEVEL_HIGH>;
>>     msi-controller;
>>     msi-parent = <&pcie0>;
>>
>>     ranges = <0x02000000 0x0 0xc0000000 0x6 0x00000000
>>               0x0 0x40000000>;
>>     /*
>>      * The wrapper around the PCIe block has a bug
>>      * preventing it from accessing beyond the first 3GB of
>>      * memory.
>>      */
>>     dma-ranges = <0x02000000 0x0 0x00000000 0x0 0x00000000
>>                   0x0 0xc0000000>;
>>     brcm,enable-ssc;
>> };
>>
>> The interpretation of "reg" depends on the context. In this case, we are
>> trying to interpret as a memory address from the CPU PoV when it has a
>> different meaning (I am not exactly sure what).
>>
>> In fact, you are lucky that Xen doesn't manage to interpret it. Xen should
>> really stop trying to look region to map when it discover a PCI bus. I wrote a
>> quick hack patch that should ignore it:
> 
> Yes, I think you are right. There are a few instances where "reg" is not
> a address ready to be remapped. It is not just PCI, although that's the
> most common.  Maybe we need a list, like skip_matches in handle_node.

 From my understanding, "reg" can be considered as an MMIO region only 
if all the *parents up to the root have the property "ranges" and they 
are not on a different bus (e.g. pci).

Do you have example where this is not the case?

Whether Xen does it correctly is another question :).

> 
> 
>> diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
>> index 374bf655ee34..937fd1e387b7 100644
>> --- a/xen/arch/arm/domain_build.c
>> +++ b/xen/arch/arm/domain_build.c
>> @@ -1426,7 +1426,7 @@ static int __init handle_device(struct domain *d, struct
>> dt_device_node *dev,
>>
>>   static int __init handle_node(struct domain *d, struct kernel_info *kinfo,
>>                                 struct dt_device_node *node,
>> -                              p2m_type_t p2mt)
>> +                              p2m_type_t p2mt, bool pci_bus)
>>   {
>>       static const struct dt_device_match skip_matches[] __initconst =
>>       {
>> @@ -1532,9 +1532,14 @@ static int __init handle_node(struct domain *d, struct
>> kernel_info *kinfo,
>>                  "WARNING: Path %s is reserved, skip the node as we may re-use
>> the path.\n",
>>                  path);
>>
>> -    res = handle_device(d, node, p2mt);
>> -    if ( res)
>> -        return res;
>> +    if ( !pci_bus )
>> +    {
>> +        res = handle_device(d, node, p2mt);
>> +        if ( res)
>> +           return res;
>> +
>> +        pci_bus = dt_device_type_is_equal(node, "pci");
>> +    }
>>
>>       /*
>>        * The property "name" is used to have a different name on older FDT
>> @@ -1554,7 +1559,7 @@ static int __init handle_node(struct domain *d, struct
>> kernel_info *kinfo,
>>
>>       for ( child = node->child; child != NULL; child = child->sibling )
>>       {
>> -        res = handle_node(d, kinfo, child, p2mt);
>> +        res = handle_node(d, kinfo, child, p2mt, pci_bus);
>>           if ( res )
>>               return res;
>>       }
>> @@ -2192,7 +2197,7 @@ static int __init prepare_dtb_hwdom(struct domain *d,
>> struct kernel_info *kinfo)
>>
>>       fdt_finish_reservemap(kinfo->fdt);
>>
>> -    ret = handle_node(d, kinfo, dt_host, default_p2mt);
>> +    ret = handle_node(d, kinfo, dt_host, default_p2mt, false);
>>       if ( ret )
>>           goto err;
>>
>> A less hackish possibility would be to modify dt_number_of_address() and
>> return 0 when the device is a child of a PCI below.
>>
>> Stefano, do you have any opinions?
> 
> Would PCIe even work today? Because if it doesn't, we could just add it
> to skip_matches until we get PCI passthrough properly supported.
PCIe (or PCI) definitely works in dom0 today but Xen is not aware of the 
hostbridge. So you would break quite a few uses cases by skipping the nodes.

> 
> But aside from PCIe, let's say that we know of a few nodes for which
> "reg" needs a special treatment. I am not sure it makes sense to proceed
> with parsing those nodes without knowing how to deal with that.

I believe that most of the time the "special" treatment would be to 
ignore the property "regs" as it will not be an CPU memory address.

> So maybe
> we should add those nodes to skip_matches until we know what to do with
> them. At that point, I would imagine we would introduce a special
> handle_device function that knows what to do. In the case of PCIe,
> something like "handle_device_pcie".
Could you outline how "handle_device_pcie()" will differ with handle_node()?

In fact, the problem is not the PCIe node directly. Instead, it is the 
second level of nodes below it (i.e usb@...).

The current implementation of dt_number_of_address() only look at the 
bus type of the parent. As the parent has no bus type and "ranges" then 
it thinks this is something we can translate to a CPU address.

However, this is below a PCI bus so the meaning of "reg" is completely 
different. In this case, we only need to ignore "reg".

Cheers,
Stefano Stabellini Feb. 3, 2021, 10:18 p.m. UTC | #5
On Wed, 3 Feb 2021, Julien Grall wrote:
> On 03/02/2021 00:18, Stefano Stabellini wrote:
> > On Tue, 2 Feb 2021, Julien Grall wrote:
> > > On 02/02/2021 18:12, Julien Grall wrote:
> > > > On 02/02/2021 17:47, Elliott Mitchell wrote:
> > > > > The handle_device() function has been returning failure upon
> > > > > encountering a device address which was invalid.  A device tree which
> > > > > had such an entry has now been seen in the wild.  As it causes no
> > > > > failures to simply ignore the entries, ignore them. >
> > > > > Signed-off-by: Elliott Mitchell <ehem+xenn@m5p.com>
> > > > > 
> > > > > ---
> > > > > I'm starting to suspect there are an awful lot of places in the
> > > > > various
> > > > > domain_build.c files which should simply ignore errors.  This is now
> > > > > the
> > > > > second place I've encountered in 2 months where ignoring errors was
> > > > > the
> > > > > correct action.
> > > > 
> > > > Right, as a counterpoint, we run Xen on Arm HW for several years now and
> > > > this is the first time I heard about issue parsing the DT. So while I
> > > > appreciate that you are eager to run Xen on the RPI...
> > > > 
> > > > >   I know failing in case of error is an engineer's
> > > > > favorite approach, but there seem an awful lot of harmless failures
> > > > > causing panics.
> > > > > 
> > > > > This started as the thread "[RFC PATCH] xen/arm: domain_build: Ignore
> > > > > empty memory bank".  Now it seems clear the correct approach is to
> > > > > simply
> > > > > ignore these entries.
> > > > 
> > > > ... we first need to fully understand the issues. Here a few questions:
> > > >      1) Can you provide more information why you believe the address is
> > > > invalid?
> > > >      2) How does Linux use the node?
> > > >      3) Is it happening with all the RPI DT? If not, what are the
> > > > differences?
> > > 
> > > So I had another look at the device-tree you provided earlier on. The node
> > > is
> > > the following (copied directly from the DTS):
> > > 
> > > &pcie0 {
> > >          pci@1,0 {
> > >                  #address-cells = <3>;
> > >                  #size-cells = <2>;
> > >                  ranges;
> > > 
> > >                  reg = <0 0 0 0 0>;
> > > 
> > >                  usb@1,0 {
> > >                          reg = <0x10000 0 0 0 0>;
> > >                          resets = <&reset
> > > RASPBERRYPI_FIRMWARE_RESET_ID_USB>;
> > >                  };
> > >          };
> > > };
> > > 
> > > pcie0: pcie@7d500000 {
> > >     compatible = "brcm,bcm2711-pcie";
> > >     reg = <0x0 0x7d500000  0x0 0x9310>;
> > >     device_type = "pci";
> > >     #address-cells = <3>;
> > >     #interrupt-cells = <1>;
> > >     #size-cells = <2>;
> > >     interrupts = <GIC_SPI 148 IRQ_TYPE_LEVEL_HIGH>,
> > >                  <GIC_SPI 148 IRQ_TYPE_LEVEL_HIGH>;
> > >     interrupt-names = "pcie", "msi";
> > >     interrupt-map-mask = <0x0 0x0 0x0 0x7>;
> > >     interrupt-map = <0 0 0 1 &gicv2 GIC_SPI 143
> > >                                                       IRQ_TYPE_LEVEL_HIGH>;
> > >     msi-controller;
> > >     msi-parent = <&pcie0>;
> > > 
> > >     ranges = <0x02000000 0x0 0xc0000000 0x6 0x00000000
> > >               0x0 0x40000000>;
> > >     /*
> > >      * The wrapper around the PCIe block has a bug
> > >      * preventing it from accessing beyond the first 3GB of
> > >      * memory.
> > >      */
> > >     dma-ranges = <0x02000000 0x0 0x00000000 0x0 0x00000000
> > >                   0x0 0xc0000000>;
> > >     brcm,enable-ssc;
> > > };
> > > 
> > > The interpretation of "reg" depends on the context. In this case, we are
> > > trying to interpret as a memory address from the CPU PoV when it has a
> > > different meaning (I am not exactly sure what).
> > > 
> > > In fact, you are lucky that Xen doesn't manage to interpret it. Xen should
> > > really stop trying to look region to map when it discover a PCI bus. I
> > > wrote a
> > > quick hack patch that should ignore it:
> > 
> > Yes, I think you are right. There are a few instances where "reg" is not
> > a address ready to be remapped. It is not just PCI, although that's the
> > most common.  Maybe we need a list, like skip_matches in handle_node.
> 
> From my understanding, "reg" can be considered as an MMIO region only if all
> the *parents up to the root have the property "ranges" and they are not on a
> different bus (e.g. pci).
> 
> Do you have example where this is not the case?

The famous one is CPUs. These are some other examples I could find with
a quick search:

			nvmem_firmware {
				compatible = "xlnx,zynqmp-nvmem-fw";
				#address-cells = <0x1>;
				#size-cells = <0x1>;

				soc_revision@0 {
					reg = <0x0 0x4>;
				};
			};

		cci@fd6e0000 {
			compatible = "arm,cci-400";
			reg = <0x0 0xfd6e0000 0x0 0x9000>;
			ranges = <0x0 0x0 0xfd6e0000 0x10000>;
			#address-cells = <0x1>;
			#size-cells = <0x1>;

			pmu@9000 {
				compatible = "arm,cci-400-pmu,r1";
				reg = <0x9000 0x5000>;
				interrupt-parent = <0x4>;
				interrupts = <0x0 0x7b 0x4 0x0 0x7b 0x4 0x0 0x7b 0x4 0x0 0x7b 0x4 0x0 0x7b 0x4>;
			};
		};


		i2c@ff020000 {
			compatible = "cdns,i2c-r1p14";
			status = "okay";
			interrupt-parent = <0x4>;
			interrupts = <0x0 0x11 0x4>;
			reg = <0x0 0xff020000 0x0 0x1000>;
			#address-cells = <0x1>;
			#size-cells = <0x0>;
			clocks = <0x3 0x3d>;
			clock-frequency = <0x61a80>;

			gpio@20 {
				compatible = "ti,tca6416";
				reg = <0x20>;


		ethernet@ff0e0000 {
			compatible = "cdns,zynqmp-gem", "cdns,gem";
			status = "okay";
			interrupt-parent = <0x4>;
			interrupts = <0x0 0x3f 0x4 0x0 0x3f 0x4>;
			reg = <0x0 0xff0e0000 0x0 0x1000>;
			clock-names = "pclk", "hclk", "tx_clk";
			#address-cells = <0x1>;
			#size-cells = <0x0>;
			iommus = <0xd 0x877>;
			clocks = <0xc 0xc 0xc>;
			phy-handle = <0xe>;
			phy-mode = "rgmii-id";
			#stream-id-cells = <0x1>;
			phandle = <0x16>;

			ethernet-phy@c {
				reg = <0xc>;
				ti,rx-internal-delay = <0x8>;
				ti,tx-internal-delay = <0xa>;
				ti,fifo-depth = <0x1>;
				ti,dp83867-rxctrl-strap-quirk;
				phandle = <0xe>;
			};
		};


The rule that *parents up to the root have the property "ranges"* seems
to apply correctly to all these cases. Good.


> Whether Xen does it correctly is another question :).
> 
> > 
> > 
> > > diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
> > > index 374bf655ee34..937fd1e387b7 100644
> > > --- a/xen/arch/arm/domain_build.c
> > > +++ b/xen/arch/arm/domain_build.c
> > > @@ -1426,7 +1426,7 @@ static int __init handle_device(struct domain *d,
> > > struct
> > > dt_device_node *dev,
> > > 
> > >   static int __init handle_node(struct domain *d, struct kernel_info
> > > *kinfo,
> > >                                 struct dt_device_node *node,
> > > -                              p2m_type_t p2mt)
> > > +                              p2m_type_t p2mt, bool pci_bus)
> > >   {
> > >       static const struct dt_device_match skip_matches[] __initconst =
> > >       {
> > > @@ -1532,9 +1532,14 @@ static int __init handle_node(struct domain *d,
> > > struct
> > > kernel_info *kinfo,
> > >                  "WARNING: Path %s is reserved, skip the node as we may
> > > re-use
> > > the path.\n",
> > >                  path);
> > > 
> > > -    res = handle_device(d, node, p2mt);
> > > -    if ( res)
> > > -        return res;
> > > +    if ( !pci_bus )
> > > +    {
> > > +        res = handle_device(d, node, p2mt);
> > > +        if ( res)
> > > +           return res;
> > > +
> > > +        pci_bus = dt_device_type_is_equal(node, "pci");
> > > +    }
> > > 
> > >       /*
> > >        * The property "name" is used to have a different name on older FDT
> > > @@ -1554,7 +1559,7 @@ static int __init handle_node(struct domain *d,
> > > struct
> > > kernel_info *kinfo,
> > > 
> > >       for ( child = node->child; child != NULL; child = child->sibling )
> > >       {
> > > -        res = handle_node(d, kinfo, child, p2mt);
> > > +        res = handle_node(d, kinfo, child, p2mt, pci_bus);
> > >           if ( res )
> > >               return res;
> > >       }
> > > @@ -2192,7 +2197,7 @@ static int __init prepare_dtb_hwdom(struct domain
> > > *d,
> > > struct kernel_info *kinfo)
> > > 
> > >       fdt_finish_reservemap(kinfo->fdt);
> > > 
> > > -    ret = handle_node(d, kinfo, dt_host, default_p2mt);
> > > +    ret = handle_node(d, kinfo, dt_host, default_p2mt, false);
> > >       if ( ret )
> > >           goto err;
> > > 
> > > A less hackish possibility would be to modify dt_number_of_address() and
> > > return 0 when the device is a child of a PCI below.
> > > 
> > > Stefano, do you have any opinions?
> > 
> > Would PCIe even work today? Because if it doesn't, we could just add it
> > to skip_matches until we get PCI passthrough properly supported.
> PCIe (or PCI) definitely works in dom0 today but Xen is not aware of the
> hostbridge. So you would break quite a few uses cases by skipping the nodes.

Never mind my suggestion


> > But aside from PCIe, let's say that we know of a few nodes for which
> > "reg" needs a special treatment. I am not sure it makes sense to proceed
> > with parsing those nodes without knowing how to deal with that.
> 
> I believe that most of the time the "special" treatment would be to ignore the
> property "regs" as it will not be an CPU memory address.
> 
> > So maybe
> > we should add those nodes to skip_matches until we know what to do with
> > them. At that point, I would imagine we would introduce a special
> > handle_device function that knows what to do. In the case of PCIe,
> > something like "handle_device_pcie".
> Could you outline how "handle_device_pcie()" will differ with handle_node()?
> 
> In fact, the problem is not the PCIe node directly. Instead, it is the second
> level of nodes below it (i.e usb@...).
> 
> The current implementation of dt_number_of_address() only look at the bus type
> of the parent. As the parent has no bus type and "ranges" then it thinks this
> is something we can translate to a CPU address.
> 
> However, this is below a PCI bus so the meaning of "reg" is completely
> different. In this case, we only need to ignore "reg".

I see what you are saying and I agree: if we had to introduce a special
case for PCI, then  dt_number_of_address() seems to be a good place.  In
fact, we already have special PCI handling, see our
__dt_translate_address function and xen/common/device_tree.c:dt_busses.

Which brings the question: why is this actually failing?

pcie0 {
     ranges = <0x02000000 0x0 0xc0000000 0x6 0x00000000 0x0 0x40000000>;

Which means that PCI addresses 0xc0000000-0x100000000 become 0x600000000-0x700000000.

The offending DT is:

&pcie0 {
         pci@1,0 {
                 #address-cells = <3>;
                 #size-cells = <2>;
                 ranges;

                 reg = <0 0 0 0 0>;

                 usb@1,0 {
                         reg = <0x10000 0 0 0 0>;
                         resets = <&reset RASPBERRYPI_FIRMWARE_RESET_ID_USB>;
                 };
         };
};


reg = <0x10000 0 0 0 0> means that usb@1,0 is PCI device 01:00.0.
However, the rest of the regs cells are left as zero. It shouldn't be an
issue because usb@1,0 is a child of pci@1,0 but pci@1,0 is not a bus. So
in theory dt_number_of_address() should already return 0 for it.

Maybe reg = <0 0 0 0 0> is the problem. In that case, we could simply
add a check to skip 0 size ranges. Just a hack to explain what I mean:

diff --git a/xen/common/device_tree.c b/xen/common/device_tree.c
index 18825e333e..236b30675b 100644
--- a/xen/common/device_tree.c
+++ b/xen/common/device_tree.c
@@ -866,6 +866,9 @@ int dt_device_get_address(const struct dt_device_node *dev, unsigned int index,
     unsigned int flags;
 
     addrp = dt_get_address(dev, index, size, &flags);
+    if ( *size == 0 )
+        return 0;
+
     if ( addrp == NULL )
         return -EINVAL;
Julien Grall Feb. 3, 2021, 10:52 p.m. UTC | #6
On Wed, 3 Feb 2021 at 22:18, Stefano Stabellini <sstabellini@kernel.org> wrote:
> > > But aside from PCIe, let's say that we know of a few nodes for which
> > > "reg" needs a special treatment. I am not sure it makes sense to proceed
> > > with parsing those nodes without knowing how to deal with that.
> >
> > I believe that most of the time the "special" treatment would be to ignore the
> > property "regs" as it will not be an CPU memory address.
> >
> > > So maybe
> > > we should add those nodes to skip_matches until we know what to do with
> > > them. At that point, I would imagine we would introduce a special
> > > handle_device function that knows what to do. In the case of PCIe,
> > > something like "handle_device_pcie".
> > Could you outline how "handle_device_pcie()" will differ with handle_node()?
> >
> > In fact, the problem is not the PCIe node directly. Instead, it is the second
> > level of nodes below it (i.e usb@...).
> >
> > The current implementation of dt_number_of_address() only look at the bus type
> > of the parent. As the parent has no bus type and "ranges" then it thinks this
> > is something we can translate to a CPU address.
> >
> > However, this is below a PCI bus so the meaning of "reg" is completely
> > different. In this case, we only need to ignore "reg".
>
> I see what you are saying and I agree: if we had to introduce a special
> case for PCI, then  dt_number_of_address() seems to be a good place.  In
> fact, we already have special PCI handling, see our
> __dt_translate_address function and xen/common/device_tree.c:dt_busses.
>
> Which brings the question: why is this actually failing?

I already hinted at the reason in my previous e-mail :). Let me expand
a bit more.

>
> pcie0 {
>      ranges = <0x02000000 0x0 0xc0000000 0x6 0x00000000 0x0 0x40000000>;
>
> Which means that PCI addresses 0xc0000000-0x100000000 become 0x600000000-0x700000000.
>
> The offending DT is:
>
> &pcie0 {
>          pci@1,0 {
>                  #address-cells = <3>;
>                  #size-cells = <2>;
>                  ranges;
>
>                  reg = <0 0 0 0 0>;
>
>                  usb@1,0 {
>                          reg = <0x10000 0 0 0 0>;
>                          resets = <&reset RASPBERRYPI_FIRMWARE_RESET_ID_USB>;
>                  };
>          };
> };
>
>
> reg = <0x10000 0 0 0 0> means that usb@1,0 is PCI device 01:00.0.
> However, the rest of the regs cells are left as zero. It shouldn't be an
> issue because usb@1,0 is a child of pci@1,0 but pci@1,0 is not a bus.

The property "ranges" is used to define a mapping or translation
between the address space of the "bus" (here pci@1,0) and the address
space of the bus node's parent (&pcie0).
IOW, it means "reg" in usb@1,0 is an address on the PCI bus (i.e. BDF).

The problem is dt_number_of_address() will only look at the "bus" type
of the parent using dt_match_bus(). This will return the default bus
(see dt_bus_default_match()), because this is a property "ranges" in
the parent node (i.e. pci@1,0). Therefore...

> So
> in theory dt_number_of_address() should already return 0 for it.

... dt_number_of_address() will return 1 even if the address is not a
CPU address. So when Xen will try to translate it, it will fail.

>
> Maybe reg = <0 0 0 0 0> is the problem. In that case, we could simply
> add a check to skip 0 size ranges. Just a hack to explain what I mean:

The parent of pci@1,0 is a PCI bridge (see the property type), so the
CPU addresses are found not via "regs" but "assigned-addresses".

In this situation, "regs" will have a different meaning and therefore
there is no promise that the size will be 0.

Cheers,
Stefano Stabellini Feb. 4, 2021, 12:13 a.m. UTC | #7
On Wed, 3 Feb 2021, Julien Grall wrote:
> On Wed, 3 Feb 2021 at 22:18, Stefano Stabellini <sstabellini@kernel.org> wrote:
> > > > But aside from PCIe, let's say that we know of a few nodes for which
> > > > "reg" needs a special treatment. I am not sure it makes sense to proceed
> > > > with parsing those nodes without knowing how to deal with that.
> > >
> > > I believe that most of the time the "special" treatment would be to ignore the
> > > property "regs" as it will not be an CPU memory address.
> > >
> > > > So maybe
> > > > we should add those nodes to skip_matches until we know what to do with
> > > > them. At that point, I would imagine we would introduce a special
> > > > handle_device function that knows what to do. In the case of PCIe,
> > > > something like "handle_device_pcie".
> > > Could you outline how "handle_device_pcie()" will differ with handle_node()?
> > >
> > > In fact, the problem is not the PCIe node directly. Instead, it is the second
> > > level of nodes below it (i.e usb@...).
> > >
> > > The current implementation of dt_number_of_address() only look at the bus type
> > > of the parent. As the parent has no bus type and "ranges" then it thinks this
> > > is something we can translate to a CPU address.
> > >
> > > However, this is below a PCI bus so the meaning of "reg" is completely
> > > different. In this case, we only need to ignore "reg".
> >
> > I see what you are saying and I agree: if we had to introduce a special
> > case for PCI, then  dt_number_of_address() seems to be a good place.  In
> > fact, we already have special PCI handling, see our
> > __dt_translate_address function and xen/common/device_tree.c:dt_busses.
> >
> > Which brings the question: why is this actually failing?
> 
> I already hinted at the reason in my previous e-mail :). Let me expand
> a bit more.
> 
> >
> > pcie0 {
> >      ranges = <0x02000000 0x0 0xc0000000 0x6 0x00000000 0x0 0x40000000>;
> >
> > Which means that PCI addresses 0xc0000000-0x100000000 become 0x600000000-0x700000000.
> >
> > The offending DT is:
> >
> > &pcie0 {
> >          pci@1,0 {
> >                  #address-cells = <3>;
> >                  #size-cells = <2>;
> >                  ranges;
> >
> >                  reg = <0 0 0 0 0>;
> >
> >                  usb@1,0 {
> >                          reg = <0x10000 0 0 0 0>;
> >                          resets = <&reset RASPBERRYPI_FIRMWARE_RESET_ID_USB>;
> >                  };
> >          };
> > };
> >
> >
> > reg = <0x10000 0 0 0 0> means that usb@1,0 is PCI device 01:00.0.
> > However, the rest of the regs cells are left as zero. It shouldn't be an
> > issue because usb@1,0 is a child of pci@1,0 but pci@1,0 is not a bus.
> 
> The property "ranges" is used to define a mapping or translation
> between the address space of the "bus" (here pci@1,0) and the address
> space of the bus node's parent (&pcie0).
> IOW, it means "reg" in usb@1,0 is an address on the PCI bus (i.e. BDF).
> 
> The problem is dt_number_of_address() will only look at the "bus" type
> of the parent using dt_match_bus(). This will return the default bus
> (see dt_bus_default_match()), because this is a property "ranges" in
> the parent node (i.e. pci@1,0). Therefore...
> 
> > So
> > in theory dt_number_of_address() should already return 0 for it.
> 
> ... dt_number_of_address() will return 1 even if the address is not a
> CPU address. So when Xen will try to translate it, it will fail.
> 
> >
> > Maybe reg = <0 0 0 0 0> is the problem. In that case, we could simply
> > add a check to skip 0 size ranges. Just a hack to explain what I mean:
> 
> The parent of pci@1,0 is a PCI bridge (see the property type), so the
> CPU addresses are found not via "regs" but "assigned-addresses".
> 
> In this situation, "regs" will have a different meaning and therefore
> there is no promise that the size will be 0.

I copy/pasted the following:

       pci@1,0 {
               #address-cells = <3>;
               #size-cells = <2>;
               ranges;

               reg = <0 0 0 0 0>;

               usb@1,0 {
                       reg = <0x10000 0 0 0 0>;
                       resets = <&reset
                       RASPBERRYPI_FIRMWARE_RESET_ID_USB>;
               };
       };

under pcie0 in my DTS to see what happens (the node is not there in the
device tree for the rpi-5.9.y kernel.) It results in the expected error:

(XEN) Unable to retrieve address 0 for /scb/pcie@7d500000/pci@1,0/usb@1,0
(XEN) Device tree generation failed (-22).

I could verify that pci@1,0 is seen as "default" bus due to the range
property, thus dt_number_of_address() returns 1.


I can see that reg = <0 0 0 0 0> is not a problem because it is ignored
given that the parent is a PCI bus. assigned-addresses is the one that
is read.


But from a device tree perspective I am actually confused by the
presence of the "ranges" property under pci@1,0. Is that correct? It is
stating that addresses of children devices will be translated to the
address space of the parent (pcie0) using the parent translation rules.
I mean -- it looks like Xen is right in trying to translate reg =
<0x10000 0 0 0 0> using ranges = <0x02000000 0x0 0xc0000000 0x6
0x00000000 0x0 0x40000000>.

Or maybe since pcie0 is a PCI bus all the children addresses, even
grand-children, are expected to be specified using "assigned-addresses"?


Looking at other examples [1][2] maybe the mistake is that pci@1,0 is
missing device_type = "pci"?  Of course, if I add that, the error
disappear.

[1] Documentation/devicetree/bindings/pci/mvebu-pci.txt
[2] Documentation/devicetree/bindings/pci/nvidia,tegra20-pcie.txt

For the sake of making Xen more resilient to possible DTSes, maybe we
should try to extend the dt_bus_pci_match check? See for instance the
change below, but we might be able to come up with better ideas.


diff --git a/xen/common/device_tree.c b/xen/common/device_tree.c
index 18825e333e..24d998f725 100644
--- a/xen/common/device_tree.c
+++ b/xen/common/device_tree.c
@@ -565,12 +565,21 @@ static unsigned int dt_bus_default_get_flags(const __be32 *addr)
 
 static bool_t dt_bus_pci_match(const struct dt_device_node *np)
 {
+    bool ret = false;
+
     /*
      * "pciex" is PCI Express "vci" is for the /chaos bridge on 1st-gen PCI
      * powermacs "ht" is hypertransport
      */
-    return !strcmp(np->type, "pci") || !strcmp(np->type, "pciex") ||
+    ret = !strcmp(np->type, "pci") || !strcmp(np->type, "pciex") ||
         !strcmp(np->type, "vci") || !strcmp(np->type, "ht");
+    
+    if ( ret ) return ret;
+
+    if ( !strcmp(np->name, "pci") )
+        ret = dt_bus_pci_match(dt_get_parent(np));
+
+    return ret;
 }
 
 static void dt_bus_pci_count_cells(const struct dt_device_node *np,
Julien Grall Feb. 4, 2021, 4:29 p.m. UTC | #8
On 04/02/2021 00:13, Stefano Stabellini wrote:
> On Wed, 3 Feb 2021, Julien Grall wrote:
>> On Wed, 3 Feb 2021 at 22:18, Stefano Stabellini <sstabellini@kernel.org> wrote:
>>>>> But aside from PCIe, let's say that we know of a few nodes for which
>>>>> "reg" needs a special treatment. I am not sure it makes sense to proceed
>>>>> with parsing those nodes without knowing how to deal with that.
>>>>
>>>> I believe that most of the time the "special" treatment would be to ignore the
>>>> property "regs" as it will not be an CPU memory address.
>>>>
>>>>> So maybe
>>>>> we should add those nodes to skip_matches until we know what to do with
>>>>> them. At that point, I would imagine we would introduce a special
>>>>> handle_device function that knows what to do. In the case of PCIe,
>>>>> something like "handle_device_pcie".
>>>> Could you outline how "handle_device_pcie()" will differ with handle_node()?
>>>>
>>>> In fact, the problem is not the PCIe node directly. Instead, it is the second
>>>> level of nodes below it (i.e usb@...).
>>>>
>>>> The current implementation of dt_number_of_address() only look at the bus type
>>>> of the parent. As the parent has no bus type and "ranges" then it thinks this
>>>> is something we can translate to a CPU address.
>>>>
>>>> However, this is below a PCI bus so the meaning of "reg" is completely
>>>> different. In this case, we only need to ignore "reg".
>>>
>>> I see what you are saying and I agree: if we had to introduce a special
>>> case for PCI, then  dt_number_of_address() seems to be a good place.  In
>>> fact, we already have special PCI handling, see our
>>> __dt_translate_address function and xen/common/device_tree.c:dt_busses.
>>>
>>> Which brings the question: why is this actually failing?
>>
>> I already hinted at the reason in my previous e-mail :). Let me expand
>> a bit more.
>>
>>>
>>> pcie0 {
>>>       ranges = <0x02000000 0x0 0xc0000000 0x6 0x00000000 0x0 0x40000000>;
>>>
>>> Which means that PCI addresses 0xc0000000-0x100000000 become 0x600000000-0x700000000.
>>>
>>> The offending DT is:
>>>
>>> &pcie0 {
>>>           pci@1,0 {
>>>                   #address-cells = <3>;
>>>                   #size-cells = <2>;
>>>                   ranges;
>>>
>>>                   reg = <0 0 0 0 0>;
>>>
>>>                   usb@1,0 {
>>>                           reg = <0x10000 0 0 0 0>;
>>>                           resets = <&reset RASPBERRYPI_FIRMWARE_RESET_ID_USB>;
>>>                   };
>>>           };
>>> };
>>>
>>>
>>> reg = <0x10000 0 0 0 0> means that usb@1,0 is PCI device 01:00.0.
>>> However, the rest of the regs cells are left as zero. It shouldn't be an
>>> issue because usb@1,0 is a child of pci@1,0 but pci@1,0 is not a bus.
>>
>> The property "ranges" is used to define a mapping or translation
>> between the address space of the "bus" (here pci@1,0) and the address
>> space of the bus node's parent (&pcie0).
>> IOW, it means "reg" in usb@1,0 is an address on the PCI bus (i.e. BDF).
>>
>> The problem is dt_number_of_address() will only look at the "bus" type
>> of the parent using dt_match_bus(). This will return the default bus
>> (see dt_bus_default_match()), because this is a property "ranges" in
>> the parent node (i.e. pci@1,0). Therefore...
>>
>>> So
>>> in theory dt_number_of_address() should already return 0 for it.
>>
>> ... dt_number_of_address() will return 1 even if the address is not a
>> CPU address. So when Xen will try to translate it, it will fail.
>>
>>>
>>> Maybe reg = <0 0 0 0 0> is the problem. In that case, we could simply
>>> add a check to skip 0 size ranges. Just a hack to explain what I mean:
>>
>> The parent of pci@1,0 is a PCI bridge (see the property type), so the
>> CPU addresses are found not via "regs" but "assigned-addresses".
>>
>> In this situation, "regs" will have a different meaning and therefore
>> there is no promise that the size will be 0.
> 
> I copy/pasted the following:
> 
>         pci@1,0 {
>                 #address-cells = <3>;
>                 #size-cells = <2>;
>                 ranges;
> 
>                 reg = <0 0 0 0 0>;
> 
>                 usb@1,0 {
>                         reg = <0x10000 0 0 0 0>;
>                         resets = <&reset
>                         RASPBERRYPI_FIRMWARE_RESET_ID_USB>;
>                 };
>         };
> 
> under pcie0 in my DTS to see what happens (the node is not there in the
> device tree for the rpi-5.9.y kernel.) It results in the expected error:
> 
> (XEN) Unable to retrieve address 0 for /scb/pcie@7d500000/pci@1,0/usb@1,0
> (XEN) Device tree generation failed (-22).
> 
> I could verify that pci@1,0 is seen as "default" bus due to the range
> property, thus dt_number_of_address() returns 1.
> 
> 
> I can see that reg = <0 0 0 0 0> is not a problem because it is ignored
> given that the parent is a PCI bus. assigned-addresses is the one that
> is read.
> 
> 
> But from a device tree perspective I am actually confused by the
> presence of the "ranges" property under pci@1,0. Is that correct? It is
> stating that addresses of children devices will be translated to the
> address space of the parent (pcie0) using the parent translation rules.
> I mean -- it looks like Xen is right in trying to translate reg =
> <0x10000 0 0 0 0> using ranges = <0x02000000 0x0 0xc0000000 0x6
> 0x00000000 0x0 0x40000000>.
> 
> Or maybe since pcie0 is a PCI bus all the children addresses, even
> grand-children, are expected to be specified using "assigned-addresses"?
> 
> 
> Looking at other examples [1][2] maybe the mistake is that pci@1,0 is
> missing device_type = "pci"?  Of course, if I add that, the error
> disappear.

I am afraid, I don't know the answer. I think it would be best to ask 
the Linux DT folks about it.

> 
> [1] Documentation/devicetree/bindings/pci/mvebu-pci.txt
> [2] Documentation/devicetree/bindings/pci/nvidia,tegra20-pcie.txt
> 
> For the sake of making Xen more resilient to possible DTSes, maybe we
> should try to extend the dt_bus_pci_match check? See for instance the
> change below, but we might be able to come up with better ideas.
> 
> 
> diff --git a/xen/common/device_tree.c b/xen/common/device_tree.c
> index 18825e333e..24d998f725 100644
> --- a/xen/common/device_tree.c
> +++ b/xen/common/device_tree.c
> @@ -565,12 +565,21 @@ static unsigned int dt_bus_default_get_flags(const __be32 *addr)
>   
>   static bool_t dt_bus_pci_match(const struct dt_device_node *np)
>   {
> +    bool ret = false;
> +
>       /*
>        * "pciex" is PCI Express "vci" is for the /chaos bridge on 1st-gen PCI
>        * powermacs "ht" is hypertransport
>        */
> -    return !strcmp(np->type, "pci") || !strcmp(np->type, "pciex") ||
> +    ret = !strcmp(np->type, "pci") || !strcmp(np->type, "pciex") ||
>           !strcmp(np->type, "vci") || !strcmp(np->type, "ht");
> +
> +    if ( ret ) return ret;
> +
> +    if ( !strcmp(np->name, "pci") )
> +        ret = dt_bus_pci_match(dt_get_parent(np));

It is probably safe to assume that a PCI device (not hostbridge) will 
start with "pci". Although, I don't much like the idea because the name 
is not meant to be stable.

AFAICT, we can only rely on "compatible" and "type".

Cheers,
Stefano Stabellini Feb. 4, 2021, 5:56 p.m. UTC | #9
Hi Rob,

We have a question on the PCIe device tree bindings. In summary, we have
come across the Raspberry Pi 4 PCIe description below:


pcie0: pcie@7d500000 {
   compatible = "brcm,bcm2711-pcie";
   reg = <0x0 0x7d500000  0x0 0x9310>;
   device_type = "pci";
   #address-cells = <3>;
   #interrupt-cells = <1>;
   #size-cells = <2>;
   interrupts = <GIC_SPI 148 IRQ_TYPE_LEVEL_HIGH>,
                <GIC_SPI 148 IRQ_TYPE_LEVEL_HIGH>;
   interrupt-names = "pcie", "msi";
   interrupt-map-mask = <0x0 0x0 0x0 0x7>;
   interrupt-map = <0 0 0 1 &gicv2 GIC_SPI 143
                                                     IRQ_TYPE_LEVEL_HIGH>;
   msi-controller;
   msi-parent = <&pcie0>;

   ranges = <0x02000000 0x0 0xc0000000 0x6 0x00000000
             0x0 0x40000000>;
   /*
    * The wrapper around the PCIe block has a bug
    * preventing it from accessing beyond the first 3GB of
    * memory.
    */
   dma-ranges = <0x02000000 0x0 0x00000000 0x0 0x00000000
                 0x0 0xc0000000>;
   brcm,enable-ssc;

   pci@1,0 {
           #address-cells = <3>;
           #size-cells = <2>;
           ranges;

           reg = <0 0 0 0 0>;

           usb@1,0 {
                   reg = <0x10000 0 0 0 0>;
                   resets = <&reset RASPBERRYPI_FIRMWARE_RESET_ID_USB>;
           };
   };
};


Xen fails to parse it with an error because it tries to remap reg =
<0x10000 0 0 0 0> as if it was a CPU address and of course it fails.

Reading the device tree description in details, I cannot tell if Xen has
a bug: the ranges property under pci@1,0 means that pci@1,0 is treated
like a default bus (not a PCI bus), hence, the children regs are
translated using the ranges property of the parent (pcie@7d500000).

Is it possible that the device tree is missing device_type =
"pci" under pci@1,0? Or is it just implied because pci@1,0 is a child of
pcie@7d500000?

I'd like to make Xen able to parse this device tree without errors but I
am not sure what is the best way to fix it.

Thanks for any help you can provide!

Cheers,

Stefano



On Thu, 4 Feb 2021, Julien Grall wrote:
> On 04/02/2021 00:13, Stefano Stabellini wrote:
> > On Wed, 3 Feb 2021, Julien Grall wrote:
> > > On Wed, 3 Feb 2021 at 22:18, Stefano Stabellini <sstabellini@kernel.org>
> > > wrote:
> > > > > > But aside from PCIe, let's say that we know of a few nodes for which
> > > > > > "reg" needs a special treatment. I am not sure it makes sense to
> > > > > > proceed
> > > > > > with parsing those nodes without knowing how to deal with that.
> > > > > 
> > > > > I believe that most of the time the "special" treatment would be to
> > > > > ignore the
> > > > > property "regs" as it will not be an CPU memory address.
> > > > > 
> > > > > > So maybe
> > > > > > we should add those nodes to skip_matches until we know what to do
> > > > > > with
> > > > > > them. At that point, I would imagine we would introduce a special
> > > > > > handle_device function that knows what to do. In the case of PCIe,
> > > > > > something like "handle_device_pcie".
> > > > > Could you outline how "handle_device_pcie()" will differ with
> > > > > handle_node()?
> > > > > 
> > > > > In fact, the problem is not the PCIe node directly. Instead, it is the
> > > > > second
> > > > > level of nodes below it (i.e usb@...).
> > > > > 
> > > > > The current implementation of dt_number_of_address() only look at the
> > > > > bus type
> > > > > of the parent. As the parent has no bus type and "ranges" then it
> > > > > thinks this
> > > > > is something we can translate to a CPU address.
> > > > > 
> > > > > However, this is below a PCI bus so the meaning of "reg" is completely
> > > > > different. In this case, we only need to ignore "reg".
> > > > 
> > > > I see what you are saying and I agree: if we had to introduce a special
> > > > case for PCI, then  dt_number_of_address() seems to be a good place.  In
> > > > fact, we already have special PCI handling, see our
> > > > __dt_translate_address function and xen/common/device_tree.c:dt_busses.
> > > > 
> > > > Which brings the question: why is this actually failing?
> > > 
> > > I already hinted at the reason in my previous e-mail :). Let me expand
> > > a bit more.
> > > 
> > > > 
> > > > pcie0 {
> > > >       ranges = <0x02000000 0x0 0xc0000000 0x6 0x00000000 0x0
> > > > 0x40000000>;
> > > > 
> > > > Which means that PCI addresses 0xc0000000-0x100000000 become
> > > > 0x600000000-0x700000000.
> > > > 
> > > > The offending DT is:
> > > > 
> > > > &pcie0 {
> > > >           pci@1,0 {
> > > >                   #address-cells = <3>;
> > > >                   #size-cells = <2>;
> > > >                   ranges;
> > > > 
> > > >                   reg = <0 0 0 0 0>;
> > > > 
> > > >                   usb@1,0 {
> > > >                           reg = <0x10000 0 0 0 0>;
> > > >                           resets = <&reset
> > > > RASPBERRYPI_FIRMWARE_RESET_ID_USB>;
> > > >                   };
> > > >           };
> > > > };
> > > > 
> > > > 
> > > > reg = <0x10000 0 0 0 0> means that usb@1,0 is PCI device 01:00.0.
> > > > However, the rest of the regs cells are left as zero. It shouldn't be an
> > > > issue because usb@1,0 is a child of pci@1,0 but pci@1,0 is not a bus.
> > > 
> > > The property "ranges" is used to define a mapping or translation
> > > between the address space of the "bus" (here pci@1,0) and the address
> > > space of the bus node's parent (&pcie0).
> > > IOW, it means "reg" in usb@1,0 is an address on the PCI bus (i.e. BDF).
> > > 
> > > The problem is dt_number_of_address() will only look at the "bus" type
> > > of the parent using dt_match_bus(). This will return the default bus
> > > (see dt_bus_default_match()), because this is a property "ranges" in
> > > the parent node (i.e. pci@1,0). Therefore...
> > > 
> > > > So
> > > > in theory dt_number_of_address() should already return 0 for it.
> > > 
> > > ... dt_number_of_address() will return 1 even if the address is not a
> > > CPU address. So when Xen will try to translate it, it will fail.
> > > 
> > > > 
> > > > Maybe reg = <0 0 0 0 0> is the problem. In that case, we could simply
> > > > add a check to skip 0 size ranges. Just a hack to explain what I mean:
> > > 
> > > The parent of pci@1,0 is a PCI bridge (see the property type), so the
> > > CPU addresses are found not via "regs" but "assigned-addresses".
> > > 
> > > In this situation, "regs" will have a different meaning and therefore
> > > there is no promise that the size will be 0.
> > 
> > I copy/pasted the following:
> > 
> >         pci@1,0 {
> >                 #address-cells = <3>;
> >                 #size-cells = <2>;
> >                 ranges;
> > 
> >                 reg = <0 0 0 0 0>;
> > 
> >                 usb@1,0 {
> >                         reg = <0x10000 0 0 0 0>;
> >                         resets = <&reset
> >                         RASPBERRYPI_FIRMWARE_RESET_ID_USB>;
> >                 };
> >         };
> > 
> > under pcie0 in my DTS to see what happens (the node is not there in the
> > device tree for the rpi-5.9.y kernel.) It results in the expected error:
> > 
> > (XEN) Unable to retrieve address 0 for /scb/pcie@7d500000/pci@1,0/usb@1,0
> > (XEN) Device tree generation failed (-22).
> > 
> > I could verify that pci@1,0 is seen as "default" bus due to the range
> > property, thus dt_number_of_address() returns 1.
> > 
> > 
> > I can see that reg = <0 0 0 0 0> is not a problem because it is ignored
> > given that the parent is a PCI bus. assigned-addresses is the one that
> > is read.
> > 
> > 
> > But from a device tree perspective I am actually confused by the
> > presence of the "ranges" property under pci@1,0. Is that correct? It is
> > stating that addresses of children devices will be translated to the
> > address space of the parent (pcie0) using the parent translation rules.
> > I mean -- it looks like Xen is right in trying to translate reg =
> > <0x10000 0 0 0 0> using ranges = <0x02000000 0x0 0xc0000000 0x6
> > 0x00000000 0x0 0x40000000>.
> > 
> > Or maybe since pcie0 is a PCI bus all the children addresses, even
> > grand-children, are expected to be specified using "assigned-addresses"?
> > 
> > 
> > Looking at other examples [1][2] maybe the mistake is that pci@1,0 is
> > missing device_type = "pci"?  Of course, if I add that, the error
> > disappear.
> 
> I am afraid, I don't know the answer. I think it would be best to ask the
> Linux DT folks about it.
> 
> > 
> > [1] Documentation/devicetree/bindings/pci/mvebu-pci.txt
> > [2] Documentation/devicetree/bindings/pci/nvidia,tegra20-pcie.txt
> > 
> > For the sake of making Xen more resilient to possible DTSes, maybe we
> > should try to extend the dt_bus_pci_match check? See for instance the
> > change below, but we might be able to come up with better ideas.
> > 
> > 
> > diff --git a/xen/common/device_tree.c b/xen/common/device_tree.c
> > index 18825e333e..24d998f725 100644
> > --- a/xen/common/device_tree.c
> > +++ b/xen/common/device_tree.c
> > @@ -565,12 +565,21 @@ static unsigned int dt_bus_default_get_flags(const
> > __be32 *addr)
> >     static bool_t dt_bus_pci_match(const struct dt_device_node *np)
> >   {
> > +    bool ret = false;
> > +
> >       /*
> >        * "pciex" is PCI Express "vci" is for the /chaos bridge on 1st-gen
> > PCI
> >        * powermacs "ht" is hypertransport
> >        */
> > -    return !strcmp(np->type, "pci") || !strcmp(np->type, "pciex") ||
> > +    ret = !strcmp(np->type, "pci") || !strcmp(np->type, "pciex") ||
> >           !strcmp(np->type, "vci") || !strcmp(np->type, "ht");
> > +
> > +    if ( ret ) return ret;
> > +
> > +    if ( !strcmp(np->name, "pci") )
> > +        ret = dt_bus_pci_match(dt_get_parent(np));
> 
> It is probably safe to assume that a PCI device (not hostbridge) will start
> with "pci". Although, I don't much like the idea because the name is not meant
> to be stable.
> 
> AFAICT, we can only rely on "compatible" and "type".
> 
> Cheers,
> 
> -- 
> Julien Grall
>
Rob Herring Feb. 4, 2021, 6:31 p.m. UTC | #10
On Thu, Feb 4, 2021 at 11:56 AM Stefano Stabellini
<sstabellini@kernel.org> wrote:
>
> Hi Rob,
>
> We have a question on the PCIe device tree bindings. In summary, we have
> come across the Raspberry Pi 4 PCIe description below:
>
>
> pcie0: pcie@7d500000 {
>    compatible = "brcm,bcm2711-pcie";
>    reg = <0x0 0x7d500000  0x0 0x9310>;
>    device_type = "pci";
>    #address-cells = <3>;
>    #interrupt-cells = <1>;
>    #size-cells = <2>;
>    interrupts = <GIC_SPI 148 IRQ_TYPE_LEVEL_HIGH>,
>                 <GIC_SPI 148 IRQ_TYPE_LEVEL_HIGH>;
>    interrupt-names = "pcie", "msi";
>    interrupt-map-mask = <0x0 0x0 0x0 0x7>;
>    interrupt-map = <0 0 0 1 &gicv2 GIC_SPI 143
>                                                      IRQ_TYPE_LEVEL_HIGH>;
>    msi-controller;
>    msi-parent = <&pcie0>;
>
>    ranges = <0x02000000 0x0 0xc0000000 0x6 0x00000000
>              0x0 0x40000000>;
>    /*
>     * The wrapper around the PCIe block has a bug
>     * preventing it from accessing beyond the first 3GB of
>     * memory.
>     */
>    dma-ranges = <0x02000000 0x0 0x00000000 0x0 0x00000000
>                  0x0 0xc0000000>;
>    brcm,enable-ssc;
>
>    pci@1,0 {
>            #address-cells = <3>;
>            #size-cells = <2>;
>            ranges;
>
>            reg = <0 0 0 0 0>;
>
>            usb@1,0 {
>                    reg = <0x10000 0 0 0 0>;
>                    resets = <&reset RASPBERRYPI_FIRMWARE_RESET_ID_USB>;
>            };
>    };
> };
>
>
> Xen fails to parse it with an error because it tries to remap reg =
> <0x10000 0 0 0 0> as if it was a CPU address and of course it fails.
>
> Reading the device tree description in details, I cannot tell if Xen has
> a bug: the ranges property under pci@1,0 means that pci@1,0 is treated
> like a default bus (not a PCI bus), hence, the children regs are
> translated using the ranges property of the parent (pcie@7d500000).
>
> Is it possible that the device tree is missing device_type =
> "pci" under pci@1,0? Or is it just implied because pci@1,0 is a child of
> pcie@7d500000?

Indeed, it should have device_type. Linux (only recently due to
another missing device_type case) will also look at node name, but
only 'pcie'.

We should be able to create (or extend pci-bus.yaml) a schema to catch
this case.

Rob

> I'd like to make Xen able to parse this device tree without errors but I
> am not sure what is the best way to fix it.
>
> Thanks for any help you can provide!
>
> Cheers,
>
> Stefano
>
>
>
> On Thu, 4 Feb 2021, Julien Grall wrote:
> > On 04/02/2021 00:13, Stefano Stabellini wrote:
> > > On Wed, 3 Feb 2021, Julien Grall wrote:
> > > > On Wed, 3 Feb 2021 at 22:18, Stefano Stabellini <sstabellini@kernel.org>
> > > > wrote:
> > > > > > > But aside from PCIe, let's say that we know of a few nodes for which
> > > > > > > "reg" needs a special treatment. I am not sure it makes sense to
> > > > > > > proceed
> > > > > > > with parsing those nodes without knowing how to deal with that.
> > > > > >
> > > > > > I believe that most of the time the "special" treatment would be to
> > > > > > ignore the
> > > > > > property "regs" as it will not be an CPU memory address.
> > > > > >
> > > > > > > So maybe
> > > > > > > we should add those nodes to skip_matches until we know what to do
> > > > > > > with
> > > > > > > them. At that point, I would imagine we would introduce a special
> > > > > > > handle_device function that knows what to do. In the case of PCIe,
> > > > > > > something like "handle_device_pcie".
> > > > > > Could you outline how "handle_device_pcie()" will differ with
> > > > > > handle_node()?
> > > > > >
> > > > > > In fact, the problem is not the PCIe node directly. Instead, it is the
> > > > > > second
> > > > > > level of nodes below it (i.e usb@...).
> > > > > >
> > > > > > The current implementation of dt_number_of_address() only look at the
> > > > > > bus type
> > > > > > of the parent. As the parent has no bus type and "ranges" then it
> > > > > > thinks this
> > > > > > is something we can translate to a CPU address.
> > > > > >
> > > > > > However, this is below a PCI bus so the meaning of "reg" is completely
> > > > > > different. In this case, we only need to ignore "reg".
> > > > >
> > > > > I see what you are saying and I agree: if we had to introduce a special
> > > > > case for PCI, then  dt_number_of_address() seems to be a good place.  In
> > > > > fact, we already have special PCI handling, see our
> > > > > __dt_translate_address function and xen/common/device_tree.c:dt_busses.
> > > > >
> > > > > Which brings the question: why is this actually failing?
> > > >
> > > > I already hinted at the reason in my previous e-mail :). Let me expand
> > > > a bit more.
> > > >
> > > > >
> > > > > pcie0 {
> > > > >       ranges = <0x02000000 0x0 0xc0000000 0x6 0x00000000 0x0
> > > > > 0x40000000>;
> > > > >
> > > > > Which means that PCI addresses 0xc0000000-0x100000000 become
> > > > > 0x600000000-0x700000000.
> > > > >
> > > > > The offending DT is:
> > > > >
> > > > > &pcie0 {
> > > > >           pci@1,0 {
> > > > >                   #address-cells = <3>;
> > > > >                   #size-cells = <2>;
> > > > >                   ranges;
> > > > >
> > > > >                   reg = <0 0 0 0 0>;
> > > > >
> > > > >                   usb@1,0 {
> > > > >                           reg = <0x10000 0 0 0 0>;
> > > > >                           resets = <&reset
> > > > > RASPBERRYPI_FIRMWARE_RESET_ID_USB>;
> > > > >                   };
> > > > >           };
> > > > > };
> > > > >
> > > > >
> > > > > reg = <0x10000 0 0 0 0> means that usb@1,0 is PCI device 01:00.0.
> > > > > However, the rest of the regs cells are left as zero. It shouldn't be an
> > > > > issue because usb@1,0 is a child of pci@1,0 but pci@1,0 is not a bus.
> > > >
> > > > The property "ranges" is used to define a mapping or translation
> > > > between the address space of the "bus" (here pci@1,0) and the address
> > > > space of the bus node's parent (&pcie0).
> > > > IOW, it means "reg" in usb@1,0 is an address on the PCI bus (i.e. BDF).
> > > >
> > > > The problem is dt_number_of_address() will only look at the "bus" type
> > > > of the parent using dt_match_bus(). This will return the default bus
> > > > (see dt_bus_default_match()), because this is a property "ranges" in
> > > > the parent node (i.e. pci@1,0). Therefore...
> > > >
> > > > > So
> > > > > in theory dt_number_of_address() should already return 0 for it.
> > > >
> > > > ... dt_number_of_address() will return 1 even if the address is not a
> > > > CPU address. So when Xen will try to translate it, it will fail.
> > > >
> > > > >
> > > > > Maybe reg = <0 0 0 0 0> is the problem. In that case, we could simply
> > > > > add a check to skip 0 size ranges. Just a hack to explain what I mean:
> > > >
> > > > The parent of pci@1,0 is a PCI bridge (see the property type), so the
> > > > CPU addresses are found not via "regs" but "assigned-addresses".
> > > >
> > > > In this situation, "regs" will have a different meaning and therefore
> > > > there is no promise that the size will be 0.
> > >
> > > I copy/pasted the following:
> > >
> > >         pci@1,0 {
> > >                 #address-cells = <3>;
> > >                 #size-cells = <2>;
> > >                 ranges;
> > >
> > >                 reg = <0 0 0 0 0>;
> > >
> > >                 usb@1,0 {
> > >                         reg = <0x10000 0 0 0 0>;
> > >                         resets = <&reset
> > >                         RASPBERRYPI_FIRMWARE_RESET_ID_USB>;
> > >                 };
> > >         };
> > >
> > > under pcie0 in my DTS to see what happens (the node is not there in the
> > > device tree for the rpi-5.9.y kernel.) It results in the expected error:
> > >
> > > (XEN) Unable to retrieve address 0 for /scb/pcie@7d500000/pci@1,0/usb@1,0
> > > (XEN) Device tree generation failed (-22).
> > >
> > > I could verify that pci@1,0 is seen as "default" bus due to the range
> > > property, thus dt_number_of_address() returns 1.
> > >
> > >
> > > I can see that reg = <0 0 0 0 0> is not a problem because it is ignored
> > > given that the parent is a PCI bus. assigned-addresses is the one that
> > > is read.
> > >
> > >
> > > But from a device tree perspective I am actually confused by the
> > > presence of the "ranges" property under pci@1,0. Is that correct? It is
> > > stating that addresses of children devices will be translated to the
> > > address space of the parent (pcie0) using the parent translation rules.
> > > I mean -- it looks like Xen is right in trying to translate reg =
> > > <0x10000 0 0 0 0> using ranges = <0x02000000 0x0 0xc0000000 0x6
> > > 0x00000000 0x0 0x40000000>.
> > >
> > > Or maybe since pcie0 is a PCI bus all the children addresses, even
> > > grand-children, are expected to be specified using "assigned-addresses"?
> > >
> > >
> > > Looking at other examples [1][2] maybe the mistake is that pci@1,0 is
> > > missing device_type = "pci"?  Of course, if I add that, the error
> > > disappear.
> >
> > I am afraid, I don't know the answer. I think it would be best to ask the
> > Linux DT folks about it.
> >
> > >
> > > [1] Documentation/devicetree/bindings/pci/mvebu-pci.txt
> > > [2] Documentation/devicetree/bindings/pci/nvidia,tegra20-pcie.txt
> > >
> > > For the sake of making Xen more resilient to possible DTSes, maybe we
> > > should try to extend the dt_bus_pci_match check? See for instance the
> > > change below, but we might be able to come up with better ideas.
> > >
> > >
> > > diff --git a/xen/common/device_tree.c b/xen/common/device_tree.c
> > > index 18825e333e..24d998f725 100644
> > > --- a/xen/common/device_tree.c
> > > +++ b/xen/common/device_tree.c
> > > @@ -565,12 +565,21 @@ static unsigned int dt_bus_default_get_flags(const
> > > __be32 *addr)
> > >     static bool_t dt_bus_pci_match(const struct dt_device_node *np)
> > >   {
> > > +    bool ret = false;
> > > +
> > >       /*
> > >        * "pciex" is PCI Express "vci" is for the /chaos bridge on 1st-gen
> > > PCI
> > >        * powermacs "ht" is hypertransport
> > >        */
> > > -    return !strcmp(np->type, "pci") || !strcmp(np->type, "pciex") ||
> > > +    ret = !strcmp(np->type, "pci") || !strcmp(np->type, "pciex") ||
> > >           !strcmp(np->type, "vci") || !strcmp(np->type, "ht");
> > > +
> > > +    if ( ret ) return ret;
> > > +
> > > +    if ( !strcmp(np->name, "pci") )
> > > +        ret = dt_bus_pci_match(dt_get_parent(np));
> >
> > It is probably safe to assume that a PCI device (not hostbridge) will start
> > with "pci". Although, I don't much like the idea because the name is not meant
> > to be stable.
> >
> > AFAICT, we can only rely on "compatible" and "type".
> >
> > Cheers,
> >
> > --
> > Julien Grall
> >
Stefano Stabellini Feb. 4, 2021, 8:36 p.m. UTC | #11
On Thu, 4 Feb 2021, Rob Herring wrote:
> On Thu, Feb 4, 2021 at 11:56 AM Stefano Stabellini
> <sstabellini@kernel.org> wrote:
> >
> > Hi Rob,
> >
> > We have a question on the PCIe device tree bindings. In summary, we have
> > come across the Raspberry Pi 4 PCIe description below:
> >
> >
> > pcie0: pcie@7d500000 {
> >    compatible = "brcm,bcm2711-pcie";
> >    reg = <0x0 0x7d500000  0x0 0x9310>;
> >    device_type = "pci";
> >    #address-cells = <3>;
> >    #interrupt-cells = <1>;
> >    #size-cells = <2>;
> >    interrupts = <GIC_SPI 148 IRQ_TYPE_LEVEL_HIGH>,
> >                 <GIC_SPI 148 IRQ_TYPE_LEVEL_HIGH>;
> >    interrupt-names = "pcie", "msi";
> >    interrupt-map-mask = <0x0 0x0 0x0 0x7>;
> >    interrupt-map = <0 0 0 1 &gicv2 GIC_SPI 143
> >                                                      IRQ_TYPE_LEVEL_HIGH>;
> >    msi-controller;
> >    msi-parent = <&pcie0>;
> >
> >    ranges = <0x02000000 0x0 0xc0000000 0x6 0x00000000
> >              0x0 0x40000000>;
> >    /*
> >     * The wrapper around the PCIe block has a bug
> >     * preventing it from accessing beyond the first 3GB of
> >     * memory.
> >     */
> >    dma-ranges = <0x02000000 0x0 0x00000000 0x0 0x00000000
> >                  0x0 0xc0000000>;
> >    brcm,enable-ssc;
> >
> >    pci@1,0 {
> >            #address-cells = <3>;
> >            #size-cells = <2>;
> >            ranges;
> >
> >            reg = <0 0 0 0 0>;
> >
> >            usb@1,0 {
> >                    reg = <0x10000 0 0 0 0>;
> >                    resets = <&reset RASPBERRYPI_FIRMWARE_RESET_ID_USB>;
> >            };
> >    };
> > };
> >
> >
> > Xen fails to parse it with an error because it tries to remap reg =
> > <0x10000 0 0 0 0> as if it was a CPU address and of course it fails.
> >
> > Reading the device tree description in details, I cannot tell if Xen has
> > a bug: the ranges property under pci@1,0 means that pci@1,0 is treated
> > like a default bus (not a PCI bus), hence, the children regs are
> > translated using the ranges property of the parent (pcie@7d500000).
> >
> > Is it possible that the device tree is missing device_type =
> > "pci" under pci@1,0? Or is it just implied because pci@1,0 is a child of
> > pcie@7d500000?
> 
> Indeed, it should have device_type. Linux (only recently due to
> another missing device_type case) will also look at node name, but
> only 'pcie'.
>
> We should be able to create (or extend pci-bus.yaml) a schema to catch
> this case.

Ah, that is what I needed to know, thank you!  Is Linux considering a
node named "pcie" as if it has device_type = "pci"?

In Xen, also to cover the RPi4 case, maybe I could add a check for the
node name to be "pci" or "pcie" and if so Xen could assume device_type =
"pci".


> > I'd like to make Xen able to parse this device tree without errors but I
> > am not sure what is the best way to fix it.
> >
> > Thanks for any help you can provide!
> >
> > Cheers,
> >
> > Stefano
> >
> >
> >
> > On Thu, 4 Feb 2021, Julien Grall wrote:
> > > On 04/02/2021 00:13, Stefano Stabellini wrote:
> > > > On Wed, 3 Feb 2021, Julien Grall wrote:
> > > > > On Wed, 3 Feb 2021 at 22:18, Stefano Stabellini <sstabellini@kernel.org>
> > > > > wrote:
> > > > > > > > But aside from PCIe, let's say that we know of a few nodes for which
> > > > > > > > "reg" needs a special treatment. I am not sure it makes sense to
> > > > > > > > proceed
> > > > > > > > with parsing those nodes without knowing how to deal with that.
> > > > > > >
> > > > > > > I believe that most of the time the "special" treatment would be to
> > > > > > > ignore the
> > > > > > > property "regs" as it will not be an CPU memory address.
> > > > > > >
> > > > > > > > So maybe
> > > > > > > > we should add those nodes to skip_matches until we know what to do
> > > > > > > > with
> > > > > > > > them. At that point, I would imagine we would introduce a special
> > > > > > > > handle_device function that knows what to do. In the case of PCIe,
> > > > > > > > something like "handle_device_pcie".
> > > > > > > Could you outline how "handle_device_pcie()" will differ with
> > > > > > > handle_node()?
> > > > > > >
> > > > > > > In fact, the problem is not the PCIe node directly. Instead, it is the
> > > > > > > second
> > > > > > > level of nodes below it (i.e usb@...).
> > > > > > >
> > > > > > > The current implementation of dt_number_of_address() only look at the
> > > > > > > bus type
> > > > > > > of the parent. As the parent has no bus type and "ranges" then it
> > > > > > > thinks this
> > > > > > > is something we can translate to a CPU address.
> > > > > > >
> > > > > > > However, this is below a PCI bus so the meaning of "reg" is completely
> > > > > > > different. In this case, we only need to ignore "reg".
> > > > > >
> > > > > > I see what you are saying and I agree: if we had to introduce a special
> > > > > > case for PCI, then  dt_number_of_address() seems to be a good place.  In
> > > > > > fact, we already have special PCI handling, see our
> > > > > > __dt_translate_address function and xen/common/device_tree.c:dt_busses.
> > > > > >
> > > > > > Which brings the question: why is this actually failing?
> > > > >
> > > > > I already hinted at the reason in my previous e-mail :). Let me expand
> > > > > a bit more.
> > > > >
> > > > > >
> > > > > > pcie0 {
> > > > > >       ranges = <0x02000000 0x0 0xc0000000 0x6 0x00000000 0x0
> > > > > > 0x40000000>;
> > > > > >
> > > > > > Which means that PCI addresses 0xc0000000-0x100000000 become
> > > > > > 0x600000000-0x700000000.
> > > > > >
> > > > > > The offending DT is:
> > > > > >
> > > > > > &pcie0 {
> > > > > >           pci@1,0 {
> > > > > >                   #address-cells = <3>;
> > > > > >                   #size-cells = <2>;
> > > > > >                   ranges;
> > > > > >
> > > > > >                   reg = <0 0 0 0 0>;
> > > > > >
> > > > > >                   usb@1,0 {
> > > > > >                           reg = <0x10000 0 0 0 0>;
> > > > > >                           resets = <&reset
> > > > > > RASPBERRYPI_FIRMWARE_RESET_ID_USB>;
> > > > > >                   };
> > > > > >           };
> > > > > > };
> > > > > >
> > > > > >
> > > > > > reg = <0x10000 0 0 0 0> means that usb@1,0 is PCI device 01:00.0.
> > > > > > However, the rest of the regs cells are left as zero. It shouldn't be an
> > > > > > issue because usb@1,0 is a child of pci@1,0 but pci@1,0 is not a bus.
> > > > >
> > > > > The property "ranges" is used to define a mapping or translation
> > > > > between the address space of the "bus" (here pci@1,0) and the address
> > > > > space of the bus node's parent (&pcie0).
> > > > > IOW, it means "reg" in usb@1,0 is an address on the PCI bus (i.e. BDF).
> > > > >
> > > > > The problem is dt_number_of_address() will only look at the "bus" type
> > > > > of the parent using dt_match_bus(). This will return the default bus
> > > > > (see dt_bus_default_match()), because this is a property "ranges" in
> > > > > the parent node (i.e. pci@1,0). Therefore...
> > > > >
> > > > > > So
> > > > > > in theory dt_number_of_address() should already return 0 for it.
> > > > >
> > > > > ... dt_number_of_address() will return 1 even if the address is not a
> > > > > CPU address. So when Xen will try to translate it, it will fail.
> > > > >
> > > > > >
> > > > > > Maybe reg = <0 0 0 0 0> is the problem. In that case, we could simply
> > > > > > add a check to skip 0 size ranges. Just a hack to explain what I mean:
> > > > >
> > > > > The parent of pci@1,0 is a PCI bridge (see the property type), so the
> > > > > CPU addresses are found not via "regs" but "assigned-addresses".
> > > > >
> > > > > In this situation, "regs" will have a different meaning and therefore
> > > > > there is no promise that the size will be 0.
> > > >
> > > > I copy/pasted the following:
> > > >
> > > >         pci@1,0 {
> > > >                 #address-cells = <3>;
> > > >                 #size-cells = <2>;
> > > >                 ranges;
> > > >
> > > >                 reg = <0 0 0 0 0>;
> > > >
> > > >                 usb@1,0 {
> > > >                         reg = <0x10000 0 0 0 0>;
> > > >                         resets = <&reset
> > > >                         RASPBERRYPI_FIRMWARE_RESET_ID_USB>;
> > > >                 };
> > > >         };
> > > >
> > > > under pcie0 in my DTS to see what happens (the node is not there in the
> > > > device tree for the rpi-5.9.y kernel.) It results in the expected error:
> > > >
> > > > (XEN) Unable to retrieve address 0 for /scb/pcie@7d500000/pci@1,0/usb@1,0
> > > > (XEN) Device tree generation failed (-22).
> > > >
> > > > I could verify that pci@1,0 is seen as "default" bus due to the range
> > > > property, thus dt_number_of_address() returns 1.
> > > >
> > > >
> > > > I can see that reg = <0 0 0 0 0> is not a problem because it is ignored
> > > > given that the parent is a PCI bus. assigned-addresses is the one that
> > > > is read.
> > > >
> > > >
> > > > But from a device tree perspective I am actually confused by the
> > > > presence of the "ranges" property under pci@1,0. Is that correct? It is
> > > > stating that addresses of children devices will be translated to the
> > > > address space of the parent (pcie0) using the parent translation rules.
> > > > I mean -- it looks like Xen is right in trying to translate reg =
> > > > <0x10000 0 0 0 0> using ranges = <0x02000000 0x0 0xc0000000 0x6
> > > > 0x00000000 0x0 0x40000000>.
> > > >
> > > > Or maybe since pcie0 is a PCI bus all the children addresses, even
> > > > grand-children, are expected to be specified using "assigned-addresses"?
> > > >
> > > >
> > > > Looking at other examples [1][2] maybe the mistake is that pci@1,0 is
> > > > missing device_type = "pci"?  Of course, if I add that, the error
> > > > disappear.
> > >
> > > I am afraid, I don't know the answer. I think it would be best to ask the
> > > Linux DT folks about it.
> > >
> > > >
> > > > [1] Documentation/devicetree/bindings/pci/mvebu-pci.txt
> > > > [2] Documentation/devicetree/bindings/pci/nvidia,tegra20-pcie.txt
> > > >
> > > > For the sake of making Xen more resilient to possible DTSes, maybe we
> > > > should try to extend the dt_bus_pci_match check? See for instance the
> > > > change below, but we might be able to come up with better ideas.
> > > >
> > > >
> > > > diff --git a/xen/common/device_tree.c b/xen/common/device_tree.c
> > > > index 18825e333e..24d998f725 100644
> > > > --- a/xen/common/device_tree.c
> > > > +++ b/xen/common/device_tree.c
> > > > @@ -565,12 +565,21 @@ static unsigned int dt_bus_default_get_flags(const
> > > > __be32 *addr)
> > > >     static bool_t dt_bus_pci_match(const struct dt_device_node *np)
> > > >   {
> > > > +    bool ret = false;
> > > > +
> > > >       /*
> > > >        * "pciex" is PCI Express "vci" is for the /chaos bridge on 1st-gen
> > > > PCI
> > > >        * powermacs "ht" is hypertransport
> > > >        */
> > > > -    return !strcmp(np->type, "pci") || !strcmp(np->type, "pciex") ||
> > > > +    ret = !strcmp(np->type, "pci") || !strcmp(np->type, "pciex") ||
> > > >           !strcmp(np->type, "vci") || !strcmp(np->type, "ht");
> > > > +
> > > > +    if ( ret ) return ret;
> > > > +
> > > > +    if ( !strcmp(np->name, "pci") )
> > > > +        ret = dt_bus_pci_match(dt_get_parent(np));
> > >
> > > It is probably safe to assume that a PCI device (not hostbridge) will start
> > > with "pci". Although, I don't much like the idea because the name is not meant
> > > to be stable.
> > >
> > > AFAICT, we can only rely on "compatible" and "type".
> > >
> > > Cheers,
> > >
> > > --
> > > Julien Grall
> > >
>
Rob Herring Feb. 4, 2021, 9:08 p.m. UTC | #12
On Thu, Feb 4, 2021 at 2:36 PM Stefano Stabellini
<sstabellini@kernel.org> wrote:
>
> On Thu, 4 Feb 2021, Rob Herring wrote:
> > On Thu, Feb 4, 2021 at 11:56 AM Stefano Stabellini
> > <sstabellini@kernel.org> wrote:
> > >
> > > Hi Rob,
> > >
> > > We have a question on the PCIe device tree bindings. In summary, we have
> > > come across the Raspberry Pi 4 PCIe description below:
> > >
> > >
> > > pcie0: pcie@7d500000 {
> > >    compatible = "brcm,bcm2711-pcie";
> > >    reg = <0x0 0x7d500000  0x0 0x9310>;
> > >    device_type = "pci";
> > >    #address-cells = <3>;
> > >    #interrupt-cells = <1>;
> > >    #size-cells = <2>;
> > >    interrupts = <GIC_SPI 148 IRQ_TYPE_LEVEL_HIGH>,
> > >                 <GIC_SPI 148 IRQ_TYPE_LEVEL_HIGH>;
> > >    interrupt-names = "pcie", "msi";
> > >    interrupt-map-mask = <0x0 0x0 0x0 0x7>;
> > >    interrupt-map = <0 0 0 1 &gicv2 GIC_SPI 143
> > >                                                      IRQ_TYPE_LEVEL_HIGH>;
> > >    msi-controller;
> > >    msi-parent = <&pcie0>;
> > >
> > >    ranges = <0x02000000 0x0 0xc0000000 0x6 0x00000000
> > >              0x0 0x40000000>;
> > >    /*
> > >     * The wrapper around the PCIe block has a bug
> > >     * preventing it from accessing beyond the first 3GB of
> > >     * memory.
> > >     */
> > >    dma-ranges = <0x02000000 0x0 0x00000000 0x0 0x00000000
> > >                  0x0 0xc0000000>;
> > >    brcm,enable-ssc;
> > >
> > >    pci@1,0 {
> > >            #address-cells = <3>;
> > >            #size-cells = <2>;
> > >            ranges;
> > >
> > >            reg = <0 0 0 0 0>;
> > >
> > >            usb@1,0 {
> > >                    reg = <0x10000 0 0 0 0>;
> > >                    resets = <&reset RASPBERRYPI_FIRMWARE_RESET_ID_USB>;
> > >            };
> > >    };
> > > };
> > >
> > >
> > > Xen fails to parse it with an error because it tries to remap reg =
> > > <0x10000 0 0 0 0> as if it was a CPU address and of course it fails.
> > >
> > > Reading the device tree description in details, I cannot tell if Xen has
> > > a bug: the ranges property under pci@1,0 means that pci@1,0 is treated
> > > like a default bus (not a PCI bus), hence, the children regs are
> > > translated using the ranges property of the parent (pcie@7d500000).
> > >
> > > Is it possible that the device tree is missing device_type =
> > > "pci" under pci@1,0? Or is it just implied because pci@1,0 is a child of
> > > pcie@7d500000?
> >
> > Indeed, it should have device_type. Linux (only recently due to
> > another missing device_type case) will also look at node name, but
> > only 'pcie'.
> >
> > We should be able to create (or extend pci-bus.yaml) a schema to catch
> > this case.
>
> Ah, that is what I needed to know, thank you!  Is Linux considering a
> node named "pcie" as if it has device_type = "pci"?

Yes, it was added for Rockchip RK3399 to avoid a DT update and regression.

> In Xen, also to cover the RPi4 case, maybe I could add a check for the
> node name to be "pci" or "pcie" and if so Xen could assume device_type =
> "pci".

I assume this never worked for RPi4 (and Linux will have the same
issue), so can't we just update the DT in this case?

Rob
Stefano Stabellini Feb. 4, 2021, 9:33 p.m. UTC | #13
On Thu, 4 Feb 2021, Rob Herring wrote:
> On Thu, Feb 4, 2021 at 2:36 PM Stefano Stabellini
> <sstabellini@kernel.org> wrote:
> >
> > On Thu, 4 Feb 2021, Rob Herring wrote:
> > > On Thu, Feb 4, 2021 at 11:56 AM Stefano Stabellini
> > > <sstabellini@kernel.org> wrote:
> > > >
> > > > Hi Rob,
> > > >
> > > > We have a question on the PCIe device tree bindings. In summary, we have
> > > > come across the Raspberry Pi 4 PCIe description below:
> > > >
> > > >
> > > > pcie0: pcie@7d500000 {
> > > >    compatible = "brcm,bcm2711-pcie";
> > > >    reg = <0x0 0x7d500000  0x0 0x9310>;
> > > >    device_type = "pci";
> > > >    #address-cells = <3>;
> > > >    #interrupt-cells = <1>;
> > > >    #size-cells = <2>;
> > > >    interrupts = <GIC_SPI 148 IRQ_TYPE_LEVEL_HIGH>,
> > > >                 <GIC_SPI 148 IRQ_TYPE_LEVEL_HIGH>;
> > > >    interrupt-names = "pcie", "msi";
> > > >    interrupt-map-mask = <0x0 0x0 0x0 0x7>;
> > > >    interrupt-map = <0 0 0 1 &gicv2 GIC_SPI 143
> > > >                                                      IRQ_TYPE_LEVEL_HIGH>;
> > > >    msi-controller;
> > > >    msi-parent = <&pcie0>;
> > > >
> > > >    ranges = <0x02000000 0x0 0xc0000000 0x6 0x00000000
> > > >              0x0 0x40000000>;
> > > >    /*
> > > >     * The wrapper around the PCIe block has a bug
> > > >     * preventing it from accessing beyond the first 3GB of
> > > >     * memory.
> > > >     */
> > > >    dma-ranges = <0x02000000 0x0 0x00000000 0x0 0x00000000
> > > >                  0x0 0xc0000000>;
> > > >    brcm,enable-ssc;
> > > >
> > > >    pci@1,0 {
> > > >            #address-cells = <3>;
> > > >            #size-cells = <2>;
> > > >            ranges;
> > > >
> > > >            reg = <0 0 0 0 0>;
> > > >
> > > >            usb@1,0 {
> > > >                    reg = <0x10000 0 0 0 0>;
> > > >                    resets = <&reset RASPBERRYPI_FIRMWARE_RESET_ID_USB>;
> > > >            };
> > > >    };
> > > > };
> > > >
> > > >
> > > > Xen fails to parse it with an error because it tries to remap reg =
> > > > <0x10000 0 0 0 0> as if it was a CPU address and of course it fails.
> > > >
> > > > Reading the device tree description in details, I cannot tell if Xen has
> > > > a bug: the ranges property under pci@1,0 means that pci@1,0 is treated
> > > > like a default bus (not a PCI bus), hence, the children regs are
> > > > translated using the ranges property of the parent (pcie@7d500000).
> > > >
> > > > Is it possible that the device tree is missing device_type =
> > > > "pci" under pci@1,0? Or is it just implied because pci@1,0 is a child of
> > > > pcie@7d500000?
> > >
> > > Indeed, it should have device_type. Linux (only recently due to
> > > another missing device_type case) will also look at node name, but
> > > only 'pcie'.
> > >
> > > We should be able to create (or extend pci-bus.yaml) a schema to catch
> > > this case.
> >
> > Ah, that is what I needed to know, thank you!  Is Linux considering a
> > node named "pcie" as if it has device_type = "pci"?
> 
> Yes, it was added for Rockchip RK3399 to avoid a DT update and regression.
> 
> > In Xen, also to cover the RPi4 case, maybe I could add a check for the
> > node name to be "pci" or "pcie" and if so Xen could assume device_type =
> > "pci".
> 
> I assume this never worked for RPi4 (and Linux will have the same
> issue), so can't we just update the DT in this case?

I am not sure where the DT is coming from, probably from the RPi4 kernel
trees or firmware. I think it would be good if somebody got in touch to
tell them they have an issue.

Elliot, where was that device tree coming from originally?


From a Xen perspective, for the sake of minimizing user pains (given
that it might take a while to update those DTs) and to introduce as few
ties as possible with kernel versions, it might be best to add the
"pci" name workaround maybe with a /* HACK */ comment on top.
Rob Herring Feb. 4, 2021, 9:52 p.m. UTC | #14
On Thu, Feb 4, 2021 at 3:33 PM Stefano Stabellini
<sstabellini@kernel.org> wrote:
>
> On Thu, 4 Feb 2021, Rob Herring wrote:
> > On Thu, Feb 4, 2021 at 2:36 PM Stefano Stabellini
> > <sstabellini@kernel.org> wrote:
> > >
> > > On Thu, 4 Feb 2021, Rob Herring wrote:
> > > > On Thu, Feb 4, 2021 at 11:56 AM Stefano Stabellini
> > > > <sstabellini@kernel.org> wrote:
> > > > >
> > > > > Hi Rob,
> > > > >
> > > > > We have a question on the PCIe device tree bindings. In summary, we have
> > > > > come across the Raspberry Pi 4 PCIe description below:
> > > > >
> > > > >
> > > > > pcie0: pcie@7d500000 {
> > > > >    compatible = "brcm,bcm2711-pcie";
> > > > >    reg = <0x0 0x7d500000  0x0 0x9310>;
> > > > >    device_type = "pci";
> > > > >    #address-cells = <3>;
> > > > >    #interrupt-cells = <1>;
> > > > >    #size-cells = <2>;
> > > > >    interrupts = <GIC_SPI 148 IRQ_TYPE_LEVEL_HIGH>,
> > > > >                 <GIC_SPI 148 IRQ_TYPE_LEVEL_HIGH>;
> > > > >    interrupt-names = "pcie", "msi";
> > > > >    interrupt-map-mask = <0x0 0x0 0x0 0x7>;
> > > > >    interrupt-map = <0 0 0 1 &gicv2 GIC_SPI 143
> > > > >                                                      IRQ_TYPE_LEVEL_HIGH>;
> > > > >    msi-controller;
> > > > >    msi-parent = <&pcie0>;
> > > > >
> > > > >    ranges = <0x02000000 0x0 0xc0000000 0x6 0x00000000
> > > > >              0x0 0x40000000>;
> > > > >    /*
> > > > >     * The wrapper around the PCIe block has a bug
> > > > >     * preventing it from accessing beyond the first 3GB of
> > > > >     * memory.
> > > > >     */
> > > > >    dma-ranges = <0x02000000 0x0 0x00000000 0x0 0x00000000
> > > > >                  0x0 0xc0000000>;
> > > > >    brcm,enable-ssc;
> > > > >
> > > > >    pci@1,0 {
> > > > >            #address-cells = <3>;
> > > > >            #size-cells = <2>;
> > > > >            ranges;
> > > > >
> > > > >            reg = <0 0 0 0 0>;
> > > > >
> > > > >            usb@1,0 {
> > > > >                    reg = <0x10000 0 0 0 0>;
> > > > >                    resets = <&reset RASPBERRYPI_FIRMWARE_RESET_ID_USB>;
> > > > >            };
> > > > >    };
> > > > > };
> > > > >
> > > > >
> > > > > Xen fails to parse it with an error because it tries to remap reg =
> > > > > <0x10000 0 0 0 0> as if it was a CPU address and of course it fails.
> > > > >
> > > > > Reading the device tree description in details, I cannot tell if Xen has
> > > > > a bug: the ranges property under pci@1,0 means that pci@1,0 is treated
> > > > > like a default bus (not a PCI bus), hence, the children regs are
> > > > > translated using the ranges property of the parent (pcie@7d500000).
> > > > >
> > > > > Is it possible that the device tree is missing device_type =
> > > > > "pci" under pci@1,0? Or is it just implied because pci@1,0 is a child of
> > > > > pcie@7d500000?
> > > >
> > > > Indeed, it should have device_type. Linux (only recently due to
> > > > another missing device_type case) will also look at node name, but
> > > > only 'pcie'.
> > > >
> > > > We should be able to create (or extend pci-bus.yaml) a schema to catch
> > > > this case.
> > >
> > > Ah, that is what I needed to know, thank you!  Is Linux considering a
> > > node named "pcie" as if it has device_type = "pci"?
> >
> > Yes, it was added for Rockchip RK3399 to avoid a DT update and regression.
> >
> > > In Xen, also to cover the RPi4 case, maybe I could add a check for the
> > > node name to be "pci" or "pcie" and if so Xen could assume device_type =
> > > "pci".
> >
> > I assume this never worked for RPi4 (and Linux will have the same
> > issue), so can't we just update the DT in this case?
>
> I am not sure where the DT is coming from, probably from the RPi4 kernel
> trees or firmware. I think it would be good if somebody got in touch to
> tell them they have an issue.

So you just take whatever downstream RPi invents? Good luck.

> Elliot, where was that device tree coming from originally?
>
>
> From a Xen perspective, for the sake of minimizing user pains (given
> that it might take a while to update those DTs) and to introduce as few
> ties as possible with kernel versions, it might be best to add the
> "pci" name workaround maybe with a /* HACK */ comment on top.

There is some possibility of that causing a regression on another
platform. That's why we limited Linux as much as possible and also
print a warning. But we have to worry about 20 year old PowerMacs and
such.

Rob
Stefano Stabellini Feb. 4, 2021, 10:02 p.m. UTC | #15
On Thu, 4 Feb 2021, Rob Herring wrote:
> On Thu, Feb 4, 2021 at 3:33 PM Stefano Stabellini
> <sstabellini@kernel.org> wrote:
> >
> > On Thu, 4 Feb 2021, Rob Herring wrote:
> > > On Thu, Feb 4, 2021 at 2:36 PM Stefano Stabellini
> > > <sstabellini@kernel.org> wrote:
> > > >
> > > > On Thu, 4 Feb 2021, Rob Herring wrote:
> > > > > On Thu, Feb 4, 2021 at 11:56 AM Stefano Stabellini
> > > > > <sstabellini@kernel.org> wrote:
> > > > > >
> > > > > > Hi Rob,
> > > > > >
> > > > > > We have a question on the PCIe device tree bindings. In summary, we have
> > > > > > come across the Raspberry Pi 4 PCIe description below:
> > > > > >
> > > > > >
> > > > > > pcie0: pcie@7d500000 {
> > > > > >    compatible = "brcm,bcm2711-pcie";
> > > > > >    reg = <0x0 0x7d500000  0x0 0x9310>;
> > > > > >    device_type = "pci";
> > > > > >    #address-cells = <3>;
> > > > > >    #interrupt-cells = <1>;
> > > > > >    #size-cells = <2>;
> > > > > >    interrupts = <GIC_SPI 148 IRQ_TYPE_LEVEL_HIGH>,
> > > > > >                 <GIC_SPI 148 IRQ_TYPE_LEVEL_HIGH>;
> > > > > >    interrupt-names = "pcie", "msi";
> > > > > >    interrupt-map-mask = <0x0 0x0 0x0 0x7>;
> > > > > >    interrupt-map = <0 0 0 1 &gicv2 GIC_SPI 143
> > > > > >                                                      IRQ_TYPE_LEVEL_HIGH>;
> > > > > >    msi-controller;
> > > > > >    msi-parent = <&pcie0>;
> > > > > >
> > > > > >    ranges = <0x02000000 0x0 0xc0000000 0x6 0x00000000
> > > > > >              0x0 0x40000000>;
> > > > > >    /*
> > > > > >     * The wrapper around the PCIe block has a bug
> > > > > >     * preventing it from accessing beyond the first 3GB of
> > > > > >     * memory.
> > > > > >     */
> > > > > >    dma-ranges = <0x02000000 0x0 0x00000000 0x0 0x00000000
> > > > > >                  0x0 0xc0000000>;
> > > > > >    brcm,enable-ssc;
> > > > > >
> > > > > >    pci@1,0 {
> > > > > >            #address-cells = <3>;
> > > > > >            #size-cells = <2>;
> > > > > >            ranges;
> > > > > >
> > > > > >            reg = <0 0 0 0 0>;
> > > > > >
> > > > > >            usb@1,0 {
> > > > > >                    reg = <0x10000 0 0 0 0>;
> > > > > >                    resets = <&reset RASPBERRYPI_FIRMWARE_RESET_ID_USB>;
> > > > > >            };
> > > > > >    };
> > > > > > };
> > > > > >
> > > > > >
> > > > > > Xen fails to parse it with an error because it tries to remap reg =
> > > > > > <0x10000 0 0 0 0> as if it was a CPU address and of course it fails.
> > > > > >
> > > > > > Reading the device tree description in details, I cannot tell if Xen has
> > > > > > a bug: the ranges property under pci@1,0 means that pci@1,0 is treated
> > > > > > like a default bus (not a PCI bus), hence, the children regs are
> > > > > > translated using the ranges property of the parent (pcie@7d500000).
> > > > > >
> > > > > > Is it possible that the device tree is missing device_type =
> > > > > > "pci" under pci@1,0? Or is it just implied because pci@1,0 is a child of
> > > > > > pcie@7d500000?
> > > > >
> > > > > Indeed, it should have device_type. Linux (only recently due to
> > > > > another missing device_type case) will also look at node name, but
> > > > > only 'pcie'.
> > > > >
> > > > > We should be able to create (or extend pci-bus.yaml) a schema to catch
> > > > > this case.
> > > >
> > > > Ah, that is what I needed to know, thank you!  Is Linux considering a
> > > > node named "pcie" as if it has device_type = "pci"?
> > >
> > > Yes, it was added for Rockchip RK3399 to avoid a DT update and regression.
> > >
> > > > In Xen, also to cover the RPi4 case, maybe I could add a check for the
> > > > node name to be "pci" or "pcie" and if so Xen could assume device_type =
> > > > "pci".
> > >
> > > I assume this never worked for RPi4 (and Linux will have the same
> > > issue), so can't we just update the DT in this case?
> >
> > I am not sure where the DT is coming from, probably from the RPi4 kernel
> > trees or firmware. I think it would be good if somebody got in touch to
> > tell them they have an issue.
> 
> So you just take whatever downstream RPi invents? Good luck.

Ehm, yes, I know what you are talking about :-D

We don't promise to support downstream kernels and device trees but
fortunately they work most of the time.


> > Elliot, where was that device tree coming from originally?
> >
> >
> > From a Xen perspective, for the sake of minimizing user pains (given
> > that it might take a while to update those DTs) and to introduce as few
> > ties as possible with kernel versions, it might be best to add the
> > "pci" name workaround maybe with a /* HACK */ comment on top.
> 
> There is some possibility of that causing a regression on another
> platform. That's why we limited Linux as much as possible and also
> print a warning. But we have to worry about 20 year old PowerMacs and
> such.

I see, that makes sense. Printing a warning is very sensible.
Stefano Stabellini Feb. 4, 2021, 10:39 p.m. UTC | #16
On Thu, 4 Feb 2021, Julien Grall wrote:
> On 04/02/2021 00:13, Stefano Stabellini wrote:
> > On Wed, 3 Feb 2021, Julien Grall wrote:
> > > On Wed, 3 Feb 2021 at 22:18, Stefano Stabellini <sstabellini@kernel.org>
> > > wrote:
> > > > > > But aside from PCIe, let's say that we know of a few nodes for which
> > > > > > "reg" needs a special treatment. I am not sure it makes sense to
> > > > > > proceed
> > > > > > with parsing those nodes without knowing how to deal with that.
> > > > > 
> > > > > I believe that most of the time the "special" treatment would be to
> > > > > ignore the
> > > > > property "regs" as it will not be an CPU memory address.
> > > > > 
> > > > > > So maybe
> > > > > > we should add those nodes to skip_matches until we know what to do
> > > > > > with
> > > > > > them. At that point, I would imagine we would introduce a special
> > > > > > handle_device function that knows what to do. In the case of PCIe,
> > > > > > something like "handle_device_pcie".
> > > > > Could you outline how "handle_device_pcie()" will differ with
> > > > > handle_node()?
> > > > > 
> > > > > In fact, the problem is not the PCIe node directly. Instead, it is the
> > > > > second
> > > > > level of nodes below it (i.e usb@...).
> > > > > 
> > > > > The current implementation of dt_number_of_address() only look at the
> > > > > bus type
> > > > > of the parent. As the parent has no bus type and "ranges" then it
> > > > > thinks this
> > > > > is something we can translate to a CPU address.
> > > > > 
> > > > > However, this is below a PCI bus so the meaning of "reg" is completely
> > > > > different. In this case, we only need to ignore "reg".
> > > > 
> > > > I see what you are saying and I agree: if we had to introduce a special
> > > > case for PCI, then  dt_number_of_address() seems to be a good place.  In
> > > > fact, we already have special PCI handling, see our
> > > > __dt_translate_address function and xen/common/device_tree.c:dt_busses.
> > > > 
> > > > Which brings the question: why is this actually failing?
> > > 
> > > I already hinted at the reason in my previous e-mail :). Let me expand
> > > a bit more.
> > > 
> > > > 
> > > > pcie0 {
> > > >       ranges = <0x02000000 0x0 0xc0000000 0x6 0x00000000 0x0
> > > > 0x40000000>;
> > > > 
> > > > Which means that PCI addresses 0xc0000000-0x100000000 become
> > > > 0x600000000-0x700000000.
> > > > 
> > > > The offending DT is:
> > > > 
> > > > &pcie0 {
> > > >           pci@1,0 {
> > > >                   #address-cells = <3>;
> > > >                   #size-cells = <2>;
> > > >                   ranges;
> > > > 
> > > >                   reg = <0 0 0 0 0>;
> > > > 
> > > >                   usb@1,0 {
> > > >                           reg = <0x10000 0 0 0 0>;
> > > >                           resets = <&reset
> > > > RASPBERRYPI_FIRMWARE_RESET_ID_USB>;
> > > >                   };
> > > >           };
> > > > };
> > > > 
> > > > 
> > > > reg = <0x10000 0 0 0 0> means that usb@1,0 is PCI device 01:00.0.
> > > > However, the rest of the regs cells are left as zero. It shouldn't be an
> > > > issue because usb@1,0 is a child of pci@1,0 but pci@1,0 is not a bus.
> > > 
> > > The property "ranges" is used to define a mapping or translation
> > > between the address space of the "bus" (here pci@1,0) and the address
> > > space of the bus node's parent (&pcie0).
> > > IOW, it means "reg" in usb@1,0 is an address on the PCI bus (i.e. BDF).
> > > 
> > > The problem is dt_number_of_address() will only look at the "bus" type
> > > of the parent using dt_match_bus(). This will return the default bus
> > > (see dt_bus_default_match()), because this is a property "ranges" in
> > > the parent node (i.e. pci@1,0). Therefore...
> > > 
> > > > So
> > > > in theory dt_number_of_address() should already return 0 for it.
> > > 
> > > ... dt_number_of_address() will return 1 even if the address is not a
> > > CPU address. So when Xen will try to translate it, it will fail.
> > > 
> > > > 
> > > > Maybe reg = <0 0 0 0 0> is the problem. In that case, we could simply
> > > > add a check to skip 0 size ranges. Just a hack to explain what I mean:
> > > 
> > > The parent of pci@1,0 is a PCI bridge (see the property type), so the
> > > CPU addresses are found not via "regs" but "assigned-addresses".
> > > 
> > > In this situation, "regs" will have a different meaning and therefore
> > > there is no promise that the size will be 0.
> > 
> > I copy/pasted the following:
> > 
> >         pci@1,0 {
> >                 #address-cells = <3>;
> >                 #size-cells = <2>;
> >                 ranges;
> > 
> >                 reg = <0 0 0 0 0>;
> > 
> >                 usb@1,0 {
> >                         reg = <0x10000 0 0 0 0>;
> >                         resets = <&reset
> >                         RASPBERRYPI_FIRMWARE_RESET_ID_USB>;
> >                 };
> >         };
> > 
> > under pcie0 in my DTS to see what happens (the node is not there in the
> > device tree for the rpi-5.9.y kernel.) It results in the expected error:
> > 
> > (XEN) Unable to retrieve address 0 for /scb/pcie@7d500000/pci@1,0/usb@1,0
> > (XEN) Device tree generation failed (-22).
> > 
> > I could verify that pci@1,0 is seen as "default" bus due to the range
> > property, thus dt_number_of_address() returns 1.
> > 
> > 
> > I can see that reg = <0 0 0 0 0> is not a problem because it is ignored
> > given that the parent is a PCI bus. assigned-addresses is the one that
> > is read.
> > 
> > 
> > But from a device tree perspective I am actually confused by the
> > presence of the "ranges" property under pci@1,0. Is that correct? It is
> > stating that addresses of children devices will be translated to the
> > address space of the parent (pcie0) using the parent translation rules.
> > I mean -- it looks like Xen is right in trying to translate reg =
> > <0x10000 0 0 0 0> using ranges = <0x02000000 0x0 0xc0000000 0x6
> > 0x00000000 0x0 0x40000000>.
> > 
> > Or maybe since pcie0 is a PCI bus all the children addresses, even
> > grand-children, are expected to be specified using "assigned-addresses"?
> > 
> > 
> > Looking at other examples [1][2] maybe the mistake is that pci@1,0 is
> > missing device_type = "pci"?  Of course, if I add that, the error
> > disappear.
> 
> I am afraid, I don't know the answer. I think it would be best to ask the
> Linux DT folks about it.
> 
> > 
> > [1] Documentation/devicetree/bindings/pci/mvebu-pci.txt
> > [2] Documentation/devicetree/bindings/pci/nvidia,tegra20-pcie.txt
> > 
> > For the sake of making Xen more resilient to possible DTSes, maybe we
> > should try to extend the dt_bus_pci_match check? See for instance the
> > change below, but we might be able to come up with better ideas.
> > 
> > 
> > diff --git a/xen/common/device_tree.c b/xen/common/device_tree.c
> > index 18825e333e..24d998f725 100644
> > --- a/xen/common/device_tree.c
> > +++ b/xen/common/device_tree.c
> > @@ -565,12 +565,21 @@ static unsigned int dt_bus_default_get_flags(const
> > __be32 *addr)
> >     static bool_t dt_bus_pci_match(const struct dt_device_node *np)
> >   {
> > +    bool ret = false;
> > +
> >       /*
> >        * "pciex" is PCI Express "vci" is for the /chaos bridge on 1st-gen
> > PCI
> >        * powermacs "ht" is hypertransport
> >        */
> > -    return !strcmp(np->type, "pci") || !strcmp(np->type, "pciex") ||
> > +    ret = !strcmp(np->type, "pci") || !strcmp(np->type, "pciex") ||
> >           !strcmp(np->type, "vci") || !strcmp(np->type, "ht");
> > +
> > +    if ( ret ) return ret;
> > +
> > +    if ( !strcmp(np->name, "pci") )
> > +        ret = dt_bus_pci_match(dt_get_parent(np));
> 
> It is probably safe to assume that a PCI device (not hostbridge) will start
> with "pci". Although, I don't much like the idea because the name is not meant
> to be stable.
> 
> AFAICT, we can only rely on "compatible" and "type".

After the discussion with Rob, it is clear that we have to add a check
on the node name for "pcie" in dt_bus_pci_match. However, that wouldn't
solve the problem reported by Elliot, because in this case the node name
is "pci" not "pcie".

I suggest that we add a check for "pci" too in dt_bus_pci_match,
although that means that our check will be slightly different from the
equivalent Linux check. The "pci" check should come with an in-code
comment to explain the situation and the reasons for it to be.

What do you think?
Elliott Mitchell Feb. 5, 2021, 3:32 a.m. UTC | #17
On Thu, Feb 04, 2021 at 03:52:26PM -0600, Rob Herring wrote:
> On Thu, Feb 4, 2021 at 3:33 PM Stefano Stabellini
> <sstabellini@kernel.org> wrote:
> >
> > On Thu, 4 Feb 2021, Rob Herring wrote:
> > > On Thu, Feb 4, 2021 at 2:36 PM Stefano Stabellini
> > > <sstabellini@kernel.org> wrote:
> > > >
> > > > On Thu, 4 Feb 2021, Rob Herring wrote:
> > > > > On Thu, Feb 4, 2021 at 11:56 AM Stefano Stabellini
> > > > > <sstabellini@kernel.org> wrote:
> > > > > >
> > > > > > Hi Rob,
> > > > > >
> > > > > > We have a question on the PCIe device tree bindings. In summary, we have
> > > > > > come across the Raspberry Pi 4 PCIe description below:
> > > > > >
> > > > > >
> > > > > > pcie0: pcie@7d500000 {
> > > > > >    compatible = "brcm,bcm2711-pcie";
> > > > > >    reg = <0x0 0x7d500000  0x0 0x9310>;
> > > > > >    device_type = "pci";
> > > > > >    #address-cells = <3>;
> > > > > >    #interrupt-cells = <1>;
> > > > > >    #size-cells = <2>;
> > > > > >    interrupts = <GIC_SPI 148 IRQ_TYPE_LEVEL_HIGH>,
> > > > > >                 <GIC_SPI 148 IRQ_TYPE_LEVEL_HIGH>;
> > > > > >    interrupt-names = "pcie", "msi";
> > > > > >    interrupt-map-mask = <0x0 0x0 0x0 0x7>;
> > > > > >    interrupt-map = <0 0 0 1 &gicv2 GIC_SPI 143
> > > > > >                                                      IRQ_TYPE_LEVEL_HIGH>;
> > > > > >    msi-controller;
> > > > > >    msi-parent = <&pcie0>;
> > > > > >
> > > > > >    ranges = <0x02000000 0x0 0xc0000000 0x6 0x00000000
> > > > > >              0x0 0x40000000>;
> > > > > >    /*
> > > > > >     * The wrapper around the PCIe block has a bug
> > > > > >     * preventing it from accessing beyond the first 3GB of
> > > > > >     * memory.
> > > > > >     */
> > > > > >    dma-ranges = <0x02000000 0x0 0x00000000 0x0 0x00000000
> > > > > >                  0x0 0xc0000000>;
> > > > > >    brcm,enable-ssc;
> > > > > >
> > > > > >    pci@1,0 {
> > > > > >            #address-cells = <3>;
> > > > > >            #size-cells = <2>;
> > > > > >            ranges;
> > > > > >
> > > > > >            reg = <0 0 0 0 0>;
> > > > > >
> > > > > >            usb@1,0 {
> > > > > >                    reg = <0x10000 0 0 0 0>;
> > > > > >                    resets = <&reset RASPBERRYPI_FIRMWARE_RESET_ID_USB>;
> > > > > >            };
> > > > > >    };
> > > > > > };
> > > > > >
> > > > > >
> > > > > > Xen fails to parse it with an error because it tries to remap reg =
> > > > > > <0x10000 0 0 0 0> as if it was a CPU address and of course it fails.
> > > > > >
> > > > > > Reading the device tree description in details, I cannot tell if Xen has
> > > > > > a bug: the ranges property under pci@1,0 means that pci@1,0 is treated
> > > > > > like a default bus (not a PCI bus), hence, the children regs are
> > > > > > translated using the ranges property of the parent (pcie@7d500000).
> > > > > >
> > > > > > Is it possible that the device tree is missing device_type =
> > > > > > "pci" under pci@1,0? Or is it just implied because pci@1,0 is a child of
> > > > > > pcie@7d500000?
> > > > >
> > > > > Indeed, it should have device_type. Linux (only recently due to
> > > > > another missing device_type case) will also look at node name, but
> > > > > only 'pcie'.
> > > > >
> > > > > We should be able to create (or extend pci-bus.yaml) a schema to catch
> > > > > this case.
> > > >
> > > > Ah, that is what I needed to know, thank you!  Is Linux considering a
> > > > node named "pcie" as if it has device_type = "pci"?
> > >
> > > Yes, it was added for Rockchip RK3399 to avoid a DT update and regression.
> > >
> > > > In Xen, also to cover the RPi4 case, maybe I could add a check for the
> > > > node name to be "pci" or "pcie" and if so Xen could assume device_type =
> > > > "pci".
> > >
> > > I assume this never worked for RPi4 (and Linux will have the same
> > > issue), so can't we just update the DT in this case?
> >
> > I am not sure where the DT is coming from, probably from the RPi4 kernel
> > trees or firmware. I think it would be good if somebody got in touch to
> > tell them they have an issue.
> 
> So you just take whatever downstream RPi invents? Good luck.
> 
> > Elliot, where was that device tree coming from originally?

Please excuse my very weak device-tree fu...

I'm unsure which section is the problem, but looking at `git blame` what
shows is commt d5c8dc0d4c880fbde5293cc186b1ab23466254c4.

This commit is present in the Linux master branch and the linux-5.10.y
branch.

You were saying?
Rob Herring Feb. 5, 2021, 1:56 p.m. UTC | #18
On Thu, Feb 4, 2021 at 9:51 PM Elliott Mitchell <ehem+undef@m5p.com> wrote:
>
> On Thu, Feb 04, 2021 at 03:52:26PM -0600, Rob Herring wrote:
> > On Thu, Feb 4, 2021 at 3:33 PM Stefano Stabellini
> > <sstabellini@kernel.org> wrote:
> > >
> > > On Thu, 4 Feb 2021, Rob Herring wrote:
> > > > On Thu, Feb 4, 2021 at 2:36 PM Stefano Stabellini
> > > > <sstabellini@kernel.org> wrote:
> > > > >
> > > > > On Thu, 4 Feb 2021, Rob Herring wrote:
> > > > > > On Thu, Feb 4, 2021 at 11:56 AM Stefano Stabellini
> > > > > > <sstabellini@kernel.org> wrote:
> > > > > > >
> > > > > > > Hi Rob,
> > > > > > >
> > > > > > > We have a question on the PCIe device tree bindings. In summary, we have
> > > > > > > come across the Raspberry Pi 4 PCIe description below:
> > > > > > >
> > > > > > >
> > > > > > > pcie0: pcie@7d500000 {
> > > > > > >    compatible = "brcm,bcm2711-pcie";
> > > > > > >    reg = <0x0 0x7d500000  0x0 0x9310>;
> > > > > > >    device_type = "pci";
> > > > > > >    #address-cells = <3>;
> > > > > > >    #interrupt-cells = <1>;
> > > > > > >    #size-cells = <2>;
> > > > > > >    interrupts = <GIC_SPI 148 IRQ_TYPE_LEVEL_HIGH>,
> > > > > > >                 <GIC_SPI 148 IRQ_TYPE_LEVEL_HIGH>;
> > > > > > >    interrupt-names = "pcie", "msi";
> > > > > > >    interrupt-map-mask = <0x0 0x0 0x0 0x7>;
> > > > > > >    interrupt-map = <0 0 0 1 &gicv2 GIC_SPI 143
> > > > > > >                                                      IRQ_TYPE_LEVEL_HIGH>;
> > > > > > >    msi-controller;
> > > > > > >    msi-parent = <&pcie0>;
> > > > > > >
> > > > > > >    ranges = <0x02000000 0x0 0xc0000000 0x6 0x00000000
> > > > > > >              0x0 0x40000000>;
> > > > > > >    /*
> > > > > > >     * The wrapper around the PCIe block has a bug
> > > > > > >     * preventing it from accessing beyond the first 3GB of
> > > > > > >     * memory.
> > > > > > >     */
> > > > > > >    dma-ranges = <0x02000000 0x0 0x00000000 0x0 0x00000000
> > > > > > >                  0x0 0xc0000000>;
> > > > > > >    brcm,enable-ssc;
> > > > > > >
> > > > > > >    pci@1,0 {
> > > > > > >            #address-cells = <3>;
> > > > > > >            #size-cells = <2>;
> > > > > > >            ranges;
> > > > > > >
> > > > > > >            reg = <0 0 0 0 0>;
> > > > > > >
> > > > > > >            usb@1,0 {
> > > > > > >                    reg = <0x10000 0 0 0 0>;
> > > > > > >                    resets = <&reset RASPBERRYPI_FIRMWARE_RESET_ID_USB>;
> > > > > > >            };
> > > > > > >    };
> > > > > > > };
> > > > > > >
> > > > > > >
> > > > > > > Xen fails to parse it with an error because it tries to remap reg =
> > > > > > > <0x10000 0 0 0 0> as if it was a CPU address and of course it fails.
> > > > > > >
> > > > > > > Reading the device tree description in details, I cannot tell if Xen has
> > > > > > > a bug: the ranges property under pci@1,0 means that pci@1,0 is treated
> > > > > > > like a default bus (not a PCI bus), hence, the children regs are
> > > > > > > translated using the ranges property of the parent (pcie@7d500000).
> > > > > > >
> > > > > > > Is it possible that the device tree is missing device_type =
> > > > > > > "pci" under pci@1,0? Or is it just implied because pci@1,0 is a child of
> > > > > > > pcie@7d500000?
> > > > > >
> > > > > > Indeed, it should have device_type. Linux (only recently due to
> > > > > > another missing device_type case) will also look at node name, but
> > > > > > only 'pcie'.
> > > > > >
> > > > > > We should be able to create (or extend pci-bus.yaml) a schema to catch
> > > > > > this case.
> > > > >
> > > > > Ah, that is what I needed to know, thank you!  Is Linux considering a
> > > > > node named "pcie" as if it has device_type = "pci"?
> > > >
> > > > Yes, it was added for Rockchip RK3399 to avoid a DT update and regression.
> > > >
> > > > > In Xen, also to cover the RPi4 case, maybe I could add a check for the
> > > > > node name to be "pci" or "pcie" and if so Xen could assume device_type =
> > > > > "pci".
> > > >
> > > > I assume this never worked for RPi4 (and Linux will have the same
> > > > issue), so can't we just update the DT in this case?
> > >
> > > I am not sure where the DT is coming from, probably from the RPi4 kernel
> > > trees or firmware. I think it would be good if somebody got in touch to
> > > tell them they have an issue.
> >
> > So you just take whatever downstream RPi invents? Good luck.
> >
> > > Elliot, where was that device tree coming from originally?
>
> Please excuse my very weak device-tree fu...
>
> I'm unsure which section is the problem, but looking at `git blame` what
> shows is commt d5c8dc0d4c880fbde5293cc186b1ab23466254c4.
>
> This commit is present in the Linux master branch and the linux-5.10.y
> branch.
>
> You were saying?

That commit looks perfectly fine. The problem is the PCI bridge node
shown above which doesn't exist upstream.

Rob
Julien Grall Feb. 6, 2021, 8:47 a.m. UTC | #19
Hi Stefano,

On 04/02/2021 22:39, Stefano Stabellini wrote:
> After the discussion with Rob, it is clear that we have to add a check
> on the node name for "pcie" in dt_bus_pci_match. However, that wouldn't
> solve the problem reported by Elliot, because in this case the node name
> is "pci" not "pcie".

I'd like to point out that in the Linux case, the problem was in the 
hostbridge and not the PCI device.

> 
> I suggest that we add a check for "pci" too in dt_bus_pci_match,
> although that means that our check will be slightly different from the
> equivalent Linux check. The "pci" check should come with an in-code
> comment to explain the situation and the reasons for it to be.

I'd like to follow the same approach as a Linux commit:

commit d1ac0002dd297069bb8448c2764c9c31c4668441
Author: Marc Zyngier <maz@kernel.org>
Date:   Wed Aug 19 10:42:55 2020 +0100

     of: address: Work around missing device_type property in pcie nodes

This means a warning should also be added.

Cheers,
diff mbox series

Patch

diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
index 374bf655ee..c0568b7579 100644
--- a/xen/arch/arm/domain_build.c
+++ b/xen/arch/arm/domain_build.c
@@ -1407,9 +1407,9 @@  static int __init handle_device(struct domain *d, struct dt_device_node *dev,
         res = dt_device_get_address(dev, i, &addr, &size);
         if ( res )
         {
-            printk(XENLOG_ERR "Unable to retrieve address %u for %s\n",
-                   i, dt_node_full_name(dev));
-            return res;
+            printk(XENLOG_ERR "Unable to retrieve address of %s, index %u\n",
+                   dt_node_full_name(dev), i);
+            continue;
         }
 
         res = map_range_to_domain(dev, addr, size, &mr_data);