diff mbox series

[1/3] xen/ppc: Enable Boot Allocator

Message ID 5ed3351f7824a5d0a1ff29c17cb55b2608f28109.1701384928.git.sanastasio@raptorengineering.com (mailing list archive)
State Superseded
Headers show
Series Early Boot Allocation on Power | expand

Commit Message

Shawn Anastasio Dec. 1, 2023, 7:59 p.m. UTC
Adapt arm's earlyfdt parsing code to ppc64 and enable Xen's early boot
allocator. Routines for parsing arm-specific devicetree nodes (e.g.
multiboot) were excluded, reducing the overall footprint of code that
was copied.

Signed-off-by: Shawn Anastasio <sanastasio@raptorengineering.com>
---
 xen/arch/ppc/Makefile            |   1 +
 xen/arch/ppc/bootfdt.c           | 507 +++++++++++++++++++++++++++++++
 xen/arch/ppc/include/asm/setup.h | 113 +++++++
 xen/arch/ppc/setup.c             | 109 ++++++-
 4 files changed, 729 insertions(+), 1 deletion(-)
 create mode 100644 xen/arch/ppc/bootfdt.c

Comments

Julien Grall Dec. 1, 2023, 8:56 p.m. UTC | #1
(+ Arm and RISC-V folks)

Hi Shawn,

On 01/12/2023 20:59, Shawn Anastasio wrote:
> Adapt arm's earlyfdt parsing code to ppc64 and enable Xen's early boot
> allocator. Routines for parsing arm-specific devicetree nodes (e.g.
> multiboot) were excluded, reducing the overall footprint of code that
> was copied.

I expect RISC-V to want similar code. I am not really thrilled in the 
idea of having 3 similar copy of the parsing. So can we extract the 
common bits (or harmonize it) so it can be shared?

Maybe Oleksii has already a version doing that.

Cheers,
Timothy Pearson Dec. 1, 2023, 9:10 p.m. UTC | #2
> (+ Arm and RISC-V folks)
>
> Hi Shawn,
>
> On 01/12/2023 20:59, Shawn Anastasio wrote:
>> Adapt arm's earlyfdt parsing code to ppc64 and enable Xen's early boot
>> allocator. Routines for parsing arm-specific devicetree nodes (e.g.
>> multiboot) were excluded, reducing the overall footprint of code that
>> was copied.
>
> I expect RISC-V to want similar code. I am not really thrilled in the
> idea of having 3 similar copy of the parsing. So can we extract the
> common bits (or harmonize it) so it can be shared?
>
> Maybe Oleksii has already a version doing that.

Just my $0.02, but wouldn't it make more sense to have the RISC-V port
handle the deduplication, seeing as the POWER support came first here?  We
don't know if/when the RISC-V port will be ready for submission, so I'm
not sure why we should be on the hook for this particular work.

Thanks!
Julien Grall Dec. 1, 2023, 10:35 p.m. UTC | #3
Hi,

On 01/12/2023 22:10, tpearson@raptorengineering.com wrote:
>> (+ Arm and RISC-V folks)
>>
>> Hi Shawn,
>>
>> On 01/12/2023 20:59, Shawn Anastasio wrote:
>>> Adapt arm's earlyfdt parsing code to ppc64 and enable Xen's early boot
>>> allocator. Routines for parsing arm-specific devicetree nodes (e.g.
>>> multiboot) were excluded, reducing the overall footprint of code that
>>> was copied.
>>
>> I expect RISC-V to want similar code. I am not really thrilled in the
>> idea of having 3 similar copy of the parsing. So can we extract the
>> common bits (or harmonize it) so it can be shared?
>>
>> Maybe Oleksii has already a version doing that.
> 
> Just my $0.02, but wouldn't it make more sense to have the RISC-V port
> handle the deduplication, seeing as the POWER support came first here?  We
> don't know if/when the RISC-V port will be ready for submission, so I'm
> not sure why we should be on the hook for this particular work.

That would have been a valid point if you were writing a brand new 
implementation. But this was *copied* from Arm.

Looking at the diff between arm/bootfdt.c and ppc/bootfdt.c, you seem to 
have:
    - As well copied some code from arm/setup.c
    - Re-order some statement (not clear why)
    - Remove some code which you say are Arm specific. Yet some is part 
of the Device-Tree spec and I would expect to be used in the future.

So my question here stands. Why are you mainly copying verbatimly the 
Arm code rather than consolidating in one place?

Cheers,
Timothy Pearson Dec. 1, 2023, 11:01 p.m. UTC | #4
----- Original Message -----
> From: "Julien Grall" <julien@xen.org>
> To: "Timothy Pearson" <tpearson@raptorengineering.com>
> Cc: "Shawn Anastasio" <sanastasio@raptorengineering.com>, "xen-devel" <xen-devel@lists.xenproject.org>, "Jan Beulich"
> <jbeulich@suse.com>, "Daniel P. Smith" <dpsmith@apertussolutions.com>, "Stefano Stabellini" <sstabellini@kernel.org>,
> "Bertrand Marquis" <bertrand.marquis@arm.com>, "Michal Orzel" <michal.orzel@amd.com>, "Oleksii"
> <oleksii.kurochko@gmail.com>
> Sent: Friday, December 1, 2023 4:35:55 PM
> Subject: Re: [PATCH 1/3] xen/ppc: Enable Boot Allocator

> Hi,
> 
> On 01/12/2023 22:10, tpearson@raptorengineering.com wrote:
>>> (+ Arm and RISC-V folks)
>>>
>>> Hi Shawn,
>>>
>>> On 01/12/2023 20:59, Shawn Anastasio wrote:
>>>> Adapt arm's earlyfdt parsing code to ppc64 and enable Xen's early boot
>>>> allocator. Routines for parsing arm-specific devicetree nodes (e.g.
>>>> multiboot) were excluded, reducing the overall footprint of code that
>>>> was copied.
>>>
>>> I expect RISC-V to want similar code. I am not really thrilled in the
>>> idea of having 3 similar copy of the parsing. So can we extract the
>>> common bits (or harmonize it) so it can be shared?
>>>
>>> Maybe Oleksii has already a version doing that.
>> 
>> Just my $0.02, but wouldn't it make more sense to have the RISC-V port
>> handle the deduplication, seeing as the POWER support came first here?  We
>> don't know if/when the RISC-V port will be ready for submission, so I'm
>> not sure why we should be on the hook for this particular work.
> 
> That would have been a valid point if you were writing a brand new
> implementation. But this was *copied* from Arm.
> 
> Looking at the diff between arm/bootfdt.c and ppc/bootfdt.c, you seem to
> have:
>    - As well copied some code from arm/setup.c
>    - Re-order some statement (not clear why)
>    - Remove some code which you say are Arm specific. Yet some is part
> of the Device-Tree spec and I would expect to be used in the future.
> 
> So my question here stands. Why are you mainly copying verbatimly the
> Arm code rather than consolidating in one place?

That's fair, with the future RISC-V port removed from the discussion and good reasons still being put forth it makes more sense to deduplicate now.  Thank you for clarifying the objection! :)
Julien Grall Dec. 1, 2023, 11:23 p.m. UTC | #5
Hi,

On 01/12/2023 00:01, Timothy Pearson wrote:
> 
> 
> ----- Original Message -----
>> From: "Julien Grall" <julien@xen.org>
>> To: "Timothy Pearson" <tpearson@raptorengineering.com>
>> Cc: "Shawn Anastasio" <sanastasio@raptorengineering.com>, "xen-devel" <xen-devel@lists.xenproject.org>, "Jan Beulich"
>> <jbeulich@suse.com>, "Daniel P. Smith" <dpsmith@apertussolutions.com>, "Stefano Stabellini" <sstabellini@kernel.org>,
>> "Bertrand Marquis" <bertrand.marquis@arm.com>, "Michal Orzel" <michal.orzel@amd.com>, "Oleksii"
>> <oleksii.kurochko@gmail.com>
>> Sent: Friday, December 1, 2023 4:35:55 PM
>> Subject: Re: [PATCH 1/3] xen/ppc: Enable Boot Allocator
> 
>> Hi,
>>
>> On 01/12/2023 22:10, tpearson@raptorengineering.com wrote:
>>>> (+ Arm and RISC-V folks)
>>>>
>>>> Hi Shawn,
>>>>
>>>> On 01/12/2023 20:59, Shawn Anastasio wrote:
>>>>> Adapt arm's earlyfdt parsing code to ppc64 and enable Xen's early boot
>>>>> allocator. Routines for parsing arm-specific devicetree nodes (e.g.
>>>>> multiboot) were excluded, reducing the overall footprint of code that
>>>>> was copied.
>>>>
>>>> I expect RISC-V to want similar code. I am not really thrilled in the
>>>> idea of having 3 similar copy of the parsing. So can we extract the
>>>> common bits (or harmonize it) so it can be shared?
>>>>
>>>> Maybe Oleksii has already a version doing that.
>>>
>>> Just my $0.02, but wouldn't it make more sense to have the RISC-V port
>>> handle the deduplication, seeing as the POWER support came first here?  We
>>> don't know if/when the RISC-V port will be ready for submission, so I'm
>>> not sure why we should be on the hook for this particular work.
>>
>> That would have been a valid point if you were writing a brand new
>> implementation. But this was *copied* from Arm.
>>
>> Looking at the diff between arm/bootfdt.c and ppc/bootfdt.c, you seem to
>> have:
>>     - As well copied some code from arm/setup.c
>>     - Re-order some statement (not clear why)
>>     - Remove some code which you say are Arm specific. Yet some is part
>> of the Device-Tree spec and I would expect to be used in the future.
>>
>> So my question here stands. Why are you mainly copying verbatimly the
>> Arm code rather than consolidating in one place?
> 
> That's fair, with the future RISC-V port removed from the discussion and good reasons still being put forth it makes more sense to deduplicate now.  Thank you for clarifying the objection! :)

I have had a brief look at the differences. I moved some of the 
functions to bootfdt.c in order to match PPC. Below the diff after that.

Leaving aside some of the clean-up, it sounds like you:
* removed BOOTMOD_{RAMDISK, XSM...}. So how do you plan to pass XSM blob 
and ramdisk in the future?
* split long prink-messages. For Xen, we keep them in one line even if 
it is over 80 characters to facilite grepping.
* Remove device_tree_node_is_available(), I believe you still need it 
because the property is not Arm specific.
* Remove process_multiboot(), how do you plan to handle dom0less domain 
in the future?
* Likewise for xen,static-mem and boot_fdt_cmdline()?
* fdt_get_mem_rsv_paddr(), this is part of the DT is used to reserve 
memory. This was superseed to /reserved-memory, but I wonder how 
widespread this is on PPC?
* You are removing the sorting of the memory bank. We had to do the 
sorting on Arm because some DT didn't have the banks sorted and this 
helped will the logic memory subystem logic. I can understand if you 
don't need it, but it would not hurt.
* If am not mistaken you are adding the Xen module as BOOTMOD_KERNEL, 
however this is meant to be used for the domain kernel. Xen should be 
BOOTMOD_XEN.

Overall, it feels to me that nearly (if not all) bootfdt.c can be moved 
to common (maybe under a new directory common/device-tree?) and #ifdef 
bits that are dom0less specific (we now have a Kconfig for that). You 
can do the clean-up as well, but they would belong to separate patches.

I hope that helps.

Cheers,

--- ../arm/bootfdt.c	2023-12-01 23:02:43.501050219 +0000
+++ bootfdt.c	2023-12-01 22:26:47.719734253 +0000
@@ -1,9 +1,12 @@
-/* SPDX-License-Identifier: GPL-2.0 */
+/* SPDX-License-Identifier: GPL-2.0-only */
  /*
- * Early Device Tree
+ * Early Device Tree and boot info bookkeeping.
+ * Derived from arch/arm/bootfdt.c and setup.c.
   *
   * Copyright (C) 2012-2014 Citrix Systems, Inc.
+ * Copyright (C) Raptor Engineering LLC
   */
+
  #include <xen/types.h>
  #include <xen/lib.h>
  #include <xen/kernel.h>
@@ -15,7 +18,8 @@
  #include <xen/sort.h>
  #include <xsm/xsm.h>
  #include <asm/setup.h>
-#include <asm/static-shmem.h>
+
+struct bootinfo __initdata bootinfo;

  struct bootmodule __init *add_boot_module(bootmodule_kind kind,
                                            paddr_t start, paddr_t size,
@@ -62,10 +66,6 @@
      case BOOTMOD_XEN:     return "Xen";
      case BOOTMOD_FDT:     return "Device Tree";
      case BOOTMOD_KERNEL:  return "Kernel";
-    case BOOTMOD_RAMDISK: return "Ramdisk";
-    case BOOTMOD_XSM:     return "XSM";
-    case BOOTMOD_GUEST_DTB:     return "DTB";
-    case BOOTMOD_UNKNOWN: return "Unknown";
      default: BUG();
      }
  }
@@ -91,8 +91,9 @@
              continue;
          else
          {
-            printk("Region: [%#"PRIpaddr", %#"PRIpaddr") overlapping 
with mod[%u]: [%#"PRIpaddr", %#"PRIpaddr")\n",
-                   region_start, region_end, i, mod_start, mod_end);
+            printk("Region: [%#"PRIpaddr", %#"PRIpaddr") overlapping with"
+                   " mod[%u]: [%#"PRIpaddr", %#"PRIpaddr")\n", 
region_start,
+                   region_end, i, mod_start, mod_end);
              return true;
          }
      }
@@ -121,8 +122,9 @@
              continue;
          else
          {
-            printk("Region: [%#"PRIpaddr", %#"PRIpaddr") overlapping 
with bank[%u]: [%#"PRIpaddr", %#"PRIpaddr")\n",
-                   region_start, region_end, i, bank_start, bank_end);
+            printk("Region: [%#"PRIpaddr", %#"PRIpaddr") overlapping with"
+                   " bank[%u]: [%#"PRIpaddr", %#"PRIpaddr")\n", 
region_start,
+                   region_end, i, bank_start, bank_end);
              return true;
          }
      }
@@ -149,30 +151,6 @@
                                     region_start, region_size) )
          return true;

-#ifdef CONFIG_ACPI
-    /* Check if input region is overlapping with ACPI 
EfiACPIReclaimMemory */
-    if ( meminfo_overlap_check(&bootinfo.acpi, region_start, region_size) )
-        return true;
-#endif
-
-    return false;
-}
-
-static bool __init device_tree_node_is_available(const void *fdt, int node)
-{
-    const char *status;
-    int len;
-
-    status = fdt_getprop(fdt, node, "status", &len);
-    if ( !status )
-        return true;
-
-    if ( len > 0 )
-    {
-        if ( !strcmp(status, "ok") || !strcmp(status, "okay") )
-            return true;
-    }
-
      return false;
  }

@@ -185,33 +163,14 @@
      name = fdt_get_name(fdt, node, NULL);
      match_len = strlen(match);

-    /* Match both "match" and "match@..." patterns but not
-       "match-foo". */
+    /*
+     * Match both "match" and "match@..." patterns but not
+     * "match-foo".
+     */
      return strncmp(name, match, match_len) == 0
          && (name[match_len] == '@' || name[match_len] == '\0');
  }

-static bool __init device_tree_node_compatible(const void *fdt, int node,
-                                               const char *match)
-{
-    int len, l;
-    const void *prop;
-
-    prop = fdt_getprop(fdt, node, "compatible", &len);
-    if ( prop == NULL )
-        return false;
-
-    while ( len > 0 ) {
-        if ( !dt_compat_cmp(prop, match) )
-            return true;
-        l = strlen(prop) + 1;
-        prop += l;
-        len -= l;
-    }
-
-    return false;
-}
-
  void __init device_tree_get_reg(const __be32 **cell, uint32_t 
address_cells,
                                  uint32_t size_cells, paddr_t *start,
                                  paddr_t *size)
@@ -247,19 +206,16 @@

  static int __init device_tree_get_meminfo(const void *fdt, int node,
                                            const char *prop_name,
-                                          u32 address_cells, u32 
size_cells,
+                                          uint32_t address_cells, 
uint32_t size_cells,
                                            void *data, enum 
membank_type type)
  {
      const struct fdt_property *prop;
      unsigned int i, banks;
      const __be32 *cell;
-    u32 reg_cells = address_cells + size_cells;
+    uint32_t reg_cells = address_cells + size_cells;
      paddr_t start, size;
      struct meminfo *mem = data;

-    if ( !device_tree_node_is_available(fdt, node) )
-        return 0;
-
      if ( address_cells < 1 || size_cells < 1 )
      {
          printk("fdt: property `%s': invalid #address-cells or 
#size-cells",
@@ -272,7 +228,7 @@
          return -ENOENT;

      cell = (const __be32 *)prop->data;
-    banks = fdt32_to_cpu(prop->len) / (reg_cells * sizeof (u32));
+    banks = fdt32_to_cpu(prop->len) / (reg_cells * sizeof(uint32_t));

      for ( i = 0; i < banks && mem->nr_banks < NR_MEM_BANKS; i++ )
      {
@@ -298,13 +254,13 @@
      return 0;
  }

-u32 __init device_tree_get_u32(const void *fdt, int node,
-                               const char *prop_name, u32 dflt)
+uint32_t __init device_tree_get_uint32_t(const void *fdt, int node,
+                               const char *prop_name, uint32_t dflt)
  {
      const struct fdt_property *prop;

      prop = fdt_get_property(fdt, node, prop_name, NULL);
-    if ( !prop || prop->len < sizeof(u32) )
+    if ( !prop || prop->len < sizeof(uint32_t) )
          return dflt;

      return fdt32_to_cpu(*(uint32_t*)prop->data);
@@ -332,13 +288,13 @@
       */
      int depth = 0;
      const int first_node = node;
-    u32 address_cells[DEVICE_TREE_MAX_DEPTH];
-    u32 size_cells[DEVICE_TREE_MAX_DEPTH];
+    uint32_t address_cells[DEVICE_TREE_MAX_DEPTH];
+    uint32_t size_cells[DEVICE_TREE_MAX_DEPTH];
      int ret;

      do {
          const char *name = fdt_get_name(fdt, node, NULL);
-        u32 as, ss;
+        uint32_t as, ss;

          if ( depth >= DEVICE_TREE_MAX_DEPTH )
          {
@@ -350,9 +306,9 @@
          as = depth > 0 ? address_cells[depth-1] : 
DT_ROOT_NODE_ADDR_CELLS_DEFAULT;
          ss = depth > 0 ? size_cells[depth-1] : 
DT_ROOT_NODE_SIZE_CELLS_DEFAULT;

-        address_cells[depth] = device_tree_get_u32(fdt, node,
+        address_cells[depth] = device_tree_get_uint32_t(fdt, node,
                                                     "#address-cells", as);
-        size_cells[depth] = device_tree_get_u32(fdt, node,
+        size_cells[depth] = device_tree_get_uint32_t(fdt, node,
                                                  "#size-cells", ss);

          /* skip the first node */
@@ -371,7 +327,7 @@

  static int __init process_memory_node(const void *fdt, int node,
                                        const char *name, int depth,
-                                      u32 address_cells, u32 size_cells,
+                                      uint32_t address_cells, uint32_t 
size_cells,
                                        void *data)
  {
      return device_tree_get_meminfo(fdt, node, "reg", address_cells, 
size_cells,
@@ -380,13 +336,16 @@

  static int __init process_reserved_memory_node(const void *fdt, int node,
                                                 const char *name, int 
depth,
-                                               u32 address_cells,
-                                               u32 size_cells,
+                                               uint32_t address_cells,
+                                               uint32_t size_cells,
                                                 void *data)
  {
-    int rc = process_memory_node(fdt, node, name, depth, address_cells,
+    int rc;
+
+    rc = process_memory_node(fdt, node, name, depth, address_cells,
                                   size_cells, data);

+
      if ( rc == -ENOSPC )
          panic("Max number of supported reserved-memory regions 
reached.\n");
      else if ( rc != -ENOENT )
@@ -396,124 +355,28 @@

  static int __init process_reserved_memory(const void *fdt, int node,
                                            const char *name, int depth,
-                                          u32 address_cells, u32 
size_cells)
+                                          uint32_t address_cells, 
uint32_t size_cells)
  {
      return device_tree_for_each_node(fdt, node,
                                       process_reserved_memory_node,
                                       &bootinfo.reserved_mem);
  }

-static void __init process_multiboot_node(const void *fdt, int node,
-                                          const char *name,
-                                          u32 address_cells, u32 
size_cells)
-{
-    static int __initdata kind_guess = 0;
-    const struct fdt_property *prop;
-    const __be32 *cell;
-    bootmodule_kind kind;
-    paddr_t start, size;
-    int len;
-    /* sizeof("/chosen/") + DT_MAX_NAME + '/' + DT_MAX_NAME + '/0' => 92 */
-    char path[92];
-    int parent_node, ret;
-    bool domU;
-
-    parent_node = fdt_parent_offset(fdt, node);
-    ASSERT(parent_node >= 0);
-
-    /* Check that the node is under "/chosen" (first 7 chars of path) */
-    ret = fdt_get_path(fdt, node, path, sizeof (path));
-    if ( ret != 0 || strncmp(path, "/chosen", 7) )
-        return;
-
-    prop = fdt_get_property(fdt, node, "reg", &len);
-    if ( !prop )
-        panic("node %s missing `reg' property\n", name);
-
-    if ( len < dt_cells_to_size(address_cells + size_cells) )
-        panic("fdt: node `%s': `reg` property length is too short\n",
-                    name);
-
-    cell = (const __be32 *)prop->data;
-    device_tree_get_reg(&cell, address_cells, size_cells, &start, &size);
-
-    if ( fdt_node_check_compatible(fdt, node, "xen,linux-zimage") == 0 ||
-         fdt_node_check_compatible(fdt, node, "multiboot,kernel") == 0 )
-        kind = BOOTMOD_KERNEL;
-    else if ( fdt_node_check_compatible(fdt, node, "xen,linux-initrd") 
== 0 ||
-              fdt_node_check_compatible(fdt, node, "multiboot,ramdisk") 
== 0 )
-        kind = BOOTMOD_RAMDISK;
-    else if ( fdt_node_check_compatible(fdt, node, "xen,xsm-policy") == 0 )
-        kind = BOOTMOD_XSM;
-    else if ( fdt_node_check_compatible(fdt, node, 
"multiboot,device-tree") == 0 )
-        kind = BOOTMOD_GUEST_DTB;
-    else
-        kind = BOOTMOD_UNKNOWN;
-
-    /**
-     * Guess the kind of these first two unknowns respectively:
-     * (1) The first unknown must be kernel.
-     * (2) Detect the XSM Magic from the 2nd unknown:
-     *     a. If it's XSM, set the kind as XSM, and that also means we
-     *     won't load ramdisk;
-     *     b. if it's not XSM, set the kind as ramdisk.
-     *     So if user want to load ramdisk, it must be the 2nd unknown.
-     * We also detect the XSM Magic for the following unknowns,
-     * then set its kind according to the return value of has_xsm_magic.
-     */
-    if ( kind == BOOTMOD_UNKNOWN )
-    {
-        switch ( kind_guess++ )
-        {
-        case 0: kind = BOOTMOD_KERNEL; break;
-        case 1: kind = BOOTMOD_RAMDISK; break;
-        default: break;
-        }
-        if ( kind_guess > 1 && has_xsm_magic(start) )
-            kind = BOOTMOD_XSM;
-    }
-
-    domU = fdt_node_check_compatible(fdt, parent_node, "xen,domain") == 0;
-    add_boot_module(kind, start, size, domU);
-
-    prop = fdt_get_property(fdt, node, "bootargs", &len);
-    if ( !prop )
-        return;
-    add_boot_cmdline(fdt_get_name(fdt, parent_node, &len), prop->data,
-                     kind, start, domU);
-}
-
  static int __init process_chosen_node(const void *fdt, int node,
                                        const char *name,
-                                      u32 address_cells, u32 size_cells)
+                                      uint32_t address_cells, uint32_t 
size_cells)
  {
      const struct fdt_property *prop;
      paddr_t start, end;
      int len;

-    if ( fdt_get_property(fdt, node, "xen,static-heap", NULL) )
-    {
-        int rc;
-
-        printk("Checking for static heap in /chosen\n");
-
-        rc = device_tree_get_meminfo(fdt, node, "xen,static-heap",
-                                     address_cells, size_cells,
-                                     &bootinfo.reserved_mem,
-                                     MEMBANK_STATIC_HEAP);
-        if ( rc )
-            return rc;
-
-        bootinfo.static_heap = true;
-    }
-
      printk("Checking for initrd in /chosen\n");

      prop = fdt_get_property(fdt, node, "linux,initrd-start", &len);
      if ( !prop )
          /* No initrd present. */
          return 0;
-    if ( len != sizeof(u32) && len != sizeof(u64) )
+    if ( len != sizeof(uint32_t) && len != sizeof(uint64_t) )
      {
          printk("linux,initrd-start property has invalid length %d\n", 
len);
          return -EINVAL;
@@ -526,7 +389,7 @@
          printk("linux,initrd-end not present but -start was\n");
          return -EINVAL;
      }
-    if ( len != sizeof(u32) && len != sizeof(u64) )
+    if ( len != sizeof(uint32_t) && len != sizeof(uint64_t) )
      {
          printk("linux,initrd-end property has invalid length %d\n", len);
          return -EINVAL;
@@ -547,51 +410,21 @@
      return 0;
  }

-static int __init process_domain_node(const void *fdt, int node,
-                                      const char *name,
-                                      u32 address_cells, u32 size_cells)
-{
-    const struct fdt_property *prop;
-
-    printk("Checking for \"xen,static-mem\" in domain node\n");
-
-    prop = fdt_get_property(fdt, node, "xen,static-mem", NULL);
-    if ( !prop )
-        /* No "xen,static-mem" present. */
-        return 0;
-
-    return device_tree_get_meminfo(fdt, node, "xen,static-mem", 
address_cells,
-                                   size_cells, &bootinfo.reserved_mem,
-                                   MEMBANK_STATIC_DOMAIN);
-}
-
  static int __init early_scan_node(const void *fdt,
                                    int node, const char *name, int depth,
-                                  u32 address_cells, u32 size_cells,
+                                  uint32_t address_cells, uint32_t 
size_cells,
                                    void *data)
  {
      int rc = 0;

-    /*
-     * If Xen has been booted via UEFI, the memory banks are
-     * populated. So we should skip the parsing.
-     */
-    if ( !efi_enabled(EFI_BOOT) &&
-         device_tree_node_matches(fdt, node, "memory") )
+    if( device_tree_node_matches(fdt, node, "memory") )
          rc = process_memory_node(fdt, node, name, depth,
                                   address_cells, size_cells, 
&bootinfo.mem);
      else if ( depth == 1 && !dt_node_cmp(name, "reserved-memory") )
          rc = process_reserved_memory(fdt, node, name, depth,
                                       address_cells, size_cells);
-    else if ( depth <= 3 && (device_tree_node_compatible(fdt, node, 
"xen,multiboot-module" ) ||
-              device_tree_node_compatible(fdt, node, "multiboot,module" )))
-        process_multiboot_node(fdt, node, name, address_cells, size_cells);
      else if ( depth == 1 && device_tree_node_matches(fdt, node, 
"chosen") )
          rc = process_chosen_node(fdt, node, name, address_cells, 
size_cells);
-    else if ( depth == 2 && device_tree_node_compatible(fdt, node, 
"xen,domain") )
-        rc = process_domain_node(fdt, node, name, address_cells, 
size_cells);
-    else if ( depth <= 3 && device_tree_node_compatible(fdt, node, 
"xen,domain-shared-memory-v1") )
-        rc = process_shm_node(fdt, node, address_cells, size_cells);

      if ( rc < 0 )
          printk("fdt: node `%s': parsing failed\n", name);
@@ -604,7 +437,7 @@
      struct meminfo *mem_resv = &bootinfo.reserved_mem;
      struct bootmodules *mods = &bootinfo.modules;
      struct bootcmdlines *cmds = &bootinfo.cmdlines;
-    unsigned int i, j, nr_rsvd;
+    unsigned int i, j;

      for ( i = 0; i < mi->nr_banks; i++ )
          printk("RAM: %"PRIpaddr" - %"PRIpaddr"\n",
@@ -618,21 +451,9 @@
                  mods->module[i].start + mods->module[i].size,
                  boot_module_kind_as_string(mods->module[i].kind));

-    nr_rsvd = fdt_num_mem_rsv(device_tree_flattened);
-    for ( i = 0; i < nr_rsvd; i++ )
-    {
-        paddr_t s, e;
-
-        if ( fdt_get_mem_rsv_paddr(device_tree_flattened, i, &s, &e) < 0 )
-            continue;
-
-        /* fdt_get_mem_rsv_paddr returns length */
-        e += s;
-        printk(" RESVD[%u]: %"PRIpaddr" - %"PRIpaddr"\n", i, s, e);
-    }
      for ( j = 0; j < mem_resv->nr_banks; j++, i++ )
      {
-        printk(" RESVD[%u]: %"PRIpaddr" - %"PRIpaddr"\n", i,
+        printk(" RESVD_[%u]: %"PRIpaddr" - %"PRIpaddr"\n", i,
                 mem_resv->bank[j].start,
                 mem_resv->bank[j].start + mem_resv->bank[j].size - 1);
      }
@@ -644,89 +465,43 @@
      printk("\n");
  }

-/* This function assumes that memory regions are not overlapped */
-static int __init cmp_memory_node(const void *key, const void *elem)
-{
-    const struct membank *handler0 = key;
-    const struct membank *handler1 = elem;
-
-    if ( handler0->start < handler1->start )
-        return -1;
-
-    if ( handler0->start >= (handler1->start + handler1->size) )
-        return 1;
-
-    return 0;
-}
-
-static void __init swap_memory_node(void *_a, void *_b, size_t size)
-{
-    struct membank *a = _a, *b = _b;
-
-    SWAP(*a, *b);
-}
-
  /**
- * boot_fdt_info - initialize bootinfo from a DTB
+ * boot_fdt_init - initialize bootinfo from a DTB
   * @fdt: flattened device tree binary
+ * @paddr: physical address of device tree binary
   *
   * Returns the size of the DTB.
   */
-size_t __init boot_fdt_info(const void *fdt, paddr_t paddr)
+size_t __init boot_fdt_init(const void *fdt, paddr_t paddr)
  {
      int ret;
+    paddr_t xen_start, xen_end;

      ret = fdt_check_header(fdt);
      if ( ret < 0 )
          panic("No valid device tree\n");

-    add_boot_module(BOOTMOD_FDT, paddr, fdt_totalsize(fdt), false);
-
      device_tree_for_each_node((void *)fdt, 0, early_scan_node, NULL);

      /*
-     * On Arm64 setup_directmap_mappings() expects to be called with 
the lowest
-     * bank in memory first. There is no requirement that the DT will 
provide
-     * the banks sorted in ascending order. So sort them through.
+     * The device tree passed to us may have been allocated by skiboot, 
in which
+     * case it will exist within a reserved region and this call will 
fail. This
+     * is fine, however, since either way the allocator will know not 
to step on
+     * the device tree.
       */
-    sort(bootinfo.mem.bank, bootinfo.mem.nr_banks, sizeof(struct membank),
-         cmp_memory_node, swap_memory_node);
+    add_boot_module(BOOTMOD_FDT, paddr, fdt_totalsize(fdt), false);
+
+    /*
+     * Xen relocates itself at the ppc64 entrypoint, so we need to 
manually mark
+     * the kernel module.
+     */
+    xen_start = __pa(_start);
+    xen_end = PAGE_ALIGN(__pa(_end));
+    if ( !add_boot_module(BOOTMOD_KERNEL, xen_start, xen_end, false) )
+        panic("Xen overlaps reserved memory! %016lx - %016lx\n", xen_start,
+              xen_end);

      early_print_info();

      return fdt_totalsize(fdt);
  }
-
-const __init char *boot_fdt_cmdline(const void *fdt)
-{
-    int node;
-    const struct fdt_property *prop;
-
-    node = fdt_path_offset(fdt, "/chosen");
-    if ( node < 0 )
-        return NULL;
-
-    prop = fdt_get_property(fdt, node, "xen,xen-bootargs", NULL);
-    if ( prop == NULL )
-    {
-        struct bootcmdline *dom0_cmdline =
-            boot_cmdline_find_by_kind(BOOTMOD_KERNEL);
-
-        if (fdt_get_property(fdt, node, "xen,dom0-bootargs", NULL) ||
-            ( dom0_cmdline && dom0_cmdline->cmdline[0] ) )
-            prop = fdt_get_property(fdt, node, "bootargs", NULL);
-    }
-    if ( prop == NULL )
-        return NULL;
-
-    return prop->data;
-}
-
-/*
- * Local variables:
- * mode: C
- * c-file-style: "BSD"
- * c-basic-offset: 4
- * indent-tabs-mode: nil
- * End:
- */
Oleksii Kurochko Dec. 4, 2023, 9:47 a.m. UTC | #6
Hi Julien and Shawn,

On Fri, 2023-12-01 at 20:56 +0000, Julien Grall wrote:
> (+ Arm and RISC-V folks)
> 
> Hi Shawn,
> 
> On 01/12/2023 20:59, Shawn Anastasio wrote:
> > Adapt arm's earlyfdt parsing code to ppc64 and enable Xen's early
> > boot
> > allocator. Routines for parsing arm-specific devicetree nodes (e.g.
> > multiboot) were excluded, reducing the overall footprint of code
> > that
> > was copied.
> 
> I expect RISC-V to want similar code. I am not really thrilled in the
> idea of having 3 similar copy of the parsing. So can we extract the 
> common bits (or harmonize it) so it can be shared?
> 
> Maybe Oleksii has already a version doing that.
On RISC-V side I introduced the common bootfdt but probably it should
be aligned with the latest version ( my plan was to do that before
sending a patch to upstream ) of ARM's bootfdt.c.

https://gitlab.com/xen-project/people/olkur/xen/-/commit/ad2c8766a7d0886df84f6c60823275816c2115f5

I can re-work it this week and send it to the ML but I'll be happy if
Shawn can do it or his version as common so I'll be able to reuse it
soon.

~ Oleksii
Daniel P. Smith Dec. 7, 2023, 2:01 a.m. UTC | #7
On 12/1/23 15:56, Julien Grall wrote:
> (+ Arm and RISC-V folks)
> 
> Hi Shawn,
> 
> On 01/12/2023 20:59, Shawn Anastasio wrote:
>> Adapt arm's earlyfdt parsing code to ppc64 and enable Xen's early boot
>> allocator. Routines for parsing arm-specific devicetree nodes (e.g.
>> multiboot) were excluded, reducing the overall footprint of code that
>> was copied.
> 
> I expect RISC-V to want similar code. I am not really thrilled in the 
> idea of having 3 similar copy of the parsing. So can we extract the 
> common bits (or harmonize it) so it can be shared?

That is actually 4, Hyperlaunch already did a common version for our usage.

> Maybe Oleksii has already a version doing that.
> 
> Cheers,
> 



V/r,
Daniel P. Smith
Apertus Solutions, LLC
Oleksii Kurochko Dec. 7, 2023, 9:07 a.m. UTC | #8
On Wed, 2023-12-06 at 21:01 -0500, Daniel P. Smith wrote:
> On 12/1/23 15:56, Julien Grall wrote:
> > (+ Arm and RISC-V folks)
> > 
> > Hi Shawn,
> > 
> > On 01/12/2023 20:59, Shawn Anastasio wrote:
> > > Adapt arm's earlyfdt parsing code to ppc64 and enable Xen's early
> > > boot
> > > allocator. Routines for parsing arm-specific devicetree nodes
> > > (e.g.
> > > multiboot) were excluded, reducing the overall footprint of code
> > > that
> > > was copied.
> > 
> > I expect RISC-V to want similar code. I am not really thrilled in
> > the 
> > idea of having 3 similar copy of the parsing. So can we extract the
> > common bits (or harmonize it) so it can be shared?
> 
> That is actually 4, Hyperlaunch already did a common version for our
> usage.
Sounds great. I'll look at and switch RISC-V to it. Thanks.
> 
> > Maybe Oleksii has already a version doing that.
> > 
> > Cheers,
> > 
> 
> 
> 
> V/r,
> Daniel P. Smith
> Apertus Solutions, LLC
>
Shawn Anastasio Dec. 15, 2023, 2:50 a.m. UTC | #9
Hi Julien,

Thank you for the feedback. Most of your points will be addressed by
following your suggestion of moving ARM's bootfdt.c to common, but I
wanted to respond with a few points of clarification.

On 12/1/23 5:23 PM, Julien Grall wrote:
> * fdt_get_mem_rsv_paddr(), this is part of the DT is used to reserve
> memory. This was superseed to /reserved-memory, but I wonder how
> widespread this is on PPC?

As far as I can tell, the DT reserve memory map is not used on
PPC/PowerNV. This information is instead communicated through
`reserved-memory` nodes in the DT itself, which the existing code
handles.

> * If am not mistaken you are adding the Xen module as BOOTMOD_KERNEL,
> however this is meant to be used for the domain kernel. Xen should be
> BOOTMOD_XEN.

Thank you for pointing this out -- BOOTMOD_XEN is indeed the correct
choice here.

> I hope that helps.
> 
> Cheers,
>

Thanks,
Shawn
Julien Grall Dec. 15, 2023, 6:52 p.m. UTC | #10
Hi Shawn,

On 15/12/2023 02:50, Shawn Anastasio wrote:
> Hi Julien,
> 
> Thank you for the feedback. Most of your points will be addressed by
> following your suggestion of moving ARM's bootfdt.c to common, but I
> wanted to respond with a few points of clarification.
> 
> On 12/1/23 5:23 PM, Julien Grall wrote:
>> * fdt_get_mem_rsv_paddr(), this is part of the DT is used to reserve
>> memory. This was superseed to /reserved-memory, but I wonder how
>> widespread this is on PPC?
> 
> As far as I can tell, the DT reserve memory map is not used on
> PPC/PowerNV. This information is instead communicated through
> `reserved-memory` nodes in the DT itself, which the existing code
> handles.

On Arm, there is a mix between using /memreserve (found by Xen via 
fdt_get_mem_rsv_paddr() and reserved-memory.

Looking at Linux, it seems that it will try to parse /memreserve in PPC 
and I have found a few DT using it:

42sh> ack memreserve
boot/dts/currituck.dts
13:/memreserve/ 0x01f00000 0x00100000;	// spin table

boot/dts/akebono.dts
14:/memreserve/ 0x01f00000 0x00100000;	// spin table

boot/dts/iss4xx-mpic.dts
17:/memreserve/ 0x01f00000 0x00100000;

boot/dts/mpc836x_mds.dts
10:/memreserve/	00000000 1000000;

boot/dts/wii.dts
17: * contiguous RAM range and will BUG() if the memreserve is outside
20:/*/memreserve/ 0x10000000 0x0004000;*/	/* DSP RAM */

I guess we are not targetting any of those hardwares. Even if this is 
the case, I don't think we can simply ignore the node. If you don't want 
to handle them, then I would at least suggest to add a check in Xen to 
confirm the /memreserve/ is empty.

Better to catch such DT early rather than debugging a memory corruption 
later on :).

Cheers,
diff mbox series

Patch

diff --git a/xen/arch/ppc/Makefile b/xen/arch/ppc/Makefile
index 71feb5e2c4..8a2a809c70 100644
--- a/xen/arch/ppc/Makefile
+++ b/xen/arch/ppc/Makefile
@@ -1,5 +1,6 @@ 
 obj-$(CONFIG_PPC64) += ppc64/
 
+obj-y += bootfdt.o
 obj-$(CONFIG_EARLY_PRINTK) += early_printk.init.o
 obj-y += mm-radix.o
 obj-y += opal.o
diff --git a/xen/arch/ppc/bootfdt.c b/xen/arch/ppc/bootfdt.c
new file mode 100644
index 0000000000..791e1ca61f
--- /dev/null
+++ b/xen/arch/ppc/bootfdt.c
@@ -0,0 +1,507 @@ 
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Early Device Tree and boot info bookkeeping.
+ * Derived from arch/arm/bootfdt.c and setup.c.
+ *
+ * Copyright (C) 2012-2014 Citrix Systems, Inc.
+ * Copyright (C) Raptor Engineering LLC
+ */
+
+#include <xen/types.h>
+#include <xen/lib.h>
+#include <xen/kernel.h>
+#include <xen/init.h>
+#include <xen/efi.h>
+#include <xen/device_tree.h>
+#include <xen/lib.h>
+#include <xen/libfdt/libfdt-xen.h>
+#include <xen/sort.h>
+#include <xsm/xsm.h>
+#include <asm/setup.h>
+
+struct bootinfo __initdata bootinfo;
+
+struct bootmodule __init *add_boot_module(bootmodule_kind kind,
+                                          paddr_t start, paddr_t size,
+                                          bool domU)
+{
+    struct bootmodules *mods = &bootinfo.modules;
+    struct bootmodule *mod;
+    unsigned int i;
+
+    if ( mods->nr_mods == MAX_MODULES )
+    {
+        printk("Ignoring %s boot module at %"PRIpaddr"-%"PRIpaddr" (too many)\n",
+               boot_module_kind_as_string(kind), start, start + size);
+        return NULL;
+    }
+
+    if ( check_reserved_regions_overlap(start, size) )
+        return NULL;
+
+    for ( i = 0 ; i < mods->nr_mods ; i++ )
+    {
+        mod = &mods->module[i];
+        if ( mod->kind == kind && mod->start == start )
+        {
+            if ( !domU )
+                mod->domU = false;
+            return mod;
+        }
+    }
+
+    mod = &mods->module[mods->nr_mods++];
+    mod->kind = kind;
+    mod->start = start;
+    mod->size = size;
+    mod->domU = domU;
+
+    return mod;
+}
+
+const char * __init boot_module_kind_as_string(bootmodule_kind kind)
+{
+    switch ( kind )
+    {
+    case BOOTMOD_XEN:     return "Xen";
+    case BOOTMOD_FDT:     return "Device Tree";
+    case BOOTMOD_KERNEL:  return "Kernel";
+    default: BUG();
+    }
+}
+
+/*
+ * TODO: '*_end' could be 0 if the module/region is at the end of the physical
+ * address space. This is for now not handled as it requires more rework.
+ */
+static bool __init bootmodules_overlap_check(struct bootmodules *bootmodules,
+                                             paddr_t region_start,
+                                             paddr_t region_size)
+{
+    paddr_t mod_start = INVALID_PADDR, mod_end = 0;
+    paddr_t region_end = region_start + region_size;
+    unsigned int i, mod_num = bootmodules->nr_mods;
+
+    for ( i = 0; i < mod_num; i++ )
+    {
+        mod_start = bootmodules->module[i].start;
+        mod_end = mod_start + bootmodules->module[i].size;
+
+        if ( region_end <= mod_start || region_start >= mod_end )
+            continue;
+        else
+        {
+            printk("Region: [%#"PRIpaddr", %#"PRIpaddr") overlapping with"
+                   " mod[%u]: [%#"PRIpaddr", %#"PRIpaddr")\n", region_start,
+                   region_end, i, mod_start, mod_end);
+            return true;
+        }
+    }
+
+    return false;
+}
+
+/*
+ * TODO: '*_end' could be 0 if the bank/region is at the end of the physical
+ * address space. This is for now not handled as it requires more rework.
+ */
+static bool __init meminfo_overlap_check(struct meminfo *meminfo,
+                                         paddr_t region_start,
+                                         paddr_t region_size)
+{
+    paddr_t bank_start = INVALID_PADDR, bank_end = 0;
+    paddr_t region_end = region_start + region_size;
+    unsigned int i, bank_num = meminfo->nr_banks;
+
+    for ( i = 0; i < bank_num; i++ )
+    {
+        bank_start = meminfo->bank[i].start;
+        bank_end = bank_start + meminfo->bank[i].size;
+
+        if ( region_end <= bank_start || region_start >= bank_end )
+            continue;
+        else
+        {
+            printk("Region: [%#"PRIpaddr", %#"PRIpaddr") overlapping with"
+                   " bank[%u]: [%#"PRIpaddr", %#"PRIpaddr")\n", region_start,
+                   region_end, i, bank_start, bank_end);
+            return true;
+        }
+    }
+
+    return false;
+}
+
+/*
+ * Given an input physical address range, check if this range is overlapping
+ * with the existing reserved memory regions defined in bootinfo.
+ * Return true if the input physical address range is overlapping with any
+ * existing reserved memory regions, otherwise false.
+ */
+bool __init check_reserved_regions_overlap(paddr_t region_start,
+                                           paddr_t region_size)
+{
+    /* Check if input region is overlapping with bootinfo.reserved_mem banks */
+    if ( meminfo_overlap_check(&bootinfo.reserved_mem,
+                               region_start, region_size) )
+        return true;
+
+    /* Check if input region is overlapping with bootmodules */
+    if ( bootmodules_overlap_check(&bootinfo.modules,
+                                   region_start, region_size) )
+        return true;
+
+    return false;
+}
+
+static bool __init device_tree_node_matches(const void *fdt, int node,
+                                            const char *match)
+{
+    const char *name;
+    size_t match_len;
+
+    name = fdt_get_name(fdt, node, NULL);
+    match_len = strlen(match);
+
+    /*
+     * Match both "match" and "match@..." patterns but not
+     * "match-foo".
+     */
+    return strncmp(name, match, match_len) == 0
+        && (name[match_len] == '@' || name[match_len] == '\0');
+}
+
+void __init device_tree_get_reg(const __be32 **cell, uint32_t address_cells,
+                                uint32_t size_cells, paddr_t *start,
+                                paddr_t *size)
+{
+    uint64_t dt_start, dt_size;
+
+    /*
+     * dt_next_cell will return uint64_t whereas paddr_t may not be 64-bit.
+     * Thus, there is an implicit cast from uint64_t to paddr_t.
+     */
+    dt_start = dt_next_cell(address_cells, cell);
+    dt_size = dt_next_cell(size_cells, cell);
+
+    if ( dt_start != (paddr_t)dt_start )
+    {
+        printk("Physical address greater than max width supported\n");
+        WARN();
+    }
+
+    if ( dt_size != (paddr_t)dt_size )
+    {
+        printk("Physical size greater than max width supported\n");
+        WARN();
+    }
+
+    /*
+     * Xen will truncate the address/size if it is greater than the maximum
+     * supported width and it will give an appropriate warning.
+     */
+    *start = dt_start;
+    *size = dt_size;
+}
+
+static int __init device_tree_get_meminfo(const void *fdt, int node,
+                                          const char *prop_name,
+                                          uint32_t address_cells, uint32_t size_cells,
+                                          void *data, enum membank_type type)
+{
+    const struct fdt_property *prop;
+    unsigned int i, banks;
+    const __be32 *cell;
+    uint32_t reg_cells = address_cells + size_cells;
+    paddr_t start, size;
+    struct meminfo *mem = data;
+
+    if ( address_cells < 1 || size_cells < 1 )
+    {
+        printk("fdt: property `%s': invalid #address-cells or #size-cells",
+               prop_name);
+        return -EINVAL;
+    }
+
+    prop = fdt_get_property(fdt, node, prop_name, NULL);
+    if ( !prop )
+        return -ENOENT;
+
+    cell = (const __be32 *)prop->data;
+    banks = fdt32_to_cpu(prop->len) / (reg_cells * sizeof(uint32_t));
+
+    for ( i = 0; i < banks && mem->nr_banks < NR_MEM_BANKS; i++ )
+    {
+        device_tree_get_reg(&cell, address_cells, size_cells, &start, &size);
+        if ( mem == &bootinfo.reserved_mem &&
+             check_reserved_regions_overlap(start, size) )
+            return -EINVAL;
+        /* Some DT may describe empty bank, ignore them */
+        if ( !size )
+            continue;
+        mem->bank[mem->nr_banks].start = start;
+        mem->bank[mem->nr_banks].size = size;
+        mem->bank[mem->nr_banks].type = type;
+        mem->nr_banks++;
+    }
+
+    if ( i < banks )
+    {
+        printk("Warning: Max number of supported memory regions reached.\n");
+        return -ENOSPC;
+    }
+
+    return 0;
+}
+
+uint32_t __init device_tree_get_uint32_t(const void *fdt, int node,
+                               const char *prop_name, uint32_t dflt)
+{
+    const struct fdt_property *prop;
+
+    prop = fdt_get_property(fdt, node, prop_name, NULL);
+    if ( !prop || prop->len < sizeof(uint32_t) )
+        return dflt;
+
+    return fdt32_to_cpu(*(uint32_t*)prop->data);
+}
+
+/**
+ * device_tree_for_each_node - iterate over all device tree sub-nodes
+ * @fdt: flat device tree.
+ * @node: parent node to start the search from
+ * @func: function to call for each sub-node.
+ * @data: data to pass to @func.
+ *
+ * Any nodes nested at DEVICE_TREE_MAX_DEPTH or deeper are ignored.
+ *
+ * Returns 0 if all nodes were iterated over successfully.  If @func
+ * returns a value different from 0, that value is returned immediately.
+ */
+int __init device_tree_for_each_node(const void *fdt, int node,
+                                     device_tree_node_func func,
+                                     void *data)
+{
+    /*
+     * We only care about relative depth increments, assume depth of
+     * node is 0 for simplicity.
+     */
+    int depth = 0;
+    const int first_node = node;
+    uint32_t address_cells[DEVICE_TREE_MAX_DEPTH];
+    uint32_t size_cells[DEVICE_TREE_MAX_DEPTH];
+    int ret;
+
+    do {
+        const char *name = fdt_get_name(fdt, node, NULL);
+        uint32_t as, ss;
+
+        if ( depth >= DEVICE_TREE_MAX_DEPTH )
+        {
+            printk("Warning: device tree node `%s' is nested too deep\n",
+                   name);
+            continue;
+        }
+
+        as = depth > 0 ? address_cells[depth-1] : DT_ROOT_NODE_ADDR_CELLS_DEFAULT;
+        ss = depth > 0 ? size_cells[depth-1] : DT_ROOT_NODE_SIZE_CELLS_DEFAULT;
+
+        address_cells[depth] = device_tree_get_uint32_t(fdt, node,
+                                                   "#address-cells", as);
+        size_cells[depth] = device_tree_get_uint32_t(fdt, node,
+                                                "#size-cells", ss);
+
+        /* skip the first node */
+        if ( node != first_node )
+        {
+            ret = func(fdt, node, name, depth, as, ss, data);
+            if ( ret != 0 )
+                return ret;
+        }
+
+        node = fdt_next_node(fdt, node, &depth);
+    } while ( node >= 0 && depth > 0 );
+
+    return 0;
+}
+
+static int __init process_memory_node(const void *fdt, int node,
+                                      const char *name, int depth,
+                                      uint32_t address_cells, uint32_t size_cells,
+                                      void *data)
+{
+    return device_tree_get_meminfo(fdt, node, "reg", address_cells, size_cells,
+                                   data, MEMBANK_DEFAULT);
+}
+
+static int __init process_reserved_memory_node(const void *fdt, int node,
+                                               const char *name, int depth,
+                                               uint32_t address_cells,
+                                               uint32_t size_cells,
+                                               void *data)
+{
+    int rc;
+
+    rc = process_memory_node(fdt, node, name, depth, address_cells,
+                                 size_cells, data);
+
+
+    if ( rc == -ENOSPC )
+        panic("Max number of supported reserved-memory regions reached.\n");
+    else if ( rc != -ENOENT )
+        return rc;
+    return 0;
+}
+
+static int __init process_reserved_memory(const void *fdt, int node,
+                                          const char *name, int depth,
+                                          uint32_t address_cells, uint32_t size_cells)
+{
+    return device_tree_for_each_node(fdt, node,
+                                     process_reserved_memory_node,
+                                     &bootinfo.reserved_mem);
+}
+
+static int __init process_chosen_node(const void *fdt, int node,
+                                      const char *name,
+                                      uint32_t address_cells, uint32_t size_cells)
+{
+    const struct fdt_property *prop;
+    paddr_t start, end;
+    int len;
+
+    printk("Checking for initrd in /chosen\n");
+
+    prop = fdt_get_property(fdt, node, "linux,initrd-start", &len);
+    if ( !prop )
+        /* No initrd present. */
+        return 0;
+    if ( len != sizeof(uint32_t) && len != sizeof(uint64_t) )
+    {
+        printk("linux,initrd-start property has invalid length %d\n", len);
+        return -EINVAL;
+    }
+    start = dt_read_paddr((void *)&prop->data, dt_size_to_cells(len));
+
+    prop = fdt_get_property(fdt, node, "linux,initrd-end", &len);
+    if ( !prop )
+    {
+        printk("linux,initrd-end not present but -start was\n");
+        return -EINVAL;
+    }
+    if ( len != sizeof(uint32_t) && len != sizeof(uint64_t) )
+    {
+        printk("linux,initrd-end property has invalid length %d\n", len);
+        return -EINVAL;
+    }
+    end = dt_read_paddr((void *)&prop->data, dt_size_to_cells(len));
+
+    if ( start >= end )
+    {
+        printk("linux,initrd limits invalid: %"PRIpaddr" >= %"PRIpaddr"\n",
+                  start, end);
+        return -EINVAL;
+    }
+
+    printk("Initrd %"PRIpaddr"-%"PRIpaddr"\n", start, end);
+
+    add_boot_module(BOOTMOD_RAMDISK, start, end-start, false);
+
+    return 0;
+}
+
+static int __init early_scan_node(const void *fdt,
+                                  int node, const char *name, int depth,
+                                  uint32_t address_cells, uint32_t size_cells,
+                                  void *data)
+{
+    int rc = 0;
+
+    if( device_tree_node_matches(fdt, node, "memory") )
+        rc = process_memory_node(fdt, node, name, depth,
+                                 address_cells, size_cells, &bootinfo.mem);
+    else if ( depth == 1 && !dt_node_cmp(name, "reserved-memory") )
+        rc = process_reserved_memory(fdt, node, name, depth,
+                                     address_cells, size_cells);
+    else if ( depth == 1 && device_tree_node_matches(fdt, node, "chosen") )
+        rc = process_chosen_node(fdt, node, name, address_cells, size_cells);
+
+    if ( rc < 0 )
+        printk("fdt: node `%s': parsing failed\n", name);
+    return rc;
+}
+
+static void __init early_print_info(void)
+{
+    struct meminfo *mi = &bootinfo.mem;
+    struct meminfo *mem_resv = &bootinfo.reserved_mem;
+    struct bootmodules *mods = &bootinfo.modules;
+    struct bootcmdlines *cmds = &bootinfo.cmdlines;
+    unsigned int i, j;
+
+    for ( i = 0; i < mi->nr_banks; i++ )
+        printk("RAM: %"PRIpaddr" - %"PRIpaddr"\n",
+                mi->bank[i].start,
+                mi->bank[i].start + mi->bank[i].size - 1);
+    printk("\n");
+    for ( i = 0 ; i < mods->nr_mods; i++ )
+        printk("MODULE[%d]: %"PRIpaddr" - %"PRIpaddr" %-12s\n",
+                i,
+                mods->module[i].start,
+                mods->module[i].start + mods->module[i].size,
+                boot_module_kind_as_string(mods->module[i].kind));
+
+    for ( j = 0; j < mem_resv->nr_banks; j++, i++ )
+    {
+        printk(" RESVD_[%u]: %"PRIpaddr" - %"PRIpaddr"\n", i,
+               mem_resv->bank[j].start,
+               mem_resv->bank[j].start + mem_resv->bank[j].size - 1);
+    }
+    printk("\n");
+    for ( i = 0 ; i < cmds->nr_mods; i++ )
+        printk("CMDLINE[%"PRIpaddr"]:%s %s\n", cmds->cmdline[i].start,
+               cmds->cmdline[i].dt_name,
+               &cmds->cmdline[i].cmdline[0]);
+    printk("\n");
+}
+
+/**
+ * boot_fdt_init - initialize bootinfo from a DTB
+ * @fdt: flattened device tree binary
+ * @paddr: physical address of device tree binary
+ *
+ * Returns the size of the DTB.
+ */
+size_t __init boot_fdt_init(const void *fdt, paddr_t paddr)
+{
+    int ret;
+    paddr_t xen_start, xen_end;
+
+    ret = fdt_check_header(fdt);
+    if ( ret < 0 )
+        panic("No valid device tree\n");
+
+    device_tree_for_each_node((void *)fdt, 0, early_scan_node, NULL);
+
+    /*
+     * The device tree passed to us may have been allocated by skiboot, in which
+     * case it will exist within a reserved region and this call will fail. This
+     * is fine, however, since either way the allocator will know not to step on
+     * the device tree.
+     */
+    add_boot_module(BOOTMOD_FDT, paddr, fdt_totalsize(fdt), false);
+
+    /*
+     * Xen relocates itself at the ppc64 entrypoint, so we need to manually mark
+     * the kernel module.
+     */
+    xen_start = __pa(_start);
+    xen_end = PAGE_ALIGN(__pa(_end));
+    if ( !add_boot_module(BOOTMOD_KERNEL, xen_start, xen_end, false) )
+        panic("Xen overlaps reserved memory! %016lx - %016lx\n", xen_start,
+              xen_end);
+
+    early_print_info();
+
+    return fdt_totalsize(fdt);
+}
diff --git a/xen/arch/ppc/include/asm/setup.h b/xen/arch/ppc/include/asm/setup.h
index e4f64879b6..f6e1940fa9 100644
--- a/xen/arch/ppc/include/asm/setup.h
+++ b/xen/arch/ppc/include/asm/setup.h
@@ -3,4 +3,117 @@ 
 
 #define max_init_domid (0)
 
+#include <public/version.h>
+#include <asm/p2m.h>
+#include <xen/device_tree.h>
+
+#define MIN_FDT_ALIGN 8
+#define MAX_FDT_SIZE SZ_2M
+
+#define NR_MEM_BANKS 256
+
+#define MAX_MODULES 32 /* Current maximum useful modules */
+
+typedef enum {
+    BOOTMOD_XEN,
+    BOOTMOD_FDT,
+    BOOTMOD_KERNEL,
+    BOOTMOD_RAMDISK,
+}  bootmodule_kind;
+
+enum membank_type {
+    /*
+     * The MEMBANK_DEFAULT type refers to either reserved memory for the
+     * device/firmware (when the bank is in 'reserved_mem') or any RAM (when
+     * the bank is in 'mem').
+     */
+    MEMBANK_DEFAULT,
+    /*
+     * The MEMBANK_STATIC_DOMAIN type is used to indicate whether the memory
+     * bank is bound to a static Xen domain. It is only valid when the bank
+     * is in reserved_mem.
+     */
+    MEMBANK_STATIC_DOMAIN,
+    /*
+     * The MEMBANK_STATIC_HEAP type is used to indicate whether the memory
+     * bank is reserved as static heap. It is only valid when the bank is
+     * in reserved_mem.
+     */
+    MEMBANK_STATIC_HEAP,
+};
+
+/* Indicates the maximum number of characters(\0 included) for shm_id */
+#define MAX_SHM_ID_LENGTH 16
+
+struct membank {
+    paddr_t start;
+    paddr_t size;
+    enum membank_type type;
+};
+
+struct meminfo {
+    unsigned int nr_banks;
+    struct membank bank[NR_MEM_BANKS];
+};
+
+/*
+ * The domU flag is set for kernels and ramdisks of "xen,domain" nodes.
+ * The purpose of the domU flag is to avoid getting confused in
+ * kernel_probe, where we try to guess which is the dom0 kernel and
+ * initrd to be compatible with all versions of the multiboot spec.
+ */
+#define BOOTMOD_MAX_CMDLINE 1024
+struct bootmodule {
+    bootmodule_kind kind;
+    bool domU;
+    paddr_t start;
+    paddr_t size;
+};
+
+/* DT_MAX_NAME is the node name max length according the DT spec */
+#define DT_MAX_NAME 41
+struct bootcmdline {
+    bootmodule_kind kind;
+    bool domU;
+    paddr_t start;
+    char dt_name[DT_MAX_NAME];
+    char cmdline[BOOTMOD_MAX_CMDLINE];
+};
+
+struct bootmodules {
+    int nr_mods;
+    struct bootmodule module[MAX_MODULES];
+};
+
+struct bootcmdlines {
+    unsigned int nr_mods;
+    struct bootcmdline cmdline[MAX_MODULES];
+};
+
+struct bootinfo {
+    struct meminfo mem;
+    struct meminfo reserved_mem;
+    struct bootmodules modules;
+    struct bootcmdlines cmdlines;
+    bool static_heap;
+};
+
+extern struct bootinfo bootinfo;
+
+/*
+ * setup.c
+ */
+
+bool check_reserved_regions_overlap(paddr_t region_start, paddr_t region_size);
+struct bootmodule *add_boot_module(bootmodule_kind kind,
+                                   paddr_t start, paddr_t size, bool domU);
+void add_boot_cmdline(const char *name, const char *cmdline,
+                      bootmodule_kind kind, paddr_t start, bool domU);
+const char *boot_module_kind_as_string(bootmodule_kind kind);
+
+/*
+ * bootfdt.c
+ */
+size_t boot_fdt_init(const void *fdt, paddr_t paddr);
+
 #endif /* __ASM_PPC_SETUP_H__ */
diff --git a/xen/arch/ppc/setup.c b/xen/arch/ppc/setup.c
index 101bdd8bb6..90de99051e 100644
--- a/xen/arch/ppc/setup.c
+++ b/xen/arch/ppc/setup.c
@@ -1,16 +1,116 @@ 
 /* SPDX-License-Identifier: GPL-2.0-or-later */
 #include <xen/init.h>
 #include <xen/lib.h>
+#include <xen/libfdt/libfdt.h>
 #include <xen/mm.h>
 #include <public/version.h>
 #include <asm/boot.h>
 #include <asm/early_printk.h>
 #include <asm/mm.h>
 #include <asm/processor.h>
+#include <asm/setup.h>
 
 /* Xen stack for bringing up the first CPU. */
 unsigned char __initdata cpu0_boot_stack[STACK_SIZE] __aligned(STACK_SIZE);
 
+/*
+ * Return the end of the non-module region starting at s. In other
+ * words return s the start of the next modules after s.
+ *
+ * On input *end is the end of the region which should be considered
+ * and it is updated to reflect the end of the module, clipped to the
+ * end of the region if it would run over.
+ */
+static paddr_t __init next_module(paddr_t s, paddr_t *end)
+{
+    struct bootmodules *mi = &bootinfo.modules;
+    paddr_t lowest = ~(paddr_t)0;
+    int i;
+
+    for ( i = 0; i < mi->nr_mods; i++ )
+    {
+        paddr_t mod_s = mi->module[i].start;
+        paddr_t mod_e = mod_s + mi->module[i].size;
+
+        if ( !mi->module[i].size )
+            continue;
+
+        if ( mod_s < s )
+            continue;
+        if ( mod_s > lowest )
+            continue;
+        if ( mod_s > *end )
+            continue;
+        lowest = mod_s;
+        *end = min(*end, mod_e);
+    }
+    return lowest;
+}
+
+static void __init dt_unreserved_regions(paddr_t s, paddr_t e,
+                                         void (*cb)(paddr_t ps, paddr_t pe),
+                                         unsigned int first)
+{
+    unsigned int i;
+
+    for ( i = 0 ; i < bootinfo.reserved_mem.nr_banks; i++ )
+    {
+        paddr_t r_s = bootinfo.reserved_mem.bank[i].start;
+        paddr_t r_e = r_s + bootinfo.reserved_mem.bank[i].size;
+
+        if ( s < r_e && r_s < e )
+        {
+            dt_unreserved_regions(r_e, e, cb, i + 1);
+            dt_unreserved_regions(s, r_s, cb, i + 1);
+            return;
+        }
+    }
+
+    cb(s, e);
+}
+
+/*
+ * Populate the boot allocator. Based on arch/arm/setup.c's
+ * populate_boot_allocator.
+ * All RAM but the following regions will be added to the boot allocator:
+ *  - Modules (e.g., Xen, Kernel)
+ *  - Reserved regions
+ */
+static void __init populate_boot_allocator(void)
+{
+    unsigned int i;
+    const struct meminfo *banks = &bootinfo.mem;
+    paddr_t s, e;
+
+    for ( i = 0; i < banks->nr_banks; i++ )
+    {
+        const struct membank *bank = &banks->bank[i];
+        paddr_t bank_end = bank->start + bank->size;
+
+        s = bank->start;
+        while ( s < bank_end )
+        {
+            paddr_t n = bank_end;
+
+            e = next_module(s, &n);
+
+            if ( e == ~(paddr_t)0 )
+                e = n = bank_end;
+
+            /*
+             * Module in a RAM bank other than the one which we are
+             * not dealing with here.
+             */
+            if ( e > bank_end )
+                e = bank_end;
+
+            dt_unreserved_regions(s, e, init_boot_pages, 0);
+
+            s = n;
+        }
+    }
+}
+
 void setup_exceptions(void)
 {
     unsigned long lpcr;
@@ -24,6 +124,8 @@  void __init noreturn start_xen(unsigned long r3, unsigned long r4,
                                unsigned long r5, unsigned long r6,
                                unsigned long r7)
 {
+    void *boot_fdt;
+
     if ( r5 )
     {
         /* Unsupported OpenFirmware boot protocol */
@@ -32,11 +134,16 @@  void __init noreturn start_xen(unsigned long r3, unsigned long r4,
     else
     {
         /* kexec boot protocol */
-        boot_opal_init((void *)r3);
+        boot_fdt = (void *)r3;
+        boot_opal_init(boot_fdt);
     }
 
     setup_exceptions();
 
+    boot_fdt_init(boot_fdt, r3);
+
+    populate_boot_allocator();
+
     setup_initial_pagetables();
 
     early_printk("Hello, ppc64le!\n");