diff mbox series

[V4,2/3] xen/arm: Add handling of extended regions for Dom0

Message ID 1632955927-27911-3-git-send-email-olekstysh@gmail.com (mailing list archive)
State Superseded
Headers show
Series Add handling of extended regions (safe ranges) on Arm (Was "xen/memory: Introduce a hypercall to provide unallocated space") | expand

Commit Message

Oleksandr Tyshchenko Sept. 29, 2021, 10:52 p.m. UTC
From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

The extended region (safe range) is a region of guest physical
address space which is unused and could be safely used to create
grant/foreign mappings instead of wasting real RAM pages from
the domain memory for establishing these mappings.

The extended regions are chosen at domain creation time and
advertised to the domain via the "reg" property under the hypervisor
node in the guest device-tree. As region 0 is reserved for grant table
space (always present), the indexes for extended regions are 1...N.
If extended regions cannot be allocated for some reason,
Xen doesn't fail and behaves as usual, only inserting region 0.
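
To illustrate the layout, here is a minimal sketch of how the resulting
hypervisor node might look in the guest device-tree (the addresses, sizes
and the exact Xen version in the compatible string are purely illustrative;
2 address cells and 2 size cells are assumed):

    hypervisor {
        compatible = "xen,xen-4.16", "xen,xen";
        /* reg 0 is grant table space, reg 1...N are extended regions */
        reg = <0x0 0x38000000 0x0 0x1000000     /* region 0: grant table */
               0x8 0x00000000 0x2 0x00000000>;  /* region 1: extended region */
        /* other properties (e.g. interrupts) omitted */
    };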

Please note the following limitations:
- The extended region feature is currently only supported for 64-bit
  domains.
- The ACPI case is not covered.

***

As Dom0 is a direct mapped domain on Arm (i.e. MFN == GFN),
the algorithm to choose extended regions for it differs from the
algorithm for non-direct mapped DomUs. What is more, the extended
regions must be chosen differently depending on whether the IOMMU
is enabled or not.

Provide RAM not assigned to Dom0 if the IOMMU is disabled, or memory
holes found in the host device-tree otherwise. Make sure that
extended regions are 2MB-aligned and located within the maximum possible
addressable physical memory range. The minimum size of an extended
region is 64MB. The maximum number of extended regions is 128,
which is an artificial limit to minimize code changes (we reuse
struct meminfo to describe extended regions, so there is an array
field of 128 elements).
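
As a made-up example of those constraints: a free range
0x4c600000-0x50000000 is only 58MB and would therefore be skipped,
while 0x4c600000-0x54000000 would be reported as a 122MB extended
region starting at 0x4c600000 (both the start and the size are
multiples of 2MB).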

It is worth mentioning that the unallocated memory solution (when the
IOMMU is disabled) will only work safely until Dom0 is able to allocate
memory outside of the original range.

Also introduce a command line option to globally enable or disable
support for extended regions for Dom0 (enabled by default).
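
For example, adding the following to the Xen command line would turn the
feature off (just a sketch; the rest of the command line is board-specific):

    ext_regions=0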

Suggested-by: Julien Grall <jgrall@amazon.com>
Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
---
Please note, we need to decide which approach to use in find_unallocated_memory();
you can find the details at:
https://lore.kernel.org/xen-devel/28503e09-44c3-f623-bb8d-8778bb94225f@gmail.com/

Changes RFC -> V2:
   - update patch description
   - drop unneeded "extended-region" DT property

Changes V2 -> V3:
   - update patch description
   - add comment for "size" calculation in add_ext_regions()
   - clarify "end" calculation in find_unallocated_memory() and
     find_memory_holes()
   - only pick up regions with size >= 64MB
   - allocate reg dynamically instead of keeping on the stack in
     make_hypervisor_node()
   - do not show warning for 32-bit domain
   - drop Linux specific limits EXT_REGION_*
   - also cover "ranges" property in find_memory_holes()
   - add command line arg to enable/disable extended region support

Changes V3 -> V4:
  - update opt_ext_regions purpose and comment in code
  - reorganize make_hypervisor_node() to move allocations after initial
    checks, allocate only required amount of elements instead of maximum
    possible
---
 docs/misc/xen-command-line.pandoc |  11 ++
 xen/arch/arm/domain_build.c       | 286 +++++++++++++++++++++++++++++++++++++-
 2 files changed, 294 insertions(+), 3 deletions(-)

Comments

Luca Fancellu Sept. 30, 2021, 3:36 p.m. UTC | #1
> On 29 Sep 2021, at 23:52, Oleksandr Tyshchenko <olekstysh@gmail.com> wrote:
> 
> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> 
> The extended region (safe range) is a region of guest physical
> address space which is unused and could be safely used to create
> grant/foreign mappings instead of wasting real RAM pages from
> the domain memory for establishing these mappings.
> 
> The extended regions are chosen at the domain creation time and
> advertised to it via "reg" property under hypervisor node in
> the guest device-tree. As region 0 is reserved for grant table
> space (always present), the indexes for extended regions are 1...N.
> If extended regions could not be allocated for some reason,
> Xen doesn't fail and behaves as usual, so only inserts region 0.
> 
> Please note the following limitations:
> - The extended region feature is only supported for 64-bit domain
>  currently.
> - The ACPI case is not covered.
> 
> ***
> 
> As Dom0 is direct mapped domain on Arm (e.g. MFN == GFN)
> the algorithm to choose extended regions for it is different
> in comparison with the algorithm for non-direct mapped DomU.
> What is more, that extended regions should be chosen differently
> whether IOMMU is enabled or not.
> 
> Provide RAM not assigned to Dom0 if IOMMU is disabled or memory
> holes found in host device-tree if otherwise. Make sure that
> extended regions are 2MB-aligned and located within maximum possible
> addressable physical memory range. The minimum size of extended
> region is 64MB. The maximum number of extended regions is 128,
> which is an artificial limit to minimize code changes (we reuse
> struct meminfo to describe extended regions, so there are an array
> field for 128 elements).
> 
> It worth mentioning that unallocated memory solution (when the IOMMU
> is disabled) will work safely until Dom0 is able to allocate memory
> outside of the original range.
> 
> Also introduce command line option to be able to globally enable or
> disable support for extended regions for Dom0 (enabled by default).
> 
> Suggested-by: Julien Grall <jgrall@amazon.com>
> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

Reviewed-by: Luca Fancellu <luca.fancellu@arm.com>

> ---
> Please note, we need to decide which approach to use in find_unallocated_memory(),
> you can find details at:
> https://lore.kernel.org/xen-devel/28503e09-44c3-f623-bb8d-8778bb94225f@gmail.com/
> 
> Changes RFC -> V2:
>   - update patch description
>   - drop uneeded "extended-region" DT property
> 
> Changes V2 -> V3:
>   - update patch description
>   - add comment for "size" calculation in add_ext_regions()
>   - clarify "end" calculation in find_unallocated_memory() and
>     find_memory_holes()
>   - only pick up regions with size >= 64MB
>   - allocate reg dynamically instead of keeping on the stack in
>     make_hypervisor_node()
>   - do not show warning for 32-bit domain
>   - drop Linux specific limits EXT_REGION_*
>   - also cover "ranges" property in find_memory_holes()
>   - add command line arg to enable/disable extended region support
> 
> Changes V3 -> V4:
>  - update opt_ext_regions purpose and comment in code
>  - reorganize make_hypervisor_node() to move allocations after initial
>    checks, allocate only required amount of elements instead of maximum
>    possible
> ---
> docs/misc/xen-command-line.pandoc |  11 ++
> xen/arch/arm/domain_build.c       | 286 +++++++++++++++++++++++++++++++++++++-
> 2 files changed, 294 insertions(+), 3 deletions(-)
> 
> diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
> index 177e656..5cae4ad 100644
> --- a/docs/misc/xen-command-line.pandoc
> +++ b/docs/misc/xen-command-line.pandoc
> @@ -1081,6 +1081,17 @@ hardware domain is architecture dependent.
> Note that specifying zero as domU value means zero, while for dom0 it means
> to use the default.
> 
> +### ext_regions (Arm)
> +> `= <boolean>`
> +
> +> Default : `true`
> +
> +Flag to enable or disable support for extended regions for Dom0.
> +
> +Extended regions are ranges of unused address space exposed to Dom0 as
> +"safe to use" for special memory mappings. Disable if your board device
> +tree is incomplete.
> +
> ### flask
>> `= permissive | enforcing | late | disabled`
> 
> diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
> index d233d63..c5afbe2 100644
> --- a/xen/arch/arm/domain_build.c
> +++ b/xen/arch/arm/domain_build.c
> @@ -34,6 +34,10 @@
> static unsigned int __initdata opt_dom0_max_vcpus;
> integer_param("dom0_max_vcpus", opt_dom0_max_vcpus);
> 
> +/* If true, the extended regions support is enabled for dom0 */
> +static bool __initdata opt_ext_regions = true;
> +boolean_param("ext_regions", opt_ext_regions);
> +
> static u64 __initdata dom0_mem;
> static bool __initdata dom0_mem_set;
> 
> @@ -886,6 +890,232 @@ static int __init make_memory_node(const struct domain *d,
>     return res;
> }
> 
> +static int __init add_ext_regions(unsigned long s, unsigned long e, void *data)
> +{
> +    struct meminfo *ext_regions = data;
> +    paddr_t start, size;
> +
> +    if ( ext_regions->nr_banks >= ARRAY_SIZE(ext_regions->bank) )
> +        return 0;
> +
> +    /* Both start and size of the extended region should be 2MB aligned */
> +    start = (s + SZ_2M - 1) & ~(SZ_2M - 1);
> +    if ( start > e )
> +        return 0;
> +
> +    /*
> +     * e is actually "end-1" because it is called by rangeset functions
> +     * which are inclusive of the last address.
> +     */
> +    e += 1;
> +    size = (e - start) & ~(SZ_2M - 1);
> +    if ( size < MB(64) )
> +        return 0;
> +
> +    ext_regions->bank[ext_regions->nr_banks].start = start;
> +    ext_regions->bank[ext_regions->nr_banks].size = size;
> +    ext_regions->nr_banks++;
> +
> +    return 0;
> +}
> +
> +static int __init find_unallocated_memory(const struct kernel_info *kinfo,
> +                                          struct meminfo *ext_regions)
> +{
> +    const struct meminfo *assign_mem = &kinfo->mem;
> +    struct rangeset *unalloc_mem;
> +    paddr_t start, end;
> +    unsigned int i;
> +    int res;
> +
> +    dt_dprintk("Find unallocated memory for extended regions\n");
> +
> +    unalloc_mem = rangeset_new(NULL, NULL, 0);
> +    if ( !unalloc_mem )
> +        return -ENOMEM;
> +
> +    /* Start with all available RAM */
> +    for ( i = 0; i < bootinfo.mem.nr_banks; i++ )
> +    {
> +        start = bootinfo.mem.bank[i].start;
> +        end = bootinfo.mem.bank[i].start + bootinfo.mem.bank[i].size;
> +        res = rangeset_add_range(unalloc_mem, start, end - 1);
> +        if ( res )
> +        {
> +            printk(XENLOG_ERR "Failed to add: %#"PRIx64"->%#"PRIx64"\n",
> +                   start, end);
> +            goto out;
> +        }
> +    }
> +
> +    /* Remove RAM assigned to Dom0 */
> +    for ( i = 0; i < assign_mem->nr_banks; i++ )
> +    {
> +        start = assign_mem->bank[i].start;
> +        end = assign_mem->bank[i].start + assign_mem->bank[i].size;
> +        res = rangeset_remove_range(unalloc_mem, start, end - 1);
> +        if ( res )
> +        {
> +            printk(XENLOG_ERR "Failed to remove: %#"PRIx64"->%#"PRIx64"\n",
> +                   start, end);
> +            goto out;
> +        }
> +    }
> +
> +    /* Remove reserved-memory regions */
> +    for ( i = 0; i < bootinfo.reserved_mem.nr_banks; i++ )
> +    {
> +        start = bootinfo.reserved_mem.bank[i].start;
> +        end = bootinfo.reserved_mem.bank[i].start +
> +            bootinfo.reserved_mem.bank[i].size;
> +        res = rangeset_remove_range(unalloc_mem, start, end - 1);
> +        if ( res )
> +        {
> +            printk(XENLOG_ERR "Failed to remove: %#"PRIx64"->%#"PRIx64"\n",
> +                   start, end);
> +            goto out;
> +        }
> +    }
> +
> +    /* Remove grant table region */
> +    start = kinfo->gnttab_start;
> +    end = kinfo->gnttab_start + kinfo->gnttab_size;
> +    res = rangeset_remove_range(unalloc_mem, start, end - 1);
> +    if ( res )
> +    {
> +        printk(XENLOG_ERR "Failed to remove: %#"PRIx64"->%#"PRIx64"\n",
> +               start, end);
> +        goto out;
> +    }
> +
> +    start = 0;
> +    end = (1ULL << p2m_ipa_bits) - 1;
> +    res = rangeset_report_ranges(unalloc_mem, start, end,
> +                                 add_ext_regions, ext_regions);
> +    if ( res )
> +        ext_regions->nr_banks = 0;
> +    else if ( !ext_regions->nr_banks )
> +        res = -ENOENT;
> +
> +out:
> +    rangeset_destroy(unalloc_mem);
> +
> +    return res;
> +}
> +
> +static int __init find_memory_holes(const struct kernel_info *kinfo,
> +                                    struct meminfo *ext_regions)
> +{
> +    struct dt_device_node *np;
> +    struct rangeset *mem_holes;
> +    paddr_t start, end;
> +    unsigned int i;
> +    int res;
> +
> +    dt_dprintk("Find memory holes for extended regions\n");
> +
> +    mem_holes = rangeset_new(NULL, NULL, 0);
> +    if ( !mem_holes )
> +        return -ENOMEM;
> +
> +    /* Start with maximum possible addressable physical memory range */
> +    start = 0;
> +    end = (1ULL << p2m_ipa_bits) - 1;
> +    res = rangeset_add_range(mem_holes, start, end);
> +    if ( res )
> +    {
> +        printk(XENLOG_ERR "Failed to add: %#"PRIx64"->%#"PRIx64"\n",
> +               start, end);
> +        goto out;
> +    }
> +
> +    /*
> +     * Remove regions described by "reg" and "ranges" properties where
> +     * the memory is addressable (MMIO, RAM, PCI BAR, etc).
> +     */
> +    dt_for_each_device_node( dt_host, np )
> +    {
> +        unsigned int naddr;
> +        u64 addr, size;
> +
> +        naddr = dt_number_of_address(np);
> +
> +        for ( i = 0; i < naddr; i++ )
> +        {
> +            res = dt_device_get_address(np, i, &addr, &size);
> +            if ( res )
> +            {
> +                printk(XENLOG_ERR "Unable to retrieve address %u for %s\n",
> +                       i, dt_node_full_name(np));
> +                goto out;
> +            }
> +
> +            start = addr & PAGE_MASK;
> +            end = PAGE_ALIGN(addr + size);
> +            res = rangeset_remove_range(mem_holes, start, end - 1);
> +            if ( res )
> +            {
> +                printk(XENLOG_ERR "Failed to remove: %#"PRIx64"->%#"PRIx64"\n",
> +                       start, end);
> +                goto out;
> +            }
> +        }
> +
> +        if ( dt_device_type_is_equal(np, "pci" ) )
> +        {
> +            unsigned int range_size, nr_ranges;
> +            int na, ns, pna;
> +            const __be32 *ranges;
> +            u32 len;
> +
> +            /*
> +             * Looking for non-empty ranges property which in this context
> +             * describes the PCI host bridge aperture.
> +             */
> +            ranges = dt_get_property(np, "ranges", &len);
> +            if ( !ranges || !len )
> +                continue;
> +
> +            pna = dt_n_addr_cells(np);
> +            na = dt_child_n_addr_cells(np);
> +            ns = dt_child_n_size_cells(np);
> +            range_size = pna + na + ns;
> +            nr_ranges = len / sizeof(__be32) / range_size;
> +
> +            for ( i = 0; i < nr_ranges; i++, ranges += range_size )
> +            {
> +                /* Skip the child address and get the parent (CPU) address */
> +                addr = dt_read_number(ranges + na, pna);
> +                size = dt_read_number(ranges + na + pna, ns);
> +
> +                start = addr & PAGE_MASK;
> +                end = PAGE_ALIGN(addr + size);
> +                res = rangeset_remove_range(mem_holes, start, end - 1);
> +                if ( res )
> +                {
> +                    printk(XENLOG_ERR "Failed to remove: %#"PRIx64"->%#"PRIx64"\n",
> +                           start, end);
> +                    goto out;
> +                }
> +            }
> +        }
> +    }
> +
> +    start = 0;
> +    end = (1ULL << p2m_ipa_bits) - 1;
> +    res = rangeset_report_ranges(mem_holes, start, end,
> +                                 add_ext_regions,  ext_regions);
> +    if ( res )
> +        ext_regions->nr_banks = 0;
> +    else if ( !ext_regions->nr_banks )
> +        res = -ENOENT;
> +
> +out:
> +    rangeset_destroy(mem_holes);
> +
> +    return res;
> +}
> +
> static int __init make_hypervisor_node(struct domain *d,
>                                        const struct kernel_info *kinfo,
>                                        int addrcells, int sizecells)
> @@ -893,11 +1123,12 @@ static int __init make_hypervisor_node(struct domain *d,
>     const char compat[] =
>         "xen,xen-"__stringify(XEN_VERSION)"."__stringify(XEN_SUBVERSION)"\0"
>         "xen,xen";
> -    __be32 reg[4];
> +    __be32 *reg, *cells;
>     gic_interrupt_t intr;
> -    __be32 *cells;
>     int res;
>     void *fdt = kinfo->fdt;
> +    struct meminfo *ext_regions = NULL;
> +    unsigned int i, nr_ext_regions;
> 
>     dt_dprintk("Create hypervisor node\n");
> 
> @@ -919,12 +1150,61 @@ static int __init make_hypervisor_node(struct domain *d,
>     if ( res )
>         return res;
> 
> +    if ( !opt_ext_regions )
> +    {
> +        printk(XENLOG_DEBUG "The extended regions support is disabled\n");
> +        nr_ext_regions = 0;
> +    }
> +    else if ( is_32bit_domain(d) )
> +    {
> +        printk(XENLOG_DEBUG "The extended regions are only supported for 64-bit guest currently\n");
> +        nr_ext_regions = 0;
> +    }
> +    else
> +    {
> +        ext_regions = xzalloc(struct meminfo);
> +        if ( !ext_regions )
> +            return -ENOMEM;
> +
> +        if ( !is_iommu_enabled(d) )
> +            res = find_unallocated_memory(kinfo, ext_regions);
> +        else
> +            res = find_memory_holes(kinfo, ext_regions);
> +
> +        if ( res )
> +            printk(XENLOG_WARNING "Failed to allocate extended regions\n");
> +        nr_ext_regions = ext_regions->nr_banks;
> +    }
> +
> +    reg = xzalloc_array(__be32, (nr_ext_regions + 1) * (addrcells + sizecells));
> +    if ( !reg )
> +    {
> +        xfree(ext_regions);
> +        return -ENOMEM;
> +    }
> +
>     /* reg 0 is grant table space */
>     cells = &reg[0];
>     dt_child_set_range(&cells, addrcells, sizecells,
>                        kinfo->gnttab_start, kinfo->gnttab_size);
> +    /* reg 1...N are extended regions */
> +    for ( i = 0; i < nr_ext_regions; i++ )
> +    {
> +        u64 start = ext_regions->bank[i].start;
> +        u64 size = ext_regions->bank[i].size;
> +
> +        dt_dprintk("Extended region %d: %#"PRIx64"->%#"PRIx64"\n",
> +                   i, start, start + size);
> +
> +        dt_child_set_range(&cells, addrcells, sizecells, start, size);
> +    }
> +
>     res = fdt_property(fdt, "reg", reg,
> -                       dt_cells_to_size(addrcells + sizecells));
> +                       dt_cells_to_size(addrcells + sizecells) *
> +                       (nr_ext_regions + 1));
> +    xfree(ext_regions);
> +    xfree(reg);
> +
>     if ( res )
>         return res;
> 
> -- 
> 2.7.4
> 
>
Stefano Stabellini Sept. 30, 2021, 10:53 p.m. UTC | #2
On Thu, 30 Sep 2021, Oleksandr Tyshchenko wrote:
> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> 
> The extended region (safe range) is a region of guest physical
> address space which is unused and could be safely used to create
> grant/foreign mappings instead of wasting real RAM pages from
> the domain memory for establishing these mappings.
> 
> The extended regions are chosen at the domain creation time and
> advertised to it via "reg" property under hypervisor node in
> the guest device-tree. As region 0 is reserved for grant table
> space (always present), the indexes for extended regions are 1...N.
> If extended regions could not be allocated for some reason,
> Xen doesn't fail and behaves as usual, so only inserts region 0.
> 
> Please note the following limitations:
> - The extended region feature is only supported for 64-bit domain
>   currently.
> - The ACPI case is not covered.
> 
> ***
> 
> As Dom0 is direct mapped domain on Arm (e.g. MFN == GFN)
> the algorithm to choose extended regions for it is different
> in comparison with the algorithm for non-direct mapped DomU.
> What is more, that extended regions should be chosen differently
> whether IOMMU is enabled or not.
> 
> Provide RAM not assigned to Dom0 if IOMMU is disabled or memory
> holes found in host device-tree if otherwise. Make sure that
> extended regions are 2MB-aligned and located within maximum possible
> addressable physical memory range. The minimum size of extended
> region is 64MB. The maximum number of extended regions is 128,
> which is an artificial limit to minimize code changes (we reuse
> struct meminfo to describe extended regions, so there are an array
> field for 128 elements).
> 
> It worth mentioning that unallocated memory solution (when the IOMMU
> is disabled) will work safely until Dom0 is able to allocate memory
> outside of the original range.
> 
> Also introduce command line option to be able to globally enable or
> disable support for extended regions for Dom0 (enabled by default).
> 
> Suggested-by: Julien Grall <jgrall@amazon.com>
> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>


> [...]
Stefano Stabellini Oct. 2, 2021, 12:33 a.m. UTC | #3
On Thu, 30 Sep 2021, Oleksandr Tyshchenko wrote:
> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> 
> The extended region (safe range) is a region of guest physical
> address space which is unused and could be safely used to create
> grant/foreign mappings instead of wasting real RAM pages from
> the domain memory for establishing these mappings.
> 
> The extended regions are chosen at the domain creation time and
> advertised to it via "reg" property under hypervisor node in
> the guest device-tree. As region 0 is reserved for grant table
> space (always present), the indexes for extended regions are 1...N.
> If extended regions could not be allocated for some reason,
> Xen doesn't fail and behaves as usual, so only inserts region 0.
> 
> Please note the following limitations:
> - The extended region feature is only supported for 64-bit domain
>   currently.
> - The ACPI case is not covered.
> 
> ***
> 
> As Dom0 is direct mapped domain on Arm (e.g. MFN == GFN)
> the algorithm to choose extended regions for it is different
> in comparison with the algorithm for non-direct mapped DomU.
> What is more, that extended regions should be chosen differently
> whether IOMMU is enabled or not.
> 
> Provide RAM not assigned to Dom0 if IOMMU is disabled or memory
> holes found in host device-tree if otherwise. Make sure that
> extended regions are 2MB-aligned and located within maximum possible
> addressable physical memory range. The minimum size of extended
> region is 64MB. The maximum number of extended regions is 128,
> which is an artificial limit to minimize code changes (we reuse
> struct meminfo to describe extended regions, so there are an array
> field for 128 elements).
> 
> It worth mentioning that unallocated memory solution (when the IOMMU
> is disabled) will work safely until Dom0 is able to allocate memory
> outside of the original range.
> 
> Also introduce command line option to be able to globally enable or
> disable support for extended regions for Dom0 (enabled by default).
> 
> Suggested-by: Julien Grall <jgrall@amazon.com>
> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

I thought about it and I decided to commit this patch because it doesn't
actually need anything from the other two patches, and it is very useful
on its own (both of them are for domU, while this one is for dom0).

In regards to Julien's suggestion: as explained in earlier emails I
prefer this version but I don't have a strong opinion. If Julien still
prefers the other approach we can still change it in time for 4.16
(Oleksandr has already implemented both and I am happy to review.)


> [...]
Oleksandr Tyshchenko Oct. 2, 2021, 12:40 p.m. UTC | #4
On 02.10.21 03:33, Stefano Stabellini wrote:

Hi Stefano

> On Thu, 30 Sep 2021, Oleksandr Tyshchenko wrote:
>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>
>> The extended region (safe range) is a region of guest physical
>> address space which is unused and could be safely used to create
>> grant/foreign mappings instead of wasting real RAM pages from
>> the domain memory for establishing these mappings.
>>
>> The extended regions are chosen at the domain creation time and
>> advertised to it via "reg" property under hypervisor node in
>> the guest device-tree. As region 0 is reserved for grant table
>> space (always present), the indexes for extended regions are 1...N.
>> If extended regions could not be allocated for some reason,
>> Xen doesn't fail and behaves as usual, so only inserts region 0.
>>
>> Please note the following limitations:
>> - The extended region feature is only supported for 64-bit domain
>>    currently.
>> - The ACPI case is not covered.
>>
>> ***
>>
>> As Dom0 is direct mapped domain on Arm (e.g. MFN == GFN)
>> the algorithm to choose extended regions for it is different
>> in comparison with the algorithm for non-direct mapped DomU.
>> What is more, that extended regions should be chosen differently
>> whether IOMMU is enabled or not.
>>
>> Provide RAM not assigned to Dom0 if IOMMU is disabled or memory
>> holes found in host device-tree if otherwise. Make sure that
>> extended regions are 2MB-aligned and located within maximum possible
>> addressable physical memory range. The minimum size of extended
>> region is 64MB. The maximum number of extended regions is 128,
>> which is an artificial limit to minimize code changes (we reuse
>> struct meminfo to describe extended regions, so there are an array
>> field for 128 elements).
>>
>> It worth mentioning that unallocated memory solution (when the IOMMU
>> is disabled) will work safely until Dom0 is able to allocate memory
>> outside of the original range.
>>
>> Also introduce command line option to be able to globally enable or
>> disable support for extended regions for Dom0 (enabled by default).
>>
>> Suggested-by: Julien Grall <jgrall@amazon.com>
>> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> I thought about it and I decided to commit this patch because it doesn't
> actually need anything from the other two patches, and it is very useful
> on its own (both of them are for domU, while this one is for dom0).

Thank you.


>
> In regards to Julien's suggestion: as explained in earlier emails I
> prefer this version but I don't have a strong opinion. If Julien still
> prefers the other approach we can still change it in time for 4.16
> (Oleksandr has already implemented both and I am happy to review.)

Sure, we will be able to do it if needed.


>
>
>> ---
>> Please note, we need to decide which approach to use in find_unallocated_memory(),
>> you can find details at:
>> https://lore.kernel.org/xen-devel/28503e09-44c3-f623-bb8d-8778bb94225f@gmail.com/
>>
>> Changes RFC -> V2:
>>     - update patch description
>>     - drop uneeded "extended-region" DT property
>>
>> Changes V2 -> V3:
>>     - update patch description
>>     - add comment for "size" calculation in add_ext_regions()
>>     - clarify "end" calculation in find_unallocated_memory() and
>>       find_memory_holes()
>>     - only pick up regions with size >= 64MB
>>     - allocate reg dynamically instead of keeping on the stack in
>>       make_hypervisor_node()
>>     - do not show warning for 32-bit domain
>>     - drop Linux specific limits EXT_REGION_*
>>     - also cover "ranges" property in find_memory_holes()
>>     - add command line arg to enable/disable extended region support
>>
>> Changes V3 -> V4:
>>    - update opt_ext_regions purpose and comment in code
>>    - reorganize make_hypervisor_node() to move allocations after initial
>>      checks, allocate only required amount of elements instead of maximum
>>      possible
>> ---
>>   docs/misc/xen-command-line.pandoc |  11 ++
>>   xen/arch/arm/domain_build.c       | 286 +++++++++++++++++++++++++++++++++++++-
>>   2 files changed, 294 insertions(+), 3 deletions(-)
>>
>> diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
>> index 177e656..5cae4ad 100644
>> --- a/docs/misc/xen-command-line.pandoc
>> +++ b/docs/misc/xen-command-line.pandoc
>> @@ -1081,6 +1081,17 @@ hardware domain is architecture dependent.
>>   Note that specifying zero as domU value means zero, while for dom0 it means
>>   to use the default.
>>   
>> +### ext_regions (Arm)
>> +> `= <boolean>`
>> +
>> +> Default : `true`
>> +
>> +Flag to enable or disable support for extended regions for Dom0.
>> +
>> +Extended regions are ranges of unused address space exposed to Dom0 as
>> +"safe to use" for special memory mappings. Disable if your board device
>> +tree is incomplete.
>> +
>>   ### flask
>>   > `= permissive | enforcing | late | disabled`
>>   
>> diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
>> index d233d63..c5afbe2 100644
>> --- a/xen/arch/arm/domain_build.c
>> +++ b/xen/arch/arm/domain_build.c
>> @@ -34,6 +34,10 @@
>>   static unsigned int __initdata opt_dom0_max_vcpus;
>>   integer_param("dom0_max_vcpus", opt_dom0_max_vcpus);
>>   
>> +/* If true, the extended regions support is enabled for dom0 */
>> +static bool __initdata opt_ext_regions = true;
>> +boolean_param("ext_regions", opt_ext_regions);
>> +
>>   static u64 __initdata dom0_mem;
>>   static bool __initdata dom0_mem_set;
>>   
>> @@ -886,6 +890,232 @@ static int __init make_memory_node(const struct domain *d,
>>       return res;
>>   }
>>   
>> +static int __init add_ext_regions(unsigned long s, unsigned long e, void *data)
>> +{
>> +    struct meminfo *ext_regions = data;
>> +    paddr_t start, size;
>> +
>> +    if ( ext_regions->nr_banks >= ARRAY_SIZE(ext_regions->bank) )
>> +        return 0;
>> +
>> +    /* Both start and size of the extended region should be 2MB aligned */
>> +    start = (s + SZ_2M - 1) & ~(SZ_2M - 1);
>> +    if ( start > e )
>> +        return 0;
>> +
>> +    /*
>> +     * e is actually "end-1" because it is called by rangeset functions
>> +     * which are inclusive of the last address.
>> +     */
>> +    e += 1;
>> +    size = (e - start) & ~(SZ_2M - 1);
>> +    if ( size < MB(64) )
>> +        return 0;
>> +
>> +    ext_regions->bank[ext_regions->nr_banks].start = start;
>> +    ext_regions->bank[ext_regions->nr_banks].size = size;
>> +    ext_regions->nr_banks++;
>> +
>> +    return 0;
>> +}
>> +
>> +static int __init find_unallocated_memory(const struct kernel_info *kinfo,
>> +                                          struct meminfo *ext_regions)
>> +{
>> +    const struct meminfo *assign_mem = &kinfo->mem;
>> +    struct rangeset *unalloc_mem;
>> +    paddr_t start, end;
>> +    unsigned int i;
>> +    int res;
>> +
>> +    dt_dprintk("Find unallocated memory for extended regions\n");
>> +
>> +    unalloc_mem = rangeset_new(NULL, NULL, 0);
>> +    if ( !unalloc_mem )
>> +        return -ENOMEM;
>> +
>> +    /* Start with all available RAM */
>> +    for ( i = 0; i < bootinfo.mem.nr_banks; i++ )
>> +    {
>> +        start = bootinfo.mem.bank[i].start;
>> +        end = bootinfo.mem.bank[i].start + bootinfo.mem.bank[i].size;
>> +        res = rangeset_add_range(unalloc_mem, start, end - 1);
>> +        if ( res )
>> +        {
>> +            printk(XENLOG_ERR "Failed to add: %#"PRIx64"->%#"PRIx64"\n",
>> +                   start, end);
>> +            goto out;
>> +        }
>> +    }
>> +
>> +    /* Remove RAM assigned to Dom0 */
>> +    for ( i = 0; i < assign_mem->nr_banks; i++ )
>> +    {
>> +        start = assign_mem->bank[i].start;
>> +        end = assign_mem->bank[i].start + assign_mem->bank[i].size;
>> +        res = rangeset_remove_range(unalloc_mem, start, end - 1);
>> +        if ( res )
>> +        {
>> +            printk(XENLOG_ERR "Failed to remove: %#"PRIx64"->%#"PRIx64"\n",
>> +                   start, end);
>> +            goto out;
>> +        }
>> +    }
>> +
>> +    /* Remove reserved-memory regions */
>> +    for ( i = 0; i < bootinfo.reserved_mem.nr_banks; i++ )
>> +    {
>> +        start = bootinfo.reserved_mem.bank[i].start;
>> +        end = bootinfo.reserved_mem.bank[i].start +
>> +            bootinfo.reserved_mem.bank[i].size;
>> +        res = rangeset_remove_range(unalloc_mem, start, end - 1);
>> +        if ( res )
>> +        {
>> +            printk(XENLOG_ERR "Failed to remove: %#"PRIx64"->%#"PRIx64"\n",
>> +                   start, end);
>> +            goto out;
>> +        }
>> +    }
>> +
>> +    /* Remove grant table region */
>> +    start = kinfo->gnttab_start;
>> +    end = kinfo->gnttab_start + kinfo->gnttab_size;
>> +    res = rangeset_remove_range(unalloc_mem, start, end - 1);
>> +    if ( res )
>> +    {
>> +        printk(XENLOG_ERR "Failed to remove: %#"PRIx64"->%#"PRIx64"\n",
>> +               start, end);
>> +        goto out;
>> +    }
>> +
>> +    start = 0;
>> +    end = (1ULL << p2m_ipa_bits) - 1;
>> +    res = rangeset_report_ranges(unalloc_mem, start, end,
>> +                                 add_ext_regions, ext_regions);
>> +    if ( res )
>> +        ext_regions->nr_banks = 0;
>> +    else if ( !ext_regions->nr_banks )
>> +        res = -ENOENT;
>> +
>> +out:
>> +    rangeset_destroy(unalloc_mem);
>> +
>> +    return res;
>> +}
>> +
>> +static int __init find_memory_holes(const struct kernel_info *kinfo,
>> +                                    struct meminfo *ext_regions)
>> +{
>> +    struct dt_device_node *np;
>> +    struct rangeset *mem_holes;
>> +    paddr_t start, end;
>> +    unsigned int i;
>> +    int res;
>> +
>> +    dt_dprintk("Find memory holes for extended regions\n");
>> +
>> +    mem_holes = rangeset_new(NULL, NULL, 0);
>> +    if ( !mem_holes )
>> +        return -ENOMEM;
>> +
>> +    /* Start with maximum possible addressable physical memory range */
>> +    start = 0;
>> +    end = (1ULL << p2m_ipa_bits) - 1;
>> +    res = rangeset_add_range(mem_holes, start, end);
>> +    if ( res )
>> +    {
>> +        printk(XENLOG_ERR "Failed to add: %#"PRIx64"->%#"PRIx64"\n",
>> +               start, end);
>> +        goto out;
>> +    }
>> +
>> +    /*
>> +     * Remove regions described by "reg" and "ranges" properties where
>> +     * the memory is addressable (MMIO, RAM, PCI BAR, etc).
>> +     */
>> +    dt_for_each_device_node( dt_host, np )
>> +    {
>> +        unsigned int naddr;
>> +        u64 addr, size;
>> +
>> +        naddr = dt_number_of_address(np);
>> +
>> +        for ( i = 0; i < naddr; i++ )
>> +        {
>> +            res = dt_device_get_address(np, i, &addr, &size);
>> +            if ( res )
>> +            {
>> +                printk(XENLOG_ERR "Unable to retrieve address %u for %s\n",
>> +                       i, dt_node_full_name(np));
>> +                goto out;
>> +            }
>> +
>> +            start = addr & PAGE_MASK;
>> +            end = PAGE_ALIGN(addr + size);
>> +            res = rangeset_remove_range(mem_holes, start, end - 1);
>> +            if ( res )
>> +            {
>> +                printk(XENLOG_ERR "Failed to remove: %#"PRIx64"->%#"PRIx64"\n",
>> +                       start, end);
>> +                goto out;
>> +            }
>> +        }
>> +
>> +        if ( dt_device_type_is_equal(np, "pci" ) )
>> +        {
>> +            unsigned int range_size, nr_ranges;
>> +            int na, ns, pna;
>> +            const __be32 *ranges;
>> +            u32 len;
>> +
>> +            /*
>> +             * Looking for non-empty ranges property which in this context
>> +             * describes the PCI host bridge aperture.
>> +             */
>> +            ranges = dt_get_property(np, "ranges", &len);
>> +            if ( !ranges || !len )
>> +                continue;
>> +
>> +            pna = dt_n_addr_cells(np);
>> +            na = dt_child_n_addr_cells(np);
>> +            ns = dt_child_n_size_cells(np);
>> +            range_size = pna + na + ns;
>> +            nr_ranges = len / sizeof(__be32) / range_size;
>> +
>> +            for ( i = 0; i < nr_ranges; i++, ranges += range_size )
>> +            {
>> +                /* Skip the child address and get the parent (CPU) address */
>> +                addr = dt_read_number(ranges + na, pna);
>> +                size = dt_read_number(ranges + na + pna, ns);
>> +
>> +                start = addr & PAGE_MASK;
>> +                end = PAGE_ALIGN(addr + size);
>> +                res = rangeset_remove_range(mem_holes, start, end - 1);
>> +                if ( res )
>> +                {
>> +                    printk(XENLOG_ERR "Failed to remove: %#"PRIx64"->%#"PRIx64"\n",
>> +                           start, end);
>> +                    goto out;
>> +                }
>> +            }
>> +        }
>> +    }
>> +
>> +    start = 0;
>> +    end = (1ULL << p2m_ipa_bits) - 1;
>> +    res = rangeset_report_ranges(mem_holes, start, end,
>> +                                 add_ext_regions,  ext_regions);
>> +    if ( res )
>> +        ext_regions->nr_banks = 0;
>> +    else if ( !ext_regions->nr_banks )
>> +        res = -ENOENT;
>> +
>> +out:
>> +    rangeset_destroy(mem_holes);
>> +
>> +    return res;
>> +}
>> +
>>   static int __init make_hypervisor_node(struct domain *d,
>>                                          const struct kernel_info *kinfo,
>>                                          int addrcells, int sizecells)
>> @@ -893,11 +1123,12 @@ static int __init make_hypervisor_node(struct domain *d,
>>       const char compat[] =
>>           "xen,xen-"__stringify(XEN_VERSION)"."__stringify(XEN_SUBVERSION)"\0"
>>           "xen,xen";
>> -    __be32 reg[4];
>> +    __be32 *reg, *cells;
>>       gic_interrupt_t intr;
>> -    __be32 *cells;
>>       int res;
>>       void *fdt = kinfo->fdt;
>> +    struct meminfo *ext_regions = NULL;
>> +    unsigned int i, nr_ext_regions;
>>   
>>       dt_dprintk("Create hypervisor node\n");
>>   
>> @@ -919,12 +1150,61 @@ static int __init make_hypervisor_node(struct domain *d,
>>       if ( res )
>>           return res;
>>   
>> +    if ( !opt_ext_regions )
>> +    {
>> +        printk(XENLOG_DEBUG "The extended regions support is disabled\n");
>> +        nr_ext_regions = 0;
>> +    }
>> +    else if ( is_32bit_domain(d) )
>> +    {
>> +        printk(XENLOG_DEBUG "The extended regions are only supported for 64-bit guest currently\n");
>> +        nr_ext_regions = 0;
>> +    }
>> +    else
>> +    {
>> +        ext_regions = xzalloc(struct meminfo);
>> +        if ( !ext_regions )
>> +            return -ENOMEM;
>> +
>> +        if ( !is_iommu_enabled(d) )
>> +            res = find_unallocated_memory(kinfo, ext_regions);
>> +        else
>> +            res = find_memory_holes(kinfo, ext_regions);
>> +
>> +        if ( res )
>> +            printk(XENLOG_WARNING "Failed to allocate extended regions\n");
>> +        nr_ext_regions = ext_regions->nr_banks;
>> +    }
>> +
>> +    reg = xzalloc_array(__be32, (nr_ext_regions + 1) * (addrcells + sizecells));
>> +    if ( !reg )
>> +    {
>> +        xfree(ext_regions);
>> +        return -ENOMEM;
>> +    }
>> +
>>       /* reg 0 is grant table space */
>>       cells = &reg[0];
>>       dt_child_set_range(&cells, addrcells, sizecells,
>>                          kinfo->gnttab_start, kinfo->gnttab_size);
>> +    /* reg 1...N are extended regions */
>> +    for ( i = 0; i < nr_ext_regions; i++ )
>> +    {
>> +        u64 start = ext_regions->bank[i].start;
>> +        u64 size = ext_regions->bank[i].size;
>> +
>> +        dt_dprintk("Extended region %d: %#"PRIx64"->%#"PRIx64"\n",
>> +                   i, start, start + size);
>> +
>> +        dt_child_set_range(&cells, addrcells, sizecells, start, size);
>> +    }
>> +
>>       res = fdt_property(fdt, "reg", reg,
>> -                       dt_cells_to_size(addrcells + sizecells));
>> +                       dt_cells_to_size(addrcells + sizecells) *
>> +                       (nr_ext_regions + 1));
>> +    xfree(ext_regions);
>> +    xfree(reg);
>> +
>>       if ( res )
>>           return res;
>>   
>> -- 
>> 2.7.4
>>
Julien Grall Oct. 4, 2021, 6:41 a.m. UTC | #5
Hi Stefano,

On 02/10/2021 02:33, Stefano Stabellini wrote:
> On Thu, 30 Sep 2021, Oleksandr Tyshchenko wrote:
>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>
>> The extended region (safe range) is a region of guest physical
>> address space which is unused and could be safely used to create
>> grant/foreign mappings instead of wasting real RAM pages from
>> the domain memory for establishing these mappings.
>>
>> The extended regions are chosen at the domain creation time and
>> advertised to it via "reg" property under hypervisor node in
>> the guest device-tree. As region 0 is reserved for grant table
>> space (always present), the indexes for extended regions are 1...N.
>> If extended regions could not be allocated for some reason,
>> Xen doesn't fail and behaves as usual, so only inserts region 0.
>>
>> Please note the following limitations:
>> - The extended region feature is only supported for 64-bit domain
>>    currently.
>> - The ACPI case is not covered.
>>
>> ***
>>
>> As Dom0 is direct mapped domain on Arm (e.g. MFN == GFN)
>> the algorithm to choose extended regions for it is different
>> in comparison with the algorithm for non-direct mapped DomU.
>> What is more, that extended regions should be chosen differently
>> whether IOMMU is enabled or not.
>>
>> Provide RAM not assigned to Dom0 if IOMMU is disabled or memory
>> holes found in host device-tree if otherwise. Make sure that
>> extended regions are 2MB-aligned and located within maximum possible
>> addressable physical memory range. The minimum size of extended
>> region is 64MB. The maximum number of extended regions is 128,
>> which is an artificial limit to minimize code changes (we reuse
>> struct meminfo to describe extended regions, so there are an array
>> field for 128 elements).
>>
>> It worth mentioning that unallocated memory solution (when the IOMMU
>> is disabled) will work safely until Dom0 is able to allocate memory
>> outside of the original range.
>>
>> Also introduce command line option to be able to globally enable or
>> disable support for extended regions for Dom0 (enabled by default).
>>
>> Suggested-by: Julien Grall <jgrall@amazon.com>
>> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> 
> I thought about it and I decided to commit this patch because it doesn't
> actually need anything from the other two patches, and it is very useful
> on its own (both of them are for domU, while this one is for dom0).
> 
> In regards to Julien's suggestion: as explained in earlier emails I
> prefer this version but I don't have a strong opinion. If Julien still
> prefers the other approach we can still change it in time for 4.16
> (Oleksandr has already implemented both and I am happy to review.)

Let's keep the committed approach for 4.16. This is not something we 
tie into the ABI, so it can be modified later on if we find some issues 
(i.e. more and more ranges to exclude).

However, I would still like to see some changes on top of this patch for 
4.16 (will comment separately).

Cheers,
Julien Grall Oct. 4, 2021, 6:59 a.m. UTC | #6
Hi Oleksandr,

I saw Stefano committed this patch on Friday. However, I didn't have a 
chance to go through it a second time and would like to request some 
follow-up changes.

On 30/09/2021 00:52, Oleksandr Tyshchenko wrote:
> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> 
> The extended region (safe range) is a region of guest physical
> address space which is unused and could be safely used to create
> grant/foreign mappings instead of wasting real RAM pages from
> the domain memory for establishing these mappings.
> 
> The extended regions are chosen at the domain creation time and
> advertised to it via "reg" property under hypervisor node in
> the guest device-tree. As region 0 is reserved for grant table
> space (always present), the indexes for extended regions are 1...N.
> If extended regions could not be allocated for some reason,
> Xen doesn't fail and behaves as usual, so only inserts region 0.
> 
> Please note the following limitations:
> - The extended region feature is only supported for 64-bit domain
>    currently.
> - The ACPI case is not covered.
> 
> ***
> 
> As Dom0 is direct mapped domain on Arm (e.g. MFN == GFN)
> the algorithm to choose extended regions for it is different
> in comparison with the algorithm for non-direct mapped DomU.
> What is more, that extended regions should be chosen differently
> whether IOMMU is enabled or not.
> 
> Provide RAM not assigned to Dom0 if IOMMU is disabled or memory
> holes found in host device-tree if otherwise. Make sure that
> extended regions are 2MB-aligned and located within maximum possible
> addressable physical memory range. The minimum size of extended
> region is 64MB. 

You explained below why the limit of 128, but I don't see any explanation 
of why 2MB and 64MB.

IIRC, 2MB was to potentially allow superpage mappings. I am not entirely 
sure about 64MB.

Could you add an in-code comment explaining the two limits?

> The maximum number of extended regions is 128,
> which is an artificial limit to minimize code changes (we reuse
> struct meminfo to describe extended regions, so there are an array
> field for 128 elements).
> 
> It worth mentioning that unallocated memory solution (when the IOMMU
> is disabled) will work safely until Dom0 is able to allocate memory
> outside of the original range.
> 
> Also introduce command line option to be able to globally enable or
> disable support for extended regions for Dom0 (enabled by default).
> 
> Suggested-by: Julien Grall <jgrall@amazon.com>
> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> ---
> Please note, we need to decide which approach to use in find_unallocated_memory(),
> you can find details at:
> https://lore.kernel.org/xen-devel/28503e09-44c3-f623-bb8d-8778bb94225f@gmail.com/
> 
> Changes RFC -> V2:
>     - update patch description
>     - drop uneeded "extended-region" DT property
> 
> Changes V2 -> V3:
>     - update patch description
>     - add comment for "size" calculation in add_ext_regions()
>     - clarify "end" calculation in find_unallocated_memory() and
>       find_memory_holes()
>     - only pick up regions with size >= 64MB
>     - allocate reg dynamically instead of keeping on the stack in
>       make_hypervisor_node()
>     - do not show warning for 32-bit domain
>     - drop Linux specific limits EXT_REGION_*
>     - also cover "ranges" property in find_memory_holes()
>     - add command line arg to enable/disable extended region support
> 
> Changes V3 -> V4:
>    - update opt_ext_regions purpose and comment in code
>    - reorganize make_hypervisor_node() to move allocations after initial
>      checks, allocate only required amount of elements instead of maximum
>      possible
> ---
>   docs/misc/xen-command-line.pandoc |  11 ++
>   xen/arch/arm/domain_build.c       | 286 +++++++++++++++++++++++++++++++++++++-

The document in docs/misc/arm/device-tree/guest.txt needs to be updated 
to reflect the change in the binding.
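
Something along these lines would be enough (just a suggestion, exact 
wording up to you):

  Region 0 is reserved for the grant table space. Regions 1...N, if
  present, are the extended regions: unused guest physical address
  space which is safe to use for grant/foreign mappings.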

>   2 files changed, 294 insertions(+), 3 deletions(-)
> 
> diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
> index 177e656..5cae4ad 100644
> --- a/docs/misc/xen-command-line.pandoc
> +++ b/docs/misc/xen-command-line.pandoc
> @@ -1081,6 +1081,17 @@ hardware domain is architecture dependent.
>   Note that specifying zero as domU value means zero, while for dom0 it means
>   to use the default.
>   
> +### ext_regions (Arm)
> +> `= <boolean>`
> +
> +> Default : `true`
> +
> +Flag to enable or disable support for extended regions for Dom0.
> +
> +Extended regions are ranges of unused address space exposed to Dom0 as
> +"safe to use" for special memory mappings. Disable if your board device
> +tree is incomplete.
> +
>   ### flask
>   > `= permissive | enforcing | late | disabled`
>   
> diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
> index d233d63..c5afbe2 100644
> --- a/xen/arch/arm/domain_build.c
> +++ b/xen/arch/arm/domain_build.c
> @@ -34,6 +34,10 @@
>   static unsigned int __initdata opt_dom0_max_vcpus;
>   integer_param("dom0_max_vcpus", opt_dom0_max_vcpus);
>   
> +/* If true, the extended regions support is enabled for dom0 */
> +static bool __initdata opt_ext_regions = true;
> +boolean_param("ext_regions", opt_ext_regions);
> +
>   static u64 __initdata dom0_mem;
>   static bool __initdata dom0_mem_set;
>   
> @@ -886,6 +890,232 @@ static int __init make_memory_node(const struct domain *d,
>       return res;
>   }
>   
> +static int __init add_ext_regions(unsigned long s, unsigned long e, void *data)
> +{
> +    struct meminfo *ext_regions = data;
> +    paddr_t start, size;
> +
> +    if ( ext_regions->nr_banks >= ARRAY_SIZE(ext_regions->bank) )
> +        return 0;
> +
> +    /* Both start and size of the extended region should be 2MB aligned */
> +    start = (s + SZ_2M - 1) & ~(SZ_2M - 1);
> +    if ( start > e )
> +        return 0;
> +
> +    /*
> +     * e is actually "end-1" because it is called by rangeset functions
> +     * which are inclusive of the last address.
> +     */
> +    e += 1;
> +    size = (e - start) & ~(SZ_2M - 1);
> +    if ( size < MB(64) )
> +        return 0;
> +
> +    ext_regions->bank[ext_regions->nr_banks].start = start;
> +    ext_regions->bank[ext_regions->nr_banks].size = size;
> +    ext_regions->nr_banks++;
> +
> +    return 0;
> +}
> +
> +static int __init find_unallocated_memory(const struct kernel_info *kinfo,
> +                                          struct meminfo *ext_regions)

It would be good to have a comment on top of this function summarizing 
how this is meant to work.
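
Something along these lines perhaps (untested wording, up to you):

/*
 * Dom0 is direct-mapped and the IOMMU is disabled here, so the extended
 * regions are picked from the host RAM that was not assigned to dom0
 * (minus the reserved-memory regions and the grant table area).
 */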

> +{
> +    const struct meminfo *assign_mem = &kinfo->mem;
> +    struct rangeset *unalloc_mem;
> +    paddr_t start, end;
> +    unsigned int i;
> +    int res;
> +
> +    dt_dprintk("Find unallocated memory for extended regions\n");
> +
> +    unalloc_mem = rangeset_new(NULL, NULL, 0);
> +    if ( !unalloc_mem )
> +        return -ENOMEM;
> +
> +    /* Start with all available RAM */
> +    for ( i = 0; i < bootinfo.mem.nr_banks; i++ )
> +    {
> +        start = bootinfo.mem.bank[i].start;
> +        end = bootinfo.mem.bank[i].start + bootinfo.mem.bank[i].size;
> +        res = rangeset_add_range(unalloc_mem, start, end - 1);
> +        if ( res )
> +        {
> +            printk(XENLOG_ERR "Failed to add: %#"PRIx64"->%#"PRIx64"\n",

start and end are paddr_t. So you want to use PRIpaddr rather than PRIx64.
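I.e. something like (same for the other occurrences below):

  printk(XENLOG_ERR "Failed to add: %#"PRIpaddr"->%#"PRIpaddr"\n",
         start, end);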

> +                   start, end);
> +            goto out;
> +        }
> +    }
> +
> +    /* Remove RAM assigned to Dom0 */
> +    for ( i = 0; i < assign_mem->nr_banks; i++ )
> +    {
> +        start = assign_mem->bank[i].start;
> +        end = assign_mem->bank[i].start + assign_mem->bank[i].size;
> +        res = rangeset_remove_range(unalloc_mem, start, end - 1);
> +        if ( res )
> +        {
> +            printk(XENLOG_ERR "Failed to remove: %#"PRIx64"->%#"PRIx64"\n",

Ditto.

> +                   start, end);
> +            goto out;
> +        }
> +    }
> +
> +    /* Remove reserved-memory regions */
> +    for ( i = 0; i < bootinfo.reserved_mem.nr_banks; i++ )
> +    {
> +        start = bootinfo.reserved_mem.bank[i].start;
> +        end = bootinfo.reserved_mem.bank[i].start +
> +            bootinfo.reserved_mem.bank[i].size;
> +        res = rangeset_remove_range(unalloc_mem, start, end - 1);
> +        if ( res )
> +        {
> +            printk(XENLOG_ERR "Failed to remove: %#"PRIx64"->%#"PRIx64"\n",

Ditto.

> +                   start, end);
> +            goto out;
> +        }
> +    }
> +
> +    /* Remove grant table region */
> +    start = kinfo->gnttab_start;
> +    end = kinfo->gnttab_start + kinfo->gnttab_size;
> +    res = rangeset_remove_range(unalloc_mem, start, end - 1);
> +    if ( res )
> +    {
> +        printk(XENLOG_ERR "Failed to remove: %#"PRIx64"->%#"PRIx64"\n",

ditto.

> +               start, end);
> +        goto out;
> +    }
> +
> +    start = 0;
> +    end = (1ULL << p2m_ipa_bits) - 1;
> +    res = rangeset_report_ranges(unalloc_mem, start, end,
> +                                 add_ext_regions, ext_regions);
> +    if ( res )
> +        ext_regions->nr_banks = 0;
> +    else if ( !ext_regions->nr_banks )
> +        res = -ENOENT;
> +
> +out:
> +    rangeset_destroy(unalloc_mem);
> +
> +    return res;
> +}
> +
> +static int __init find_memory_holes(const struct kernel_info *kinfo,
> +                                    struct meminfo *ext_regions)

I think it would be good to have a comment on top of the function 
explaining how this is meant to work.
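
Something like (untested wording):

/*
 * When the IOMMU is enabled, dom0 is allowed to use the memory holes of
 * the host address space, i.e. anything not described by a "reg" or
 * "ranges" property in the host device-tree, as extended regions.
 */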

> +{
> +    struct dt_device_node *np;
> +    struct rangeset *mem_holes;
> +    paddr_t start, end;
> +    unsigned int i;
> +    int res;
> +
> +    dt_dprintk("Find memory holes for extended regions\n");
> +
> +    mem_holes = rangeset_new(NULL, NULL, 0);
> +    if ( !mem_holes )
> +        return -ENOMEM;
> +
> +    /* Start with maximum possible addressable physical memory range */
> +    start = 0;
> +    end = (1ULL << p2m_ipa_bits) - 1;
> +    res = rangeset_add_range(mem_holes, start, end);
> +    if ( res )
> +    {
> +        printk(XENLOG_ERR "Failed to add: %#"PRIx64"->%#"PRIx64"\n",

Please use PRIpaddr.

> +               start, end);
> +        goto out;
> +    }
> +
> +    /*
> +     * Remove regions described by "reg" and "ranges" properties where
> +     * the memory is addressable (MMIO, RAM, PCI BAR, etc).
> +     */
> +    dt_for_each_device_node( dt_host, np )
> +    {
> +        unsigned int naddr;
> +        u64 addr, size;
> +
> +        naddr = dt_number_of_address(np);
> +
> +        for ( i = 0; i < naddr; i++ )
> +        {
> +            res = dt_device_get_address(np, i, &addr, &size);
> +            if ( res )
> +            {
> +                printk(XENLOG_ERR "Unable to retrieve address %u for %s\n",
> +                       i, dt_node_full_name(np));
> +                goto out;
> +            }
> +
> +            start = addr & PAGE_MASK;
> +            end = PAGE_ALIGN(addr + size);
> +            res = rangeset_remove_range(mem_holes, start, end - 1);
> +            if ( res )
> +            {
> +                printk(XENLOG_ERR "Failed to remove: %#"PRIx64"->%#"PRIx64"\n",
> +                       start, end);
> +                goto out;
> +            }
> +        }
> +
> +        if ( dt_device_type_is_equal(np, "pci" ) )
> +        {

The code below looks like an open-coded version of dt_for_each_range(). 
Can you try to re-use it please? This will help to reduce the complexity 
of this function.
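IIRC, it takes a callback of the form

  int cb(const struct dt_device_node *dev, u64 addr, u64 len, void *data)

so the rangeset_remove_range() call can simply move into the callback.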

> +            unsigned int range_size, nr_ranges;
> +            int na, ns, pna;
> +            const __be32 *ranges;
> +            u32 len;
> +
> +            /*
> +             * Looking for non-empty ranges property which in this context
> +             * describes the PCI host bridge aperture.
> +             */
> +            ranges = dt_get_property(np, "ranges", &len);
> +            if ( !ranges || !len )
> +                continue;
> +
> +            pna = dt_n_addr_cells(np);
> +            na = dt_child_n_addr_cells(np);
> +            ns = dt_child_n_size_cells(np);
> +            range_size = pna + na + ns;
> +            nr_ranges = len / sizeof(__be32) / range_size;
> +
> +            for ( i = 0; i < nr_ranges; i++, ranges += range_size )
> +            {
> +                /* Skip the child address and get the parent (CPU) address */
> +                addr = dt_read_number(ranges + na, pna);
> +                size = dt_read_number(ranges + na + pna, ns);
> +
> +                start = addr & PAGE_MASK;
> +                end = PAGE_ALIGN(addr + size);
> +                res = rangeset_remove_range(mem_holes, start, end - 1);
> +                if ( res )
> +                {
> +                    printk(XENLOG_ERR "Failed to remove: %#"PRIx64"->%#"PRIx64"\n",

This should be PRIpaddr.

> +                           start, end);
> +                    goto out;
> +                }
> +            }
> +        }
> +    }
> +
> +    start = 0;
> +    end = (1ULL << p2m_ipa_bits) - 1;
> +    res = rangeset_report_ranges(mem_holes, start, end,
> +                                 add_ext_regions,  ext_regions);
> +    if ( res )
> +        ext_regions->nr_banks = 0;
> +    else if ( !ext_regions->nr_banks )
> +        res = -ENOENT;
> +
> +out:
> +    rangeset_destroy(mem_holes);
> +
> +    return res;
> +}
> +
>   static int __init make_hypervisor_node(struct domain *d,
>                                          const struct kernel_info *kinfo,
>                                          int addrcells, int sizecells)
> @@ -893,11 +1123,12 @@ static int __init make_hypervisor_node(struct domain *d,
>       const char compat[] =
>           "xen,xen-"__stringify(XEN_VERSION)"."__stringify(XEN_SUBVERSION)"\0"
>           "xen,xen";
> -    __be32 reg[4];
> +    __be32 *reg, *cells;
>       gic_interrupt_t intr;
> -    __be32 *cells;
>       int res;
>       void *fdt = kinfo->fdt;
> +    struct meminfo *ext_regions = NULL;
> +    unsigned int i, nr_ext_regions;
>   
>       dt_dprintk("Create hypervisor node\n");
>   
> @@ -919,12 +1150,61 @@ static int __init make_hypervisor_node(struct domain *d,
>       if ( res )
>           return res;
>   
> +    if ( !opt_ext_regions )
> +    {
> +        printk(XENLOG_DEBUG "The extended regions support is disabled\n");

The extended regions are going to be critical for performance in 
dom0. So I think this at least wants to be XENLOG_INFO.

> +        nr_ext_regions = 0;
> +    }
> +    else if ( is_32bit_domain(d) )
> +    {
> +        printk(XENLOG_DEBUG "The extended regions are only supported for 64-bit guest currently\n");

This would want to use XENLOG_WARN.

> +        nr_ext_regions = 0;
> +    }
> +    else
> +    {
> +        ext_regions = xzalloc(struct meminfo);
> +        if ( !ext_regions )
> +            return -ENOMEM;
> +
> +        if ( !is_iommu_enabled(d) )
> +            res = find_unallocated_memory(kinfo, ext_regions);
> +        else
> +            res = find_memory_holes(kinfo, ext_regions);
> +
> +        if ( res )
> +            printk(XENLOG_WARNING "Failed to allocate extended regions\n");
> +        nr_ext_regions = ext_regions->nr_banks;
> +    }
> +
> +    reg = xzalloc_array(__be32, (nr_ext_regions + 1) * (addrcells + sizecells));
> +    if ( !reg )
> +    {
> +        xfree(ext_regions);
> +        return -ENOMEM;
> +    }
> +
>       /* reg 0 is grant table space */
>       cells = &reg[0];
>       dt_child_set_range(&cells, addrcells, sizecells,
>                          kinfo->gnttab_start, kinfo->gnttab_size);
> +    /* reg 1...N are extended regions */
> +    for ( i = 0; i < nr_ext_regions; i++ )
> +    {
> +        u64 start = ext_regions->bank[i].start;
> +        u64 size = ext_regions->bank[i].size;
> +
> +        dt_dprintk("Extended region %d: %#"PRIx64"->%#"PRIx64"\n",
> +                   i, start, start + size);

I would prefer if this were a printk() so we get the extended regions 
listed from the beginning.

> +
> +        dt_child_set_range(&cells, addrcells, sizecells, start, size);
> +    }
> +
>       res = fdt_property(fdt, "reg", reg,
> -                       dt_cells_to_size(addrcells + sizecells));
> +                       dt_cells_to_size(addrcells + sizecells) *
> +                       (nr_ext_regions + 1));
> +    xfree(ext_regions);
> +    xfree(reg);
> +
>       if ( res )
>           return res;
>   
>
Oleksandr Tyshchenko Oct. 4, 2021, 12:08 p.m. UTC | #7
On 04.10.21 09:59, Julien Grall wrote:
> Hi Oleksandr,

Hi Julien


>
> I saw Stefano committed this patch on Friday. However, I didn't have a 
> chance go to through a second time and would like to request some 
> follow-up changes.

ok, do you prefer the follow-up patch to be pushed separately or together 
with the rest of the patches of this series (#1 and #3)? If the former, 
I will try to push it today to close this question.


>
>
> On 30/09/2021 00:52, Oleksandr Tyshchenko wrote:
>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>
>> The extended region (safe range) is a region of guest physical
>> address space which is unused and could be safely used to create
>> grant/foreign mappings instead of wasting real RAM pages from
>> the domain memory for establishing these mappings.
>>
>> The extended regions are chosen at the domain creation time and
>> advertised to it via "reg" property under hypervisor node in
>> the guest device-tree. As region 0 is reserved for grant table
>> space (always present), the indexes for extended regions are 1...N.
>> If extended regions could not be allocated for some reason,
>> Xen doesn't fail and behaves as usual, so only inserts region 0.
>>
>> Please note the following limitations:
>> - The extended region feature is only supported for 64-bit domain
>>    currently.
>> - The ACPI case is not covered.
>>
>> ***
>>
>> As Dom0 is direct mapped domain on Arm (e.g. MFN == GFN)
>> the algorithm to choose extended regions for it is different
>> in comparison with the algorithm for non-direct mapped DomU.
>> What is more, that extended regions should be chosen differently
>> whether IOMMU is enabled or not.
>>
>> Provide RAM not assigned to Dom0 if IOMMU is disabled or memory
>> holes found in host device-tree if otherwise. Make sure that
>> extended regions are 2MB-aligned and located within maximum possible
>> addressable physical memory range. The minimum size of extended
>> region is 64MB. 
>
> You explained below why the 128 limits, but I don't see any 
> explanation on why 2MB and 64MB.
>
> IIRC, 2MB was to potentally allow superpage mapping. I am not entirely 
> sure for 64MB.
>
> Could you add an in-code comment explaining the two limits?

Yes. There was a discussion at [1]. 64MB was chosen as a reasonable 
compromise between the initial 2MB (we might end up with a lot of small 
ranges which are not very useful but increase bookkeeping) and the 
suggested 1GB (we might not be able to find suitable regions at all).
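
For the record, I plan to add something along these lines on top of 
add_ext_regions() (exact wording still to be polished):

/*
 * Both the start and size of the extended region are 2MB aligned to
 * give the guest a chance to map it with superpages. Regions smaller
 * than 64MB are not reported: a lot of small ranges would only increase
 * bookkeeping without being really useful.
 */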


>
>
>> The maximum number of extended regions is 128,
>> which is an artificial limit to minimize code changes (we reuse
>> struct meminfo to describe extended regions, so there are an array
>> field for 128 elements).
>>
>> It worth mentioning that unallocated memory solution (when the IOMMU
>> is disabled) will work safely until Dom0 is able to allocate memory
>> outside of the original range.
>>
>> Also introduce command line option to be able to globally enable or
>> disable support for extended regions for Dom0 (enabled by default).
>>
>> Suggested-by: Julien Grall <jgrall@amazon.com>
>> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>> ---
>> Please note, we need to decide which approach to use in 
>> find_unallocated_memory(),
>> you can find details at:
>> https://lore.kernel.org/xen-devel/28503e09-44c3-f623-bb8d-8778bb94225f@gmail.com/ 
>>
>>
>> Changes RFC -> V2:
>>     - update patch description
>>     - drop uneeded "extended-region" DT property
>>
>> Changes V2 -> V3:
>>     - update patch description
>>     - add comment for "size" calculation in add_ext_regions()
>>     - clarify "end" calculation in find_unallocated_memory() and
>>       find_memory_holes()
>>     - only pick up regions with size >= 64MB
>>     - allocate reg dynamically instead of keeping on the stack in
>>       make_hypervisor_node()
>>     - do not show warning for 32-bit domain
>>     - drop Linux specific limits EXT_REGION_*
>>     - also cover "ranges" property in find_memory_holes()
>>     - add command line arg to enable/disable extended region support
>>
>> Changes V3 -> V4:
>>    - update opt_ext_regions purpose and comment in code
>>    - reorganize make_hypervisor_node() to move allocations after initial
>>      checks, allocate only required amount of elements instead of 
>> maximum
>>      possible
>> ---
>>   docs/misc/xen-command-line.pandoc |  11 ++
>>   xen/arch/arm/domain_build.c       | 286 
>> +++++++++++++++++++++++++++++++++++++-
>
> The document in docs/misc/arm/device-tree/guest.txt needs to be 
> updated to reflect the change in the binding.

Good point. Will update.


>
>>   2 files changed, 294 insertions(+), 3 deletions(-)
>>
>> diff --git a/docs/misc/xen-command-line.pandoc 
>> b/docs/misc/xen-command-line.pandoc
>> index 177e656..5cae4ad 100644
>> --- a/docs/misc/xen-command-line.pandoc
>> +++ b/docs/misc/xen-command-line.pandoc
>> @@ -1081,6 +1081,17 @@ hardware domain is architecture dependent.
>>   Note that specifying zero as domU value means zero, while for dom0 
>> it means
>>   to use the default.
>>   +### ext_regions (Arm)
>> +> `= <boolean>`
>> +
>> +> Default : `true`
>> +
>> +Flag to enable or disable support for extended regions for Dom0.
>> +
>> +Extended regions are ranges of unused address space exposed to Dom0 as
>> +"safe to use" for special memory mappings. Disable if your board device
>> +tree is incomplete.
>> +
>>   ### flask
>>   > `= permissive | enforcing | late | disabled`
>>   diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
>> index d233d63..c5afbe2 100644
>> --- a/xen/arch/arm/domain_build.c
>> +++ b/xen/arch/arm/domain_build.c
>> @@ -34,6 +34,10 @@
>>   static unsigned int __initdata opt_dom0_max_vcpus;
>>   integer_param("dom0_max_vcpus", opt_dom0_max_vcpus);
>>   +/* If true, the extended regions support is enabled for dom0 */
>> +static bool __initdata opt_ext_regions = true;
>> +boolean_param("ext_regions", opt_ext_regions);
>> +
>>   static u64 __initdata dom0_mem;
>>   static bool __initdata dom0_mem_set;
>>   @@ -886,6 +890,232 @@ static int __init make_memory_node(const 
>> struct domain *d,
>>       return res;
>>   }
>>   +static int __init add_ext_regions(unsigned long s, unsigned long 
>> e, void *data)
>> +{
>> +    struct meminfo *ext_regions = data;
>> +    paddr_t start, size;
>> +
>> +    if ( ext_regions->nr_banks >= ARRAY_SIZE(ext_regions->bank) )
>> +        return 0;
>> +
>> +    /* Both start and size of the extended region should be 2MB 
>> aligned */
>> +    start = (s + SZ_2M - 1) & ~(SZ_2M - 1);
>> +    if ( start > e )
>> +        return 0;
>> +
>> +    /*
>> +     * e is actually "end-1" because it is called by rangeset functions
>> +     * which are inclusive of the last address.
>> +     */
>> +    e += 1;
>> +    size = (e - start) & ~(SZ_2M - 1);
>> +    if ( size < MB(64) )
>> +        return 0;
>> +
>> +    ext_regions->bank[ext_regions->nr_banks].start = start;
>> +    ext_regions->bank[ext_regions->nr_banks].size = size;
>> +    ext_regions->nr_banks++;
>> +
>> +    return 0;
>> +}
>> +
>> +static int __init find_unallocated_memory(const struct kernel_info 
>> *kinfo,
>> +                                          struct meminfo *ext_regions)
>
> It would be good to have a comment on top of this function summarizing 
> how this is meant to work.

Will add.


>
>
>> +{
>> +    const struct meminfo *assign_mem = &kinfo->mem;
>> +    struct rangeset *unalloc_mem;
>> +    paddr_t start, end;
>> +    unsigned int i;
>> +    int res;
>> +
>> +    dt_dprintk("Find unallocated memory for extended regions\n");
>> +
>> +    unalloc_mem = rangeset_new(NULL, NULL, 0);
>> +    if ( !unalloc_mem )
>> +        return -ENOMEM;
>> +
>> +    /* Start with all available RAM */
>> +    for ( i = 0; i < bootinfo.mem.nr_banks; i++ )
>> +    {
>> +        start = bootinfo.mem.bank[i].start;
>> +        end = bootinfo.mem.bank[i].start + bootinfo.mem.bank[i].size;
>> +        res = rangeset_add_range(unalloc_mem, start, end - 1);
>> +        if ( res )
>> +        {
>> +            printk(XENLOG_ERR "Failed to add: 
>> %#"PRIx64"->%#"PRIx64"\n",
>
> start and end are paddr_t. So you want to use PRIpaddr rather than PRIx64.

ok for this and the similar comments below.


>
>
>> +                   start, end);
>> +            goto out;
>> +        }
>> +    }
>> +
>> +    /* Remove RAM assigned to Dom0 */
>> +    for ( i = 0; i < assign_mem->nr_banks; i++ )
>> +    {
>> +        start = assign_mem->bank[i].start;
>> +        end = assign_mem->bank[i].start + assign_mem->bank[i].size;
>> +        res = rangeset_remove_range(unalloc_mem, start, end - 1);
>> +        if ( res )
>> +        {
>> +            printk(XENLOG_ERR "Failed to remove: 
>> %#"PRIx64"->%#"PRIx64"\n",
>
> Ditto.
>
>> +                   start, end);
>> +            goto out;
>> +        }
>> +    }
>> +
>> +    /* Remove reserved-memory regions */
>> +    for ( i = 0; i < bootinfo.reserved_mem.nr_banks; i++ )
>> +    {
>> +        start = bootinfo.reserved_mem.bank[i].start;
>> +        end = bootinfo.reserved_mem.bank[i].start +
>> +            bootinfo.reserved_mem.bank[i].size;
>> +        res = rangeset_remove_range(unalloc_mem, start, end - 1);
>> +        if ( res )
>> +        {
>> +            printk(XENLOG_ERR "Failed to remove: 
>> %#"PRIx64"->%#"PRIx64"\n",
>
> Ditto.
>
>> +                   start, end);
>> +            goto out;
>> +        }
>> +    }
>> +
>> +    /* Remove grant table region */
>> +    start = kinfo->gnttab_start;
>> +    end = kinfo->gnttab_start + kinfo->gnttab_size;
>> +    res = rangeset_remove_range(unalloc_mem, start, end - 1);
>> +    if ( res )
>> +    {
>> +        printk(XENLOG_ERR "Failed to remove: %#"PRIx64"->%#"PRIx64"\n",
>
> ditto.
>
>> +               start, end);
>> +        goto out;
>> +    }
>> +
>> +    start = 0;
>> +    end = (1ULL << p2m_ipa_bits) - 1;
>> +    res = rangeset_report_ranges(unalloc_mem, start, end,
>> +                                 add_ext_regions, ext_regions);
>> +    if ( res )
>> +        ext_regions->nr_banks = 0;
>> +    else if ( !ext_regions->nr_banks )
>> +        res = -ENOENT;
>> +
>> +out:
>> +    rangeset_destroy(unalloc_mem);
>> +
>> +    return res;
>> +}
>> +
>> +static int __init find_memory_holes(const struct kernel_info *kinfo,
>> +                                    struct meminfo *ext_regions)
>
> I think it would be good to have a comment on top of the function how 
> this is meant to work.

Will add.


>
>
>> +{
>> +    struct dt_device_node *np;
>> +    struct rangeset *mem_holes;
>> +    paddr_t start, end;
>> +    unsigned int i;
>> +    int res;
>> +
>> +    dt_dprintk("Find memory holes for extended regions\n");
>> +
>> +    mem_holes = rangeset_new(NULL, NULL, 0);
>> +    if ( !mem_holes )
>> +        return -ENOMEM;
>> +
>> +    /* Start with maximum possible addressable physical memory range */
>> +    start = 0;
>> +    end = (1ULL << p2m_ipa_bits) - 1;
>> +    res = rangeset_add_range(mem_holes, start, end);
>> +    if ( res )
>> +    {
>> +        printk(XENLOG_ERR "Failed to add: %#"PRIx64"->%#"PRIx64"\n",
>
> Please use PRIpaddr.
>
>> +               start, end);
>> +        goto out;
>> +    }
>> +
>> +    /*
>> +     * Remove regions described by "reg" and "ranges" properties where
>> +     * the memory is addressable (MMIO, RAM, PCI BAR, etc).
>> +     */
>> +    dt_for_each_device_node( dt_host, np )
>> +    {
>> +        unsigned int naddr;
>> +        u64 addr, size;
>> +
>> +        naddr = dt_number_of_address(np);
>> +
>> +        for ( i = 0; i < naddr; i++ )
>> +        {
>> +            res = dt_device_get_address(np, i, &addr, &size);
>> +            if ( res )
>> +            {
>> +                printk(XENLOG_ERR "Unable to retrieve address %u for 
>> %s\n",
>> +                       i, dt_node_full_name(np));
>> +                goto out;
>> +            }
>> +
>> +            start = addr & PAGE_MASK;
>> +            end = PAGE_ALIGN(addr + size);
>> +            res = rangeset_remove_range(mem_holes, start, end - 1);
>> +            if ( res )
>> +            {
>> +                printk(XENLOG_ERR "Failed to remove: 
>> %#"PRIx64"->%#"PRIx64"\n",
>> +                       start, end);
>> +                goto out;
>> +            }
>> +        }
>> +
>> +        if ( dt_device_type_is_equal(np, "pci" ) )
>> +        {
>
> The code below looks like an open-coding version of 
> dt_for_each_range(). Can you try to re-use it please? This will help 
> to reduce the complexity of this function.

You are right, it makes sense, will definitely reuse it. If I had been 
aware of that function before, I would have saved the time I spent 
investigating how to parse that)


diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
index ef2a177..8820b9c 100644
--- a/xen/arch/arm/domain_build.c
+++ b/xen/arch/arm/domain_build.c
@@ -1003,6 +1003,26 @@ out:
      return res;
  }

+static int __init handle_pci_range(const struct dt_device_node *dev,
+                                   u64 addr, u64 len, void *data)
+{
+    struct rangeset *mem_holes = data;
+    paddr_t start, end;
+    int res;
+
+    start = addr & PAGE_MASK;
+    end = PAGE_ALIGN(addr + len);
+    res = rangeset_remove_range(mem_holes, start, end - 1);
+    if ( res )
+    {
+        printk(XENLOG_ERR "Failed to remove: %#"PRIx64"->%#"PRIx64"\n",
+               start, end);
+        return res;
+    }
+
+    return 0;
+}
+
  static int __init find_memory_holes(const struct kernel_info *kinfo,
                                      struct meminfo *ext_regions)
  {
@@ -1061,43 +1081,19 @@ static int __init find_memory_holes(const struct kernel_info *kinfo,
              }
          }

-        if ( dt_device_type_is_equal(np, "pci" ) )
+        if ( dt_device_type_is_equal(np, "pci") )
          {
-            unsigned int range_size, nr_ranges;
-            int na, ns, pna;
-            const __be32 *ranges;
-            u32 len;
-
              /*
-             * Looking for non-empty ranges property which in this context
-             * describes the PCI host bridge aperture.
+             * The ranges property in this context describes the PCI host
+             * bridge aperture. It shall be absent if no addresses are mapped
+             * through the bridge.
               */
-            ranges = dt_get_property(np, "ranges", &len);
-            if ( !ranges || !len )
+            if ( !dt_get_property(np, "ranges", NULL) )
                  continue;

-            pna = dt_n_addr_cells(np);
-            na = dt_child_n_addr_cells(np);
-            ns = dt_child_n_size_cells(np);
-            range_size = pna + na + ns;
-            nr_ranges = len / sizeof(__be32) / range_size;
-
-            for ( i = 0; i < nr_ranges; i++, ranges += range_size )
-            {
-                /* Skip the child address and get the parent (CPU) address */
-                addr = dt_read_number(ranges + na, pna);
-                size = dt_read_number(ranges + na + pna, ns);
-
-                start = addr & PAGE_MASK;
-                end = PAGE_ALIGN(addr + size);
-                res = rangeset_remove_range(mem_holes, start, end - 1);
-                if ( res )
-                {
-                    printk(XENLOG_ERR "Failed to remove: %#"PRIx64"->%#"PRIx64"\n",
-                           start, end);
-                    goto out;
-                }
-            }
+            res = dt_for_each_range(np, &handle_pci_range, mem_holes);
+            if ( res )
+                goto out;
          }
      }



>
>
>> +            unsigned int range_size, nr_ranges;
>> +            int na, ns, pna;
>> +            const __be32 *ranges;
>> +            u32 len;
>> +
>> +            /*
>> +             * Looking for non-empty ranges property which in this 
>> context
>> +             * describes the PCI host bridge aperture.
>> +             */
>> +            ranges = dt_get_property(np, "ranges", &len);
>> +            if ( !ranges || !len )
>> +                continue;
>> +
>> +            pna = dt_n_addr_cells(np);
>> +            na = dt_child_n_addr_cells(np);
>> +            ns = dt_child_n_size_cells(np);
>> +            range_size = pna + na + ns;
>> +            nr_ranges = len / sizeof(__be32) / range_size;
>> +
>> +            for ( i = 0; i < nr_ranges; i++, ranges += range_size )
>> +            {
>> +                /* Skip the child address and get the parent (CPU) 
>> address */
>> +                addr = dt_read_number(ranges + na, pna);
>> +                size = dt_read_number(ranges + na + pna, ns);
>> +
>> +                start = addr & PAGE_MASK;
>> +                end = PAGE_ALIGN(addr + size);
>> +                res = rangeset_remove_range(mem_holes, start, end - 1);
>> +                if ( res )
>> +                {
>> +                    printk(XENLOG_ERR "Failed to remove: 
>> %#"PRIx64"->%#"PRIx64"\n",
>
> This should be PRIpaddr.
>
>> +                           start, end);
>> +                    goto out;
>> +                }
>> +            }
>> +        }
>> +    }
>> +
>> +    start = 0;
>> +    end = (1ULL << p2m_ipa_bits) - 1;
>> +    res = rangeset_report_ranges(mem_holes, start, end,
>> +                                 add_ext_regions, ext_regions);
>> +    if ( res )
>> +        ext_regions->nr_banks = 0;
>> +    else if ( !ext_regions->nr_banks )
>> +        res = -ENOENT;
>> +
>> +out:
>> +    rangeset_destroy(mem_holes);
>> +
>> +    return res;
>> +}
>> +
>>   static int __init make_hypervisor_node(struct domain *d,
>>                                          const struct kernel_info 
>> *kinfo,
>>                                          int addrcells, int sizecells)
>> @@ -893,11 +1123,12 @@ static int __init make_hypervisor_node(struct 
>> domain *d,
>>       const char compat[] =
>> "xen,xen-"__stringify(XEN_VERSION)"."__stringify(XEN_SUBVERSION)"\0"
>>           "xen,xen";
>> -    __be32 reg[4];
>> +    __be32 *reg, *cells;
>>       gic_interrupt_t intr;
>> -    __be32 *cells;
>>       int res;
>>       void *fdt = kinfo->fdt;
>> +    struct meminfo *ext_regions = NULL;
>> +    unsigned int i, nr_ext_regions;
>>         dt_dprintk("Create hypervisor node\n");
>>   @@ -919,12 +1150,61 @@ static int __init 
>> make_hypervisor_node(struct domain *d,
>>       if ( res )
>>           return res;
>>   +    if ( !opt_ext_regions )
>> +    {
>> +        printk(XENLOG_DEBUG "The extended regions support is 
>> disabled\n");
>
> The extended regions is going to be critical for the performance in 
> dom0. So I think this at least want to be a XENLOG_INFO.

ok


>
>
>> +        nr_ext_regions = 0;
>> +    }
>> +    else if ( is_32bit_domain(d) )
>> +    {
>> +        printk(XENLOG_DEBUG "The extended regions are only supported 
>> for 64-bit guest currently\n");
>
> This would want to be use XENLOG_WARN.

ok, it seems this was in my initial version.


>
>
>> +        nr_ext_regions = 0;
>> +    }
>> +    else
>> +    {
>> +        ext_regions = xzalloc(struct meminfo);
>> +        if ( !ext_regions )
>> +            return -ENOMEM;
>> +
>> +        if ( !is_iommu_enabled(d) )
>> +            res = find_unallocated_memory(kinfo, ext_regions);
>> +        else
>> +            res = find_memory_holes(kinfo, ext_regions);
>> +
>> +        if ( res )
>> +            printk(XENLOG_WARNING "Failed to allocate extended 
>> regions\n");
>> +        nr_ext_regions = ext_regions->nr_banks;
>> +    }
>> +
>> +    reg = xzalloc_array(__be32, (nr_ext_regions + 1) * (addrcells + 
>> sizecells));
>> +    if ( !reg )
>> +    {
>> +        xfree(ext_regions);
>> +        return -ENOMEM;
>> +    }
>> +
>>       /* reg 0 is grant table space */
>>       cells = &reg[0];
>>       dt_child_set_range(&cells, addrcells, sizecells,
>>                          kinfo->gnttab_start, kinfo->gnttab_size);
>> +    /* reg 1...N are extended regions */
>> +    for ( i = 0; i < nr_ext_regions; i++ )
>> +    {
>> +        u64 start = ext_regions->bank[i].start;
>> +        u64 size = ext_regions->bank[i].size;
>> +
>> +        dt_dprintk("Extended region %d: %#"PRIx64"->%#"PRIx64"\n",
>> +                   i, start, start + size);
>
> I would prefer if this is a printk() so we get the extended region 
> listed from the beginning.

ok


>
>
>> +
>> +        dt_child_set_range(&cells, addrcells, sizecells, start, size);
>> +    }
>> +
>>       res = fdt_property(fdt, "reg", reg,
>> -                       dt_cells_to_size(addrcells + sizecells));
>> +                       dt_cells_to_size(addrcells + sizecells) *
>> +                       (nr_ext_regions + 1));
>> +    xfree(ext_regions);
>> +    xfree(reg);
>> +
>>       if ( res )
>>           return res;
>>
>
[1] https://lore.kernel.org/xen-devel/b349e43a-91d9-16ba-57c6-34e790b8b31c@gmail.com/


Thank you.
Julien Grall Oct. 6, 2021, 6:11 p.m. UTC | #8
Hi Oleksandr,

On 04/10/2021 14:08, Oleksandr wrote:
> 
> On 04.10.21 09:59, Julien Grall wrote:
>> Hi Oleksandr,
> 
> Hi Julien

Hi Oleksandr,

> 
> 
>>
>> I saw Stefano committed this patch on Friday. However, I didn't have a 
>> chance go to through a second time and would like to request some 
>> follow-up changes.
> 
> ok, do you prefer the follow-up patch to be pushed separately or within 
> the rest patches of this series (#1 and #3)?  If the former, I will try 
> to push it today to close this question.

I don't mind. My main ask is that they are addressed for 4.16.

> 
> 
>>
>>
>> On 30/09/2021 00:52, Oleksandr Tyshchenko wrote:
>>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>>
>>> The extended region (safe range) is a region of guest physical
>>> address space which is unused and could be safely used to create
>>> grant/foreign mappings instead of wasting real RAM pages from
>>> the domain memory for establishing these mappings.
>>>
>>> The extended regions are chosen at the domain creation time and
>>> advertised to it via "reg" property under hypervisor node in
>>> the guest device-tree. As region 0 is reserved for grant table
>>> space (always present), the indexes for extended regions are 1...N.
>>> If extended regions could not be allocated for some reason,
>>> Xen doesn't fail and behaves as usual, so only inserts region 0.
>>>
>>> Please note the following limitations:
>>> - The extended region feature is only supported for 64-bit domain
>>>    currently.
>>> - The ACPI case is not covered.
>>>
>>> ***
>>>
>>> As Dom0 is direct mapped domain on Arm (e.g. MFN == GFN)
>>> the algorithm to choose extended regions for it is different
>>> in comparison with the algorithm for non-direct mapped DomU.
>>> What is more, that extended regions should be chosen differently
>>> whether IOMMU is enabled or not.
>>>
>>> Provide RAM not assigned to Dom0 if IOMMU is disabled or memory
>>> holes found in host device-tree if otherwise. Make sure that
>>> extended regions are 2MB-aligned and located within maximum possible
>>> addressable physical memory range. The minimum size of extended
>>> region is 64MB. 
>>
>> You explained below why the 128 limits, but I don't see any 
>> explanation on why 2MB and 64MB.
>>
>> IIRC, 2MB was to potentally allow superpage mapping. I am not entirely 
>> sure for 64MB.
>>
>> Could you add an in-code comment explaining the two limits?
> 
> Yes. There was a discussion at [1]. 64MB was chosen as a reasonable 
> value to deal with between initial 2MB (we might end up having a lot of 
> small ranges which are not quite useful but increase bookkeeping) and 
> suggested 1GB (we might not be able find a suitable regions at all).

Ok. Please document it in the code. Note that I don't think this choice 
should be advertised to the OS. This would give us some flexibility to 
change the size in the future (e.g. if we have a platform where a chunk 
of less than 64MB is beneficial).

>> The code below looks like an open-coding version of 
>> dt_for_each_range(). Can you try to re-use it please? This will help 
>> to reduce the complexity of this function.
> 
> You are right, it makes sense, will definitely reuse. If I was aware of 
> that function before I would safe some time I spent on the investigation 
> how to parse that)

I have only skimmed through the diff below. This looks fine to me. 
Please submit a formal patch.

Cheers,
Oleksandr Tyshchenko Oct. 6, 2021, 6:15 p.m. UTC | #9
On Wed, Oct 6, 2021 at 9:11 PM Julien Grall <julien@xen.org> wrote:

> Hi Oleksandr,
>

Hi Julien

[Sorry for the possible format issues]



>
> On 04/10/2021 14:08, Oleksandr wrote:
> >
> > On 04.10.21 09:59, Julien Grall wrote:
> >> Hi Oleksandr,
> >
> > Hi Julien
>
> Hi Oleksandr,
>
> >
> >
> >>
> >> I saw Stefano committed this patch on Friday. However, I didn't have a
> >> chance go to through a second time and would like to request some
> >> follow-up changes.
> >
> > ok, do you prefer the follow-up patch to be pushed separately or within
> > the rest patches of this series (#1 and #3)?  If the former, I will try
> > to push it today to close this question.
>
> I don't mind. My main ask is they are addressed for 4.16.
>
> >
> >
> >>
> >>
> >> On 30/09/2021 00:52, Oleksandr Tyshchenko wrote:
> >>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> >>>
> >>> The extended region (safe range) is a region of guest physical
> >>> address space which is unused and could be safely used to create
> >>> grant/foreign mappings instead of wasting real RAM pages from
> >>> the domain memory for establishing these mappings.
> >>>
> >>> The extended regions are chosen at the domain creation time and
> >>> advertised to it via "reg" property under hypervisor node in
> >>> the guest device-tree. As region 0 is reserved for grant table
> >>> space (always present), the indexes for extended regions are 1...N.
> >>> If extended regions could not be allocated for some reason,
> >>> Xen doesn't fail and behaves as usual, so only inserts region 0.
> >>>
> >>> Please note the following limitations:
> >>> - The extended region feature is only supported for 64-bit domain
> >>>    currently.
> >>> - The ACPI case is not covered.
> >>>
> >>> ***
> >>>
> >>> As Dom0 is direct mapped domain on Arm (e.g. MFN == GFN)
> >>> the algorithm to choose extended regions for it is different
> >>> in comparison with the algorithm for non-direct mapped DomU.
> >>> What is more, that extended regions should be chosen differently
> >>> whether IOMMU is enabled or not.
> >>>
> >>> Provide RAM not assigned to Dom0 if IOMMU is disabled or memory
> >>> holes found in host device-tree if otherwise. Make sure that
> >>> extended regions are 2MB-aligned and located within maximum possible
> >>> addressable physical memory range. The minimum size of extended
> >>> region is 64MB.
> >>
> >> You explained below why the 128 limits, but I don't see any
> >> explanation on why 2MB and 64MB.
> >>
> >> IIRC, 2MB was to potentally allow superpage mapping. I am not entirely
> >> sure for 64MB.
> >>
> >> Could you add an in-code comment explaining the two limits?
> >
> > Yes. There was a discussion at [1]. 64MB was chosen as a reasonable
> > value to deal with between initial 2MB (we might end up having a lot of
> > small ranges which are not quite useful but increase bookkeeping) and
> > suggested 1GB (we might not be able find a suitable regions at all).
>
> Ok. Please document in the code. Note that I don't think this choice
> should be advertised to the OS. This would give us some flexibility to
> change the size in the future (e.g. if we have platform where chunk of
> less than 64MB is beneficial).
>
> >> The code below looks like an open-coding version of
> >> dt_for_each_range(). Can you try to re-use it please? This will help
> >> to reduce the complexity of this function.
> >
> > You are right, it makes sense, will definitely reuse. If I was aware of
> > that function before I would safe some time I spent on the investigation
> > how to parse that)
>
> I have only skimmed through the diff below. This looks fine to me.
> Please submit a formal patch.
>

Already submitted, please take a look at [1].

 [1] https://lore.kernel.org/xen-devel/1633519346-3686-4-git-send-email-olekstysh@gmail.com/

Julien Grall Oct. 6, 2021, 6:35 p.m. UTC | #10
Hi Oleksandr,

On 06/10/2021 20:15, Oleksandr Tyshchenko wrote:
> 
> 
> On Wed, Oct 6, 2021 at 9:11 PM Julien Grall <julien@xen.org> wrote:
> 
>     Hi Oleksandr,
> 
> 
> Hi Julien
> 
> [Sorry for the possible format issues]
> 
> 
>     On 04/10/2021 14:08, Oleksandr wrote:
>      >
>      > On 04.10.21 09:59, Julien Grall wrote:
>      >> Hi Oleksandr,
>      >
>      > Hi Julien
> 
>     Hi Oleksandr,
> 
>      >
>      >
>      >>
>      >> I saw Stefano committed this patch on Friday. However, I didn't
>      >> have a chance to go through it a second time and would like to
>      >> request some follow-up changes.
>      >
>      > OK, do you prefer the follow-up patch to be pushed separately or
>      > together with the rest of the patches of this series (#1 and #3)?
>      > If the former, I will try to push it today to close this question.
> 
>     I don't mind. My main ask is that they are addressed for 4.16.
> 
>      >
>      >
>      >>
>      >>
>      >> On 30/09/2021 00:52, Oleksandr Tyshchenko wrote:
>      >>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>      >>>
>      >>> The extended region (safe range) is a region of guest physical
>      >>> address space which is unused and could be safely used to create
>      >>> grant/foreign mappings instead of wasting real RAM pages from
>      >>> the domain memory for establishing these mappings.
>      >>>
>      >>> The extended regions are chosen at the domain creation time and
>      >>> advertised to it via "reg" property under hypervisor node in
>      >>> the guest device-tree. As region 0 is reserved for grant table
>      >>> space (always present), the indexes for extended regions are 1...N.
>      >>> If extended regions could not be allocated for some reason,
>      >>> Xen doesn't fail and behaves as usual, so only inserts region 0.
>      >>>
>      >>> Please note the following limitations:
>      >>> - The extended region feature is only supported for 64-bit domain
>      >>>    currently.
>      >>> - The ACPI case is not covered.
>      >>>
>      >>> ***
>      >>>
>      >>> As Dom0 is direct mapped domain on Arm (e.g. MFN == GFN)
>      >>> the algorithm to choose extended regions for it is different
>      >>> in comparison with the algorithm for non-direct mapped DomU.
>      >>> What is more, that extended regions should be chosen differently
>      >>> whether IOMMU is enabled or not.
>      >>>
>      >>> Provide RAM not assigned to Dom0 if IOMMU is disabled or memory
>      >>> holes found in host device-tree if otherwise. Make sure that
>      >>> extended regions are 2MB-aligned and located within maximum possible
>      >>> addressable physical memory range. The minimum size of extended
>      >>> region is 64MB.
>      >>
>      >> You explained below why the 128 limit, but I don't see any
>      >> explanation on why 2MB and 64MB.
>      >>
>      >> IIRC, 2MB was to potentially allow superpage mapping. I am not
>      >> entirely sure about 64MB.
>      >>
>      >> Could you add an in-code comment explaining the two limits?
>      >
>      > Yes. There was a discussion at [1]. 64MB was chosen as a reasonable
>      > compromise between the initial 2MB (we might end up with a lot of
>      > small ranges which are not very useful but increase bookkeeping) and
>      > the suggested 1GB (we might not be able to find suitable regions at all).
> 
>     OK. Please document it in the code. Note that I don't think this choice
>     should be advertised to the OS. This would give us some flexibility to
>     change the size in the future (e.g. if we have a platform where chunks of
>     less than 64MB are beneficial).
> 
>      >> The code below looks like an open-coded version of
>      >> dt_for_each_range(). Can you try to re-use it please? This will
>      >> help to reduce the complexity of this function.
>      >
>      > You are right, it makes sense, I will definitely reuse it. If I had
>      > been aware of that function before, I would have saved some of the
>      > time I spent investigating how to parse that)
> 
>     I have only skimmed through the diff below. This looks fine to me.
>     Please submit a formal patch.
> 
> 
> Already submitted, please take a look at [1].
> 
>   [1] https://lore.kernel.org/xen-devel/1633519346-3686-4-git-send-email-olekstysh@gmail.com/

Thanks! Sorry, my inbox has been growing quite a lot while I was away. I 
will have a look when I am fully back next week :).

Cheers,

Patch

diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
index 177e656..5cae4ad 100644
--- a/docs/misc/xen-command-line.pandoc
+++ b/docs/misc/xen-command-line.pandoc
@@ -1081,6 +1081,17 @@  hardware domain is architecture dependent.
 Note that specifying zero as domU value means zero, while for dom0 it means
 to use the default.
 
+### ext_regions (Arm)
+> `= <boolean>`
+
+> Default : `true`
+
+Flag to enable or disable support for extended regions for Dom0.
+
+Extended regions are ranges of unused address space exposed to Dom0 as
+"safe to use" for special memory mappings. Disable if your board device
+tree is incomplete.
+
 ### flask
 > `= permissive | enforcing | late | disabled`
 
diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
index d233d63..c5afbe2 100644
--- a/xen/arch/arm/domain_build.c
+++ b/xen/arch/arm/domain_build.c
@@ -34,6 +34,10 @@ 
 static unsigned int __initdata opt_dom0_max_vcpus;
 integer_param("dom0_max_vcpus", opt_dom0_max_vcpus);
 
+/* If true, the extended regions support is enabled for dom0 */
+static bool __initdata opt_ext_regions = true;
+boolean_param("ext_regions", opt_ext_regions);
+
 static u64 __initdata dom0_mem;
 static bool __initdata dom0_mem_set;
 
@@ -886,6 +890,232 @@  static int __init make_memory_node(const struct domain *d,
     return res;
 }
 
+static int __init add_ext_regions(unsigned long s, unsigned long e, void *data)
+{
+    struct meminfo *ext_regions = data;
+    paddr_t start, size;
+
+    if ( ext_regions->nr_banks >= ARRAY_SIZE(ext_regions->bank) )
+        return 0;
+
+    /* Both start and size of the extended region should be 2MB aligned */
+    start = (s + SZ_2M - 1) & ~(SZ_2M - 1);
+    if ( start > e )
+        return 0;
+
+    /*
+     * e is actually "end-1" because it is called by rangeset functions
+     * which are inclusive of the last address.
+     */
+    e += 1;
+    size = (e - start) & ~(SZ_2M - 1);
+    if ( size < MB(64) )
+        return 0;
+
+    ext_regions->bank[ext_regions->nr_banks].start = start;
+    ext_regions->bank[ext_regions->nr_banks].size = size;
+    ext_regions->nr_banks++;
+
+    return 0;
+}
+
+static int __init find_unallocated_memory(const struct kernel_info *kinfo,
+                                          struct meminfo *ext_regions)
+{
+    const struct meminfo *assign_mem = &kinfo->mem;
+    struct rangeset *unalloc_mem;
+    paddr_t start, end;
+    unsigned int i;
+    int res;
+
+    dt_dprintk("Find unallocated memory for extended regions\n");
+
+    unalloc_mem = rangeset_new(NULL, NULL, 0);
+    if ( !unalloc_mem )
+        return -ENOMEM;
+
+    /* Start with all available RAM */
+    for ( i = 0; i < bootinfo.mem.nr_banks; i++ )
+    {
+        start = bootinfo.mem.bank[i].start;
+        end = bootinfo.mem.bank[i].start + bootinfo.mem.bank[i].size;
+        res = rangeset_add_range(unalloc_mem, start, end - 1);
+        if ( res )
+        {
+            printk(XENLOG_ERR "Failed to add: %#"PRIx64"->%#"PRIx64"\n",
+                   start, end);
+            goto out;
+        }
+    }
+
+    /* Remove RAM assigned to Dom0 */
+    for ( i = 0; i < assign_mem->nr_banks; i++ )
+    {
+        start = assign_mem->bank[i].start;
+        end = assign_mem->bank[i].start + assign_mem->bank[i].size;
+        res = rangeset_remove_range(unalloc_mem, start, end - 1);
+        if ( res )
+        {
+            printk(XENLOG_ERR "Failed to remove: %#"PRIx64"->%#"PRIx64"\n",
+                   start, end);
+            goto out;
+        }
+    }
+
+    /* Remove reserved-memory regions */
+    for ( i = 0; i < bootinfo.reserved_mem.nr_banks; i++ )
+    {
+        start = bootinfo.reserved_mem.bank[i].start;
+        end = bootinfo.reserved_mem.bank[i].start +
+            bootinfo.reserved_mem.bank[i].size;
+        res = rangeset_remove_range(unalloc_mem, start, end - 1);
+        if ( res )
+        {
+            printk(XENLOG_ERR "Failed to remove: %#"PRIx64"->%#"PRIx64"\n",
+                   start, end);
+            goto out;
+        }
+    }
+
+    /* Remove grant table region */
+    start = kinfo->gnttab_start;
+    end = kinfo->gnttab_start + kinfo->gnttab_size;
+    res = rangeset_remove_range(unalloc_mem, start, end - 1);
+    if ( res )
+    {
+        printk(XENLOG_ERR "Failed to remove: %#"PRIx64"->%#"PRIx64"\n",
+               start, end);
+        goto out;
+    }
+
+    start = 0;
+    end = (1ULL << p2m_ipa_bits) - 1;
+    res = rangeset_report_ranges(unalloc_mem, start, end,
+                                 add_ext_regions, ext_regions);
+    if ( res )
+        ext_regions->nr_banks = 0;
+    else if ( !ext_regions->nr_banks )
+        res = -ENOENT;
+
+out:
+    rangeset_destroy(unalloc_mem);
+
+    return res;
+}
+
+static int __init find_memory_holes(const struct kernel_info *kinfo,
+                                    struct meminfo *ext_regions)
+{
+    struct dt_device_node *np;
+    struct rangeset *mem_holes;
+    paddr_t start, end;
+    unsigned int i;
+    int res;
+
+    dt_dprintk("Find memory holes for extended regions\n");
+
+    mem_holes = rangeset_new(NULL, NULL, 0);
+    if ( !mem_holes )
+        return -ENOMEM;
+
+    /* Start with maximum possible addressable physical memory range */
+    start = 0;
+    end = (1ULL << p2m_ipa_bits) - 1;
+    res = rangeset_add_range(mem_holes, start, end);
+    if ( res )
+    {
+        printk(XENLOG_ERR "Failed to add: %#"PRIx64"->%#"PRIx64"\n",
+               start, end);
+        goto out;
+    }
+
+    /*
+     * Remove regions described by "reg" and "ranges" properties where
+     * the memory is addressable (MMIO, RAM, PCI BAR, etc).
+     */
+    dt_for_each_device_node( dt_host, np )
+    {
+        unsigned int naddr;
+        u64 addr, size;
+
+        naddr = dt_number_of_address(np);
+
+        for ( i = 0; i < naddr; i++ )
+        {
+            res = dt_device_get_address(np, i, &addr, &size);
+            if ( res )
+            {
+                printk(XENLOG_ERR "Unable to retrieve address %u for %s\n",
+                       i, dt_node_full_name(np));
+                goto out;
+            }
+
+            start = addr & PAGE_MASK;
+            end = PAGE_ALIGN(addr + size);
+            res = rangeset_remove_range(mem_holes, start, end - 1);
+            if ( res )
+            {
+                printk(XENLOG_ERR "Failed to remove: %#"PRIx64"->%#"PRIx64"\n",
+                       start, end);
+                goto out;
+            }
+        }
+
+        if ( dt_device_type_is_equal(np, "pci" ) )
+        {
+            unsigned int range_size, nr_ranges;
+            int na, ns, pna;
+            const __be32 *ranges;
+            u32 len;
+
+            /*
+             * Looking for non-empty ranges property which in this context
+             * describes the PCI host bridge aperture.
+             */
+            ranges = dt_get_property(np, "ranges", &len);
+            if ( !ranges || !len )
+                continue;
+
+            pna = dt_n_addr_cells(np);
+            na = dt_child_n_addr_cells(np);
+            ns = dt_child_n_size_cells(np);
+            range_size = pna + na + ns;
+            nr_ranges = len / sizeof(__be32) / range_size;
+
+            for ( i = 0; i < nr_ranges; i++, ranges += range_size )
+            {
+                /* Skip the child address and get the parent (CPU) address */
+                addr = dt_read_number(ranges + na, pna);
+                size = dt_read_number(ranges + na + pna, ns);
+
+                start = addr & PAGE_MASK;
+                end = PAGE_ALIGN(addr + size);
+                res = rangeset_remove_range(mem_holes, start, end - 1);
+                if ( res )
+                {
+                    printk(XENLOG_ERR "Failed to remove: %#"PRIx64"->%#"PRIx64"\n",
+                           start, end);
+                    goto out;
+                }
+            }
+        }
+    }
+
+    start = 0;
+    end = (1ULL << p2m_ipa_bits) - 1;
+    res = rangeset_report_ranges(mem_holes, start, end,
+                                 add_ext_regions,  ext_regions);
+    if ( res )
+        ext_regions->nr_banks = 0;
+    else if ( !ext_regions->nr_banks )
+        res = -ENOENT;
+
+out:
+    rangeset_destroy(mem_holes);
+
+    return res;
+}
+
 static int __init make_hypervisor_node(struct domain *d,
                                        const struct kernel_info *kinfo,
                                        int addrcells, int sizecells)
@@ -893,11 +1123,12 @@  static int __init make_hypervisor_node(struct domain *d,
     const char compat[] =
         "xen,xen-"__stringify(XEN_VERSION)"."__stringify(XEN_SUBVERSION)"\0"
         "xen,xen";
-    __be32 reg[4];
+    __be32 *reg, *cells;
     gic_interrupt_t intr;
-    __be32 *cells;
     int res;
     void *fdt = kinfo->fdt;
+    struct meminfo *ext_regions = NULL;
+    unsigned int i, nr_ext_regions;
 
     dt_dprintk("Create hypervisor node\n");
 
@@ -919,12 +1150,61 @@  static int __init make_hypervisor_node(struct domain *d,
     if ( res )
         return res;
 
+    if ( !opt_ext_regions )
+    {
+        printk(XENLOG_DEBUG "The extended regions support is disabled\n");
+        nr_ext_regions = 0;
+    }
+    else if ( is_32bit_domain(d) )
+    {
+        printk(XENLOG_DEBUG "The extended regions are only supported for 64-bit guest currently\n");
+        nr_ext_regions = 0;
+    }
+    else
+    {
+        ext_regions = xzalloc(struct meminfo);
+        if ( !ext_regions )
+            return -ENOMEM;
+
+        if ( !is_iommu_enabled(d) )
+            res = find_unallocated_memory(kinfo, ext_regions);
+        else
+            res = find_memory_holes(kinfo, ext_regions);
+
+        if ( res )
+            printk(XENLOG_WARNING "Failed to allocate extended regions\n");
+        nr_ext_regions = ext_regions->nr_banks;
+    }
+
+    reg = xzalloc_array(__be32, (nr_ext_regions + 1) * (addrcells + sizecells));
+    if ( !reg )
+    {
+        xfree(ext_regions);
+        return -ENOMEM;
+    }
+
     /* reg 0 is grant table space */
     cells = &reg[0];
     dt_child_set_range(&cells, addrcells, sizecells,
                        kinfo->gnttab_start, kinfo->gnttab_size);
+    /* reg 1...N are extended regions */
+    for ( i = 0; i < nr_ext_regions; i++ )
+    {
+        u64 start = ext_regions->bank[i].start;
+        u64 size = ext_regions->bank[i].size;
+
+        dt_dprintk("Extended region %d: %#"PRIx64"->%#"PRIx64"\n",
+                   i, start, start + size);
+
+        dt_child_set_range(&cells, addrcells, sizecells, start, size);
+    }
+
     res = fdt_property(fdt, "reg", reg,
-                       dt_cells_to_size(addrcells + sizecells));
+                       dt_cells_to_size(addrcells + sizecells) *
+                       (nr_ext_regions + 1));
+    xfree(ext_regions);
+    xfree(reg);
+
     if ( res )
         return res;