diff mbox series

[V5,2/3] libxl/arm: Add handling of extended regions for DomU

Message ID 1633519346-3686-3-git-send-email-olekstysh@gmail.com (mailing list archive)
State Superseded
Headers show
Series Add handling of extended regions (safe ranges) on Arm (Was "xen/memory: Introduce a hypercall to provide unallocated space") | expand

Commit Message

Oleksandr Tyshchenko Oct. 6, 2021, 11:22 a.m. UTC
From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

The extended region (safe range) is a region of guest physical
address space which is unused and could be safely used to create
grant/foreign mappings instead of wasting real RAM pages from
the domain memory for establishing these mappings.

The extended regions are chosen at domain creation time and
advertised to the domain via the "reg" property under the hypervisor
node in the guest device-tree. As region 0 is reserved for grant table
space (always present), the indexes for extended regions are 1...N.
If the extended regions cannot be allocated for some reason,
Xen doesn't fail and behaves as usual, inserting only region 0.
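For illustration, the "reg" property is a flat array of big-endian 32-bit
cells, each region encoded as GUEST_ROOT_ADDRESS_CELLS (2) address cells
followed by GUEST_ROOT_SIZE_CELLS (2) size cells. Below is a minimal
standalone sketch of that encoding; encode_reg_entry() is a hypothetical
stand-in for libxl's set_range(), and the grant table base/size values in
the usage comment are assumptions taken from xen/include/public/arch-arm.h:

```c
#include <assert.h>
#include <stdint.h>

/* Split a 32-bit value into big-endian bytes (device-tree cell order). */
static void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = v >> 24; p[1] = v >> 16; p[2] = v >> 8; p[3] = v;
}

/*
 * Hypothetical stand-in for libxl's set_range(): append one
 * <address, size> pair (2 address cells + 2 size cells) to the
 * "reg" byte buffer and return the advanced write pointer.
 */
static uint8_t *encode_reg_entry(uint8_t *p, uint64_t base, uint64_t size)
{
    put_be32(p,      (uint32_t)(base >> 32));
    put_be32(p + 4,  (uint32_t)base);
    put_be32(p + 8,  (uint32_t)(size >> 32));
    put_be32(p + 12, (uint32_t)size);
    return p + 16;
}
```

Region 0 (grant table space) would be encoded first, followed by any
extended regions that were successfully allocated.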

Please note the following limitations:
- The extended region feature is currently only supported for 64-bit
  domains.
- The ACPI case is not covered.

***

The algorithm to choose extended regions for a non-direct mapped
DomU is simpler than the one for a direct mapped Dom0. We usually
have a lot of unused space above 4GB, and might have some unused
space below 4GB (depending on the guest memory size).
Try to allocate separate 2MB-aligned extended regions from the first
(below 4GB) and second (above 4GB) RAM banks, taking into account
the maximum supported guest physical address space size and the amount
of memory assigned to the guest. The minimum size of an extended region
is the same as for Dom0 (64MB).
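The per-bank calculation described above can be sketched as standalone C.
This is only an illustration: the GUEST_* layout constants mirror the
defines in xen/include/public/arch-arm.h at the time of this series (treat
the exact values as assumptions), and min_u64() is a local stand-in for
libxl's min():

```c
#include <assert.h>
#include <stdint.h>

/* Guest memory layout constants, assumed from xen/include/public/arch-arm.h. */
#define GUEST_RAM0_BASE  0x40000000ULL    /* 1GB */
#define GUEST_RAM0_SIZE  0xc0000000ULL    /* 3GB */
#define GUEST_RAM1_BASE  0x0200000000ULL  /* 8GB */
#define GUEST_RAM1_SIZE  0xfe00000000ULL

#define MB(x)               ((uint64_t)(x) << 20)
#define ALIGN_UP_TO_2MB(x)  (((x) + MB(2) - 1) & ~(MB(2) - 1))

static uint64_t min_u64(uint64_t a, uint64_t b) { return a < b ? a : b; }

/*
 * Compute candidate extended regions for both RAM banks:
 * each region starts at the 2MB-aligned end of the populated part of
 * the bank and runs to the end of the bank, where bank 1 is clamped
 * against the guest physical address space limit (1 << gpaddr_bits).
 * A bank with no room yields size 0.
 */
static void compute_ext_regions(uint8_t gpaddr_bits, uint64_t ramsize,
                                uint64_t base[2], uint64_t size[2])
{
    const uint64_t bankbase[2] = { GUEST_RAM0_BASE, GUEST_RAM1_BASE };
    uint64_t banksize[2], bankend[2];
    unsigned int i;

    size[0] = size[1] = 0;

    if (ramsize <= GUEST_RAM0_SIZE) {
        banksize[0] = ramsize;
        banksize[1] = 0;
    } else {
        banksize[0] = GUEST_RAM0_SIZE;
        banksize[1] = ramsize - GUEST_RAM0_SIZE;
    }

    bankend[0] = GUEST_RAM0_BASE + GUEST_RAM0_SIZE;
    bankend[1] = min_u64(1ULL << gpaddr_bits,
                         GUEST_RAM1_BASE + GUEST_RAM1_SIZE);

    for (i = 0; i < 2; i++) {
        base[i] = bankbase[i] + ALIGN_UP_TO_2MB(banksize[i]);
        if (bankend[i] > base[i])
            size[i] = bankend[i] - base[i];
    }
}
```

For example, a 1GB guest with gpaddr_bits = 40 gets roughly 2GB free below
4GB and the whole second bank above 8GB; with gpaddr_bits = 32 the second
bank lies beyond the guest physical address space, so only the region below
4GB survives.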

Suggested-by: Julien Grall <jgrall@amazon.com>
Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
---
! Stefano, Ian, Michal, I dropped your A-b/R-b. I decided to change the
patch to make it more functional !

Changes RFC -> V2:
   - update patch description
   - drop unneeded "extended-region" DT property
   - clear reg array in finalise_ext_region() and add a TODO

Changes V2 -> V3:
   - update patch description, comments in code
   - only pick up regions with size >= 64MB
   - move the region calculation to make_hypervisor_node() and drop
     finalise_ext_region()
   - extend the list of arguments for make_hypervisor_node()
   - do not show warning for 32-bit domain
   - change the region alignment from 1GB to 2MB
   - move EXT_REGION_SIZE to public/arch-arm.h

Changes V3 -> V4:
   - add R-b, A-b and T-b

Changes V4 -> V5:
   - update patch description and comments in code
   - reflect changes done in previous patch to pass gpaddr_bits
     via createdomain domctl (struct xen_arch_domainconfig)
   - drop R-b, A-b and T-b
   - drop limit for maximum extended region size (128GB)
   - try to also allocate region below 4GB, optimize code
     for calculating extended regions
---
 tools/libs/light/libxl_arm.c  | 80 ++++++++++++++++++++++++++++++++++++++++---
 xen/include/public/arch-arm.h |  2 ++
 2 files changed, 77 insertions(+), 5 deletions(-)

Comments

Ian Jackson Oct. 6, 2021, 11:34 a.m. UTC | #1
Oleksandr Tyshchenko writes ("[PATCH V5 2/3] libxl/arm: Add handling of extended regions for DomU"):
> The extended region (safe range) is a region of guest physical
> address space which is unused and could be safely used to create
> grant/foreign mappings instead of wasting real RAM pages from
> the domain memory for establishing these mappings.

Please forgive me for asking this question now, but: why is this
ARM-specific ?

Ian.
Oleksandr Tyshchenko Oct. 6, 2021, 12:28 p.m. UTC | #2
On 06.10.21 14:34, Ian Jackson wrote:

Hi Ian

> Oleksandr Tyshchenko writes ("[PATCH V5 2/3] libxl/arm: Add handling of extended regions for DomU"):
>> The extended region (safe range) is a region of guest physical
>> address space which is unused and could be safely used to create
>> grant/foreign mappings instead of wasting real RAM pages from
>> the domain memory for establishing these mappings.
> Please forgive me for asking this question now, but: why is this
> ARM-specific ?


Sorry, I can't say for sure whether any x86 mode also suffers from that.
I might be wrong, but as I understand it, x86 in PVH (and HVM?) mode uses
unpopulated memory ranges, i.e. everything not yet allocated or reserved
in "iomem_resource" and thus unused from the Linux PoV, to create
foreign/grant mappings. So real RAM pages are not ballooned out to get
physical address space for creating these mappings. The problem is that
on Arm we cannot follow Linux's advice on which memory ranges are unused,
for several reasons; this is why this patch series makes the hypervisor
start allocating and exposing these ranges.

>
> Ian.
Stefano Stabellini Oct. 7, 2021, midnight UTC | #3
On Wed, 6 Oct 2021, Oleksandr wrote:
> On 06.10.21 14:34, Ian Jackson wrote:
> 
> Hi Ian
> 
> > Oleksandr Tyshchenko writes ("[PATCH V5 2/3] libxl/arm: Add handling of
> > extended regions for DomU"):
> > > The extended region (safe range) is a region of guest physical
> > > address space which is unused and could be safely used to create
> > > grant/foreign mappings instead of wasting real RAM pages from
> > > the domain memory for establishing these mappings.
> > Please forgive me for asking this question now, but: why is this
> > ARM-specific ?
> 
> 
> Sorry, I can't say for sure which x86 mode also suffers from that. I might be
> wrong, but as I understand that x86 in PVH (and HVM?) mode uses unpopulated
> memory ranges (which are unused from Linux PoV, actually everything not yet
> allocated or reserved from "iomem_resource") to create foreign/grant mappings.
> So the real RAM pages are not ballooned out to get an physical address space
> to create these mappings. The problem is that we cannot follow Linux advise
> which memory ranges are unused on Arm for several reasons, this is why this
> patch series makes the hypervisor to start allocating and exposing these
> ranges.

Two more things about this being ARM-specific.

Even if x86 was affected exactly by the same problem, the code to expose
the safe memory ranges to DomU is arch-specific (currently device tree.)

Also the code to calculate the safe memory ranges is arch-specific as it
depends on the DomU memory layout which is arch-specific.
Stefano Stabellini Oct. 7, 2021, 1:29 a.m. UTC | #4
On Wed, 6 Oct 2021, Oleksandr Tyshchenko wrote:
> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> 
> The extended region (safe range) is a region of guest physical
> address space which is unused and could be safely used to create
> grant/foreign mappings instead of wasting real RAM pages from
> the domain memory for establishing these mappings.
> 
> The extended regions are chosen at the domain creation time and
> advertised to it via "reg" property under hypervisor node in
> the guest device-tree. As region 0 is reserved for grant table
> space (always present), the indexes for extended regions are 1...N.
> If extended regions could not be allocated for some reason,
> Xen doesn't fail and behaves as usual, so only inserts region 0.
> 
> Please note the following limitations:
> - The extended region feature is only supported for 64-bit domain
>   currently.
> - The ACPI case is not covered.
> 
> ***
> 
> The algorithm to choose extended regions for non-direct mapped
> DomU is simpler in comparison with the algorithm for direct mapped
> Dom0. We usually have a lot of unused space above 4GB, and might
> have some unused space below 4GB (depends on guest memory size).
> Try to allocate separate 2MB-aligned extended regions from the first
> (below 4GB) and second (above 4GB) RAM banks taking into the account
> the maximum supported guest physical address space size and the amount
> of memory assigned to the guest. The minimum size of extended region
> the same as for Dom0 (64MB).
> 
> Suggested-by: Julien Grall <jgrall@amazon.com>
> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> ---
> ! Stefano, Ian, Michal I dropped your A-b/R-b. I decided to change patch
> to make it more functional !
> 
> Changes RFC -> V2:
>    - update patch description
>    - drop uneeded "extended-region" DT property
>    - clear reg array in finalise_ext_region() and add a TODO
> 
> Changes V2 -> V3:
>    - update patch description, comments in code
>    - only pick up regions with size >= 64MB
>    - move the region calculation to make_hypervisor_node() and drop
>      finalise_ext_region()
>    - extend the list of arguments for make_hypervisor_node()
>    - do not show warning for 32-bit domain
>    - change the region alignment from 1GB to 2MB
>    - move EXT_REGION_SIZE to public/arch-arm.h
> 
> Changes V3 -> V4:
>    - add R-b, A-b and T-b
> 
> Changes V4 -> V5:
>    - update patch description and comments in code
>    - reflect changes done in previous patch to pass gpaddr_bits
>      via createdomain domctl (struct xen_arch_domainconfig)
>    - drop R-b, A-b and T-b
>    - drop limit for maximum extended region size (128GB)
>    - try to also allocate region below 4GB, optimize code
>      for calculating extended regions
> ---
>  tools/libs/light/libxl_arm.c  | 80 ++++++++++++++++++++++++++++++++++++++++---
>  xen/include/public/arch-arm.h |  2 ++
>  2 files changed, 77 insertions(+), 5 deletions(-)
> 
> diff --git a/tools/libs/light/libxl_arm.c b/tools/libs/light/libxl_arm.c
> index 45e0386..cd743f7 100644
> --- a/tools/libs/light/libxl_arm.c
> +++ b/tools/libs/light/libxl_arm.c
> @@ -600,9 +600,21 @@ static int make_timer_node(libxl__gc *gc, void *fdt,
>      return 0;
>  }
>  
> +#define ALIGN_UP_TO_2MB(x)   (((x) + MB(2) - 1) & (~(MB(2) - 1)))
> +
>  static int make_hypervisor_node(libxl__gc *gc, void *fdt,
> -                                const libxl_version_info *vers)
> +                                const libxl_version_info *vers,
> +                                const libxl_domain_build_info *b_info,
> +                                const struct xc_dom_image *dom)
>  {
> +    uint64_t region_size[GUEST_RAM_BANKS] = {0}, region_base[GUEST_RAM_BANKS],
> +        banksize[GUEST_RAM_BANKS], bankend[GUEST_RAM_BANKS], ramsize;
> +    uint32_t regs[(GUEST_ROOT_ADDRESS_CELLS + GUEST_ROOT_SIZE_CELLS) *
> +                  (GUEST_RAM_BANKS + 1)];
> +    be32 *cells = &regs[0];
> +    const uint64_t bankbase[] = GUEST_RAM_BANK_BASES;
> +    unsigned int i, len, nr_regions = 0;
> +    uint8_t gpaddr_bits;
>      int res;
>      gic_interrupt intr;
>  
> @@ -617,9 +629,67 @@ static int make_hypervisor_node(libxl__gc *gc, void *fdt,
>                                "xen,xen");
>      if (res) return res;
>  
> -    /* reg 0 is grant table space */
> -    res = fdt_property_regs(gc, fdt, GUEST_ROOT_ADDRESS_CELLS, GUEST_ROOT_SIZE_CELLS,
> -                            1,GUEST_GNTTAB_BASE, GUEST_GNTTAB_SIZE);
> +    if (strcmp(dom->guest_type, "xen-3.0-aarch64")) {
> +        LOG(WARN, "The extended regions are only supported for 64-bit guest currently");
> +        goto out;
> +    }
> +
> +    gpaddr_bits = b_info->arch_arm.gpaddr_bits;
> +    assert(gpaddr_bits >= 32 && gpaddr_bits <= 48);
> +
> +    /*
> +     * Try to allocate separate 2MB-aligned extended regions from the first
> +     * (below 4GB) and second (above 4GB) RAM banks taking into the account
> +     * the maximum supported guest physical address space size and the amount
> +     * of memory assigned to the guest.
> +     * As the guest memory layout is not populated yet we cannot rely on
> +     * dom->rambank_size[], so calculate the actual size of both banks using
> +     * "max_memkb" value.
> +     */
> +    ramsize = b_info->max_memkb * 1024;
> +    if (ramsize <= GUEST_RAM0_SIZE) {
> +        banksize[0] = ramsize;
> +        banksize[1] = 0;
> +    } else {
> +        banksize[0] = GUEST_RAM0_SIZE;
> +        banksize[1] = ramsize - GUEST_RAM0_SIZE;
> +    }
> +
> +    bankend[0] = GUEST_RAM0_BASE + GUEST_RAM0_SIZE;
> +    bankend[1] = min(1ULL << gpaddr_bits, GUEST_RAM1_BASE + GUEST_RAM1_SIZE);
> +
> +    for (i = 0; i < GUEST_RAM_BANKS; i++) {
> +        region_base[i] = bankbase[i] + ALIGN_UP_TO_2MB(banksize[i]);
> +        if (bankend[i] > region_base[i])
> +            region_size[i] = bankend[i] - region_base[i];
> +    }

This seems correct but it looks a bit like overkill. I would have
written it like this:

    if (ramsize <= GUEST_RAM0_SIZE) {
        region_base[0] = GUEST_RAM0_BASE + ALIGN_UP_TO_2MB(ramsize);
        region_size[0] = GUEST_RAM0_SIZE - ALIGN_UP_TO_2MB(ramsize);
        region_base[1] = GUEST_RAM1_BASE;
        region_size[1] = min(1ULL << gpaddr_bits, GUEST_RAM1_BASE + GUEST_RAM1_SIZE) -
                         region_base[0];
    } else {
        region_size[0] = 0;
        region_base[1] = GUEST_RAM1_BASE + ALIGN_UP_TO_2MB(ramsize - GUEST_RAM0_SIZE);
        region_size[1] = min(1ULL << gpaddr_bits, GUEST_RAM1_BASE + GUEST_RAM1_SIZE) -
                         region_base[1];
    }

Which removes the need for banksize, bankend, bankbase. What do you
think? Your version works too, so I am OK anyway.



> +out:
> +    /*
> +     * The region 0 for grant table space must be always present. If we managed
> +     * to allocate the extended regions then insert them as regions 1...N.
> +     */
> +    set_range(&cells, GUEST_ROOT_ADDRESS_CELLS, GUEST_ROOT_SIZE_CELLS,
> +              GUEST_GNTTAB_BASE, GUEST_GNTTAB_SIZE);
> +
> +    for (i = 0; i < GUEST_RAM_BANKS; i++) {
> +        if (region_size[i] < GUEST_EXT_REGION_MIN_SIZE)
> +            continue;
> +
> +        LOG(DEBUG, "Extended region %u: %#"PRIx64"->%#"PRIx64"\n",
> +            nr_regions, region_base[i], region_base[i] + region_size[i]);
> +
> +        set_range(&cells, GUEST_ROOT_ADDRESS_CELLS, GUEST_ROOT_SIZE_CELLS,
> +                  region_base[i], region_size[i]);
> +        nr_regions ++;
                     ^ code style


> +    }
> +
> +    if (!nr_regions)
> +        LOG(WARN, "The extended regions cannot be allocated, not enough space");
> +
> +    len = sizeof(regs[0]) * (GUEST_ROOT_ADDRESS_CELLS + GUEST_ROOT_SIZE_CELLS) *
> +        (nr_regions + 1);
> +    res = fdt_property(fdt, "reg", regs, len);
>      if (res) return res;
>  
>      /*
> @@ -965,7 +1035,7 @@ next_resize:
>          }
>  
>          FDT( make_timer_node(gc, fdt, ainfo, state->clock_frequency) );
> -        FDT( make_hypervisor_node(gc, fdt, vers) );
> +        FDT( make_hypervisor_node(gc, fdt, vers, info, dom) );
>  
>          if (info->arch_arm.vuart == LIBXL_VUART_TYPE_SBSA_UART)
>              FDT( make_vpl011_uart_node(gc, fdt, ainfo, dom) );
> diff --git a/xen/include/public/arch-arm.h b/xen/include/public/arch-arm.h
> index 4a01f8b..f74cc0b 100644
> --- a/xen/include/public/arch-arm.h
> +++ b/xen/include/public/arch-arm.h
> @@ -454,6 +454,8 @@ typedef uint64_t xen_callback_t;
>  #define GUEST_RAM_BANK_BASES   { GUEST_RAM0_BASE, GUEST_RAM1_BASE }
>  #define GUEST_RAM_BANK_SIZES   { GUEST_RAM0_SIZE, GUEST_RAM1_SIZE }
>  
> +#define GUEST_EXT_REGION_MIN_SIZE   xen_mk_ullong(0x0004000000) /* 64MB */
> +
>  /* Current supported guest VCPUs */
>  #define GUEST_MAX_VCPUS 128
>  
> -- 
> 2.7.4
>
Ian Jackson Oct. 7, 2021, 10:57 a.m. UTC | #5
Stefano Stabellini writes ("Re: [PATCH V5 2/3] libxl/arm: Add handling of extended regions for DomU"):
> On Wed, 6 Oct 2021, Oleksandr wrote:
> > On 06.10.21 14:34, Ian Jackson wrote:
> > > Oleksandr Tyshchenko writes ("[PATCH V5 2/3] libxl/arm: Add handling of
> > > extended regions for DomU"):
> > > > The extended region (safe range) is a region of guest physical
> > > > address space which is unused and could be safely used to create
> > > > grant/foreign mappings instead of wasting real RAM pages from
> > > > the domain memory for establishing these mappings.
> > > Please forgive me for asking this question now, but: why is this
> > > ARM-specific ?
> > 
> > 
> > Sorry, I can't say for sure which x86 mode also suffers from
> > that. I might be wrong, but as I understand that x86 in PVH (and
> > HVM?) mode uses unpopulated memory ranges (which are unused from
> > Linux PoV, actually everything not yet allocated or reserved from
> > "iomem_resource") to create foreign/grant mappings.  So the real
> > RAM pages are not ballooned out to get an physical address space
> > to create these mappings. The problem is that we cannot follow
> > Linux advise which memory ranges are unused on Arm for several
> > reasons, this is why this patch series makes the hypervisor to
> > start allocating and exposing these ranges.

So it sounds like you are saying this is an ARM-specific problem?
The key being the "several reasons" which you mention.  Are they
ARM-specific problems?

> Two more things about this being ARM-specific.
> 
> Even if x86 was affected exactly by the same problem, the code to expose
> the safe memory ranges to DomU is arch-specific (currently device tree.)
> 
> Also the code to calculate the safe memory ranges is arch-specific as it
> depends on the DomU memory layout which is arch-specific.

This demonstrates that the implementation is arch-specific.  But one
of libxl's functions is to abstract away implementation details and
provide an interface that can be used to "do the right thing".

Ian.
Oleksandr Tyshchenko Oct. 7, 2021, 2:42 p.m. UTC | #6
On 07.10.21 13:57, Ian Jackson wrote:

Hi Ian.

> Stefano Stabellini writes ("Re: [PATCH V5 2/3] libxl/arm: Add handling of extended regions for DomU"):
>> On Wed, 6 Oct 2021, Oleksandr wrote:
>>> On 06.10.21 14:34, Ian Jackson wrote:
>>>> Oleksandr Tyshchenko writes ("[PATCH V5 2/3] libxl/arm: Add handling of
>>>> extended regions for DomU"):
>>>>> The extended region (safe range) is a region of guest physical
>>>>> address space which is unused and could be safely used to create
>>>>> grant/foreign mappings instead of wasting real RAM pages from
>>>>> the domain memory for establishing these mappings.
>>>> Please forgive me for asking this question now, but: why is this
>>>> ARM-specific ?
>>>
>>> Sorry, I can't say for sure which x86 mode also suffers from
>>> that. I might be wrong, but as I understand that x86 in PVH (and
>>> HVM?) mode uses unpopulated memory ranges (which are unused from
>>> Linux PoV, actually everything not yet allocated or reserved from
>>> "iomem_resource") to create foreign/grant mappings.  So the real
>>> RAM pages are not ballooned out to get an physical address space
>>> to create these mappings. The problem is that we cannot follow
>>> Linux advise which memory ranges are unused on Arm for several
>>> reasons, this is why this patch series makes the hypervisor to
>>> start allocating and exposing these ranges.
> So it sounds like you are saying this is an ARM-specific problem ?
> The key being the "several reasons" which you mention.  Are they
> ARM-specifc problems.

Yes, you could say that. From my understanding, these are the reasons
why we need the hypervisor to provide these ranges on Arm.

[leaving aside hotplug case]

1. [Related to Dom0]  Dom0 is mapped 1:1 on Arm, but there might be some 
guest mapping, mapped identically in P2M (GFN == MFN) for PV drivers to 
work. So Xen has everything in hand to be able to choose extended 
regions (which won't clash with other resources).
2. [Related to every domain]  Not all device I/O regions might be known 
(registered) by the time the guest starts creating grant/foreign 
mappings, so the guest cannot guess by itself what is really unused, but 
the entity which creates the domain (Xen, toolstack) knows these details 
in advance and is able to choose extended regions (which won't clash 
with other resources).


>
>> Two more things about this being ARM-specific.
>>
>> Even if x86 was affected exactly by the same problem, the code to expose
>> the safe memory ranges to DomU is arch-specific (currently device tree.)
>>
>> Also the code to calculate the safe memory ranges is arch-specific as it
>> depends on the DomU memory layout which is arch-specific.
> This demonstrates that the implementation is arch-specific.  But one
> of libxl's functions is to abstract away implementation details and
> provide an interface that can be used to "do the right thing".
>
> Ian.
Oleksandr Tyshchenko Oct. 7, 2021, 4:57 p.m. UTC | #7
On 07.10.21 04:29, Stefano Stabellini wrote:

Hi Stefano

> On Wed, 6 Oct 2021, Oleksandr Tyshchenko wrote:
>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>
>> The extended region (safe range) is a region of guest physical
>> address space which is unused and could be safely used to create
>> grant/foreign mappings instead of wasting real RAM pages from
>> the domain memory for establishing these mappings.
>>
>> The extended regions are chosen at the domain creation time and
>> advertised to it via "reg" property under hypervisor node in
>> the guest device-tree. As region 0 is reserved for grant table
>> space (always present), the indexes for extended regions are 1...N.
>> If extended regions could not be allocated for some reason,
>> Xen doesn't fail and behaves as usual, so only inserts region 0.
>>
>> Please note the following limitations:
>> - The extended region feature is only supported for 64-bit domain
>>    currently.
>> - The ACPI case is not covered.
>>
>> ***
>>
>> The algorithm to choose extended regions for non-direct mapped
>> DomU is simpler in comparison with the algorithm for direct mapped
>> Dom0. We usually have a lot of unused space above 4GB, and might
>> have some unused space below 4GB (depends on guest memory size).
>> Try to allocate separate 2MB-aligned extended regions from the first
>> (below 4GB) and second (above 4GB) RAM banks taking into the account
>> the maximum supported guest physical address space size and the amount
>> of memory assigned to the guest. The minimum size of extended region
>> the same as for Dom0 (64MB).
>>
>> Suggested-by: Julien Grall <jgrall@amazon.com>
>> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>> ---
>> ! Stefano, Ian, Michal I dropped your A-b/R-b. I decided to change patch
>> to make it more functional !
>>
>> Changes RFC -> V2:
>>     - update patch description
>>     - drop uneeded "extended-region" DT property
>>     - clear reg array in finalise_ext_region() and add a TODO
>>
>> Changes V2 -> V3:
>>     - update patch description, comments in code
>>     - only pick up regions with size >= 64MB
>>     - move the region calculation to make_hypervisor_node() and drop
>>       finalise_ext_region()
>>     - extend the list of arguments for make_hypervisor_node()
>>     - do not show warning for 32-bit domain
>>     - change the region alignment from 1GB to 2MB
>>     - move EXT_REGION_SIZE to public/arch-arm.h
>>
>> Changes V3 -> V4:
>>     - add R-b, A-b and T-b
>>
>> Changes V4 -> V5:
>>     - update patch description and comments in code
>>     - reflect changes done in previous patch to pass gpaddr_bits
>>       via createdomain domctl (struct xen_arch_domainconfig)
>>     - drop R-b, A-b and T-b
>>     - drop limit for maximum extended region size (128GB)
>>     - try to also allocate region below 4GB, optimize code
>>       for calculating extended regions
>> ---
>>   tools/libs/light/libxl_arm.c  | 80 ++++++++++++++++++++++++++++++++++++++++---
>>   xen/include/public/arch-arm.h |  2 ++
>>   2 files changed, 77 insertions(+), 5 deletions(-)
>>
>> diff --git a/tools/libs/light/libxl_arm.c b/tools/libs/light/libxl_arm.c
>> index 45e0386..cd743f7 100644
>> --- a/tools/libs/light/libxl_arm.c
>> +++ b/tools/libs/light/libxl_arm.c
>> @@ -600,9 +600,21 @@ static int make_timer_node(libxl__gc *gc, void *fdt,
>>       return 0;
>>   }
>>   
>> +#define ALIGN_UP_TO_2MB(x)   (((x) + MB(2) - 1) & (~(MB(2) - 1)))
>> +
>>   static int make_hypervisor_node(libxl__gc *gc, void *fdt,
>> -                                const libxl_version_info *vers)
>> +                                const libxl_version_info *vers,
>> +                                const libxl_domain_build_info *b_info,
>> +                                const struct xc_dom_image *dom)
>>   {
>> +    uint64_t region_size[GUEST_RAM_BANKS] = {0}, region_base[GUEST_RAM_BANKS],
>> +        banksize[GUEST_RAM_BANKS], bankend[GUEST_RAM_BANKS], ramsize;
>> +    uint32_t regs[(GUEST_ROOT_ADDRESS_CELLS + GUEST_ROOT_SIZE_CELLS) *
>> +                  (GUEST_RAM_BANKS + 1)];
>> +    be32 *cells = &regs[0];
>> +    const uint64_t bankbase[] = GUEST_RAM_BANK_BASES;
>> +    unsigned int i, len, nr_regions = 0;
>> +    uint8_t gpaddr_bits;
>>       int res;
>>       gic_interrupt intr;
>>   
>> @@ -617,9 +629,67 @@ static int make_hypervisor_node(libxl__gc *gc, void *fdt,
>>                                 "xen,xen");
>>       if (res) return res;
>>   
>> -    /* reg 0 is grant table space */
>> -    res = fdt_property_regs(gc, fdt, GUEST_ROOT_ADDRESS_CELLS, GUEST_ROOT_SIZE_CELLS,
>> -                            1,GUEST_GNTTAB_BASE, GUEST_GNTTAB_SIZE);
>> +    if (strcmp(dom->guest_type, "xen-3.0-aarch64")) {
>> +        LOG(WARN, "The extended regions are only supported for 64-bit guest currently");
>> +        goto out;
>> +    }
>> +
>> +    gpaddr_bits = b_info->arch_arm.gpaddr_bits;
>> +    assert(gpaddr_bits >= 32 && gpaddr_bits <= 48);
>> +
>> +    /*
>> +     * Try to allocate separate 2MB-aligned extended regions from the first
>> +     * (below 4GB) and second (above 4GB) RAM banks taking into the account
>> +     * the maximum supported guest physical address space size and the amount
>> +     * of memory assigned to the guest.
>> +     * As the guest memory layout is not populated yet we cannot rely on
>> +     * dom->rambank_size[], so calculate the actual size of both banks using
>> +     * "max_memkb" value.
>> +     */
>> +    ramsize = b_info->max_memkb * 1024;
>> +    if (ramsize <= GUEST_RAM0_SIZE) {
>> +        banksize[0] = ramsize;
>> +        banksize[1] = 0;
>> +    } else {
>> +        banksize[0] = GUEST_RAM0_SIZE;
>> +        banksize[1] = ramsize - GUEST_RAM0_SIZE;
>> +    }
>> +
>> +    bankend[0] = GUEST_RAM0_BASE + GUEST_RAM0_SIZE;
>> +    bankend[1] = min(1ULL << gpaddr_bits, GUEST_RAM1_BASE + GUEST_RAM1_SIZE);
>> +
>> +    for (i = 0; i < GUEST_RAM_BANKS; i++) {
>> +        region_base[i] = bankbase[i] + ALIGN_UP_TO_2MB(banksize[i]);
>> +        if (bankend[i] > region_base[i])
>> +            region_size[i] = bankend[i] - region_base[i];
>> +    }
> This seems correct but it looks a bit overkill. I would have written
> like this:
>
>      if (ramsize <= GUEST_RAM0_SIZE) {
>          region_base[0] = GUEST_RAM0_BASE + ALIGN_UP_TO_2MB(ramsize);
>          region_size[0] = GUEST_RAM0_SIZE - ALIGN_UP_TO_2MB(ramsize);
>          region_base[1] = GUEST_RAM1_BASE;
>          region_size[1] = min(1ULL << gpaddr_bits, GUEST_RAM1_BASE + GUEST_RAM1_SIZE) -
>                           region_base[0];
Why "- region_base[0]" in the last expression? I think it should be
"- region_base[1]", the same as in the "else" case (so it can be moved
out of the if-else construct). Also we need to check that
min(1ULL << gpaddr_bits, GUEST_RAM1_BASE + GUEST_RAM1_SIZE) is
greater than region_base[1] before the subtraction. If gpaddr_bits = 32
(on Arm64) we will get an incorrect result.


>      } else {
>          region_size[0] = 0;
>          region_base[1] = GUEST_RAM1_BASE + ALIGN_UP_TO_2MB(ramsize - GUEST_RAM0_SIZE);
>          region_size[1] = min(1ULL << gpaddr_bits, GUEST_RAM1_BASE + GUEST_RAM1_SIZE) -
>                           region_base[1];
>      }
>
> Which removes the needs for banksize, bankend, bankbase. What do you
> think? Your version works too, so I am OK anyway.
Thank you for looking into this.

I think your version will also work with adjustments. I am OK either
way. Your version reduces the number of locals, so it is probably
better. Although the "min(1ULL << gpaddr_bits, GUEST_RAM1_BASE +
GUEST_RAM1_SIZE)" construction probably wants latching in a local
bank1end.


Below the updated version:


     if (ramsize <= GUEST_RAM0_SIZE) {
         region_base[0] = GUEST_RAM0_BASE + ALIGN_UP_TO_2MB(ramsize);
         region_size[0] = GUEST_RAM0_SIZE - ALIGN_UP_TO_2MB(ramsize);
         region_base[1] = GUEST_RAM1_BASE;
     } else
         region_base[1] = GUEST_RAM1_BASE + ALIGN_UP_TO_2MB(ramsize - GUEST_RAM0_SIZE);

     bank1end = min(1ULL << gpaddr_bits, GUEST_RAM1_BASE + GUEST_RAM1_SIZE);
     if (bank1end > region_base[1])
         region_size[1] = bank1end - region_base[1];
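For reference, the consolidated variant above can be turned into the
following standalone sketch. Treat it as an illustration rather than the
final patch: the GUEST_* constants are assumptions mirroring
xen/include/public/arch-arm.h, and min_u64() replaces libxl's min():

```c
#include <assert.h>
#include <stdint.h>

/* Guest memory layout constants, assumed from xen/include/public/arch-arm.h. */
#define GUEST_RAM0_BASE  0x40000000ULL    /* 1GB */
#define GUEST_RAM0_SIZE  0xc0000000ULL    /* 3GB */
#define GUEST_RAM1_BASE  0x0200000000ULL  /* 8GB */
#define GUEST_RAM1_SIZE  0xfe00000000ULL

#define MB(x)               ((uint64_t)(x) << 20)
#define ALIGN_UP_TO_2MB(x)  (((x) + MB(2) - 1) & ~(MB(2) - 1))

static uint64_t min_u64(uint64_t a, uint64_t b) { return a < b ? a : b; }

/*
 * Consolidated variant: derive the region bases directly from ramsize,
 * then clamp bank 1 against the guest physical address space limit
 * (1 << gpaddr_bits) before computing its size, as discussed above.
 */
static void compute_ext_regions(uint8_t gpaddr_bits, uint64_t ramsize,
                                uint64_t base[2], uint64_t size[2])
{
    uint64_t bank1end;

    base[0] = base[1] = 0;
    size[0] = size[1] = 0;

    if (ramsize <= GUEST_RAM0_SIZE) {
        base[0] = GUEST_RAM0_BASE + ALIGN_UP_TO_2MB(ramsize);
        size[0] = GUEST_RAM0_SIZE - ALIGN_UP_TO_2MB(ramsize);
        base[1] = GUEST_RAM1_BASE;
    } else
        base[1] = GUEST_RAM1_BASE + ALIGN_UP_TO_2MB(ramsize - GUEST_RAM0_SIZE);

    bank1end = min_u64(1ULL << gpaddr_bits,
                       GUEST_RAM1_BASE + GUEST_RAM1_SIZE);
    if (bank1end > base[1])
        size[1] = bank1end - base[1];
}
```

With the bank1end guard in place, a 32-bit guest physical address space
(gpaddr_bits = 32 on Arm64) correctly yields an empty second region
instead of an underflowed size.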




>
>
>
>> +out:
>> +    /*
>> +     * The region 0 for grant table space must be always present. If we managed
>> +     * to allocate the extended regions then insert them as regions 1...N.
>> +     */
>> +    set_range(&cells, GUEST_ROOT_ADDRESS_CELLS, GUEST_ROOT_SIZE_CELLS,
>> +              GUEST_GNTTAB_BASE, GUEST_GNTTAB_SIZE);
>> +
>> +    for (i = 0; i < GUEST_RAM_BANKS; i++) {
>> +        if (region_size[i] < GUEST_EXT_REGION_MIN_SIZE)
>> +            continue;
>> +
>> +        LOG(DEBUG, "Extended region %u: %#"PRIx64"->%#"PRIx64"\n",
>> +            nr_regions, region_base[i], region_base[i] + region_size[i]);
>> +
>> +        set_range(&cells, GUEST_ROOT_ADDRESS_CELLS, GUEST_ROOT_SIZE_CELLS,
>> +                  region_base[i], region_size[i]);
>> +        nr_regions ++;
>                       ^ code style

ok


>
>
>> +    }
>> +
>> +    if (!nr_regions)
>> +        LOG(WARN, "The extended regions cannot be allocated, not enough space");
>> +
>> +    len = sizeof(regs[0]) * (GUEST_ROOT_ADDRESS_CELLS + GUEST_ROOT_SIZE_CELLS) *
>> +        (nr_regions + 1);
>> +    res = fdt_property(fdt, "reg", regs, len);
>>       if (res) return res;
>>   
>>       /*
>> @@ -965,7 +1035,7 @@ next_resize:
>>           }
>>   
>>           FDT( make_timer_node(gc, fdt, ainfo, state->clock_frequency) );
>> -        FDT( make_hypervisor_node(gc, fdt, vers) );
>> +        FDT( make_hypervisor_node(gc, fdt, vers, info, dom) );
>>   
>>           if (info->arch_arm.vuart == LIBXL_VUART_TYPE_SBSA_UART)
>>               FDT( make_vpl011_uart_node(gc, fdt, ainfo, dom) );
>> diff --git a/xen/include/public/arch-arm.h b/xen/include/public/arch-arm.h
>> index 4a01f8b..f74cc0b 100644
>> --- a/xen/include/public/arch-arm.h
>> +++ b/xen/include/public/arch-arm.h
>> @@ -454,6 +454,8 @@ typedef uint64_t xen_callback_t;
>>   #define GUEST_RAM_BANK_BASES   { GUEST_RAM0_BASE, GUEST_RAM1_BASE }
>>   #define GUEST_RAM_BANK_SIZES   { GUEST_RAM0_SIZE, GUEST_RAM1_SIZE }
>>   
>> +#define GUEST_EXT_REGION_MIN_SIZE   xen_mk_ullong(0x0004000000) /* 64MB */
>> +
>>   /* Current supported guest VCPUs */
>>   #define GUEST_MAX_VCPUS 128
>>   
>> -- 
>> 2.7.4
>>
Stefano Stabellini Oct. 7, 2021, 8:29 p.m. UTC | #8
On Thu, 7 Oct 2021, Oleksandr wrote:
> On 07.10.21 04:29, Stefano Stabellini wrote:
> 
> Hi Stefano
> 
> > On Wed, 6 Oct 2021, Oleksandr Tyshchenko wrote:
> > > From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> > > 
> > > The extended region (safe range) is a region of guest physical
> > > address space which is unused and could be safely used to create
> > > grant/foreign mappings instead of wasting real RAM pages from
> > > the domain memory for establishing these mappings.
> > > 
> > > The extended regions are chosen at the domain creation time and
> > > advertised to it via "reg" property under hypervisor node in
> > > the guest device-tree. As region 0 is reserved for grant table
> > > space (always present), the indexes for extended regions are 1...N.
> > > If extended regions could not be allocated for some reason,
> > > Xen doesn't fail and behaves as usual, so only inserts region 0.
> > > 
> > > Please note the following limitations:
> > > - The extended region feature is only supported for 64-bit domain
> > >    currently.
> > > - The ACPI case is not covered.
> > > 
> > > ***
> > > 
> > > The algorithm to choose extended regions for non-direct mapped
> > > DomU is simpler in comparison with the algorithm for direct mapped
> > > Dom0. We usually have a lot of unused space above 4GB, and might
> > > have some unused space below 4GB (depends on guest memory size).
> > > Try to allocate separate 2MB-aligned extended regions from the first
> > > (below 4GB) and second (above 4GB) RAM banks, taking into account
> > > the maximum supported guest physical address space size and the amount
> > > of memory assigned to the guest. The minimum size of an extended region
> > > is the same as for Dom0 (64MB).
> > > 
> > > Suggested-by: Julien Grall <jgrall@amazon.com>
> > > Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> > > ---
> > > ! Stefano, Ian, Michal I dropped your A-b/R-b. I decided to change patch
> > > to make it more functional !
> > > 
> > > Changes RFC -> V2:
> > >     - update patch description
> > >     - drop unneeded "extended-region" DT property
> > >     - clear reg array in finalise_ext_region() and add a TODO
> > > 
> > > Changes V2 -> V3:
> > >     - update patch description, comments in code
> > >     - only pick up regions with size >= 64MB
> > >     - move the region calculation to make_hypervisor_node() and drop
> > >       finalise_ext_region()
> > >     - extend the list of arguments for make_hypervisor_node()
> > >     - do not show warning for 32-bit domain
> > >     - change the region alignment from 1GB to 2MB
> > >     - move EXT_REGION_SIZE to public/arch-arm.h
> > > 
> > > Changes V3 -> V4:
> > >     - add R-b, A-b and T-b
> > > 
> > > Changes V4 -> V5:
> > >     - update patch description and comments in code
> > >     - reflect changes done in previous patch to pass gpaddr_bits
> > >       via createdomain domctl (struct xen_arch_domainconfig)
> > >     - drop R-b, A-b and T-b
> > >     - drop limit for maximum extended region size (128GB)
> > >     - try to also allocate region below 4GB, optimize code
> > >       for calculating extended regions
> > > ---
> > >   tools/libs/light/libxl_arm.c  | 80
> > > ++++++++++++++++++++++++++++++++++++++++---
> > >   xen/include/public/arch-arm.h |  2 ++
> > >   2 files changed, 77 insertions(+), 5 deletions(-)
> > > 
> > > diff --git a/tools/libs/light/libxl_arm.c b/tools/libs/light/libxl_arm.c
> > > index 45e0386..cd743f7 100644
> > > --- a/tools/libs/light/libxl_arm.c
> > > +++ b/tools/libs/light/libxl_arm.c
> > > @@ -600,9 +600,21 @@ static int make_timer_node(libxl__gc *gc, void *fdt,
> > >       return 0;
> > >   }
> > >   +#define ALIGN_UP_TO_2MB(x)   (((x) + MB(2) - 1) & (~(MB(2) - 1)))
> > > +
> > >   static int make_hypervisor_node(libxl__gc *gc, void *fdt,
> > > -                                const libxl_version_info *vers)
> > > +                                const libxl_version_info *vers,
> > > +                                const libxl_domain_build_info *b_info,
> > > +                                const struct xc_dom_image *dom)
> > >   {
> > > +    uint64_t region_size[GUEST_RAM_BANKS] = {0},
> > > region_base[GUEST_RAM_BANKS],
> > > +        banksize[GUEST_RAM_BANKS], bankend[GUEST_RAM_BANKS], ramsize;
> > > +    uint32_t regs[(GUEST_ROOT_ADDRESS_CELLS + GUEST_ROOT_SIZE_CELLS) *
> > > +                  (GUEST_RAM_BANKS + 1)];
> > > +    be32 *cells = &regs[0];
> > > +    const uint64_t bankbase[] = GUEST_RAM_BANK_BASES;
> > > +    unsigned int i, len, nr_regions = 0;
> > > +    uint8_t gpaddr_bits;
> > >       int res;
> > >       gic_interrupt intr;
> > >   @@ -617,9 +629,67 @@ static int make_hypervisor_node(libxl__gc *gc, void
> > > *fdt,
> > >                                 "xen,xen");
> > >       if (res) return res;
> > >   -    /* reg 0 is grant table space */
> > > -    res = fdt_property_regs(gc, fdt, GUEST_ROOT_ADDRESS_CELLS,
> > > GUEST_ROOT_SIZE_CELLS,
> > > -                            1,GUEST_GNTTAB_BASE, GUEST_GNTTAB_SIZE);
> > > +    if (strcmp(dom->guest_type, "xen-3.0-aarch64")) {
> > > +        LOG(WARN, "The extended regions are only supported for 64-bit
> > > guest currently");
> > > +        goto out;
> > > +    }
> > > +
> > > +    gpaddr_bits = b_info->arch_arm.gpaddr_bits;
> > > +    assert(gpaddr_bits >= 32 && gpaddr_bits <= 48);
> > > +
> > > +    /*
> > > +     * Try to allocate separate 2MB-aligned extended regions from the
> > > first
> > > +     * (below 4GB) and second (above 4GB) RAM banks taking into the
> > > account
> > > +     * the maximum supported guest physical address space size and the
> > > amount
> > > +     * of memory assigned to the guest.
> > > +     * As the guest memory layout is not populated yet we cannot rely on
> > > +     * dom->rambank_size[], so calculate the actual size of both banks
> > > using
> > > +     * "max_memkb" value.
> > > +     */
> > > +    ramsize = b_info->max_memkb * 1024;
> > > +    if (ramsize <= GUEST_RAM0_SIZE) {
> > > +        banksize[0] = ramsize;
> > > +        banksize[1] = 0;
> > > +    } else {
> > > +        banksize[0] = GUEST_RAM0_SIZE;
> > > +        banksize[1] = ramsize - GUEST_RAM0_SIZE;
> > > +    }
> > > +
> > > +    bankend[0] = GUEST_RAM0_BASE + GUEST_RAM0_SIZE;
> > > +    bankend[1] = min(1ULL << gpaddr_bits, GUEST_RAM1_BASE +
> > > GUEST_RAM1_SIZE);
> > > +
> > > +    for (i = 0; i < GUEST_RAM_BANKS; i++) {
> > > +        region_base[i] = bankbase[i] + ALIGN_UP_TO_2MB(banksize[i]);
> > > +        if (bankend[i] > region_base[i])
> > > +            region_size[i] = bankend[i] - region_base[i];
> > > +    }
> > This seems correct but it looks a bit overkill. I would have written it
> > like this:
> > 
> >      if (ramsize <= GUEST_RAM0_SIZE) {
> >          region_base[0] = GUEST_RAM0_BASE + ALIGN_UP_TO_2MB(ramsize);
> >          region_size[0] = GUEST_RAM0_SIZE - ALIGN_UP_TO_2MB(ramsize);
> >          region_base[1] = GUEST_RAM1_BASE;
> >          region_size[1] = min(1ULL << gpaddr_bits, GUEST_RAM1_BASE +
> > GUEST_RAM1_SIZE) -
> >                           region_base[0];
> Why "- region_base[0]" in the last expression? I think it should be "-
> region_base[1]", the same as for the "else" case (so it can be moved out of
> the if-else construct). Also we need to check that
> min(1ULL << gpaddr_bits, GUEST_RAM1_BASE + GUEST_RAM1_SIZE) is greater
> than region_base[1] before the subtraction. If gpaddr_bits = 32 (on Arm64) we
> will get an incorrect result.
> 
> 
> >      } else {
> >          region_size[0] = 0;
> >          region_base[1] = GUEST_RAM1_BASE + ALIGN_UP_TO_2MB(ramsize -
> > GUEST_RAM0_SIZE);
> >          region_size[1] = min(1ULL << gpaddr_bits, GUEST_RAM1_BASE +
> > GUEST_RAM1_SIZE) -
> >                           region_base[1];
> >      }
> > 
> > Which removes the needs for banksize, bankend, bankbase. What do you
> > think? Your version works too, so I am OK anyway.
> Thank you for looking into this.
> 
> I think your version will also work with adjustments. I am OK either way. Your
> version reduces the number of locals, so it is probably better.  Although the
> "min(1ULL << gpaddr_bits, GUEST_RAM1_BASE + GUEST_RAM1_SIZE)" construction
> probably wants latching in a local bank1end.
> 
> 
> Below the updated version:
> 
>     if (ramsize <= GUEST_RAM0_SIZE) {
>         region_base[0] = GUEST_RAM0_BASE + ALIGN_UP_TO_2MB(ramsize);
>         region_size[0] = GUEST_RAM0_SIZE - ALIGN_UP_TO_2MB(ramsize);
>         region_base[1] = GUEST_RAM1_BASE;
>     } else
>         region_base[1] = GUEST_RAM1_BASE + ALIGN_UP_TO_2MB(ramsize - GUEST_RAM0_SIZE);
> 
>     bank1end = min(1ULL << gpaddr_bits, GUEST_RAM1_BASE + GUEST_RAM1_SIZE);
>     if (bank1end > region_base[1])
>         region_size[1] = bank1end - region_base[1];


Yeah I like this. I'd be happy to go with it.
Stefano Stabellini Oct. 7, 2021, 8:37 p.m. UTC | #9
On Thu, 7 Oct 2021, Ian Jackson wrote:
> Stefano Stabellini writes ("Re: [PATCH V5 2/3] libxl/arm: Add handling of extended regions for DomU"):
> > On Wed, 6 Oct 2021, Oleksandr wrote:
> > > On 06.10.21 14:34, Ian Jackson wrote:
> > > > Oleksandr Tyshchenko writes ("[PATCH V5 2/3] libxl/arm: Add handling of
> > > > extended regions for DomU"):
> > > > > The extended region (safe range) is a region of guest physical
> > > > > address space which is unused and could be safely used to create
> > > > > grant/foreign mappings instead of wasting real RAM pages from
> > > > > the domain memory for establishing these mappings.
> > > > Please forgive me for asking this question now, but: why is this
> > > > ARM-specific ?
> > > 
> > > 
> > > Sorry, I can't say for sure which x86 mode also suffers from
> > > that. I might be wrong, but as I understand it, x86 in PVH (and
> > > HVM?) mode uses unpopulated memory ranges (which are unused from
> > > Linux's PoV, actually everything not yet allocated or reserved from
> > > "iomem_resource") to create foreign/grant mappings.  So the real
> > > RAM pages are not ballooned out to get a physical address space
> > > to create these mappings. The problem is that we cannot follow
> > > Linux's advice on which memory ranges are unused on Arm, for
> > > several reasons, which is why this patch series makes the
> > > hypervisor start allocating and exposing these ranges.
> 
> So it sounds like you are saying this is an ARM-specific problem?
> The key being the "several reasons" you mention.  Are they
> ARM-specific problems?
> 
> > Two more things about this being ARM-specific.
> > 
> > Even if x86 was affected exactly by the same problem, the code to expose
> > the safe memory ranges to DomU is arch-specific (currently device tree.)
> > 
> > Also the code to calculate the safe memory ranges is arch-specific as it
> > depends on the DomU memory layout which is arch-specific.
> 
> This demonstrates that the implementation is arch-specific.  But one
> of libxl's functions is to abstract away implementation details and
> provide an interface that can be used to "do the right thing".

That's fair enough, I understand your question a bit better now.

I am not certain whether x86 has the same issue. But if it does, I am
pretty sure both the strategy to address the problem and the code would
be very different. At the moment, I cannot imagine how we could share
anything in this patch with x86, especially given the difference in
firmware interfaces (ACPI vs DT).

But I could see a theoretical third architecture (RISC-V?) with device
tree support potentially making use of make_hypervisor_node. But even
then, I don't imagine the libxl API would look different or would need
to change.
Oleksandr Tyshchenko Oct. 7, 2021, 8:55 p.m. UTC | #10
On 07.10.21 23:29, Stefano Stabellini wrote:

Hi Stefano

>> Below the updated version:
>>
>>      if (ramsize <= GUEST_RAM0_SIZE) {
>>          region_base[0] = GUEST_RAM0_BASE + ALIGN_UP_TO_2MB(ramsize);
>>          region_size[0] = GUEST_RAM0_SIZE - ALIGN_UP_TO_2MB(ramsize);
>>          region_base[1] = GUEST_RAM1_BASE;
>>      } else
>>          region_base[1] = GUEST_RAM1_BASE + ALIGN_UP_TO_2MB(ramsize - GUEST_RAM0_SIZE);
>>
>>      bank1end = min(1ULL << gpaddr_bits, GUEST_RAM1_BASE + GUEST_RAM1_SIZE);
>>      if (bank1end > region_base[1])
>>          region_size[1] = bank1end - region_base[1];
>
> Yeah I like this. I'd be happy to go with it.

Great, thank you, will update.
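
For reference, with the above in place a 64-bit guest would see a hypervisor node along these lines. This is only a sketch: the grant-table and extended-region addresses shown are illustrative examples (region 0 comes from GUEST_GNTTAB_BASE/GUEST_GNTTAB_SIZE, regions 1..N are the computed extended regions), and the interrupt line is omitted.

```dts
/* Illustrative guest DT fragment, not generated output. */
hypervisor {
    compatible = "xen,xen-4.16", "xen,xen";
    /* reg 0: grant table space; reg 1..N: extended regions */
    reg = <0x0 0x38000000 0x0 0x01000000    /* grant table (example) */
           0x0 0x80000000 0x0 0x80000000    /* ext region below 4GB */
           0x2 0x00000000 0xfe 0x00000000>; /* ext region above 4GB */
};
```

Each entry uses GUEST_ROOT_ADDRESS_CELLS/GUEST_ROOT_SIZE_CELLS (2 cells each), matching the set_range() calls in the patch.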
diff mbox series

Patch

diff --git a/tools/libs/light/libxl_arm.c b/tools/libs/light/libxl_arm.c
index 45e0386..cd743f7 100644
--- a/tools/libs/light/libxl_arm.c
+++ b/tools/libs/light/libxl_arm.c
@@ -600,9 +600,21 @@  static int make_timer_node(libxl__gc *gc, void *fdt,
     return 0;
 }
 
+#define ALIGN_UP_TO_2MB(x)   (((x) + MB(2) - 1) & (~(MB(2) - 1)))
+
 static int make_hypervisor_node(libxl__gc *gc, void *fdt,
-                                const libxl_version_info *vers)
+                                const libxl_version_info *vers,
+                                const libxl_domain_build_info *b_info,
+                                const struct xc_dom_image *dom)
 {
+    uint64_t region_size[GUEST_RAM_BANKS] = {0}, region_base[GUEST_RAM_BANKS],
+        banksize[GUEST_RAM_BANKS], bankend[GUEST_RAM_BANKS], ramsize;
+    uint32_t regs[(GUEST_ROOT_ADDRESS_CELLS + GUEST_ROOT_SIZE_CELLS) *
+                  (GUEST_RAM_BANKS + 1)];
+    be32 *cells = &regs[0];
+    const uint64_t bankbase[] = GUEST_RAM_BANK_BASES;
+    unsigned int i, len, nr_regions = 0;
+    uint8_t gpaddr_bits;
     int res;
     gic_interrupt intr;
 
@@ -617,9 +629,67 @@  static int make_hypervisor_node(libxl__gc *gc, void *fdt,
                               "xen,xen");
     if (res) return res;
 
-    /* reg 0 is grant table space */
-    res = fdt_property_regs(gc, fdt, GUEST_ROOT_ADDRESS_CELLS, GUEST_ROOT_SIZE_CELLS,
-                            1,GUEST_GNTTAB_BASE, GUEST_GNTTAB_SIZE);
+    if (strcmp(dom->guest_type, "xen-3.0-aarch64")) {
+        LOG(WARN, "The extended regions are only supported for 64-bit guest currently");
+        goto out;
+    }
+
+    gpaddr_bits = b_info->arch_arm.gpaddr_bits;
+    assert(gpaddr_bits >= 32 && gpaddr_bits <= 48);
+
+    /*
+     * Try to allocate separate 2MB-aligned extended regions from the first
+     * (below 4GB) and second (above 4GB) RAM banks, taking into account
+     * the maximum supported guest physical address space size and the amount
+     * of memory assigned to the guest.
+     * As the guest memory layout is not populated yet we cannot rely on
+     * dom->rambank_size[], so calculate the actual size of both banks using
+     * "max_memkb" value.
+     */
+    ramsize = b_info->max_memkb * 1024;
+    if (ramsize <= GUEST_RAM0_SIZE) {
+        banksize[0] = ramsize;
+        banksize[1] = 0;
+    } else {
+        banksize[0] = GUEST_RAM0_SIZE;
+        banksize[1] = ramsize - GUEST_RAM0_SIZE;
+    }
+
+    bankend[0] = GUEST_RAM0_BASE + GUEST_RAM0_SIZE;
+    bankend[1] = min(1ULL << gpaddr_bits, GUEST_RAM1_BASE + GUEST_RAM1_SIZE);
+
+    for (i = 0; i < GUEST_RAM_BANKS; i++) {
+        region_base[i] = bankbase[i] + ALIGN_UP_TO_2MB(banksize[i]);
+        if (bankend[i] > region_base[i])
+            region_size[i] = bankend[i] - region_base[i];
+    }
+
+out:
+    /*
+     * The region 0 for grant table space must be always present. If we managed
+     * to allocate the extended regions then insert them as regions 1...N.
+     */
+    set_range(&cells, GUEST_ROOT_ADDRESS_CELLS, GUEST_ROOT_SIZE_CELLS,
+              GUEST_GNTTAB_BASE, GUEST_GNTTAB_SIZE);
+
+    for (i = 0; i < GUEST_RAM_BANKS; i++) {
+        if (region_size[i] < GUEST_EXT_REGION_MIN_SIZE)
+            continue;
+
+        LOG(DEBUG, "Extended region %u: %#"PRIx64"->%#"PRIx64,
+            nr_regions, region_base[i], region_base[i] + region_size[i]);
+
+        set_range(&cells, GUEST_ROOT_ADDRESS_CELLS, GUEST_ROOT_SIZE_CELLS,
+                  region_base[i], region_size[i]);
+        nr_regions++;
+    }
+
+    if (!nr_regions)
+        LOG(WARN, "The extended regions cannot be allocated, not enough space");
+
+    len = sizeof(regs[0]) * (GUEST_ROOT_ADDRESS_CELLS + GUEST_ROOT_SIZE_CELLS) *
+        (nr_regions + 1);
+    res = fdt_property(fdt, "reg", regs, len);
     if (res) return res;
 
     /*
@@ -965,7 +1035,7 @@  next_resize:
         }
 
         FDT( make_timer_node(gc, fdt, ainfo, state->clock_frequency) );
-        FDT( make_hypervisor_node(gc, fdt, vers) );
+        FDT( make_hypervisor_node(gc, fdt, vers, info, dom) );
 
         if (info->arch_arm.vuart == LIBXL_VUART_TYPE_SBSA_UART)
             FDT( make_vpl011_uart_node(gc, fdt, ainfo, dom) );
diff --git a/xen/include/public/arch-arm.h b/xen/include/public/arch-arm.h
index 4a01f8b..f74cc0b 100644
--- a/xen/include/public/arch-arm.h
+++ b/xen/include/public/arch-arm.h
@@ -454,6 +454,8 @@  typedef uint64_t xen_callback_t;
 #define GUEST_RAM_BANK_BASES   { GUEST_RAM0_BASE, GUEST_RAM1_BASE }
 #define GUEST_RAM_BANK_SIZES   { GUEST_RAM0_SIZE, GUEST_RAM1_SIZE }
 
+#define GUEST_EXT_REGION_MIN_SIZE   xen_mk_ullong(0x0004000000) /* 64MB */
+
 /* Current supported guest VCPUs */
 #define GUEST_MAX_VCPUS 128