
[v2,3/3] xen/heap: pass order to free_heap_pages() in heap init

Message ID 20220715170312.13931-4-julien@xen.org (mailing list archive)
State New, archived
Series xen/mm: Optimize init_heap_pages()

Commit Message

Julien Grall July 15, 2022, 5:03 p.m. UTC
From: Hongyan Xia <hongyxia@amazon.com>

The idea is to split the range into multiple aligned power-of-2 regions,
each of which needs only a single call to free_heap_pages(). We check the
least significant set bit of the start address and use its bit index as
the order of this increment. This makes sure that each increment is both
power-of-2 sized and properly aligned, so it can safely be passed to
free_heap_pages(). Of course, the order also needs to be sanity-checked
against the upper bound and MAX_ORDER.
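
As a concrete illustration (not part of the patch), here is a minimal
standalone C sketch of the chunking scheme. The start/end values are
hypothetical, flsl()/ffsl() are emulated with GCC builtins, and printf()
stands in for the real free_heap_pages() call:

    #include <stdio.h>

    #define MAX_ORDER 18 /* assumed value, for the sketch only */

    /* Emulate Xen's flsl()/ffsl(): 1-based index of the MSB/LSB. */
    static int flsl(unsigned long x)
    {
        return x ? 8 * (int)sizeof(x) - __builtin_clzl(x) : 0;
    }

    static int ffsl(unsigned long x)
    {
        return __builtin_ffsl((long)x);
    }

    int main(void)
    {
        unsigned long s = 0x5, e = 0x23; /* example MFN range [s, e) */

        while ( s < e )
        {
            /* Largest chunk that fits in the remaining size ... */
            int inc_order = flsl(e - s) - 1;

            if ( inc_order > MAX_ORDER )
                inc_order = MAX_ORDER;
            /* ... further limited so the chunk stays size-aligned. */
            if ( s )
            {
                int align_order = ffsl(s) - 1;

                if ( align_order < inc_order )
                    inc_order = align_order;
            }
            printf("free_heap_pages(mfn %#lx, order %d)\n", s, inc_order);
            s += 1UL << inc_order;
        }
        return 0;
    }

For this range the sketch emits chunks of order 0, 1, 3, 4, 1 and 0: six
calls instead of 30 single-page ones, with every chunk aligned to its own
size.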

Tested in a nested environment on c5.metal with various amounts
of RAM and CONFIG_DEBUG=n. Time for end_boot_allocator() to complete:
            Before         After
    - 90GB: 1445 ms         96 ms
    -  8GB:  126 ms          8 ms
    -  4GB:   62 ms          4 ms

Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
Signed-off-by: Julien Grall <jgrall@amazon.com>

---

Changes in v2:
    - Update comment
    - Update the numbers. They are slightly better as is_contig_page()
      has been folded into init_heap_pages().
---
 xen/common/page_alloc.c | 35 ++++++++++++++++++++++++++++++++---
 1 file changed, 32 insertions(+), 3 deletions(-)

Comments

Wei Chen July 18, 2022, 8:38 a.m. UTC | #1
Hi Julien,

On 2022/7/16 1:03, Julien Grall wrote:
> From: Hongyan Xia <hongyxia@amazon.com>
> 
> The idea is to split the range into multiple aligned power-of-2 regions,
> each of which needs only a single call to free_heap_pages(). We check the
> least significant set bit of the start address and use its bit index as
> the order of this increment. This makes sure that each increment is both
> power-of-2 sized and properly aligned, so it can safely be passed to
> free_heap_pages(). Of course, the order also needs to be sanity-checked
> against the upper bound and MAX_ORDER.
> 
> Tested in a nested environment on c5.metal with various amounts
> of RAM and CONFIG_DEBUG=n. Time for end_boot_allocator() to complete:
>              Before         After
>      - 90GB: 1445 ms         96 ms
>      -  8GB:  126 ms          8 ms
>      -  4GB:   62 ms          4 ms
> 
> Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
> Signed-off-by: Julien Grall <jgrall@amazon.com>
> 
> ---
> 
> Changes in v2:
>      - Update comment
>      - Update the numbers. They are slightly better as is_contig_page()
>        has been folded into init_heap_pages().
> ---
>   xen/common/page_alloc.c | 35 ++++++++++++++++++++++++++++++++---
>   1 file changed, 32 insertions(+), 3 deletions(-)
> 
> diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
> index eedb2fed77c3..2b99801d2ea3 100644
> --- a/xen/common/page_alloc.c
> +++ b/xen/common/page_alloc.c
> @@ -1779,7 +1779,7 @@ int query_page_offline(mfn_t mfn, uint32_t *status)
>   
>   /*
>    * This function should only be called with valid pages from the same NUMA
> - * node.
> + * node and zone.
>    */
>   static void _init_heap_pages(const struct page_info *pg,
>                                unsigned long nr_pages,
> @@ -1806,8 +1806,22 @@ static void _init_heap_pages(const struct page_info *pg,
>   
>       while ( s < e )
>       {
> -        free_heap_pages(mfn_to_page(_mfn(s)), 0, need_scrub);
> -        s += 1UL;
> +        /*
> +         * For s == 0, we simply use the largest increment by checking the
> +         * MSB of the region size. For s != 0, we also need to ensure that the
> +         * chunk is properly sized to end at power-of-two alignment. We do this
> +         * by checking the LSB of the start address and use its index as
> +         * the increment. Both cases need to be guarded by MAX_ORDER.
> +         *
> +         * Note that the value of ffsl() and flsl() starts from 1 so we need
> +         * to decrement it by 1.
> +         */
> +        int inc_order = min(MAX_ORDER, flsl(e - s) - 1);
> +
> +        if ( s )
> +            inc_order = min(inc_order, ffsl(s) - 1);
> +        free_heap_pages(mfn_to_page(_mfn(s)), inc_order, need_scrub);
> +        s += (1UL << inc_order);
>       }
>   }
>   
> @@ -1844,6 +1858,9 @@ static void init_heap_pages(
>   
>       for ( i = 0; i < nr_pages; )
>       {
> +#ifdef CONFIG_SEPARATE_XENHEAP
> +        unsigned int zone = page_to_zone(pg);
> +#endif
>           unsigned int nid = phys_to_nid(page_to_maddr(pg));
>           unsigned long left = nr_pages - i;
>           unsigned long contig_pages;
> @@ -1856,6 +1873,18 @@ static void init_heap_pages(
>            */
>           for ( contig_pages = 1; contig_pages < left; contig_pages++ )
>           {
> +            /*
> +             * No need to check for the zone when !CONFIG_SEPARATE_XENHEAP
> +             * because free_heap_pages() can only take power-of-two ranges
> +             * which never cross zone boundaries. But for separate xenheap
> +             * which is manually defined, it is possible for power-of-two
> +             * range to cross zones.
> +             */
> +#ifdef CONFIG_SEPARATE_XENHEAP
> +            if ( zone != page_to_zone(pg) )
> +                break;
> +#endif
> +
>               if ( nid != (phys_to_nid(page_to_maddr(pg))) )
>                   break;
>           }

Reviewed-by: Wei Chen <Wei.Chen@arm.com>
Jan Beulich July 18, 2022, 9:43 a.m. UTC | #2
On 15.07.2022 19:03, Julien Grall wrote:
> @@ -1806,8 +1806,22 @@ static void _init_heap_pages(const struct page_info *pg,
>  
>      while ( s < e )
>      {
> -        free_heap_pages(mfn_to_page(_mfn(s)), 0, need_scrub);
> -        s += 1UL;
> +        /*
> +         * For s == 0, we simply use the largest increment by checking the
> +         * MSB of the region size. For s != 0, we also need to ensure that the
> +         * chunk is properly sized to end at power-of-two alignment. We do this
> +         * by checking the LSB of the start address and use its index as
> +         * the increment. Both cases need to be guarded by MAX_ORDER.

s/guarded/bounded/ ?

> +         * Note that the value of ffsl() and flsl() starts from 1 so we need
> +         * to decrement it by 1.
> +         */
> +        int inc_order = min(MAX_ORDER, flsl(e - s) - 1);

unsigned int, since ...

> +        if ( s )
> +            inc_order = min(inc_order, ffsl(s) - 1);
> +        free_heap_pages(mfn_to_page(_mfn(s)), inc_order, need_scrub);

... that's what the function parameter says, and the value also can go
negative? Preferably with these adjustments
Reviewed-by: Jan Beulich <jbeulich@suse.com>

Jan
Julien Grall July 18, 2022, 10:24 a.m. UTC | #3
Hi Jan,

On 18/07/2022 10:43, Jan Beulich wrote:
> On 15.07.2022 19:03, Julien Grall wrote:
>> @@ -1806,8 +1806,22 @@ static void _init_heap_pages(const struct page_info *pg,
>>   
>>       while ( s < e )
>>       {
>> -        free_heap_pages(mfn_to_page(_mfn(s)), 0, need_scrub);
>> -        s += 1UL;
>> +        /*
>> +         * For s == 0, we simply use the largest increment by checking the
>> +         * MSB of the region size. For s != 0, we also need to ensure that the
>> +         * chunk is properly sized to end at power-of-two alignment. We do this
>> +         * by checking the LSB of the start address and use its index as
>> +         * the increment. Both cases need to be guarded by MAX_ORDER.
> 
> s/guarded/bounded/ ?
> 
>> +         * Note that the value of ffsl() and flsl() starts from 1 so we need
>> +         * to decrement it by 1.
>> +         */
>> +        int inc_order = min(MAX_ORDER, flsl(e - s) - 1);
> 
> unsigned int, since ...

MAX_ORDER, flsl() and ffsl() all return signed values. So inc_order 
should be 'int' to avoid any compilation issue.

> 
>> +        if ( s )
>> +            inc_order = min(inc_order, ffsl(s) - 1);
>> +        free_heap_pages(mfn_to_page(_mfn(s)), inc_order, need_scrub);
> 
> ... that's what the function parameter says, and the value also can go
> negative?

AFAICT, none of the values are negative. But per above, if we use min() 
then the local variable should be 'int'. The two possible alternatives are:
   1) Use min_t()
   2) Update MAX_ORDER, flsl(), ffsl() to return an unsigned value

The ideal would be #2 but it will require at least an extra patch and 
effort. If we use #1, then its uses may become stale if we implement #2.

So I would prefer to keep min(). That said, I would be open to using 
min_t() if you strongly prefer it.

> Preferably with these adjustments
> Reviewed-by: Jan Beulich <jbeulich@suse.com>

Thanks!

Cheers,
Jan Beulich July 18, 2022, 11:02 a.m. UTC | #4
On 18.07.2022 12:24, Julien Grall wrote:
> Hi Jan,
> 
> On 18/07/2022 10:43, Jan Beulich wrote:
>> On 15.07.2022 19:03, Julien Grall wrote:
>>> @@ -1806,8 +1806,22 @@ static void _init_heap_pages(const struct page_info *pg,
>>>   
>>>       while ( s < e )
>>>       {
>>> -        free_heap_pages(mfn_to_page(_mfn(s)), 0, need_scrub);
>>> -        s += 1UL;
>>> +        /*
>>> +         * For s == 0, we simply use the largest increment by checking the
>>> +         * MSB of the region size. For s != 0, we also need to ensure that the
>>> +         * chunk is properly sized to end at power-of-two alignment. We do this
>>> +         * by checking the LSB of the start address and use its index as
>>> +         * the increment. Both cases need to be guarded by MAX_ORDER.
>>
>> s/guarded/bounded/ ?
>>
>>> +         * Note that the value of ffsl() and flsl() starts from 1 so we need
>>> +         * to decrement it by 1.
>>> +         */
>>> +        int inc_order = min(MAX_ORDER, flsl(e - s) - 1);
>>
>> unsigned int, since ...
> 
> MAX_ORDER, flsl() and ffsl() all return signed values. So inc_order 
> should be 'int' to avoid any compilation issue.
> 
>>
>>> +        if ( s )
>>> +            inc_order = min(inc_order, ffsl(s) - 1);
>>> +        free_heap_pages(mfn_to_page(_mfn(s)), inc_order, need_scrub);
>>
>> ... that's what the function parameter says, and the value also can go
>> negative?
> 
> AFAICT, none of the values are negative. But per above, if we use min() 
> then the local variable should be 'int'. The two possible alternatives are:
>    1) Use min_t()
>    2) Update MAX_ORDER, flsl(), ffsl() to return an unsigned value
> 
> The ideal would be #2 but it will require at least an extra patch and 
> effort. If we use #1, then its uses may become stale if we implement #2.

I'm not sure #2 is a good option - we'd deviate from conventions used
elsewhere on what flsl() and ffsl() return. And constants would imo
best remain suffix-less unless a suffix is very relevant to have (for
MAX_ORDER, which isn't at risk of being subject to ~ or -, it may be
warranted).

3)
        unsigned int inc_order = min(MAX_ORDER, flsl(e - s) - 1);

        if ( s )
            inc_order = min(inc_order, ffsl(s) - 1U);

No compilation issues to expect here, afaict.
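
For context, a simplified sketch of the type check behind this, modelled
on the typeof-based min() used by Xen and Linux (illustrative, not the
exact macro; __builtin_ffsl stands in for ffsl()):

    #define min(x, y) ({                                      \
        const typeof(x) _x = (x);                             \
        const typeof(y) _y = (y);                             \
        (void)(&_x == &_y); /* warns when the types differ */ \
        _x < _y ? _x : _y;                                    \
    })

    static unsigned int clamp_order(unsigned int inc_order, unsigned long s)
    {
        /*
         * "__builtin_ffsl(s) - 1" would be plain int and trip the
         * pointer comparison above. With "- 1U" the int result is
         * promoted to unsigned int by the usual arithmetic
         * conversions, so both operands have the same type.
         */
        return min(inc_order, __builtin_ffsl((long)s) - 1U);
    }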

> So I would prefer to keep min(). That said, I would be open to using 
> min_t() if you strongly prefer it.

I consider max_t() / min_t() dangerous, so I'd like to limit their use
to cases where no good alternatives exist.
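
A sketch of the hazard alluded to here, under the common definition that
force-casts both operands (hypothetical example, not Xen's exact macro):

    #define min_t(type, x, y) ({ \
        type _x = (x);           \
        type _y = (y);           \
        _x < _y ? _x : _y;       \
    })

    /*
     * The cast silently truncates out-of-range values: 256 becomes 0
     * as an unsigned char, so the "minimum" of 256 and 3 is 0.
     */
    unsigned char v = min_t(unsigned char, 256, 3); /* v == 0, not 3 */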

Jan
Julien Grall July 18, 2022, 5:39 p.m. UTC | #5
Hi Jan,

On 18/07/2022 12:02, Jan Beulich wrote:
> On 18.07.2022 12:24, Julien Grall wrote:
> 3)
>          unsigned int inc_order = min(MAX_ORDER, flsl(e - s) - 1);
> 
>          if ( s )
>              inc_order = min(inc_order, ffsl(s) - 1U);

I like this idea!

> 
> No compilation issues to expect here, afaict.

Correct, GCC is happy with this approach. Assuming there are no other 
comments, are you happy if I make the change on commit?

Cheers,
Jan Beulich July 19, 2022, 6:01 a.m. UTC | #6
On 18.07.2022 19:39, Julien Grall wrote:
> On 18/07/2022 12:02, Jan Beulich wrote:
>> On 18.07.2022 12:24, Julien Grall wrote:
>> 3)
>>          unsigned int inc_order = min(MAX_ORDER, flsl(e - s) - 1);
>>
>>          if ( s )
>>              inc_order = min(inc_order, ffsl(s) - 1U);
> 
> I like this idea!
> 
>>
>> No compilation issues to expect here, afaict.
> 
> Correct, GCC is happy with this approach. Assuming there are no other 
> comments, are you happy if I make the change on commit?

Sure - iirc I gave my R-b already.

Jan
Julien Grall July 20, 2022, 6:27 p.m. UTC | #7
Hi Jan,

On 19/07/2022 07:01, Jan Beulich wrote:
> On 18.07.2022 19:39, Julien Grall wrote:
>> On 18/07/2022 12:02, Jan Beulich wrote:
>>> On 18.07.2022 12:24, Julien Grall wrote:
>>> 3)
>>>           unsigned int inc_order = min(MAX_ORDER, flsl(e - s) - 1);
>>>
>>>           if ( s )
>>>               inc_order = min(inc_order, ffsl(s) - 1U);
>>
>> I like this idea!
>>
>>>
>>> No compilation issues to expect here, afaict.
>>
>> Correct, GCC is happy with this approach. Assuming there are no other
>> comments, are you happy if I make the change on commit?
> 
> Sure - iirc I gave my R-b already.

You did. I just wanted to make sure your Reviewed-by remained valid 
with your proposal.

I have now committed the series.

Cheers,

Patch

diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index eedb2fed77c3..2b99801d2ea3 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -1779,7 +1779,7 @@  int query_page_offline(mfn_t mfn, uint32_t *status)
 
 /*
  * This function should only be called with valid pages from the same NUMA
- * node.
+ * node and zone.
  */
 static void _init_heap_pages(const struct page_info *pg,
                              unsigned long nr_pages,
@@ -1806,8 +1806,22 @@  static void _init_heap_pages(const struct page_info *pg,
 
     while ( s < e )
     {
-        free_heap_pages(mfn_to_page(_mfn(s)), 0, need_scrub);
-        s += 1UL;
+        /*
+         * For s == 0, we simply use the largest increment by checking the
+         * MSB of the region size. For s != 0, we also need to ensure that the
+         * chunk is properly sized to end at power-of-two alignment. We do this
+         * by checking the LSB of the start address and use its index as
+         * the increment. Both cases need to be guarded by MAX_ORDER.
+         *
+         * Note that the value of ffsl() and flsl() starts from 1 so we need
+         * to decrement it by 1.
+         */
+        int inc_order = min(MAX_ORDER, flsl(e - s) - 1);
+
+        if ( s )
+            inc_order = min(inc_order, ffsl(s) - 1);
+        free_heap_pages(mfn_to_page(_mfn(s)), inc_order, need_scrub);
+        s += (1UL << inc_order);
     }
 }
 
@@ -1844,6 +1858,9 @@  static void init_heap_pages(
 
     for ( i = 0; i < nr_pages; )
     {
+#ifdef CONFIG_SEPARATE_XENHEAP
+        unsigned int zone = page_to_zone(pg);
+#endif
         unsigned int nid = phys_to_nid(page_to_maddr(pg));
         unsigned long left = nr_pages - i;
         unsigned long contig_pages;
@@ -1856,6 +1873,18 @@  static void init_heap_pages(
          */
         for ( contig_pages = 1; contig_pages < left; contig_pages++ )
         {
+            /*
+             * No need to check for the zone when !CONFIG_SEPARATE_XENHEAP
+             * because free_heap_pages() can only take power-of-two ranges
+             * which never cross zone boundaries. But for separate xenheap
+             * which is manually defined, it is possible for power-of-two
+             * range to cross zones.
+             */
+#ifdef CONFIG_SEPARATE_XENHEAP
+            if ( zone != page_to_zone(pg) )
+                break;
+#endif
+
             if ( nid != (phys_to_nid(page_to_maddr(pg))) )
                 break;
         }