[0/2] xen/mm: Optimize init_heap_pages()

Message ID: 20220609083039.76667-1-julien@xen.org

Message

Julien Grall June 9, 2022, 8:30 a.m. UTC
From: Julien Grall <jgrall@amazon.com>

Hi all,

As part of the Live-Update work, we noticed that a big part of Xen boot
is spent adding pages to the heap. For instance, when running Xen in a
nested environment on a c5.metal, it takes ~1.5s.

This small series reworks init_heap_pages() to hand pages to
free_heap_pages() in chunks rather than one by one.
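
To illustrate the idea, here is a rough sketch of the chunking loop
(not the exact code in the patches: free_range() is a made-up helper,
scrubbing is ignored, and the real code also has to respect node/zone
boundaries):

static void __init free_range(struct page_info *pg, unsigned long nr)
{
    while ( nr )
    {
        unsigned long mfn = mfn_x(page_to_mfn(pg));
        unsigned int order = MAX_ORDER;

        /*
         * Pick the largest order such that the chunk is naturally
         * aligned and does not overrun the range.
         */
        while ( order &&
                ((mfn & ((1UL << order) - 1)) || ((1UL << order) > nr)) )
            order--;

        /* Hand over 2^order pages in one call instead of 2^order calls. */
        free_heap_pages(pg, order, false);

        pg += 1UL << order;
        nr -= 1UL << order;
    }
}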

With this approach, the time spent initialising the heap is down
to 166 ms in the setup mentioned above.

There is potentially one more optimization that would further reduce
the time spent: the new approach accesses the page information multiple
times, in separate loops that can potentially be large.

Cheers,

Hongyan Xia (1):
  xen/heap: pass order to free_heap_pages() in heap init

Julien Grall (1):
  xen/heap: Split init_heap_pages() in two

 xen/common/page_alloc.c | 109 ++++++++++++++++++++++++++++++----------
 1 file changed, 82 insertions(+), 27 deletions(-)

Comments

Julien Grall June 10, 2022, 9:36 a.m. UTC | #1
On 09/06/2022 09:30, Julien Grall wrote:
> From: Julien Grall <jgrall@amazon.com>
> 
> Hi all,
> 
> As part of the Live-Update work, we noticed that a big part of Xen boot
> is spent adding pages to the heap. For instance, when running Xen in a
> nested environment on a c5.metal, it takes ~1.5s.

On IRC, Bertrand asked me how I measured the time taken here. I will
share it on xen-devel so everyone can use it. Note the patch below is
x86-specific, but could easily be adapted for Arm (a rough, untested
sketch follows the patch).

diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index 53a73010e029..d99b9f3abf5e 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -615,10 +615,16 @@ static inline bool using_2M_mapping(void)
             !l1_table_offset((unsigned long)__2M_rwdata_end);
  }

+extern uint64_t myticks;
+
  static void noreturn init_done(void)
  {
      void *va;
      unsigned long start, end;
+    uint64_t elapsed = tsc_ticks2ns(myticks);
+
+    printk("elapsed %lu ms %lu ns\n", elapsed / MILLISECS(1),
+           elapsed % MILLISECS(1));

      system_state = SYS_STATE_active;

diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index ea59cd1a4aba..3e6504283f1e 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -1865,9 +1865,12 @@ static unsigned long avail_heap_pages(
      return free_pages;
  }

+uint64_t myticks;
+
  void __init end_boot_allocator(void)
  {
      unsigned int i;
+    uint64_t stsc = rdtsc_ordered();

      /* Pages that are free now go to the domain sub-allocator. */
      for ( i = 0; i < nr_bootmem_regions; i++ )
@@ -1892,6 +1895,8 @@ void __init end_boot_allocator(void)
      if ( !dma_bitsize && (num_online_nodes() > 1) )
          dma_bitsize = arch_get_dma_bitsize();

+    myticks = rdtsc_ordered() - stsc;
+
      printk("Domain heap initialised");
      if ( dma_bitsize )
          printk(" DMA width %u bits", dma_bitsize);
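
For Arm, the same kind of hack should work with the generic timer.
Something like the below (untested sketch; it assumes get_cycles() and
ticks_to_ns() behave sensibly by the time end_boot_allocator() runs,
and common/page_alloc.c may need an extra #include <asm/time.h>):

    /* In end_boot_allocator(), instead of rdtsc_ordered(): */
    uint64_t stsc = get_cycles();
    ...
    myticks = get_cycles() - stsc;

    /* And when printing, late in boot on the Arm side: */
    uint64_t elapsed = ticks_to_ns(myticks);

    printk("elapsed %lu ms %lu ns\n", elapsed / MILLISECS(1),
           elapsed % MILLISECS(1));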

Cheers,