Message ID | 20240513134046.82605-12-eliasely@amazon.com
---|---
State | New
Series | Remove the directmap
On Mon, May 13, 2024 at 01:40:38PM +0000, Elias El Yandouzi wrote:
> From: Hongyan Xia <hongyxia@amazon.com>
>
> When we do not have a direct map, memory for metadata of heap nodes in
> init_node_heap() is allocated from xenheap, which needs to be mapped and
> unmapped on demand. However, we cannot just take memory from the boot
> allocator to create the PTEs while we are passing memory to the heap
> allocator.
>
> To solve this race, we leave early boot slightly sooner so that Xen PTE
> pages are allocated from the heap instead of the boot allocator. We can
> do this because the metadata for the 1st node is statically allocated,
> and by the time we need memory to create mappings for the 2nd node, we
> already have enough memory in the heap allocator in the 1st node.
>
> Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
> Signed-off-by: Julien Grall <jgrall@amazon.com>
> Signed-off-by: Elias El Yandouzi <eliasely@amazon.com>
>
> diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
> index bd6b1184f5..f26c9799e4 100644
> --- a/xen/arch/x86/setup.c
> +++ b/xen/arch/x86/setup.c
> @@ -1751,6 +1751,22 @@ void asmlinkage __init noreturn __start_xen(unsigned long mbi_p)
>
>      numa_initmem_init(0, raw_max_page);
>
> +    /*
> +     * When we do not have a direct map, memory for metadata of heap nodes in
> +     * init_node_heap() is allocated from xenheap, which needs to be mapped and
> +     * unmapped on demand.

Hm, maybe I'm confused, but isn't xenheap memory supposed to be always
mapped when in use?  In one of the previous patches xenheap memory is
unconditionally mapped in alloc_xenheap_pages().

IMO, this would better be worded as: "... is allocated from xenheap,
which needs to be mapped at allocation and unmapped when freed."

> However, we cannot just take memory from the boot
> +     * allocator to create the PTEs while we are passing memory to the heap
> +     * allocator during end_boot_allocator().
Could you elaborate here, I don't obviously see why we can't consume
memory from the boot allocator.  Is it because under certain conditions
we might try to allocate memory from the boot allocator in order to
fulfill a call to map_pages_to_xen() and find the boot allocator empty?

Thanks, Roger.
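[Editorial note: the window Roger is asking about can be illustrated with a toy model. This is not Xen code; the counters, the helper names, and the trigger condition are all invented for illustration. The point is only the circular dependency: while end_boot_allocator() is draining boot pages into the heap, a map_pages_to_xen() call that needs a fresh PTE page would, during early boot, go back to a boot allocator that may already be empty.]

```c
#include <assert.h>
#include <stdbool.h>

static int boot_pages = 4;     /* pages still owned by the boot allocator */
static int heap_pages = 0;     /* pages already handed to the heap */
static bool early_boot = true; /* models system_state < SYS_STATE_boot */

/* PTE pages come from the boot allocator during early boot, else the heap. */
static bool alloc_pte_page(void)
{
    if ( early_boot )
    {
        if ( boot_pages == 0 )
            return false;      /* the race: boot allocator already drained */
        boot_pages--;
        return true;
    }
    if ( heap_pages == 0 )
        return false;
    heap_pages--;
    return true;
}

/*
 * Hand every boot page to the heap; once the last page has moved,
 * mapping the next node's metadata still needs one more PTE page.
 */
static bool end_boot_allocator_model(void)
{
    while ( boot_pages > 0 )
    {
        boot_pages--;
        heap_pages++;
        if ( boot_pages == 0 && !alloc_pte_page() )
            return false;      /* nothing left to build the PTE from */
    }
    return true;
}
```

In this model the PTE allocation fails while `early_boot` is true, and succeeds once the allocation source is switched to the heap first, which is what moving `system_state = SYS_STATE_boot` earlier achieves in the patch.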
On 13.05.2024 15:40, Elias El Yandouzi wrote:
> --- a/xen/arch/x86/setup.c
> +++ b/xen/arch/x86/setup.c
> @@ -1751,6 +1751,22 @@ void asmlinkage __init noreturn __start_xen(unsigned long mbi_p)
>
>      numa_initmem_init(0, raw_max_page);
>
> +    /*
> +     * When we do not have a direct map, memory for metadata of heap nodes in
> +     * init_node_heap() is allocated from xenheap, which needs to be mapped and
> +     * unmapped on demand. However, we cannot just take memory from the boot
> +     * allocator to create the PTEs while we are passing memory to the heap
> +     * allocator during end_boot_allocator().
> +     *
> +     * To solve this race, we need to leave early boot before
> +     * end_boot_allocator() so that Xen PTE pages are allocated from the heap
> +     * instead of the boot allocator. We can do this because the metadata for
> +     * the 1st node is statically allocated, and by the time we need memory to
> +     * create mappings for the 2nd node, we already have enough memory in the
> +     * heap allocator in the 1st node.
> +     */
> +    system_state = SYS_STATE_boot;
> +
>      if ( max_page - 1 > virt_to_mfn(HYPERVISOR_VIRT_END - 1) )
>      {
>          unsigned long lo = virt_to_mfn(HYPERVISOR_VIRT_END - 1);
> @@ -1782,8 +1798,6 @@ void asmlinkage __init noreturn __start_xen(unsigned long mbi_p)
>      else
>          end_boot_allocator();
>
> -    system_state = SYS_STATE_boot;
> -
>      bsp_stack = cpu_alloc_stack(0);
>      if ( !bsp_stack )
>          panic("No memory for BSP stack\n");

I'm pretty wary of this movement, even more so when Arm isn't switched
at the same time. It has (virtually?) always been the case that this
state switch happens _after_ end_boot_allocator(), and I wouldn't be
surprised if there was a dependency on that somewhere. I realize you've
been telling us that at Amazon you've been running with an earlier
variant of these changes for a long time, and you not having hit issues
with this is a good sign. But I'm afraid it's not a proof.
As to possible alternatives - as pointed out by Roger, the comment /
patch description aren't entirely clear as to what exactly needs working
around. One possibility might be to introduce an x86-only boolean
controlling from when on to use the heap allocator for page table
allocations, thus decoupling that from system_state.

Jan
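[Editorial note: a rough sketch of the alternative Jan suggests, not code from the series. The flag name `pt_alloc_from_heap` and the helper `alloc_pt_page()` are invented here; the idea is just that the page-table allocation path keys off a dedicated x86-only boolean, flipped right before end_boot_allocator(), so that `system_state` can stay where it has always been.]

```c
#include <assert.h>
#include <stdbool.h>

/* x86-only switch, independent of system_state (name is hypothetical). */
static bool pt_alloc_from_heap;

enum pt_source { PT_FROM_BOOT, PT_FROM_HEAP };

/* The PTE allocation path checks the flag instead of system_state. */
static enum pt_source alloc_pt_page(void)
{
    return pt_alloc_from_heap ? PT_FROM_HEAP : PT_FROM_BOOT;
}

/*
 * In __start_xen(), the flag would be flipped just before
 * end_boot_allocator(), leaving the state switch untouched:
 *
 *     pt_alloc_from_heap = true;
 *     end_boot_allocator();
 *     ...
 *     system_state = SYS_STATE_boot;  // stays after end_boot_allocator()
 */
```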
diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index bd6b1184f5..f26c9799e4 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -1751,6 +1751,22 @@ void asmlinkage __init noreturn __start_xen(unsigned long mbi_p)
 
     numa_initmem_init(0, raw_max_page);
 
+    /*
+     * When we do not have a direct map, memory for metadata of heap nodes in
+     * init_node_heap() is allocated from xenheap, which needs to be mapped and
+     * unmapped on demand. However, we cannot just take memory from the boot
+     * allocator to create the PTEs while we are passing memory to the heap
+     * allocator during end_boot_allocator().
+     *
+     * To solve this race, we need to leave early boot before
+     * end_boot_allocator() so that Xen PTE pages are allocated from the heap
+     * instead of the boot allocator. We can do this because the metadata for
+     * the 1st node is statically allocated, and by the time we need memory to
+     * create mappings for the 2nd node, we already have enough memory in the
+     * heap allocator in the 1st node.
+     */
+    system_state = SYS_STATE_boot;
+
     if ( max_page - 1 > virt_to_mfn(HYPERVISOR_VIRT_END - 1) )
     {
         unsigned long lo = virt_to_mfn(HYPERVISOR_VIRT_END - 1);
@@ -1782,8 +1798,6 @@ void asmlinkage __init noreturn __start_xen(unsigned long mbi_p)
     else
         end_boot_allocator();
 
-    system_state = SYS_STATE_boot;
-
     bsp_stack = cpu_alloc_stack(0);
     if ( !bsp_stack )
         panic("No memory for BSP stack\n");