Message ID | a26bc4aeba89f7895c79df7e320adfc695b16d50.1714322424.git.w1benny@gmail.com |
---|---|
State | Superseded |
Series | x86: Make MAX_ALTP2M configurable |
On 28.04.2024 18:52, Petr Beneš wrote:
> From: Petr Beneš <w1benny@gmail.com>
>
> This change anticipates scenarios where `max_altp2m` is set to its maximum
> supported value (i.e., 512), ensuring sufficient memory is allocated upfront
> to accommodate all altp2m tables without initialization failure.

And guests with fewer or even no altp2m-s still need the same bump? You
know the number of altp2m-s upon domain creation, so why bump by any more
than what's strictly needed for that?

> The necessity for this increase arises from the current mechanism where altp2m
> tables are allocated at initialization, requiring one page from the mempool
> for each altp2m view.

So that's the p2m_alloc_table() out of hap_enable()? If you're permitting
up to 512 altp2m-s, I think it needs considering to not waste up to 2Mb
without knowing how many of the altp2m-s are actually going to be used.
How complicated on-demand allocation would be I can't tell though, I have
to admit.

> --- a/tools/tests/paging-mempool/test-paging-mempool.c
> +++ b/tools/tests/paging-mempool/test-paging-mempool.c
> @@ -35,7 +35,7 @@ static struct xen_domctl_createdomain create = {
>
>  static uint64_t default_mempool_size_bytes =
>  #if defined(__x86_64__) || defined(__i386__)
> -    256 << 12; /* Only x86 HAP for now. x86 Shadow needs more work. */
> +    1024 << 12; /* Only x86 HAP for now. x86 Shadow needs more work. */

I also can't derive from the description why we'd need to go from 256 to
1024 here and ...

> --- a/xen/arch/x86/mm/hap/hap.c
> +++ b/xen/arch/x86/mm/hap/hap.c
> @@ -468,7 +468,7 @@ int hap_enable(struct domain *d, u32 mode)
>      if ( old_pages == 0 )
>      {
>          paging_lock(d);
> -        rv = hap_set_allocation(d, 256, NULL);
> +        rv = hap_set_allocation(d, 1024, NULL);

... here. You talk of (up to) 512 pages there only.

Also isn't there at least one more place where the tool stack (libxl I
think) would need changing, where Dom0 ballooning needs are calculated?
And/or doesn't the pool size have a default calculation in the tool
stack, too?

Jan
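For illustration, the on-demand scheme Jan alludes to could defer each altp2m root-table allocation until the view is first activated, instead of allocating all of them in hap_enable(). The sketch below is not existing Xen code: the helper name is made up, and locking and error unwinding are deliberately omitted.

    /*
     * Illustrative sketch only -- not existing Xen code.  Allocate an altp2m
     * view's root table lazily, on first activation, rather than for every
     * possible view in hap_enable().  Locking is omitted.
     */
    static int altp2m_ensure_table(struct domain *d, unsigned int idx)
    {
        struct p2m_domain *p2m = d->arch.altp2m_p2m[idx];

        /* Root table already present -> nothing to do. */
        if ( !pagetable_is_null(p2m_get_pagetable(p2m)) )
            return 0;

        /* Takes a single page from the paging mempool, as hap_enable() does today. */
        return p2m_alloc_table(p2m);
    }

Whether such a helper could simply be slotted into the altp2m activation path, and under which locks, is exactly the complication being discussed.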
On Tue, Apr 30, 2024 at 4:47 PM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 28.04.2024 18:52, Petr Beneš wrote:
> > From: Petr Beneš <w1benny@gmail.com>
> >
> > This change anticipates scenarios where `max_altp2m` is set to its maximum
> > supported value (i.e., 512), ensuring sufficient memory is allocated upfront
> > to accommodate all altp2m tables without initialization failure.
>
> And guests with fewer or even no altp2m-s still need the same bump? You
> know the number of altp2m-s upon domain creation, so why bump by any more
> than what's strictly needed for that?

I have to admit I've considered computing the value which goes to
hap_set_allocation() by simply adding 256 + max_altp2m, but that felt so
arbitrary - the 256 value itself feels arbitrary, as I haven't found any
reasoning for it anywhere.

I have also tried to make code changes to make the initial allocation
size configurable via libxl (possibly reusing shadow_memkb) - which
seemed to me like the "correct" solution, but those changes were more
complicated than I had anticipated and I would definitely not make it
till the 4.19 deadline.

Question is, what to do now? Should I change it to 256 + max_altp2m?

> > The necessity for this increase arises from the current mechanism where altp2m
> > tables are allocated at initialization, requiring one page from the mempool
> > for each altp2m view.
>
> So that's the p2m_alloc_table() out of hap_enable()? If you're permitting
> up to 512 altp2m-s, I think it needs considering to not waste up to 2Mb
> without knowing how many of the altp2m-s are actually going to be used.

Yes, and I ultimately agree.

> How complicated on-demand allocation would be I can't tell though, I have
> to admit.

That's also a fix I've been trying to work on - to make the whole altp2m
allocation on-demand. Unfortunately, I didn't make it in time.

> > --- a/tools/tests/paging-mempool/test-paging-mempool.c
> > +++ b/tools/tests/paging-mempool/test-paging-mempool.c
> > @@ -35,7 +35,7 @@ static struct xen_domctl_createdomain create = {
> >
> >  static uint64_t default_mempool_size_bytes =
> >  #if defined(__x86_64__) || defined(__i386__)
> > -    256 << 12; /* Only x86 HAP for now. x86 Shadow needs more work. */
> > +    1024 << 12; /* Only x86 HAP for now. x86 Shadow needs more work. */
>
> I also can't derive from the description why we'd need to go from 256 to
> 1024 here and ...

It's explained in the code a few lines below:

    /*
     * Check that the domain has the expected default allocation size. This
     * will fail if the logic in Xen is altered without an equivalent
     * adjustment here.
     */

I have verified that default_mempool_size_bytes must reflect the number
of pages in the initial hap_set_allocation() call.

Is it something I should include in the commit message, too?

> > --- a/xen/arch/x86/mm/hap/hap.c
> > +++ b/xen/arch/x86/mm/hap/hap.c
> > @@ -468,7 +468,7 @@ int hap_enable(struct domain *d, u32 mode)
> >      if ( old_pages == 0 )
> >      {
> >          paging_lock(d);
> > -        rv = hap_set_allocation(d, 256, NULL);
> > +        rv = hap_set_allocation(d, 1024, NULL);
>
> ... here. You talk of (up to) 512 pages there only.
>
> Also isn't there at least one more place where the tool stack (libxl I
> think) would need changing, where Dom0 ballooning needs are calculated?
> And/or doesn't the pool size have a default calculation in the tool
> stack, too?

I have found places in libxl where the mempool size is calculated, but
that mempool size is then set AFTER the domain is created, via
xc_set_paging_mempool_size().

In my opinion it doesn't necessarily require a change, since the user is
expected to set it manually via shadow_memkb. The only current problem
(which this commit is trying to fix) is that setting shadow_memkb doesn't
help when max_altp2m > (256 - 1 + vcpus + MAX_NESTEDP2M), since the
initial mempool size is hardcoded.

I didn't find any other places that would require reflecting the current
"256" value.

P.
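The "256 + max_altp2m" option Petr asks about would amount to roughly the fragment below inside hap_enable(). This is a sketch only, not the posted patch; the per-domain view count (spelled d->nr_altp2m here) is an assumed field for illustration.

    if ( old_pages == 0 )
    {
        /* 256 is the historic baseline; add one root page per altp2m view.
         * d->nr_altp2m is an assumed field holding the configured count. */
        unsigned int pages = 256 + d->nr_altp2m;

        paging_lock(d);
        rv = hap_set_allocation(d, pages, NULL);
        if ( rv != 0 )
        {
            hap_set_allocation(d, 0, NULL);
            paging_unlock(d);
            goto out;
        }
        paging_unlock(d);
    }

The point is only the computed page count; the surrounding lock and error handling are meant to follow the shape of the existing code.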
On 30.04.2024 17:40, Petr Beneš wrote:
> On Tue, Apr 30, 2024 at 4:47 PM Jan Beulich <jbeulich@suse.com> wrote:
>>
>> On 28.04.2024 18:52, Petr Beneš wrote:
>>> From: Petr Beneš <w1benny@gmail.com>
>>>
>>> This change anticipates scenarios where `max_altp2m` is set to its maximum
>>> supported value (i.e., 512), ensuring sufficient memory is allocated upfront
>>> to accommodate all altp2m tables without initialization failure.
>>
>> And guests with fewer or even no altp2m-s still need the same bump? You
>> know the number of altp2m-s upon domain creation, so why bump by any more
>> than what's strictly needed for that?
>
> I have to admit I've considered computing the value which goes to
> hap_set_allocation() by simply adding 256 + max_altp2m, but that felt so
> arbitrary - the 256 value itself feels arbitrary, as I haven't found any
> reasoning for it anywhere.
>
> I have also tried to make code changes to make the initial allocation
> size configurable via libxl (possibly reusing shadow_memkb) - which
> seemed to me like the "correct" solution, but those changes were more
> complicated than I had anticipated and I would definitely not make it
> till the 4.19 deadline.
>
> Question is, what to do now? Should I change it to 256 + max_altp2m?

Counter question: Is accounting for just the root page table really
enough? Meaning to say: I'm not convinced that minimum would really
be appropriate for altp2m use even before your changes. You growing
the number of root page tables _always_ required just makes things
worse even without considering how (many) altp2m-s are then going
to be used. Such an issue, if I'm right with this, would imo want
addressing up front, in a separate patch.

You may also want to take a look at the shadow side of things, even
if for altp2m shadow mode may not be relevant. Things like
sh_min_allocation() and shadow_min_acceptable_pages() may well need
proper counterparts in HAP now.

>>> --- a/tools/tests/paging-mempool/test-paging-mempool.c
>>> +++ b/tools/tests/paging-mempool/test-paging-mempool.c
>>> @@ -35,7 +35,7 @@ static struct xen_domctl_createdomain create = {
>>>
>>>  static uint64_t default_mempool_size_bytes =
>>>  #if defined(__x86_64__) || defined(__i386__)
>>> -    256 << 12; /* Only x86 HAP for now. x86 Shadow needs more work. */
>>> +    1024 << 12; /* Only x86 HAP for now. x86 Shadow needs more work. */
>>
>> I also can't derive from the description why we'd need to go from 256 to
>> 1024 here and ...
>
> It's explained in the code a few lines below:
>
>     /*
>      * Check that the domain has the expected default allocation size. This
>      * will fail if the logic in Xen is altered without an equivalent
>      * adjustment here.
>      */
>
> I have verified that default_mempool_size_bytes must reflect the number
> of pages in the initial hap_set_allocation() call.
>
> Is it something I should include in the commit message, too?

Well, yes and no. My question wasn't about the "why", but ...

>>> --- a/xen/arch/x86/mm/hap/hap.c
>>> +++ b/xen/arch/x86/mm/hap/hap.c
>>> @@ -468,7 +468,7 @@ int hap_enable(struct domain *d, u32 mode)
>>>      if ( old_pages == 0 )
>>>      {
>>>          paging_lock(d);
>>> -        rv = hap_set_allocation(d, 256, NULL);
>>> +        rv = hap_set_allocation(d, 1024, NULL);
>>
>> ... here. You talk of (up to) 512 pages there only.

... about the "by how many".

>> Also isn't there at least one more place where the tool stack (libxl I
>> think) would need changing, where Dom0 ballooning needs are calculated?
>> And/or doesn't the pool size have a default calculation in the tool
>> stack, too?
>
> I have found places in libxl where the mempool size is calculated, but
> that mempool size is then set AFTER the domain is created, via
> xc_set_paging_mempool_size().
>
> In my opinion it doesn't necessarily require a change, since the user is
> expected to set it manually via shadow_memkb. The only current problem
> (which this commit is trying to fix) is that setting shadow_memkb doesn't
> help when max_altp2m > (256 - 1 + vcpus + MAX_NESTEDP2M), since the
> initial mempool size is hardcoded.

Wait - are you saying the guest config value isn't respected in certain
cases? That would be another thing wanting to be fixed separately, up
front.

Jan
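A HAP counterpart to sh_min_allocation(), as Jan suggests considering, might look roughly like this. It is a hypothetical helper, not something present in the tree; the d->nr_altp2m field is likewise assumed, and the exact accounting (monitor tables, log-dirty structures, ...) would need auditing against the real HAP code.

    /*
     * Hypothetical hap_min_allocation() -- illustration only.  Counts the
     * pages a HAP domain needs before it can run at all: the host p2m root,
     * the nested p2m roots, one root page per configured altp2m view, and
     * some per-vCPU overhead (e.g. monitor tables).
     */
    static unsigned int hap_min_allocation(const struct domain *d)
    {
        return 1                   /* host p2m root table */
               + MAX_NESTEDP2M     /* nested p2m root tables */
               + d->nr_altp2m      /* assumed field: configured altp2m views */
               + d->max_vcpus;     /* per-vCPU allocations from the pool */
    }

Such a helper would give both hap_enable() and the mempool-size setting path a single place to reject pool sizes that cannot possibly suffice.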
On Thu, May 2, 2024 at 8:36 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 30.04.2024 17:40, Petr Beneš wrote:
> > On Tue, Apr 30, 2024 at 4:47 PM Jan Beulich <jbeulich@suse.com> wrote:
> >>
> >> On 28.04.2024 18:52, Petr Beneš wrote:
> >>> From: Petr Beneš <w1benny@gmail.com>
> >>>
> >>> This change anticipates scenarios where `max_altp2m` is set to its maximum
> >>> supported value (i.e., 512), ensuring sufficient memory is allocated upfront
> >>> to accommodate all altp2m tables without initialization failure.
> >>
> >> And guests with fewer or even no altp2m-s still need the same bump? You
> >> know the number of altp2m-s upon domain creation, so why bump by any more
> >> than what's strictly needed for that?
> >
> > I have to admit I've considered computing the value which goes to
> > hap_set_allocation() by simply adding 256 + max_altp2m, but that felt so
> > arbitrary - the 256 value itself feels arbitrary, as I haven't found any
> > reasoning for it anywhere.
> >
> > I have also tried to make code changes to make the initial allocation
> > size configurable via libxl (possibly reusing shadow_memkb) - which
> > seemed to me like the "correct" solution, but those changes were more
> > complicated than I had anticipated and I would definitely not make it
> > till the 4.19 deadline.
> >
> > Question is, what to do now? Should I change it to 256 + max_altp2m?
>
> Counter question: Is accounting for just the root page table really
> enough? Meaning to say: I'm not convinced that minimum would really
> be appropriate for altp2m use even before your changes. You growing
> the number of root page tables _always_ required just makes things
> worse even without considering how (many) altp2m-s are then going
> to be used. Such an issue, if I'm right with this, would imo want
> addressing up front, in a separate patch.

It is enough - at least based on my experiments. I'll try to explain it
below.

> >> Also isn't there at least one more place where the tool stack (libxl I
> >> think) would need changing, where Dom0 ballooning needs are calculated?
> >> And/or doesn't the pool size have a default calculation in the tool
> >> stack, too?
> >
> > I have found places in libxl where the mempool size is calculated, but
> > that mempool size is then set AFTER the domain is created, via
> > xc_set_paging_mempool_size().
> >
> > In my opinion it doesn't necessarily require a change, since the user is
> > expected to set it manually via shadow_memkb. The only current problem
> > (which this commit is trying to fix) is that setting shadow_memkb doesn't
> > help when max_altp2m > (256 - 1 + vcpus + MAX_NESTEDP2M), since the
> > initial mempool size is hardcoded.
>
> Wait - are you saying the guest config value isn't respected in certain
> cases? That would be another thing wanting to be fixed separately, up
> front.

The xc_set_paging_mempool_size() call is still made within domain_create,
so the value of shadow_memkb is respected before any of the guest code is
run. My point was that shadow_memkb isn't respected here:

>>> --- a/xen/arch/x86/mm/hap/hap.c
>>> +++ b/xen/arch/x86/mm/hap/hap.c
>>> @@ -468,7 +468,7 @@ int hap_enable(struct domain *d, u32 mode)
>>>      if ( old_pages == 0 )
>>>      {
>>>          paging_lock(d);
>>> -        rv = hap_set_allocation(d, 256, NULL);
>>> +        rv = hap_set_allocation(d, 1024, NULL);

This code (+ the root altp2m allocations) is executed before libxl sends
the shadow_memkb value. In other words, the sequence is the following:

    libxl:
    ------
    do_domain_create
      initiate_domain_create
        libxl__domain_make
          xc_domain_create            // MAX_ALTP2M is passed here
                                      // and hap_enable is called
        dcs->bl.callback = domcreate_bootloader_done

    domcreate_bootloader_done
      libxl__domain_build
        libxl__build_pre
          libxl__arch_domain_create
            libxl__domain_set_paging_mempool_size
              xc_set_paging_mempool_size(shadow_mem)

    xen (xc_domain_create cont.):
    -----------------------------
    domain_create
      arch_domain_create
        hvm_domain_initialise
          paging_enable
            hap_enable
              // note that shadow_mem (from the config) hasn't been
              // provided yet
              hap_set_allocation(d, 1024, NULL);
              p2m_alloc_table(p2m_get_hostp2m(d));
              p2m_alloc_table(d->arch.nested_p2m[i..MAX_NESTEDP2M]);
              p2m_alloc_table(d->arch.altp2m_p2m[i..MAX_ALTP2M]);

(I hope the email will preserve the spacing...)

Based on this, I would argue that shadow_memkb should also be part of
xc_domain_create/xen_domctl_createdomain. Which is why I've said in the
previous email:

> > I have also tried to make code changes to make the initial allocation
> > size configurable via libxl (possibly reusing shadow_memkb) - which
> > seemed to me like the "correct" solution, but those changes were more
> > complicated than I had anticipated and I would definitely not make it
> > till the 4.19 deadline.

P.
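The direction Petr argues for - making the configured pool size available already at xc_domain_create() time - would mean extending the create-domain interface, roughly along the lines below. Both fields are hypothetical; nothing of the sort exists in the interface at the time of this thread, and the helper named in the trailing comment is equally made up.

    /*
     * Hypothetical extension of the create-domain interface -- illustration
     * only.  Carrying the pool size into xc_domain_create() would let
     * hap_enable() size its initial allocation before the host/nested/altp2m
     * tables are allocated, instead of relying on a hardcoded constant.
     */
    struct xen_domctl_createdomain {
        /* ... existing fields ... */
        uint32_t altp2m_count;          /* assumed: number of altp2m views   */
        uint64_aligned_t paging_memkb;  /* assumed: initial paging pool size */
    };

    /* hap_enable() could then do something like:                            */
    /*     rv = hap_set_allocation(d, paging_memkb_to_pages(d), NULL);       */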
diff --git a/tools/tests/paging-mempool/test-paging-mempool.c b/tools/tests/paging-mempool/test-paging-mempool.c
index 1ebc13455a..91b06fa0cf 100644
--- a/tools/tests/paging-mempool/test-paging-mempool.c
+++ b/tools/tests/paging-mempool/test-paging-mempool.c
@@ -35,7 +35,7 @@ static struct xen_domctl_createdomain create = {
 
 static uint64_t default_mempool_size_bytes =
 #if defined(__x86_64__) || defined(__i386__)
-    256 << 12; /* Only x86 HAP for now. x86 Shadow needs more work. */
+    1024 << 12; /* Only x86 HAP for now. x86 Shadow needs more work. */
 #elif defined (__arm__) || defined(__aarch64__)
     16 << 12;
 #endif
diff --git a/xen/arch/x86/mm/hap/hap.c b/xen/arch/x86/mm/hap/hap.c
index 7aff5fa664..fab7e256a4 100644
--- a/xen/arch/x86/mm/hap/hap.c
+++ b/xen/arch/x86/mm/hap/hap.c
@@ -468,7 +468,7 @@ int hap_enable(struct domain *d, u32 mode)
     if ( old_pages == 0 )
     {
         paging_lock(d);
-        rv = hap_set_allocation(d, 256, NULL);
+        rv = hap_set_allocation(d, 1024, NULL);
         if ( rv != 0 )
         {
             hap_set_allocation(d, 0, NULL);