
[v2,7/7] x86/hap: Increase the initial mempool size to 1024 pages

Message ID a26bc4aeba89f7895c79df7e320adfc695b16d50.1714322424.git.w1benny@gmail.com (mailing list archive)
State New
Series x86: Make MAX_ALTP2M configurable

Commit Message

Petr Beneš April 28, 2024, 4:52 p.m. UTC
From: Petr Beneš <w1benny@gmail.com>

This change anticipates scenarios where `max_altp2m` is set to its maximum
supported value (i.e., 512), ensuring sufficient memory is allocated upfront
to accommodate all altp2m tables without initialization failure.

The necessity for this increase arises from the current mechanism where altp2m
tables are allocated at initialization, requiring one page from the mempool
for each altp2m view.

Signed-off-by: Petr Beneš <w1benny@gmail.com>
---
 tools/tests/paging-mempool/test-paging-mempool.c | 2 +-
 xen/arch/x86/mm/hap/hap.c                        | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)
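
For scale, the worst case described in the commit message works out as
follows (a back-of-envelope sketch; the 4 KiB page size is assumed, the
512-view upper bound is the one the commit message cites):

    /* One mempool page per altp2m root, for up to 512 views. */
    #define ASSUMED_MAX_ALTP2M  512
    #define PAGE_BYTES          4096UL
    /* 512 * 4096 = 2 MiB of altp2m root tables alone - already twice
     * the historical 256-page (1 MiB) initial HAP pool, hence the bump
     * to 1024 pages (4 MiB). */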

Comments

Jan Beulich April 30, 2024, 2:47 p.m. UTC | #1
On 28.04.2024 18:52, Petr Beneš wrote:
> From: Petr Beneš <w1benny@gmail.com>
> 
> This change anticipates scenarios where `max_altp2m` is set to its maximum
> supported value (i.e., 512), ensuring sufficient memory is allocated upfront
> to accommodate all altp2m tables without initialization failure.

And guests with fewer or even no altp2m-s still need the same bump? You
know the number of altp2m-s upon domain creation, so why bump by any more
than what's strictly needed for that?

> The necessity for this increase arises from the current mechanism where altp2m
> tables are allocated at initialization, requiring one page from the mempool
> for each altp2m view.

So that's the p2m_alloc_table() out of hap_enable()? If you're permitting
up to 512 altp2m-s, I think it needs considering to not waste up to 2Mb
without knowing how many of the altp2m-s are actually going to be used.
How complicated on-demand allocation would be I can't tell though, I have
to admit.

> --- a/tools/tests/paging-mempool/test-paging-mempool.c
> +++ b/tools/tests/paging-mempool/test-paging-mempool.c
> @@ -35,7 +35,7 @@ static struct xen_domctl_createdomain create = {
>  
>  static uint64_t default_mempool_size_bytes =
>  #if defined(__x86_64__) || defined(__i386__)
> -    256 << 12; /* Only x86 HAP for now.  x86 Shadow needs more work. */
> +    1024 << 12; /* Only x86 HAP for now.  x86 Shadow needs more work. */

I also can't derive from the description why we'd need to go from 256 to
1024 here and ...

> --- a/xen/arch/x86/mm/hap/hap.c
> +++ b/xen/arch/x86/mm/hap/hap.c
> @@ -468,7 +468,7 @@ int hap_enable(struct domain *d, u32 mode)
>      if ( old_pages == 0 )
>      {
>          paging_lock(d);
> -        rv = hap_set_allocation(d, 256, NULL);
> +        rv = hap_set_allocation(d, 1024, NULL);

... here. You talk of (up to) 512 pages there only.

Also isn't there at least one more place where the tool stack (libxl I
think) would need changing, where Dom0 ballooning needs are calculated?
And/or doesn't the pool size have a default calculation in the tool
stack, too?

Jan
Petr Beneš April 30, 2024, 3:40 p.m. UTC | #2
On Tue, Apr 30, 2024 at 4:47 PM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 28.04.2024 18:52, Petr Beneš wrote:
> > From: Petr Beneš <w1benny@gmail.com>
> >
> > This change anticipates scenarios where `max_altp2m` is set to its maximum
> > supported value (i.e., 512), ensuring sufficient memory is allocated upfront
> > to accommodate all altp2m tables without initialization failure.
>
> And guests with fewer or even no altp2m-s still need the same bump? You
> know the number of altp2m-s upon domain creation, so why bump by any more
> than what's strictly needed for that?

I have to admit I've considered computing the value which goes to
hap_set_allocation by simply adding 256 + max_altp2m, but that felt so
arbitrary - the 256 value itself feels arbitrary, as I haven't found any
reasoning for it anywhere.

I have also tried to make code changes to make the initial allocation
size configurable via libxl (possibly reusing the shadow_memkb) - which
seemed to me like the "correct" solution, but those changes were more
complicated than I had anticipated and I would definitely not make it
till the 4.19 deadline.

Question is, what to do now? Should I change it to 256 + max_altp2m?
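
For illustration, a minimal sketch of that option (nr_altp2m(d) is a
placeholder for however the per-domain altp2m count would be exposed;
it is an assumption, not something this series necessarily provides):

    if ( old_pages == 0 )
    {
        paging_lock(d);
        /* was: hap_set_allocation(d, 256, NULL); keep 256 as the
         * historical baseline and add one page per configured altp2m. */
        rv = hap_set_allocation(d, 256 + nr_altp2m(d), NULL);
        paging_unlock(d);
        /* error handling unchanged from the current code */
    }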

> > The necessity for this increase arises from the current mechanism where altp2m
> > tables are allocated at initialization, requiring one page from the mempool
> > for each altp2m view.
>
> So that's the p2m_alloc_table() out of hap_enable()? If you're permitting
> up to 512 altp2m-s, I think it needs considering to not waste up to 2Mb
> without knowing how many of the altp2m-s are actually going to be used.

Yes and I ultimately agree.

> How complicated on-demand allocation would be I can't tell though, I have
> to admit.

That's also a fix I've been trying to work on - making the altp2m
allocations fully on-demand. Unfortunately, I didn't make it in time.
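
Roughly, such on-demand allocation could look like the sketch below.
The function name is illustrative (not existing Xen API), and locking /
error handling are simplified; only the fact that p2m_alloc_table()
draws one page from the mempool is taken from the current code:

    /* Allocate an altp2m's table only when the view is first activated,
     * instead of for every view up front in hap_enable(). */
    static int altp2m_alloc_table_on_demand(struct domain *d, unsigned int idx)
    {
        struct p2m_domain *p2m = d->arch.altp2m_p2m[idx];
        int rc = 0;

        p2m_lock(p2m);
        /* Only the first activation pays the one-page mempool cost. */
        if ( !pagetable_get_pfn(p2m_get_pagetable(p2m)) )
            rc = p2m_alloc_table(p2m);
        p2m_unlock(p2m);

        return rc;
    }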

> > --- a/tools/tests/paging-mempool/test-paging-mempool.c
> > +++ b/tools/tests/paging-mempool/test-paging-mempool.c
> > @@ -35,7 +35,7 @@ static struct xen_domctl_createdomain create = {
> >
> >  static uint64_t default_mempool_size_bytes =
> >  #if defined(__x86_64__) || defined(__i386__)
> > -    256 << 12; /* Only x86 HAP for now.  x86 Shadow needs more work. */
> > +    1024 << 12; /* Only x86 HAP for now.  x86 Shadow needs more work. */
>
> I also can't derive from the description why we'd need to go from 256 to
> 1024 here and ...

It's explained in the code a few lines below:

    /*
     * Check that the domain has the expected default allocation size.  This
     * will fail if the logic in Xen is altered without an equivalent
     * adjustment here.
     */

I have verified that the default_mempool_size_bytes must reflect the number
of pages in the initial hap_set_allocation() call.

Is it something I should include in the commit message, too?
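
Concretely, the check boils down to something like this (simplified;
the exact helpers and error reporting in test-paging-mempool.c differ):

    /* After xc_domain_create(), read the pool size back and compare it
     * against default_mempool_size_bytes. */
    uint64_t size = 0;

    if ( xc_get_paging_mempool_size(xch, domid, &size) )
        err(1, "xc_get_paging_mempool_size");

    if ( size != default_mempool_size_bytes )
        errx(1, "mempool size %"PRIu64", expected %"PRIu64,
             size, default_mempool_size_bytes);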

> > --- a/xen/arch/x86/mm/hap/hap.c
> > +++ b/xen/arch/x86/mm/hap/hap.c
> > @@ -468,7 +468,7 @@ int hap_enable(struct domain *d, u32 mode)
> >      if ( old_pages == 0 )
> >      {
> >          paging_lock(d);
> > -        rv = hap_set_allocation(d, 256, NULL);
> > +        rv = hap_set_allocation(d, 1024, NULL);
>
> ... here. You talk of (up to) 512 pages there only.
>
> Also isn't there at least one more place where the tool stack (libxl I
> think) would need changing, where Dom0 ballooning needs are calculated?
> And/or doesn't the pool size have a default calculation in the tool
> stack, too?

I have found places in libxl where the mempool_size is calculated, but
that mempool size is then set AFTER the domain is created via
xc_set_paging_mempool_size.

In my opinion it doesn't necessarily require change, since it's expected
by the user to manually set it via shadow_memkb. The only current
problem is (which this commit is trying to fix) that setting
shadow_memkb doesn't help when max_altp2m > (256 - 1 + vcpus +
MAX_NESTEDP2M), since the initial mempool size is hardcoded.

I didn't find any other places that would require reflecting the
current "256" value.

P.
Jan Beulich May 2, 2024, 6:36 a.m. UTC | #3
On 30.04.2024 17:40, Petr Beneš wrote:
> On Tue, Apr 30, 2024 at 4:47 PM Jan Beulich <jbeulich@suse.com> wrote:
>>
>> On 28.04.2024 18:52, Petr Beneš wrote:
>>> From: Petr Beneš <w1benny@gmail.com>
>>>
>>> This change anticipates scenarios where `max_altp2m` is set to its maximum
>>> supported value (i.e., 512), ensuring sufficient memory is allocated upfront
>>> to accommodate all altp2m tables without initialization failure.
>>
>> And guests with fewer or even no altp2m-s still need the same bump? You
>> know the number of altp2m-s upon domain creation, so why bump by any more
>> than what's strictly needed for that?
> 
> I have to admit I've considered computing the value which goes to
> hap_set_allocation
> by simply adding 256 + max_altp2m, but that felt so arbitrary - the
> 256 value itself
> feels arbitrary, as I haven't found any reasoning for it anywhere.
> 
> I have also tried to make code changes to make the initial allocation
> size configurable
> via libxl (possibly reusing the shadow_memkb) - which seemed to me
> like the "correct"
> solution, but those changes were more complicated than I had
> anticipated and I would
> definitely not make it till the 4.19 deadline.
> 
> Question is, what to do now? Should I change it to 256 + max_altp2m?

Counter question: Is accounting for just the root page table really
enough? Meaning to say: I'm not convinced that minimum would really
be appropriate for altp2m use even before your changes. You growing
the number of root page tables _always_ required just makes things
worse even without considering how (many) altp2m-s are then going
to be used. Such an issue, if I'm right with this, would imo want
addressing up front, in a separate patch.

You may also want to take a look at the shadow side of things, even
if for altp2m shadow mode may not be relevant. Things like
sh_min_allocation() and shadow_min_acceptable_pages() may well need
proper counterparts in HAP now.
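
For the record, a HAP counterpart along those lines might look like the
sketch below (nr_altp2m(d) is assumed, and whether 256 is the right
baseline is exactly what is being questioned in this thread):

    /* Possible hap_min_allocation(), mirroring what sh_min_allocation()
     * does for shadow: the historical 256-page baseline (which already
     * covers the host p2m and nested p2m roots) plus one root page per
     * configured altp2m. */
    static unsigned int hap_min_allocation(const struct domain *d)
    {
        return 256 + nr_altp2m(d);
    }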

>>> --- a/tools/tests/paging-mempool/test-paging-mempool.c
>>> +++ b/tools/tests/paging-mempool/test-paging-mempool.c
>>> @@ -35,7 +35,7 @@ static struct xen_domctl_createdomain create = {
>>>
>>>  static uint64_t default_mempool_size_bytes =
>>>  #if defined(__x86_64__) || defined(__i386__)
>>> -    256 << 12; /* Only x86 HAP for now.  x86 Shadow needs more work. */
>>> +    1024 << 12; /* Only x86 HAP for now.  x86 Shadow needs more work. */
>>
>> I also can't derive from the description why we'd need to go from 256 to
>> 1024 here and ...
> 
> It's explained in the code a few lines below:
> 
>     /*
>      * Check that the domain has the expected default allocation size.  This
>      * will fail if the logic in Xen is altered without an equivalent
>      * adjustment here.
>      */
> 
> I have verified that the default_mempool_size_bytes must reflect the number
> of pages in the initial hap_set_allocation() call.
> 
> Is it something I should include in the commit message, too?

Well, yes and no. My question wasn't about the "why", but ...

>>> --- a/xen/arch/x86/mm/hap/hap.c
>>> +++ b/xen/arch/x86/mm/hap/hap.c
>>> @@ -468,7 +468,7 @@ int hap_enable(struct domain *d, u32 mode)
>>>      if ( old_pages == 0 )
>>>      {
>>>          paging_lock(d);
>>> -        rv = hap_set_allocation(d, 256, NULL);
>>> +        rv = hap_set_allocation(d, 1024, NULL);
>>
>> ... here. You talk of (up to) 512 pages there only.

... about the "by how many".

>> Also isn't there at least one more place where the tool stack (libxl I
>> think) would need changing, where Dom0 ballooning needs are calculated?
>> And/or doesn't the pool size have a default calculation in the tool
>> stack, too?
> 
> I have found places in libxl where the mempool_size is calculated, but
> that mempool
> size is then set AFTER the domain is created via xc_set_paging_mempool_size.
> 
> In my opinion it doesn't necessarily require change, since it's
> expected by the user
> to manually set it via shadow_memkb. The only current problem is (which this
> commit is trying to fix) that setting shadow_memkb doesn't help when
> max_altp2m > (256 - 1 + vcpus + MAX_NESTEDP2M), since the initial mempool
> size is hardcoded.

Wait - are you saying the guest config value isn't respected in certain
cases? That would be another thing wanting to be fixed separately, up
front.

Jan
Petr Beneš May 2, 2024, 11:59 a.m. UTC | #4
On Thu, May 2, 2024 at 8:36 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 30.04.2024 17:40, Petr Beneš wrote:
> > On Tue, Apr 30, 2024 at 4:47 PM Jan Beulich <jbeulich@suse.com> wrote:
> >>
> >> On 28.04.2024 18:52, Petr Beneš wrote:
> >>> From: Petr Beneš <w1benny@gmail.com>
> >>>
> >>> This change anticipates scenarios where `max_altp2m` is set to its maximum
> >>> supported value (i.e., 512), ensuring sufficient memory is allocated upfront
> >>> to accommodate all altp2m tables without initialization failure.
> >>
> >> And guests with fewer or even no altp2m-s still need the same bump? You
> >> know the number of altp2m-s upon domain creation, so why bump by any more
> >> than what's strictly needed for that?
> >
> > I have to admit I've considered computing the value which goes to
> > hap_set_allocation
> > by simply adding 256 + max_altp2m, but that felt so arbitrary - the
> > 256 value itself
> > feels arbitrary, as I haven't found any reasoning for it anywhere.
> >
> > I have also tried to make code changes to make the initial allocation
> > size configurable
> > via libxl (possibly reusing the shadow_memkb) - which seemed to me
> > like the "correct"
> > solution, but those changes were more complicated than I had
> > anticipated and I would
> > definitely not make it till the 4.19 deadline.
> >
> > Question is, what to do now? Should I change it to 256 + max_altp2m?
>
> Counter question: Is accounting for just the root page table really
> enough? Meaning to say: I'm not convinced that minimum would really
> be appropriate for altp2m use even before your changes. You growing
> the number of root page tables _always_ required just makes things
> worse even without considering how (many) altp2m-s are then going
> to be used. Such an issue, if I'm right with this, would imo want
> addressing up front, in a separate patch.

It is enough - at least based on my experiments. I'll try to explain it below.

> >> Also isn't there at least one more place where the tool stack (libxl I
> >> think) would need changing, where Dom0 ballooning needs are calculated?
> >> And/or doesn't the pool size have a default calculation in the tool
> >> stack, too?
> >
> > I have found places in libxl where the mempool_size is calculated, but
> > that mempool
> > size is then set AFTER the domain is created via xc_set_paging_mempool_size.
> >
> > In my opinion it doesn't necessarily require change, since it's
> > expected by the user
> > to manually set it via shadow_memkb. The only current problem is (which this
> > commit is trying to fix) that setting shadow_memkb doesn't help when
> > max_altp2m > (256 - 1 + vcpus + MAX_NESTEDP2M), since the initial mempool
> > size is hardcoded.
>
> Wait - are you saying the guest config value isn't respected in certain
> cases? That would be another thing wanting to be fixed separately, up
> front.

xc_set_paging_mempool_size is still called within libxl's
do_domain_create, so the value of shadow_memkb is respected before any
guest code is run. My point was that shadow_memkb isn't respected here:

>>> --- a/xen/arch/x86/mm/hap/hap.c
>>> +++ b/xen/arch/x86/mm/hap/hap.c
>>> @@ -468,7 +468,7 @@ int hap_enable(struct domain *d, u32 mode)
>>>      if ( old_pages == 0 )
>>>      {
>>>          paging_lock(d);
>>> -        rv = hap_set_allocation(d, 256, NULL);
>>> +        rv = hap_set_allocation(d, 1024, NULL);

This code (+ the altp2m root allocations) is executed before libxl
sends shadow_memkb.

In other words, the sequence is the following:

libxl:
------

do_domain_create
    initiate_domain_create
        libxl__domain_make
            xc_domain_create // MAX_ALTP2M is passed here
                             // and hap_enable is called

        dcs->bl.callback = domcreate_bootloader_done

domcreate_bootloader_done
    libxl__domain_build
        libxl__build_pre
            libxl__arch_domain_create
                libxl__domain_set_paging_mempool_size
                    xc_set_paging_mempool_size(shadow_mem)

xen (xc_domain_create cont.):
-----------------------------
domain_create
    arch_domain_create
        hvm_domain_initialise
            paging_enable
                hap_enable
                    // note that shadow_mem (from config) hasn't been
                    // provided yet
                    hap_set_allocation(d, 1024, NULL);
                    p2m_alloc_table(p2m_get_hostp2m(d));
                    p2m_alloc_table(d->arch.nested_p2m[i..MAX_NESTEDP2M]);
                    p2m_alloc_table(d->arch.altp2m_p2m[i..MAX_ALTP2M]);

(I hope the email will preserve the spacing...)

Based on this, I would argue that shadow_memkb should also be part of
xc_domain_create/xen_domctl_createdomain, which is why I said in my
previous email:

> > I have also tried to make code changes to make the initial allocation
> > size configurable
> > via libxl (possibly reusing the shadow_memkb) - which seemed to me
> > like the "correct"
> > solution, but those changes were more complicated than I had
> > anticipated and I would
> > definitely not make it till the 4.19 deadline.

P.

Patch

diff --git a/tools/tests/paging-mempool/test-paging-mempool.c b/tools/tests/paging-mempool/test-paging-mempool.c
index 1ebc13455a..91b06fa0cf 100644
--- a/tools/tests/paging-mempool/test-paging-mempool.c
+++ b/tools/tests/paging-mempool/test-paging-mempool.c
@@ -35,7 +35,7 @@  static struct xen_domctl_createdomain create = {
 
 static uint64_t default_mempool_size_bytes =
 #if defined(__x86_64__) || defined(__i386__)
-    256 << 12; /* Only x86 HAP for now.  x86 Shadow needs more work. */
+    1024 << 12; /* Only x86 HAP for now.  x86 Shadow needs more work. */
 #elif defined (__arm__) || defined(__aarch64__)
     16 << 12;
 #endif
diff --git a/xen/arch/x86/mm/hap/hap.c b/xen/arch/x86/mm/hap/hap.c
index 7aff5fa664..fab7e256a4 100644
--- a/xen/arch/x86/mm/hap/hap.c
+++ b/xen/arch/x86/mm/hap/hap.c
@@ -468,7 +468,7 @@  int hap_enable(struct domain *d, u32 mode)
     if ( old_pages == 0 )
     {
         paging_lock(d);
-        rv = hap_set_allocation(d, 256, NULL);
+        rv = hap_set_allocation(d, 1024, NULL);
         if ( rv != 0 )
         {
             hap_set_allocation(d, 0, NULL);