
[v2] tools/libxl: make default of max event channels dependant on vcpus

Message ID 20200406082704.13994-1-jgross@suse.com (mailing list archive)
State New, archived
Series [v2] tools/libxl: make default of max event channels dependant on vcpus

Commit Message

Jürgen Groß April 6, 2020, 8:27 a.m. UTC
Since Xen 4.4 the maximum number of event channels for a guest has
defaulted to 1023. For large guests with lots of vcpus this is not
enough, as e.g. the Linux kernel uses 7 event channels per vcpu,
limiting the guest to about 140 vcpus.

Instead of requiring the allowed number of event channels to be
specified via the "event_channels" domain config option, make the
default depend on the maximum number of vcpus of the guest. This means
the "event_channels" domain config option will be needed in fewer
cases than before.

As different guests will have differing needs, the calculation of the
maximum number of event channels can only be a rough estimate,
currently based on the Linux kernel's requirements. Nevertheless it is
much better than today's static upper limit, as more guests will boot
just fine with the new approach.

In order not to regress current configs, use 1023 as the minimum
default setting.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
V2:
- use max() instead of min()
- clarify commit message a little bit
---
 tools/libxl/libxl_create.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
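
For context, with today's static default a large guest has to raise the
limit by hand in its domain configuration. A minimal illustration (the
values are hypothetical and not taken from this thread):

    # xl domain configuration fragment (hypothetical values)
    vcpus = 256
    maxvcpus = 256
    # roughly 7 event channels per vcpu for a Linux guest, plus headroom
    # for PV devices
    event_channels = 2048

With the proposed change such guests get a workable default without the
explicit "event_channels" line.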

Comments

Julien Grall April 6, 2020, 9:24 a.m. UTC | #1
Hi Jurgen,

On 06/04/2020 09:27, Juergen Gross wrote:
> Since Xen 4.4 the maximum number of event channels for a guest is
> defaulting to 1023. For large guests with lots of vcpus this is not
> enough, as e.g. the Linux kernel uses 7 event channels per vcpu,
> limiting the guest to about 140 vcpus.

Large guests on which arch? Which type of guests?

> Instead of requiring to specify the allowed number of event channels
> via the "event_channels" domain config option, make the default
> depend on the maximum number of vcpus of the guest. This will require
> to use the "event_channels" domain config option in fewer cases as
> before.
> 
> As different guests will have differing needs the calculation of the
> maximum number of event channels can be a rough estimate only,
> currently based on the Linux kernel requirements.

I am not overly happy to extend the default number of event channels
for everyone based on Linux behavior on a given setup. Yes, more
guests would be able to run, but at the expense of allowing a guest to
use more Xen memory.

For instance, I don't think this limit increase is necessary on Arm.

> Nevertheless it is
> much better than the static upper limit of today as more guests will
> boot just fine with the new approach.
> 
> In order not to regress current configs use 1023 as the minimum
> default setting.
> 
> Signed-off-by: Juergen Gross <jgross@suse.com>
> ---
> V2:
> - use max() instead of min()
> - clarify commit message a little bit
> ---
>   tools/libxl/libxl_create.c | 2 +-

The documentation should be updated.

>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
> index e7cb2dbc2b..c025b21894 100644
> --- a/tools/libxl/libxl_create.c
> +++ b/tools/libxl/libxl_create.c
> @@ -226,7 +226,7 @@ int libxl__domain_build_info_setdefault(libxl__gc *gc,
>               b_info->iomem[i].gfn = b_info->iomem[i].start;
>   
>       if (!b_info->event_channels)
> -        b_info->event_channels = 1023;
> +        b_info->event_channels = max(1023, b_info->max_vcpus * 8 + 255);

What is the 255 for?

Cheers,
Jürgen Groß April 6, 2020, 10:17 a.m. UTC | #2
On 06.04.20 11:24, Julien Grall wrote:
> Hi Jurgen,
> 
> On 06/04/2020 09:27, Juergen Gross wrote:
>> Since Xen 4.4 the maximum number of event channels for a guest is
>> defaulting to 1023. For large guests with lots of vcpus this is not
>> enough, as e.g. the Linux kernel uses 7 event channels per vcpu,
>> limiting the guest to about 140 vcpus.
> 
> Large guests on which arch? Which type of guests?

I'm pretty sure this applies to x86 only. I'm not aware of event
channels being used on ARM for IPIs.

> 
>> Instead of requiring to specify the allowed number of event channels
>> via the "event_channels" domain config option, make the default
>> depend on the maximum number of vcpus of the guest. This will require
>> to use the "event_channels" domain config option in fewer cases as
>> before.
>>
>> As different guests will have differing needs the calculation of the
>> maximum number of event channels can be a rough estimate only,
>> currently based on the Linux kernel requirements.
> 
> I am not overly happy to extend the default numbers of event channels 
> for everyone based on Linux behavior on a given setup. Yes you have more 
> guests that would be able to run, but at the expense of allowing a guest 
> to use more xen memory.

The resulting number would be larger than today only for guests with
more than 96 vcpus. So I don't think the additional amount of memory
is really that problematic.

> 
> For instance, I don't think this limit increase is necessary on Arm.
> 
>> Nevertheless it is
>> much better than the static upper limit of today as more guests will
>> boot just fine with the new approach.
>>
>> In order not to regress current configs use 1023 as the minimum
>> default setting.
>>
>> Signed-off-by: Juergen Gross <jgross@suse.com>
>> ---
>> V2:
>> - use max() instead of min()
>> - clarify commit message a little bit
>> ---
>>   tools/libxl/libxl_create.c | 2 +-
> 
> The documentation should be updated.

Oh, indeed.

> 
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
>> index e7cb2dbc2b..c025b21894 100644
>> --- a/tools/libxl/libxl_create.c
>> +++ b/tools/libxl/libxl_create.c
>> @@ -226,7 +226,7 @@ int libxl__domain_build_info_setdefault(libxl__gc 
>> *gc,
>>               b_info->iomem[i].gfn = b_info->iomem[i].start;
>>       if (!b_info->event_channels)
>> -        b_info->event_channels = 1023;
>> +        b_info->event_channels = max(1023, b_info->max_vcpus * 8 + 255);
> 
> What is the 255 for?

Just some headroom for e.g. pv devices.


Juergen
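
Put together, the diff and the explanation above amount to the
following default (a sketch only; reading the constants as "8 per vcpu"
and "255 of headroom for PV devices" is an interpretation of the reply
above, not something spelled out in the code):

    /* Sketch of the proposed default, written out so it stands alone;
     * libxl's own max() helper may differ. */
    static unsigned int default_event_channels(unsigned int max_vcpus)
    {
        /* 8 per vcpu (roughly 7 used by Linux, plus slack) + 255 of
         * headroom for PV devices, with 1023 as the floor so existing
         * configs do not regress. */
        unsigned int wanted = max_vcpus * 8 + 255;

        return wanted > 1023 ? wanted : 1023;
    }

    /* Worked examples:
     *    96 vcpus:  96 * 8 + 255 = 1023  (unchanged from today's default)
     *   128 vcpus: 128 * 8 + 255 = 1279
     *   256 vcpus: 256 * 8 + 255 = 2303
     * i.e. only guests with more than 96 vcpus see a larger default.
     */
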
Julien Grall April 6, 2020, 10:37 a.m. UTC | #3
Hi Juergen,

On 06/04/2020 11:17, Jürgen Groß wrote:
> On 06.04.20 11:24, Julien Grall wrote:
>> Hi Jurgen,
>>
>> On 06/04/2020 09:27, Juergen Gross wrote:
>>> Since Xen 4.4 the maximum number of event channels for a guest is
>>> defaulting to 1023. For large guests with lots of vcpus this is not
>>> enough, as e.g. the Linux kernel uses 7 event channels per vcpu,
>>> limiting the guest to about 140 vcpus.
>>
>> Large guests on which arch? Which type of guests?
> 
> I'm pretty sure this applies to x86 only. I'm not aware of event
> channels being used on ARM for IPIs.

How about the guest types?

> 
>>
>>> Instead of requiring to specify the allowed number of event channels
>>> via the "event_channels" domain config option, make the default
>>> depend on the maximum number of vcpus of the guest. This will require
>>> to use the "event_channels" domain config option in fewer cases as
>>> before.
>>>
>>> As different guests will have differing needs the calculation of the
>>> maximum number of event channels can be a rough estimate only,
>>> currently based on the Linux kernel requirements.
>>
>> I am not overly happy to extend the default numbers of event channels 
>> for everyone based on Linux behavior on a given setup. Yes you have 
>> more guests that would be able to run, but at the expense of allowing 
>> a guest to use more xen memory.
> 
> The resulting number would be larger than today only for guests with
> more than 96 vcpus. So I don't think the additional amount of memory
> is really that problematic.
This is not a very forward-looking argument. For Arm, we limit to 128
vCPUs at the moment but it would be possible to support many more (I
think our vGIC implementation supports up to 4096 vCPUs).

So even if your change impacts a small subset, each architecture should
be able to make the decision on the limit, rather than having it
imposed by x86 Linux PV.

> 
>>
>> For instance, I don't think this limit increase is necessary on Arm.
>>
>>> Nevertheless it is
>>> much better than the static upper limit of today as more guests will
>>> boot just fine with the new approach.
>>>
>>> In order not to regress current configs use 1023 as the minimum
>>> default setting.
>>>
>>> Signed-off-by: Juergen Gross <jgross@suse.com>
>>> ---
>>> V2:
>>> - use max() instead of min()
>>> - clarify commit message a little bit
>>> ---
>>>   tools/libxl/libxl_create.c | 2 +-
>>
>> The documentation should be updated.
> 
> Oh, indeed.
> 
>>
>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
>>> index e7cb2dbc2b..c025b21894 100644
>>> --- a/tools/libxl/libxl_create.c
>>> +++ b/tools/libxl/libxl_create.c
>>> @@ -226,7 +226,7 @@ int libxl__domain_build_info_setdefault(libxl__gc 
>>> *gc,
>>>               b_info->iomem[i].gfn = b_info->iomem[i].start;
>>>       if (!b_info->event_channels)
>>> -        b_info->event_channels = 1023;
>>> +        b_info->event_channels = max(1023, b_info->max_vcpus * 8 + 
>>> 255);
>>
>> What is the 255 for?
> 
> Just some headroom for e.g. pv devices.

That should really be explained in the commit message and a comment.

Cheers,
Jürgen Groß April 6, 2020, 10:47 a.m. UTC | #4
On 06.04.20 12:37, Julien Grall wrote:
> Hi Juergen,
> 
> On 06/04/2020 11:17, Jürgen Groß wrote:
>> On 06.04.20 11:24, Julien Grall wrote:
>>> Hi Jurgen,
>>>
>>> On 06/04/2020 09:27, Juergen Gross wrote:
>>>> Since Xen 4.4 the maximum number of event channels for a guest is
>>>> defaulting to 1023. For large guests with lots of vcpus this is not
>>>> enough, as e.g. the Linux kernel uses 7 event channels per vcpu,
>>>> limiting the guest to about 140 vcpus.
>>>
>>> Large guests on which arch? Which type of guests?
>>
>> I'm pretty sure this applies to x86 only. I'm not aware of event
>> channels being used on ARM for IPIs.
> 
> How about the guest types?

PV and HVM with PV enhancements.

> 
>>
>>>
>>>> Instead of requiring to specify the allowed number of event channels
>>>> via the "event_channels" domain config option, make the default
>>>> depend on the maximum number of vcpus of the guest. This will require
>>>> to use the "event_channels" domain config option in fewer cases as
>>>> before.
>>>>
>>>> As different guests will have differing needs the calculation of the
>>>> maximum number of event channels can be a rough estimate only,
>>>> currently based on the Linux kernel requirements.
>>>
>>> I am not overly happy to extend the default numbers of event channels 
>>> for everyone based on Linux behavior on a given setup. Yes you have 
>>> more guests that would be able to run, but at the expense of allowing 
>>> a guest to use more xen memory.
>>
>> The resulting number would be larger than today only for guests with
>> more than 96 vcpus. So I don't think the additional amount of memory
>> is really that problematic.
> This is not a very forward looking argument. For Arm, we limit to 128 
> vCPUs at the moment but it would be possible to support many more (I 
> think our vGIC implementation support up to 4096 vCPUs).
> 
> So even if your change impacts a small subset, each architectures should 
> be able to make the decision on the limit and not imposed by x86 Linux PV.

Okay, what about moving the default setting of b_info->event_channels
into libxl__arch_domain_build_info_setdefault() then?

> 
>>
>>>
>>> For instance, I don't think this limit increase is necessary on Arm.
>>>
>>>> Nevertheless it is
>>>> much better than the static upper limit of today as more guests will
>>>> boot just fine with the new approach.
>>>>
>>>> In order not to regress current configs use 1023 as the minimum
>>>> default setting.
>>>>
>>>> Signed-off-by: Juergen Gross <jgross@suse.com>
>>>> ---
>>>> V2:
>>>> - use max() instead of min()
>>>> - clarify commit message a little bit
>>>> ---
>>>>   tools/libxl/libxl_create.c | 2 +-
>>>
>>> The documentation should be updated.
>>
>> Oh, indeed.
>>
>>>
>>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
>>>> index e7cb2dbc2b..c025b21894 100644
>>>> --- a/tools/libxl/libxl_create.c
>>>> +++ b/tools/libxl/libxl_create.c
>>>> @@ -226,7 +226,7 @@ int 
>>>> libxl__domain_build_info_setdefault(libxl__gc *gc,
>>>>               b_info->iomem[i].gfn = b_info->iomem[i].start;
>>>>       if (!b_info->event_channels)
>>>> -        b_info->event_channels = 1023;
>>>> +        b_info->event_channels = max(1023, b_info->max_vcpus * 8 + 
>>>> 255);
>>>
>>> What is the 255 for?
>>
>> Just some headroom for e.g. pv devices.
> 
> That should really be explained in the commit message and a comment.

Okay.


Juergen
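
The arch split suggested above could look roughly like the following
sketch (the exact signature of libxl__arch_domain_build_info_setdefault()
and the file layout are assumptions based on the call site visible in
the patch, not part of this series):

    /* tools/libxl/libxl_x86.c: x86 guests get the vcpu-dependent default. */
    void libxl__arch_domain_build_info_setdefault(libxl__gc *gc,
                                                  libxl_domain_build_info *b_info)
    {
        if (!b_info->event_channels)
            b_info->event_channels = max(1023, b_info->max_vcpus * 8 + 255);
        /* ... existing x86 defaults ... */
    }

    /* tools/libxl/libxl_arm.c: Arm keeps the static default. */
    void libxl__arch_domain_build_info_setdefault(libxl__gc *gc,
                                                  libxl_domain_build_info *b_info)
    {
        if (!b_info->event_channels)
            b_info->event_channels = 1023;
        /* ... existing Arm defaults ... */
    }
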
Ian Jackson April 6, 2020, 10:47 a.m. UTC | #5
Jürgen Groß writes ("Re: [PATCH v2] tools/libxl: make default of max event channels dependant on vcpus"):
> On 06.04.20 11:24, Julien Grall wrote:
> > Large guests on which arch? Which type of guests?
> 
> I'm pretty sure this applies to x86 only. I'm not aware of event
> channels being used on ARM for IPIs.

Should this be arch-dependent then ?  It seems like the figure is just
a heuristic anyway, and ...

> The resulting number would be larger than today only for guests with
> more than 96 vcpus. So I don't think the additional amount of memory
> is really that problematic.

Julien, are there likely to be any ARM guests now which have anywhere
near that number of vcpus ?  If not, do we know now what such guests
are likely to be like ?

If this is all hypothetical on ARM it would seem silly to make this
arch-specific for the benefit of ARM given that the ARM implementation
would be entirely guesswork.  Maybe we should postpone that
specialisation until we know better what the ARM function should be
like for these large numbers of vcpus.

If ARM folks want to have a different formula for the default then
that is of course fine, but I wonder whether this might do ARM more
harm than good in this case.

Ian.
Ian Jackson April 6, 2020, 10:52 a.m. UTC | #6
Jürgen Groß writes ("Re: [PATCH v2] tools/libxl: make default of max event channels dependant on vcpus"):
> Okay, what about moving the default setting of b_info->event_channels
> into libxl__arch_domain_build_info_setdefault() then?

FTAOD, if this satisfies ARM maintainers then I am content with it,
even though I doubt the utility.

I guess you should make two patches: 1. duplicate the existing formula
(no functional change); 2. change the x86 formula.

I would ack the first and be guided by x86 folks for the 2nd.

Ian.
Julien Grall April 6, 2020, 10:55 a.m. UTC | #7
Hi Ian,

On 06/04/2020 11:47, Ian Jackson wrote:
> Jürgen Groß writes ("Re: [PATCH v2] tools/libxl: make default of max event channels dependant on vcpus"):
>> On 06.04.20 11:24, Julien Grall wrote:
>>> Large guests on which arch? Which type of guests?
>>
>> I'm pretty sure this applies to x86 only. I'm not aware of event
>> channels being used on ARM for IPIs.
> 
> Should this be arch-dependent then ?  It seems like the figure is just
> a heuristic anyway, and ...
> 
>> The resulting number would be larger than today only for guests with
>> more than 96 vcpus. So I don't think the additional amount of memory
>> is really that problematic.
> 
> Julien, are there likely to be any ARM guests now which have anywhere
> near that number of vcpus ?  If not do we know now what such guests
> are likely to be like ?

We are meant to support up to 128 vCPUs. But our implementation can 
support up to 4096 vCPUs on vGICv3.

> 
> If this is all hypothetical on ARM it would seem silly to make this
> arch-specific for the benefit of ARM given that the ARM implementation
> would be entirely guesswork.  Maybe we should postpone that
> specialisation until we know better what the ARM function should be
> like for these large numbers of vcpus.

There is no correlation between event channels and vCPUs. The number
of event channels only depends on the number of frontends you have in
your guest. So...

> If ARM folks want to have a different formula for the default then
> that is of course fine but I wonder whether this might do ARMk more
> harm than good in this case.

... 1023 event channels is going to be plenty enough for most of the use 
cases.

Cheers,
Ian Jackson April 6, 2020, 11 a.m. UTC | #8
Julien Grall writes ("Re: [PATCH v2] tools/libxl: make default of max event channels dependant on vcpus"):
> There are no correlation between event channels and vCPUs. The number of 
> event channels only depends on the number of frontend you have in your 
> guest. So...
> 
> Hi Ian,
> 
> On 06/04/2020 11:47, Ian Jackson wrote:
> > If ARM folks want to have a different formula for the default then
> > that is of course fine but I wonder whether this might do ARMk more
> > harm than good in this case.
> 
> ... 1023 event channels is going to be plenty enough for most of the use 
> cases.

OK, thanks for the quick reply.

So, Jürgen, I think everyone will be happy with this:

Ian Jackson writes ("Re: [PATCH v2] tools/libxl: make default of max event channels dependant on vcpus"):
> I guess you should make two patches 1. duplicate the existing formula
> (no functional change) 2. change the x86 formula.
> 
> I would ack the first and be guided by x86 folks for the 2nd.

Ian.
Jürgen Groß April 6, 2020, 11:03 a.m. UTC | #9
On 06.04.20 13:00, Ian Jackson wrote:
> Julien Grall writes ("Re: [PATCH v2] tools/libxl: make default of max event channels dependant on vcpus"):
>> There are no correlation between event channels and vCPUs. The number of
>> event channels only depends on the number of frontend you have in your
>> guest. So...
>>
>> Hi Ian,
>>
>> On 06/04/2020 11:47, Ian Jackson wrote:
>>> If ARM folks want to have a different formula for the default then
>>> that is of course fine but I wonder whether this might do ARMk more
>>> harm than good in this case.
>>
>> ... 1023 event channels is going to be plenty enough for most of the use
>> cases.
> 
> OK, thanks for the quick reply.
> 
> So, Jürgen, I think everyone will be happy with this:
> 
> Ian Jackson writes ("Re: [PATCH v2] tools/libxl: make default of max event channels dependant on vcpus"):
>> I guess you should make two patches 1. duplicate the existing formula
>> (no functional change) 2. change the x86 formula.
>>
>> I would ack the first and be guided by x86 folks for the 2nd.

Okay, thanks. Will do that.


Juergen
Jan Beulich April 6, 2020, 11:11 a.m. UTC | #10
On 06.04.2020 13:00, Ian Jackson wrote:
> Julien Grall writes ("Re: [PATCH v2] tools/libxl: make default of max event channels dependant on vcpus"):
>> There are no correlation between event channels and vCPUs. The number of 
>> event channels only depends on the number of frontend you have in your 
>> guest. So...
>>
>> Hi Ian,
>>
>> On 06/04/2020 11:47, Ian Jackson wrote:
>>> If ARM folks want to have a different formula for the default then
>>> that is of course fine but I wonder whether this might do ARMk more
>>> harm than good in this case.
>>
>> ... 1023 event channels is going to be plenty enough for most of the use 
>> cases.
> 
> OK, thanks for the quick reply.
> 
> So, Jürgen, I think everyone will be happy with this:

I don't think I will be - my prior comment still holds on there not
being any grounds to use a specific OS kernel's (and to be precise
a specific OS kernel version's) requirements for determining
defaults. If there was to be such a dependency, then OS kernel
[variant] should be part of the inputs to such a (set of) formula(s).

Jan

> Ian Jackson writes ("Re: [PATCH v2] tools/libxl: make default of max event channels dependant on vcpus"):
>> I guess you should make two patches 1. duplicate the existing formula
>> (no functional change) 2. change the x86 formula.
>>
>> I would ack the first and be guided by x86 folks for the 2nd.
> 
> Ian.
>
Jürgen Groß April 6, 2020, 11:54 a.m. UTC | #11
On 06.04.20 13:11, Jan Beulich wrote:
> On 06.04.2020 13:00, Ian Jackson wrote:
>> Julien Grall writes ("Re: [PATCH v2] tools/libxl: make default of max event channels dependant on vcpus"):
>>> There are no correlation between event channels and vCPUs. The number of
>>> event channels only depends on the number of frontend you have in your
>>> guest. So...
>>>
>>> Hi Ian,
>>>
>>> On 06/04/2020 11:47, Ian Jackson wrote:
>>>> If ARM folks want to have a different formula for the default then
>>>> that is of course fine but I wonder whether this might do ARMk more
>>>> harm than good in this case.
>>>
>>> ... 1023 event channels is going to be plenty enough for most of the use
>>> cases.
>>
>> OK, thanks for the quick reply.
>>
>> So, Jürgen, I think everyone will be happy with this:
> 
> I don't think I will be - my prior comment still holds on there not
> being any grounds to use a specific OS kernel's (and to be precise
> a specific OS kernel version's) requirements for determining
> defaults. If there was to be such a dependency, then OS kernel
> [variant] should be part of the inputs to such a (set of) formula(s).

IMO this kind of trying to be perfect will completely block a sane
heuristic for being able to boot large guests at all.

The patch isn't about finding the most stringent possible upper
boundary for huge guests, but a sane value that lets most of them
boot.

And how should Xen know what the OS kernel needs exactly, after all?

And it is not as if we were talking about megabytes of additional
memory. A guest with 256 vcpus will just be able to use an additional
36 memory pages. The maximum non-PV domain (probably the only relevant
case of an OS other than Linux being used) with 128 vcpus would
"waste" 32 kB, and only in case the guest misbehaves.

The alternative would be to do nothing and let the user experience a
somewhat cryptic guest crash. He could then google for a possible
solution, which would probably end in a rather high static limit,
wasting even more memory.


Juergen
Jan Beulich April 6, 2020, 12:09 p.m. UTC | #12
On 06.04.2020 13:54, Jürgen Groß wrote:
> On 06.04.20 13:11, Jan Beulich wrote:
>> On 06.04.2020 13:00, Ian Jackson wrote:
>>> Julien Grall writes ("Re: [PATCH v2] tools/libxl: make default of max event channels dependant on vcpus"):
>>>> There are no correlation between event channels and vCPUs. The number of
>>>> event channels only depends on the number of frontend you have in your
>>>> guest. So...
>>>>
>>>> Hi Ian,
>>>>
>>>> On 06/04/2020 11:47, Ian Jackson wrote:
>>>>> If ARM folks want to have a different formula for the default then
>>>>> that is of course fine but I wonder whether this might do ARMk more
>>>>> harm than good in this case.
>>>>
>>>> ... 1023 event channels is going to be plenty enough for most of the use
>>>> cases.
>>>
>>> OK, thanks for the quick reply.
>>>
>>> So, Jürgen, I think everyone will be happy with this:
>>
>> I don't think I will be - my prior comment still holds on there not
>> being any grounds to use a specific OS kernel's (and to be precise
>> a specific OS kernel version's) requirements for determining
>> defaults. If there was to be such a dependency, then OS kernel
>> [variant] should be part of the inputs to such a (set of) formula(s).
> 
> IMO this kind of trying to be perfect will completely block a sane
> heuristic for being able to boot large guests at all.

This isn't about being perfect - I'm suggesting to leave the
default alone, not to improve the calculation, not least because
I've been implying ...

> The patch isn't about to find an as stringent as possible upper
> boundary for huge guests, but a sane value being able to boot most of
> those.
> 
> And how should Xen know the OS kernel needs exactly after all?

... the answer of "It can't" to this question.

> And it is not that we talking about megabytes of additional memory. A
> guest with 256 vcpus will just be able to use additional 36 memory
> pages. The maximum non-PV domain (the probably only relevant case
> of another OS than Linux being used) with 128 vcpus would "waste"
> 32 kB. In case the guest misbehaves.

Any extra page counts, or else - where do you draw the line? Any
single page may decide between Xen (not) being out of memory,
and hence also not being able to fulfill certain other requests.

> The alternative would be to do nothing and having to let the user
> experience a somewhat cryptic guest crash. He could google for a
> possible solution which would probably end in a rather high static
> limit resulting in wasting even more memory.

I realize this. Otoh more people running into this will improve
the chances of later ones finding useful suggestions. Of course
there's also nothing wrong with trying to make the error less
cryptic.

Jan
Jürgen Groß June 2, 2020, 11:06 a.m. UTC | #13
On 06.04.20 14:09, Jan Beulich wrote:
> On 06.04.2020 13:54, Jürgen Groß wrote:
>> On 06.04.20 13:11, Jan Beulich wrote:
>>> On 06.04.2020 13:00, Ian Jackson wrote:
>>>> Julien Grall writes ("Re: [PATCH v2] tools/libxl: make default of max event channels dependant on vcpus"):
>>>>> There are no correlation between event channels and vCPUs. The number of
>>>>> event channels only depends on the number of frontend you have in your
>>>>> guest. So...
>>>>>
>>>>> Hi Ian,
>>>>>
>>>>> On 06/04/2020 11:47, Ian Jackson wrote:
>>>>>> If ARM folks want to have a different formula for the default then
>>>>>> that is of course fine but I wonder whether this might do ARMk more
>>>>>> harm than good in this case.
>>>>>
>>>>> ... 1023 event channels is going to be plenty enough for most of the use
>>>>> cases.
>>>>
>>>> OK, thanks for the quick reply.
>>>>
>>>> So, Jürgen, I think everyone will be happy with this:
>>>
>>> I don't think I will be - my prior comment still holds on there not
>>> being any grounds to use a specific OS kernel's (and to be precise
>>> a specific OS kernel version's) requirements for determining
>>> defaults. If there was to be such a dependency, then OS kernel
>>> [variant] should be part of the inputs to such a (set of) formula(s).
>>
>> IMO this kind of trying to be perfect will completely block a sane
>> heuristic for being able to boot large guests at all.
> 
> This isn't about being perfect - I'm suggesting to leave the
> default alone, not to improve the calculation, not the least
> because I've been implying ...
> 
>> The patch isn't about to find an as stringent as possible upper
>> boundary for huge guests, but a sane value being able to boot most of
>> those.
>>
>> And how should Xen know the OS kernel needs exactly after all?
> 
> ... the answer of "It can#t" to this question.
> 
>> And it is not that we talking about megabytes of additional memory. A
>> guest with 256 vcpus will just be able to use additional 36 memory
>> pages. The maximum non-PV domain (the probably only relevant case
>> of another OS than Linux being used) with 128 vcpus would "waste"
>> 32 kB. In case the guest misbehaves.
> 
> Any extra page counts, or else - where do you draw the line? Any
> single page may decide between Xen (not) being out of memory,
> and hence also not being able to fulfill certain other requests.
> 
>> The alternative would be to do nothing and having to let the user
>> experience a somewhat cryptic guest crash. He could google for a
>> possible solution which would probably end in a rather high static
>> limit resulting in wasting even more memory.
> 
> I realize this. Otoh more people running into this will improve
> the chances of later ones finding useful suggestions. Of course
> there's also nothing wrong with trying to make the error less
> cryptic.

Reviving this discussion.

I strongly disagree with your reasoning.

Refusing to modify the tools' defaults for large guests to make them
boot is a bad move IMO. We are driving more people away from Xen this
way.

The fear of a misbehaving guest of that size using a few additional
pages on a machine with at least 100 cpus is fine from the academic
point of view, but should not be weighed higher than the usability
aspect in this case IMO.


Juergen
Jan Beulich June 2, 2020, 11:12 a.m. UTC | #14
On 02.06.2020 13:06, Jürgen Groß wrote:
> On 06.04.20 14:09, Jan Beulich wrote:
>> On 06.04.2020 13:54, Jürgen Groß wrote:
>>> On 06.04.20 13:11, Jan Beulich wrote:
>>>> On 06.04.2020 13:00, Ian Jackson wrote:
>>>>> Julien Grall writes ("Re: [PATCH v2] tools/libxl: make default of max event channels dependant on vcpus"):
>>>>>> There are no correlation between event channels and vCPUs. The number of
>>>>>> event channels only depends on the number of frontend you have in your
>>>>>> guest. So...
>>>>>>
>>>>>> Hi Ian,
>>>>>>
>>>>>> On 06/04/2020 11:47, Ian Jackson wrote:
>>>>>>> If ARM folks want to have a different formula for the default then
>>>>>>> that is of course fine but I wonder whether this might do ARMk more
>>>>>>> harm than good in this case.
>>>>>>
>>>>>> ... 1023 event channels is going to be plenty enough for most of the use
>>>>>> cases.
>>>>>
>>>>> OK, thanks for the quick reply.
>>>>>
>>>>> So, Jürgen, I think everyone will be happy with this:
>>>>
>>>> I don't think I will be - my prior comment still holds on there not
>>>> being any grounds to use a specific OS kernel's (and to be precise
>>>> a specific OS kernel version's) requirements for determining
>>>> defaults. If there was to be such a dependency, then OS kernel
>>>> [variant] should be part of the inputs to such a (set of) formula(s).
>>>
>>> IMO this kind of trying to be perfect will completely block a sane
>>> heuristic for being able to boot large guests at all.
>>
>> This isn't about being perfect - I'm suggesting to leave the
>> default alone, not to improve the calculation, not the least
>> because I've been implying ...
>>
>>> The patch isn't about to find an as stringent as possible upper
>>> boundary for huge guests, but a sane value being able to boot most of
>>> those.
>>>
>>> And how should Xen know the OS kernel needs exactly after all?
>>
>> ... the answer of "It can#t" to this question.
>>
>>> And it is not that we talking about megabytes of additional memory. A
>>> guest with 256 vcpus will just be able to use additional 36 memory
>>> pages. The maximum non-PV domain (the probably only relevant case
>>> of another OS than Linux being used) with 128 vcpus would "waste"
>>> 32 kB. In case the guest misbehaves.
>>
>> Any extra page counts, or else - where do you draw the line? Any
>> single page may decide between Xen (not) being out of memory,
>> and hence also not being able to fulfill certain other requests.
>>
>>> The alternative would be to do nothing and having to let the user
>>> experience a somewhat cryptic guest crash. He could google for a
>>> possible solution which would probably end in a rather high static
>>> limit resulting in wasting even more memory.
>>
>> I realize this. Otoh more people running into this will improve
>> the chances of later ones finding useful suggestions. Of course
>> there's also nothing wrong with trying to make the error less
>> cryptic.
> 
> Reviving this discussion.
> 
> I strongly disagree with your reasoning.
> 
> Rejecting to modify tools defaults for large guests to make them boot
> is a bad move IMO. We are driving more people away from Xen this way.
> 
> The fear of a misbehaving guest of that size to use a few additional
> pages on a machine with at least 100 cpus is fine from the academical
> point of view, but should not be weighed higher than the usability
> aspect in this case IMO.

Very simple question then: Where do you draw the boundary if you don't
want this to be a pure "is permitted" or "is not permitted" underlying
rule? If we had a model where _all_ resources consumed by a guest were
accounted against its tool stack requested allocation, things would be
easier.

Jan
Jürgen Groß June 2, 2020, 11:23 a.m. UTC | #15
On 02.06.20 13:12, Jan Beulich wrote:
> On 02.06.2020 13:06, Jürgen Groß wrote:
>> On 06.04.20 14:09, Jan Beulich wrote:
>>> On 06.04.2020 13:54, Jürgen Groß wrote:
>>>> On 06.04.20 13:11, Jan Beulich wrote:
>>>>> On 06.04.2020 13:00, Ian Jackson wrote:
>>>>>> Julien Grall writes ("Re: [PATCH v2] tools/libxl: make default of max event channels dependant on vcpus"):
>>>>>>> There are no correlation between event channels and vCPUs. The number of
>>>>>>> event channels only depends on the number of frontend you have in your
>>>>>>> guest. So...
>>>>>>>
>>>>>>> Hi Ian,
>>>>>>>
>>>>>>> On 06/04/2020 11:47, Ian Jackson wrote:
>>>>>>>> If ARM folks want to have a different formula for the default then
>>>>>>>> that is of course fine but I wonder whether this might do ARMk more
>>>>>>>> harm than good in this case.
>>>>>>>
>>>>>>> ... 1023 event channels is going to be plenty enough for most of the use
>>>>>>> cases.
>>>>>>
>>>>>> OK, thanks for the quick reply.
>>>>>>
>>>>>> So, Jürgen, I think everyone will be happy with this:
>>>>>
>>>>> I don't think I will be - my prior comment still holds on there not
>>>>> being any grounds to use a specific OS kernel's (and to be precise
>>>>> a specific OS kernel version's) requirements for determining
>>>>> defaults. If there was to be such a dependency, then OS kernel
>>>>> [variant] should be part of the inputs to such a (set of) formula(s).
>>>>
>>>> IMO this kind of trying to be perfect will completely block a sane
>>>> heuristic for being able to boot large guests at all.
>>>
>>> This isn't about being perfect - I'm suggesting to leave the
>>> default alone, not to improve the calculation, not the least
>>> because I've been implying ...
>>>
>>>> The patch isn't about to find an as stringent as possible upper
>>>> boundary for huge guests, but a sane value being able to boot most of
>>>> those.
>>>>
>>>> And how should Xen know the OS kernel needs exactly after all?
>>>
>>> ... the answer of "It can#t" to this question.
>>>
>>>> And it is not that we talking about megabytes of additional memory. A
>>>> guest with 256 vcpus will just be able to use additional 36 memory
>>>> pages. The maximum non-PV domain (the probably only relevant case
>>>> of another OS than Linux being used) with 128 vcpus would "waste"
>>>> 32 kB. In case the guest misbehaves.
>>>
>>> Any extra page counts, or else - where do you draw the line? Any
>>> single page may decide between Xen (not) being out of memory,
>>> and hence also not being able to fulfill certain other requests.
>>>
>>>> The alternative would be to do nothing and having to let the user
>>>> experience a somewhat cryptic guest crash. He could google for a
>>>> possible solution which would probably end in a rather high static
>>>> limit resulting in wasting even more memory.
>>>
>>> I realize this. Otoh more people running into this will improve
>>> the chances of later ones finding useful suggestions. Of course
>>> there's also nothing wrong with trying to make the error less
>>> cryptic.
>>
>> Reviving this discussion.
>>
>> I strongly disagree with your reasoning.
>>
>> Rejecting to modify tools defaults for large guests to make them boot
>> is a bad move IMO. We are driving more people away from Xen this way.
>>
>> The fear of a misbehaving guest of that size to use a few additional
>> pages on a machine with at least 100 cpus is fine from the academical
>> point of view, but should not be weighed higher than the usability
>> aspect in this case IMO.
> 
> Very simple question then: Where do you draw the boundary if you don't
> want this to be a pure "is permitted" or "is not permitted" underlying
> rule? If we had a model where _all_ resources consumed by a guest were
> accounted against its tool stack requested allocation, things would be
> easier.

I'd say it should be allowed if the additional resource use is much
smaller than the implicit resources already used for such a guest
(e.g. less than an additional 1% of implicitly used memory).

In cases like this, where a very small subset of guests is affected
and the additional need for resources will apply only in very extreme
cases (I'm considering this case extreme, as only non-Linux guests
with huge numbers of vcpus _and_ which are misbehaving will need
additional resources), I'd even accept higher margins like 5%.


Juergen
Jan Beulich June 2, 2020, 1:21 p.m. UTC | #16
On 02.06.2020 13:23, Jürgen Groß wrote:
> On 02.06.20 13:12, Jan Beulich wrote:
>> On 02.06.2020 13:06, Jürgen Groß wrote:
>>> On 06.04.20 14:09, Jan Beulich wrote:
>>>> On 06.04.2020 13:54, Jürgen Groß wrote:
>>>>> On 06.04.20 13:11, Jan Beulich wrote:
>>>>>> On 06.04.2020 13:00, Ian Jackson wrote:
>>>>>>> Julien Grall writes ("Re: [PATCH v2] tools/libxl: make default of max event channels dependant on vcpus"):
>>>>>>>> There are no correlation between event channels and vCPUs. The number of
>>>>>>>> event channels only depends on the number of frontend you have in your
>>>>>>>> guest. So...
>>>>>>>>
>>>>>>>> Hi Ian,
>>>>>>>>
>>>>>>>> On 06/04/2020 11:47, Ian Jackson wrote:
>>>>>>>>> If ARM folks want to have a different formula for the default then
>>>>>>>>> that is of course fine but I wonder whether this might do ARMk more
>>>>>>>>> harm than good in this case.
>>>>>>>>
>>>>>>>> ... 1023 event channels is going to be plenty enough for most of the use
>>>>>>>> cases.
>>>>>>>
>>>>>>> OK, thanks for the quick reply.
>>>>>>>
>>>>>>> So, Jürgen, I think everyone will be happy with this:
>>>>>>
>>>>>> I don't think I will be - my prior comment still holds on there not
>>>>>> being any grounds to use a specific OS kernel's (and to be precise
>>>>>> a specific OS kernel version's) requirements for determining
>>>>>> defaults. If there was to be such a dependency, then OS kernel
>>>>>> [variant] should be part of the inputs to such a (set of) formula(s).
>>>>>
>>>>> IMO this kind of trying to be perfect will completely block a sane
>>>>> heuristic for being able to boot large guests at all.
>>>>
>>>> This isn't about being perfect - I'm suggesting to leave the
>>>> default alone, not to improve the calculation, not the least
>>>> because I've been implying ...
>>>>
>>>>> The patch isn't about to find an as stringent as possible upper
>>>>> boundary for huge guests, but a sane value being able to boot most of
>>>>> those.
>>>>>
>>>>> And how should Xen know the OS kernel needs exactly after all?
>>>>
>>>> ... the answer of "It can#t" to this question.
>>>>
>>>>> And it is not that we talking about megabytes of additional memory. A
>>>>> guest with 256 vcpus will just be able to use additional 36 memory
>>>>> pages. The maximum non-PV domain (the probably only relevant case
>>>>> of another OS than Linux being used) with 128 vcpus would "waste"
>>>>> 32 kB. In case the guest misbehaves.
>>>>
>>>> Any extra page counts, or else - where do you draw the line? Any
>>>> single page may decide between Xen (not) being out of memory,
>>>> and hence also not being able to fulfill certain other requests.
>>>>
>>>>> The alternative would be to do nothing and having to let the user
>>>>> experience a somewhat cryptic guest crash. He could google for a
>>>>> possible solution which would probably end in a rather high static
>>>>> limit resulting in wasting even more memory.
>>>>
>>>> I realize this. Otoh more people running into this will improve
>>>> the chances of later ones finding useful suggestions. Of course
>>>> there's also nothing wrong with trying to make the error less
>>>> cryptic.
>>>
>>> Reviving this discussion.
>>>
>>> I strongly disagree with your reasoning.
>>>
>>> Rejecting to modify tools defaults for large guests to make them boot
>>> is a bad move IMO. We are driving more people away from Xen this way.
>>>
>>> The fear of a misbehaving guest of that size to use a few additional
>>> pages on a machine with at least 100 cpus is fine from the academical
>>> point of view, but should not be weighed higher than the usability
>>> aspect in this case IMO.
>>
>> Very simple question then: Where do you draw the boundary if you don't
>> want this to be a pure "is permitted" or "is not permitted" underlying
>> rule? If we had a model where _all_ resources consumed by a guest were
>> accounted against its tool stack requested allocation, things would be
>> easier.
> 
> I'd say it should be allowed in case the additional resource use is much
> smaller than the already used implicit resources for such a guest (e.g.
> less than an additional 1% of implicitly used memory).
> 
> In cases like this, where a very small subset of guests is affected, and
> the additional need of resources will apply only in very extreme cases
> (I'm considering this case as extreme, as only non-Linux guests with
> huge numbers of vcpus _and_ which are misbehaving will need additional
> resources) I'd even accept higher margins like 5%.

IOW if we had 20 such cases, doubling resource consumption would be
okay to you? Not to me...

FAOD: If I'm the (almost?) only one to object here, I'll be okay to
be outvoted. But I'd like people agreeing with the change to
explicitly ack that they're fine with the unwanted (as I'd call it)
side effects.

Jan

Patch

diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index e7cb2dbc2b..c025b21894 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -226,7 +226,7 @@  int libxl__domain_build_info_setdefault(libxl__gc *gc,
             b_info->iomem[i].gfn = b_info->iomem[i].start;
 
     if (!b_info->event_channels)
-        b_info->event_channels = 1023;
+        b_info->event_channels = max(1023, b_info->max_vcpus * 8 + 255);
 
     libxl__arch_domain_build_info_setdefault(gc, b_info);
     libxl_defbool_setdefault(&b_info->dm_restrict, false);
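
The documentation update requested in the comments would presumably
land in the xl.cfg(5) man page (docs/man/xl.cfg.5.pod.in; the path and
surrounding wording are assumptions, not part of this patch). A
possible entry:

    =item B<event_channels=N>

    Limit the guest to using at most N event channels (PV interrupts).
    With the change proposed here, the default becomes the larger of
    1023 and (maximum number of vcpus * 8 + 255).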