
[v2,3/3] tools: introduce parameter max_wp_ram_ranges.

Message ID 1453432840-5319-4-git-send-email-yu.c.zhang@linux.intel.com (mailing list archive)
State New, archived
Headers show

Commit Message

Yu Zhang Jan. 22, 2016, 3:20 a.m. UTC
A new parameter - max_wp_ram_ranges is added to set the upper limit
of write-protected ram ranges to be tracked inside one ioreq server
rangeset.

Ioreq server uses a group of rangesets to track the I/O or memory
resources to be emulated. The default limit of ranges that one rangeset
can allocate is set to a small value, because these ranges are
allocated in the Xen heap. Yet for the write-protected ram ranges,
there are circumstances under which the upper limit inside one rangeset
should exceed the default one. E.g. in XenGT, when tracking the
per-process graphic translation tables on Intel Broadwell platforms,
the number of page tables concerned will be several thousand (normally
in this case, 8192 could be a big enough value). Users who set this
item explicitly are supposed to know the specific scenarios that
necessitate this configuration.

Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
---
 docs/man/xl.cfg.pod.5           | 18 ++++++++++++++++++
 tools/libxl/libxl.h             |  5 +++++
 tools/libxl/libxl_dom.c         |  3 +++
 tools/libxl/libxl_types.idl     |  1 +
 tools/libxl/xl_cmdimpl.c        |  4 ++++
 xen/arch/x86/hvm/hvm.c          | 10 +++++++++-
 xen/include/public/hvm/params.h |  5 ++++-
 7 files changed, 44 insertions(+), 2 deletions(-)

Comments

Jan Beulich Jan. 22, 2016, 8:01 a.m. UTC | #1
>>> On 22.01.16 at 04:20, <yu.c.zhang@linux.intel.com> wrote:
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -940,6 +940,10 @@ static int hvm_ioreq_server_alloc_rangesets(struct 
> hvm_ioreq_server *s,
>  {
>      unsigned int i;
>      int rc;
> +    unsigned int max_wp_ram_ranges =
> +        ( s->domain->arch.hvm_domain.params[HVM_PARAM_MAX_WP_RAM_RANGES] > 0 ) ?
> +        s->domain->arch.hvm_domain.params[HVM_PARAM_MAX_WP_RAM_RANGES] :
> +        MAX_NR_IO_RANGES;

Besides this having stray blanks inside the parentheses it truncates
the value from 64 to 32 bits and would benefit from using the gcc
extension of omitting the middle operand of ?:. But even better
would imo be if you avoided the local variable and ...

> @@ -962,7 +966,10 @@ static int hvm_ioreq_server_alloc_rangesets(struct hvm_ioreq_server *s,
>          if ( !s->range[i] )
>              goto fail;
>  
> -        rangeset_limit(s->range[i], MAX_NR_IO_RANGES);
> +        if ( i == HVMOP_IO_RANGE_WP_MEM )
> +            rangeset_limit(s->range[i], max_wp_ram_ranges);
> +        else
> +            rangeset_limit(s->range[i], MAX_NR_IO_RANGES);

... did the entire computation here, using ?: for the second argument
of the function invocation.

> @@ -6009,6 +6016,7 @@ static int hvm_allow_set_param(struct domain *d,
>      case HVM_PARAM_IOREQ_SERVER_PFN:
>      case HVM_PARAM_NR_IOREQ_SERVER_PAGES:
>      case HVM_PARAM_ALTP2M:
> +    case HVM_PARAM_MAX_WP_RAM_RANGES:
>          if ( value != 0 && a->value != value )
>              rc = -EEXIST;
>          break;

Is there a particular reason you want this limit to be unchangeable
after having got set once?

Jan
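
(A compilable sketch of what Jan suggests, using hypothetical stand-ins for the Xen types and constants in the patch: the GNU `x ?: y` form omits the middle operand, yielding `x` when it is non-zero and `y` otherwise, and keeping the whole expression 64-bit avoids the truncation he points out.)

```c
#include <stdint.h>

/* Hypothetical stand-in for Xen's default per-rangeset limit. */
#define MAX_NR_IO_RANGES 256u

/* Model of the parameter lookup: 64-bit throughout, so a configured
 * value above UINT32_MAX is not silently truncated the way the v2
 * patch's "unsigned int" local variable would truncate it. */
static uint64_t wp_ram_limit(uint64_t param)
{
    /* GNU extension: "param ?: MAX_NR_IO_RANGES" evaluates param once. */
    return param ?: MAX_NR_IO_RANGES;
}
```

This is only a model of the selection logic, not the Xen code itself; in the real function the expression would feed the second argument of rangeset_limit() directly.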
Yu Zhang Jan. 26, 2016, 7:32 a.m. UTC | #2
Thank you, Jan.

On 1/22/2016 4:01 PM, Jan Beulich wrote:
>>>> On 22.01.16 at 04:20, <yu.c.zhang@linux.intel.com> wrote:
>> --- a/xen/arch/x86/hvm/hvm.c
>> +++ b/xen/arch/x86/hvm/hvm.c
>> @@ -940,6 +940,10 @@ static int hvm_ioreq_server_alloc_rangesets(struct
>> hvm_ioreq_server *s,
>>   {
>>       unsigned int i;
>>       int rc;
>> +    unsigned int max_wp_ram_ranges =
>> +        ( s->domain->arch.hvm_domain.params[HVM_PARAM_MAX_WP_RAM_RANGES] > 0 ) ?
>> +        s->domain->arch.hvm_domain.params[HVM_PARAM_MAX_WP_RAM_RANGES] :
>> +        MAX_NR_IO_RANGES;
>
> Besides this having stray blanks inside the parentheses it truncates
> the value from 64 to 32 bits and would benefit from using the gcc
> extension of omitting the middle operand of ?:. But even better
> would imo be if you avoided the local variable and ...
>
On second thought, how about we define a default value for this
parameter in libxl.h, and initialize the parameter with that default
when creating the domain, if it's not configured?
About this local variable, we keep it, and ...

>> @@ -962,7 +966,10 @@ static int hvm_ioreq_server_alloc_rangesets(struct hvm_ioreq_server *s,
>>           if ( !s->range[i] )
>>               goto fail;
>>
>> -        rangeset_limit(s->range[i], MAX_NR_IO_RANGES);
>> +        if ( i == HVMOP_IO_RANGE_WP_MEM )
>> +            rangeset_limit(s->range[i], max_wp_ram_ranges);
>> +        else
>> +            rangeset_limit(s->range[i], MAX_NR_IO_RANGES);
>
> ... did the entire computation here, using ?: for the second argument
> of the function invocation.
>
... replace the if/else pair with something like:
         rangeset_limit(s->range[i],
                        ((i == HVMOP_IO_RANGE_WP_MEM)?
                         max_wp_ram_ranges:
                         MAX_NR_IO_RANGES));
This 'max_wp_ram_ranges' has no other particular use, but the string
"s->domain->arch.hvm_domain.params[HVM_PARAM_MAX_WP_RAM_RANGES]"
is too lengthy, and would easily break the 80-column limit. :)
Does this approach sound OK? :)

>> @@ -6009,6 +6016,7 @@ static int hvm_allow_set_param(struct domain *d,
>>       case HVM_PARAM_IOREQ_SERVER_PFN:
>>       case HVM_PARAM_NR_IOREQ_SERVER_PAGES:
>>       case HVM_PARAM_ALTP2M:
>> +    case HVM_PARAM_MAX_WP_RAM_RANGES:
>>           if ( value != 0 && a->value != value )
>>               rc = -EEXIST;
>>           break;
>
> Is there a particular reason you want this limit to be unchangeable
> after having got set once?
>
Well, not exactly. :)
I added this limit because, as of now, we do not have any approach to
change the max range numbers inside an ioreq server at run-time.
I can add another patch to introduce an xl command, which can change
it dynamically. But I doubt the necessity of this new command, and
also wonder whether it would cause more confusion for
the user...
> Jan
>
>
B.R.
Yu
Jan Beulich Jan. 26, 2016, 11 a.m. UTC | #3
>>> On 26.01.16 at 08:32, <yu.c.zhang@linux.intel.com> wrote:
> On 1/22/2016 4:01 PM, Jan Beulich wrote:
>>>>> On 22.01.16 at 04:20, <yu.c.zhang@linux.intel.com> wrote:
>>> --- a/xen/arch/x86/hvm/hvm.c
>>> +++ b/xen/arch/x86/hvm/hvm.c
>>> @@ -940,6 +940,10 @@ static int hvm_ioreq_server_alloc_rangesets(struct
>>> hvm_ioreq_server *s,
>>>   {
>>>       unsigned int i;
>>>       int rc;
>>> +    unsigned int max_wp_ram_ranges =
>>> +        ( s->domain->arch.hvm_domain.params[HVM_PARAM_MAX_WP_RAM_RANGES] > 0 ) ?
>>> +        s->domain->arch.hvm_domain.params[HVM_PARAM_MAX_WP_RAM_RANGES] :
>>> +        MAX_NR_IO_RANGES;
>>
>> Besides this having stray blanks inside the parentheses it truncates
>> the value from 64 to 32 bits and would benefit from using the gcc
>> extension of omitting the middle operand of ?:. But even better
>> would imo be if you avoided the local variable and ...
>>
> On second thought, how about we define a default value for this
> parameter in libxl.h, and initialize the parameter with that default
> when creating the domain, if it's not configured?

No, I don't think the tool stack should be determining the default
here (unless you want the default to be zero, and have zero
indeed mean zero).

> About this local variable, we keep it, and ...
> 
>>> @@ -962,7 +966,10 @@ static int hvm_ioreq_server_alloc_rangesets(struct 
> hvm_ioreq_server *s,
>>>           if ( !s->range[i] )
>>>               goto fail;
>>>
>>> -        rangeset_limit(s->range[i], MAX_NR_IO_RANGES);
>>> +        if ( i == HVMOP_IO_RANGE_WP_MEM )
>>> +            rangeset_limit(s->range[i], max_wp_ram_ranges);
>>> +        else
>>> +            rangeset_limit(s->range[i], MAX_NR_IO_RANGES);
>>
>> ... did the entire computation here, using ?: for the second argument
>> of the function invocation.
>>
> ... replace the if/else pair with something like:
>          rangeset_limit(s->range[i],
>                         ((i == HVMOP_IO_RANGE_WP_MEM)?
>                          max_wp_ram_ranges:
>                          MAX_NR_IO_RANGES));
> This 'max_wp_ram_ranges' has no other particular use, but the string
> "s->domain->arch.hvm_domain.params[HVM_PARAM_MAX_WP_RAM_RANGES]"
> is too lengthy, and would easily break the 80-column limit. :)
> Does this approach sound OK? :)

Seems better than the original, so okay.

>>> @@ -6009,6 +6016,7 @@ static int hvm_allow_set_param(struct domain *d,
>>>       case HVM_PARAM_IOREQ_SERVER_PFN:
>>>       case HVM_PARAM_NR_IOREQ_SERVER_PAGES:
>>>       case HVM_PARAM_ALTP2M:
>>> +    case HVM_PARAM_MAX_WP_RAM_RANGES:
>>>           if ( value != 0 && a->value != value )
>>>               rc = -EEXIST;
>>>           break;
>>
>> Is there a particular reason you want this limit to be unchangeable
>> after having got set once?
>>
> Well, not exactly. :)
> I added this limit because, as of now, we do not have any approach to
> change the max range numbers inside an ioreq server at run-time.
> I can add another patch to introduce an xl command, which can change
> it dynamically. But I doubt the necessity of this new command, and
> also wonder whether it would cause more confusion for
> the user...

And I didn't say you need to expose this to the user. All I asked
was whether you really mean the value to be a set-once one. If
yes, the code above is fine. If no, the code above should be
changed, but there's then still no need to expose a way to
"manually" adjust the value until a need for such arises.

Jan
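
(To illustrate the set-once behaviour under discussion, here is a minimal model of the -EEXIST check in hvm_allow_set_param; `cur` stands in for the currently stored parameter value and `new_val` for the one being written — both names are hypothetical.)

```c
#include <errno.h>
#include <stdint.h>

/* Once the parameter holds a non-zero value, any attempt to change it
 * to a *different* value is refused with -EEXIST; writing the same
 * value again (or setting it for the first time) is allowed. */
static int allow_set(uint64_t cur, uint64_t new_val)
{
    return (cur != 0 && new_val != cur) ? -EEXIST : 0;
}
```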
David Vrabel Jan. 26, 2016, 11:16 a.m. UTC | #4
On 22/01/16 03:20, Yu Zhang wrote:
> --- a/docs/man/xl.cfg.pod.5
> +++ b/docs/man/xl.cfg.pod.5
> @@ -962,6 +962,24 @@ FIFO-based event channel ABI support up to 131,071 event channels.
>  Other guests are limited to 4095 (64-bit x86 and ARM) or 1023 (32-bit
>  x86).
>  
> +=item B<max_wp_ram_ranges=N>
> +
> +Limit the maximum write-protected ram ranges that can be tracked
> +inside one ioreq server rangeset.
> +
> +Ioreq server uses a group of rangesets to track the I/O or memory
> +resources to be emulated. Default limit of ranges that one rangeset
> +can allocate is set to a small value, due to the fact that these ranges
> +are allocated in xen heap. Yet for the write-protected ram ranges,
> +there are circumstances under which the upper limit inside one rangeset
> +should exceed the default one. E.g. in XenGT, when tracking the per-
> +process graphic translation tables on Intel Broadwell platforms, the
> +number of page tables concerned will be several thousand (normally
> +in this case, 8192 could be a big enough value). Not configuring this
> +item, or setting its value to 0 will result in the upper limit set
> +to its default one. Users who set this item explicitly are supposed
> +to know the specific scenarios that necessitate this configuration.

This help text isn't very helpful.  How is a user supposed to "know the
specific scenarios" that need this option?

Why doesn't the toolstack (or qemu) automatically set this value based
on whether GVT-g/GVT-d is being used? Then there is no need to even
present this option to the user.

David
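
(For reference, the option under discussion would appear in an xl guest configuration roughly as in this hypothetical fragment; 8192 is the value the commit message suggests as sufficient for XenGT on Broadwell.)

```
builder = "hvm"
memory = 4096
# Proposed option from this patch; 0 or absent means the default limit.
max_wp_ram_ranges = 8192
```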
Yu Zhang Jan. 27, 2016, 7:01 a.m. UTC | #5
On 1/26/2016 7:00 PM, Jan Beulich wrote:
>>>> On 26.01.16 at 08:32, <yu.c.zhang@linux.intel.com> wrote:
>> On 1/22/2016 4:01 PM, Jan Beulich wrote:
>>>>>> On 22.01.16 at 04:20, <yu.c.zhang@linux.intel.com> wrote:
>>>> --- a/xen/arch/x86/hvm/hvm.c
>>>> +++ b/xen/arch/x86/hvm/hvm.c
>>>> @@ -940,6 +940,10 @@ static int hvm_ioreq_server_alloc_rangesets(struct
>>>> hvm_ioreq_server *s,
>>>>    {
>>>>        unsigned int i;
>>>>        int rc;
>>>> +    unsigned int max_wp_ram_ranges =
>>>> +        ( s->domain->arch.hvm_domain.params[HVM_PARAM_MAX_WP_RAM_RANGES] > 0 ) ?
>>>> +        s->domain->arch.hvm_domain.params[HVM_PARAM_MAX_WP_RAM_RANGES] :
>>>> +        MAX_NR_IO_RANGES;
>>>
>>> Besides this having stray blanks inside the parentheses it truncates
>>> the value from 64 to 32 bits and would benefit from using the gcc
>>> extension of omitting the middle operand of ?:. But even better
>>> would imo be if you avoided the local variable and ...
>>>
>> On second thought, how about we define a default value for this
>> parameter in libxl.h, and initialize the parameter with that default
>> when creating the domain, if it's not configured?
>
> No, I don't think the tool stack should be determining the default
> here (unless you want the default to be zero, and have zero
> indeed mean zero).
>
Thank you, Jan.
If we do not provide a default value in the tool stack, the code above
should be kept, to initialize the local variable with either the one
set in the configuration file, or with MAX_NR_IO_RANGES. Is this OK?

>> About this local variable, we keep it, and ...
>>
>>>> @@ -962,7 +966,10 @@ static int hvm_ioreq_server_alloc_rangesets(struct
>> hvm_ioreq_server *s,
>>>>            if ( !s->range[i] )
>>>>                goto fail;
>>>>
>>>> -        rangeset_limit(s->range[i], MAX_NR_IO_RANGES);
>>>> +        if ( i == HVMOP_IO_RANGE_WP_MEM )
>>>> +            rangeset_limit(s->range[i], max_wp_ram_ranges);
>>>> +        else
>>>> +            rangeset_limit(s->range[i], MAX_NR_IO_RANGES);
>>>
>>> ... did the entire computation here, using ?: for the second argument
>>> of the function invocation.
>>>
>> ... replace the if/else pair with something like:
>>           rangeset_limit(s->range[i],
>>                          ((i == HVMOP_IO_RANGE_WP_MEM)?
>>                           max_wp_ram_ranges:
>>                           MAX_NR_IO_RANGES));
>> This 'max_wp_ram_ranges' has no other particular use, but the string
>> "s->domain->arch.hvm_domain.params[HVM_PARAM_MAX_WP_RAM_RANGES]"
>> is too lengthy, and would easily break the 80-column limit. :)
>> Does this approach sound OK? :)
>
> Seems better than the original, so okay.
>
>>>> @@ -6009,6 +6016,7 @@ static int hvm_allow_set_param(struct domain *d,
>>>>        case HVM_PARAM_IOREQ_SERVER_PFN:
>>>>        case HVM_PARAM_NR_IOREQ_SERVER_PAGES:
>>>>        case HVM_PARAM_ALTP2M:
>>>> +    case HVM_PARAM_MAX_WP_RAM_RANGES:
>>>>            if ( value != 0 && a->value != value )
>>>>                rc = -EEXIST;
>>>>            break;
>>>
>>> Is there a particular reason you want this limit to be unchangeable
>>> after having got set once?
>>>
>> Well, not exactly. :)
>> I added this limit because, as of now, we do not have any approach to
>> change the max range numbers inside an ioreq server at run-time.
>> I can add another patch to introduce an xl command, which can change
>> it dynamically. But I doubt the necessity of this new command, and
>> also wonder whether it would cause more confusion for
>> the user...
>
> And I didn't say you need to expose this to the user. All I asked
> was whether you really mean the value to be a set-once one. If
> yes, the code above is fine. If no, the code above should be
> changed, but there's then still no need to expose a way to
> "manually" adjust the value until a need for such arises.
>

I see. The constraint is not necessary. And I'll remove this code. :)

> Jan
>
>

B.R.
Yu
Yu Zhang Jan. 27, 2016, 7:03 a.m. UTC | #6
On 1/26/2016 7:16 PM, David Vrabel wrote:
> On 22/01/16 03:20, Yu Zhang wrote:
>> --- a/docs/man/xl.cfg.pod.5
>> +++ b/docs/man/xl.cfg.pod.5
>> @@ -962,6 +962,24 @@ FIFO-based event channel ABI support up to 131,071 event channels.
>>   Other guests are limited to 4095 (64-bit x86 and ARM) or 1023 (32-bit
>>   x86).
>>
>> +=item B<max_wp_ram_ranges=N>
>> +
>> +Limit the maximum write-protected ram ranges that can be tracked
>> +inside one ioreq server rangeset.
>> +
>> +Ioreq server uses a group of rangesets to track the I/O or memory
>> +resources to be emulated. Default limit of ranges that one rangeset
>> +can allocate is set to a small value, due to the fact that these ranges
>> +are allocated in xen heap. Yet for the write-protected ram ranges,
>> +there are circumstances under which the upper limit inside one rangeset
>> +should exceed the default one. E.g. in XenGT, when tracking the per-
>> +process graphic translation tables on Intel Broadwell platforms, the
>> +number of page tables concerned will be several thousand (normally
>> +in this case, 8192 could be a big enough value). Not configuring this
>> +item, or setting its value to 0 will result in the upper limit set
>> +to its default one. Users who set this item explicitly are supposed
>> +to know the specific scenarios that necessitate this configuration.
>
> This help text isn't very helpful.  How is a user supposed to "know the
> specific scenarios" that need this option?
>

Thank you for your comment, David. :)

Well, "know the specific scenarios" may seem too ambiguous. Here the
"specific scenarios" means when this parameter is used:
1> for virtual devices other than vGPU in GVT-g;
2> for GVT-g, there might also be some extreme cases, e.g. too many
graphics-related applications in one VM, which create a great deal of
per-process graphic translation tables;
3> for GVT-g, future CPU platforms which provide even more PPGTTs.
Other than these cases, 8192 is a suggested value for this option.

So how about we add a section to point out these scenarios in this
text?

> Why doesn't the toolstack (or qemu) automatically set this value based
> on whether GVT-g/GVT-d is being used? Then there is no need to even
> present this option to the user.
>
> David
>

By now, this parameter is used in GVT-g, but we are expecting more
usages for other devices which adopt this mediated pass-through idea.
Indeed, XenGT has an xl configuration flag, and several other
XenGT-specific parameters. We have plans to upstream these options
later this year. After these XenGT options are accepted, we can set
"max_wp_ram_ranges" to a default value if GVT-g is detected and
"max_wp_ram_ranges" is not explicitly configured.
Jan Beulich Jan. 27, 2016, 10:27 a.m. UTC | #7
>>> On 27.01.16 at 08:01, <yu.c.zhang@linux.intel.com> wrote:

> 
> On 1/26/2016 7:00 PM, Jan Beulich wrote:
>>>>> On 26.01.16 at 08:32, <yu.c.zhang@linux.intel.com> wrote:
>>> On 1/22/2016 4:01 PM, Jan Beulich wrote:
>>>>>>> On 22.01.16 at 04:20, <yu.c.zhang@linux.intel.com> wrote:
>>>>> --- a/xen/arch/x86/hvm/hvm.c
>>>>> +++ b/xen/arch/x86/hvm/hvm.c
>>>>> @@ -940,6 +940,10 @@ static int hvm_ioreq_server_alloc_rangesets(struct
>>>>> hvm_ioreq_server *s,
>>>>>    {
>>>>>        unsigned int i;
>>>>>        int rc;
>>>>> +    unsigned int max_wp_ram_ranges =
>>>>> +        ( s->domain->arch.hvm_domain.params[HVM_PARAM_MAX_WP_RAM_RANGES] > 0 ) ?
>>>>> +        s->domain->arch.hvm_domain.params[HVM_PARAM_MAX_WP_RAM_RANGES] :
>>>>> +        MAX_NR_IO_RANGES;
>>>>
>>>> Besides this having stray blanks inside the parentheses it truncates
>>>> the value from 64 to 32 bits and would benefit from using the gcc
>>>> extension of omitting the middle operand of ?:. But even better
>>>> would imo be if you avoided the local variable and ...
>>>>
>>> On second thought, how about we define a default value for this
>>> parameter in libxl.h, and initialize the parameter with that default
>>> when creating the domain, if it's not configured?
>>
>> No, I don't think the tool stack should be determining the default
>> here (unless you want the default to be zero, and have zero
>> indeed mean zero).
>>
> Thank you, Jan.
> If we do not provide a default value in the tool stack, the code above
> should be kept, to initialize the local variable with either the one
> set in the configuration file, or with MAX_NR_IO_RANGES. Is this OK?

Well, not exactly: For one, the original comment (still present
above) regarding truncation holds. And then another question is:
Do you expect this resource type to be useful with its number of
ranges limited to MAX_NR_IO_RANGES? I ask because if the
answer is "no", having it default to zero might be as reasonable.

Jan
Yu Zhang Jan. 27, 2016, 2:13 p.m. UTC | #8
On 1/27/2016 6:27 PM, Jan Beulich wrote:
>>>> On 27.01.16 at 08:01, <yu.c.zhang@linux.intel.com> wrote:
>
>>
>> On 1/26/2016 7:00 PM, Jan Beulich wrote:
>>>>>> On 26.01.16 at 08:32, <yu.c.zhang@linux.intel.com> wrote:
>>>> On 1/22/2016 4:01 PM, Jan Beulich wrote:
>>>>>>>> On 22.01.16 at 04:20, <yu.c.zhang@linux.intel.com> wrote:
>>>>>> --- a/xen/arch/x86/hvm/hvm.c
>>>>>> +++ b/xen/arch/x86/hvm/hvm.c
>>>>>> @@ -940,6 +940,10 @@ static int hvm_ioreq_server_alloc_rangesets(struct
>>>>>> hvm_ioreq_server *s,
>>>>>>     {
>>>>>>         unsigned int i;
>>>>>>         int rc;
>>>>>> +    unsigned int max_wp_ram_ranges =
>>>>>> +        ( s->domain->arch.hvm_domain.params[HVM_PARAM_MAX_WP_RAM_RANGES] > 0 ) ?
>>>>>> +        s->domain->arch.hvm_domain.params[HVM_PARAM_MAX_WP_RAM_RANGES] :
>>>>>> +        MAX_NR_IO_RANGES;
>>>>>
>>>>> Besides this having stray blanks inside the parentheses it truncates
>>>>> the value from 64 to 32 bits and would benefit from using the gcc
>>>>> extension of omitting the middle operand of ?:. But even better
>>>>> would imo be if you avoided the local variable and ...
>>>>>
>>>> On second thought, how about we define a default value for this
>>>> parameter in libxl.h, and initialize the parameter with that default
>>>> when creating the domain, if it's not configured?
>>>
>>> No, I don't think the tool stack should be determining the default
>>> here (unless you want the default to be zero, and have zero
>>> indeed mean zero).
>>>
>> Thank you, Jan.
>> If we do not provide a default value in the tool stack, the code above
>> should be kept, to initialize the local variable with either the one
>> set in the configuration file, or with MAX_NR_IO_RANGES. Is this OK?
>
> Well, not exactly: For one, the original comment (still present
> above) regarding truncation holds. And then another question is:
> Do you expect this resource type to be useful with its number of
> ranges limited to MAX_NR_IO_RANGES? I ask because if the
> answer is "no", having it default to zero might be as reasonable.
>

Thanks, Jan.

About the default value:
   You are right. :) For XenGT, MAX_NR_IO_RANGES may only work under
limited conditions. Having it default to zero means XenGT users must
manually configure this option. Since we have plans to push other XenGT
tool stack parameters (including a GVT-g flag), how about we set this
max_wp_ram_ranges to a default one when the GVT-g flag is detected, and
until then, max_wp_ram_ranges is supposed to be configured explicitly for
XenGT?

About the truncation issue:
   I do not quite follow. Will this hurt if the value configured does
not exceed 4G? What about a type cast?

B.R.
Yu



> Jan
>
>
Jan Beulich Jan. 27, 2016, 2:32 p.m. UTC | #9
>>> On 27.01.16 at 15:13, <yu.c.zhang@linux.intel.com> wrote:
> About the default value:
>    You are right. :) For XenGT, MAX_NR_IO_RANGES may only work under
> limited conditions. Having it default to zero means XenGT users must
> manually configure this option. Since we have plans to push other XenGT
> tool stack parameters (including a GVT-g flag), how about we set this
> max_wp_ram_ranges to a default one when the GVT-g flag is detected, and
> until then, max_wp_ram_ranges is supposed to be configured explicitly for
> XenGT?

Sounds reasonable, and in line with what iirc was discussed on
the tool stack side.

> About the truncation issue:
>    I do not quite follow. Will this hurt if the value configured does
> not exceed 4G? What about a type cast?

A typecast would not alter behavior in any way. And of course
a problem only arises if the value was above 4 billion. You either
need to refuse such values while the attempt is made to set it,
or you need to deal with the full range of possible values. Likely
the former is the better (and I wonder whether the upper
bound shouldn't be forced even lower than 4 billion).

Jan
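
(A sketch of the validation Jan asks for — the helper name is hypothetical; the idea is to refuse, at set-param time, any value that would later truncate when passed to rangeset_limit()'s unsigned int argument.)

```c
#include <errno.h>
#include <stdint.h>

/* Reject values that do not fit in the 32 bits rangeset_limit() accepts.
 * A lower cap could be substituted here once a policy is chosen, per the
 * discussion about forcing the bound below 4 billion. */
static int check_max_wp_ram_ranges(uint64_t value)
{
    return (value > UINT32_MAX) ? -EINVAL : 0;
}
```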
Yu Zhang Jan. 27, 2016, 2:56 p.m. UTC | #10
On 1/27/2016 10:32 PM, Jan Beulich wrote:
>>>> On 27.01.16 at 15:13, <yu.c.zhang@linux.intel.com> wrote:
>> About the default value:
>>     You are right. :) For XenGT, MAX_NR_IO_RANGES may only work under
>> limited conditions. Having it default to zero means XenGT users must
>> manually configure this option. Since we have plans to push other XenGT
>> tool stack parameters (including a GVT-g flag), how about we set this
>> max_wp_ram_ranges to a default one when the GVT-g flag is detected, and
>> until then, max_wp_ram_ranges is supposed to be configured explicitly for
>> XenGT?
>
> Sounds reasonable, and in line with what iirc was discussed on
> the tool stack side.
>

Great, and thanks.

>> About the truncation issue:
>>     I do not quite follow. Will this hurt if the value configured does
>> not exceed 4G? What about a type cast?
>
> A typecast would not alter behavior in any way. And of course
> a problem only arises if the value was above 4 billion. You either
> need to refuse such values while the attempt is made to set it,
> or you need to deal with the full range of possible values. Likely
> the former is the better (and I wonder whether the upper
> bound shouldn't be forced even lower than 4 billion).
>

Oh, I see. A check with the upper bound sounds better. Using 4G as the
upper bound is a little conservative, but I do not have any better
criteria right now. :)

> Jan
>
>

Thanks
Yu
Jan Beulich Jan. 27, 2016, 3:12 p.m. UTC | #11
>>> On 27.01.16 at 15:56, <yu.c.zhang@linux.intel.com> wrote:
> On 1/27/2016 10:32 PM, Jan Beulich wrote:
>>>>> On 27.01.16 at 15:13, <yu.c.zhang@linux.intel.com> wrote:
>>> About the truncation issue:
>>>     I do not quite follow. Will this hurt if the value configured does
>>> not exceed 4G? What about a type cast?
>>
>> A typecast would not alter behavior in any way. And of course
>> a problem only arises if the value was above 4 billion. You either
>> need to refuse such values while the attempt is made to set it,
>> or you need to deal with the full range of possible values. Likely
>> the former is the better (and I wonder whether the upper
>> bound shouldn't be forced even lower than 4 billion).
> 
> Oh, I see. A check with the upper bound sounds better. Using 4G as the
> upper bound is a little conservative, but I do not have any better
> criteria right now. :)

But when making that decision keep security in mind: How much
memory would it take to populate 4G rangeset nodes?

Jan
Yu Zhang Jan. 27, 2016, 3:23 p.m. UTC | #12
On 1/27/2016 11:12 PM, Jan Beulich wrote:
>>>> On 27.01.16 at 15:56, <yu.c.zhang@linux.intel.com> wrote:
>> On 1/27/2016 10:32 PM, Jan Beulich wrote:
>>>>>> On 27.01.16 at 15:13, <yu.c.zhang@linux.intel.com> wrote:
>>>> About the truncation issue:
>>>>      I do not quite follow. Will this hurt if the value configured does
>>>> not exceed 4G? What about a type cast?
>>>
>>> A typecast would not alter behavior in any way. And of course
>>> a problem only arises if the value was above 4 billion. You either
>>> need to refuse such values while the attempt is made to set it,
>>> or you need to deal with the full range of possible values. Likely
>>> the former is the better (and I wonder whether the upper
>>> bound shouldn't be forced even lower than 4 billion).
>>
>> Oh, I see. A check with the upper bound sounds better. Using 4G as the
>> upper bound is a little conservative, but I do not have any better
>> criteria right now. :)
>
> But when making that decision keep security in mind: How much
> memory would it take to populate 4G rangeset nodes?
>
Well, for XenGT, one extreme case I can imagine would be half of all
the guest ram being used as GPU page tables, with the page frames
containing these page tables being discontiguous (a rangeset can merge
contiguous ranges). For other virtual devices leveraging the
write-protected gfn rangeset, I believe the same idea applies. :)
Is this logic OK?

Thanks
Yu
Jan Beulich Jan. 27, 2016, 3:58 p.m. UTC | #13
>>> On 27.01.16 at 16:23, <yu.c.zhang@linux.intel.com> wrote:

> 
> On 1/27/2016 11:12 PM, Jan Beulich wrote:
>>>>> On 27.01.16 at 15:56, <yu.c.zhang@linux.intel.com> wrote:
>>> On 1/27/2016 10:32 PM, Jan Beulich wrote:
>>>>>>> On 27.01.16 at 15:13, <yu.c.zhang@linux.intel.com> wrote:
>>>>> About the truncation issue:
>>>>>      I do not quite follow. Will this hurt if the value configured does
>>>>> not exceed 4G? What about a type cast?
>>>>
>>>> A typecast would not alter behavior in any way. And of course
>>>> a problem only arises if the value was above 4 billion. You either
>>>> need to refuse such values while the attempt is made to set it,
>>>> or you need to deal with the full range of possible values. Likely
>>>> the former is the better (and I wonder whether the upper
>>>> bound shouldn't be forced even lower than 4 billion).
>>>
>>> Oh, I see. A check with the upper bound sounds better. Using 4G as the
>>> upper bound is a little conservative, but I do not have any better
>>> criteria right now. :)
>>
>> But when making that decision keep security in mind: How much
>> memory would it take to populate 4G rangeset nodes?
>>
> Well, for XenGT, one extreme case I can imagine would be half of all
> the guest ram being used as GPU page tables, with the page frames
> containing these page tables being discontiguous (a rangeset can merge
> contiguous ranges). For other virtual devices leveraging the
> write-protected gfn rangeset, I believe the same idea applies. :)
> Is this logic OK?

I can follow it, yes, but 4G ranges mean 16TiB of memory put
in page tables, which to be honest doesn't seem reasonable to
me.

Jan
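
(The 16TiB figure follows from 2^32 ranges each tracking at least one 4KiB page-table page: 2^32 × 2^12 = 2^44 bytes. A quick check of the arithmetic:)

```c
#include <stdint.h>

/* Memory spanned if every one of nranges write-protected ranges covers
 * a single 4KiB page-table page (ranges assumed fully discontiguous,
 * the worst case for the rangeset). */
static uint64_t bytes_tracked(uint64_t nranges, uint64_t page_size)
{
    return nranges * page_size;
}
```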
Yu Zhang Jan. 27, 2016, 4:12 p.m. UTC | #14
On 1/27/2016 11:58 PM, Jan Beulich wrote:
>>>> On 27.01.16 at 16:23, <yu.c.zhang@linux.intel.com> wrote:
>
>>
>> On 1/27/2016 11:12 PM, Jan Beulich wrote:
>>>>>> On 27.01.16 at 15:56, <yu.c.zhang@linux.intel.com> wrote:
>>>> On 1/27/2016 10:32 PM, Jan Beulich wrote:
>>>>>>>> On 27.01.16 at 15:13, <yu.c.zhang@linux.intel.com> wrote:
>>>>>> About the truncation issue:
>>>>>>       I do not quite follow. Will this hurt if the value configured does
>>>>>> not exceed 4G? What about a type cast?
>>>>>
>>>>> A typecast would not alter behavior in any way. And of course
>>>>> a problem only arises if the value was above 4 billion. You either
>>>>> need to refuse such values while the attempt is made to set it,
>>>>> or you need to deal with the full range of possible values. Likely
>>>>> the former is the better (and I wonder whether the upper
>>>>> bound shouldn't be forced even lower than 4 billion).
>>>>
>>>> Oh, I see. A check with the upper bound sounds better. Using 4G as the
>>>> upper bound is a little conservative, but I do not have any better
>>>> criteria right now. :)
>>>
>>> But when making that decision keep security in mind: How much
>>> memory would it take to populate 4G rangeset nodes?
>>>
>> Well, for XenGT, one extreme case I can imagine would be half of all
>> the guest ram being used as GPU page tables, with the page frames
>> containing these page tables being discontiguous (a rangeset can merge
>> contiguous ranges). For other virtual devices leveraging the
>> write-protected gfn rangeset, I believe the same idea applies. :)
>> Is this logic OK?
>
> I can follow it, yes, but 4G ranges mean 16TiB of memory put
> in page tables, which to be honest doesn't seem reasonable to
> me.
>

Thanks for your reply, Jan.
In most cases max_memkb in the configuration file will not be set to
such a big value. So I'd suggest we compare the value calculated from
max_memkb with 4G, and choose the smaller one as the upper bound. If at
some time in the future it becomes common for VMs to use huge amounts
of RAM, we should use a uint64 for the rangeset limit rather than a
uint32.

Yu
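
(Yu's closing suggestion could be sketched as follows — the helper name is hypothetical; the assumed worst case is half of guest RAM consisting of discontiguous 4KiB page-table pages, capped at what fits in 32 bits.)

```c
#include <stdint.h>

/* Upper bound for max_wp_ram_ranges derived from the guest's max_memkb:
 * the smaller of "half of guest RAM, in 4KiB pages" and UINT32_MAX. */
static uint64_t wp_ram_ranges_bound(uint64_t max_memkb)
{
    uint64_t pages = max_memkb / 4;   /* 4KiB pages in the guest */
    uint64_t worst = pages / 2;       /* half of RAM as page tables */
    return (worst < UINT32_MAX) ? worst : UINT32_MAX;
}
```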

Patch

diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
index 8899f75..7634c42 100644
--- a/docs/man/xl.cfg.pod.5
+++ b/docs/man/xl.cfg.pod.5
@@ -962,6 +962,24 @@  FIFO-based event channel ABI support up to 131,071 event channels.
 Other guests are limited to 4095 (64-bit x86 and ARM) or 1023 (32-bit
 x86).
 
+=item B<max_wp_ram_ranges=N>
+
+Limit the maximum number of write-protected ram ranges that can be
+tracked inside one ioreq server rangeset.
+
+An ioreq server uses a group of rangesets to track the I/O or memory
+resources to be emulated. The default limit of ranges that one rangeset
+can allocate is set to a small value, because these ranges are
+allocated in the Xen heap. Yet for the write-protected ram ranges,
+there are circumstances under which the upper limit inside one rangeset
+should exceed the default one. E.g. in XenGT, when tracking the
+per-process graphics translation tables on Intel Broadwell platforms,
+the number of page tables concerned will be several thousand (normally
+in this case, 8192 would be a big enough value). Not configuring this
+item, or setting its value to 0, will result in the upper limit being
+set to its default. Users who set this item explicitly are supposed
+to know the specific scenarios that necessitate this configuration.
+
 =back
 
 =head2 Paravirtualised (PV) Guest Specific Options
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index 156c0d5..6698d72 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -136,6 +136,11 @@ 
 #define LIBXL_HAVE_BUILDINFO_EVENT_CHANNELS 1
 
 /*
+ * libxl_domain_build_info has the u.hvm.max_wp_ram_ranges field.
+ */
+#define LIBXL_HAVE_BUILDINFO_HVM_MAX_WP_RAM_RANGES 1
+
+/*
  * libxl_domain_build_info has the u.hvm.ms_vm_genid field.
  */
 #define LIBXL_HAVE_BUILDINFO_HVM_MS_VM_GENID 1
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index 2269998..54173cb 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -288,6 +288,9 @@  static void hvm_set_conf_params(xc_interface *handle, uint32_t domid,
                     libxl_defbool_val(info->u.hvm.nested_hvm));
     xc_hvm_param_set(handle, domid, HVM_PARAM_ALTP2M,
                     libxl_defbool_val(info->u.hvm.altp2m));
+    if (info->u.hvm.max_wp_ram_ranges > 0)
+        xc_hvm_param_set(handle, domid, HVM_PARAM_MAX_WP_RAM_RANGES,
+                        info->u.hvm.max_wp_ram_ranges);
 }
 
 int libxl__build_pre(libxl__gc *gc, uint32_t domid,
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 9ad7eba..c7d7b5f 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -518,6 +518,7 @@  libxl_domain_build_info = Struct("domain_build_info",[
                                        ("serial_list",      libxl_string_list),
                                        ("rdm", libxl_rdm_reserve),
                                        ("rdm_mem_boundary_memkb", MemKB),
+                                       ("max_wp_ram_ranges", uint32),
                                        ])),
                  ("pv", Struct(None, [("kernel", string),
                                       ("slack_memkb", MemKB),
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index 25507c7..8bb7cc7 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -1626,6 +1626,10 @@  static void parse_config_data(const char *config_source,
 
         if (!xlu_cfg_get_long (config, "rdm_mem_boundary", &l, 0))
             b_info->u.hvm.rdm_mem_boundary_memkb = l * 1024;
+
+        if (!xlu_cfg_get_long (config, "max_wp_ram_ranges", &l, 0))
+            b_info->u.hvm.max_wp_ram_ranges = l;
+
         break;
     case LIBXL_DOMAIN_TYPE_PV:
     {
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 53d38e7..fd2b697 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -940,6 +940,10 @@  static int hvm_ioreq_server_alloc_rangesets(struct hvm_ioreq_server *s,
 {
     unsigned int i;
     int rc;
+    unsigned int max_wp_ram_ranges =
+        ( s->domain->arch.hvm_domain.params[HVM_PARAM_MAX_WP_RAM_RANGES] > 0 ) ?
+        s->domain->arch.hvm_domain.params[HVM_PARAM_MAX_WP_RAM_RANGES] :
+        MAX_NR_IO_RANGES;
 
     if ( is_default )
         goto done;
@@ -962,7 +966,10 @@  static int hvm_ioreq_server_alloc_rangesets(struct hvm_ioreq_server *s,
         if ( !s->range[i] )
             goto fail;
 
-        rangeset_limit(s->range[i], MAX_NR_IO_RANGES);
+        if ( i == HVMOP_IO_RANGE_WP_MEM )
+            rangeset_limit(s->range[i], max_wp_ram_ranges);
+        else
+            rangeset_limit(s->range[i], MAX_NR_IO_RANGES);
     }
 
  done:
@@ -6009,6 +6016,7 @@  static int hvm_allow_set_param(struct domain *d,
     case HVM_PARAM_IOREQ_SERVER_PFN:
     case HVM_PARAM_NR_IOREQ_SERVER_PAGES:
     case HVM_PARAM_ALTP2M:
+    case HVM_PARAM_MAX_WP_RAM_RANGES:
         if ( value != 0 && a->value != value )
             rc = -EEXIST;
         break;
diff --git a/xen/include/public/hvm/params.h b/xen/include/public/hvm/params.h
index 81f9451..ab3b11d 100644
--- a/xen/include/public/hvm/params.h
+++ b/xen/include/public/hvm/params.h
@@ -210,6 +210,9 @@ 
 /* Boolean: Enable altp2m */
 #define HVM_PARAM_ALTP2M       35
 
-#define HVM_NR_PARAMS          36
+/* Max write-protected ram ranges to be tracked in one ioreq server rangeset */
+#define HVM_PARAM_MAX_WP_RAM_RANGES  36
+
+#define HVM_NR_PARAMS          37
 
 #endif /* __XEN_PUBLIC_HVM_PARAMS_H__ */