diff mbox

[v5,for,Xen,4.7,1/4] xen: enable per-VCPU parameter settings for RTDS scheduler

Message ID 1454626244-5511-2-git-send-email-lichong659@gmail.com
State New, archived
Headers show

Commit Message

Chong Li Feb. 4, 2016, 10:50 p.m. UTC
Add XEN_DOMCTL_SCHEDOP_getvcpuinfo and _putvcpuinfo hypercalls
to independently get and set the scheduling parameters of each
vCPU of a domain

Signed-off-by: Chong Li <chong.li@wustl.edu>
Signed-off-by: Meng Xu <mengxu@cis.upenn.edu>
Signed-off-by: Sisu Xi <xisisu@gmail.com>

---
Changes on PATCH v4:
1) Add uint32_t vcpu_index to struct xen_domctl_scheduler_op.
When processing XEN_DOMCTL_SCHEDOP_get/putvcpuinfo, we call
hypercall_preemption_check in case the current hypercall lasts
too long. If we decide to preempt the current hypercall, we record
the index of the most-recent finished vcpu into the vcpu_index of
struct xen_domctl_scheduler_op. So when we resume the hypercall after
preemption, we start processing from the posion specified by vcpu_index,
and don't need to repeat the work that has already been done in the
hypercall before the preemption.
(This design is based on the do_grant_table_op() in grant_table.c)

2) Coding style changes

Changes on PATCH v3:
1) Remove struct xen_domctl_schedparam_t.

2) Change struct xen_domctl_scheduler_op.

3) Check if period/budget is within a validated range

Changes on PATCH v2:
1) Change struct xen_domctl_scheduler_op, for transferring per-vcpu parameters
between libxc and hypervisor.

2) Handler of XEN_DOMCTL_SCHEDOP_getinfo now just returns the default budget and
period values of RTDS scheduler.

3) Handler of XEN_DOMCTL_SCHEDOP_getvcpuinfo now can return a random subset of
the parameters of the VCPUs of a specific domain

CC: <dario.faggioli@citrix.com>
CC: <george.dunlap@eu.citrix.com>
CC: <dgolomb@seas.upenn.edu>
CC: <mengxu@cis.upenn.edu>
CC: <jbeulich@suse.com>
CC: <lichong659@gmail.com>
---
 xen/common/domctl.c         |   5 ++
 xen/common/sched_credit.c   |   4 ++
 xen/common/sched_credit2.c  |   4 ++
 xen/common/sched_rt.c       | 127 ++++++++++++++++++++++++++++++++++++++------
 xen/common/schedule.c       |  15 ++++--
 xen/include/public/domctl.h |  57 +++++++++++++++-----
 6 files changed, 180 insertions(+), 32 deletions(-)

Comments

Dario Faggioli Feb. 9, 2016, 6:17 p.m. UTC | #1
On Thu, 2016-02-04 at 16:50 -0600, Chong Li wrote:
> Add XEN_DOMCTL_SCHEDOP_getvcpuinfo and _putvcpuinfo hypercalls
> to independently get and set the scheduling parameters of each
> vCPU of a domain
> 
> Signed-off-by: Chong Li <chong.li@wustl.edu>
> Signed-off-by: Meng Xu <mengxu@cis.upenn.edu>
> Signed-off-by: Sisu Xi <xisisu@gmail.com>
> 
> ---
> Changes on PATCH v4:
> 1) Add uint32_t vcpu_index to struct xen_domctl_scheduler_op.
> When processing XEN_DOMCTL_SCHEDOP_get/putvcpuinfo, we call
> hypercall_preemption_check in case the current hypercall lasts
> too long. If we decide to preempt the current hypercall, we record
> the index of the most-recent finished vcpu into the vcpu_index of
> struct xen_domctl_scheduler_op. So when we resume the hypercall after
> preemption, we start processing from the posion specified by
> vcpu_index,
> and don't need to repeat the work that has already been done in the
> hypercall before the preemption.
> (This design is based on the do_grant_table_op() in grant_table.c)
> 
Ok.

This now looks like it could be fine now. However, I remember asking
you to look at how XEN_SYSCTL_pcitopoinfo is handled, because I thought
that way of doing things is a better fit for this usecase.

So, I've got to ask to check that and give it a try implementing things
that way.

In fact, apart from being a better suited arrangement of the code, it
should, if I'm not mistaken, allow you to avoid having to introduce the
vcpu_index field in the domctl struct.

> diff --git a/xen/common/domctl.c b/xen/common/domctl.c
> index 46b967e..b294221 100644
> --- a/xen/common/domctl.c
> +++ b/xen/common/domctl.c
> @@ -847,9 +847,14 @@ long
> do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
>      }
>  
>      case XEN_DOMCTL_scheduler_op:
> +    {
>          ret = sched_adjust(d, &op->u.scheduler_op);
> +        if ( ret == -ERESTART )
> +            ret = hypercall_create_continuation(
> +                __HYPERVISOR_domctl, "h", u_domctl);
>          copyback = 1;
>          break;
> +    }
> 
As said, have a look at the XEN_SYSCTL_pcitopoinfo case in do_sysctl(),
and at xc_pcitopoinfo().

That basically works by re-issueing the hypercall in libxc, rather than
on create_continuation.

Elaborating a bit more, I think that is a better fit in this case
because:
 - like in that case, you know how many vcpus you want to act upon in 
   libxc already, and you can take advantage of that in there;
 - it makes the xc_ call do something useful;
 - it avoids (again, if I'm not wrong) having to clobber the Xen
   interface with a useless and weirdly-looking vcpu_index field;
 - it makes it very easy to process a bunch of vcpus, and then check
   for preemption, instead of checking for that at each iteration
   (of course, this can be implemented with the 'standard' contination
   approach as well, but again, I like how it's done there).

> diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
> index 3f1d047..34ae48d 100644
> --- a/xen/common/sched_rt.c
> +++ b/xen/common/sched_rt.c
> @@ -86,8 +86,21 @@
>  #define RTDS_DEFAULT_PERIOD     (MICROSECS(10000))
>  #define RTDS_DEFAULT_BUDGET     (MICROSECS(4000))
>  
> +/*
> + * Max period: max delta of time type
> + * Min period: 100 us, considering the scheduling overhead
> + */
>
Comments like these are not super useful. You need to explain why you
are using the maximum delta ("because period is added to the time a
vcpu activates, so this must not overflow etc..."), and express a
little bit better what you mean with "considering the scheduling
overhead".

Moreover...

> +#define RTDS_MAX_PERIOD     (STIME_DELTA_MAX)
> +#define RTDS_MIN_PERIOD     (MICROSECS(10))
> +
Comment says 100us, but then it's 10?

> +/*
> + * Min budget: 100 us
> + */
> +#define RTDS_MIN_BUDGET     (MICROSECS(10))
> +
As above, why? (scheduling overhead again, I would say).

>  #define UPDATE_LIMIT_SHIFT      10
>  #define MAX_SCHEDULE            (MILLISECS(1))
> +
>
Spurious blank line addition.

>  /*
>   * Flags
>   */

> @@ -1163,6 +1168,94 @@ rt_dom_cntl(
>          }
>          spin_unlock_irqrestore(&prv->lock, flags);
>          break;
> +    case XEN_DOMCTL_SCHEDOP_getvcpuinfo: 
> +        for ( index = op->u.v.vcpu_index; index < op->u.v.nr_vcpus;
> index++ )
> +        {
> +            spin_lock_irqsave(&prv->lock, flags);
> +            if ( copy_from_guest_offset(&local_sched,
> +                          op->u.v.vcpus, index, 1) )
>
vcpus is an handle, and it needs to be validated with
guest_handle_is_null().

> +            {
> +                rc = -EFAULT;
> +                spin_unlock_irqrestore(&prv->lock, flags);
> +                break;
> +            }
> +            if ( local_sched.vcpuid >= d->max_vcpus ||
> +                          d->vcpu[local_sched.vcpuid] == NULL )
> +            {
> +                rc = -EINVAL;
> +                spin_unlock_irqrestore(&prv->lock, flags);
> +                break;
> +            }
> +            svc = rt_vcpu(d->vcpu[local_sched.vcpuid]);
> +
> +            local_sched.s.rtds.budget = svc->budget / MICROSECS(1);
> +            local_sched.s.rtds.period = svc->period / MICROSECS(1);
> +
> +            if ( __copy_to_guest_offset(op->u.v.vcpus, index,
> +                    &local_sched, 1) )
> +            {
> +                rc = -EFAULT;
> +                spin_unlock_irqrestore(&prv->lock, flags);
> +                break;
>
In any case, i.e., no matter whether the code keeps using this
approach, or becomes more similar to XEN_SYSCTL_pcitopoinfo, can these
failing paths be factored instead than repeated?

> +            }
> +            spin_unlock_irqrestore(&prv->lock, flags);
> +            if ( hypercall_preempt_check() )
> +            {
> +                op->u.v.vcpu_index = index + 1;
> +                /* hypercall (after preemption) will continue at
> vcpu_index */
> +                rc = -ERESTART;
> +                break;
> +            }
> +        }
> +        break;
> +    case XEN_DOMCTL_SCHEDOP_putvcpuinfo:
>
All what I've said about adopting the XEN_SYSCTL_pcitopoinfo, handle
validation and (attempting to) factoring the error paths applies to
both get and put.

> +        for ( index = op->u.v.vcpu_index; index < op->u.v.nr_vcpus;
> index++ )
> +        {
> +            spin_lock_irqsave(&prv->lock, flags);
> +            if ( copy_from_guest_offset(&local_sched,
> +                          op->u.v.vcpus, index, 1) )
> +            {
> +                rc = -EFAULT;
> +                spin_unlock_irqrestore(&prv->lock, flags);
> +                break;
> +            }
> +            if ( local_sched.vcpuid >= d->max_vcpus ||
> +                          d->vcpu[local_sched.vcpuid] == NULL )
> +            {
> +                rc = -EINVAL;
> +                spin_unlock_irqrestore(&prv->lock, flags);
> +                break;
> +            }
> +            svc = rt_vcpu(d->vcpu[local_sched.vcpuid]);
> +            period = MICROSECS(local_sched.s.rtds.period);
> +            budget = MICROSECS(local_sched.s.rtds.budget);
> +            if ( period > RTDS_MAX_PERIOD || budget <
> RTDS_MIN_BUDGET ||
> +                          budget > period )
>
Isn't checking against RTDS_MIN_PERIOD missing?

> +            {
> +                rc = -EINVAL;
> +                spin_unlock_irqrestore(&prv->lock, flags);
> +                break;
> +            }
> +
> +            /* 
> +             * We accept period/budget less than 100 us, but will
> warn users about
> +             * the large scheduling overhead due to it
> +             */
> +            if ( period < MICROSECS(100) || budget < MICROSECS(100)
> )
> +                printk("Warning: period/budget less than 100 micro-
> secs "
> +                       "results in large scheduling overhead.\n");
> +
"WARNING: period or budget set to less than 100us.\n"
" This may result in high scheduling overhead"

Or something like this. But try to make the two lines as independent as
possible. In general, we try not to break these messages, so that
grepping the source code for any substring of the will work. In this
case, we indeed need to break this, but we still keep grep-ability in
mind.

See, for example, sched_credit2:2147.

> +            svc->period = period;
> +            svc->budget = budget;
> +            spin_unlock_irqrestore(&prv->lock, flags);
> +            if ( hypercall_preempt_check() )
> +            {
> +                op->u.v.vcpu_index = index + 1;
> +                rc = -ERESTART;
> +                break;
> +            }
> +        }
> +        break;
>      }
>  
>      return rc;
> diff --git a/xen/common/schedule.c b/xen/common/schedule.c
> index c195129..f4a4032 100644
> --- a/xen/common/schedule.c
> +++ b/xen/common/schedule.c
> @@ -1148,10 +1148,19 @@ long sched_adjust(struct domain *d, struct
> xen_domctl_scheduler_op *op)
>      if ( ret )
>          return ret;
>  
> -    if ( (op->sched_id != DOM2OP(d)->sched_id) ||
> -         ((op->cmd != XEN_DOMCTL_SCHEDOP_putinfo) &&
> -          (op->cmd != XEN_DOMCTL_SCHEDOP_getinfo)) )
> +    if ( op->sched_id != DOM2OP(d)->sched_id )
>          return -EINVAL;
> +    else
> +        switch ( op->cmd )
> +        {
> +        case XEN_DOMCTL_SCHEDOP_putinfo:
> +        case XEN_DOMCTL_SCHEDOP_getinfo:
> +        case XEN_DOMCTL_SCHEDOP_putvcpuinfo:
> +        case XEN_DOMCTL_SCHEDOP_getvcpuinfo:
> +            break;
> +        default:
> +            return -EINVAL;
> +        }
>  
Maybe I'm misremembering (in which case, sorry), was using a switch
like this instead of an if suggested during one of the previous rounds?

> diff --git a/xen/include/public/domctl.h
> b/xen/include/public/domctl.h
> index 7a56b3f..6f429ec 100644
> --- a/xen/include/public/domctl.h
> +++ b/xen/include/public/domctl.h
> @@ -338,24 +338,57 @@
> DEFINE_XEN_GUEST_HANDLE(xen_domctl_max_vcpus_t);
>  #define XEN_SCHEDULER_ARINC653 7
>  #define XEN_SCHEDULER_RTDS     8
>  
> -/* Set or get info? */
> +typedef struct xen_domctl_sched_credit {
> +    uint16_t weight;
> +    uint16_t cap;
> +} xen_domctl_sched_credit_t;
> +
> +typedef struct xen_domctl_sched_credit2 {
> +    uint16_t weight;
> +} xen_domctl_sched_credit2_t;
> +
> +typedef struct xen_domctl_sched_rtds {
> +    uint32_t period;
> +    uint32_t budget;
> +} xen_domctl_sched_rtds_t;
> +
> +typedef struct xen_domctl_schedparam_vcpu {
> +    union {
> +        xen_domctl_sched_credit_t credit;
> +        xen_domctl_sched_credit2_t credit2;
> +        xen_domctl_sched_rtds_t rtds;
> +    } s;
> +    uint16_t vcpuid;
> +    uint16_t padding[3];
> +} xen_domctl_schedparam_vcpu_t;
> +DEFINE_XEN_GUEST_HANDLE(xen_domctl_schedparam_vcpu_t);
> +
> +/* Set or get info?
>
Comment style (opening '/*').

> + * For schedulers supporting per-vcpu settings (e.g., RTDS):
> + * using XEN_DOMCTL_SCHEDOP_putinfo sets params for all vcpus;
>
kill the "using", and maybe indent the three lines by 1 and/or put a
marker (like '-' or '+') to make this looks like a bulleted list.

> + * using XEN_DOMCTL_SCHEDOP_getinfo gets default params;
> + * using XEN_DOMCTL_SCHEDOP_put(get)vcpuinfo sets (gets) params of
> vcpus;
>
I'd leave an empty line here (still within the comment).

> + * For schedulers not supporting per-vcpu settings:
> + * using XEN_DOMCTL_SCHEDOP_putinfo sets params for all vcpus;
> + * using XEN_DOMCTL_SCHEDOP_getinfo gets domain-wise params;
>
The wording of these two items above should be the same.

> + * using XEN_DOMCTL_SCHEDOP_put(get)vcpuinfo returns error code;
>
Just "returns error".

> + */
>  #define XEN_DOMCTL_SCHEDOP_putinfo 0
>  #define XEN_DOMCTL_SCHEDOP_getinfo 1
> +#define XEN_DOMCTL_SCHEDOP_putvcpuinfo 2
> +#define XEN_DOMCTL_SCHEDOP_getvcpuinfo 3
>  struct xen_domctl_scheduler_op {
>      uint32_t sched_id;  /* XEN_SCHEDULER_* */
>      uint32_t cmd;       /* XEN_DOMCTL_SCHEDOP_* */
>      union {
> -        struct xen_domctl_sched_credit {
> -            uint16_t weight;
> -            uint16_t cap;
> -        } credit;
> -        struct xen_domctl_sched_credit2 {
> -            uint16_t weight;
> -        } credit2;
> -        struct xen_domctl_sched_rtds {
> -            uint32_t period;
> -            uint32_t budget;
> -        } rtds;
> +        xen_domctl_sched_credit_t credit;
> +        xen_domctl_sched_credit2_t credit2;
> +        xen_domctl_sched_rtds_t rtds;
> +        struct {
> +            XEN_GUEST_HANDLE_64(xen_domctl_schedparam_vcpu_t) vcpus;
> +            uint32_t nr_vcpus;
> +            uint32_t vcpu_index;
> +        } v;
>      } u;
>  };
>  typedef struct xen_domctl_scheduler_op xen_domctl_scheduler_op_t;

Thanks and Regards,
Dario
Chong Li March 1, 2016, 5:58 p.m. UTC | #2
On Tue, Feb 9, 2016 at 12:17 PM, Dario Faggioli
<dario.faggioli@citrix.com> wrote:
> On Thu, 2016-02-04 at 16:50 -0600, Chong Li wrote:
>> Add XEN_DOMCTL_SCHEDOP_getvcpuinfo and _putvcpuinfo hypercalls
>> to independently get and set the scheduling parameters of each
>> vCPU of a domain
>>
>> Signed-off-by: Chong Li <chong.li@wustl.edu>
>> Signed-off-by: Meng Xu <mengxu@cis.upenn.edu>
>> Signed-off-by: Sisu Xi <xisisu@gmail.com>
>>
>> ---
>> Changes on PATCH v4:
>> 1) Add uint32_t vcpu_index to struct xen_domctl_scheduler_op.
>> When processing XEN_DOMCTL_SCHEDOP_get/putvcpuinfo, we call
>> hypercall_preemption_check in case the current hypercall lasts
>> too long. If we decide to preempt the current hypercall, we record
>> the index of the most-recent finished vcpu into the vcpu_index of
>> struct xen_domctl_scheduler_op. So when we resume the hypercall after
>> preemption, we start processing from the posion specified by
>> vcpu_index,
>> and don't need to repeat the work that has already been done in the
>> hypercall before the preemption.
>> (This design is based on the do_grant_table_op() in grant_table.c)
>>

>

>> diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
>> index 3f1d047..34ae48d 100644
>> --- a/xen/common/sched_rt.c
>> +++ b/xen/common/sched_rt.c

>
>> +        for ( index = op->u.v.vcpu_index; index < op->u.v.nr_vcpus;
>> index++ )
>> +        {
>> +            spin_lock_irqsave(&prv->lock, flags);
>> +            if ( copy_from_guest_offset(&local_sched,
>> +                          op->u.v.vcpus, index, 1) )
>> +            {
>> +                rc = -EFAULT;
>> +                spin_unlock_irqrestore(&prv->lock, flags);
>> +                break;
>> +            }
>> +            if ( local_sched.vcpuid >= d->max_vcpus ||
>> +                          d->vcpu[local_sched.vcpuid] == NULL )
>> +            {
>> +                rc = -EINVAL;
>> +                spin_unlock_irqrestore(&prv->lock, flags);
>> +                break;
>> +            }
>> +            svc = rt_vcpu(d->vcpu[local_sched.vcpuid]);
>> +            period = MICROSECS(local_sched.s.rtds.period);
>> +            budget = MICROSECS(local_sched.s.rtds.budget);
>> +            if ( period > RTDS_MAX_PERIOD || budget <
>> RTDS_MIN_BUDGET ||
>> +                          budget > period )
>>
> Isn't checking against RTDS_MIN_PERIOD missing?

Because RTDS_MIN_PERIOD==RTDS_MIN_BUDGET, by checking budget <
RTDS_MIN_BUDGET and budget > period, the checking against
RTDS_MIN_PERIOD is already covered.


>> diff --git a/xen/common/schedule.c b/xen/common/schedule.c
>> index c195129..f4a4032 100644
>> --- a/xen/common/schedule.c
>> +++ b/xen/common/schedule.c
>> @@ -1148,10 +1148,19 @@ long sched_adjust(struct domain *d, struct
>> xen_domctl_scheduler_op *op)
>>      if ( ret )
>>          return ret;
>>
>> -    if ( (op->sched_id != DOM2OP(d)->sched_id) ||
>> -         ((op->cmd != XEN_DOMCTL_SCHEDOP_putinfo) &&
>> -          (op->cmd != XEN_DOMCTL_SCHEDOP_getinfo)) )
>> +    if ( op->sched_id != DOM2OP(d)->sched_id )
>>          return -EINVAL;
>> +    else
>> +        switch ( op->cmd )
>> +        {
>> +        case XEN_DOMCTL_SCHEDOP_putinfo:
>> +        case XEN_DOMCTL_SCHEDOP_getinfo:
>> +        case XEN_DOMCTL_SCHEDOP_putvcpuinfo:
>> +        case XEN_DOMCTL_SCHEDOP_getvcpuinfo:
>> +            break;
>> +        default:
>> +            return -EINVAL;
>> +        }
>>
> Maybe I'm misremembering (in which case, sorry), was using a switch
> like this instead of an if suggested during one of the previous rounds?
>
Yes. In patch v3, Jan suggested that.
Jan Beulich March 2, 2016, 1:36 p.m. UTC | #3
>>> On 01.03.16 at 18:58, <lichong659@gmail.com> wrote:
> On Tue, Feb 9, 2016 at 12:17 PM, Dario Faggioli
> <dario.faggioli@citrix.com> wrote:
>> On Thu, 2016-02-04 at 16:50 -0600, Chong Li wrote:
>>> --- a/xen/common/sched_rt.c
>>> +++ b/xen/common/sched_rt.c
> 
>>
>>> +        for ( index = op->u.v.vcpu_index; index < op->u.v.nr_vcpus;
>>> index++ )
>>> +        {
>>> +            spin_lock_irqsave(&prv->lock, flags);
>>> +            if ( copy_from_guest_offset(&local_sched,
>>> +                          op->u.v.vcpus, index, 1) )
>>> +            {
>>> +                rc = -EFAULT;
>>> +                spin_unlock_irqrestore(&prv->lock, flags);
>>> +                break;
>>> +            }
>>> +            if ( local_sched.vcpuid >= d->max_vcpus ||
>>> +                          d->vcpu[local_sched.vcpuid] == NULL )
>>> +            {
>>> +                rc = -EINVAL;
>>> +                spin_unlock_irqrestore(&prv->lock, flags);
>>> +                break;
>>> +            }
>>> +            svc = rt_vcpu(d->vcpu[local_sched.vcpuid]);
>>> +            period = MICROSECS(local_sched.s.rtds.period);
>>> +            budget = MICROSECS(local_sched.s.rtds.budget);
>>> +            if ( period > RTDS_MAX_PERIOD || budget <
>>> RTDS_MIN_BUDGET ||
>>> +                          budget > period )
>>>
>> Isn't checking against RTDS_MIN_PERIOD missing?
> 
> Because RTDS_MIN_PERIOD==RTDS_MIN_BUDGET, by checking budget <
> RTDS_MIN_BUDGET and budget > period, the checking against
> RTDS_MIN_PERIOD is already covered.

If you make code dependent upon such value matches, the
dependency should be documented and enforced to be
noticed if broken by a BUILD_BUG_ON().

Jan
George Dunlap March 2, 2016, 2:06 p.m. UTC | #4
On 02/03/16 13:36, Jan Beulich wrote:
>>>> On 01.03.16 at 18:58, <lichong659@gmail.com> wrote:
>> On Tue, Feb 9, 2016 at 12:17 PM, Dario Faggioli
>> <dario.faggioli@citrix.com> wrote:
>>> On Thu, 2016-02-04 at 16:50 -0600, Chong Li wrote:
>>>> --- a/xen/common/sched_rt.c
>>>> +++ b/xen/common/sched_rt.c
>>
>>>
>>>> +        for ( index = op->u.v.vcpu_index; index < op->u.v.nr_vcpus;
>>>> index++ )
>>>> +        {
>>>> +            spin_lock_irqsave(&prv->lock, flags);
>>>> +            if ( copy_from_guest_offset(&local_sched,
>>>> +                          op->u.v.vcpus, index, 1) )
>>>> +            {
>>>> +                rc = -EFAULT;
>>>> +                spin_unlock_irqrestore(&prv->lock, flags);
>>>> +                break;
>>>> +            }
>>>> +            if ( local_sched.vcpuid >= d->max_vcpus ||
>>>> +                          d->vcpu[local_sched.vcpuid] == NULL )
>>>> +            {
>>>> +                rc = -EINVAL;
>>>> +                spin_unlock_irqrestore(&prv->lock, flags);
>>>> +                break;
>>>> +            }
>>>> +            svc = rt_vcpu(d->vcpu[local_sched.vcpuid]);
>>>> +            period = MICROSECS(local_sched.s.rtds.period);
>>>> +            budget = MICROSECS(local_sched.s.rtds.budget);
>>>> +            if ( period > RTDS_MAX_PERIOD || budget <
>>>> RTDS_MIN_BUDGET ||
>>>> +                          budget > period )
>>>>
>>> Isn't checking against RTDS_MIN_PERIOD missing?
>>
>> Because RTDS_MIN_PERIOD==RTDS_MIN_BUDGET, by checking budget <
>> RTDS_MIN_BUDGET and budget > period, the checking against
>> RTDS_MIN_PERIOD is already covered.
> 
> If you make code dependent upon such value matches, the
> dependency should be documented and enforced to be
> noticed if broken by a BUILD_BUG_ON().

To expand upon this:

Code changes.  At the moment RTDS_MIN_PERIOD == RTDS_MIN_BUDGET, but the
very fact that you have two different macros implies to anyone coming
along later that you can change one.  If someone does change one but not
the other, then that will create a bug in the program which will be very
difficult to detect.  It is likely not to be noticed during patch review
(since it probably won't change the code you're now introducing), and it
may not even be noticed in follow-up testing for some time.

After you've been bitten several times by this sort of bug, you learn to
be paranoid about this sort of thing (which is why Dario noticed it).

Two ways to proceed:

1. Don't assume RTDS_MIN_PERIOD == RTDS_MIN_BUDGET here, and add the
extra check Dario mentioned.

2. Assume RTDS_MIN_PERIOD == RTDS_MIN_BUDGET, and add something to the
code which will break the build if this is ever false (as Jan suggested).

 -George
diff mbox

Patch

diff --git a/xen/common/domctl.c b/xen/common/domctl.c
index 46b967e..b294221 100644
--- a/xen/common/domctl.c
+++ b/xen/common/domctl.c
@@ -847,9 +847,14 @@  long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
     }
 
     case XEN_DOMCTL_scheduler_op:
+    {
         ret = sched_adjust(d, &op->u.scheduler_op);
+        if ( ret == -ERESTART )
+            ret = hypercall_create_continuation(
+                __HYPERVISOR_domctl, "h", u_domctl);
         copyback = 1;
         break;
+    }
 
     case XEN_DOMCTL_getdomaininfo:
     {
diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
index 0dce790..455c684 100644
--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -1054,6 +1054,10 @@  csched_dom_cntl(
      * lock. Runq lock not needed anywhere in here. */
     spin_lock_irqsave(&prv->lock, flags);
 
+    if ( op->cmd == XEN_DOMCTL_SCHEDOP_putvcpuinfo ||
+         op->cmd == XEN_DOMCTL_SCHEDOP_getvcpuinfo )
+        return -EINVAL;
+
     if ( op->cmd == XEN_DOMCTL_SCHEDOP_getinfo )
     {
         op->u.credit.weight = sdom->weight;
diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index 3c49ffa..c3049a0 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -1421,6 +1421,10 @@  csched2_dom_cntl(
      * runq lock to update csvcs. */
     spin_lock_irqsave(&prv->lock, flags);
 
+    if ( op->cmd == XEN_DOMCTL_SCHEDOP_putvcpuinfo ||
+         op->cmd == XEN_DOMCTL_SCHEDOP_getvcpuinfo )
+        return -EINVAL;
+
     if ( op->cmd == XEN_DOMCTL_SCHEDOP_getinfo )
     {
         op->u.credit2.weight = sdom->weight;
diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
index 3f1d047..34ae48d 100644
--- a/xen/common/sched_rt.c
+++ b/xen/common/sched_rt.c
@@ -86,8 +86,21 @@ 
 #define RTDS_DEFAULT_PERIOD     (MICROSECS(10000))
 #define RTDS_DEFAULT_BUDGET     (MICROSECS(4000))
 
+/*
+ * Max period: max delta of time type
+ * Min period: 100 us, considering the scheduling overhead
+ */
+#define RTDS_MAX_PERIOD     (STIME_DELTA_MAX)
+#define RTDS_MIN_PERIOD     (MICROSECS(10))
+
+/*
+ * Min budget: 100 us
+ */
+#define RTDS_MIN_BUDGET     (MICROSECS(10))
+
 #define UPDATE_LIMIT_SHIFT      10
 #define MAX_SCHEDULE            (MILLISECS(1))
+
 /*
  * Flags
  */
@@ -1129,26 +1142,18 @@  rt_dom_cntl(
     struct vcpu *v;
     unsigned long flags;
     int rc = 0;
-
+    xen_domctl_schedparam_vcpu_t local_sched;
+    s_time_t period, budget;
+    uint32_t index;
     switch ( op->cmd )
     {
-    case XEN_DOMCTL_SCHEDOP_getinfo:
-        if ( d->max_vcpus > 0 )
-        {
-            spin_lock_irqsave(&prv->lock, flags);
-            svc = rt_vcpu(d->vcpu[0]);
-            op->u.rtds.period = svc->period / MICROSECS(1);
-            op->u.rtds.budget = svc->budget / MICROSECS(1);
-            spin_unlock_irqrestore(&prv->lock, flags);
-        }
-        else
-        {
-            /* If we don't have vcpus yet, let's just return the defaults. */
-            op->u.rtds.period = RTDS_DEFAULT_PERIOD;
-            op->u.rtds.budget = RTDS_DEFAULT_BUDGET;
-        }
+    case XEN_DOMCTL_SCHEDOP_getinfo: /* return the default parameters */
+        spin_lock_irqsave(&prv->lock, flags);
+        op->u.rtds.period = RTDS_DEFAULT_PERIOD / MICROSECS(1); /* transfer to us */
+        op->u.rtds.budget = RTDS_DEFAULT_BUDGET / MICROSECS(1);
+        spin_unlock_irqrestore(&prv->lock, flags);
         break;
-    case XEN_DOMCTL_SCHEDOP_putinfo:
+    case XEN_DOMCTL_SCHEDOP_putinfo: /* set parameters for all vcpus */
         if ( op->u.rtds.period == 0 || op->u.rtds.budget == 0 )
         {
             rc = -EINVAL;
@@ -1163,6 +1168,94 @@  rt_dom_cntl(
         }
         spin_unlock_irqrestore(&prv->lock, flags);
         break;
+    case XEN_DOMCTL_SCHEDOP_getvcpuinfo: 
+        for ( index = op->u.v.vcpu_index; index < op->u.v.nr_vcpus; index++ )
+        {
+            spin_lock_irqsave(&prv->lock, flags);
+            if ( copy_from_guest_offset(&local_sched,
+                          op->u.v.vcpus, index, 1) )
+            {
+                rc = -EFAULT;
+                spin_unlock_irqrestore(&prv->lock, flags);
+                break;
+            }
+            if ( local_sched.vcpuid >= d->max_vcpus ||
+                          d->vcpu[local_sched.vcpuid] == NULL )
+            {
+                rc = -EINVAL;
+                spin_unlock_irqrestore(&prv->lock, flags);
+                break;
+            }
+            svc = rt_vcpu(d->vcpu[local_sched.vcpuid]);
+
+            local_sched.s.rtds.budget = svc->budget / MICROSECS(1);
+            local_sched.s.rtds.period = svc->period / MICROSECS(1);
+
+            if ( __copy_to_guest_offset(op->u.v.vcpus, index,
+                    &local_sched, 1) )
+            {
+                rc = -EFAULT;
+                spin_unlock_irqrestore(&prv->lock, flags);
+                break;
+            }
+            spin_unlock_irqrestore(&prv->lock, flags);
+            if ( hypercall_preempt_check() )
+            {
+                op->u.v.vcpu_index = index + 1;
+                /* hypercall (after preemption) will continue at vcpu_index */
+                rc = -ERESTART;
+                break;
+            }
+        }
+        break;
+    case XEN_DOMCTL_SCHEDOP_putvcpuinfo:
+        for ( index = op->u.v.vcpu_index; index < op->u.v.nr_vcpus; index++ )
+        {
+            spin_lock_irqsave(&prv->lock, flags);
+            if ( copy_from_guest_offset(&local_sched,
+                          op->u.v.vcpus, index, 1) )
+            {
+                rc = -EFAULT;
+                spin_unlock_irqrestore(&prv->lock, flags);
+                break;
+            }
+            if ( local_sched.vcpuid >= d->max_vcpus ||
+                          d->vcpu[local_sched.vcpuid] == NULL )
+            {
+                rc = -EINVAL;
+                spin_unlock_irqrestore(&prv->lock, flags);
+                break;
+            }
+            svc = rt_vcpu(d->vcpu[local_sched.vcpuid]);
+            period = MICROSECS(local_sched.s.rtds.period);
+            budget = MICROSECS(local_sched.s.rtds.budget);
+            if ( period > RTDS_MAX_PERIOD || budget < RTDS_MIN_BUDGET ||
+                          budget > period )
+            {
+                rc = -EINVAL;
+                spin_unlock_irqrestore(&prv->lock, flags);
+                break;
+            }
+
+            /* 
+             * We accept period/budget less than 100 us, but will warn users about
+             * the large scheduling overhead due to it
+             */
+            if ( period < MICROSECS(100) || budget < MICROSECS(100) )
+                printk("Warning: period/budget less than 100 micro-secs "
+                       "results in large scheduling overhead.\n");
+
+            svc->period = period;
+            svc->budget = budget;
+            spin_unlock_irqrestore(&prv->lock, flags);
+            if ( hypercall_preempt_check() )
+            {
+                op->u.v.vcpu_index = index + 1;
+                rc = -ERESTART;
+                break;
+            }
+        }
+        break;
     }
 
     return rc;
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index c195129..f4a4032 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -1148,10 +1148,19 @@  long sched_adjust(struct domain *d, struct xen_domctl_scheduler_op *op)
     if ( ret )
         return ret;
 
-    if ( (op->sched_id != DOM2OP(d)->sched_id) ||
-         ((op->cmd != XEN_DOMCTL_SCHEDOP_putinfo) &&
-          (op->cmd != XEN_DOMCTL_SCHEDOP_getinfo)) )
+    if ( op->sched_id != DOM2OP(d)->sched_id )
         return -EINVAL;
+    else
+        switch ( op->cmd )
+        {
+        case XEN_DOMCTL_SCHEDOP_putinfo:
+        case XEN_DOMCTL_SCHEDOP_getinfo:
+        case XEN_DOMCTL_SCHEDOP_putvcpuinfo:
+        case XEN_DOMCTL_SCHEDOP_getvcpuinfo:
+            break;
+        default:
+            return -EINVAL;
+        }
 
     /* NB: the pluggable scheduler code needs to take care
      * of locking by itself. */
diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
index 7a56b3f..6f429ec 100644
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -338,24 +338,57 @@  DEFINE_XEN_GUEST_HANDLE(xen_domctl_max_vcpus_t);
 #define XEN_SCHEDULER_ARINC653 7
 #define XEN_SCHEDULER_RTDS     8
 
-/* Set or get info? */
+typedef struct xen_domctl_sched_credit {
+    uint16_t weight;
+    uint16_t cap;
+} xen_domctl_sched_credit_t;
+
+typedef struct xen_domctl_sched_credit2 {
+    uint16_t weight;
+} xen_domctl_sched_credit2_t;
+
+typedef struct xen_domctl_sched_rtds {
+    uint32_t period;
+    uint32_t budget;
+} xen_domctl_sched_rtds_t;
+
+typedef struct xen_domctl_schedparam_vcpu {
+    union {
+        xen_domctl_sched_credit_t credit;
+        xen_domctl_sched_credit2_t credit2;
+        xen_domctl_sched_rtds_t rtds;
+    } s;
+    uint16_t vcpuid;
+    uint16_t padding[3];
+} xen_domctl_schedparam_vcpu_t;
+DEFINE_XEN_GUEST_HANDLE(xen_domctl_schedparam_vcpu_t);
+
+/* Set or get info?
+ * For schedulers supporting per-vcpu settings (e.g., RTDS):
+ * using XEN_DOMCTL_SCHEDOP_putinfo sets params for all vcpus;
+ * using XEN_DOMCTL_SCHEDOP_getinfo gets default params;
+ * using XEN_DOMCTL_SCHEDOP_put(get)vcpuinfo sets (gets) params of vcpus;
+ * For schedulers not supporting per-vcpu settings:
+ * using XEN_DOMCTL_SCHEDOP_putinfo sets params for all vcpus;
+ * using XEN_DOMCTL_SCHEDOP_getinfo gets domain-wise params;
+ * using XEN_DOMCTL_SCHEDOP_put(get)vcpuinfo returns error code;
+ */
 #define XEN_DOMCTL_SCHEDOP_putinfo 0
 #define XEN_DOMCTL_SCHEDOP_getinfo 1
+#define XEN_DOMCTL_SCHEDOP_putvcpuinfo 2
+#define XEN_DOMCTL_SCHEDOP_getvcpuinfo 3
 struct xen_domctl_scheduler_op {
     uint32_t sched_id;  /* XEN_SCHEDULER_* */
     uint32_t cmd;       /* XEN_DOMCTL_SCHEDOP_* */
     union {
-        struct xen_domctl_sched_credit {
-            uint16_t weight;
-            uint16_t cap;
-        } credit;
-        struct xen_domctl_sched_credit2 {
-            uint16_t weight;
-        } credit2;
-        struct xen_domctl_sched_rtds {
-            uint32_t period;
-            uint32_t budget;
-        } rtds;
+        xen_domctl_sched_credit_t credit;
+        xen_domctl_sched_credit2_t credit2;
+        xen_domctl_sched_rtds_t rtds;
+        struct {
+            XEN_GUEST_HANDLE_64(xen_domctl_schedparam_vcpu_t) vcpus;
+            uint32_t nr_vcpus;
+            uint32_t vcpu_index;
+        } v;
     } u;
 };
 typedef struct xen_domctl_scheduler_op xen_domctl_scheduler_op_t;