[v2,13/48] xen/sched: add is_running indicator to struct sched_unit

Message ID 20190809145833.1020-14-jgross@suse.com (mailing list archive)
State Superseded
Series xen: add core scheduling support

Commit Message

Jürgen Groß Aug. 9, 2019, 2:57 p.m. UTC
Add an is_running indicator to struct sched_unit which will be set
whenever the unit is being scheduled. Switch scheduler code to use
unit->is_running instead of vcpu->is_running for scheduling decisions.

At the same time introduce a state_entry_time field in struct
sched_unit which is updated whenever the is_running indicator
changes. Use that new field in the schedulers instead of the
similar vcpu field.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
RFC V2: fix arm build, don't drop v->is_running
---
 xen/common/sched_credit.c  | 12 +++++++-----
 xen/common/sched_credit2.c | 18 +++++++++---------
 xen/common/sched_rt.c      |  2 +-
 xen/common/schedule.c      | 15 +++++++++++----
 xen/include/xen/sched.h    |  5 +++++
 5 files changed, 33 insertions(+), 19 deletions(-)

Comments

Jan Beulich Sept. 4, 2019, 3:06 p.m. UTC | #1
On 09.08.2019 16:57, Juergen Gross wrote:
> --- a/xen/common/schedule.c
> +++ b/xen/common/schedule.c
> @@ -407,6 +407,8 @@ int sched_init_vcpu(struct vcpu *v, unsigned int processor)
>      {
>          get_sched_res(v->processor)->curr = unit;
>          v->is_running = 1;
> +        unit->is_running = 1;
> +        unit->state_entry_time = NOW();
>      }
>      else
>      {

Are idle vCPU-s also going to get grouped into units (I'm sorry,
I don't recall)? If so, just like further down, I'd question the
multiple writing of the field. I'd kind of expect the unit and
all vCPU-s within it to get an identical state entry time stored.

Also both here and further down I'm puzzled to see that the
unit's field doesn't get set at the same place in code where
the vCPU's field gets set.

> @@ -1663,8 +1666,10 @@ static void schedule(void)
>       * switch, else lost_records resume will not work properly.
>       */
>  
> -    ASSERT(!next->is_running);
> +    ASSERT(!next->sched_unit->is_running);
>      next->is_running = 1;
> +    next->sched_unit->is_running = 1;
> +    next->sched_unit->state_entry_time = now;

Isn't the ASSERT() you delete still supposed to be true? In which case
wouldn't it be worthwhile to retain it?

> --- a/xen/include/xen/sched.h
> +++ b/xen/include/xen/sched.h
> @@ -272,6 +272,11 @@ struct sched_unit {
>      struct sched_resource *res;
>      int                    unit_id;
>  
> +    /* Last time unit got (de-)scheduled. */
> +    uint64_t               state_entry_time;
> +
> +    /* Currently running on a CPU? */
> +    bool                   is_running;
>      /* Does soft affinity actually play a role (given hard affinity)? */
>      bool                   soft_aff_effective;
>      /* Bitmask of CPUs on which this VCPU may run. */

I'm noticing this here, but it may well have been an issue earlier
already (and there may well be later adjustments invalidating my
remark, and of course it's the end result of this series which
matters in the long run): when adding/removing fields to this
struct (and generally of course also others), could you see about
minimizing the number / overall size of padding holes?

Jan
Jürgen Groß Sept. 11, 2019, 1:44 p.m. UTC | #2
On 04.09.19 17:06, Jan Beulich wrote:
> On 09.08.2019 16:57, Juergen Gross wrote:
>> --- a/xen/common/schedule.c
>> +++ b/xen/common/schedule.c
>> @@ -407,6 +407,8 @@ int sched_init_vcpu(struct vcpu *v, unsigned int processor)
>>       {
>>           get_sched_res(v->processor)->curr = unit;
>>           v->is_running = 1;
>> +        unit->is_running = 1;
>> +        unit->state_entry_time = NOW();
>>       }
>>       else
>>       {
> 
> Are idle vCPU-s also going to get grouped into units (I'm sorry,
> I don't recall)? If so, just like further down, I'd question the
> multiple writing of the field. I'd kind of expect the unit and
> all vCPU-s within it to get an identical state entry time stored.

When an idle vcpu is created, its cpu is not in any cpupool yet
(a free cpu). Such cpus are always subject to per-cpu scheduling
(granularity 1). Only when cpus are put into a cpupool is the
granularity adjusted accordingly and the idle vcpus possibly
grouped into units (see patch 45).

Regarding the state entry time: different vcpus of a unit can be
in different states (blocked/running, etc.), so their state entry
times will generally differ. A unit's state entry time will
reflect its scheduling events (being scheduled/descheduled).
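
To illustrate (a sketch only, assuming the granularity-2 case from
later in the series, with one unit containing vcpu0 and vcpu1):

  t0: unit gets scheduled    -> unit->state_entry_time = t0
                                (both vcpus start to run, so their
                                runstate.state_entry_time is t0, too)
  t1: vcpu1 blocks           -> vcpu1->runstate.state_entry_time = t1;
                                the unit stays scheduled and keeps t0
  t2: unit gets descheduled  -> unit->state_entry_time = t2

The vcpu times track runstate changes, while the unit time only
tracks (de-)scheduling events.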

> Also both here and further down I'm puzzled to see that the
> unit's field doesn't get set at the same place in code where
> the vCPU's field gets set.

Right. The states of a vcpu and the unit it belongs to are
coupled, but not identical.

> 
>> @@ -1663,8 +1666,10 @@ static void schedule(void)
>>        * switch, else lost_records resume will not work properly.
>>        */
>>   
>> -    ASSERT(!next->is_running);
>> +    ASSERT(!next->sched_unit->is_running);
>>       next->is_running = 1;
>> +    next->sched_unit->is_running = 1;
>> +    next->sched_unit->state_entry_time = now;
> 
> Isn't the ASSERT() you delete still supposed to be true? In which case
> wouldn't it be worthwhile to retain it?

No, it is no longer true. All but one vcpu of a unit can be
blocked, with the related idle vcpus running in their place. This
means that even when one unit is replaced by another, the vcpu can
be the same: e.g. with granularity 2, if the previous unit had a
blocked vcpu, the idle vcpu was running on that cpu in its stead;
if the next unit has no runnable vcpu for that cpu either, the
very same idle vcpu is picked again, so next->is_running can
already be true.

> 
>> --- a/xen/include/xen/sched.h
>> +++ b/xen/include/xen/sched.h
>> @@ -272,6 +272,11 @@ struct sched_unit {
>>       struct sched_resource *res;
>>       int                    unit_id;
>>   
>> +    /* Last time unit got (de-)scheduled. */
>> +    uint64_t               state_entry_time;
>> +
>> +    /* Currently running on a CPU? */
>> +    bool                   is_running;
>>       /* Does soft affinity actually play a role (given hard affinity)? */
>>       bool                   soft_aff_effective;
>>       /* Bitmask of CPUs on which this VCPU may run. */
> 
> I'm noticing this here, but it may well have been an issue earlier
> already (and there may well be later adjustments invalidating my
> remark, and of course it's the end result of this series which
> matters in the long run): when adding/removing fields to this
> struct (and generally of course also others), could you see about
> minimizing the number / overall size of padding holes?

Hmm, yes.
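
Something like the following might do (a sketch only, assuming a
64-bit build; fields not touched by this patch are elided):

struct sched_unit {
    /* ... earlier fields ... */
    struct sched_resource *res;
    int                    unit_id;

    /* The bools fit into the 4-byte hole unit_id would otherwise
     * leave in front of the 8-byte aligned state_entry_time, so no
     * second hole appears behind them. */
    bool                   is_running;
    bool                   soft_aff_effective;

    /* Last time unit got (de-)scheduled. */
    uint64_t               state_entry_time;
    /* ... remaining fields ... */
};

Checking the result with e.g. "pahole -C sched_unit xen-syms" would
verify no avoidable holes are left.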


Juergen
Jan Beulich Sept. 11, 2019, 3:06 p.m. UTC | #3
On 11.09.2019 15:44, Juergen Gross wrote:
> On 04.09.19 17:06, Jan Beulich wrote:
>> On 09.08.2019 16:57, Juergen Gross wrote:
>>> --- a/xen/common/schedule.c
>>> +++ b/xen/common/schedule.c
>>> @@ -407,6 +407,8 @@ int sched_init_vcpu(struct vcpu *v, unsigned int processor)
>>>       {
>>>           get_sched_res(v->processor)->curr = unit;
>>>           v->is_running = 1;
>>> +        unit->is_running = 1;
>>> +        unit->state_entry_time = NOW();
>>>       }
>>>       else
>>>       {
>>
>> Are idle vCPU-s also going to get grouped into units (I'm sorry,
>> I don't recall)? If so, just like further down I'd be putting
>> under question the multiple writing of the field. I'd kind of
>> expect the unit and all vCPU-s within it to get an identical
>> state entry time stored.
> 
> When an idle vcpu is created, its cpu is not in any cpupool yet
> (a free cpu). Such cpus are always subject to per-cpu scheduling
> (granularity 1). Only when cpus are put into a cpupool is the
> granularity adjusted accordingly and the idle vcpus possibly
> grouped into units (see patch 45).
> 
> Regarding the state entry time: different vcpus of a unit can be
> in different states (blocked/running, etc.), so their state entry
> times will generally differ. A unit's state entry time will
> reflect its scheduling events (being scheduled/descheduled).

But this doesn't (to me) clarify whether the multiple writing
(once per vCPU) is actually correct. Obviously whether this is
merely cosmetic depends on what the consumers of this data are.

>> Also both here and further down I'm puzzled to see that the
>> unit's field doesn't get set at the same place in code where
>> the vCPU's field gets set.
> 
> Right. The states of a vcpu and the unit it belongs to are
> coupled, but not identical.

And this is not (going to be) a problem?

Jan
Jürgen Groß Sept. 11, 2019, 3:32 p.m. UTC | #4
On 11.09.19 17:06, Jan Beulich wrote:
> On 11.09.2019 15:44, Juergen Gross wrote:
>> On 04.09.19 17:06, Jan Beulich wrote:
>>> On 09.08.2019 16:57, Juergen Gross wrote:
>>>> --- a/xen/common/schedule.c
>>>> +++ b/xen/common/schedule.c
>>>> @@ -407,6 +407,8 @@ int sched_init_vcpu(struct vcpu *v, unsigned int processor)
>>>>        {
>>>>            get_sched_res(v->processor)->curr = unit;
>>>>            v->is_running = 1;
>>>> +        unit->is_running = 1;
>>>> +        unit->state_entry_time = NOW();
>>>>        }
>>>>        else
>>>>        {
>>>
>>> Are idle vCPU-s also going to get grouped into units (I'm sorry,
>>> I don't recall)? If so, just like further down, I'd question the
>>> multiple writing of the field. I'd kind of expect the unit and
>>> all vCPU-s within it to get an identical state entry time stored.
>>
>> When an idle vcpu is created, its cpu is not in any cpupool yet
>> (a free cpu). Such cpus are always subject to per-cpu scheduling
>> (granularity 1). Only when cpus are put into a cpupool is the
>> granularity adjusted accordingly and the idle vcpus possibly
>> grouped into units (see patch 45).
>>
>> Regarding the state entry time: different vcpus of a unit can be
>> in different states (blocked/running, etc.), so their state entry
>> times will generally differ. A unit's state entry time will
>> reflect its scheduling events (being scheduled/descheduled).
> 
> But this doesn't (to me) clarify whether the multiple writing
> (once per vCPU) is actually correct. Obviously whether this is
> merely cosmetic depends on what the consumers of this data are.

There is no multiple writing. This is a preparatory patch for
being able to remove the vcpu dependencies from the individual
schedulers.

Later, when multiple vcpus per unit are introduced, there will be
a clear distinction between vcpu switching (context switches) and
sched_unit switching (scheduling events): vcpu data will be
adjusted at context_switch time, while unit data will be adjusted
only when scheduling units.
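
Roughly like this (a sketch of the intended end state only, not
code from this series; both function names are made up for
illustration):

/* Scheduling event: runs once per unit being switched. */
static void sched_unit_switch(struct sched_unit *prev,
                              struct sched_unit *next, s_time_t now)
{
    prev->is_running = false;
    prev->state_entry_time = now;
    next->is_running = true;
    next->state_entry_time = now;
}

/* Context switch: runs once per vcpu of the unit, possibly at
 * another point in time (e.g. when an idle vcpu stands in for a
 * blocked one). */
static void vcpu_switch(struct vcpu *prev, struct vcpu *next)
{
    prev->is_running = 0;
    next->is_running = 1;
}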

> 
>>> Also both here and further down I'm puzzled to see that the
>>> unit's field doesn't get set at the same place in code where
>>> the vCPU's field gets set.
>>
>> Right. The states of a vcpu and the unit it belongs to are
>> coupled, but not identical.
> 
> And this is not (going to be) a problem?

With a proper distinction as explained above, no.


Juergen

Patch

diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
index 339702f3ec..143c7543e8 100644
--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -709,7 +709,7 @@  __csched_vcpu_is_migrateable(const struct csched_private *prv, struct vcpu *vc,
      * The caller is supposed to have already checked that vc is also
      * not running.
      */
-    ASSERT(!vc->is_running);
+    ASSERT(!vc->sched_unit->is_running);
 
     return !__csched_vcpu_is_cache_hot(prv, svc) &&
            cpumask_test_cpu(dest_cpu, mask);
@@ -1033,7 +1033,8 @@  csched_unit_insert(const struct scheduler *ops, struct sched_unit *unit)
 
     lock = unit_schedule_lock_irq(unit);
 
-    if ( !__vcpu_on_runq(svc) && vcpu_runnable(vc) && !vc->is_running )
+    if ( !__vcpu_on_runq(svc) && vcpu_runnable(vc) &&
+         !vc->sched_unit->is_running )
         runq_insert(svc);
 
     unit_schedule_unlock_irq(lock, unit);
@@ -1646,8 +1647,9 @@  csched_runq_steal(int peer_cpu, int cpu, int pri, int balance_step)
          * vCPUs with useful soft affinities in some sort of bitmap
          * or counter.
          */
-        if ( vc->is_running || (balance_step == BALANCE_SOFT_AFFINITY &&
-                                !has_soft_affinity(vc->sched_unit)) )
+        if ( vc->sched_unit->is_running ||
+             (balance_step == BALANCE_SOFT_AFFINITY &&
+              !has_soft_affinity(vc->sched_unit)) )
             continue;
 
         affinity_balance_cpumask(vc->sched_unit, balance_step, cpumask_scratch);
@@ -1855,7 +1857,7 @@  csched_schedule(
                     (unsigned char *)&d);
     }
 
-    runtime = now - current->runstate.state_entry_time;
+    runtime = now - current->sched_unit->state_entry_time;
     if ( runtime < 0 ) /* Does this ever happen? */
         runtime = 0;
 
diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index ed83381ddb..5487d087e1 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -1284,7 +1284,7 @@  runq_insert(const struct scheduler *ops, struct csched2_unit *svc)
 
     ASSERT(&svc->rqd->runq == runq);
     ASSERT(!is_idle_vcpu(svc->vcpu));
-    ASSERT(!svc->vcpu->is_running);
+    ASSERT(!svc->vcpu->sched_unit->is_running);
     ASSERT(!(svc->flags & CSFLAG_scheduled));
 
     list_for_each( iter, runq )
@@ -1341,8 +1341,8 @@  static inline bool is_preemptable(const struct csched2_unit *svc,
     if ( ratelimit <= CSCHED2_RATELIMIT_TICKLE_TOLERANCE )
         return true;
 
-    ASSERT(svc->vcpu->is_running);
-    return now - svc->vcpu->runstate.state_entry_time >
+    ASSERT(svc->vcpu->sched_unit->is_running);
+    return now - svc->vcpu->sched_unit->state_entry_time >
            ratelimit - CSCHED2_RATELIMIT_TICKLE_TOLERANCE;
 }
 
@@ -2932,7 +2932,7 @@  csched2_dom_cntl(
                 {
                     svc = csched2_unit(v->sched_unit);
                     lock = unit_schedule_lock(svc->vcpu->sched_unit);
-                    if ( v->is_running )
+                    if ( v->sched_unit->is_running )
                     {
                         unsigned int cpu = v->processor;
                         struct csched2_runqueue_data *rqd = c2rqd(ops, cpu);
@@ -3205,8 +3205,8 @@  csched2_runtime(const struct scheduler *ops, int cpu,
     if ( prv->ratelimit_us )
     {
         s_time_t ratelimit_min = MICROSECS(prv->ratelimit_us);
-        if ( snext->vcpu->is_running )
-            ratelimit_min = snext->vcpu->runstate.state_entry_time +
+        if ( snext->vcpu->sched_unit->is_running )
+            ratelimit_min = snext->vcpu->sched_unit->state_entry_time +
                             MICROSECS(prv->ratelimit_us) - now;
         if ( ratelimit_min > min_time )
             min_time = ratelimit_min;
@@ -3303,7 +3303,7 @@  runq_candidate(struct csched2_runqueue_data *rqd,
      * no point forcing it to do so until rate limiting expires.
      */
     if ( !yield && prv->ratelimit_us && vcpu_runnable(scurr->vcpu) &&
-         (now - scurr->vcpu->runstate.state_entry_time) <
+         (now - scurr->vcpu->sched_unit->state_entry_time) <
           MICROSECS(prv->ratelimit_us) )
     {
         if ( unlikely(tb_init_done) )
@@ -3314,7 +3314,7 @@  runq_candidate(struct csched2_runqueue_data *rqd,
             } d;
             d.dom = scurr->vcpu->domain->domain_id;
             d.vcpu = scurr->vcpu->vcpu_id;
-            d.runtime = now - scurr->vcpu->runstate.state_entry_time;
+            d.runtime = now - scurr->vcpu->sched_unit->state_entry_time;
             __trace_var(TRC_CSCHED2_RATELIMIT, 1,
                         sizeof(d),
                         (unsigned char *)&d);
@@ -3562,7 +3562,7 @@  csched2_schedule(
         if ( snext != scurr )
         {
             ASSERT(snext->rqd == rqd);
-            ASSERT(!snext->vcpu->is_running);
+            ASSERT(!snext->vcpu->sched_unit->is_running);
 
             runq_remove(snext);
             __set_bit(__CSFLAG_scheduled, &snext->flags);
diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
index b662924483..df1a874c97 100644
--- a/xen/common/sched_rt.c
+++ b/xen/common/sched_rt.c
@@ -906,7 +906,7 @@  rt_unit_insert(const struct scheduler *ops, struct sched_unit *unit)
     {
         replq_insert(ops, svc);
 
-        if ( !vc->is_running )
+        if ( !unit->is_running )
             runq_insert(ops, svc);
     }
     unit_schedule_unlock_irq(lock, unit);
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 8cbb6320c6..03119af25c 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -407,6 +407,8 @@  int sched_init_vcpu(struct vcpu *v, unsigned int processor)
     {
         get_sched_res(v->processor)->curr = unit;
         v->is_running = 1;
+        unit->is_running = 1;
+        unit->state_entry_time = NOW();
     }
     else
     {
@@ -727,7 +729,8 @@  static void vcpu_migrate_finish(struct vcpu *v)
      * context_saved(); and in any case, if the bit is cleared, then
      * someone else has already done the work so we don't need to.
      */
-    if ( v->is_running || !test_bit(_VPF_migrating, &v->pause_flags) )
+    if ( v->sched_unit->is_running ||
+         !test_bit(_VPF_migrating, &v->pause_flags) )
         return;
 
     old_cpu = new_cpu = v->processor;
@@ -781,7 +784,7 @@  static void vcpu_migrate_finish(struct vcpu *v)
      * because they both happen in (different) spinlock regions, and those
      * regions are strictly serialised.
      */
-    if ( v->is_running ||
+    if ( v->sched_unit->is_running ||
          !test_and_clear_bit(_VPF_migrating, &v->pause_flags) )
     {
         sched_spin_unlock_double(old_lock, new_lock, flags);
@@ -809,7 +812,7 @@  void vcpu_force_reschedule(struct vcpu *v)
 {
     spinlock_t *lock = unit_schedule_lock_irq(v->sched_unit);
 
-    if ( v->is_running )
+    if ( v->sched_unit->is_running )
         vcpu_migrate_start(v);
 
     unit_schedule_unlock_irq(lock, v->sched_unit);
@@ -1663,8 +1666,10 @@  static void schedule(void)
      * switch, else lost_records resume will not work properly.
      */
 
-    ASSERT(!next->is_running);
+    ASSERT(!next->sched_unit->is_running);
     next->is_running = 1;
+    next->sched_unit->is_running = 1;
+    next->sched_unit->state_entry_time = now;
 
     pcpu_schedule_unlock_irq(lock, cpu);
 
@@ -1686,6 +1691,8 @@  void context_saved(struct vcpu *prev)
     smp_wmb();
 
     prev->is_running = 0;
+    prev->sched_unit->is_running = 0;
+    prev->sched_unit->state_entry_time = NOW();
 
     /* Check for migration request /after/ clearing running flag. */
     smp_mb();
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index fa2dbf47ee..f506c0cbd4 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -272,6 +272,11 @@  struct sched_unit {
     struct sched_resource *res;
     int                    unit_id;
 
+    /* Last time unit got (de-)scheduled. */
+    uint64_t               state_entry_time;
+
+    /* Currently running on a CPU? */
+    bool                   is_running;
     /* Does soft affinity actually play a role (given hard affinity)? */
     bool                   soft_aff_effective;
     /* Bitmask of CPUs on which this VCPU may run. */