[v3,29/47] xen/sched: introduce unit_runnable_state()

Message ID: 20190914085251.18816-30-jgross@suse.com (mailing list archive)
State: Superseded
Series: xen: add core scheduling support

Commit Message

Jürgen Groß Sept. 14, 2019, 8:52 a.m. UTC
Today the vcpu runstate of a newly scheduled vcpu is always set to
"running", even if at that time vcpu_runnable() is already returning
false due to a race (e.g. with pausing the vcpu).

With core scheduling this can no longer work, as not all vcpus of a
schedule unit need to be "running" when the unit is scheduled. So the
vcpu's new runstate has to be selected at the same time as the
runnability of the related schedule unit is probed.

For this purpose introduce a new helper, unit_runnable_state(), which
saves the new runstate of each tested vcpu in a new field of the vcpu
struct.
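
To make the race concrete, an illustrative timeline (a sketch only,
not actual code):

    /*
     * pCPU running the scheduler          other pCPU
     * -------------------------          ----------
     * pick vcpu v as next to run
     *                                     vcpu_pause(v)
     *                                       -> pause count raised, so
     *                                          vcpu_runnable(v) is
     *                                          now false
     * vcpu_runstate_change(v,
     *     RUNSTATE_running, now);
     *   -> v is marked "running" although it is no longer runnable
     */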

Signed-off-by: Juergen Gross <jgross@suse.com>
---
RFC V2:
- new patch
V3:
- add vcpu loop to unit_runnable_state() right away instead of doing
  so in the next patch (Jan Beulich, Dario Faggioli)
- make new_state unsigned int (Jan Beulich)
---
 xen/common/domain.c         |  1 +
 xen/common/sched_arinc653.c |  2 +-
 xen/common/sched_credit.c   | 49 ++++++++++++++++++++++++---------------------
 xen/common/sched_credit2.c  |  7 ++++---
 xen/common/sched_null.c     |  3 ++-
 xen/common/sched_rt.c       |  8 +++++++-
 xen/common/schedule.c       |  2 +-
 xen/include/xen/sched-if.h  | 23 +++++++++++++++++++++
 xen/include/xen/sched.h     |  1 +
 9 files changed, 66 insertions(+), 30 deletions(-)

Comments

Jan Beulich Sept. 23, 2019, 3:24 p.m. UTC | #1
On 14.09.2019 10:52, Juergen Gross wrote:
> Today the vcpu runstate of a newly scheduled vcpu is always set to
> "running", even if at that time vcpu_runnable() is already returning
> false due to a race (e.g. with pausing the vcpu).

I can see this part, ...

> With core scheduling this can no longer work, as not all vcpus of a
> schedule unit need to be "running" when the unit is scheduled. So the
> vcpu's new runstate has to be selected at the same time as the
> runnability of the related schedule unit is probed.

... but I continue to have trouble here. If it has been okay to
nevertheless set a no-longer-runnable vCPU to "running", why would
the same not be true for schedule units? Part of the problem may be
that ...

> --- a/xen/include/xen/sched-if.h
> +++ b/xen/include/xen/sched-if.h
> @@ -76,6 +76,29 @@ static inline bool unit_runnable(const struct sched_unit *unit)
>      return vcpu_runnable(unit->vcpu_list);

... this clearly still isn't doing the (I suppose) intended loop,
and hence ...

>  }
>  
> +static inline bool unit_runnable_state(const struct sched_unit *unit)
> +{
> +    struct vcpu *v;
> +    bool runnable, ret = false;
> +
> +    if ( is_idle_unit(unit) )
> +        return true;
> +
> +    for_each_sched_unit_vcpu ( unit, v )
> +    {
> +        runnable = vcpu_runnable(v);
> +
> +        v->new_state = runnable ? RUNSTATE_running
> +                                : (v->pause_flags & VPF_blocked)
> +                                  ? RUNSTATE_blocked : RUNSTATE_offline;
> +
> +        if ( runnable )
> +            ret = true;
> +    }
> +
> +    return ret;
> +}

... it's not obvious what the eventual difference between the two is
going to be.

Furthermore I think a function of the given name, returning bool, and
taking a pointer to const deserves a comment on the (possibly
slightly unexpected) state change it performs. It might be worthwhile
to extend that comment to also outline the usage difference between
this function and its sibling above.

> --- a/xen/include/xen/sched.h
> +++ b/xen/include/xen/sched.h
> @@ -174,6 +174,7 @@ struct vcpu
>          XEN_GUEST_HANDLE(vcpu_runstate_info_compat_t) compat;
>      } runstate_guest; /* guest address */
>  #endif
> +    unsigned int     new_state;

Similarly I think it would be nice for this field to gain a brief
comment as to its purpose compared to runstate.state.

Jan
Jürgen Groß Sept. 24, 2019, 2:25 p.m. UTC | #2
On 23.09.19 17:24, Jan Beulich wrote:
> On 14.09.2019 10:52, Juergen Gross wrote:
>> Today the vcpu runstate of a newly scheduled vcpu is always set to
>> "running", even if at that time vcpu_runnable() is already returning
>> false due to a race (e.g. with pausing the vcpu).
> 
> I can see this part, ...
> 
>> With core scheduling this can no longer work, as not all vcpus of a
>> schedule unit need to be "running" when the unit is scheduled. So the
>> vcpu's new runstate has to be selected at the same time as the
>> runnability of the related schedule unit is probed.
> 
> ... but I continue to have trouble here. If it has been okay to
> nevertheless set a no-longer-runnable vCPU to "running", why would
> the same not be true for schedule units? Part of the problem may be
> that ...

The difference is the need to drop the scheduling lock for doing the
rendezvous. vcpu_sleep() or vcpu_wake() could now interfere with
scheduling in a way which was not possible before.
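
To illustrate the difference (a rough sketch of the flow with core
scheduling; helper names are simplified or hypothetical, not the exact
code of this series):

    /* schedule() with core scheduling, heavily simplified: */
    lock = pcpu_schedule_lock_irq(cpu);
    next = do_schedule(...);        /* unit_runnable_state() latches
                                     * v->new_state for each vcpu here,
                                     * while the lock is still held */
    pcpu_schedule_unlock_irq(lock, cpu);

    rendezvous_with_siblings();     /* lock dropped: vcpu_sleep() or
                                     * vcpu_wake() may change
                                     * runnability from here on */

    vcpu_runstate_change(v, v->new_state, now);
                                    /* use the latched state instead of
                                     * assuming RUNSTATE_running */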

> 
>> --- a/xen/include/xen/sched-if.h
>> +++ b/xen/include/xen/sched-if.h
>> @@ -76,6 +76,29 @@ static inline bool unit_runnable(const struct sched_unit *unit)
>>       return vcpu_runnable(unit->vcpu_list);
> 
> ... this clearly still isn't doing the (I suppose) intended loop,
> and hence ...
> 
>>   }
>>   
>> +static inline bool unit_runnable_state(const struct sched_unit *unit)
>> +{
>> +    struct vcpu *v;
>> +    bool runnable, ret = false;
>> +
>> +    if ( is_idle_unit(unit) )
>> +        return true;
>> +
>> +    for_each_sched_unit_vcpu ( unit, v )
>> +    {
>> +        runnable = vcpu_runnable(v);
>> +
>> +        v->new_state = runnable ? RUNSTATE_running
>> +                                : (v->pause_flags & VPF_blocked)
>> +                                  ? RUNSTATE_blocked : RUNSTATE_offline;
>> +
>> +        if ( runnable )
>> +            ret = true;
>> +    }
>> +
>> +    return ret;
>> +}
> 
> ... it's not obvious what the eventual difference between the two is
> going to be.
> 
> Furthermore I think a function of the given name, returning bool, and
> taking a pointer to const deserves a comment on the (possibly
> slightly unexpected) state change it performs. It might be worthwhile
> to extend that comment to also outline the usage difference between
> this function and its sibling above.

I'll add that.
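
For illustration, the kind of comment being asked for might read along
these lines (hypothetical wording, not the eventual patch):

    /*
     * Returns whether a sched_unit is allowed to run, like
     * unit_runnable(), but additionally latches the runstate each of
     * the unit's vcpus is to enter in v->new_state. This is needed
     * because with core scheduling the scheduler lock is dropped for
     * the rendezvous, so the new runstate has to be decided while
     * runnability is probed. Use unit_runnable() where a pure,
     * side-effect free check is wanted.
     */
    static inline bool unit_runnable_state(const struct sched_unit *unit)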

> 
>> --- a/xen/include/xen/sched.h
>> +++ b/xen/include/xen/sched.h
>> @@ -174,6 +174,7 @@ struct vcpu
>>           XEN_GUEST_HANDLE(vcpu_runstate_info_compat_t) compat;
>>       } runstate_guest; /* guest address */
>>   #endif
>> +    unsigned int     new_state;
> 
> Similarly I think it would be nice for this field to gain a brief
> comment as to its purpose compared to runstate.state.

Okay.
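
Again for illustration only, such a comment might look like:

        /*
         * Runstate to switch to when the vcpu is scheduled next, as
         * computed by unit_runnable_state(); runstate.state keeps
         * holding the current state.
         */
        unsigned int     new_state;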


Juergen
Jan Beulich Sept. 24, 2019, 2:45 p.m. UTC | #3
On 24.09.2019 16:25, Jürgen Groß wrote:
> On 23.09.19 17:24, Jan Beulich wrote:
>> On 14.09.2019 10:52, Juergen Gross wrote:
>>> Today the vcpu runstate of a newly scheduled vcpu is always set to
>>> "running", even if at that time vcpu_runnable() is already returning
>>> false due to a race (e.g. with pausing the vcpu).
>>
>> I can see this part, ...
>>
>>> With core scheduling this can no longer work, as not all vcpus of a
>>> schedule unit need to be "running" when the unit is scheduled. So the
>>> vcpu's new runstate has to be selected at the same time as the
>>> runnability of the related schedule unit is probed.
>>
>> ... but I continue to have trouble here. If it has been okay to
>> nevertheless set a no-longer-runnable vCPU to "running", why would
>> the same not be true for schedule units? Part of the problem may be
>> that ...
> 
> The difference is the need to drop the scheduling lock for doing the
> rendezvous. vcpu_sleep() or vcpu_wake() could now interfere with
> scheduling in a way which was not possible before.

Ah, right. But this would be worth mentioning explicitly here to
help readers, wouldn't it? Doesn't need to be more than half a
sentence ...

Jan
Jürgen Groß Sept. 24, 2019, 2:48 p.m. UTC | #4
On 24.09.19 16:45, Jan Beulich wrote:
> On 24.09.2019 16:25, Jürgen Groß wrote:
>> On 23.09.19 17:24, Jan Beulich wrote:
>>> On 14.09.2019 10:52, Juergen Gross wrote:
>>>> Today the vcpu runstate of a newly scheduled vcpu is always set to
>>>> "running", even if at that time vcpu_runnable() is already returning
>>>> false due to a race (e.g. with pausing the vcpu).
>>>
>>> I can see this part, ...
>>>
>>>> With core scheduling this can no longer work, as not all vcpus of a
>>>> schedule unit need to be "running" when the unit is scheduled. So the
>>>> vcpu's new runstate has to be selected at the same time as the
>>>> runnability of the related schedule unit is probed.
>>>
>>> ... but I continue to have trouble here. If it has been okay to
>>> nevertheless set a no-longer-runnable vCPU to "running", why would
>>> the same not be true for schedule units? Part of the problem may be
>>> that ...
>>
>> The difference is the need to drop the scheduling lock for doing the
>> rendezvous. vcpu_sleep() or vcpu_wake() could now interfere with
>> scheduling in a way which was not possible before.
> 
> Ah, right. But this would be worth mentioning explicitly here to
> help readers, wouldn't it? Doesn't need to be more than half a
> sentence ...

Okay.


Juergen

Patch

diff --git a/xen/common/domain.c b/xen/common/domain.c
index 7a1be85be9..fa4023936b 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -157,6 +157,7 @@  struct vcpu *vcpu_create(struct domain *d, unsigned int vcpu_id)
     if ( is_idle_domain(d) )
     {
         v->runstate.state = RUNSTATE_running;
+        v->new_state = RUNSTATE_running;
     }
     else
     {
diff --git a/xen/common/sched_arinc653.c b/xen/common/sched_arinc653.c
index 96f3e844d2..1e88da40b1 100644
--- a/xen/common/sched_arinc653.c
+++ b/xen/common/sched_arinc653.c
@@ -557,7 +557,7 @@  a653sched_do_schedule(
     if ( !((new_task != NULL)
            && (AUNIT(new_task) != NULL)
            && AUNIT(new_task)->awake
-           && unit_runnable(new_task)) )
+           && unit_runnable_state(new_task)) )
         new_task = IDLETASK(cpu);
     BUG_ON(new_task == NULL);
 
diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
index 5802a67784..b25a7d2270 100644
--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -1894,7 +1894,7 @@  static void csched_schedule(
     if ( !test_bit(CSCHED_FLAG_UNIT_YIELD, &scurr->flags)
          && !tasklet_work_scheduled
          && prv->ratelimit
-         && unit_runnable(unit)
+         && unit_runnable_state(unit)
          && !is_idle_unit(unit)
          && runtime < prv->ratelimit )
     {
@@ -1939,33 +1939,36 @@  static void csched_schedule(
         dec_nr_runnable(sched_cpu);
     }
 
-    snext = __runq_elem(runq->next);
-
-    /* Tasklet work (which runs in idle UNIT context) overrides all else. */
-    if ( tasklet_work_scheduled )
-    {
-        TRACE_0D(TRC_CSCHED_SCHED_TASKLET);
-        snext = CSCHED_UNIT(sched_idle_unit(sched_cpu));
-        snext->pri = CSCHED_PRI_TS_BOOST;
-    }
-
     /*
      * Clear YIELD flag before scheduling out
      */
     clear_bit(CSCHED_FLAG_UNIT_YIELD, &scurr->flags);
 
-    /*
-     * SMP Load balance:
-     *
-     * If the next highest priority local runnable UNIT has already eaten
-     * through its credits, look on other PCPUs to see if we have more
-     * urgent work... If not, csched_load_balance() will return snext, but
-     * already removed from the runq.
-     */
-    if ( snext->pri > CSCHED_PRI_TS_OVER )
-        __runq_remove(snext);
-    else
-        snext = csched_load_balance(prv, sched_cpu, snext, &migrated);
+    do {
+        snext = __runq_elem(runq->next);
+
+        /* Tasklet work (which runs in idle UNIT context) overrides all else. */
+        if ( tasklet_work_scheduled )
+        {
+            TRACE_0D(TRC_CSCHED_SCHED_TASKLET);
+            snext = CSCHED_UNIT(sched_idle_unit(sched_cpu));
+            snext->pri = CSCHED_PRI_TS_BOOST;
+        }
+
+        /*
+         * SMP Load balance:
+         *
+         * If the next highest priority local runnable UNIT has already eaten
+         * through its credits, look on other PCPUs to see if we have more
+         * urgent work... If not, csched_load_balance() will return snext, but
+         * already removed from the runq.
+         */
+        if ( snext->pri > CSCHED_PRI_TS_OVER )
+            __runq_remove(snext);
+        else
+            snext = csched_load_balance(prv, sched_cpu, snext, &migrated);
+
+    } while ( !unit_runnable_state(snext->unit) );
 
     /*
      * Update idlers mask if necessary. When we're idling, other CPUs
diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index 414ac8f5b6..708643be7e 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -3289,7 +3289,7 @@  runq_candidate(struct csched2_runqueue_data *rqd,
      * In fact, it may be the case that scurr is about to spin, and there's
      * no point forcing it to do so until rate limiting expires.
      */
-    if ( !yield && prv->ratelimit_us && unit_runnable(scurr->unit) &&
+    if ( !yield && prv->ratelimit_us && unit_runnable_state(scurr->unit) &&
          (now - scurr->unit->state_entry_time) < MICROSECS(prv->ratelimit_us) )
     {
         if ( unlikely(tb_init_done) )
@@ -3343,7 +3343,7 @@  runq_candidate(struct csched2_runqueue_data *rqd,
      *
      * Of course, we also default to idle also if scurr is not runnable.
      */
-    if ( unit_runnable(scurr->unit) && !soft_aff_preempt )
+    if ( unit_runnable_state(scurr->unit) && !soft_aff_preempt )
         snext = scurr;
     else
         snext = csched2_unit(sched_idle_unit(cpu));
@@ -3403,7 +3403,8 @@  runq_candidate(struct csched2_runqueue_data *rqd,
          * some budget, then choose it.
          */
         if ( (yield || svc->credit > snext->credit) &&
-             (!has_cap(svc) || unit_grab_budget(svc)) )
+             (!has_cap(svc) || unit_grab_budget(svc)) &&
+             unit_runnable_state(svc->unit) )
             snext = svc;
 
         /* In any case, if we got this far, break. */
diff --git a/xen/common/sched_null.c b/xen/common/sched_null.c
index d7deef07b8..34ac018cca 100644
--- a/xen/common/sched_null.c
+++ b/xen/common/sched_null.c
@@ -864,7 +864,8 @@  static void null_schedule(const struct scheduler *ops, struct sched_unit *prev,
             cpumask_set_cpu(sched_cpu, &prv->cpus_free);
     }
 
-    if ( unlikely(prev->next_task == NULL || !unit_runnable(prev->next_task)) )
+    if ( unlikely(prev->next_task == NULL ||
+                  !unit_runnable_state(prev->next_task)) )
         prev->next_task = sched_idle_unit(sched_cpu);
 
     NULL_UNIT_CHECK(prev->next_task);
diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
index 91bc3d56fb..9f4e397334 100644
--- a/xen/common/sched_rt.c
+++ b/xen/common/sched_rt.c
@@ -1093,12 +1093,18 @@  rt_schedule(const struct scheduler *ops, struct sched_unit *currunit,
     else
     {
         snext = runq_pick(ops, cpumask_of(sched_cpu));
+
         if ( snext == NULL )
             snext = rt_unit(sched_idle_unit(sched_cpu));
+        else if ( !unit_runnable_state(snext->unit) )
+        {
+            q_remove(snext);
+            snext = rt_unit(sched_idle_unit(sched_cpu));
+        }
 
         /* if scurr has higher priority and budget, still pick scurr */
         if ( !is_idle_unit(currunit) &&
-             unit_runnable(currunit) &&
+             unit_runnable_state(currunit) &&
              scurr->cur_budget > 0 &&
              ( is_idle_unit(snext->unit) ||
                compare_unit_priority(scurr, snext) > 0 ) )
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 78b47acedf..03bcf796ae 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -269,7 +269,7 @@  static inline void sched_unit_runstate_change(struct sched_unit *unit,
     struct vcpu *v = unit->vcpu_list;
 
     if ( running )
-        vcpu_runstate_change(v, RUNSTATE_running, new_entry_time);
+        vcpu_runstate_change(v, v->new_state, new_entry_time);
     else
         vcpu_runstate_change(v,
             ((v->pause_flags & VPF_blocked) ? RUNSTATE_blocked :
diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h
index 4872570612..25ba6f25c9 100644
--- a/xen/include/xen/sched-if.h
+++ b/xen/include/xen/sched-if.h
@@ -76,6 +76,29 @@  static inline bool unit_runnable(const struct sched_unit *unit)
     return vcpu_runnable(unit->vcpu_list);
 }
 
+static inline bool unit_runnable_state(const struct sched_unit *unit)
+{
+    struct vcpu *v;
+    bool runnable, ret = false;
+
+    if ( is_idle_unit(unit) )
+        return true;
+
+    for_each_sched_unit_vcpu ( unit, v )
+    {
+        runnable = vcpu_runnable(v);
+
+        v->new_state = runnable ? RUNSTATE_running
+                                : (v->pause_flags & VPF_blocked)
+                                  ? RUNSTATE_blocked : RUNSTATE_offline;
+
+        if ( runnable )
+            ret = true;
+    }
+
+    return ret;
+}
+
 static inline void sched_set_res(struct sched_unit *unit,
                                  struct sched_resource *res)
 {
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index c0e4dc2dc3..5b805eac58 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -174,6 +174,7 @@  struct vcpu
         XEN_GUEST_HANDLE(vcpu_runstate_info_compat_t) compat;
     } runstate_guest; /* guest address */
 #endif
+    unsigned int     new_state;
 
     /* Has the FPU been initialised? */
     bool             fpu_initialised;