
[v2] xen/sched: rework and rename vcpu_force_reschedule()

Message ID 20190914064217.4877-1-jgross@suse.com (mailing list archive)
State Superseded
Series [v2] xen/sched: rework and rename vcpu_force_reschedule()

Commit Message

Jürgen Groß Sept. 14, 2019, 6:42 a.m. UTC
vcpu_force_reschedule() is only used for modifying the periodic timer
of a vcpu. Forcing a vcpu to give up the physical cpu for that purpose
is kind of brutal.

So instead of doing the reschedule dance just operate on the timer
directly. By protecting periodic timer modifications against concurrent
timer activation via a per-vcpu lock it is even no longer required to
bother the target vcpu at all for updating its timer.

Rename the function to vcpu_set_periodic_timer() as this now reflects
the functionality.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
- Carved out from my core scheduling series
- Reworked to avoid deadlock when 2 vcpus are trying to modify each
  other's periodic timers; this also addresses all comments by Jan
  Beulich.
V2:
- test periodic_period again in vcpu_periodic_timer_work() when lock
  obtained (Jan Beulich)
---
 xen/arch/x86/pv/shim.c  |  4 +---
 xen/common/domain.c     |  6 ++----
 xen/common/schedule.c   | 53 ++++++++++++++++++++++++++++---------------------
 xen/include/xen/sched.h |  3 ++-
 4 files changed, 35 insertions(+), 31 deletions(-)

Comments

Jan Beulich Sept. 16, 2019, 9:20 a.m. UTC | #1
On 14.09.2019 08:42, Juergen Gross wrote:
> vcpu_force_reschedule() is only used for modifying the periodic timer
> of a vcpu. Forcing a vcpu to give up the physical cpu for that purpose
> is kind of brutal.
> 
> So instead of doing the reschedule dance just operate on the timer
> directly. By protecting periodic timer modifications against concurrent
> timer activation via a per-vcpu lock it is even no longer required to
> bother the target vcpu at all for updating its timer.
> 
> Rename the function to vcpu_set_periodic_timer() as this now reflects
> the functionality.
> 
> Signed-off-by: Juergen Gross <jgross@suse.com>

I continue to be unhappy about there being no word at all about ...

> @@ -724,24 +725,6 @@ static void vcpu_migrate_finish(struct vcpu *v)
>      vcpu_wake(v);
>  }
>  
> -/*
> - * Force a VCPU through a deschedule/reschedule path.
> - * For example, using this when setting the periodic timer period means that
> - * most periodic-timer state need only be touched from within the scheduler
> - * which can thus be done without need for synchronisation.
> - */
> -void vcpu_force_reschedule(struct vcpu *v)

... the originally intended synchronization-free handling. Forcing
the vCPU through the scheduler may seem harsh (and quite some
overhead), yes, but I don't think the above was written (and
decided) without consideration. One effect of this can be seen by
you ...

> +void vcpu_set_periodic_timer(struct vcpu *v, s_time_t value)
> +{
> +    spin_lock(&v->periodic_timer_lock);
> +
> +    stop_timer(&v->periodic_timer);

... introducing a new stop_timer() here, i.e. which doesn't replace
any existing one. The implication is that other than before the
periodic timer may now not run (for a brief moment) despite it
being supposed to run - after all it has been active so far
whenever a vCPU was running.

Then again, looking at the involved code paths yet again, I wonder
whether this has been working right at all: There's an early exit
from schedule() when prev == next, which bypasses
vcpu_periodic_timer_work(). And I can't seem to be able to spot
anything on the vcpu_force_reschedule() path which would guarantee
this shortcut to not be taken.
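
For reference, a minimal sketch of the schedule() shape in question (an
illustration only, not the exact Xen code; locking and accounting are
omitted), showing that the prev == next shortcut returns before any
periodic-timer handling:

/*
 * Illustration only: rough shape of schedule() as discussed above,
 * with locking and accounting details omitted.
 */
static void schedule(void)
{
    struct vcpu *prev = current, *next;

    /* ... scheduler lock taken, do_schedule() picks 'next' ... */

    if ( unlikely(prev == next) )
        /* Shortcut: the periodic-timer handling below is never reached. */
        return continue_running(prev);

    stop_timer(&prev->periodic_timer);

    /* ... runstate accounting etc. ... */

    vcpu_periodic_timer_work(next);    /* (re-)arms next's periodic timer */

    context_switch(prev, next);
}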

Jan
Jürgen Groß Sept. 16, 2019, 12:49 p.m. UTC | #2
On 16.09.19 11:20, Jan Beulich wrote:
> On 14.09.2019 08:42, Juergen Gross wrote:
>> vcpu_force_reschedule() is only used for modifying the periodic timer
>> of a vcpu. Forcing a vcpu to give up the physical cpu for that purpose
>> is kind of brutal.
>>
>> So instead of doing the reschedule dance just operate on the timer
>> directly. By protecting periodic timer modifications against concurrent
>> timer activation via a per-vcpu lock it is even no longer required to
>> bother the target vcpu at all for updating its timer.
>>
>> Rename the function to vcpu_set_periodic_timer() as this now reflects
>> the functionality.
>>
>> Signed-off-by: Juergen Gross <jgross@suse.com>
> 
> I continue to be unhappy about there being no word at all about ...
> 
>> @@ -724,24 +725,6 @@ static void vcpu_migrate_finish(struct vcpu *v)
>>       vcpu_wake(v);
>>   }
>>   
>> -/*
>> - * Force a VCPU through a deschedule/reschedule path.
>> - * For example, using this when setting the periodic timer period means that
>> - * most periodic-timer state need only be touched from within the scheduler
>> - * which can thus be done without need for synchronisation.
>> - */
>> -void vcpu_force_reschedule(struct vcpu *v)
> 
> ... the originally intended synchronization-free handling. Forcing
> the vCPU through the scheduler may seem harsh (and quite some
> overhead), yes, but I don't think the above was written (and
> decided) without consideration. One effect of this can be seen by
> you ...
> 
>> +void vcpu_set_periodic_timer(struct vcpu *v, s_time_t value)
>> +{
>> +    spin_lock(&v->periodic_timer_lock);
>> +
>> +    stop_timer(&v->periodic_timer);
> 
> ... introducing a new stop_timer() here, i.e. which doesn't replace
> any existing one. The implication is that other than before the
> periodic timer may now not run (for a brief moment) despite it
> being supposed to run - after all it has been active so far
> whenever a vCPU was running.
> 
> Then again, looking at the involved code paths yet again, I wonder
> whether this has been working right at all: There's an early exit
> from schedule() when prev == next, which bypasses
> vcpu_periodic_timer_work(). And I can't seem to be able to spot
> anything on the vcpu_force_reschedule() path which would guarantee
> this shortcut to not be taken.

First, the current "synchronization-free" handling does not exist. The
synchronization is just hidden in the calls of vcpu_migrate_*() and it
is done via the scheduler lock.

Yes, I'm adding a stop_timer(), but the related stop_timer() call in
the old code was in schedule(). So statically you are right, but
dynamically there is no new stop_timer() call involved.

And last: the case prev == next would not occur today, as the migrate
flag being set in vcpu->pause_flags would cause the vcpu to be taken
away from the cpu. So it is working today, but setting the periodic
timer requires two scheduling events in case the target vcpu is
currently running.


Juergen
Jan Beulich Sept. 16, 2019, 2:39 p.m. UTC | #3
On 16.09.2019 14:49, Juergen Gross wrote:
> On 16.09.19 11:20, Jan Beulich wrote:
>> On 14.09.2019 08:42, Juergen Gross wrote:
>>> vcpu_force_reschedule() is only used for modifying the periodic timer
>>> of a vcpu. Forcing a vcpu to give up the physical cpu for that purpose
>>> is kind of brutal.
>>>
>>> So instead of doing the reschedule dance just operate on the timer
>>> directly. By protecting periodic timer modifications against concurrent
>>> timer activation via a per-vcpu lock it is even no longer required to
>>> bother the target vcpu at all for updating its timer.
>>>
>>> Rename the function to vcpu_set_periodic_timer() as this now reflects
>>> the functionality.
>>>
>>> Signed-off-by: Juergen Gross <jgross@suse.com>
>>
>> I continue to be unhappy about there being no word at all about ...
>>
>>> @@ -724,24 +725,6 @@ static void vcpu_migrate_finish(struct vcpu *v)
>>>       vcpu_wake(v);
>>>   }
>>>   
>>> -/*
>>> - * Force a VCPU through a deschedule/reschedule path.
>>> - * For example, using this when setting the periodic timer period means that
>>> - * most periodic-timer state need only be touched from within the scheduler
>>> - * which can thus be done without need for synchronisation.
>>> - */
>>> -void vcpu_force_reschedule(struct vcpu *v)
>>
>> ... the originally intended synchronization-free handling. Forcing
>> the vCPU through the scheduler may seem harsh (and quite some
>> overhead), yes, but I don't think the above was written (and
>> decided) without consideration. One effect of this can be seen by
>> you ...
>>
>>> +void vcpu_set_periodic_timer(struct vcpu *v, s_time_t value)
>>> +{
>>> +    spin_lock(&v->periodic_timer_lock);
>>> +
>>> +    stop_timer(&v->periodic_timer);
>>
>> ... introducing a new stop_timer() here, i.e. which doesn't replace
>> any existing one. The implication is that other than before the
>> periodic timer may now not run (for a brief moment) despite it
>> being supposed to run - after all it has been active so far
>> whenever a vCPU was running.
>>
>> Then again, looking at the involved code paths yet again, I wonder
>> whether this has been working right at all: There's an early exit
>> from schedule() when prev == next, which bypasses
>> vcpu_periodic_timer_work(). And I can't seem to be able to spot
>> anything on the vcpu_force_reschedule() path which would guarantee
>> this shortcut to not be taken.
> 
> First, the current "synchronization-free" handling does not exist. The
> synchronization is just hidden in the calls of vcpu_migrate_*() and it
> is done via the scheduler lock.

Sure, but the scheduler lock needs to be taken during scheduling
of the vCPU anyway. There was no "extra" synchronization involved.

> Yes, I'm adding a stop_timer(), but the related stop_timer() call in
> the old code was in schedule(). So statically you are right, but
> dynamically there is no new stop_timer() call involved.

I did specifically check that my comment is not just about the
"static" part (as you call it). As said - there was no stop_timer()
before behind a running vCPU's back. This is what worries me.

> And last: the case prev == next would not occur today, as the migrate
> flag being set in vcpu->pause_flags would cause the vcpu to be taken
> away from the cpu. So it is working today, but setting the periodic
> timer requires two scheduling events in case the target vcpu is
> currently running.

I'm not going to claim I fully understood the code when looking at
it in the morning, but I couldn't find the place(s) guaranteeing
that by the time the migration of the vCPU is over it wouldn't be
runnable again right away, and hence potentially re-chosen as the
vCPU to run on the pCPU it was running on before.

Jan
Jürgen Groß Sept. 16, 2019, 5:23 p.m. UTC | #4
On 16.09.19 16:39, Jan Beulich wrote:
> On 16.09.2019 14:49, Juergen Gross wrote:
>> On 16.09.19 11:20, Jan Beulich wrote:
>>> On 14.09.2019 08:42, Juergen Gross wrote:
>>>> vcpu_force_reschedule() is only used for modifying the periodic timer
>>>> of a vcpu. Forcing a vcpu to give up the physical cpu for that purpose
>>>> is kind of brutal.
>>>>
>>>> So instead of doing the reschedule dance just operate on the timer
>>>> directly. By protecting periodic timer modifications against concurrent
>>>> timer activation via a per-vcpu lock it is even no longer required to
>>>> bother the target vcpu at all for updating its timer.
>>>>
>>>> Rename the function to vcpu_set_periodic_timer() as this now reflects
>>>> the functionality.
>>>>
>>>> Signed-off-by: Juergen Gross <jgross@suse.com>
>>>
>>> I continue to be unhappy about there being no word at all about ...
>>>
>>>> @@ -724,24 +725,6 @@ static void vcpu_migrate_finish(struct vcpu *v)
>>>>        vcpu_wake(v);
>>>>    }
>>>>    
>>>> -/*
>>>> - * Force a VCPU through a deschedule/reschedule path.
>>>> - * For example, using this when setting the periodic timer period means that
>>>> - * most periodic-timer state need only be touched from within the scheduler
>>>> - * which can thus be done without need for synchronisation.
>>>> - */
>>>> -void vcpu_force_reschedule(struct vcpu *v)
>>>
>>> ... the originally intended synchronization-free handling. Forcing
>>> the vCPU through the scheduler may seem harsh (and quite some
>>> overhead), yes, but I don't think the above was written (and
>>> decided) without consideration. One effect of this can be seen by
>>> you ...
>>>
>>>> +void vcpu_set_periodic_timer(struct vcpu *v, s_time_t value)
>>>> +{
>>>> +    spin_lock(&v->periodic_timer_lock);
>>>> +
>>>> +    stop_timer(&v->periodic_timer);
>>>
>>> ... introducing a new stop_timer() here, i.e. which doesn't replace
>>> any existing one. The implication is that other than before the
>>> periodic timer may now not run (for a brief moment) despite it
>>> being supposed to run - after all it has been active so far
>>> whenever a vCPU was running.
>>>
>>> Then again, looking at the involved code paths yet again, I wonder
>>> whether this has been working right at all: There's an early exit
>>> from schedule() when prev == next, which bypasses
>>> vcpu_periodic_timer_work(). And I can't seem to be able to spot
>>> anything on the vcpu_force_reschedule() path which would guarantee
>>> this shortcut to not be taken.
>>
>> First, the current "synchronization-free" handling does not exist. The
>> synchronization is just hidden in the calls of vcpu_migrate_*() and it
>> is done via the scheduler lock.
> 
> Sure, but the scheduler lock needs to be taken during scheduling
> of the vCPU anyway. There was no "extra" synchronization involved.

Of course there was. The scheduling path was forced to happen, and this
resulted in the extra synchronization (twice in fact, once for
de-scheduling and once for scheduling in again).

> 
>> Yes, I'm adding a stop_timer(), but the related stop_timer() call in
>> the old code was in schedule(). So statically you are right, but
>> dynamically there is no new stop_timer() call involved.
> 
> I did specifically check that my comment is not just about the
> "static" part (as you call it). As said - there was no stop_timer()
> before behind a running vCPU's back. This is what worries me.

There is a stop_timer() for the periodic timer happening each time the
vcpu is de-scheduled (look into schedule()). And as setting the periodic
timer today results in a de-scheduling of the vcpu, the stop_timer() is
happening. I assume here that the de-scheduling would not have happened
without setting the timer, of course. In case the vcpu is not running
when the timer is being set, my patch will call stop_timer() for a
timer already being stopped.

> 
>> And last: the case prev == next would not occur today, as the migrate
>> flag being set in vcpu->pause_flags would cause the vcpu to be taken
>> away from the cpu. So it is working today, but setting the periodic
>> timer requires two scheduling events in case the target vcpu is
>> currently running.
> 
> I'm not going to claim I fully understood the code when looking at
> it in the morning, but I couldn't find the place(s) guaranteeing
> that by the time the migration of the vCPU is over it wouldn't be
> runnable again right away, and hence potentially re-chosen as the
> vCPU to run on the pCPU it was running on before.

vcpu_migrate_start() is setting _VPF_migrating in the vcpu's
pause_flags and then initiates a scheduling event via
vcpu_sleep_nosync(). vcpu_migrate_finish() will only reset the flag
if the vcpu is not running (so either it wasn't running before, or the
scheduling event already has happened). If it is still running it will
not reset the flag, but this will only be done via context_saved(),
which is called when the vcpu has been de-scheduled. The vcpu can
only be selected to run again when the _VPF_migrating flag is not set.
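
Condensed into a sketch (simplified and not the exact Xen code; helper
names and checks are abbreviated for illustration):

/* Simplified sketch of the migration hand-off described above. */
static void vcpu_migrate_start(struct vcpu *v)
{
    /* While this flag is set the vcpu cannot be picked to run. */
    set_bit(_VPF_migrating, &v->pause_flags);
    vcpu_sleep_nosync(v);              /* trigger de-scheduling of v */
}

static void vcpu_migrate_finish(struct vcpu *v)
{
    /*
     * If v is still running, leave the flag set; context_saved() will
     * re-run this once v has actually been de-scheduled.
     */
    if ( v->is_running || !test_bit(_VPF_migrating, &v->pause_flags) )
        return;

    /* ... pick a (possibly new) cpu and move v there ... */

    clear_bit(_VPF_migrating, &v->pause_flags);
    vcpu_wake(v);
}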


Juergen
Dario Faggioli Sept. 17, 2019, 2:33 p.m. UTC | #5
On Sat, 2019-09-14 at 08:42 +0200, Juergen Gross wrote:
> vcpu_force_reschedule() is only used for modifying the periodic timer
> of a vcpu. Forcing a vcpu to give up the physical cpu for that
> purpose
> is kind of brutal.
> 
> So instead of doing the reschedule dance just operate on the timer
> directly. By protecting periodic timer modifications against
> concurrent
> timer activation via a per-vcpu lock it is even no longer required to
> bother the target vcpu at all for updating its timer.
> 
> Rename the function to vcpu_set_periodic_timer() as this now reflects
> the functionality.
> 
Personally, I'm rather happy to see the code which was doing that very
weird "let's go through rescheduling" dance going away. I, FWIW, never
understood why periodic timer handling was implemented that way (and
looking back at relevant changelogs does not help).

The code, as it results after applying this patch, is a lot better, and
easier to understand.

Performance and scalability wise, I don't have benchmarks for this
specific patch (but the ones I did included it, as it back then was
part of the core-scheduling series), but I agree with Juergen. I.e., I
think the patch is either neutral or, if it does something, it improves
things.

Furthermore, the periodic timer is *not* used any longer (and hasn't
been for quite some time / several kernel versions). Basically, all we
do with the periodic timer is disable it during boot. At least that is
the case for Linux, but I think it holds for FreeBSD too. So, even if
the patch did have a negative impact (which, again, I don't think is
the case), we probably wouldn't see it.

On these grounds (and, of course, on the fact that I've looked at the
code and think it's correct), for the scheduling part:

> Signed-off-by: Juergen Gross <jgross@suse.com>
>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>

Then, if some words about the outcome of the discussion in this thread
were added, e.g., a mention of the fact that the old code wasn't really
lockless, and that the new code is a lot more straightforward, it'd be
even better.

But my Rev-by stands, with or without this.

Regards

Patch

diff --git a/xen/arch/x86/pv/shim.c b/xen/arch/x86/pv/shim.c
index 324ca27f93..5edbcd9ac5 100644
--- a/xen/arch/x86/pv/shim.c
+++ b/xen/arch/x86/pv/shim.c
@@ -410,7 +410,7 @@  int pv_shim_shutdown(uint8_t reason)
         unmap_vcpu_info(v);
 
         /* Reset the periodic timer to the default value. */
-        v->periodic_period = MILLISECS(10);
+        vcpu_set_periodic_timer(v, MILLISECS(10));
         /* Stop the singleshot timer. */
         stop_timer(&v->singleshot_timer);
 
@@ -419,8 +419,6 @@  int pv_shim_shutdown(uint8_t reason)
 
         if ( v != current )
             vcpu_unpause_by_systemcontroller(v);
-        else
-            vcpu_force_reschedule(v);
     }
 
     return 0;
diff --git a/xen/common/domain.c b/xen/common/domain.c
index 9a48b2504b..0cff749bbe 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -1494,15 +1494,13 @@  long do_vcpu_op(int cmd, unsigned int vcpuid, XEN_GUEST_HANDLE_PARAM(void) arg)
         if ( set.period_ns > STIME_DELTA_MAX )
             return -EINVAL;
 
-        v->periodic_period = set.period_ns;
-        vcpu_force_reschedule(v);
+        vcpu_set_periodic_timer(v, set.period_ns);
 
         break;
     }
 
     case VCPUOP_stop_periodic_timer:
-        v->periodic_period = 0;
-        vcpu_force_reschedule(v);
+        vcpu_set_periodic_timer(v, 0);
         break;
 
     case VCPUOP_set_singleshot_timer:
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index fdeec10c3b..13b5ffc7cf 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -312,6 +312,7 @@  int sched_init_vcpu(struct vcpu *v, unsigned int processor)
     v->processor = processor;
 
     /* Initialise the per-vcpu timers. */
+    spin_lock_init(&v->periodic_timer_lock);
     init_timer(&v->periodic_timer, vcpu_periodic_timer_fn,
                v, v->processor);
     init_timer(&v->singleshot_timer, vcpu_singleshot_timer_fn,
@@ -724,24 +725,6 @@  static void vcpu_migrate_finish(struct vcpu *v)
     vcpu_wake(v);
 }
 
-/*
- * Force a VCPU through a deschedule/reschedule path.
- * For example, using this when setting the periodic timer period means that
- * most periodic-timer state need only be touched from within the scheduler
- * which can thus be done without need for synchronisation.
- */
-void vcpu_force_reschedule(struct vcpu *v)
-{
-    spinlock_t *lock = vcpu_schedule_lock_irq(v);
-
-    if ( v->is_running )
-        vcpu_migrate_start(v);
-
-    vcpu_schedule_unlock_irq(lock, v);
-
-    vcpu_migrate_finish(v);
-}
-
 void restore_vcpu_affinity(struct domain *d)
 {
     unsigned int cpu = smp_processor_id();
@@ -1458,14 +1441,11 @@  long sched_adjust_global(struct xen_sysctl_scheduler_op *op)
     return rc;
 }
 
-static void vcpu_periodic_timer_work(struct vcpu *v)
+static void vcpu_periodic_timer_work_locked(struct vcpu *v)
 {
     s_time_t now;
     s_time_t periodic_next_event;
 
-    if ( v->periodic_period == 0 )
-        return;
-
     now = NOW();
     periodic_next_event = v->periodic_last_event + v->periodic_period;
 
@@ -1476,10 +1456,37 @@  static void vcpu_periodic_timer_work(struct vcpu *v)
         periodic_next_event = now + v->periodic_period;
     }
 
-    migrate_timer(&v->periodic_timer, smp_processor_id());
+    migrate_timer(&v->periodic_timer, v->processor);
     set_timer(&v->periodic_timer, periodic_next_event);
 }
 
+static void vcpu_periodic_timer_work(struct vcpu *v)
+{
+    if ( v->periodic_period == 0 )
+        return;
+
+    spin_lock(&v->periodic_timer_lock);
+    if ( v->periodic_period )
+        vcpu_periodic_timer_work_locked(v);
+    spin_unlock(&v->periodic_timer_lock);
+}
+
+/*
+ * Set the periodic timer of a vcpu.
+ */
+void vcpu_set_periodic_timer(struct vcpu *v, s_time_t value)
+{
+    spin_lock(&v->periodic_timer_lock);
+
+    stop_timer(&v->periodic_timer);
+
+    v->periodic_period = value;
+    if ( value )
+        vcpu_periodic_timer_work_locked(v);
+
+    spin_unlock(&v->periodic_timer_lock);
+}
+
 /*
  * The main function
  * - deschedule the current domain (scheduler independent).
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index e3601c1935..40097ff334 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -153,6 +153,7 @@  struct vcpu
 
     struct vcpu     *next_in_list;
 
+    spinlock_t       periodic_timer_lock;
     s_time_t         periodic_period;
     s_time_t         periodic_last_event;
     struct timer     periodic_timer;
@@ -864,7 +865,7 @@  struct scheduler *scheduler_get_default(void);
 struct scheduler *scheduler_alloc(unsigned int sched_id, int *perr);
 void scheduler_free(struct scheduler *sched);
 int schedule_cpu_switch(unsigned int cpu, struct cpupool *c);
-void vcpu_force_reschedule(struct vcpu *v);
+void vcpu_set_periodic_timer(struct vcpu *v, s_time_t value);
 int cpu_disable_scheduler(unsigned int cpu);
 /* We need it in dom0_setup_vcpu */
 void sched_set_affinity(struct vcpu *v, const cpumask_t *hard,