diff mbox series

timers/migration: Return early on deactivation

Message ID 87bk6pvzql.fsf@somnus (mailing list archive)
State Superseded
Headers show
Series timers/migration: Return early on deactivation | expand

Commit Message

Anna-Maria Behnsen April 4, 2024, 4:50 p.m. UTC
Commit 4b6f4c5a67c0 ("timer/migration: Remove buggy early return on
deactivation") removed the logic to return early in tmigr_update_events()
on deactivation. With this the problem with a not properly updated first
global event in a hierarchy containing only a single group was fixed.

But when having a look at this code path with a hierarchy with more than a
single level, now unnecessary work is done (example is partially copied
from the message of the commit mentioned above):

                            [GRP1:0]
                         migrator = GRP0:0
                         active   = GRP0:0
                         nextevt  = T0:0i, T0:1
                         /              \
              [GRP0:0]                  [GRP0:1]
           migrator = 0              migrator = NONE
           active   = 0              active   = NONE
           nextevt  = T0i, T1        nextevt  = T2
           /         \                /         \
          0 (T0i)     1 (T1)         2 (T2)      3
      active         idle            idle       idle

0) CPU 0 is active thus its event is ignored (the letter 'i') and so are
upper levels' events. CPU 1 is idle and has the timer T1 enqueued.
CPU 2 also has a timer. The expiry order is T0 (ignored) < T1 < T2

                            [GRP1:0]
                         migrator = GRP0:0
                         active   = GRP0:0
                         nextevt  = T0:0i, T0:1
                         /              \
              [GRP0:0]                  [GRP0:1]
           migrator = NONE           migrator = NONE
           active   = NONE           active   = NONE
           nextevt  = T1             nextevt  = T2
           /         \                /         \
          0 (T0i)     1 (T1)         2 (T2)      3
        idle         idle            idle         idle

1) CPU 0 goes idle without global event queued. Therefore KTIME_MAX is
pushed as its next expiry and its own event kept as "ignore". Without this
early return the following steps happen in tmigr_update_events() when
child = null and group = GRP0:0 :

  lock(GRP0:0->lock);
  timerqueue_del(GRP0:0, T0i);
  unlock(GRP0:0->lock);


                            [GRP1:0]
                         migrator = NONE
                         active   = NONE
                         nextevt  = T0:0, T0:1
                         /              \
              [GRP0:0]                  [GRP0:1]
           migrator = NONE           migrator = NONE
           active   = NONE           active   = NONE
           nextevt  = T1             nextevt  = T2
           /         \                /         \
          0 (T0i)     1 (T1)         2 (T2)      3
        idle         idle            idle         idle

2) The change now propagates up to the top. Then tmigr_update_events()
updates the group event of GRP0:0 and executes the following steps
(child = GRP0:0 and group = GRP0:0):

  lock(GRP0:0->lock);
  lock(GRP1:0->lock);
  evt = tmigr_next_groupevt(GRP0:0); -> this removes the ignored events
					in GRP0:0
  ... update GRP1:0 group event and timerqueue ...
  unlock(GRP1:0->lock);
  unlock(GRP0:0->lock);

So the dance in 1) with locking the GRP0:0->lock and removing the T0i from
the timerqueue is redundand as this is done nevertheless in 2) when
tmigr_next_groupevt(GRP0:0) is executed.

Revert commit 4b6f4c5a67c0 ("timer/migration: Remove buggy early return on
deactivation") and add a condition into return path to skip the return
only, when hierarchy contains a single group.

Fixes: 4b6f4c5a67c0 ("timer/migration: Remove buggy early return on deactivation")
Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
---
 kernel/time/timer_migration.c |   25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

Comments

Frederic Weisbecker April 4, 2024, 10:19 p.m. UTC | #1
Le Thu, Apr 04, 2024 at 06:50:26PM +0200, Anna-Maria Behnsen a écrit :
> Commit 4b6f4c5a67c0 ("timer/migration: Remove buggy early return on
> deactivation") removed the logic to return early in tmigr_update_events()
> on deactivation. With this the problem with a not properly updated first
> global event in a hierarchy containing only a single group was fixed.
> 
> But when having a look at this code path with a hierarchy with more than a
> single level, now unnecessary work is done (example is partially copied
> from the message of the commit mentioned above):
> 
>                             [GRP1:0]
>                          migrator = GRP0:0
>                          active   = GRP0:0
>                          nextevt  = T0:0i, T0:1
>                          /              \
>               [GRP0:0]                  [GRP0:1]
>            migrator = 0              migrator = NONE
>            active   = 0              active   = NONE
>            nextevt  = T0i, T1        nextevt  = T2
>            /         \                /         \
>           0 (T0i)     1 (T1)         2 (T2)      3
>       active         idle            idle       idle
> 
> 0) CPU 0 is active thus its event is ignored (the letter 'i') and so are
> upper levels' events. CPU 1 is idle and has the timer T1 enqueued.
> CPU 2 also has a timer. The expiry order is T0 (ignored) < T1 < T2
> 
>                             [GRP1:0]
>                          migrator = GRP0:0
>                          active   = GRP0:0
>                          nextevt  = T0:0i, T0:1
>                          /              \
>               [GRP0:0]                  [GRP0:1]
>            migrator = NONE           migrator = NONE
>            active   = NONE           active   = NONE
>            nextevt  = T1             nextevt  = T2
>            /         \                /         \
>           0 (T0i)     1 (T1)         2 (T2)      3
>         idle         idle            idle         idle
> 
> 1) CPU 0 goes idle without global event queued. Therefore KTIME_MAX is
> pushed as its next expiry and its own event kept as "ignore". Without this
> early return the following steps happen in tmigr_update_events() when
> child = null and group = GRP0:0 :
> 
>   lock(GRP0:0->lock);
>   timerqueue_del(GRP0:0, T0i);
>   unlock(GRP0:0->lock);
> 
> 
>                             [GRP1:0]
>                          migrator = NONE
>                          active   = NONE
>                          nextevt  = T0:0, T0:1
>                          /              \
>               [GRP0:0]                  [GRP0:1]
>            migrator = NONE           migrator = NONE
>            active   = NONE           active   = NONE
>            nextevt  = T1             nextevt  = T2
>            /         \                /         \
>           0 (T0i)     1 (T1)         2 (T2)      3
>         idle         idle            idle         idle
> 
> 2) The change now propagates up to the top. Then tmigr_update_events()
> updates the group event of GRP0:0 and executes the following steps
> (child = GRP0:0 and group = GRP0:0):
> 
>   lock(GRP0:0->lock);
>   lock(GRP1:0->lock);
>   evt = tmigr_next_groupevt(GRP0:0); -> this removes the ignored events
> 					in GRP0:0
>   ... update GRP1:0 group event and timerqueue ...
>   unlock(GRP1:0->lock);
>   unlock(GRP0:0->lock);
> 
> So the dance in 1) with locking the GRP0:0->lock and removing the T0i from
> the timerqueue is redundand as this is done nevertheless in 2) when
> tmigr_next_groupevt(GRP0:0) is executed.
> 
> Revert commit 4b6f4c5a67c0 ("timer/migration: Remove buggy early return on
> deactivation") and add a condition into return path to skip the return
> only, when hierarchy contains a single group.
> 
> Fixes: 4b6f4c5a67c0 ("timer/migration: Remove buggy early return on deactivation")
> Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>

Reviewed-by: Frederic Weisbecker <frederic@kernel.org>

Just some comment nits:

> ---
>  kernel/time/timer_migration.c |   25 +++++++++++++++++++++++++
>  1 file changed, 25 insertions(+)
> 
> --- a/kernel/time/timer_migration.c
> +++ b/kernel/time/timer_migration.c
> @@ -751,6 +751,31 @@ bool tmigr_update_events(struct tmigr_gr
>  
>  		first_childevt = evt = data->evt;
>  
> +		/*
> +		 * Walking the hierarchy is required in any case when a
> +		 * remote expiry was done before. This ensures to not lose
> +		 * already queued events in non active groups (see section
> +		 * "Required event and timerqueue update after a remote
> +		 * expiry" in the documentation at the top).
> +		 *
> +		 * The two call sites which are executed without a remote expiry
> +		 * before, are not prevented from propagating changes through
> +		 * the hierarchy by the return:
> +		 *  - When entering this path by tmigr_new_timer(), @evt->ignore
> +		 *    is never set.
> +		 *  - tmigr_inactive_up() takes care of the propagation by
> +		 *    itself and ignores the return value. But an immediate
> +		 *    return is required because nothing has to be done in this
> +		 *    level as the event could be ignored.

It's not exactly required, it's an optimization. How about:

"""
But an immediate return is possible if there is a parent, sparing group
locking at this level, because the upper walking call to the parent will
take care about removing this event from within the group and update
next_expiry accordingly.
"""

> +		 *
> +		 * But, if the hierarchy has only a single level so @group is
> +		 * the top level group, make sure first event information of the
> +		 * group is updated properly and also handled properly, so skip
> +		 * this fast return path.

"""
However if there is no parent, ie: the hierarchy has only a single level so
@group is the top level group, make sure the first event information of the
group is updated properly and also handled properly, so skip
this fast return path.
"""

Thanks.

> +		 */
> +		if (evt->ignore && !remote && group->parent)
> +			return true;
> +
>  		raw_spin_lock(&group->lock);
>  
>  		childstate.state = 0;
diff mbox series

Patch

--- a/kernel/time/timer_migration.c
+++ b/kernel/time/timer_migration.c
@@ -751,6 +751,31 @@  bool tmigr_update_events(struct tmigr_gr
 
 		first_childevt = evt = data->evt;
 
+		/*
+		 * Walking the hierarchy is required in any case when a
+		 * remote expiry was done before. This ensures to not lose
+		 * already queued events in non active groups (see section
+		 * "Required event and timerqueue update after a remote
+		 * expiry" in the documentation at the top).
+		 *
+		 * The two call sites which are executed without a remote expiry
+		 * before, are not prevented from propagating changes through
+		 * the hierarchy by the return:
+		 *  - When entering this path by tmigr_new_timer(), @evt->ignore
+		 *    is never set.
+		 *  - tmigr_inactive_up() takes care of the propagation by
+		 *    itself and ignores the return value. But an immediate
+		 *    return is required because nothing has to be done in this
+		 *    level as the event could be ignored.
+		 *
+		 * But, if the hierarchy has only a single level so @group is
+		 * the top level group, make sure first event information of the
+		 * group is updated properly and also handled properly, so skip
+		 * this fast return path.
+		 */
+		if (evt->ignore && !remote && group->parent)
+			return true;
+
 		raw_spin_lock(&group->lock);
 
 		childstate.state = 0;