Message ID | 1436293469-25707-12-git-send-email-morten.rasmussen@arm.com (mailing list archive) |
---|---|
State | RFC |
On Tue, Jul 07, 2015 at 07:23:54PM +0100, Morten Rasmussen wrote:
> Tasks being dequeued for the last time (state == TASK_DEAD) are dequeued
> with the DEQUEUE_SLEEP flag which causes their load and utilization
> contributions to be added to the runqueue blocked load and utilization.
> Hence they will contain load or utilization that has gone away. The issue
> only exists for the root cfs_rq as cgroup_exit() doesn't set
> DEQUEUE_SLEEP for task group exits.
>
> If runnable+blocked load is to be used as a better estimate for cpu
> load, the dead task contributions need to be removed to prevent
> load_balance() (idle_balance() in particular) from over-estimating the
> cpu load.
>
> cc: Ingo Molnar <mingo@redhat.com>
> cc: Peter Zijlstra <peterz@infradead.org>
>
> Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
> ---
>  kernel/sched/fair.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 775b0c7..fa12ce5 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -3217,6 +3217,8 @@ dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
>  	 * Update run-time statistics of the 'current'.
>  	 */
>  	update_curr(cfs_rq);
> +	if (entity_is_task(se) && task_of(se)->state == TASK_DEAD)
> +		flags &= !DEQUEUE_SLEEP;

So flags will be set to zero? Could be replaced by "flags &= ~DEQUEUE_SLEEP"?

>  	dequeue_entity_load_avg(cfs_rq, se, flags & DEQUEUE_SLEEP);
>
>  	update_stats_dequeue(cfs_rq, se);
> --
> 1.9.1
On Wed, Jul 22, 2015 at 02:51:01PM +0800, Leo Yan wrote:
> On Tue, Jul 07, 2015 at 07:23:54PM +0100, Morten Rasmussen wrote:
> > Tasks being dequeued for the last time (state == TASK_DEAD) are dequeued
> > with the DEQUEUE_SLEEP flag which causes their load and utilization
> > contributions to be added to the runqueue blocked load and utilization.
> > Hence they will contain load or utilization that has gone away. The issue
> > only exists for the root cfs_rq as cgroup_exit() doesn't set
> > DEQUEUE_SLEEP for task group exits.
> >
> > If runnable+blocked load is to be used as a better estimate for cpu
> > load, the dead task contributions need to be removed to prevent
> > load_balance() (idle_balance() in particular) from over-estimating the
> > cpu load.
> >
> > cc: Ingo Molnar <mingo@redhat.com>
> > cc: Peter Zijlstra <peterz@infradead.org>
> >
> > Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
> > ---
> >  kernel/sched/fair.c | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 775b0c7..fa12ce5 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -3217,6 +3217,8 @@ dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
> >  	 * Update run-time statistics of the 'current'.
> >  	 */
> >  	update_curr(cfs_rq);
> > +	if (entity_is_task(se) && task_of(se)->state == TASK_DEAD)
> > +		flags &= !DEQUEUE_SLEEP;
>
> So flags will be set to zero? Could be replaced by "flags &= ~DEQUEUE_SLEEP"?

Not could, should :) I meant to clear the flag, but used the wrong operator.
We only have DEQUEUE_SLEEP and 0 at the moment so it doesn't matter, but it
might later.

Thanks,
Morten
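For readers following the operator discussion above: !DEQUEUE_SLEEP is a logical negation and evaluates to 0, so "flags &= !DEQUEUE_SLEEP" wipes every flag, whereas "flags &= ~DEQUEUE_SLEEP" clears only the DEQUEUE_SLEEP bit. A minimal standalone sketch illustrates the difference; the flag values below are hypothetical stand-ins chosen for the example, not taken from the kernel headers.

#include <stdio.h>

/* Hypothetical flag values for illustration only; the real DEQUEUE_SLEEP
 * is defined in kernel/sched/sched.h. */
#define DEQUEUE_SLEEP	0x01
#define DEQUEUE_OTHER	0x02	/* stand-in for any future dequeue flag */

int main(void)
{
	int flags = DEQUEUE_SLEEP | DEQUEUE_OTHER;

	int logical = flags & !DEQUEUE_SLEEP;	/* !0x01 == 0: clears every flag */
	int bitwise = flags & ~DEQUEUE_SLEEP;	/* ~0x01: clears only the SLEEP bit */

	printf("flags & !DEQUEUE_SLEEP = 0x%x\n", logical);	/* prints 0x0 */
	printf("flags & ~DEQUEUE_SLEEP = 0x%x\n", bitwise);	/* prints 0x2 */
	return 0;
}

As Morten notes, with only DEQUEUE_SLEEP defined the two expressions happen to give the same result, but any additional dequeue flag would be silently dropped by the "!" version.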
On Tue, Jul 07, 2015 at 07:23:54PM +0100, Morten Rasmussen wrote:
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 775b0c7..fa12ce5 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -3217,6 +3217,8 @@ dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
>  	 * Update run-time statistics of the 'current'.
>  	 */
>  	update_curr(cfs_rq);
> +	if (entity_is_task(se) && task_of(se)->state == TASK_DEAD)
> +		flags &= !DEQUEUE_SLEEP;
>  	dequeue_entity_load_avg(cfs_rq, se, flags & DEQUEUE_SLEEP);
>
>  	update_stats_dequeue(cfs_rq, se);

I know this is entirely redundant at this point (we took Yuyang's
patches), but this is the wrong way to go about doing this.

You add extra code to the hot dequeue path for something that 'never'
happens. We have the sched_class::task_dead call for that.
On Tue, Aug 11, 2015 at 01:39:27PM +0200, Peter Zijlstra wrote:
> On Tue, Jul 07, 2015 at 07:23:54PM +0100, Morten Rasmussen wrote:
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 775b0c7..fa12ce5 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -3217,6 +3217,8 @@ dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
> >  	 * Update run-time statistics of the 'current'.
> >  	 */
> >  	update_curr(cfs_rq);
> > +	if (entity_is_task(se) && task_of(se)->state == TASK_DEAD)
> > +		flags &= !DEQUEUE_SLEEP;
> >  	dequeue_entity_load_avg(cfs_rq, se, flags & DEQUEUE_SLEEP);
> >
> >  	update_stats_dequeue(cfs_rq, se);
>
> I know this is entirely redundant at this point (we took Yuyang's
> patches), but this is the wrong way to go about doing this.

Yes, I'm still working my way through Yuyang's changes.

> You add extra code to the hot dequeue path for something that 'never'
> happens. We have the sched_class::task_dead call for that.

I don't mind using sched_class::task_dead() instead. The reason why I
didn't go that way is that we have to retake the rq->lock or mess with
cfs_rq::removed_load instead of just not adding the utilization in
the first place when we have the rq->lock.

Anyway, it is probably redundant by now. I will check Yuyang's code to
see if he already fixed this problem.
On Tue, Aug 11, 2015 at 03:58:48PM +0100, Morten Rasmussen wrote:
> On Tue, Aug 11, 2015 at 01:39:27PM +0200, Peter Zijlstra wrote:
> > You add extra code to the hot dequeue path for something that 'never'
> > happens. We have the sched_class::task_dead call for that.
>
> I don't mind using sched_class::task_dead() instead. The reason why I
> didn't go that way is that we have to retake the rq->lock or mess with
> cfs_rq::removed_load instead of just not adding the utilization in
> the first place when we have the rq->lock.
>
> Anyway, it is probably redundant by now. I will check Yuyang's code to
> see if he already fixed this problem.

He did, he used the removed_load stuff, same as migration does.
On Tue, Aug 11, 2015 at 07:23:44PM +0200, Peter Zijlstra wrote:
> On Tue, Aug 11, 2015 at 03:58:48PM +0100, Morten Rasmussen wrote:
> > On Tue, Aug 11, 2015 at 01:39:27PM +0200, Peter Zijlstra wrote:
> > >
> > > You add extra code to the hot dequeue path for something that 'never'
> > > happens. We have the sched_class::task_dead call for that.
> >
> > I don't mind using sched_class::task_dead() instead. The reason why I
> > didn't go that way is that we have to retake the rq->lock or mess with
> > cfs_rq::removed_load instead of just not adding the utilization in
> > the first place when we have the rq->lock.
> >
> > Anyway, it is probably redundant by now. I will check Yuyang's code to
> > see if he already fixed this problem.
>
> He did, he used the removed_load stuff, same as migration does.

Nice. One less patch to worry about :)
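For context on the mechanism Peter refers to: rather than retaking rq->lock in the exit path, the removed-load scheme records the dying (or migrating) task's contribution in an atomic accumulator on the cfs_rq, and the next load update, which already holds the lock, subtracts it from the aggregate. Below is a stripped-down userspace sketch of that pattern; the structure and function names are illustrative only and are not the kernel's actual implementation.

#include <stdatomic.h>
#include <stdio.h>

/* Illustrative stand-ins for the cfs_rq load-tracking fields. */
struct cfs_rq_sim {
	long load_avg;			/* only touched while holding the "rq lock" */
	atomic_long removed_load;	/* may be added to without the lock */
};

/* Exit/migration path, no rq lock needed: just record how much load
 * should disappear from this runqueue. */
static void remove_entity_load(struct cfs_rq_sim *cfs_rq, long se_load)
{
	atomic_fetch_add(&cfs_rq->removed_load, se_load);
}

/* Regular load-update path, rq lock held: fold the pending removals
 * into the aggregate exactly once. */
static void update_load_avg(struct cfs_rq_sim *cfs_rq)
{
	long removed = atomic_exchange(&cfs_rq->removed_load, 0);

	cfs_rq->load_avg -= removed;
	if (cfs_rq->load_avg < 0)
		cfs_rq->load_avg = 0;
}

int main(void)
{
	struct cfs_rq_sim rq = { .load_avg = 1024 };

	atomic_init(&rq.removed_load, 0);

	remove_entity_load(&rq, 300);	/* task dies: queue its removal */
	update_load_avg(&rq);		/* next update folds it in */

	printf("load_avg = %ld\n", rq.load_avg);	/* prints 724 */
	return 0;
}

This is the "mess with cfs_rq::removed_load" option Morten mentions above: it costs an extra atomic per exiting task, but it keeps the hot dequeue path untouched.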
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 775b0c7..fa12ce5 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3217,6 +3217,8 @@ dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
 	 * Update run-time statistics of the 'current'.
 	 */
 	update_curr(cfs_rq);
+	if (entity_is_task(se) && task_of(se)->state == TASK_DEAD)
+		flags &= !DEQUEUE_SLEEP;
 	dequeue_entity_load_avg(cfs_rq, se, flags & DEQUEUE_SLEEP);

 	update_stats_dequeue(cfs_rq, se);
Tasks being dequeued for the last time (state == TASK_DEAD) are dequeued
with the DEQUEUE_SLEEP flag which causes their load and utilization
contributions to be added to the runqueue blocked load and utilization.
Hence they will contain load or utilization that has gone away. The issue
only exists for the root cfs_rq as cgroup_exit() doesn't set
DEQUEUE_SLEEP for task group exits.

If runnable+blocked load is to be used as a better estimate for cpu
load, the dead task contributions need to be removed to prevent
load_balance() (idle_balance() in particular) from over-estimating the
cpu load.

cc: Ingo Molnar <mingo@redhat.com>
cc: Peter Zijlstra <peterz@infradead.org>

Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
---
 kernel/sched/fair.c | 2 ++
 1 file changed, 2 insertions(+)
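The patch above is reproduced exactly as posted. As the review thread establishes, the intent was to clear only the DEQUEUE_SLEEP bit, so the operator in the added hunk should be "~" rather than "!". A corrected version of the two added lines, a sketch of the fix agreed in the thread rather than a separately posted revision, would read:

+	if (entity_is_task(se) && task_of(se)->state == TASK_DEAD)
+		flags &= ~DEQUEUE_SLEEP;

As the thread also notes, the change was ultimately dropped in favour of Yuyang Du's load-tracking rewrite, which handles dead tasks through the removed-load path rather than the dequeue path.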