Message ID | 1406569906-9763-2-git-send-email-vincent.guittot@linaro.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Hi Vincent, On 7/29/14, 1:51 AM, Vincent Guittot wrote: > The imbalance flag can stay set whereas there is no imbalance. > > Let assume that we have 3 tasks that run on a dual cores /dual cluster system. > We will have some idle load balance which are triggered during tick. > Unfortunately, the tick is also used to queue background work so we can reach > the situation where short work has been queued on a CPU which already runs a > task. The load balance will detect this imbalance (2 tasks on 1 CPU and an idle > CPU) and will try to pull the waiting task on the idle CPU. The waiting task is > a worker thread that is pinned on a CPU so an imbalance due to pinned task is > detected and the imbalance flag is set. The waiting task is the third task or one the '2 tasks on 1 CPU' ? Regards, Wanpeng Li > Then, we will not be able to clear the flag because we have at most 1 task on > each CPU but the imbalance flag will trig to useless active load balance > between the idle CPU and the busy CPU. > > We need to reset of the imbalance flag as soon as we have reached a balanced > state. If all tasks are pinned, we don't consider that as a balanced state and > let the imbalance flag set. > > Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com> > Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> > --- > kernel/sched/fair.c | 23 +++++++++++++++++++---- > 1 file changed, 19 insertions(+), 4 deletions(-) > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > index 923fe32..7eb9126 100644 > --- a/kernel/sched/fair.c > +++ b/kernel/sched/fair.c > @@ -6672,10 +6672,8 @@ static int load_balance(int this_cpu, struct rq *this_rq, > if (sd_parent) { > int *group_imbalance = &sd_parent->groups->sgc->imbalance; > > - if ((env.flags & LBF_SOME_PINNED) && env.imbalance > 0) { > + if ((env.flags & LBF_SOME_PINNED) && env.imbalance > 0) > *group_imbalance = 1; > - } else if (*group_imbalance) > - *group_imbalance = 0; > } > > /* All tasks on this runqueue were pinned by CPU affinity */ > @@ -6686,7 +6684,7 @@ static int load_balance(int this_cpu, struct rq *this_rq, > env.loop_break = sched_nr_migrate_break; > goto redo; > } > - goto out_balanced; > + goto out_all_pinned; > } > } > > @@ -6760,6 +6758,23 @@ static int load_balance(int this_cpu, struct rq *this_rq, > goto out; > > out_balanced: > + /* > + * We reach balance although we may have faced some affinity > + * constraints. Clear the imbalance flag if it was set. > + */ > + if (sd_parent) { > + int *group_imbalance = &sd_parent->groups->sgc->imbalance; > + > + if (*group_imbalance) > + *group_imbalance = 0; > + } > + > +out_all_pinned: > + /* > + * We reach balance because all tasks are pinned at this level so > + * we can't migrate them. Let the imbalance flag set so parent level > + * can try to migrate them. > + */ > schedstat_inc(sd, lb_balanced[idle]); > > sd->nr_balance_failed = 0;
On 23 November 2014 at 11:25, Wanpeng Li <kernellwp@gmail.com> wrote: > Hi Vincent, > On 7/29/14, 1:51 AM, Vincent Guittot wrote: >> >> The imbalance flag can stay set whereas there is no imbalance. >> >> Let assume that we have 3 tasks that run on a dual cores /dual cluster >> system. >> We will have some idle load balance which are triggered during tick. >> Unfortunately, the tick is also used to queue background work so we can >> reach >> the situation where short work has been queued on a CPU which already runs >> a >> task. The load balance will detect this imbalance (2 tasks on 1 CPU and an >> idle >> CPU) and will try to pull the waiting task on the idle CPU. The waiting >> task is >> a worker thread that is pinned on a CPU so an imbalance due to pinned task >> is >> detected and the imbalance flag is set. > > > The waiting task is the third task or one the '2 tasks on 1 CPU' ? The waiting task is one of the 2 tasks on 1 CPU (the worker) Regards, Vincent > > Regards, > Wanpeng Li > > >> Then, we will not be able to clear the flag because we have at most 1 task >> on >> each CPU but the imbalance flag will trig to useless active load balance >> between the idle CPU and the busy CPU. >> >> We need to reset of the imbalance flag as soon as we have reached a >> balanced >> state. If all tasks are pinned, we don't consider that as a balanced state >> and >> let the imbalance flag set. >> >> Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com> >> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> >> --- >> kernel/sched/fair.c | 23 +++++++++++++++++++---- >> 1 file changed, 19 insertions(+), 4 deletions(-) >> >> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c >> index 923fe32..7eb9126 100644 >> --- a/kernel/sched/fair.c >> +++ b/kernel/sched/fair.c >> @@ -6672,10 +6672,8 @@ static int load_balance(int this_cpu, struct rq >> *this_rq, >> if (sd_parent) { >> int *group_imbalance = >> &sd_parent->groups->sgc->imbalance; >> - if ((env.flags & LBF_SOME_PINNED) && env.imbalance >> > 0) { >> + if ((env.flags & LBF_SOME_PINNED) && env.imbalance >> > 0) >> *group_imbalance = 1; >> - } else if (*group_imbalance) >> - *group_imbalance = 0; >> } >> /* All tasks on this runqueue were pinned by CPU affinity >> */ >> @@ -6686,7 +6684,7 @@ static int load_balance(int this_cpu, struct rq >> *this_rq, >> env.loop_break = sched_nr_migrate_break; >> goto redo; >> } >> - goto out_balanced; >> + goto out_all_pinned; >> } >> } >> @@ -6760,6 +6758,23 @@ static int load_balance(int this_cpu, struct rq >> *this_rq, >> goto out; >> out_balanced: >> + /* >> + * We reach balance although we may have faced some affinity >> + * constraints. Clear the imbalance flag if it was set. >> + */ >> + if (sd_parent) { >> + int *group_imbalance = &sd_parent->groups->sgc->imbalance; >> + >> + if (*group_imbalance) >> + *group_imbalance = 0; >> + } >> + >> +out_all_pinned: >> + /* >> + * We reach balance because all tasks are pinned at this level so >> + * we can't migrate them. Let the imbalance flag set so parent >> level >> + * can try to migrate them. >> + */ >> schedstat_inc(sd, lb_balanced[idle]); >> sd->nr_balance_failed = 0; > >
Hi Vincent, On 7/29/14, 1:51 AM, Vincent Guittot wrote: > The imbalance flag can stay set whereas there is no imbalance. > > Let assume that we have 3 tasks that run on a dual cores /dual cluster system. > We will have some idle load balance which are triggered during tick. > Unfortunately, the tick is also used to queue background work so we can reach > the situation where short work has been queued on a CPU which already runs a > task. The load balance will detect this imbalance (2 tasks on 1 CPU and an idle > CPU) and will try to pull the waiting task on the idle CPU. The waiting task is > a worker thread that is pinned on a CPU so an imbalance due to pinned task is > detected and the imbalance flag is set. > Then, we will not be able to clear the flag because we have at most 1 task on > each CPU but the imbalance flag will trig to useless active load balance > between the idle CPU and the busy CPU. > > We need to reset of the imbalance flag as soon as we have reached a balanced > state. If all tasks are pinned, we don't consider that as a balanced state and > let the imbalance flag set. > > Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com> > Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> > --- > kernel/sched/fair.c | 23 +++++++++++++++++++---- > 1 file changed, 19 insertions(+), 4 deletions(-) > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > index 923fe32..7eb9126 100644 > --- a/kernel/sched/fair.c > +++ b/kernel/sched/fair.c > @@ -6672,10 +6672,8 @@ static int load_balance(int this_cpu, struct rq *this_rq, > if (sd_parent) { > int *group_imbalance = &sd_parent->groups->sgc->imbalance; > > - if ((env.flags & LBF_SOME_PINNED) && env.imbalance > 0) { > + if ((env.flags & LBF_SOME_PINNED) && env.imbalance > 0) > *group_imbalance = 1; > - } else if (*group_imbalance) > - *group_imbalance = 0; As you mentioned above " We need to reset of the imbalance flag as soon as we have reached a balanced state. " I think the codes before your patch have already do this, where I miss? Great thanks for your patient. ;-) Regards, Wanpeng Li > } > > /* All tasks on this runqueue were pinned by CPU affinity */ > @@ -6686,7 +6684,7 @@ static int load_balance(int this_cpu, struct rq *this_rq, > env.loop_break = sched_nr_migrate_break; > goto redo; > } > - goto out_balanced; > + goto out_all_pinned; > } > } > > @@ -6760,6 +6758,23 @@ static int load_balance(int this_cpu, struct rq *this_rq, > goto out; > > out_balanced: > + /* > + * We reach balance although we may have faced some affinity > + * constraints. Clear the imbalance flag if it was set. > + */ > + if (sd_parent) { > + int *group_imbalance = &sd_parent->groups->sgc->imbalance; > + > + if (*group_imbalance) > + *group_imbalance = 0; > + } > + > +out_all_pinned: > + /* > + * We reach balance because all tasks are pinned at this level so > + * we can't migrate them. Let the imbalance flag set so parent level > + * can try to migrate them. > + */ > schedstat_inc(sd, lb_balanced[idle]); > > sd->nr_balance_failed = 0;
On 25 November 2014 at 00:47, Wanpeng Li <kernellwp@gmail.com> wrote: > Hi Vincent, > On 7/29/14, 1:51 AM, Vincent Guittot wrote: >> >> The imbalance flag can stay set whereas there is no imbalance. >> >> Let assume that we have 3 tasks that run on a dual cores /dual cluster >> system. >> We will have some idle load balance which are triggered during tick. >> Unfortunately, the tick is also used to queue background work so we can >> reach >> the situation where short work has been queued on a CPU which already runs >> a >> task. The load balance will detect this imbalance (2 tasks on 1 CPU and an >> idle >> CPU) and will try to pull the waiting task on the idle CPU. The waiting >> task is >> a worker thread that is pinned on a CPU so an imbalance due to pinned task >> is >> detected and the imbalance flag is set. >> Then, we will not be able to clear the flag because we have at most 1 task >> on >> each CPU but the imbalance flag will trig to useless active load balance >> between the idle CPU and the busy CPU. >> >> We need to reset of the imbalance flag as soon as we have reached a >> balanced >> state. If all tasks are pinned, we don't consider that as a balanced state >> and >> let the imbalance flag set. >> >> Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com> >> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> >> --- >> kernel/sched/fair.c | 23 +++++++++++++++++++---- >> 1 file changed, 19 insertions(+), 4 deletions(-) >> >> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c >> index 923fe32..7eb9126 100644 >> --- a/kernel/sched/fair.c >> +++ b/kernel/sched/fair.c >> @@ -6672,10 +6672,8 @@ static int load_balance(int this_cpu, struct rq >> *this_rq, >> if (sd_parent) { >> int *group_imbalance = >> &sd_parent->groups->sgc->imbalance; >> - if ((env.flags & LBF_SOME_PINNED) && env.imbalance >> > 0) { >> + if ((env.flags & LBF_SOME_PINNED) && env.imbalance >> > 0) >> *group_imbalance = 1; >> - } else if (*group_imbalance) >> - *group_imbalance = 0; > > > As you mentioned above " We need to reset of the imbalance flag as soon as > we have reached a balanced state. " I think the codes before your patch have > already do this, where I miss? Great thanks for your patient. ;-) The previous code was called only when busiest->nr_running > 1. The background activity will be on the rq only 1 tick per few seconds and we will set qroup_imbalance when the background activity is on the rq. Then, during the next load balances, the qroup_imbalance is still set but we can't clear qroup_imbalance because we have only 1 task per rq Regards, Vincent > > Regards, > Wanpeng Li > > >> } >> /* All tasks on this runqueue were pinned by CPU affinity >> */ >> @@ -6686,7 +6684,7 @@ static int load_balance(int this_cpu, struct rq >> *this_rq, >> env.loop_break = sched_nr_migrate_break; >> goto redo; >> } >> - goto out_balanced; >> + goto out_all_pinned; >> } >> } >> @@ -6760,6 +6758,23 @@ static int load_balance(int this_cpu, struct rq >> *this_rq, >> goto out; >> out_balanced: >> + /* >> + * We reach balance although we may have faced some affinity >> + * constraints. Clear the imbalance flag if it was set. >> + */ >> + if (sd_parent) { >> + int *group_imbalance = &sd_parent->groups->sgc->imbalance; >> + >> + if (*group_imbalance) >> + *group_imbalance = 0; >> + } >> + >> +out_all_pinned: >> + /* >> + * We reach balance because all tasks are pinned at this level so >> + * we can't migrate them. Let the imbalance flag set so parent >> level >> + * can try to migrate them. >> + */ >> schedstat_inc(sd, lb_balanced[idle]); >> sd->nr_balance_failed = 0; > >
Hi Vincent, On 11/25/14, 5:04 PM, Vincent Guittot wrote: > On 25 November 2014 at 00:47, Wanpeng Li <kernellwp@gmail.com> wrote: >> Hi Vincent, >> On 7/29/14, 1:51 AM, Vincent Guittot wrote: >>> The imbalance flag can stay set whereas there is no imbalance. >>> >>> Let assume that we have 3 tasks that run on a dual cores /dual cluster >>> system. >>> We will have some idle load balance which are triggered during tick. >>> Unfortunately, the tick is also used to queue background work so we can >>> reach >>> the situation where short work has been queued on a CPU which already runs >>> a >>> task. The load balance will detect this imbalance (2 tasks on 1 CPU and an >>> idle >>> CPU) and will try to pull the waiting task on the idle CPU. The waiting >>> task is >>> a worker thread that is pinned on a CPU so an imbalance due to pinned task >>> is >>> detected and the imbalance flag is set. >>> Then, we will not be able to clear the flag because we have at most 1 task >>> on >>> each CPU but the imbalance flag will trig to useless active load balance >>> between the idle CPU and the busy CPU. >>> >>> We need to reset of the imbalance flag as soon as we have reached a >>> balanced >>> state. If all tasks are pinned, we don't consider that as a balanced state >>> and >>> let the imbalance flag set. >>> >>> Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com> >>> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> >>> --- >>> kernel/sched/fair.c | 23 +++++++++++++++++++---- >>> 1 file changed, 19 insertions(+), 4 deletions(-) >>> >>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c >>> index 923fe32..7eb9126 100644 >>> --- a/kernel/sched/fair.c >>> +++ b/kernel/sched/fair.c >>> @@ -6672,10 +6672,8 @@ static int load_balance(int this_cpu, struct rq >>> *this_rq, >>> if (sd_parent) { >>> int *group_imbalance = >>> &sd_parent->groups->sgc->imbalance; >>> - if ((env.flags & LBF_SOME_PINNED) && env.imbalance >>>> 0) { >>> + if ((env.flags & LBF_SOME_PINNED) && env.imbalance >>>> 0) >>> *group_imbalance = 1; >>> - } else if (*group_imbalance) >>> - *group_imbalance = 0; >> >> As you mentioned above " We need to reset of the imbalance flag as soon as >> we have reached a balanced state. " I think the codes before your patch have >> already do this, where I miss? Great thanks for your patient. ;-) > The previous code was called only when busiest->nr_running > 1. The > background activity will be on the rq only 1 tick per few seconds and > we will set qroup_imbalance when the background activity is on the rq. > Then, during the next load balances, the qroup_imbalance is still set > but we can't clear qroup_imbalance because we have only 1 task per rq There is no load balance I think since busiest->nr_running > 1 is not true even if the patch is not applied. Regards, Wanpeng Li > > Regards, > Vincent > >> Regards, >> Wanpeng Li >> >> >>> } >>> /* All tasks on this runqueue were pinned by CPU affinity >>> */ >>> @@ -6686,7 +6684,7 @@ static int load_balance(int this_cpu, struct rq >>> *this_rq, >>> env.loop_break = sched_nr_migrate_break; >>> goto redo; >>> } >>> - goto out_balanced; >>> + goto out_all_pinned; >>> } >>> } >>> @@ -6760,6 +6758,23 @@ static int load_balance(int this_cpu, struct rq >>> *this_rq, >>> goto out; >>> out_balanced: >>> + /* >>> + * We reach balance although we may have faced some affinity >>> + * constraints. Clear the imbalance flag if it was set. >>> + */ >>> + if (sd_parent) { >>> + int *group_imbalance = &sd_parent->groups->sgc->imbalance; >>> + >>> + if (*group_imbalance) >>> + *group_imbalance = 0; >>> + } >>> + >>> +out_all_pinned: >>> + /* >>> + * We reach balance because all tasks are pinned at this level so >>> + * we can't migrate them. Let the imbalance flag set so parent >>> level >>> + * can try to migrate them. >>> + */ >>> schedstat_inc(sd, lb_balanced[idle]); >>> sd->nr_balance_failed = 0; >>
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 923fe32..7eb9126 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -6672,10 +6672,8 @@ static int load_balance(int this_cpu, struct rq *this_rq, if (sd_parent) { int *group_imbalance = &sd_parent->groups->sgc->imbalance; - if ((env.flags & LBF_SOME_PINNED) && env.imbalance > 0) { + if ((env.flags & LBF_SOME_PINNED) && env.imbalance > 0) *group_imbalance = 1; - } else if (*group_imbalance) - *group_imbalance = 0; } /* All tasks on this runqueue were pinned by CPU affinity */ @@ -6686,7 +6684,7 @@ static int load_balance(int this_cpu, struct rq *this_rq, env.loop_break = sched_nr_migrate_break; goto redo; } - goto out_balanced; + goto out_all_pinned; } } @@ -6760,6 +6758,23 @@ static int load_balance(int this_cpu, struct rq *this_rq, goto out; out_balanced: + /* + * We reach balance although we may have faced some affinity + * constraints. Clear the imbalance flag if it was set. + */ + if (sd_parent) { + int *group_imbalance = &sd_parent->groups->sgc->imbalance; + + if (*group_imbalance) + *group_imbalance = 0; + } + +out_all_pinned: + /* + * We reach balance because all tasks are pinned at this level so + * we can't migrate them. Let the imbalance flag set so parent level + * can try to migrate them. + */ schedstat_inc(sd, lb_balanced[idle]); sd->nr_balance_failed = 0;