Message ID | 20220514075223.GA11384@pc (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | mm/oom_kill: break evaluation when a task has been selected | expand |
On Sat 14-05-22 15:52:28, Zhaoyu Liu wrote: > oom points no longer need to be calculated if a task is oom_task_origin(), > so return 1 to stop the oom_evaluate_task(). This doesn't really explain why this is really desired. Is this a fix, optimization? Please also note that this change has some side effects. For one, the task marked as oom origin will get killed even if there is still a pending oom victim which hasn't been fully dismantled. Is this intentional? > Signed-off-by: Zhaoyu Liu <zackary.liu.pro@gmail.com> > --- > mm/oom_kill.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/mm/oom_kill.c b/mm/oom_kill.c > index 3996301450e8..b407fba21d19 100644 > --- a/mm/oom_kill.c > +++ b/mm/oom_kill.c > @@ -308,7 +308,7 @@ static enum oom_constraint constrained_alloc(struct oom_control *oc) > static int oom_evaluate_task(struct task_struct *task, void *arg) > { > struct oom_control *oc = arg; > - long points; > + long points = 0; > > if (oom_unkillable_task(task)) > goto next; > @@ -349,7 +349,7 @@ static int oom_evaluate_task(struct task_struct *task, void *arg) > oc->chosen = task; > oc->chosen_points = points; > next: > - return 0; > + return points == LONG_MAX; > abort: > if (oc->chosen) > put_task_struct(oc->chosen); > -- > 2.17.1
On Jun 1 2022, at 3:45 pm, Michal Hocko <mhocko@suse.com> wrote: > On Sat 14-05-22 15:52:28, Zhaoyu Liu wrote: >> oom points no longer need to be calculated if a task is oom_task_origin(), >> so return 1 to stop the oom_evaluate_task(). > > This doesn't really explain why this is really desired. Is this a fix, > optimization? > > Please also note that this change has some side effects. For one, the > task marked as oom origin will get killed even if there is still a > pending oom victim which hasn't been fully dismantled. Is this > intentional? Thank you very much for reminding. From my point of view, the victim was marked in the last oom, and now it has entered the oom again, which means that the system still has no deprecated memory available. In order to ensure that the system can return to normal as soon as possible, killing the origin task immediately should be A good choice, and the role of this patch is to end oom_evaluate_task and return true as soon as the origin task is found. Maybe this patch is not the optimal solution, it has a trade-off. It is an honor to discuss with you, thank you very much! zackary >> Signed-off-by: Zhaoyu Liu <zackary.liu.pro@gmail.com> >> --- >> mm/oom_kill.c | 4 ++-- >> 1 file changed, 2 insertions(+), 2 deletions(-) >> >> diff --git a/mm/oom_kill.c b/mm/oom_kill.c >> index 3996301450e8..b407fba21d19 100644 >> --- a/mm/oom_kill.c >> +++ b/mm/oom_kill.c >> @@ -308,7 +308,7 @@ static enum oom_constraint >> constrained_alloc(struct oom_control *oc) >> static int oom_evaluate_task(struct task_struct *task, void *arg) >> { >> struct oom_control *oc = arg; >> - long points; >> + long points = 0; >> >> if (oom_unkillable_task(task)) >> goto next; >> @@ -349,7 +349,7 @@ static int oom_evaluate_task(struct task_struct >> *task, void *arg) >> oc->chosen = task; >> oc->chosen_points = points; >> next: >> - return 0; >> + return points == LONG_MAX; >> abort: >> if (oc->chosen) >> put_task_struct(oc->chosen); >> -- >> 2.17.1 > > -- > Michal Hocko > SUSE Labs >
On Sat 04-06-22 18:35:19, Zackary Liu wrote: > > On Jun 1 2022, at 3:45 pm, Michal Hocko <mhocko@suse.com> wrote: > > > On Sat 14-05-22 15:52:28, Zhaoyu Liu wrote: > >> oom points no longer need to be calculated if a task is oom_task_origin(), > >> so return 1 to stop the oom_evaluate_task(). > > > > This doesn't really explain why this is really desired. Is this a fix, > > optimization? > > > > Please also note that this change has some side effects. For one, the > > task marked as oom origin will get killed even if there is still a > > pending oom victim which hasn't been fully dismantled. Is this > > intentional? > > Thank you very much for reminding. > > From my point of view, the victim was marked in the last oom, and now it > has entered the oom again, which means that the system still has no > deprecated memory available. This is not an unusual situation. OOM victims can take some time to die and release their memory. The oom_reaper is there to fast forward that process and guarantee a forward progress. But this can still take some time. Our general policy is to back off when there is an alive oom victim encountered. Have a look at the tsk_is_oom_victim test in oom_evaluate_task. For that heuristic to be effective the whole task list (wether the global one or memcg) has to be evaluated. > In order to ensure that the system can > return to normal as soon as possible, killing the origin task > immediately should be A good choice, and the role of this patch is to > end oom_evaluate_task and return true as soon as the origin task is found. Could you be more specific how does this patch guarantees a forward progress? What is the actual usecase that benefits from this change? These are all important information for future reference. Please note I am not saying the patch is wrong. I just still do not see why it is useful. > Maybe this patch is not the optimal solution, it has a trade-off. If there are trade-offs, please document them in the changelog. The way I see it is that oom_task_origin heuristic has been introduced to help killing swapoff operation because the swapped out memory doesn't fit into memory. This is a very reasonable thing to do in general but it also represents an early failure visible to the userspace. If there is a pre-existing oom victim then I would argue that we should try to avoid the failure.
On Mon 06-06-22 10:33:30, Michal Hocko wrote: > On Sat 04-06-22 18:35:19, Zackary Liu wrote: > > > > On Jun 1 2022, at 3:45 pm, Michal Hocko <mhocko@suse.com> wrote: > > > > > On Sat 14-05-22 15:52:28, Zhaoyu Liu wrote: > > >> oom points no longer need to be calculated if a task is oom_task_origin(), > > >> so return 1 to stop the oom_evaluate_task(). > > > > > > This doesn't really explain why this is really desired. Is this a fix, > > > optimization? > > > > > > Please also note that this change has some side effects. For one, the > > > task marked as oom origin will get killed even if there is still a > > > pending oom victim which hasn't been fully dismantled. Is this > > > intentional? > > > > Thank you very much for reminding. > > > > From my point of view, the victim was marked in the last oom, and now it > > has entered the oom again, which means that the system still has no > > deprecated memory available. > > This is not an unusual situation. OOM victims can take some time to die > and release their memory. The oom_reaper is there to fast forward that > process and guarantee a forward progress. But this can still take some > time. Our general policy is to back off when there is an alive oom > victim encountered. Have a look at the tsk_is_oom_victim test in > oom_evaluate_task. For that heuristic to be effective the whole task > list (wether the global one or memcg) has to be evaluated. > > > In order to ensure that the system can > > return to normal as soon as possible, killing the origin task > > immediately should be A good choice, and the role of this patch is to > > end oom_evaluate_task and return true as soon as the origin task is found. > > Could you be more specific how does this patch guarantees a forward > progress? What is the actual usecase that benefits from this change? > > These are all important information for future reference. Please note I > am not saying the patch is wrong. I just still do not see why it is > useful. > > > Maybe this patch is not the optimal solution, it has a trade-off. > > If there are trade-offs, please document them in the changelog. > > The way I see it is that oom_task_origin heuristic has been introduced > to help killing swapoff operation because the swapped out memory doesn't > fit into memory. This is a very reasonable thing to do in general but it > also represents an early failure visible to the userspace. If there is a > pre-existing oom victim then I would argue that we should try to avoid > the failure. Andrew, please drop this patch from your tree. I do not see any real justification here. Thanks!
diff --git a/mm/oom_kill.c b/mm/oom_kill.c index 3996301450e8..b407fba21d19 100644 --- a/mm/oom_kill.c +++ b/mm/oom_kill.c @@ -308,7 +308,7 @@ static enum oom_constraint constrained_alloc(struct oom_control *oc) static int oom_evaluate_task(struct task_struct *task, void *arg) { struct oom_control *oc = arg; - long points; + long points = 0; if (oom_unkillable_task(task)) goto next; @@ -349,7 +349,7 @@ static int oom_evaluate_task(struct task_struct *task, void *arg) oc->chosen = task; oc->chosen_points = points; next: - return 0; + return points == LONG_MAX; abort: if (oc->chosen) put_task_struct(oc->chosen);
oom points no longer need to be calculated if a task is oom_task_origin(), so return 1 to stop the oom_evaluate_task(). Signed-off-by: Zhaoyu Liu <zackary.liu.pro@gmail.com> --- mm/oom_kill.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)