Message ID | 1574914633-2020-1-git-send-email-laoar.shao@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v3] mm, memcg: fix the stupid OOM killer when shrinking memcg hard limit | expand |
> On Nov 27, 2019, at 11:17 PM, Yafang Shao <laoar.shao@gmail.com> wrote: > > When there are no more processes in a memcg (e.g., due to OOM > group), we can still have file pages in the page cache. > > If these pages are protected by memory.min, they can't be reclaimed. > Especially if there won't be another process in this memcg and the memcg > is kept online, we do want to drop these pages from the page cache. > > By dropping these page caches we can avoid reclaimers (e.g., kswapd or > direct) to scan and reclaim pages from all memcgs in the system - > because the reclaimers will try to fairly reclaim pages from all memcgs > in the system when under memory pressure. > > By setting the hard limit of such a memcg to 0, we allow to drop the > page cache of such memcgs. Unfortunately, this may invoke the OOM killer > and generate a lot of output. The OOM output is not expected by an admin > who wants to drop these caches and knows that there are no processes in > this memcg anymore. > > Therefore, if a memcg is not populated, we should not invoke the OOM > killer - there is nothing to kill. The next time a new process is > started in the memcg and the "max" is still below usage, the OOM killer > will be invoked and the new process will be killed. > > [ Above commit log is contributed by David ] > > What's worse about this issue is that when there're killable tasks and the > OOM killer killed the last task, and what will happen then ? As nr_reclaims > is already 0 and drained is alreay true, the OOM killer will try to kill > nothing (because he knows he has killed the last task), what's a stupid > behavior. > > Someone may worry that the admins may not going to see that the memcg was > OOM due to the limit change. But this is not a issue, because the admins > changes the limit and then the admins must check the result of his change > - by checking memory.{max, current, stat} he can get all he wants. > > Cc: David Hildenbrand <david@redhat.com> > Nacked-by: Michal Hocko <mhocko@suse.com> > Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Surely too big a turkey to swallow ? — unprofessional wording and carrying a NACK from one of the maintainers. > > --- > Changes since v2: Refresh the subject and commit log. The original > subject is "mm, memcg: avoid oom if cgroup is not populated" > --- > mm/memcontrol.c | 15 +++++++++++++-- > 1 file changed, 13 insertions(+), 2 deletions(-) > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > index 1c4c08b..e936f1b 100644 > --- a/mm/memcontrol.c > +++ b/mm/memcontrol.c > @@ -6139,9 +6139,20 @@ static ssize_t memory_max_write(struct kernfs_open_file *of, > continue; > } > > - memcg_memory_event(memcg, MEMCG_OOM); > - if (!mem_cgroup_out_of_memory(memcg, GFP_KERNEL, 0)) > + /* If there's no procesess, we don't need to invoke the OOM > + * killer. Then next time when you try to start a process > + * in this memcg, the max may still bellow usage, and then > + * this OOM killer will be invoked. This can be considered > + * as lazy OOM, that is we have been always doing in the > + * kernel. Pls. Michal, that is really consistency. > + */ > + if (cgroup_is_populated(memcg->css.cgroup)) { > + memcg_memory_event(memcg, MEMCG_OOM); > + if (!mem_cgroup_out_of_memory(memcg, GFP_KERNEL, 0)) > + break; > + } else { > break; > + } > } > > memcg_wb_domain_size_changed(memcg); > -- > 1.8.3.1 > >
On Thu, Nov 28, 2019 at 12:28 PM Qian Cai <cai@lca.pw> wrote: > > > > > On Nov 27, 2019, at 11:17 PM, Yafang Shao <laoar.shao@gmail.com> wrote: > > > > When there are no more processes in a memcg (e.g., due to OOM > > group), we can still have file pages in the page cache. > > > > If these pages are protected by memory.min, they can't be reclaimed. > > Especially if there won't be another process in this memcg and the memcg > > is kept online, we do want to drop these pages from the page cache. > > > > By dropping these page caches we can avoid reclaimers (e.g., kswapd or > > direct) to scan and reclaim pages from all memcgs in the system - > > because the reclaimers will try to fairly reclaim pages from all memcgs > > in the system when under memory pressure. > > > > By setting the hard limit of such a memcg to 0, we allow to drop the > > page cache of such memcgs. Unfortunately, this may invoke the OOM killer > > and generate a lot of output. The OOM output is not expected by an admin > > who wants to drop these caches and knows that there are no processes in > > this memcg anymore. > > > > Therefore, if a memcg is not populated, we should not invoke the OOM > > killer - there is nothing to kill. The next time a new process is > > started in the memcg and the "max" is still below usage, the OOM killer > > will be invoked and the new process will be killed. > > > > [ Above commit log is contributed by David ] > > > > What's worse about this issue is that when there're killable tasks and the > > OOM killer killed the last task, and what will happen then ? As nr_reclaims > > is already 0 and drained is alreay true, the OOM killer will try to kill > > nothing (because he knows he has killed the last task), what's a stupid > > behavior. > > > > Someone may worry that the admins may not going to see that the memcg was > > OOM due to the limit change. But this is not a issue, because the admins > > changes the limit and then the admins must check the result of his change > > - by checking memory.{max, current, stat} he can get all he wants. > > > > Cc: David Hildenbrand <david@redhat.com> > > Nacked-by: Michal Hocko <mhocko@suse.com> > > Signed-off-by: Yafang Shao <laoar.shao@gmail.com> > > Surely too big a turkey to swallow ? — unprofessional wording and carrying a NACK > from one of the maintainers. > I'm sorry if there're any unprofessional wording. I just want to make the kernel more stable. Thanks Yafang
diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 1c4c08b..e936f1b 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -6139,9 +6139,20 @@ static ssize_t memory_max_write(struct kernfs_open_file *of, continue; } - memcg_memory_event(memcg, MEMCG_OOM); - if (!mem_cgroup_out_of_memory(memcg, GFP_KERNEL, 0)) + /* If there's no procesess, we don't need to invoke the OOM + * killer. Then next time when you try to start a process + * in this memcg, the max may still bellow usage, and then + * this OOM killer will be invoked. This can be considered + * as lazy OOM, that is we have been always doing in the + * kernel. Pls. Michal, that is really consistency. + */ + if (cgroup_is_populated(memcg->css.cgroup)) { + memcg_memory_event(memcg, MEMCG_OOM); + if (!mem_cgroup_out_of_memory(memcg, GFP_KERNEL, 0)) + break; + } else { break; + } } memcg_wb_domain_size_changed(memcg);