Message ID | 20180917230846.31027-1-guro@fb.com (mailing list archive)
---|---
State | New, archived
Series | [RESEND] mm: don't raise MEMCG_OOM event due to failed high-order allocation
On Mon, Sep 17, 2018 at 04:08:46PM -0700, Roman Gushchin wrote:
> The memcg OOM killer is never invoked due to a failed high-order
> allocation, however the MEMCG_OOM event can be raised.
>
> As shown below, it can happen under conditions which are very
> far from a real OOM: e.g. there is plenty of clean pagecache
> and low memory pressure.
>
> There is no sense in raising an OOM event in such a case,
> as it might confuse a user and lead to wrong and excessive actions.
>
> Let's look at the charging path in try_charge(). If the memory usage
> is close to memory.max, which is absolutely natural for most memory
> cgroups, we try to reclaim some pages. Even if we were able to reclaim
> enough memory for the allocation, the following check can fail due to
> a race with another concurrent allocation:
>
>     if (mem_cgroup_margin(mem_over_limit) >= nr_pages)
>         goto retry;
>
> For regular pages the following condition will save us from triggering
> the OOM:
>
>     if (nr_reclaimed && nr_pages <= (1 << PAGE_ALLOC_COSTLY_ORDER))
>         goto retry;
>
> But for a high-order allocation this condition will intentionally fail.
> The reason behind it is that we'll likely fall back to regular pages
> anyway, so it's ok and even preferred to return ENOMEM.
>
> In this case the idea of raising MEMCG_OOM looks dubious.
>
> Fix this by moving the raising of MEMCG_OOM into mem_cgroup_oom(), after
> the allocation order check, so that the event won't be raised for
> high-order allocations. This change doesn't affect regular page
> allocation and charging.
>
> Signed-off-by: Roman Gushchin <guro@fb.com>
> Acked-by: David Rientjes <rientjes@google.com>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: Vladimir Davydov <vdavydov.dev@gmail.com>

I've tried to address all concerns and questions in the updated changelog,
so, hopefully, it's now clear why we need this change.

Are there any comments, thoughts or objections left?

Thanks!
On Mon 17-09-18 23:10:59, Roman Gushchin wrote:
> The memcg OOM killer is never invoked due to a failed high-order
> allocation, however the MEMCG_OOM event can be raised.
>
> As shown below, it can happen under conditions which are very
> far from a real OOM: e.g. there is plenty of clean pagecache
> and low memory pressure.
>
> [...]
>
> But for a high-order allocation this condition will intentionally fail.
> The reason behind it is that we'll likely fall back to regular pages
> anyway, so it's ok and even preferred to return ENOMEM.
>
> In this case the idea of raising MEMCG_OOM looks dubious.

I would really appreciate an example of an application that would get
confused by consuming this event, and an explanation why. I do agree that
the event itself is kinda weird, because it doesn't give you any context
about what kind of requests the memcg is OOM for. Costly orders are a
little different story than others, and users shouldn't care about this
because it is a mere implementation detail.

In other words, do we have any users who actually care about this
half-baked event at all? Shouldn't we simply stop emitting it (or make it
an alias of OOM_KILL) rather than making it slightly better but still
kinda incomplete?

Jeez, we really suck at defining proper interfaces. Things seem so cool
when they are proposed, then those users come and ruin our lives...
On Tue, Sep 25, 2018 at 08:58:45PM +0200, Michal Hocko wrote:
> On Mon 17-09-18 23:10:59, Roman Gushchin wrote:
> > The memcg OOM killer is never invoked due to a failed high-order
> > allocation, however the MEMCG_OOM event can be raised.
> >
> > [...]
> >
> > In this case the idea of raising MEMCG_OOM looks dubious.
>
> I would really appreciate an example of an application that would get
> confused by consuming this event, and an explanation why. I do agree
> that the event itself is kinda weird, because it doesn't give you any
> context about what kind of requests the memcg is OOM for. Costly orders
> are a little different story than others, and users shouldn't care about
> this because it is a mere implementation detail.

Our container management system (called Tupperware) used the OOM event
as a signal that a workload might be affected by the OOM killer, so
it restarted the corresponding container.

I started looking at this problem when it was reported that it sometimes
happens when there is plenty of inactive page cache, and there were no
signs that the OOM killer had been invoked at all.
The proposed patch resolves this problem.

> In other words, do we have any users who actually care about this
> half-baked event at all? Shouldn't we simply stop emitting it (or make
> it an alias of OOM_KILL) rather than making it slightly better but
> still kinda incomplete?

The only problem with OOM_KILL I see is that OOM_KILL might not be raised
at all if the OOM killer is not able to find an appropriate victim: for
instance, if all tasks are OOM-protected (oom_score_adj set to -1000).
Also, with oom.group it might be raised multiple times.

I'm not against the idea of deprecating it in some way (nor am I
completely sold on it), but in any case the proposed change isn't a step
in the wrong direction: under the conditions I described there is
absolutely no sense in raising the OOM event.

> Jeez, we really suck at defining proper interfaces. Things seem so cool
> when they are proposed, then those users come and ruin our lives...

:)

Thanks!
On Wed 26-09-18 09:13:43, Roman Gushchin wrote:
> On Tue, Sep 25, 2018 at 08:58:45PM +0200, Michal Hocko wrote:
> > On Mon 17-09-18 23:10:59, Roman Gushchin wrote:
> > > The memcg OOM killer is never invoked due to a failed high-order
> > > allocation, however the MEMCG_OOM event can be raised.
> > >
> > > [...]
> >
> > I would really appreciate an example of an application that would get
> > confused by consuming this event, and an explanation why.
> >
> > [...]
>
> Our container management system (called Tupperware) used the OOM event
> as a signal that a workload might be affected by the OOM killer, so
> it restarted the corresponding container.
>
> I started looking at this problem when it was reported that it sometimes
> happens when there is plenty of inactive page cache, and there were no
> signs that the OOM killer had been invoked at all.
> The proposed patch resolves this problem.

Thanks! This is exactly the kind of information that should be in the
changelog. With the changelog updated, and with an explicit note in the
documentation that the event is triggered only when the memcg is _going_
to consider the OOM killer as the only option, you can add

Acked-by: Michal Hocko <mhocko@suse.com>

> > In other words, do we have any users who actually care about this
> > half-baked event at all? Shouldn't we simply stop emitting it (or
> > make it an alias of OOM_KILL) rather than making it slightly better
> > but still kinda incomplete?
>
> The only problem with OOM_KILL I see is that OOM_KILL might not be
> raised at all if the OOM killer is not able to find an appropriate
> victim: for instance, if all tasks are OOM-protected (oom_score_adj
> set to -1000).

This is a very good point.
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index fcec9b39e2a3..103ca3c31c04 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1669,6 +1669,8 @@ static enum oom_status mem_cgroup_oom(struct mem_cgroup *memcg, gfp_t mask, int
 	if (order > PAGE_ALLOC_COSTLY_ORDER)
 		return OOM_SKIPPED;
 
+	memcg_memory_event(memcg, MEMCG_OOM);
+
 	/*
 	 * We are in the middle of the charge context here, so we
 	 * don't want to block when potentially sitting on a callstack
@@ -2250,8 +2252,6 @@ static int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask,
 	if (fatal_signal_pending(current))
 		goto force;
 
-	memcg_memory_event(mem_over_limit, MEMCG_OOM);
-
 	/*
 	 * keep retrying as long as the memcg oom killer is able to make
 	 * a forward progress or bypass the charge if the oom killer