diff mbox series

[RESEND] mm: don't raise MEMCG_OOM event due to failed high-order allocation

Message ID 20180917230846.31027-1-guro@fb.com (mailing list archive)
State New, archived
Series [RESEND] mm: don't raise MEMCG_OOM event due to failed high-order allocation

Commit Message

Roman Gushchin Sept. 17, 2018, 11:10 p.m. UTC
The memcg OOM killer is never invoked due to a failed high-order
allocation, however the MEMCG_OOM event can be raised.

As shown below, it can happen under conditions which are very
far from a real OOM: e.g. there is plenty of clean pagecache
and low memory pressure.

There is no sense in raising an OOM event in such a case,
as it might confuse a user and lead to wrong and excessive actions.

Let's look at the charging path in try_charge(). If the memory usage
is close to memory.max, which is absolutely natural for most memory cgroups,
we try to reclaim some pages. Even if we were able to reclaim
enough memory for the allocation, the following check can fail due to
a race with another concurrent allocation:

    if (mem_cgroup_margin(mem_over_limit) >= nr_pages)
        goto retry;

For regular pages the following condition will save us from triggering
the OOM:

   if (nr_reclaimed && nr_pages <= (1 << PAGE_ALLOC_COSTLY_ORDER))
       goto retry;

But for a high-order allocation this condition will intentionally fail.
The reason behind it is that we'll likely fall back to regular pages anyway,
so it's ok and even preferred to return ENOMEM.

In this case the idea of raising MEMCG_OOM looks dubious.

Fix this by moving the raising of MEMCG_OOM to mem_cgroup_oom(),
after the allocation order check, so that the event won't be raised
for high-order allocations. This change doesn't affect regular page
allocation and charging.

Signed-off-by: Roman Gushchin <guro@fb.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
---
 mm/memcontrol.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Comments

Roman Gushchin Sept. 25, 2018, 3:58 p.m. UTC | #1
On Mon, Sep 17, 2018 at 04:08:46PM -0700, Roman Gushchin wrote:
> The memcg OOM killer is never invoked due to a failed high-order
> allocation, however the MEMCG_OOM event can be raised.
> 
> As shown below, it can happen under conditions, which are very
> far from a real OOM: e.g. there is plenty of clean pagecache
> and low memory pressure.
> 
> There is no sense in raising an OOM event in such a case,
> as it might confuse a user and lead to wrong and excessive actions.
> 
> Let's look at the charging path in try_charge(). If the memory usage
> is close to memory.max, which is absolutely natural for most memory cgroups,
> we try to reclaim some pages. Even if we were able to reclaim
> enough memory for the allocation, the following check can fail due to
> a race with another concurrent allocation:
> 
>     if (mem_cgroup_margin(mem_over_limit) >= nr_pages)
>         goto retry;
> 
> For regular pages the following condition will save us from triggering
> the OOM:
> 
>    if (nr_reclaimed && nr_pages <= (1 << PAGE_ALLOC_COSTLY_ORDER))
>        goto retry;
> 
> But for a high-order allocation this condition will intentionally fail.
> The reason behind it is that we'll likely fall back to regular pages anyway,
> so it's ok and even preferred to return ENOMEM.
> 
> In this case the idea of raising MEMCG_OOM looks dubious.
> 
> Fix this by moving MEMCG_OOM raising to mem_cgroup_oom() after
> allocation order check, so that the event won't be raised for high
> order allocations. This change doesn't affect regular pages allocation
> and charging.
> 
> Signed-off-by: Roman Gushchin <guro@fb.com>
> Acked-by: David Rientjes <rientjes@google.com>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: Vladimir Davydov <vdavydov.dev@gmail.com>

I've tried to address all concerns and questions in the updated
changelog, so, hopefully, now it's clear why we need this change.

Are there any comments, thoughts or objections left?

Thanks!
Michal Hocko Sept. 25, 2018, 6:58 p.m. UTC | #2
On Mon 17-09-18 23:10:59, Roman Gushchin wrote:
> The memcg OOM killer is never invoked due to a failed high-order
> allocation, however the MEMCG_OOM event can be raised.
> 
> As shown below, it can happen under conditions, which are very
> far from a real OOM: e.g. there is plenty of clean pagecache
> and low memory pressure.
> 
> There is no sense in raising an OOM event in such a case,
> as it might confuse a user and lead to wrong and excessive actions.
> 
> Let's look at the charging path in try_charge(). If the memory usage
> is close to memory.max, which is absolutely natural for most memory cgroups,
> we try to reclaim some pages. Even if we were able to reclaim
> enough memory for the allocation, the following check can fail due to
> a race with another concurrent allocation:
> 
>     if (mem_cgroup_margin(mem_over_limit) >= nr_pages)
>         goto retry;
> 
> For regular pages the following condition will save us from triggering
> the OOM:
> 
>    if (nr_reclaimed && nr_pages <= (1 << PAGE_ALLOC_COSTLY_ORDER))
>        goto retry;
> 
> But for a high-order allocation this condition will intentionally fail.
> The reason behind it is that we'll likely fall back to regular pages anyway,
> so it's ok and even preferred to return ENOMEM.
> 
> In this case the idea of raising MEMCG_OOM looks dubious.

I would really appreciate an example of application that would get
confused by consuming this event and an explanation why. I do agree that
the event itself is kinda weird because it doesn't give you any context
for what kind of requests the memcg is OOM. Costly orders are a little
different story than others and users shouldn't care about this because
this is a mere implementation detail.

In other words, do we have any users who actually care about this
half-baked event at all? Shouldn't we simply stop emitting it (or make
it an alias of OOM_KILL) rather than making it slightly better but
still kinda incomplete?

Jeez, we really suck at defining proper interfaces. Things seem so cool
when they are proposed, then those users come and ruin our lives...
Roman Gushchin Sept. 26, 2018, 8:13 a.m. UTC | #3
On Tue, Sep 25, 2018 at 08:58:45PM +0200, Michal Hocko wrote:
> On Mon 17-09-18 23:10:59, Roman Gushchin wrote:
> > The memcg OOM killer is never invoked due to a failed high-order
> > allocation, however the MEMCG_OOM event can be raised.
> > 
> > As shown below, it can happen under conditions, which are very
> > far from a real OOM: e.g. there is plenty of clean pagecache
> > and low memory pressure.
> > 
> > There is no sense in raising an OOM event in such a case,
> > as it might confuse a user and lead to wrong and excessive actions.
> > 
> > Let's look at the charging path in try_charge(). If the memory usage
> > is close to memory.max, which is absolutely natural for most memory cgroups,
> > we try to reclaim some pages. Even if we were able to reclaim
> > enough memory for the allocation, the following check can fail due to
> > a race with another concurrent allocation:
> > 
> >     if (mem_cgroup_margin(mem_over_limit) >= nr_pages)
> >         goto retry;
> > 
> > For regular pages the following condition will save us from triggering
> > the OOM:
> > 
> >    if (nr_reclaimed && nr_pages <= (1 << PAGE_ALLOC_COSTLY_ORDER))
> >        goto retry;
> > 
> > But for a high-order allocation this condition will intentionally fail.
> > The reason behind it is that we'll likely fall back to regular pages anyway,
> > so it's ok and even preferred to return ENOMEM.
> > 
> > In this case the idea of raising MEMCG_OOM looks dubious.
> 
> I would really appreciate an example of application that would get
> confused by consuming this event and an explanation why. I do agree that
> the event itself is kinda weird because it doesn't give you any context
> for what kind of requests the memcg is OOM. Costly orders are a little
> different story than others and users shouldn't care about this because
> this is a mere implementation detail.

Our container management system (called Tupperware) used the OOM event
as a signal that a workload might be affected by the OOM killer, so
it restarted the corresponding container.

I started looking at this problem when it was reported to me that it
sometimes happens when there is plenty of inactive page cache, and
there are no signs that the OOM killer has been invoked at all.
The proposed patch resolves this problem.

> 
> In other words, do we have any users to actually care about this half
> baked event at all? Shouldn't we simply stop emitting it (or make it an
> alias of OOM_KILL) rather than making it slightly better but yet kinda
> incomplete?

The only problem I see with OOM_KILL is that it might not be raised
at all if the OOM killer is not able to find an appropriate victim.
For instance, if all tasks are OOM-protected (oom_score_adj set to -1000).
Also, with oom.group it might be raised multiple times.

I'm not against the idea of deprecating it in some way (nor am I
completely sold on it), but in any case the proposed change isn't a step
in the wrong direction: under the conditions I described there is
absolutely no sense in raising the OOM event.

> 
> Jeez, we really suck at defining proper interfaces. Things seem so cool
> when they are proposed, then those users come and ruin our lives...

:)


Thanks!
Michal Hocko Sept. 26, 2018, 8:24 a.m. UTC | #4
On Wed 26-09-18 09:13:43, Roman Gushchin wrote:
> On Tue, Sep 25, 2018 at 08:58:45PM +0200, Michal Hocko wrote:
> > On Mon 17-09-18 23:10:59, Roman Gushchin wrote:
> > > The memcg OOM killer is never invoked due to a failed high-order
> > > allocation, however the MEMCG_OOM event can be raised.
> > > 
> > > As shown below, it can happen under conditions, which are very
> > > far from a real OOM: e.g. there is plenty of clean pagecache
> > > and low memory pressure.
> > > 
> > > There is no sense in raising an OOM event in such a case,
> > > as it might confuse a user and lead to wrong and excessive actions.
> > > 
> > > Let's look at the charging path in try_charge(). If the memory usage
> > > is close to memory.max, which is absolutely natural for most memory cgroups,
> > > we try to reclaim some pages. Even if we were able to reclaim
> > > enough memory for the allocation, the following check can fail due to
> > > a race with another concurrent allocation:
> > > 
> > >     if (mem_cgroup_margin(mem_over_limit) >= nr_pages)
> > >         goto retry;
> > > 
> > > For regular pages the following condition will save us from triggering
> > > the OOM:
> > > 
> > >    if (nr_reclaimed && nr_pages <= (1 << PAGE_ALLOC_COSTLY_ORDER))
> > >        goto retry;
> > > 
> > > But for a high-order allocation this condition will intentionally fail.
> > > The reason behind it is that we'll likely fall back to regular pages anyway,
> > > so it's ok and even preferred to return ENOMEM.
> > > 
> > > In this case the idea of raising MEMCG_OOM looks dubious.
> > 
> > I would really appreciate an example of application that would get
> > confused by consuming this event and an explanation why. I do agree that
> > the event itself is kinda weird because it doesn't give you any context
> > for what kind of requests the memcg is OOM. Costly orders are a little
> > different story than others and users shouldn't care about this because
> > this is a mere implementation detail.
> 
> Our container management system (called Tupperware) used the OOM event
> as a signal that a workload might be affected by the OOM killer, so
> it restarted the corresponding container.
> 
> I started looking at this problem when it was reported to me that it
> sometimes happens when there is plenty of inactive page cache, and
> there are no signs that the OOM killer has been invoked at all.
> The proposed patch resolves this problem.

Thanks! This is exactly the kind of information that should be in the
changelog. With the changelog updated and an explicit note in the
documentation that the event is triggered only when the memcg is _going_
to consider the oom killer as the only option you can add

Acked-by: Michal Hocko <mhocko@suse.com>

> > In other words, do we have any users to actually care about this half
> > baked event at all? Shouldn't we simply stop emitting it (or make it an
> > alias of OOM_KILL) rather than making it slightly better but yet kinda
> > incomplete?
> 
> The only problem with OOM_KILL I see is that OOM_KILL might not be raised
> at all, if the OOM killer is not able to find an appropriate victim.
> For instance, if all tasks are oom protected (oom_score_adj set to -1000).

This is a very good point.

Patch

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index fcec9b39e2a3..103ca3c31c04 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1669,6 +1669,8 @@  static enum oom_status mem_cgroup_oom(struct mem_cgroup *memcg, gfp_t mask, int
 	if (order > PAGE_ALLOC_COSTLY_ORDER)
 		return OOM_SKIPPED;
 
+	memcg_memory_event(memcg, MEMCG_OOM);
+
 	/*
 	 * We are in the middle of the charge context here, so we
 	 * don't want to block when potentially sitting on a callstack
@@ -2250,8 +2252,6 @@  static int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask,
 	if (fatal_signal_pending(current))
 		goto force;
 
-	memcg_memory_event(mem_over_limit, MEMCG_OOM);
-
 	/*
 	 * keep retrying as long as the memcg oom killer is able to make
 	 * a forward progress or bypass the charge if the oom killer