Message ID: 20171003021519.23907-1-shakeelb@google.com (mailing list archive)
State: New, archived
On Mon 02-10-17 19:15:19, Shakeel Butt wrote:
> The user space application can directly trigger the allocations
> from eventpoll_epi and eventpoll_pwq slabs. A buggy or malicious
> application can consume a significant amount of system memory by
> triggering such allocations. Indeed we have seen in production
> where a buggy application was leaking the epoll references and
> causing a burst of eventpoll_epi and eventpoll_pwq slab
> allocations. This patch opt-in the charging of eventpoll_epi
> and eventpoll_pwq slabs.

I am not objecting to the patch, I would just like to understand the
runaway case. ep_insert seems to limit the maximum number of watches to
max_user_watches, which should be ~4% of lowmem if I am following the
code properly. pwq_cache should be bound by the number of watches as
well, or am I misunderstanding the code?

> Signed-off-by: Shakeel Butt <shakeelb@google.com>
> ---
>  fs/eventpoll.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/fs/eventpoll.c b/fs/eventpoll.c
> index 2fabd19cdeea..a45360444895 100644
> --- a/fs/eventpoll.c
> +++ b/fs/eventpoll.c
> @@ -2329,11 +2329,11 @@ static int __init eventpoll_init(void)
>
>  	/* Allocates slab cache used to allocate "struct epitem" items */
>  	epi_cache = kmem_cache_create("eventpoll_epi", sizeof(struct epitem),
> -			0, SLAB_HWCACHE_ALIGN | SLAB_PANIC, NULL);
> +			0, SLAB_HWCACHE_ALIGN|SLAB_PANIC|SLAB_ACCOUNT, NULL);
>
>  	/* Allocates slab cache used to allocate "struct eppoll_entry" */
>  	pwq_cache = kmem_cache_create("eventpoll_pwq",
> -			sizeof(struct eppoll_entry), 0, SLAB_PANIC, NULL);
> +			sizeof(struct eppoll_entry), 0, SLAB_PANIC|SLAB_ACCOUNT, NULL);
>
>  	return 0;
>  }
> --
> 2.14.2.822.g60be5d43e6-goog
> I am not objecting to the patch I would just like to understand the
> runaway case. ep_insert seems to limit the maximum number of watches to
> max_user_watches which should be ~4% of lowmem if I am following the
> code properly. pwq_cache should be bound by the number of watches as
> well, or am I misunderstanding the code?

You are absolutely right that there is a per-user limit (~4% of total
memory if no highmem) on these caches. I think it is too generous,
particularly in the scenario where jobs of multiple users are running
on the system and the administrator is reducing cost by overcommitting
the memory. This is unaccounted kernel memory and will not be
considered by the oom-killer. I think by accounting it to kmemcg, for
systems with kmem accounting enabled, we can provide better isolation
between jobs of different users.
On Wed 04-10-17 12:33:14, Shakeel Butt wrote:
> > I am not objecting to the patch I would just like to understand the
> > runaway case. ep_insert seems to limit the maximum number of watches to
> > max_user_watches which should be ~4% of lowmem if I am following the
> > code properly. pwq_cache should be bound by the number of watches as
> > well, or am I misunderstanding the code?
>
> You are absolutely right that there is a per-user limit (~4% of total
> memory if no highmem) on these caches. I think it is too generous
> particularly in the scenario where jobs of multiple users are running
> on the system and the administrator is reducing cost by overcomitting
> the memory. This is unaccounted kernel memory and will not be
> considered by the oom-killer. I think by accounting it to kmemcg, for
> systems with kmem accounting enabled, we can provide better isolation
> between jobs of different users.

Thanks for the clarification. For some reason I didn't figure that the
limit is per user, even though the name suggests so.
On Thu 05-10-17 10:21:18, Michal Hocko wrote:
> On Wed 04-10-17 12:33:14, Shakeel Butt wrote:
> > > I am not objecting to the patch I would just like to understand the
> > > runaway case. ep_insert seems to limit the maximum number of watches to
> > > max_user_watches which should be ~4% of lowmem if I am following the
> > > code properly. pwq_cache should be bound by the number of watches as
> > > well, or am I misunderstanding the code?
> >
> > You are absolutely right that there is a per-user limit (~4% of total
> > memory if no highmem) on these caches. I think it is too generous
> > particularly in the scenario where jobs of multiple users are running
> > on the system and the administrator is reducing cost by overcomitting
> > the memory. This is unaccounted kernel memory and will not be
> > considered by the oom-killer. I think by accounting it to kmemcg, for
> > systems with kmem accounting enabled, we can provide better isolation
> > between jobs of different users.
>
> Thanks for the clarification. For some reason I didn't figure that the
> limit is per user, even though the name suggests so.

Completely forgot to add

Acked-by: Michal Hocko <mhocko@suse.com>
diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 2fabd19cdeea..a45360444895 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -2329,11 +2329,11 @@ static int __init eventpoll_init(void)
 
 	/* Allocates slab cache used to allocate "struct epitem" items */
 	epi_cache = kmem_cache_create("eventpoll_epi", sizeof(struct epitem),
-			0, SLAB_HWCACHE_ALIGN | SLAB_PANIC, NULL);
+			0, SLAB_HWCACHE_ALIGN|SLAB_PANIC|SLAB_ACCOUNT, NULL);
 
 	/* Allocates slab cache used to allocate "struct eppoll_entry" */
 	pwq_cache = kmem_cache_create("eventpoll_pwq",
-			sizeof(struct eppoll_entry), 0, SLAB_PANIC, NULL);
+			sizeof(struct eppoll_entry), 0, SLAB_PANIC|SLAB_ACCOUNT, NULL);
 
 	return 0;
 }
The user space application can directly trigger the allocations
from eventpoll_epi and eventpoll_pwq slabs. A buggy or malicious
application can consume a significant amount of system memory by
triggering such allocations. Indeed we have seen in production
where a buggy application was leaking the epoll references and
causing a burst of eventpoll_epi and eventpoll_pwq slab
allocations. This patch opts the eventpoll_epi and eventpoll_pwq
slabs in to kmem charging (SLAB_ACCOUNT).

Signed-off-by: Shakeel Butt <shakeelb@google.com>
---
 fs/eventpoll.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)