Message ID: 20171003021519.23907-1-shakeelb@google.com (mailing list archive)
State: New, archived
On Mon 02-10-17 19:15:19, Shakeel Butt wrote:
> The user space application can directly trigger the allocations
> from eventpoll_epi and eventpoll_pwq slabs. A buggy or malicious
> application can consume a significant amount of system memory by
> triggering such allocations. Indeed we have seen in production
> where a buggy application was leaking the epoll references and
> causing a burst of eventpoll_epi and eventpoll_pwq slab
> allocations. This patch opt-in the charging of eventpoll_epi
> and eventpoll_pwq slabs.

I am not objecting to the patch, I would just like to understand the
runaway case. ep_insert seems to limit the maximum number of watches to
max_user_watches, which should be ~4% of lowmem if I am following the
code properly. pwq_cache should be bound by the number of watches as
well, or am I misunderstanding the code?

> Signed-off-by: Shakeel Butt <shakeelb@google.com>
> ---
>  fs/eventpoll.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/fs/eventpoll.c b/fs/eventpoll.c
> index 2fabd19cdeea..a45360444895 100644
> --- a/fs/eventpoll.c
> +++ b/fs/eventpoll.c
> @@ -2329,11 +2329,11 @@ static int __init eventpoll_init(void)
>
>  	/* Allocates slab cache used to allocate "struct epitem" items */
>  	epi_cache = kmem_cache_create("eventpoll_epi", sizeof(struct epitem),
> -			0, SLAB_HWCACHE_ALIGN | SLAB_PANIC, NULL);
> +			0, SLAB_HWCACHE_ALIGN|SLAB_PANIC|SLAB_ACCOUNT, NULL);
>
>  	/* Allocates slab cache used to allocate "struct eppoll_entry" */
>  	pwq_cache = kmem_cache_create("eventpoll_pwq",
> -			sizeof(struct eppoll_entry), 0, SLAB_PANIC, NULL);
> +			sizeof(struct eppoll_entry), 0, SLAB_PANIC|SLAB_ACCOUNT, NULL);
>
>  	return 0;
>  }
> --
> 2.14.2.822.g60be5d43e6-goog
> I am not objecting to the patch I would just like to understand the
> runaway case. ep_insert seems to limit the maximum number of watches to
> max_user_watches which should be ~4% of lowmem if I am following the
> code properly. pwq_cache should be bound by the number of watches as
> well, or am I misunderstanding the code?

You are absolutely right that there is a per-user limit (~4% of total
memory if no highmem) on these caches. I think it is too generous,
particularly in the scenario where jobs of multiple users are running
on the system and the administrator is reducing cost by overcommitting
the memory. This is unaccounted kernel memory and will not be
considered by the oom-killer. I think by accounting it to kmemcg, for
systems with kmem accounting enabled, we can provide better isolation
between jobs of different users.
On Wed 04-10-17 12:33:14, Shakeel Butt wrote:
> > I am not objecting to the patch I would just like to understand the
> > runaway case. ep_insert seems to limit the maximum number of watches to
> > max_user_watches which should be ~4% of lowmem if I am following the
> > code properly. pwq_cache should be bound by the number of watches as
> > well, or am I misunderstanding the code?
>
> You are absolutely right that there is a per-user limit (~4% of total
> memory if no highmem) on these caches. I think it is too generous
> particularly in the scenario where jobs of multiple users are running
> on the system and the administrator is reducing cost by overcomitting
> the memory. This is unaccounted kernel memory and will not be
> considered by the oom-killer. I think by accounting it to kmemcg, for
> systems with kmem accounting enabled, we can provide better isolation
> between jobs of different users.

Thanks for the clarification. For some reason I didn't figure that the
limit is per user, even though the name suggests so.
On Thu 05-10-17 10:21:18, Michal Hocko wrote:
> On Wed 04-10-17 12:33:14, Shakeel Butt wrote:
> > > I am not objecting to the patch I would just like to understand the
> > > runaway case. ep_insert seems to limit the maximum number of watches to
> > > max_user_watches which should be ~4% of lowmem if I am following the
> > > code properly. pwq_cache should be bound by the number of watches as
> > > well, or am I misunderstanding the code?
> >
> > You are absolutely right that there is a per-user limit (~4% of total
> > memory if no highmem) on these caches. I think it is too generous
> > particularly in the scenario where jobs of multiple users are running
> > on the system and the administrator is reducing cost by overcomitting
> > the memory. This is unaccounted kernel memory and will not be
> > considered by the oom-killer. I think by accounting it to kmemcg, for
> > systems with kmem accounting enabled, we can provide better isolation
> > between jobs of different users.
>
> Thanks for the clarification. For some reason I didn't figure that the
> limit is per user, even though the name suggests so.

Completely forgot to add

Acked-by: Michal Hocko <mhocko@suse.com>
diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 2fabd19cdeea..a45360444895 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -2329,11 +2329,11 @@ static int __init eventpoll_init(void)
 
 	/* Allocates slab cache used to allocate "struct epitem" items */
 	epi_cache = kmem_cache_create("eventpoll_epi", sizeof(struct epitem),
-			0, SLAB_HWCACHE_ALIGN | SLAB_PANIC, NULL);
+			0, SLAB_HWCACHE_ALIGN|SLAB_PANIC|SLAB_ACCOUNT, NULL);
 
 	/* Allocates slab cache used to allocate "struct eppoll_entry" */
 	pwq_cache = kmem_cache_create("eventpoll_pwq",
-			sizeof(struct eppoll_entry), 0, SLAB_PANIC, NULL);
+			sizeof(struct eppoll_entry), 0, SLAB_PANIC|SLAB_ACCOUNT, NULL);
 
 	return 0;
 }
The user space application can directly trigger the allocations
from eventpoll_epi and eventpoll_pwq slabs. A buggy or malicious
application can consume a significant amount of system memory by
triggering such allocations. Indeed we have seen in production
where a buggy application was leaking the epoll references and
causing a burst of eventpoll_epi and eventpoll_pwq slab
allocations. This patch opts the eventpoll_epi and eventpoll_pwq
slabs in to kmem charging (SLAB_ACCOUNT).

Signed-off-by: Shakeel Butt <shakeelb@google.com>
---
 fs/eventpoll.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)