| Message ID | 201601111226.u0BCQ1gv031473@mail.home.local (mailing list archive) |
|---|---|
| State | New, archived |
On Mon, Jan 11, 2016 at 12:26:20PM +0000, Willy Tarreau wrote:
> This patch makes it possible to enforce a per-user limit above which
> new pipes will be limited to a single page, effectively limiting them
> to 4 kB each. This has the effect of protecting the system against
> memory abuse without hurting other users, and still allowing pipes to
> work correctly though with less data at once.
>
> The limit is controlled by the new sysctl user-max-pipe-pages, and may
> be disabled by setting it to zero. The default limit allows the default
> number of FDs per process (1024) to create pipes of the default size
> (64kB), thus reaching a limit of 64MB before starting to create only
> smaller pipes. With 256 processes limited to 1024 FDs each, this results
> in 1024*64kB + (256*1024 - 1024) * 4kB = 1084 MB of memory allocated for
> a user.

Regarding this, I was wondering if we shouldn't go a bit further and
provide two limits instead of one: a soft and a hard limit. The soft
limit would be the number of pages per user above which pipes are
limited to a single page (what is implemented in the current patch).
The hard limit would make any pipe creation attempt fail once reached.
This way it would be possible to enforce a strict limit without
limiting the number of processes or FDs too aggressively.

This could be done easily in alloc_pipe_info():

+		if (too_many_pipe_buffers_hard(user))
+			return NULL;
+
		if (too_many_pipe_buffers(user))
			pipe_bufs = 1;

I'm just having a hard time imagining acceptable names for the syscalls :-/

Willy
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Jan 11, 2016 at 01:37:47PM +0100, Willy Tarreau wrote:
> I'm just having a hard time imagining acceptable names for the syscalls :-/
s/syscalls/sysctls/
Willy Tarreau wrote:
> @@ -1066,7 +1094,8 @@ long pipe_fcntl(struct file *file, unsigned int cmd, unsigned long arg)
> 	if (!nr_pages)
> 		goto out;
>
> -	if (!capable(CAP_SYS_RESOURCE) && size > pipe_max_size) {
> +	if (!capable(CAP_SYS_RESOURCE) && !capable(CAP_SYS_ADMIN) &&
> +	    (size > pipe_max_size || too_many_pipe_buffers(pipe->user))) {
> 		ret = -EPERM;
> 		goto out;
> 	}

I think we should not check capable(CAP_SYS_ADMIN) for the size > pipe_max_size
case, for checking capable(CAP_SYS_ADMIN) needlessly generates audit logs and
also loosens the permission required for setting size > pipe_max_size.

Also, I think we should not check capable(CAP_SYS_ADMIN) unless
too_many_pipe_buffers(pipe->user) is true, for checking capable(CAP_SYS_ADMIN)
needlessly generates audit logs.

Since too_many_unix_fds() requires capable(CAP_SYS_RESOURCE) ||
capable(CAP_SYS_ADMIN), I think what we want is something like below?

	if (!capable(CAP_SYS_RESOURCE) && size > pipe_max_size) {
		ret = -EPERM;
		goto out;
	} else if (too_many_pipe_buffers(pipe->user) &&
		   !capable(CAP_SYS_RESOURCE) && !capable(CAP_SYS_ADMIN)) {
		ret = -EPERM;
		goto out;
	}
On Tue, Jan 12, 2016 at 01:19:00AM +0900, Tetsuo Handa wrote:
> Willy Tarreau wrote:
> > @@ -1066,7 +1094,8 @@ long pipe_fcntl(struct file *file, unsigned int cmd, unsigned long arg)
> > 	if (!nr_pages)
> > 		goto out;
> >
> > -	if (!capable(CAP_SYS_RESOURCE) && size > pipe_max_size) {
> > +	if (!capable(CAP_SYS_RESOURCE) && !capable(CAP_SYS_ADMIN) &&
> > +	    (size > pipe_max_size || too_many_pipe_buffers(pipe->user))) {
> > 		ret = -EPERM;
> > 		goto out;
> > 	}
>
> I think we should not check capable(CAP_SYS_ADMIN) for the size > pipe_max_size
> case, for checking capable(CAP_SYS_ADMIN) needlessly generates audit logs and
> also loosens the permission required for setting size > pipe_max_size.
>
> Also, I think we should not check capable(CAP_SYS_ADMIN) unless
> too_many_pipe_buffers(pipe->user) is true, for checking capable(CAP_SYS_ADMIN)
> needlessly generates audit logs.
>
> Since too_many_unix_fds() requires capable(CAP_SYS_RESOURCE) ||
> capable(CAP_SYS_ADMIN), I think what we want is something like below?
>
> 	if (!capable(CAP_SYS_RESOURCE) && size > pipe_max_size) {
> 		ret = -EPERM;
> 		goto out;
> 	} else if (too_many_pipe_buffers(pipe->user) &&
> 		   !capable(CAP_SYS_RESOURCE) && !capable(CAP_SYS_ADMIN)) {
> 		ret = -EPERM;
> 		goto out;
> 	}

OK that works for me. Do you have an opinion regarding my other proposal
of soft vs hard limit ?

Thanks,
Willy
Willy Tarreau wrote:
> OK that works for me. Do you have an opinion regarding my other proposal of
> soft vs hard limit ?

No. Maybe further restriction should be controlled by kmem cgroup...
diff --git a/Documentation/sysctl/fs.txt b/Documentation/sysctl/fs.txt
index 88152f2..a9751b1 100644
--- a/Documentation/sysctl/fs.txt
+++ b/Documentation/sysctl/fs.txt
@@ -37,6 +37,7 @@ Currently, these files are in /proc/sys/fs:
 - suid_dumpable
 - super-max
 - super-nr
+- user-max-pipe-pages

 ==============================================================

@@ -234,6 +235,15 @@ allows you to.

 ==============================================================

+user-max-pipe-pages:
+
+Maximum total number of pages a non-privileged user may allocate for pipes.
+When this limit is reached, new pipes will be limited to a single page in size
+in order to limit total memory usage. The default value allows to allocate up
+to 1024 pipes at their default size. When set to 0, no limit is enforced.
+
+==============================================================
+
 aio-nr & aio-max-nr:

 aio-nr shows the current system-wide number of asynchronous io
diff --git a/fs/pipe.c b/fs/pipe.c
index 42cf8dd..cc4874b 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -38,6 +38,9 @@ unsigned int pipe_max_size = 1048576;
  */
 unsigned int pipe_min_size = PAGE_SIZE;

+/* Maximum allocatable pages per user, matches default values */
+unsigned long user_max_pipe_pages = PIPE_DEF_BUFFERS * INR_OPEN_CUR;
+
 /*
  * We use a start+len construction, which provides full use of the
  * allocated memory.
@@ -583,20 +587,41 @@ pipe_fasync(int fd, struct file *filp, int on)
 	return retval;
 }

+static void account_pipe_buffers(struct pipe_inode_info *pipe,
+                                 unsigned long old, unsigned long new)
+{
+	atomic_long_add(new - old, &pipe->user->pipe_bufs);
+}
+
+static bool too_many_pipe_buffers(struct user_struct *user)
+{
+	return user_max_pipe_pages &&
+	       atomic_long_read(&user->pipe_bufs) >= user_max_pipe_pages;
+}
+
 struct pipe_inode_info *alloc_pipe_info(void)
 {
 	struct pipe_inode_info *pipe;

 	pipe = kzalloc(sizeof(struct pipe_inode_info), GFP_KERNEL);
 	if (pipe) {
-		pipe->bufs = kzalloc(sizeof(struct pipe_buffer) * PIPE_DEF_BUFFERS, GFP_KERNEL);
+		unsigned long pipe_bufs = PIPE_DEF_BUFFERS;
+		struct user_struct *user = get_current_user();
+
+		if (too_many_pipe_buffers(user))
+			pipe_bufs = 1;
+
+		pipe->bufs = kzalloc(sizeof(struct pipe_buffer) * pipe_bufs, GFP_KERNEL);
 		if (pipe->bufs) {
 			init_waitqueue_head(&pipe->wait);
 			pipe->r_counter = pipe->w_counter = 1;
-			pipe->buffers = PIPE_DEF_BUFFERS;
+			pipe->buffers = pipe_bufs;
+			pipe->user = user;
+			account_pipe_buffers(pipe, 0, pipe_bufs);
 			mutex_init(&pipe->mutex);
 			return pipe;
 		}
+		free_uid(user);
 		kfree(pipe);
 	}

@@ -607,6 +632,8 @@ void free_pipe_info(struct pipe_inode_info *pipe)
 {
 	int i;

+	account_pipe_buffers(pipe, pipe->buffers, 0);
+	free_uid(pipe->user);
 	for (i = 0; i < pipe->buffers; i++) {
 		struct pipe_buffer *buf = pipe->bufs + i;
 		if (buf->ops)
@@ -998,6 +1025,7 @@ static long pipe_set_size(struct pipe_inode_info *pipe, unsigned long nr_pages)
 		memcpy(bufs + head, pipe->bufs, tail * sizeof(struct pipe_buffer));
 	}

+	account_pipe_buffers(pipe, pipe->buffers, nr_pages);
 	pipe->curbuf = 0;
 	kfree(pipe->bufs);
 	pipe->bufs = bufs;
@@ -1066,7 +1094,8 @@ long pipe_fcntl(struct file *file, unsigned int cmd, unsigned long arg)
 	if (!nr_pages)
 		goto out;

-	if (!capable(CAP_SYS_RESOURCE) && size > pipe_max_size) {
+	if (!capable(CAP_SYS_RESOURCE) && !capable(CAP_SYS_ADMIN) &&
+	    (size > pipe_max_size || too_many_pipe_buffers(pipe->user))) {
 		ret = -EPERM;
 		goto out;
 	}
diff --git a/include/linux/pipe_fs_i.h b/include/linux/pipe_fs_i.h
index eb8b8ac..a4cee7c 100644
--- a/include/linux/pipe_fs_i.h
+++ b/include/linux/pipe_fs_i.h
@@ -57,6 +57,7 @@ struct pipe_inode_info {
 	struct fasync_struct *fasync_readers;
 	struct fasync_struct *fasync_writers;
 	struct pipe_buffer *bufs;
+	struct user_struct *user;
 };

 /*
@@ -123,6 +124,7 @@ void pipe_unlock(struct pipe_inode_info *);
 void pipe_double_lock(struct pipe_inode_info *, struct pipe_inode_info *);

 extern unsigned int pipe_max_size, pipe_min_size;
+extern unsigned long user_max_pipe_pages;
 int pipe_proc_fn(struct ctl_table *, int, void __user *, size_t *, loff_t *);

diff --git a/include/linux/sched.h b/include/linux/sched.h
index fbf25f1..93643762 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -831,6 +831,7 @@ struct user_struct {
 #endif
 	unsigned long locked_shm; /* How many pages of mlocked shm ? */
 	unsigned long unix_inflight;	/* How many files in flight in unix sockets */
+	atomic_long_t pipe_bufs;	/* how many pages are allocated in pipe buffers */

 #ifdef CONFIG_KEYS
 	struct key *uid_keyring;	/* UID specific keyring */
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index dc6858d..a288edf 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1735,6 +1735,13 @@ static struct ctl_table fs_table[] = {
 		.proc_handler	= &pipe_proc_fn,
 		.extra1		= &pipe_min_size,
 	},
+	{
+		.procname	= "user-max-pipe-pages",
+		.data		= &user_max_pipe_pages,
+		.maxlen		= sizeof(user_max_pipe_pages),
+		.mode		= 0644,
+		.proc_handler	= proc_doulongvec_minmax,
+	},
 	{ }
 };
On not-so-small systems, it is possible for a single process to cause an
OOM condition by filling large pipes with data that are never read. A
typical process filling 4000 pipes with 1 MB of data will use 4 GB of
memory. On small systems it may be tricky to set the pipe max size to
prevent this from happening.

This patch makes it possible to enforce a per-user limit above which
new pipes will be limited to a single page, effectively limiting them
to 4 kB each. This has the effect of protecting the system against
memory abuse without hurting other users, and still allowing pipes to
work correctly though with less data at once.

The limit is controlled by the new sysctl user-max-pipe-pages, and may
be disabled by setting it to zero. The default limit allows the default
number of FDs per process (1024) to create pipes of the default size
(64kB), thus reaching a limit of 64MB before starting to create only
smaller pipes. With 256 processes limited to 1024 FDs each, this results
in 1024*64kB + (256*1024 - 1024) * 4kB = 1084 MB of memory allocated for
a user.

Reported-by: socketpair@gmail.com
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Mitigates: CVE-2013-4312 (Linux 2.0+)
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
v2: - added && !capable(CAP_SYS_ADMIN)
    - removed EXPORT_SYMBOL(user_max_pipe_pages)
    - added reported-by + mitigates, explain better example.

v1: I don't know if it's better that the limit is not enforced by
    default (easy to change), nor do I know if it's better to specify
    the limit in bytes instead of pages. Some settings use pages
    already so I don't think it's a big deal.
---
 Documentation/sysctl/fs.txt | 10 ++++++++++
 fs/pipe.c                   | 35 ++++++++++++++++++++++++++++++++---
 include/linux/pipe_fs_i.h   |  2 ++
 include/linux/sched.h       |  1 +
 kernel/sysctl.c             |  7 +++++++
 5 files changed, 52 insertions(+), 3 deletions(-)