[7/9] ptrace: forbid ptrace checks against current_cred() from VFS context
diff mbox

Message ID 1474211117-16674-8-git-send-email-jann@thejh.net
State New
Headers show

Commit Message

Jann Horn Sept. 18, 2016, 3:05 p.m. UTC
This ensures that VFS implementations don't call ptrace_may_access() from
VFS read or write handlers. In order for file descriptor passing to have
its intended security properties, VFS read/write handlers must not do any
kind of privilege checking.

Signed-off-by: Jann Horn <jann@thejh.net>
---
 fs/aio.c              |  2 ++
 fs/read_write.c       | 12 ++++++++++++
 fs/splice.c           | 12 ++++++++++--
 include/linux/sched.h |  4 ++++
 kernel/ptrace.c       |  7 +++++++
 5 files changed, 35 insertions(+), 2 deletions(-)

Comments

Ben Hutchings Sept. 18, 2016, 6:38 p.m. UTC | #1
On Sun, Sep 18, 2016 at 05:05:15PM +0200, Jann Horn wrote:
> This ensures that VFS implementations don't call ptrace_may_access() from
> VFS read or write handlers. In order for file descriptor passing to have
> its intended security properties, VFS read/write handlers must not do any
> kind of privilege checking.
[...]
> --- a/kernel/ptrace.c
> +++ b/kernel/ptrace.c
> @@ -302,6 +302,13 @@ ok:
>  bool ptrace_may_access(struct task_struct *task, unsigned int mode)
>  {
>  	int err;
> +
> +	/* If you have to check for ptrace access from a VFS method, use
> +	 * ptrace_may_access_noncurrent() instead.
> +	 */
> +	if (WARN_ON(current->in_unprivileged_vfs != 0))

Shouldn't this be WARN_ON_ONCE(), so that any such bug can't e used
to spam the log?

Ben.

> +		return false;
> +
>  	task_lock(task);
>  	err = __ptrace_may_access(task, mode, current_cred(),
>  				  &current->self_privunit_id);
Jann Horn Sept. 18, 2016, 6:40 p.m. UTC | #2
On Sun, Sep 18, 2016 at 07:38:48PM +0100, Ben Hutchings wrote:
> On Sun, Sep 18, 2016 at 05:05:15PM +0200, Jann Horn wrote:
> > This ensures that VFS implementations don't call ptrace_may_access() from
> > VFS read or write handlers. In order for file descriptor passing to have
> > its intended security properties, VFS read/write handlers must not do any
> > kind of privilege checking.
> [...]
> > --- a/kernel/ptrace.c
> > +++ b/kernel/ptrace.c
> > @@ -302,6 +302,13 @@ ok:
> >  bool ptrace_may_access(struct task_struct *task, unsigned int mode)
> >  {
> >  	int err;
> > +
> > +	/* If you have to check for ptrace access from a VFS method, use
> > +	 * ptrace_may_access_noncurrent() instead.
> > +	 */
> > +	if (WARN_ON(current->in_unprivileged_vfs != 0))
> 
> Shouldn't this be WARN_ON_ONCE(), so that any such bug can't e used
> to spam the log?

Hm, makes sense. I'll change it in v2.
Andy Lutomirski Sept. 18, 2016, 7:57 p.m. UTC | #3
On Sep 18, 2016 5:05 AM, "Jann Horn" <jann@thejh.net> wrote:
>
> This ensures that VFS implementations don't call ptrace_may_access() from
> VFS read or write handlers. In order for file descriptor passing to have
> its intended security properties, VFS read/write handlers must not do any
> kind of privilege checking.
>

Ooh, nifty!  Can you warn about capable() too?

Warning about all access to current->cred could be fun.  I expect we
have zillions of these bugs.  Think keys, netlink, proc, etc.
--
To unsubscribe from this list: send the line "unsubscribe linux-security-module" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Linus Torvalds Sept. 18, 2016, 8:18 p.m. UTC | #4
On Sun, Sep 18, 2016 at 8:05 AM, Jann Horn <jann@thejh.net> wrote:
> This ensures that VFS implementations don't call ptrace_may_access() from
> VFS read or write handlers. In order for file descriptor passing to have
> its intended security properties, VFS read/write handlers must not do any
> kind of privilege checking.

Quite frankly, this smells like it should be a static check, not some
kind of runtime one. Or if runtime, it should be abstracted out so
that you can do an occasional "let's run a checking pass" rather than
enable it unconditionally and universally.

It's just too specialized. Soon you'll want to do other random context
checking, and we can't just keep adding those kinds of ad-hoc things
without it becoming a maintenance nightmare. I can well imagine
somebody ending up writing some stupid patch to take that
"in_unprivileged_vfs" thing into account for some semantics, and then
we're *really* screwed. So there are many reasons to make sure this is
*not* something that people actually expect to always be there.

               Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-security-module" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jann Horn Sept. 18, 2016, 8:38 p.m. UTC | #5
On Sun, Sep 18, 2016 at 12:57:57PM -0700, Andy Lutomirski wrote:
> On Sep 18, 2016 5:05 AM, "Jann Horn" <jann@thejh.net> wrote:
> >
> > This ensures that VFS implementations don't call ptrace_may_access() from
> > VFS read or write handlers. In order for file descriptor passing to have
> > its intended security properties, VFS read/write handlers must not do any
> > kind of privilege checking.
> >
> 
> Ooh, nifty!  Can you warn about capable() too?
> 
> Warning about all access to current->cred could be fun.  I expect we
> have zillions of these bugs.  Think keys, netlink, proc, etc.

True, that would probably spam dmesg quite a bit. %pK under printk() is
pretty nonsensical (heh, maybe we even have code doing capable() checks
in IRQ context via printk()), then there are all those broken capable()
checks for root-only sysctls and so on.

For capable(), I've been thinking about adopting Linus' old suggestion of
simply overriding the creds on VFS read/write entry. If we make
current->cred non-refcounted and task-private (no RCU-safe pointer
assignment), it's going to basically be two extra pointer reads and writes
per read/write call - even for such a hot code path, that should be fine
IMO. (And if we don't want to do this for all read/write calls, we could
at least do it for sysctls, that would already help quite a bit.)
Jann Horn Sept. 18, 2016, 8:52 p.m. UTC | #6
On Sun, Sep 18, 2016 at 01:18:48PM -0700, Linus Torvalds wrote:
> On Sun, Sep 18, 2016 at 8:05 AM, Jann Horn <jann@thejh.net> wrote:
> > This ensures that VFS implementations don't call ptrace_may_access() from
> > VFS read or write handlers. In order for file descriptor passing to have
> > its intended security properties, VFS read/write handlers must not do any
> > kind of privilege checking.
> 
> Quite frankly, this smells like it should be a static check, not some
> kind of runtime one. Or if runtime, it should be abstracted out so
> that you can do an occasional "let's run a checking pass" rather than
> enable it unconditionally and universally.

Hm, fair point.

I guess this could be implemented in eBPF or systemtap?

Then for now, I guess I'll remove this patch from the series - and maybe
I'll think about writing some external checker with eBPF kprobes or so.


> It's just too specialized. Soon you'll want to do other random context
> checking, and we can't just keep adding those kinds of ad-hoc things
> without it becoming a maintenance nightmare. I can well imagine
> somebody ending up writing some stupid patch to take that
> "in_unprivileged_vfs" thing into account for some semantics, and then
> we're *really* screwed. So there are many reasons to make sure this is
> *not* something that people actually expect to always be there.

Oh, yuck. Yes, those are good points.

Patch
diff mbox

diff --git a/fs/aio.c b/fs/aio.c
index fb8e45b..c941885 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1454,9 +1454,11 @@  rw_common:
 
 		if (rw == WRITE)
 			file_start_write(file);
+		current->in_unprivileged_vfs++;
 
 		ret = iter_op(req, &iter);
 
+		current->in_unprivileged_vfs--;
 		if (rw == WRITE)
 			file_end_write(file);
 		kfree(iovec);
diff --git a/fs/read_write.c b/fs/read_write.c
index 66215a7..deddb93 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -472,7 +472,9 @@  ssize_t vfs_read(struct file *file, char __user *buf, size_t count, loff_t *pos)
 	if (!ret) {
 		if (count > MAX_RW_COUNT)
 			count =  MAX_RW_COUNT;
+		current->in_unprivileged_vfs++;
 		ret = __vfs_read(file, buf, count, pos);
+		current->in_unprivileged_vfs--;
 		if (ret > 0) {
 			fsnotify_access(file);
 			add_rchar(current, ret);
@@ -557,7 +559,9 @@  ssize_t vfs_write(struct file *file, const char __user *buf, size_t count, loff_
 		if (count > MAX_RW_COUNT)
 			count =  MAX_RW_COUNT;
 		file_start_write(file);
+		current->in_unprivileged_vfs++;
 		ret = __vfs_write(file, buf, count, pos);
+		current->in_unprivileged_vfs--;
 		if (ret > 0) {
 			fsnotify_modify(file);
 			add_wchar(current, ret);
@@ -838,12 +842,14 @@  static ssize_t do_readv_writev(int type, struct file *file,
 		iter_fn = file->f_op->write_iter;
 		file_start_write(file);
 	}
+	current->in_unprivileged_vfs++;
 
 	if (iter_fn)
 		ret = do_iter_readv_writev(file, &iter, pos, iter_fn, flags);
 	else
 		ret = do_loop_readv_writev(file, &iter, pos, fn, flags);
 
+	current->in_unprivileged_vfs--;
 	if (type != READ)
 		file_end_write(file);
 
@@ -1063,12 +1069,14 @@  static ssize_t compat_do_readv_writev(int type, struct file *file,
 		iter_fn = file->f_op->write_iter;
 		file_start_write(file);
 	}
+	current->in_unprivileged_vfs++;
 
 	if (iter_fn)
 		ret = do_iter_readv_writev(file, &iter, pos, iter_fn, flags);
 	else
 		ret = do_loop_readv_writev(file, &iter, pos, fn, flags);
 
+	current->in_unprivileged_vfs--;
 	if (type != READ)
 		file_end_write(file);
 
@@ -1369,7 +1377,9 @@  static ssize_t do_sendfile(int out_fd, int in_fd, loff_t *ppos,
 		fl = SPLICE_F_NONBLOCK;
 #endif
 	file_start_write(out.file);
+	current->in_unprivileged_vfs++;
 	retval = do_splice_direct(in.file, &pos, out.file, &out_pos, count, fl);
+	current->in_unprivileged_vfs--;
 	file_end_write(out.file);
 
 	if (retval > 0) {
@@ -1512,6 +1522,7 @@  ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
 	ret = mnt_want_write_file(file_out);
 	if (ret)
 		return ret;
+	current->in_unprivileged_vfs++;
 
 	ret = -EOPNOTSUPP;
 	if (file_out->f_op->copy_file_range)
@@ -1521,6 +1532,7 @@  ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
 		ret = do_splice_direct(file_in, &pos_in, file_out, &pos_out,
 				len > MAX_RW_COUNT ? MAX_RW_COUNT : len, 0);
 
+	current->in_unprivileged_vfs--;
 	if (ret > 0) {
 		fsnotify_access(file_in);
 		add_rchar(current, ret);
diff --git a/fs/splice.c b/fs/splice.c
index dd9bf7e..5fcad56 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -1134,7 +1134,7 @@  static long do_splice_to(struct file *in, loff_t *ppos,
 {
 	ssize_t (*splice_read)(struct file *, loff_t *,
 			       struct pipe_inode_info *, size_t, unsigned int);
-	int ret;
+	long ret;
 
 	if (unlikely(!(in->f_mode & FMODE_READ)))
 		return -EBADF;
@@ -1151,7 +1151,11 @@  static long do_splice_to(struct file *in, loff_t *ppos,
 	else
 		splice_read = default_file_splice_read;
 
-	return splice_read(in, ppos, pipe, len, flags);
+	current->in_unprivileged_vfs++;
+	ret = splice_read(in, ppos, pipe, len, flags);
+	current->in_unprivileged_vfs--;
+
+	return ret;
 }
 
 /**
@@ -1334,7 +1338,9 @@  long do_splice_direct(struct file *in, loff_t *ppos, struct file *out,
 	if (unlikely(ret < 0))
 		return ret;
 
+	current->in_unprivileged_vfs++;
 	ret = splice_direct_to_actor(in, &sd, direct_splice_actor);
+	current->in_unprivileged_vfs--;
 	if (ret > 0)
 		*ppos = sd.pos;
 
@@ -1401,7 +1407,9 @@  static long do_splice(struct file *in, loff_t __user *off_in,
 			return ret;
 
 		file_start_write(out);
+		current->in_unprivileged_vfs++;
 		ret = do_splice_from(ipipe, out, &offset, len, flags);
+		current->in_unprivileged_vfs--;
 		file_end_write(out);
 
 		if (!off_out)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index e4bf894..6094720 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1473,6 +1473,10 @@  struct task_struct {
 	atomic_t usage;
 	unsigned int flags;	/* per process flags, defined below */
 	unsigned int ptrace;
+	/* depth of VFS read/write; non-zero values let certain privilege checks
+	 * fail with a warning
+	 */
+	unsigned int in_unprivileged_vfs;
 
 #ifdef CONFIG_SMP
 	struct llist_node wake_entry;
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index b311ca5..d839919 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -302,6 +302,13 @@  ok:
 bool ptrace_may_access(struct task_struct *task, unsigned int mode)
 {
 	int err;
+
+	/* If you have to check for ptrace access from a VFS method, use
+	 * ptrace_may_access_noncurrent() instead.
+	 */
+	if (WARN_ON(current->in_unprivileged_vfs != 0))
+		return false;
+
 	task_lock(task);
 	err = __ptrace_may_access(task, mode, current_cred(),
 				  &current->self_privunit_id);