diff mbox series

[142/192] procfs: allow reading fdinfo with PTRACE_MODE_READ

Message ID 20210701015444.ZOZaFPX0b%akpm@linux-foundation.org (mailing list archive)
State New
Headers show
Series [001/192] mm: memory_hotplug: factor out bootmem core functions to bootmem_info.c | expand

Commit Message

Andrew Morton July 1, 2021, 1:54 a.m. UTC
From: Kalesh Singh <kaleshsingh@google.com>
Subject: procfs: allow reading fdinfo with PTRACE_MODE_READ

Android captures per-process system memory state when certain low memory
events (e.g a foreground app kill) occur, to identify potential memory
hoggers.  In order to measure how much memory a process actually consumes,
it is necessary to include the DMA buffer sizes for that process in the
memory accounting.  Since the handle to DMA buffers are raw FDs, it is
important to be able to identify which processes have FD references to a
DMA buffer.

Currently, DMA buffer FDs can be accounted using /proc/<pid>/fd/* and
/proc/<pid>/fdinfo -- both are only readable by the process owner, as
follows:

  1. Do a readlink on each FD.
  2. If the target path begins with "/dmabuf", then the FD is a dmabuf FD.
  3. stat the file to get the dmabuf inode number.
  4. Read/ proc/<pid>/fdinfo/<fd>, to get the DMA buffer size.

Accessing other processes' fdinfo requires root privileges.  This limits
the use of the interface to debugging environments and is not suitable for
production builds.  Granting root privileges even to a system process
increases the attack surface and is highly undesirable.

Since fdinfo doesn't permit reading process memory and manipulating
process state, allow accessing fdinfo under PTRACE_MODE_READ_FSCRED.

Link: https://lkml.kernel.org/r/20210308170651.919148-1-kaleshsingh@google.com
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Suggested-by: Jann Horn <jannh@google.com>
Acked-by: Christian König <christian.koenig@amd.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Alexey Gladkov <gladkov.alexey@gmail.com>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Bernd Edlinger <bernd.edlinger@hotmail.de>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Hridya Valsaraju <hridya@google.com>
Cc: James Morris <jamorris@linux.microsoft.com>
Cc: Jeff Vander Stoep <jeffv@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kees Cook <keescook@chromium.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Szabolcs Nagy <szabolcs.nagy@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 fs/proc/base.c |    4 ++--
 fs/proc/fd.c   |   15 ++++++++++++++-
 2 files changed, 16 insertions(+), 3 deletions(-)

Comments

Christian Brauner July 2, 2021, 2:54 p.m. UTC | #1
On Wed, Jun 30, 2021 at 06:54:44PM -0700, Andrew Morton wrote:
> From: Kalesh Singh <kaleshsingh@google.com>
> Subject: procfs: allow reading fdinfo with PTRACE_MODE_READ
> 
> Android captures per-process system memory state when certain low memory
> events (e.g a foreground app kill) occur, to identify potential memory
> hoggers.  In order to measure how much memory a process actually consumes,
> it is necessary to include the DMA buffer sizes for that process in the
> memory accounting.  Since the handle to DMA buffers are raw FDs, it is
> important to be able to identify which processes have FD references to a
> DMA buffer.
> 
> Currently, DMA buffer FDs can be accounted using /proc/<pid>/fd/* and
> /proc/<pid>/fdinfo -- both are only readable by the process owner, as
> follows:
> 
>   1. Do a readlink on each FD.
>   2. If the target path begins with "/dmabuf", then the FD is a dmabuf FD.
>   3. stat the file to get the dmabuf inode number.
>   4. Read/ proc/<pid>/fdinfo/<fd>, to get the DMA buffer size.
> 
> Accessing other processes' fdinfo requires root privileges.  This limits
> the use of the interface to debugging environments and is not suitable for
> production builds.  Granting root privileges even to a system process
> increases the attack surface and is highly undesirable.
> 
> Since fdinfo doesn't permit reading process memory and manipulating
> process state, allow accessing fdinfo under PTRACE_MODE_READ_FSCRED.
> 
> Link: https://lkml.kernel.org/r/20210308170651.919148-1-kaleshsingh@google.com
> Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
> Suggested-by: Jann Horn <jannh@google.com>
> Acked-by: Christian König <christian.koenig@amd.com>
> Cc: Alexander Viro <viro@zeniv.linux.org.uk>
> Cc: Alexey Dobriyan <adobriyan@gmail.com>
> Cc: Alexey Gladkov <gladkov.alexey@gmail.com>
> Cc: Andrei Vagin <avagin@gmail.com>
> Cc: Bernd Edlinger <bernd.edlinger@hotmail.de>
> Cc: Christian Brauner <christian.brauner@ubuntu.com>
> Cc: Eric W. Biederman <ebiederm@xmission.com>
> Cc: Helge Deller <deller@gmx.de>
> Cc: Hridya Valsaraju <hridya@google.com>
> Cc: James Morris <jamorris@linux.microsoft.com>
> Cc: Jeff Vander Stoep <jeffv@google.com>
> Cc: Jonathan Corbet <corbet@lwn.net>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Matthew Wilcox <willy@infradead.org>
> Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Michel Lespinasse <walken@google.com>
> Cc: Minchan Kim <minchan@kernel.org>
> Cc: Randy Dunlap <rdunlap@infradead.org>
> Cc: Suren Baghdasaryan <surenb@google.com>
> Cc: Szabolcs Nagy <szabolcs.nagy@arm.com>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> ---

Rather useful (also for CRIU and others).
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Kees Cook July 2, 2021, 6:43 p.m. UTC | #2
On Wed, Jun 30, 2021 at 06:54:44PM -0700, Andrew Morton wrote:
> From: Kalesh Singh <kaleshsingh@google.com>
> Subject: procfs: allow reading fdinfo with PTRACE_MODE_READ
> 
> Android captures per-process system memory state when certain low memory
> events (e.g a foreground app kill) occur, to identify potential memory
> hoggers.  In order to measure how much memory a process actually consumes,
> it is necessary to include the DMA buffer sizes for that process in the
> memory accounting.  Since the handle to DMA buffers are raw FDs, it is
> important to be able to identify which processes have FD references to a
> DMA buffer.
> 
> Currently, DMA buffer FDs can be accounted using /proc/<pid>/fd/* and
> /proc/<pid>/fdinfo -- both are only readable by the process owner, as
> follows:
> 
>   1. Do a readlink on each FD.
>   2. If the target path begins with "/dmabuf", then the FD is a dmabuf FD.
>   3. stat the file to get the dmabuf inode number.
>   4. Read/ proc/<pid>/fdinfo/<fd>, to get the DMA buffer size.
> 
> Accessing other processes' fdinfo requires root privileges.  This limits
> the use of the interface to debugging environments and is not suitable for
> production builds.  Granting root privileges even to a system process
> increases the attack surface and is highly undesirable.
> 
> Since fdinfo doesn't permit reading process memory and manipulating
> process state, allow accessing fdinfo under PTRACE_MODE_READ_FSCRED.
> 
> Link: https://lkml.kernel.org/r/20210308170651.919148-1-kaleshsingh@google.com
> Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
> Suggested-by: Jann Horn <jannh@google.com>
> Acked-by: Christian König <christian.koenig@amd.com>
> Cc: Alexander Viro <viro@zeniv.linux.org.uk>
> Cc: Alexey Dobriyan <adobriyan@gmail.com>
> Cc: Alexey Gladkov <gladkov.alexey@gmail.com>
> Cc: Andrei Vagin <avagin@gmail.com>
> Cc: Bernd Edlinger <bernd.edlinger@hotmail.de>
> Cc: Christian Brauner <christian.brauner@ubuntu.com>
> Cc: Eric W. Biederman <ebiederm@xmission.com>
> Cc: Helge Deller <deller@gmx.de>
> Cc: Hridya Valsaraju <hridya@google.com>
> Cc: James Morris <jamorris@linux.microsoft.com>
> Cc: Jeff Vander Stoep <jeffv@google.com>
> Cc: Jonathan Corbet <corbet@lwn.net>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Matthew Wilcox <willy@infradead.org>
> Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Michel Lespinasse <walken@google.com>
> Cc: Minchan Kim <minchan@kernel.org>
> Cc: Randy Dunlap <rdunlap@infradead.org>
> Cc: Suren Baghdasaryan <surenb@google.com>
> Cc: Szabolcs Nagy <szabolcs.nagy@arm.com>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> ---
> 
>  fs/proc/base.c |    4 ++--
>  fs/proc/fd.c   |   15 ++++++++++++++-
>  2 files changed, 16 insertions(+), 3 deletions(-)
> 
> --- a/fs/proc/base.c~procfs-allow-reading-fdinfo-with-ptrace_mode_read
> +++ a/fs/proc/base.c
> @@ -3172,7 +3172,7 @@ static const struct pid_entry tgid_base_
>  	DIR("task",       S_IRUGO|S_IXUGO, proc_task_inode_operations, proc_task_operations),
>  	DIR("fd",         S_IRUSR|S_IXUSR, proc_fd_inode_operations, proc_fd_operations),
>  	DIR("map_files",  S_IRUSR|S_IXUSR, proc_map_files_inode_operations, proc_map_files_operations),
> -	DIR("fdinfo",     S_IRUSR|S_IXUSR, proc_fdinfo_inode_operations, proc_fdinfo_operations),
> +	DIR("fdinfo",     S_IRUGO|S_IXUGO, proc_fdinfo_inode_operations, proc_fdinfo_operations),
>  	DIR("ns",	  S_IRUSR|S_IXUGO, proc_ns_dir_inode_operations, proc_ns_dir_operations),
>  #ifdef CONFIG_NET
>  	DIR("net",        S_IRUGO|S_IXUGO, proc_net_inode_operations, proc_net_operations),
> @@ -3517,7 +3517,7 @@ static const struct inode_operations pro
>   */
>  static const struct pid_entry tid_base_stuff[] = {
>  	DIR("fd",        S_IRUSR|S_IXUSR, proc_fd_inode_operations, proc_fd_operations),
> -	DIR("fdinfo",    S_IRUSR|S_IXUSR, proc_fdinfo_inode_operations, proc_fdinfo_operations),
> +	DIR("fdinfo",    S_IRUGO|S_IXUGO, proc_fdinfo_inode_operations, proc_fdinfo_operations),
>  	DIR("ns",	 S_IRUSR|S_IXUGO, proc_ns_dir_inode_operations, proc_ns_dir_operations),
>  #ifdef CONFIG_NET
>  	DIR("net",        S_IRUGO|S_IXUGO, proc_net_inode_operations, proc_net_operations),
> --- a/fs/proc/fd.c~procfs-allow-reading-fdinfo-with-ptrace_mode_read
> +++ a/fs/proc/fd.c
> @@ -6,6 +6,7 @@
>  #include <linux/fdtable.h>
>  #include <linux/namei.h>
>  #include <linux/pid.h>
> +#include <linux/ptrace.h>
>  #include <linux/security.h>
>  #include <linux/file.h>
>  #include <linux/seq_file.h>
> @@ -72,6 +73,18 @@ out:
>  
>  static int seq_fdinfo_open(struct inode *inode, struct file *file)
>  {
> +	bool allowed = false;
> +	struct task_struct *task = get_proc_task(inode);
> +
> +	if (!task)
> +		return -ESRCH;
> +
> +	allowed = ptrace_may_access(task, PTRACE_MODE_READ_FSCREDS);
> +	put_task_struct(task);
> +
> +	if (!allowed)
> +		return -EACCES;

Uhm, this is only checked in open(), and never again? Is this safe in
the face of exec or pid re-use?

-Kees

> +
>  	return single_open(file, seq_show, inode);
>  }
>  
> @@ -308,7 +321,7 @@ static struct dentry *proc_fdinfo_instan
>  	struct proc_inode *ei;
>  	struct inode *inode;
>  
> -	inode = proc_pid_make_inode(dentry->d_sb, task, S_IFREG | S_IRUSR);
> +	inode = proc_pid_make_inode(dentry->d_sb, task, S_IFREG | S_IRUGO);
>  	if (!inode)
>  		return ERR_PTR(-ENOENT);
>  
> _
Linus Torvalds July 2, 2021, 7 p.m. UTC | #3
On Fri, Jul 2, 2021 at 11:43 AM Kees Cook <keescook@chromium.org> wrote:
>
> Uhm, this is only checked in open(), and never again? Is this safe in
> the face of exec or pid re-use?

Interesting question, but not really all that valid for this particular patch.

Why? Because we already only check for owner permissions on open, and
never again. So if we have fdinfo issues across a suid exec or pid
re-use, they are pre-existing..

But yes, it would probably be a good idea to think about readdir() on
that directory. If somebody reminds me after the merge window is over,
I'll come back to this, but if somebody else wants to think about it
before then, that would be great.

              Linus
Eric W. Biederman July 2, 2021, 8:40 p.m. UTC | #4
Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Fri, Jul 2, 2021 at 11:43 AM Kees Cook <keescook@chromium.org> wrote:
>>
>> Uhm, this is only checked in open(), and never again? Is this safe in
>> the face of exec or pid re-use?

Exec does not change the file descriptor table.

The open holds a reference to the proc inode.  The proc inode holds the
struct pid of the task and the file descriptor number.  References using
struct pid do not suffer from userspace pid rollover issues.

So the only issue I see is file descriptor reuse after an exec,
that changes the processes struct cred.

Assuming we care it would probably be worth a bug fix patch to check
something.


Eric
Kees Cook July 2, 2021, 11:31 p.m. UTC | #5
On Fri, Jul 02, 2021 at 03:40:49PM -0500, Eric W. Biederman wrote:
> Linus Torvalds <torvalds@linux-foundation.org> writes:
> 
> > On Fri, Jul 2, 2021 at 11:43 AM Kees Cook <keescook@chromium.org> wrote:
> >>
> >> Uhm, this is only checked in open(), and never again? Is this safe in
> >> the face of exec or pid re-use?
> 
> Exec does not change the file descriptor table.

Ah yeah, good point. I've been thinking too much about vmas.

> The open holds a reference to the proc inode.  The proc inode holds the
> struct pid of the task and the file descriptor number.  References using
> struct pid do not suffer from userspace pid rollover issues.

Okay, cool.

> So the only issue I see is file descriptor reuse after an exec,
> that changes the processes struct cred.

Right -- the info leak would be snooping on what a privileged process
was doing with a given fd? Similar stuff has been used to do typing
pattern analysis with login passwords, but that's a stretch here, I
think. Hmm.

> Assuming we care it would probably be worth a bug fix patch to check
> something.

Sounds good.
Linus Torvalds July 3, 2021, 12:15 a.m. UTC | #6
On Fri, Jul 2, 2021 at 4:31 PM Kees Cook <keescook@chromium.org> wrote:
>
> Right -- the info leak would be snooping on what a privileged process
> was doing with a given fd? Similar stuff has been used to do typing
> pattern analysis with login passwords, but that's a stretch here, I
> think. Hmm.

So I think you'd see the directory list, but generally that's just the
file descriptor numbers.

Which is information you shouldn't have access to, but it's probably
not very *interesting* information.

I think it would be worth fixing but possibly not a very high priority.

             Linus
Eric W. Biederman July 3, 2021, 9:43 p.m. UTC | #7
Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Fri, Jul 2, 2021 at 4:31 PM Kees Cook <keescook@chromium.org> wrote:
>>
>> Right -- the info leak would be snooping on what a privileged process
>> was doing with a given fd? Similar stuff has been used to do typing
>> pattern analysis with login passwords, but that's a stretch here, I
>> think. Hmm.
>
> So I think you'd see the directory list, but generally that's just the
> file descriptor numbers.
>
> Which is information you shouldn't have access to, but it's probably
> not very *interesting* information.
>
> I think it would be worth fixing but possibly not a very high
> priority.

It is not just the directory whose permission changed but the individual
files in that directory.

You can also see the position, flags, mnt_id, and soon inode number
of fdinfo files you open before a suid exec.

Knowing what file someone is reading on a particular file descriptor
number and how far they are in reading that file sounds like a side
channel someone can do something with.

Eric
diff mbox series

Patch

--- a/fs/proc/base.c~procfs-allow-reading-fdinfo-with-ptrace_mode_read
+++ a/fs/proc/base.c
@@ -3172,7 +3172,7 @@  static const struct pid_entry tgid_base_
 	DIR("task",       S_IRUGO|S_IXUGO, proc_task_inode_operations, proc_task_operations),
 	DIR("fd",         S_IRUSR|S_IXUSR, proc_fd_inode_operations, proc_fd_operations),
 	DIR("map_files",  S_IRUSR|S_IXUSR, proc_map_files_inode_operations, proc_map_files_operations),
-	DIR("fdinfo",     S_IRUSR|S_IXUSR, proc_fdinfo_inode_operations, proc_fdinfo_operations),
+	DIR("fdinfo",     S_IRUGO|S_IXUGO, proc_fdinfo_inode_operations, proc_fdinfo_operations),
 	DIR("ns",	  S_IRUSR|S_IXUGO, proc_ns_dir_inode_operations, proc_ns_dir_operations),
 #ifdef CONFIG_NET
 	DIR("net",        S_IRUGO|S_IXUGO, proc_net_inode_operations, proc_net_operations),
@@ -3517,7 +3517,7 @@  static const struct inode_operations pro
  */
 static const struct pid_entry tid_base_stuff[] = {
 	DIR("fd",        S_IRUSR|S_IXUSR, proc_fd_inode_operations, proc_fd_operations),
-	DIR("fdinfo",    S_IRUSR|S_IXUSR, proc_fdinfo_inode_operations, proc_fdinfo_operations),
+	DIR("fdinfo",    S_IRUGO|S_IXUGO, proc_fdinfo_inode_operations, proc_fdinfo_operations),
 	DIR("ns",	 S_IRUSR|S_IXUGO, proc_ns_dir_inode_operations, proc_ns_dir_operations),
 #ifdef CONFIG_NET
 	DIR("net",        S_IRUGO|S_IXUGO, proc_net_inode_operations, proc_net_operations),
--- a/fs/proc/fd.c~procfs-allow-reading-fdinfo-with-ptrace_mode_read
+++ a/fs/proc/fd.c
@@ -6,6 +6,7 @@ 
 #include <linux/fdtable.h>
 #include <linux/namei.h>
 #include <linux/pid.h>
+#include <linux/ptrace.h>
 #include <linux/security.h>
 #include <linux/file.h>
 #include <linux/seq_file.h>
@@ -72,6 +73,18 @@  out:
 
 static int seq_fdinfo_open(struct inode *inode, struct file *file)
 {
+	bool allowed = false;
+	struct task_struct *task = get_proc_task(inode);
+
+	if (!task)
+		return -ESRCH;
+
+	allowed = ptrace_may_access(task, PTRACE_MODE_READ_FSCREDS);
+	put_task_struct(task);
+
+	if (!allowed)
+		return -EACCES;
+
 	return single_open(file, seq_show, inode);
 }
 
@@ -308,7 +321,7 @@  static struct dentry *proc_fdinfo_instan
 	struct proc_inode *ei;
 	struct inode *inode;
 
-	inode = proc_pid_make_inode(dentry->d_sb, task, S_IFREG | S_IRUSR);
+	inode = proc_pid_make_inode(dentry->d_sb, task, S_IFREG | S_IRUGO);
 	if (!inode)
 		return ERR_PTR(-ENOENT);