diff mbox series

[1/1] mm/madvise: replace ptrace attach requirement for process_madvise

Message ID 20210108205857.1471269-1-surenb@google.com (mailing list archive)
State Superseded
Delegated to: Paul Moore
Headers show
Series [1/1] mm/madvise: replace ptrace attach requirement for process_madvise | expand

Commit Message

Suren Baghdasaryan Jan. 8, 2021, 8:58 p.m. UTC
process_madvise currently requires ptrace attach capability.
PTRACE_MODE_ATTACH gives one process complete control over another
process. It effectively removes the security boundary between the
two processes (in one direction). Granting ptrace attach capability
even to a system process is considered dangerous since it creates an
attack surface. This severely limits the usage of this API.
The operations process_madvise can perform do not affect the correctness
of the operation of the target process; they only affect where the data
is physically located (and therefore, how fast it can be accessed).
What we want is the ability for one process to influence another process
in order to optimize performance across the entire system while leaving
the security boundary intact.
Replace PTRACE_MODE_ATTACH with a combination of PTRACE_MODE_READ
and CAP_SYS_NICE. PTRACE_MODE_READ to prevent leaking ASLR metadata
and CAP_SYS_NICE for influencing process performance.

Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
 mm/madvise.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

Comments

Minchan Kim Jan. 8, 2021, 10:15 p.m. UTC | #1
On Fri, Jan 08, 2021 at 12:58:57PM -0800, Suren Baghdasaryan wrote:
> process_madvise currently requires ptrace attach capability.
> PTRACE_MODE_ATTACH gives one process complete control over another
> process. It effectively removes the security boundary between the
> two processes (in one direction). Granting ptrace attach capability
> even to a system process is considered dangerous since it creates an
> attack surface. This severely limits the usage of this API.
> The operations process_madvise can perform do not affect the correctness
> of the operation of the target process; they only affect where the data
> is physically located (and therefore, how fast it can be accessed).
> What we want is the ability for one process to influence another process
> in order to optimize performance across the entire system while leaving
> the security boundary intact.
> Replace PTRACE_MODE_ATTACH with a combination of PTRACE_MODE_READ
> and CAP_SYS_NICE. PTRACE_MODE_READ to prevent leaking ASLR metadata
> and CAP_SYS_NICE for influencing process performance.
> 
> Signed-off-by: Suren Baghdasaryan <surenb@google.com>

It sounds logical to me.
If security folks don't see any concern and fix below,

Acked-by: Minchan Kim <minchan@kernel.org>

> @@ -1197,12 +1197,22 @@ SYSCALL_DEFINE5(process_madvise, int, pidfd, const struct iovec __user *, vec,
>  		goto release_task;
>  	}
>  
> -	mm = mm_access(task, PTRACE_MODE_ATTACH_FSCREDS);
> +	/* Require PTRACE_MODE_READ to avoid leaking ASLR metadata. */
> +	mm = mm_access(task, PTRACE_MODE_READ_FSCREDS);
>  	if (IS_ERR_OR_NULL(mm)) {
>  		ret = IS_ERR(mm) ? PTR_ERR(mm) : -ESRCH;
>  		goto release_task;
>  	}
>  
> +	/*
> +	 * Require CAP_SYS_NICE for influencing process performance. Note that
> +	 * only non-destructive hints are currently supported.
> +	 */
> +	if (!capable(CAP_SYS_NICE)) {
> +		ret = -EPERM;
> +		goto release_task;

mmput?

> +	}
> +
>  	total_len = iov_iter_count(&iter);
>  
>  	while (iov_iter_count(&iter)) {
> -- 
> 2.30.0.284.gd98b1dd5eaa7-goog
>
Suren Baghdasaryan Jan. 8, 2021, 10:17 p.m. UTC | #2
On Fri, Jan 8, 2021 at 2:15 PM Minchan Kim <minchan@kernel.org> wrote:
>
> On Fri, Jan 08, 2021 at 12:58:57PM -0800, Suren Baghdasaryan wrote:
> > process_madvise currently requires ptrace attach capability.
> > PTRACE_MODE_ATTACH gives one process complete control over another
> > process. It effectively removes the security boundary between the
> > two processes (in one direction). Granting ptrace attach capability
> > even to a system process is considered dangerous since it creates an
> > attack surface. This severely limits the usage of this API.
> > The operations process_madvise can perform do not affect the correctness
> > of the operation of the target process; they only affect where the data
> > is physically located (and therefore, how fast it can be accessed).
> > What we want is the ability for one process to influence another process
> > in order to optimize performance across the entire system while leaving
> > the security boundary intact.
> > Replace PTRACE_MODE_ATTACH with a combination of PTRACE_MODE_READ
> > and CAP_SYS_NICE. PTRACE_MODE_READ to prevent leaking ASLR metadata
> > and CAP_SYS_NICE for influencing process performance.
> >
> > Signed-off-by: Suren Baghdasaryan <surenb@google.com>
>
> It sounds logical to me.
> If security folks don't see any concern and fix below,
>
> Acked-by: Minchan Kim <minchan@kernel.org>
>
> > @@ -1197,12 +1197,22 @@ SYSCALL_DEFINE5(process_madvise, int, pidfd, const struct iovec __user *, vec,
> >               goto release_task;
> >       }
> >
> > -     mm = mm_access(task, PTRACE_MODE_ATTACH_FSCREDS);
> > +     /* Require PTRACE_MODE_READ to avoid leaking ASLR metadata. */
> > +     mm = mm_access(task, PTRACE_MODE_READ_FSCREDS);
> >       if (IS_ERR_OR_NULL(mm)) {
> >               ret = IS_ERR(mm) ? PTR_ERR(mm) : -ESRCH;
> >               goto release_task;
> >       }
> >
> > +     /*
> > +      * Require CAP_SYS_NICE for influencing process performance. Note that
> > +      * only non-destructive hints are currently supported.
> > +      */
> > +     if (!capable(CAP_SYS_NICE)) {
> > +             ret = -EPERM;
> > +             goto release_task;
>
> mmput?

Ouch! Thanks for pointing it out! Will include in the next respin.

>
> > +     }
> > +
> >       total_len = iov_iter_count(&iter);
> >
> >       while (iov_iter_count(&iter)) {
> > --
> > 2.30.0.284.gd98b1dd5eaa7-goog
> >
>
> --
> To unsubscribe from this group and stop receiving emails from it, send an email to kernel-team+unsubscribe@android.com.
>
David Rientjes Jan. 9, 2021, 1:02 a.m. UTC | #3
On Fri, 8 Jan 2021, Suren Baghdasaryan wrote:

> > > @@ -1197,12 +1197,22 @@ SYSCALL_DEFINE5(process_madvise, int, pidfd, const struct iovec __user *, vec,
> > >               goto release_task;
> > >       }
> > >
> > > -     mm = mm_access(task, PTRACE_MODE_ATTACH_FSCREDS);
> > > +     /* Require PTRACE_MODE_READ to avoid leaking ASLR metadata. */
> > > +     mm = mm_access(task, PTRACE_MODE_READ_FSCREDS);
> > >       if (IS_ERR_OR_NULL(mm)) {
> > >               ret = IS_ERR(mm) ? PTR_ERR(mm) : -ESRCH;
> > >               goto release_task;
> > >       }
> > >
> > > +     /*
> > > +      * Require CAP_SYS_NICE for influencing process performance. Note that
> > > +      * only non-destructive hints are currently supported.
> > > +      */
> > > +     if (!capable(CAP_SYS_NICE)) {
> > > +             ret = -EPERM;
> > > +             goto release_task;
> >
> > mmput?
> 
> Ouch! Thanks for pointing it out! Will include in the next respin.
> 

With the fix, feel free to add:

	Acked-by: David Rientjes <rientjes@google.com>

Thanks Suren!
Suren Baghdasaryan Jan. 9, 2021, 2:19 a.m. UTC | #4
On Fri, Jan 8, 2021 at 5:02 PM David Rientjes <rientjes@google.com> wrote:
>
> On Fri, 8 Jan 2021, Suren Baghdasaryan wrote:
>
> > > > @@ -1197,12 +1197,22 @@ SYSCALL_DEFINE5(process_madvise, int, pidfd, const struct iovec __user *, vec,
> > > >               goto release_task;
> > > >       }
> > > >
> > > > -     mm = mm_access(task, PTRACE_MODE_ATTACH_FSCREDS);
> > > > +     /* Require PTRACE_MODE_READ to avoid leaking ASLR metadata. */
> > > > +     mm = mm_access(task, PTRACE_MODE_READ_FSCREDS);
> > > >       if (IS_ERR_OR_NULL(mm)) {
> > > >               ret = IS_ERR(mm) ? PTR_ERR(mm) : -ESRCH;
> > > >               goto release_task;
> > > >       }
> > > >
> > > > +     /*
> > > > +      * Require CAP_SYS_NICE for influencing process performance. Note that
> > > > +      * only non-destructive hints are currently supported.
> > > > +      */
> > > > +     if (!capable(CAP_SYS_NICE)) {
> > > > +             ret = -EPERM;
> > > > +             goto release_task;
> > >
> > > mmput?
> >
> > Ouch! Thanks for pointing it out! Will include in the next respin.
> >
>
> With the fix, feel free to add:
>
>         Acked-by: David Rientjes <rientjes@google.com>

Thanks! Will post a new version with the fix on Monday.

>
> Thanks Suren!
Florian Weimer Jan. 11, 2021, 10:20 a.m. UTC | #5
* Suren Baghdasaryan:

> diff --git a/mm/madvise.c b/mm/madvise.c
> index 6a660858784b..c2d600386902 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -1197,12 +1197,22 @@ SYSCALL_DEFINE5(process_madvise, int, pidfd, const struct iovec __user *, vec,
>  		goto release_task;
>  	}
>  
> -	mm = mm_access(task, PTRACE_MODE_ATTACH_FSCREDS);
> +	/* Require PTRACE_MODE_READ to avoid leaking ASLR metadata. */
> +	mm = mm_access(task, PTRACE_MODE_READ_FSCREDS);
>  	if (IS_ERR_OR_NULL(mm)) {
>  		ret = IS_ERR(mm) ? PTR_ERR(mm) : -ESRCH;
>  		goto release_task;
>  	}

Shouldn't this depend on the requested behavior?  Several operations
directly result in observable changes, and go beyond performance tuning.

Thanks,
Florian
Suren Baghdasaryan Jan. 11, 2021, 5:05 p.m. UTC | #6
On Mon, Jan 11, 2021 at 2:20 AM Florian Weimer <fweimer@redhat.com> wrote:
>
> * Suren Baghdasaryan:
>
> > diff --git a/mm/madvise.c b/mm/madvise.c
> > index 6a660858784b..c2d600386902 100644
> > --- a/mm/madvise.c
> > +++ b/mm/madvise.c
> > @@ -1197,12 +1197,22 @@ SYSCALL_DEFINE5(process_madvise, int, pidfd, const struct iovec __user *, vec,
> >               goto release_task;
> >       }
> >
> > -     mm = mm_access(task, PTRACE_MODE_ATTACH_FSCREDS);
> > +     /* Require PTRACE_MODE_READ to avoid leaking ASLR metadata. */
> > +     mm = mm_access(task, PTRACE_MODE_READ_FSCREDS);
> >       if (IS_ERR_OR_NULL(mm)) {
> >               ret = IS_ERR(mm) ? PTR_ERR(mm) : -ESRCH;
> >               goto release_task;
> >       }
>
> Shouldn't this depend on the requested behavior?  Several operations
> directly result in observable changes, and go beyond performance tuning.

Thanks for the comment Florian.
process_madvise supports only MADV_COLD and MADV_PAGEOUT hints which
are both non-destructive (see process_madvise_behavior_valid()
function). Maybe you meant something else by "observable changes", if
so please clarify.
Thanks,
Suren.

>
> Thanks,
> Florian
> --
> Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn,
> Commercial register: Amtsgericht Muenchen, HRB 153243,
> Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neill
>
> --
> To unsubscribe from this group and stop receiving emails from it, send an email to kernel-team+unsubscribe@android.com.
>
Suren Baghdasaryan Jan. 11, 2021, 5:09 p.m. UTC | #7
On Mon, Jan 11, 2021 at 9:05 AM Suren Baghdasaryan <surenb@google.com> wrote:
>
> On Mon, Jan 11, 2021 at 2:20 AM Florian Weimer <fweimer@redhat.com> wrote:
> >
> > * Suren Baghdasaryan:
> >
> > > diff --git a/mm/madvise.c b/mm/madvise.c
> > > index 6a660858784b..c2d600386902 100644
> > > --- a/mm/madvise.c
> > > +++ b/mm/madvise.c
> > > @@ -1197,12 +1197,22 @@ SYSCALL_DEFINE5(process_madvise, int, pidfd, const struct iovec __user *, vec,
> > >               goto release_task;
> > >       }
> > >
> > > -     mm = mm_access(task, PTRACE_MODE_ATTACH_FSCREDS);
> > > +     /* Require PTRACE_MODE_READ to avoid leaking ASLR metadata. */
> > > +     mm = mm_access(task, PTRACE_MODE_READ_FSCREDS);
> > >       if (IS_ERR_OR_NULL(mm)) {
> > >               ret = IS_ERR(mm) ? PTR_ERR(mm) : -ESRCH;
> > >               goto release_task;
> > >       }
> >
> > Shouldn't this depend on the requested behavior?  Several operations
> > directly result in observable changes, and go beyond performance tuning.
>
> Thanks for the comment Florian.
> process_madvise supports only MADV_COLD and MADV_PAGEOUT hints which
> are both non-destructive (see process_madvise_behavior_valid()
> function). Maybe you meant something else by "observable changes", if
> so please clarify.
> Thanks,
> Suren.
>

V2 with Minchan's fix is posted at:
https://lore.kernel.org/lkml/20210111170622.2613577-1-surenb@google.com/T/#u

> >
> > Thanks,
> > Florian
> > --
> > Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn,
> > Commercial register: Amtsgericht Muenchen, HRB 153243,
> > Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neill
> >
> > --
> > To unsubscribe from this group and stop receiving emails from it, send an email to kernel-team+unsubscribe@android.com.
> >
diff mbox series

Patch

diff --git a/mm/madvise.c b/mm/madvise.c
index 6a660858784b..c2d600386902 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -1197,12 +1197,22 @@  SYSCALL_DEFINE5(process_madvise, int, pidfd, const struct iovec __user *, vec,
 		goto release_task;
 	}
 
-	mm = mm_access(task, PTRACE_MODE_ATTACH_FSCREDS);
+	/* Require PTRACE_MODE_READ to avoid leaking ASLR metadata. */
+	mm = mm_access(task, PTRACE_MODE_READ_FSCREDS);
 	if (IS_ERR_OR_NULL(mm)) {
 		ret = IS_ERR(mm) ? PTR_ERR(mm) : -ESRCH;
 		goto release_task;
 	}
 
+	/*
+	 * Require CAP_SYS_NICE for influencing process performance. Note that
+	 * only non-destructive hints are currently supported.
+	 */
+	if (!capable(CAP_SYS_NICE)) {
+		ret = -EPERM;
+		goto release_task;
+	}
+
 	total_len = iov_iter_count(&iter);
 
 	while (iov_iter_count(&iter)) {