[1/2] security, perf: allow further restriction of perf_event_open
diff mbox

Message ID 1469630746-32279-1-git-send-email-jeffv@google.com
State New
Headers show

Commit Message

Jeffrey Vander Stoep July 27, 2016, 2:45 p.m. UTC
When kernel.perf_event_paranoid is set to 3 (or greater), disallow
all access to performance events by users without CAP_SYS_ADMIN.

This new level of restriction is intended to reduce the attack
surface of the kernel. Perf is a valuable tool for developers but
is generally unnecessary and unused on production systems. Perf may
open up an attack vector to vulnerable device-specific drivers as
recently demonstrated in CVE-2016-0805, CVE-2016-0819,
CVE-2016-0843, CVE-2016-3768, and CVE-2016-3843. This new level of
restriction allows for a safe default to be set on production systems
while leaving a simple means for developers to grant access [1].

This feature is derived from CONFIG_GRKERNSEC_PERF_HARDEN by Brad
Spengler. It is based on a patch by Ben Hutchings [2]. Ben's patches
have been modified and split up to address on-list feedback.

kernel.perf_event_paranoid=3 is the default on both Debian [2] and
Android [3].

[1] Making perf available to developers on Android:
https://android-review.googlesource.com/#/c/234400/
[2] Original patch by Ben Hutchings:
https://lkml.org/lkml/2016/1/11/587
[3] https://android-review.googlesource.com/#/c/234743/

Signed-off-by: Jeff Vander Stoep <jeffv@google.com>
---
 Documentation/sysctl/kernel.txt | 1 +
 include/linux/perf_event.h      | 5 +++++
 kernel/events/core.c            | 4 ++++
 3 files changed, 10 insertions(+)

Comments

Kees Cook July 27, 2016, 8:43 p.m. UTC | #1
On Wed, Jul 27, 2016 at 7:45 AM, Jeff Vander Stoep <jeffv@google.com> wrote:
> When kernel.perf_event_paranoid is set to 3 (or greater), disallow
> all access to performance events by users without CAP_SYS_ADMIN.
>
> This new level of restriction is intended to reduce the attack
> surface of the kernel. Perf is a valuable tool for developers but
> is generally unnecessary and unused on production systems. Perf may
> open up an attack vector to vulnerable device-specific drivers as
> recently demonstrated in CVE-2016-0805, CVE-2016-0819,
> CVE-2016-0843, CVE-2016-3768, and CVE-2016-3843. This new level of
> restriction allows for a safe default to be set on production systems
> while leaving a simple means for developers to grant access [1].
>
> This feature is derived from CONFIG_GRKERNSEC_PERF_HARDEN by Brad
> Spengler. It is based on a patch by Ben Hutchings [2]. Ben's patches
> have been modified and split up to address on-list feedback.
>
> kernel.perf_event_paranoid=3 is the default on both Debian [2] and
> Android [3].
>
> [1] Making perf available to developers on Android:
> https://android-review.googlesource.com/#/c/234400/
> [2] Original patch by Ben Hutchings:
> https://lkml.org/lkml/2016/1/11/587
> [3] https://android-review.googlesource.com/#/c/234743/
>
> Signed-off-by: Jeff Vander Stoep <jeffv@google.com>

Thanks for splitting this up! It'll be nice to have this delta out of
Debian and Android.

Reviewed-by: Kees Cook <keescook@chromium.org>

-Kees

> ---
>  Documentation/sysctl/kernel.txt | 1 +
>  include/linux/perf_event.h      | 5 +++++
>  kernel/events/core.c            | 4 ++++
>  3 files changed, 10 insertions(+)
>
> diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
> index ffab8b5..fac9798 100644
> --- a/Documentation/sysctl/kernel.txt
> +++ b/Documentation/sysctl/kernel.txt
> @@ -665,6 +665,7 @@ users (without CAP_SYS_ADMIN).  The default value is 2.
>  >=0: Disallow raw tracepoint access by users without CAP_IOC_LOCK
>  >=1: Disallow CPU event access by users without CAP_SYS_ADMIN
>  >=2: Disallow kernel profiling by users without CAP_SYS_ADMIN
> +>=3: Disallow all event access by users without CAP_SYS_ADMIN
>
>  ==============================================================
>
> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index 8ed43261..1e2080f 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -1156,6 +1156,11 @@ static inline bool perf_paranoid_kernel(void)
>         return sysctl_perf_event_paranoid > 1;
>  }
>
> +static inline bool perf_paranoid_any(void)
> +{
> +       return sysctl_perf_event_paranoid > 2;
> +}
> +
>  extern void perf_event_init(void);
>  extern void perf_tp_event(u16 event_type, u64 count, void *record,
>                           int entry_size, struct pt_regs *regs,
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 356a6c7..52bd100 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -353,6 +353,7 @@ static struct srcu_struct pmus_srcu;
>   *   0 - disallow raw tracepoint access for unpriv
>   *   1 - disallow cpu events for unpriv
>   *   2 - disallow kernel profiling for unpriv
> + *   3 - disallow all unpriv perf event use
>   */
>  int sysctl_perf_event_paranoid __read_mostly = 2;
>
> @@ -9296,6 +9297,9 @@ SYSCALL_DEFINE5(perf_event_open,
>         if (flags & ~PERF_FLAG_ALL)
>                 return -EINVAL;
>
> +       if (perf_paranoid_any() && !capable(CAP_SYS_ADMIN))
> +                               return -EACCES;
> +
>         err = perf_copy_attr(attr_uptr, &attr);
>         if (err)
>                 return err;
> --
> 2.8.0.rc3.226.g39d4020
>
Peter Zijlstra Aug. 2, 2016, 9:52 a.m. UTC | #2
On Wed, Jul 27, 2016 at 07:45:46AM -0700, Jeff Vander Stoep wrote:
> When kernel.perf_event_paranoid is set to 3 (or greater), disallow
> all access to performance events by users without CAP_SYS_ADMIN.
> 
> This new level of restriction is intended to reduce the attack
> surface of the kernel. Perf is a valuable tool for developers but
> is generally unnecessary and unused on production systems. Perf may
> open up an attack vector to vulnerable device-specific drivers as
> recently demonstrated in CVE-2016-0805, CVE-2016-0819,
> CVE-2016-0843, CVE-2016-3768, and CVE-2016-3843.

We have bugs we fix them, we don't kill complete infrastructure because
of them.

> This new level of
> restriction allows for a safe default to be set on production systems
> while leaving a simple means for developers to grant access [1].

So the problem I have with this is that it will completely inhibit
development of things like JITs that self-profile to re-compile
frequently used code.

I would much rather have an LSM hook where the security stuff can do
more fine grained control of things. Allowing some apps perf usage while
denying others.
Arnaldo Carvalho de Melo Aug. 2, 2016, 1:04 p.m. UTC | #3
Em Tue, Aug 02, 2016 at 11:52:43AM +0200, Peter Zijlstra escreveu:
> On Wed, Jul 27, 2016 at 07:45:46AM -0700, Jeff Vander Stoep wrote:
> > When kernel.perf_event_paranoid is set to 3 (or greater), disallow
> > all access to performance events by users without CAP_SYS_ADMIN.

> > This new level of restriction is intended to reduce the attack
> > surface of the kernel. Perf is a valuable tool for developers but
> > is generally unnecessary and unused on production systems. Perf may
> > open up an attack vector to vulnerable device-specific drivers as
> > recently demonstrated in CVE-2016-0805, CVE-2016-0819,
> > CVE-2016-0843, CVE-2016-3768, and CVE-2016-3843.
 
> We have bugs we fix them, we don't kill complete infrastructure because
> of them.
 
> > This new level of
> > restriction allows for a safe default to be set on production systems
> > while leaving a simple means for developers to grant access [1].
 
> So the problem I have with this is that it will completely inhibit
> development of things like JITs that self-profile to re-compile
> frequently used code.

Or reimplement strace with sys_perf_event_open(), speeding it up greatly
by not using ptrace (see 'perf trace', one such attempt), combining it
with sys_bpf(), which can run unpriviledged as well, provides lots of
possibilities for efficient tooling that would be greatly stiffled by
such big hammer restrictions :-(
 
> I would much rather have an LSM hook where the security stuff can do
> more fine grained control of things. Allowing some apps perf usage while
> denying others.

- Arnaldo
Daniel Micay Aug. 2, 2016, 1:10 p.m. UTC | #4
> > So the problem I have with this is that it will completely inhibit
> > development of things like JITs that self-profile to re-compile
> > frequently used code.
> 
> Or reimplement strace with sys_perf_event_open(), speeding it up
> greatly
> by not using ptrace (see 'perf trace', one such attempt), combining it
> with sys_bpf(), which can run unpriviledged as well, provides lots of
> possibilities for efficient tooling that would be greatly stiffled by
> such big hammer restrictions :-(

The usage on Android wouldn't impact strace. It's a debugging tool used
over the debugging shell so it could be taught to toggle on unprivileged
access to perf events as the other tools using the API were.
Daniel Micay Aug. 2, 2016, 1:16 p.m. UTC | #5
On Tue, 2016-08-02 at 11:52 +0200, Peter Zijlstra wrote:
> On Wed, Jul 27, 2016 at 07:45:46AM -0700, Jeff Vander Stoep wrote:
> > 
> > When kernel.perf_event_paranoid is set to 3 (or greater), disallow
> > all access to performance events by users without CAP_SYS_ADMIN.
> > 
> > This new level of restriction is intended to reduce the attack
> > surface of the kernel. Perf is a valuable tool for developers but
> > is generally unnecessary and unused on production systems. Perf may
> > open up an attack vector to vulnerable device-specific drivers as
> > recently demonstrated in CVE-2016-0805, CVE-2016-0819,
> > CVE-2016-0843, CVE-2016-3768, and CVE-2016-3843.
> 
> We have bugs we fix them, we don't kill complete infrastructure
> because
> 
> of them.

It's still accessible to privileged processes either way. Android still
allows access from unprivileged processes but it can only be enabled via
the debugging shell, which is not enabled by default either.

It isn't even possible to disable the perf events infrastructure via
kernel configuration for every architecture right now. You're forcing
people to have common local privilege escalation and information leak
vulnerabilities for something few people actually use.

This patch is now a requirement for any Android devices with a security
patch level above August 2016. The only thing that not merging it is
going to accomplish is preventing a mainline kernel from ever being used
on Android devices, unless you provide an alternative it can use for the
same use case.

https://source.android.com/security/bulletin/2016-08-01.html

> > This new level of
> > restriction allows for a safe default to be set on production
> > systems
> > while leaving a simple means for developers to grant access [1].
> 
> So the problem I have with this is that it will completely inhibit
> development of things like JITs that self-profile to re-compile
> frequently used code.
> 
> I would much rather have an LSM hook where the security stuff can do
more fine grained control of things. Allowing some apps perf usage while
> denying others.

If the only need was controlling access per-process statically, then
using seccomp-bpf works fine. It needs to be dynamic though. I don't
think SELinux could be used to provide the functionality so it would
have to be a whole new LSM. I doubt anyone will implement that when the
necessary functionality is already available. It's already exposed only
for developers using profiling tools until they reboot, so finer grained
control wouldn't accomplish much.
Kees Cook Aug. 2, 2016, 7:04 p.m. UTC | #6
On Tue, Aug 2, 2016 at 2:52 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Wed, Jul 27, 2016 at 07:45:46AM -0700, Jeff Vander Stoep wrote:
>> When kernel.perf_event_paranoid is set to 3 (or greater), disallow
>> all access to performance events by users without CAP_SYS_ADMIN.
>>
>> This new level of restriction is intended to reduce the attack
>> surface of the kernel. Perf is a valuable tool for developers but
>> is generally unnecessary and unused on production systems. Perf may
>> open up an attack vector to vulnerable device-specific drivers as
>> recently demonstrated in CVE-2016-0805, CVE-2016-0819,
>> CVE-2016-0843, CVE-2016-3768, and CVE-2016-3843.
>
> We have bugs we fix them, we don't kill complete infrastructure because
> of them.

I understand that point of view, but it isn't what things look like
for the average end-user of Linux. The lifetime on bugs is very long,
even in upstream (see both Jon Corbet and my talks about this: an
average of 5 years from introduction to fix), and gets drawn out even
further by vendors with slow (or missing) update processes. Being able
to remove attack surface is a fundamental first step of security
defense, and things like perf, user namespaces, and similar APIs,
expose a lot of attack surface when they are enabled. And the evidence
for this attack surface being a real-world risk is in the history of
security vulnerabilities (that we know about!) in these various APIs.

Now, obviously, these API have huge value, otherwise they wouldn't
exist in the first place, and they wouldn't be built into end-user
kernels if they were universally undesirable. But that's not the
situation: the APIs are needed, but they lack the appropriate knobs to
control their availability. And this isn't just about Android: regular
distro kernels (like Debian, who also uses this patch) tend to build
in everything so people can use whatever they want. But for admins
that want to reduce their systems' attack surface, there needs to be
ways to disable things like this.

>> This new level of
>> restriction allows for a safe default to be set on production systems
>> while leaving a simple means for developers to grant access [1].
>
> So the problem I have with this is that it will completely inhibit
> development of things like JITs that self-profile to re-compile
> frequently used code.

This is a good example of a use-case where this knob would be turned
down. But for many many other use-cases, when presented with a
pre-built kernel, there isn't a way to remove the attack surface.

> I would much rather have an LSM hook where the security stuff can do
> more fine grained control of things. Allowing some apps perf usage while
> denying others.

I'm not against an LSM, but I think it's needless complexity when
there is already a knob for this but it just doesn't go "high" enough.
:)

-Kees
Peter Zijlstra Aug. 2, 2016, 8:30 p.m. UTC | #7
On Tue, Aug 02, 2016 at 12:04:34PM -0700, Kees Cook wrote:

> Now, obviously, these API have huge value, otherwise they wouldn't
> exist in the first place, and they wouldn't be built into end-user
> kernels if they were universally undesirable. But that's not the
> situation: the APIs are needed, but they lack the appropriate knobs to
> control their availability.

So far so good, but I take exception with the suggestion that the
proposed knob is appropriate.

> And this isn't just about Android: regular
> distro kernels (like Debian, who also uses this patch) tend to build
> in everything so people can use whatever they want. But for admins
> that want to reduce their systems' attack surface, there needs to be
> ways to disable things like this.

And here I think you're overestimating the knowledge of most admins.

> > So the problem I have with this is that it will completely inhibit
> > development of things like JITs that self-profile to re-compile
> > frequently used code.
> 
> This is a good example of a use-case where this knob would be turned
> down. But for many many other use-cases, when presented with a
> pre-built kernel, there isn't a way to remove the attack surface.

No, quite the opposite. Having this knob will completely inhibit
development of such applications. Worse it will probably render perf
dead for quite a large body of developers.

The moment you frame it like: perf or sekjurity, and even default to
no-perf-because-sekjurity, a whole bunch of corporate IT departments
will not enable this, even for their developers.

Have you never had to 'root' your work machine to get work done? I have.
Luckily this was pre-secure-boot times so it was trivial since I had
physical access to the machine. But it still sucked I had to fight IT
over mostly 'trivial' crap.

> > I would much rather have an LSM hook where the security stuff can do
> > more fine grained control of things. Allowing some apps perf usage while
> > denying others.
> 
> I'm not against an LSM, but I think it's needless complexity when
> there is already a knob for this but it just doesn't go "high" enough.
> :)

So what will you to the moment the Google Dalvik guys come to you and
say: "Hey, we want to do active profiling to do better on-line code
generation?".

I see 0 up-sides of this approach and, as per the above, a whole bunch
of very serious downsides.

A global (esp. default inhibited) knob is too coarse and limiting.
Kees Cook Aug. 2, 2016, 8:51 p.m. UTC | #8
On Tue, Aug 2, 2016 at 1:30 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Tue, Aug 02, 2016 at 12:04:34PM -0700, Kees Cook wrote:
>
>> Now, obviously, these API have huge value, otherwise they wouldn't
>> exist in the first place, and they wouldn't be built into end-user
>> kernels if they were universally undesirable. But that's not the
>> situation: the APIs are needed, but they lack the appropriate knobs to
>> control their availability.
>
> So far so good, but I take exception with the suggestion that the
> proposed knob is appropriate.
>
>> And this isn't just about Android: regular
>> distro kernels (like Debian, who also uses this patch) tend to build
>> in everything so people can use whatever they want. But for admins
>> that want to reduce their systems' attack surface, there needs to be
>> ways to disable things like this.
>
> And here I think you're overestimating the knowledge of most admins.

Well, I suspect that's why both Android and Debian default this to off
right now. :) But, regardless, I think it's weird to block admins who
DO understand about attack surface from being able to limit it.

>> > So the problem I have with this is that it will completely inhibit
>> > development of things like JITs that self-profile to re-compile
>> > frequently used code.
>>
>> This is a good example of a use-case where this knob would be turned
>> down. But for many many other use-cases, when presented with a
>> pre-built kernel, there isn't a way to remove the attack surface.
>
> No, quite the opposite. Having this knob will completely inhibit
> development of such applications. Worse it will probably render perf
> dead for quite a large body of developers.

I hear what you're saying, but I think there's a few problems: the
proposed self-profiling JIT doesn't exist (so it's pointless to
discuss), and the number of developers that don't have access to their
own system is impossibly tiny when compared to the hundreds of
millions of end-users for whom perf is not needed.

> The moment you frame it like: perf or sekjurity, and even default to
> no-perf-because-sekjurity, a whole bunch of corporate IT departments
> will not enable this, even for their developers.

It's not "vs security", it's a risk analysis of attack surface. The
vast majority of people running Linux do not use perf (right now).
I've never suggested it be default disabled: I'm wanting to upstream
the sysctl setting that is already in use on distros where the distro
kernel teams have deemed this is needed knob for their end-users. All
of the objections you're talking about assume that the knob doesn't
exist, but it does already. It's just not in upstream. :)

> Have you never had to 'root' your work machine to get work done? I have.
> Luckily this was pre-secure-boot times so it was trivial since I had
> physical access to the machine. But it still sucked I had to fight IT
> over mostly 'trivial' crap.

Yeah, I've had to do similar. Frankly most people use VMs or non-corp
hardware for doing development work, so I don't think this is a useful
comparison.

>
>> > I would much rather have an LSM hook where the security stuff can do
>> > more fine grained control of things. Allowing some apps perf usage while
>> > denying others.
>>
>> I'm not against an LSM, but I think it's needless complexity when
>> there is already a knob for this but it just doesn't go "high" enough.
>> :)
>
> So what will you to the moment the Google Dalvik guys come to you and
> say: "Hey, we want to do active profiling to do better on-line code
> generation?".

That hasn't happened yet. When it does, we can revisit this. But right
now, today, there is a need for this knob.

> I see 0 up-sides of this approach and, as per the above, a whole bunch
> of very serious downsides.
>
> A global (esp. default inhibited) knob is too coarse and limiting.

I haven't suggested it be default inhibit in the upstream Kconfig. And
having this knob already with the 0, 1, and 2 settings seems
incomplete to me without this highest level of restriction that 3
would provide. That seems rather arbitrary to me. :)

Let me take this another way instead. What would be a better way to
provide a mechanism for system owners to disable perf without an LSM?
(Since far fewer folks run with an enforcing "big" LSM: I'm seeking as
wide a coverage as possible.)

-Kees
Jeffrey Vander Stoep Aug. 2, 2016, 9:06 p.m. UTC | #9
Far from trying to kill perf, we want (and require) perf to be available to
developers on Android. All that this patch enables us to do is gate it
behind developer settings - just like we do with other developer targeted
features.


On Tue, Aug 2, 2016 at 1:51 PM Kees Cook <keescook@chromium.org> wrote:

> On Tue, Aug 2, 2016 at 1:30 PM, Peter Zijlstra <peterz@infradead.org>
> wrote:
> > On Tue, Aug 02, 2016 at 12:04:34PM -0700, Kees Cook wrote:
> >
> >> Now, obviously, these API have huge value, otherwise they wouldn't
> >> exist in the first place, and they wouldn't be built into end-user
> >> kernels if they were universally undesirable. But that's not the
> >> situation: the APIs are needed, but they lack the appropriate knobs to
> >> control their availability.
> >
> > So far so good, but I take exception with the suggestion that the
> > proposed knob is appropriate.
> >
> >> And this isn't just about Android: regular
> >> distro kernels (like Debian, who also uses this patch) tend to build
> >> in everything so people can use whatever they want. But for admins
> >> that want to reduce their systems' attack surface, there needs to be
> >> ways to disable things like this.
> >
> > And here I think you're overestimating the knowledge of most admins.
>
> Well, I suspect that's why both Android and Debian default this to off
> right now. :) But, regardless, I think it's weird to block admins who
> DO understand about attack surface from being able to limit it.
>
> >> > So the problem I have with this is that it will completely inhibit
> >> > development of things like JITs that self-profile to re-compile
> >> > frequently used code.
> >>
> >> This is a good example of a use-case where this knob would be turned
> >> down. But for many many other use-cases, when presented with a
> >> pre-built kernel, there isn't a way to remove the attack surface.
> >
> > No, quite the opposite. Having this knob will completely inhibit
> > development of such applications. Worse it will probably render perf
> > dead for quite a large body of developers.
>
> I hear what you're saying, but I think there's a few problems: the
> proposed self-profiling JIT doesn't exist (so it's pointless to
> discuss), and the number of developers that don't have access to their
> own system is impossibly tiny when compared to the hundreds of
> millions of end-users for whom perf is not needed.
>
> > The moment you frame it like: perf or sekjurity, and even default to
> > no-perf-because-sekjurity, a whole bunch of corporate IT departments
> > will not enable this, even for their developers.
>
> It's not "vs security", it's a risk analysis of attack surface. The
> vast majority of people running Linux do not use perf (right now).
> I've never suggested it be default disabled: I'm wanting to upstream
> the sysctl setting that is already in use on distros where the distro
> kernel teams have deemed this is needed knob for their end-users. All
> of the objections you're talking about assume that the knob doesn't
> exist, but it does already. It's just not in upstream. :)
>
> > Have you never had to 'root' your work machine to get work done? I have.
> > Luckily this was pre-secure-boot times so it was trivial since I had
> > physical access to the machine. But it still sucked I had to fight IT
> > over mostly 'trivial' crap.
>
> Yeah, I've had to do similar. Frankly most people use VMs or non-corp
> hardware for doing development work, so I don't think this is a useful
> comparison.
>
> >
> >> > I would much rather have an LSM hook where the security stuff can do
> >> > more fine grained control of things. Allowing some apps perf usage
> while
> >> > denying others.
> >>
> >> I'm not against an LSM, but I think it's needless complexity when
> >> there is already a knob for this but it just doesn't go "high" enough.
> >> :)
> >
> > So what will you to the moment the Google Dalvik guys come to you and
> > say: "Hey, we want to do active profiling to do better on-line code
> > generation?".
>
> That hasn't happened yet. When it does, we can revisit this. But right
> now, today, there is a need for this knob.
>
> > I see 0 up-sides of this approach and, as per the above, a whole bunch
> > of very serious downsides.
> >
> > A global (esp. default inhibited) knob is too coarse and limiting.
>
> I haven't suggested it be default inhibit in the upstream Kconfig. And
> having this knob already with the 0, 1, and 2 settings seems
> incomplete to me without this highest level of restriction that 3
> would provide. That seems rather arbitrary to me. :)
>
> Let me take this another way instead. What would be a better way to
> provide a mechanism for system owners to disable perf without an LSM?
> (Since far fewer folks run with an enforcing "big" LSM: I'm seeking as
> wide a coverage as possible.)
>
> -Kees
>
> --
> Kees Cook
> Chrome OS & Brillo Security
>
Jeffrey Vander Stoep Aug. 2, 2016, 9:16 p.m. UTC | #10
Far from trying to kill perf, we want (and require) perf to be
available to developers on Android. All that this patch enables us to
do is gate it behind developer settings - just like we do with other
developer targeted features.

(apologies for the dup, bounced due to non-plaintext)

On Tue, Aug 2, 2016 at 1:30 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Tue, Aug 02, 2016 at 12:04:34PM -0700, Kees Cook wrote:
>
>> Now, obviously, these API have huge value, otherwise they wouldn't
>> exist in the first place, and they wouldn't be built into end-user
>> kernels if they were universally undesirable. But that's not the
>> situation: the APIs are needed, but they lack the appropriate knobs to
>> control their availability.
>
> So far so good, but I take exception with the suggestion that the
> proposed knob is appropriate.
>
>> And this isn't just about Android: regular
>> distro kernels (like Debian, who also uses this patch) tend to build
>> in everything so people can use whatever they want. But for admins
>> that want to reduce their systems' attack surface, there needs to be
>> ways to disable things like this.
>
> And here I think you're overestimating the knowledge of most admins.
>
>> > So the problem I have with this is that it will completely inhibit
>> > development of things like JITs that self-profile to re-compile
>> > frequently used code.
>>
>> This is a good example of a use-case where this knob would be turned
>> down. But for many many other use-cases, when presented with a
>> pre-built kernel, there isn't a way to remove the attack surface.
>
> No, quite the opposite. Having this knob will completely inhibit
> development of such applications. Worse it will probably render perf
> dead for quite a large body of developers.
>
> The moment you frame it like: perf or sekjurity, and even default to
> no-perf-because-sekjurity, a whole bunch of corporate IT departments
> will not enable this, even for their developers.
>
> Have you never had to 'root' your work machine to get work done? I have.
> Luckily this was pre-secure-boot times so it was trivial since I had
> physical access to the machine. But it still sucked I had to fight IT
> over mostly 'trivial' crap.
>
>> > I would much rather have an LSM hook where the security stuff can do
>> > more fine grained control of things. Allowing some apps perf usage while
>> > denying others.
>>
>> I'm not against an LSM, but I think it's needless complexity when
>> there is already a knob for this but it just doesn't go "high" enough.
>> :)
>
> So what will you to the moment the Google Dalvik guys come to you and
> say: "Hey, we want to do active profiling to do better on-line code
> generation?".
>
> I see 0 up-sides of this approach and, as per the above, a whole bunch
> of very serious downsides.
>
> A global (esp. default inhibited) knob is too coarse and limiting.
>
>
Ingo Molnar Aug. 3, 2016, 8:28 a.m. UTC | #11
* Kees Cook <keescook@chromium.org> wrote:

> > I see 0 up-sides of this approach and, as per the above, a whole bunch of very 
> > serious downsides.
> >
> > A global (esp. default inhibited) knob is too coarse and limiting.
> 
> I haven't suggested it be default inhibit in the upstream Kconfig. And
> having this knob already with the 0, 1, and 2 settings seems
> incomplete to me without this highest level of restriction that 3
> would provide. That seems rather arbitrary to me. :)

The default has no impact on the "it's too coarse and limiting" negative property 
of this patch, which is the show-stopper aspect. Please fix that aspect instead of 
trying to argue around it.

This isn't some narrow debugging mechanism we can turn on/off globally and forget 
about, this is a wide scope performance measurement and event logging 
infrastructure that is being utilized not just by developers but by apps and 
runtimes as well.

> Let me take this another way instead. What would be a better way to provide a 
> mechanism for system owners to disable perf without an LSM? (Since far fewer 
> folks run with an enforcing "big" LSM: I'm seeking as wide a coverage as 
> possible.)

Because in practice what will happen is that if the only option is to do something 
drastic for sekjurity, IT departments will do it - while if there's a more 
flexible mechanism that does not throw out the baby with the bath water that is 
going to be used.

This is as if 20 years ago you had submitted a patch to the early Linux TCP/IP 
networking code to be on/off via a global sysctl switch and told people that 
"in developer mode you can have networking, talk to your admin".

We'd have told you: "this switch is too coarse and limiting, please implement 
something better, like a list of routes which defines which IP ranges are 
accessible, and a privileged range of listen sockets ports and some flexible 
kernel side filtering mechanism to inhibit outgoing/incoming connections".

Global sysctls are way too coarse.

Thanks,

	Ingo
Daniel Micay Aug. 3, 2016, 12:28 p.m. UTC | #12
> The default has no impact on the "it's too coarse and limiting"
> negative property 
> of this patch, which is the show-stopper aspect. Please fix that
> aspect instead of 
> trying to argue around it.

Disabling perf events in the kernel configuration is even more limiting,
and is currently the alternative to doing it this way. It makes sense
for debugging features to be disabled in production releases of products
unless there's a way to control them where risk isn't increased. Having
a way to toggle it dynamically will allow it to be remain available.

> This isn't some narrow debugging mechanism we can turn on/off globally
> and forget 
> about, this is a wide scope performance measurement and event logging 
> infrastructure that is being utilized not just by developers but by
> apps and 
> runtimes as well.

The incredibly wide scope is why it's such a big security problem. If it
wasn't such a frequent source of vulnerabilities, it wouldn't have been
disabled for unprivileged users in grsecurity, Debian and then Android.
It's extended by lots of vendor code to specific to platforms too, so it
isn't just some core kernel code that's properly reviewed.

I don't think there are runtimes using this for JIT tracing. Perhaps it
doesn't actually suit their needs. It's a theoretical use case.

> > Let me take this another way instead. What would be a better way to
> > provide a 
> > mechanism for system owners to disable perf without an LSM? (Since
> > far fewer 
> > folks run with an enforcing "big" LSM: I'm seeking as wide a
> > coverage as 
> > possible.)
> 
> Because in practice what will happen is that if the only option is to
> do something 
> drastic for sekjurity, IT departments will do it - while if there's a
> more 
> flexible mechanism that does not throw out the baby with the bath
> water that is 
> going to be used.

It's already done: Android and Debian disable it for unprivileged users
by default. It's already done for most of desktop and mobile Linux and
perhaps even most servers. Not providing functionality desired by
downstream doesn't mean it won't be provided there.

They'll keep doing it whether or not this lands. If it doesn't land, it
will only mean that mainline kernels aren't usable for making Android
devices. They need to pass the compatibility test suite, which means
having this. The mechanism could change but I don't see why the actual
requirement would.

> This is as if 20 years ago you had submitted a patch to the early
> Linux TCP/IP 
> networking code to be on/off via a global sysctl switch and told
> people that 
> "in developer mode you can have networking, talk to your admin".
> 
> We'd have told you: "this switch is too coarse and limiting, please
> implement 
> something better, like a list of routes which defines which IP ranges
> are 
> accessible, and a privileged range of listen sockets ports and some
> flexible 
> kernel side filtering mechanism to inhibit outgoing/incoming
> connections".
> 
> Global sysctls are way too coarse.

The end result of submitting an LSM hook instead of using this would
probably come down to having a global sysctl toggle in Yama. There would
be two sysctl controls for perf restrictions rather than one which is
needless complexity for the interface.

Android and Debian would be using a fine-grained perf LSM to accomplish
the same thing: globally disabling it for unprivileged users when not
doing development. The nice thing about fine-grained control would be
implementing a *more* restrictive policy for the case when it's toggled
on. It wouldn't necessarily have to be globally enabled, only enabled
for the relevant user or even a specific process, but that would be
complicated to implement.
Daniel Micay Aug. 3, 2016, 12:53 p.m. UTC | #13
Having this in Yama would also make it probable that there would be a
security-centric default. It would end up wiping out unprivileged perf
events access on distributions using Yama for ptrace_scope unless they
make the explicit decision to disable it. Having the perf subsystem
extend the existing perf_event_paranoid sysctl leaves the control over
the upstream default in the hands of the perf subsystem, not LSMs.
Peter Zijlstra Aug. 3, 2016, 1:36 p.m. UTC | #14
On Wed, Aug 03, 2016 at 08:28:10AM -0400, Daniel Micay wrote:
> I don't think there are runtimes using this for JIT tracing. Perhaps it
> doesn't actually suit their needs. It's a theoretical use case.

I know there are compiler teams using perf for FDO, see for example:

  https://gcc.gnu.org/wiki/AutoFDO/Tutorial

LLVM also has AutoFDO support AFAIU.

There is no reason JITs could not also do this, and IIRC there's JITs
build on top of LLVM, so it shouldn't be too hard to imagine an AutoFDO
enabled JIT.
Peter Zijlstra Aug. 3, 2016, 2:41 p.m. UTC | #15
On Tue, Aug 02, 2016 at 01:51:47PM -0700, Kees Cook wrote:
> Let me take this another way instead. What would be a better way to
> provide a mechanism for system owners to disable perf without an LSM?
> (Since far fewer folks run with an enforcing "big" LSM: I'm seeking as
> wide a coverage as possible.)

Could something like a new capability bit work?

I'm thinking that applications that have network connections already
drop all possible capabilities (I know, unlikely to be true, but should
be true for most stuff I hope). This would disable perf for remote code
execution exploits, including web-browsers and the lot.

It would keep perf working for local stuff by default, although
obviously with pam_cap you can limit this when and where needed.

For Android this could mean the JVM explicitly dropping the cap for its
'children' while retaining the use itself. And this would also keep perf
working on the ADB shell stuff.


And, I think this would allow a JIT executable to gain the cap using
file caps, even when the user using it doesn't have it, which would keep
things usable even in restricted environments.


Or am I misunderstanding capabilities -- which is entirely possible.
Casey Schaufler Aug. 3, 2016, 3:42 p.m. UTC | #16
> -----Original Message-----
> From: Peter Zijlstra [mailto:peterz@infradead.org]
> Sent: Wednesday, August 03, 2016 7:41 AM
> To: Kees Cook <keescook@chromium.org>
> Cc: Jeff Vander Stoep <jeffv@google.com>; Ingo Molnar <mingo@redhat.com>;
> Arnaldo Carvalho de Melo <acme@kernel.org>; Alexander Shishkin
> <alexander.shishkin@linux.intel.com>; linux-doc@vger.kernel.org; kernel-
> hardening@lists.openwall.com; LKML <linux-kernel@vger.kernel.org>;
> Jonathan Corbet <corbet@lwn.net>; Eric W. Biederman
> <ebiederm@xmission.com>
> Subject: Re: [kernel-hardening] Re: [PATCH 1/2] security, perf: allow further
> restriction of perf_event_open
> 
> On Tue, Aug 02, 2016 at 01:51:47PM -0700, Kees Cook wrote:
> > Let me take this another way instead. What would be a better way to
> > provide a mechanism for system owners to disable perf without an LSM?
> > (Since far fewer folks run with an enforcing "big" LSM: I'm seeking as
> > wide a coverage as possible.)
> 
> Could something like a new capability bit work?


Yes, it could, but we don't have a real good experience
with the advanced use of capabilities. The big distros and
the "OS"s have yet to adopt them seriously. It's the same
sort of issue, really. The granularity gets to you.

> I'm thinking that applications that have network connections already
> drop all possible capabilities (I know, unlikely to be true, but should
> be true for most stuff I hope). This would disable perf for remote code
> execution exploits, including web-browsers and the lot.
> 
> It would keep perf working for local stuff by default, although
> obviously with pam_cap you can limit this when and where needed.
> 
> For Android this could mean the JVM explicitly dropping the cap for its
> 'children' while retaining the use itself. And this would also keep perf
> working on the ADB shell stuff.
> 
> 
> And, I think this would allow a JIT executable to gain the cap using
> file caps, even when the user using it doesn't have it, which would keep
> things usable even in restricted environments.
> 
> 
> Or am I misunderstanding capabilities -- which is entirely possible.
Eric W. Biederman Aug. 3, 2016, 5:25 p.m. UTC | #17
Sigh.

Kees we have already had this conversation about user namespaces and
apparently you missed the point.

As I have said before the problem with a system wide off switch is what
happens when you have a single application that needs to use the
feature.  Without care your system wide protection disappears.
That is very brittle design.

What I see as much more palatable is a design that allows for
features to be turned off in sandboxes.

So please if you are going to worry about disabling large swaths of
the kernel to reduce the attack surface please come up with designs
that are not brittle in allowing users to use a feature nor are they
brittle in keeping the feature disabled where you want it disabled.


One of the strengths of linux is applications of features the authors of
the software had not imagined.  Your proposals seem to be trying to put
the world a tiny little box where if someone had not imagined and
preapproved a use of a feature it should not happen.   Let's please
avoid implementing totalitarianism to avoid malicious code exploiting
bugs in the kernel.  I am not interested in that future.

Especially when dealing with disabling code to reduce attack surface,
when then are no known attacks what we are actually dealing with
is a small percentage probability reduction that a malicious attacker
will be able to exploit the attack.

Remember security is as much about availability as it is about
integrity.  You keep imagining features that are great big denial of
service attacks on legitimate users.

Kees Cook <keescook@chromium.org> writes:

> On Tue, Aug 2, 2016 at 1:30 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> Let me take this another way instead. What would be a better way to
> provide a mechanism for system owners to disable perf without an LSM?
> (Since far fewer folks run with an enforcing "big" LSM: I'm seeking as
> wide a coverage as possible.)

I vote for sandboxes.  Perhaps seccomp.  Perhaps a per userns sysctl.
Perhaps something else.

Eric
Kees Cook Aug. 3, 2016, 6:53 p.m. UTC | #18
On Wed, Aug 3, 2016 at 10:25 AM, Eric W. Biederman
<ebiederm@xmission.com> wrote:
>
> Sigh.
>
> Kees we have already had this conversation about user namespaces and
> apparently you missed the point.

Well, I didn't miss the point: that's why I CCed you. :) This is
nearly the same discussion (though in this case there is already a
sysctl, hence this thread).

> As I have said before the problem with a system wide off switch is what
> happens when you have a single application that needs to use the
> feature.  Without care your system wide protection disappears.
> That is very brittle design.

Yeah, the use-cases tend to be "never", "always", "this process", and
"this user". The way Android would like to be handling the permission,
though, gets back to another thing we talked about briefly when
discussing userns: revocation. Android wants to turn the entire
permission on and off across the entire system, so things like
capabilities or privileged fds or something don't quite fit that
either.

> What I see as much more palatable is a design that allows for
> features to be turned off in sandboxes.
>
> So please if you are going to worry about disabling large swaths of
> the kernel to reduce the attack surface please come up with designs
> that are not brittle in allowing users to use a feature nor are they
> brittle in keeping the feature disabled where you want it disabled.

Yup, Linus asked for this too. I haven't come up with anything
particularly great yet. :P

> One of the strengths of linux is applications of features the authors of
> the software had not imagined.  Your proposals seem to be trying to put
> the world a tiny little box where if someone had not imagined and
> preapproved a use of a feature it should not happen.   Let's please
> avoid implementing totalitarianism to avoid malicious code exploiting
> bugs in the kernel.  I am not interested in that future.

To be clear: I'm interested in giving system owners greater control
over what's exposed. That's not about limiting access everywhere. And
I'm interested in making sure that the upstream kernel actually
provides what end-users want. In the most extreme version of this is
when distros carry kernel patches to get it done (this was true with
userns and is true again here with perf). This IS a desired features,
and it exists in the world. I want to avoid the confusion that arises
from people running patched kernels: upstream developers don't realize
what state their features are in when they reach end users,
documentation doesn't match, etc etc.

However, I do accept that the existing mechanisms that have served
Linux over the years (sysctls, capabilities, etc) are woefully
insufficient in many of these cases.

> Especially when dealing with disabling code to reduce attack surface,
> when then are no known attacks what we are actually dealing with
> is a small percentage probability reduction that a malicious attacker
> will be able to exploit the attack.

But there are bugs. We just haven't found them. Based on the past
research by Jon Corbet (and again recently by me) we're sitting on at
least 1 new critical bug that we won't find for the next 5 years (and
we've got at least 1 more each that are on average 1, 2, 3, and 4
years old that we haven't found yet either).

> Remember security is as much about availability as it is about
> integrity.  You keep imagining features that are great big denial of
> service attacks on legitimate users.

That's not my goal: legitimate users should have access. That's up to
system owners. But I'd like to provide ways for system owners to keep
illegitimate users from having access. :)

> Kees Cook <keescook@chromium.org> writes:
>
>> On Tue, Aug 2, 2016 at 1:30 PM, Peter Zijlstra <peterz@infradead.org> wrote:
>> Let me take this another way instead. What would be a better way to
>> provide a mechanism for system owners to disable perf without an LSM?
>> (Since far fewer folks run with an enforcing "big" LSM: I'm seeking as
>> wide a coverage as possible.)
>
> I vote for sandboxes.  Perhaps seccomp.  Perhaps a per userns sysctl.
> Perhaps something else.

Peter, did you happen to see Eric's solution to this problem for
namespaces? Basically, a per-userns sysctl instead of a global sysctl.
Is that something that would be acceptable here?

-Kees
Daniel Micay Aug. 3, 2016, 7:36 p.m. UTC | #19
> One of the strengths of linux is applications of features the authors
> of
> the software had not imagined.  Your proposals seem to be trying to
> put
> the world a tiny little box where if someone had not imagined and
> preapproved a use of a feature it should not happen.   Let's please
> avoid implementing totalitarianism to avoid malicious code exploiting
> bugs in the kernel.  I am not interested in that future.

You're describing operating systems like Android, ChromeOS and iOS.

That future is already here and the Linux kernel is the major weak point
in the attempts to build those systems based on Linux. Even for the very
restricted Chrome sandbox, it's the easiest way out.

Android similarly allows near zero access to /sys for apps and little
access to /proc beyond the /proc/PID directories belonging to an app.

> Especially when dealing with disabling code to reduce attack surface,
> when then are no known attacks what we are actually dealing with
> is a small percentage probability reduction that a malicious attacker
> will be able to exploit the attack.

There are perf events vulnerabilities being exploited in the wild to
gain root on Android. It's not a theoretical attack vector. They're used
in both malware and rooting tools. Local privilege escalation bugs in
the kernel are common so there are a lot of alternatives but it's one of
the major sources for vulnerabilities. There's a lot of architecture and
vendor specific perf events code and lots of bleeding edge features. On
Android, a lot of the perf events vulnerabilities have been specific to
the Qualcomm SoC platform. Other platforms are likely just receiving a
lot less attention.

> Remember security is as much about availability as it is about
> integrity.  You keep imagining features that are great big denial of
> service attacks on legitimate users.

Only developers care about perf events and they still have access to it.
JIT compilers have other ways to do tracing and even if they consider
this to be the ideal API, it's not particularly important if they have
to settle for something else. In reality, it's a small compromise.

> I vote for sandboxes.  Perhaps seccomp.  Perhaps a per userns sysctl.
> Perhaps something else.

It's not possible to use the current incarnation of seccomp for this
since it can't be dynamically granted/revoked. Perhaps it would be
possible to support adding/removing or at least toggling seccomp filters
for groups of processes. That would be good enough to take care of user
ns, ptrace, perf events, etc.
Peter Zijlstra Aug. 3, 2016, 9:44 p.m. UTC | #20
On Wed, Aug 03, 2016 at 11:53:41AM -0700, Kees Cook wrote:
> > Kees Cook <keescook@chromium.org> writes:
> >
> >> On Tue, Aug 2, 2016 at 1:30 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> >> Let me take this another way instead. What would be a better way to
> >> provide a mechanism for system owners to disable perf without an LSM?
> >> (Since far fewer folks run with an enforcing "big" LSM: I'm seeking as
> >> wide a coverage as possible.)
> >
> > I vote for sandboxes.  Perhaps seccomp.  Perhaps a per userns sysctl.
> > Perhaps something else.
> 
> Peter, did you happen to see Eric's solution to this problem for
> namespaces? Basically, a per-userns sysctl instead of a global sysctl.
> Is that something that would be acceptable here?

Someone would have to educate me on what a userns is and how that would
help here.
Eric W. Biederman Aug. 4, 2016, 2:50 a.m. UTC | #21
Peter Zijlstra <peterz@infradead.org> writes:

> On Wed, Aug 03, 2016 at 11:53:41AM -0700, Kees Cook wrote:
>> > Kees Cook <keescook@chromium.org> writes:
>> >
>> >> On Tue, Aug 2, 2016 at 1:30 PM, Peter Zijlstra <peterz@infradead.org> wrote:
>> >> Let me take this another way instead. What would be a better way to
>> >> provide a mechanism for system owners to disable perf without an LSM?
>> >> (Since far fewer folks run with an enforcing "big" LSM: I'm seeking as
>> >> wide a coverage as possible.)
>> >
>> > I vote for sandboxes.  Perhaps seccomp.  Perhaps a per userns sysctl.
>> > Perhaps something else.
>> 
>> Peter, did you happen to see Eric's solution to this problem for
>> namespaces? Basically, a per-userns sysctl instead of a global sysctl.
>> Is that something that would be acceptable here?
>
> Someone would have to educate me on what a userns is and how that would
> help here.

userns is an abbreviation for user namespace.

How it might help is that it is an easy unescapable context for
processes.  Essentialy the idea is to limit the scope of the sysctl to a
container.

User namespaces run into flack because while tremendously simple in
themselves the code takes advantage of the fact that suid root
executables in a user namespace do not have privileges on anything
outside of the user namespace.  Which means that it is semantically safe
to allow operations like creating mount namespaces, mount filesystems,
creating network namespaces, manipulating the network stack etc.  All of
which allows unprivileged users (that can create network namespaces) to
exercise more kernel code and exercise those bugs.

Fundamentally user namespaces as objects you can create need limits on
the maximum number of user namespaces you can create to cawtch run away
processes.  Set the limit you can create to 0 and you get what Kees
wants.

In my pending patches that were not quite ready for the merge window, I
added a sysctl that described the maximum number of user namepaces that
could be created (default value threads-max), and implemented the sysctl
in a per user way.  Such that counts and limits were kept for every user
namespace.  In a nested user namespace (which are all of them except for
the initial user namspace) the count and limit would be checked in the
current user namepsace, then the count would be incremented in the parent
and verified the count was below the limit in the parent user namespace.

What this means in practice is user namespaces can be enabled by default
on a system, and yet you can easily disable them in a sandbox that was
built with a user namespace.

I named the new sysctls in my patch:
/proc/sys/userns/max_user_namespaces
/proc/sys/userns/max_pid_namespaces
/proc/sys/userns/max_net_namespaces
/proc/sys/userns/max_uts_namespaces
/proc/sys/userns/max_ipc_namespaces
/proc/sys/userns/max_cgroup_namespaces
/proc/sys/userns/max_mnt_namespaces

What Kees was suggesting was to add a similar sysctl say:
/proc/sys/userns/perf_event_enabled

And have the ability to disable perf events in each user namespaces.
While still being able to leave usage perf events enabled by default.

I don't know if any of that is a good fit for perf events.

For purposes of this discussion I assume we are limiting ourselves to
discussing userspace tracing, which semantically is 100% fine for
access by userspace.

Eric
Peter Zijlstra Aug. 4, 2016, 9:11 a.m. UTC | #22
On Wed, Aug 03, 2016 at 09:50:37PM -0500, Eric W. Biederman wrote:

> What this means in practice is user namespaces can be enabled by default
> on a system, and yet you can easily disable them in a sandbox that was
> built with a user namespace.
> 
> I named the new sysctls in my patch:
> /proc/sys/userns/max_user_namespaces
> /proc/sys/userns/max_pid_namespaces
> /proc/sys/userns/max_net_namespaces
> /proc/sys/userns/max_uts_namespaces
> /proc/sys/userns/max_ipc_namespaces
> /proc/sys/userns/max_cgroup_namespaces
> /proc/sys/userns/max_mnt_namespaces
> 
> What Kees was suggesting was to add a similar sysctl say:
> /proc/sys/userns/perf_event_enabled
> 
> And have the ability to disable perf events in each user namespaces.
> While still being able to leave usage perf events enabled by default.
> 
> I don't know if any of that is a good fit for perf events.
> 
> For purposes of this discussion I assume we are limiting ourselves to
> discussing userspace tracing, which semantically is 100% fine for
> access by userspace.

Right, so its basically a 'root' namespace. Not sure how this would
help, or cover the use-cases with perf through.

Do they really only care about the sandbox? I can imagine this being
sufficient for Android as that could do these userns thingies for each
app or whatnot. But does this cover the case Debian disabled perf for?
I'm not sure I've ever seen it described _why_ they did it.

So far I'm still liking the new capability bit better, assuming I
understood those right.
Mark Rutland Aug. 4, 2016, 10:28 a.m. UTC | #23
On Wed, Aug 03, 2016 at 03:36:16PM -0400, Daniel Micay wrote:
> There's a lot of architecture and vendor specific perf events code and
> lots of bleeding edge features. On Android, a lot of the perf events
> vulnerabilities have been specific to the Qualcomm SoC platform. Other
> platforms are likely just receiving a lot less attention.

Are the relevant perf drivers for those platforms upstream? I've seen no
patches addressing security issues in the ARMv7 krait+Scorpion PMU
driver since it was added, and there's no ARMv8 QCOM PMU driver.

If there are outstanding issues, please report them upstream.

FWIW, I've used Vince Weaver's perf fuzzer to test the ARM PMU code
(both the framework and drivers), so other platforms are seeing some
attention. That said, I haven't done that recently.

Thanks,
Mark.
Daniel Micay Aug. 4, 2016, 1:45 p.m. UTC | #24
On Thu, 2016-08-04 at 11:28 +0100, Mark Rutland wrote:
> On Wed, Aug 03, 2016 at 03:36:16PM -0400, Daniel Micay wrote:
> > 
> > There's a lot of architecture and vendor specific perf events code
> > and
> > lots of bleeding edge features. On Android, a lot of the perf events
> > vulnerabilities have been specific to the Qualcomm SoC platform.
> > Other
> > platforms are likely just receiving a lot less attention.
> 
> Are the relevant perf drivers for those platforms upstream? I've seen
> no
> patches addressing security issues in the ARMv7 krait+Scorpion PMU
> driver since it was added, and there's no ARMv8 QCOM PMU driver.
> 
> If there are outstanding issues, please report them upstream.
> 
> FWIW, I've used Vince Weaver's perf fuzzer to test the ARM PMU code
> (both the framework and drivers), so other platforms are seeing some
> attention. That said, I haven't done that recently.

Qualcomm's perf driver is out-of-tree along with most of their other
drivers. Their drivers add up to a LOT of code shared across over a
billion mobile devices, leading to the focus on them. It also helps that
there are bounties for Nexus devices, so there are multi thousand dollar
rewards for bugs in the Qualcomm drivers compared to nothing for other
platforms / drivers. Now that perf is only available via ADB debugging,
further perf bugs no longer technically qualify for their bounties (but
they might still pay, I don't know).
Peter Zijlstra Aug. 4, 2016, 2:11 p.m. UTC | #25
On Thu, Aug 04, 2016 at 09:45:23AM -0400, Daniel Micay wrote:
> Qualcomm's perf driver is out-of-tree along with most of their other
> drivers. 


So you're asking us to maim upstream perf for some out of tree junk?
Srously? *plonk*
Eric W. Biederman Aug. 4, 2016, 3:13 p.m. UTC | #26
Peter Zijlstra <peterz@infradead.org> writes:

> On Wed, Aug 03, 2016 at 09:50:37PM -0500, Eric W. Biederman wrote:
>
>> What this means in practice is user namespaces can be enabled by default
>> on a system, and yet you can easily disable them in a sandbox that was
>> built with a user namespace.
>> 
>> I named the new sysctls in my patch:
>> /proc/sys/userns/max_user_namespaces
>> /proc/sys/userns/max_pid_namespaces
>> /proc/sys/userns/max_net_namespaces
>> /proc/sys/userns/max_uts_namespaces
>> /proc/sys/userns/max_ipc_namespaces
>> /proc/sys/userns/max_cgroup_namespaces
>> /proc/sys/userns/max_mnt_namespaces
>> 
>> What Kees was suggesting was to add a similar sysctl say:
>> /proc/sys/userns/perf_event_enabled
>> 
>> And have the ability to disable perf events in each user namespaces.
>> While still being able to leave usage perf events enabled by default.
>> 
>> I don't know if any of that is a good fit for perf events.
>> 
>> For purposes of this discussion I assume we are limiting ourselves to
>> discussing userspace tracing, which semantically is 100% fine for
>> access by userspace.
>
> Right, so its basically a 'root' namespace. Not sure how this would
> help, or cover the use-cases with perf through.

The bits useful to the perf situation are:
- user namespaces nest.
- anyone can create a user namespace.
- a sysctl can be bound to the userns that takes local privilege to
  change so you can't override it arbitrarily.

Which is a long way of saying a user namespace is one way of marking
processes that may or may not use perf.

It was given in this case as an example of something that has been
looked at that appears to solve peoples concerns.

Another way to achieve a similar effect is to build something like an
rlimit.

What is attractive to me semantically about something like this is
applications that have perf_event disabled can still be traced with perf.

> Do they really only care about the sandbox? I can imagine this being
> sufficient for Android as that could do these userns thingies for each
> app or whatnot.

So the question is how do we want to apply policy in this case.
If the only concern is that there might be some bug somewhere
in the code that is undiscovered and people who don't use a feature
don't want to have to worry about it, disabling things at the
application level makes sense.

In my mind a sandbox is policy like this that I apply to my application,
in contrast a sandbox approach with a global disable or some other
specific poicy that the system administrator applies.

The really important property that I think needs to exist is a less than
system granularity.  So a solution that doesn't disable it for everyone
and doesn't disable something by default can be deployed while still
allowing the feature to be disabled where people don't want to take
the chance (such as in network facing daemons like apache).

> But does this cover the case Debian disabled perf for?
> I'm not sure I've ever seen it described _why_ they did it.

Good question.  I suspect someone should ask.  Especially since debian
defaults to 3.  perf event is disabled for everyone.

> So far I'm still liking the new capability bit better, assuming I
> understood those right.

Your subsystem your call.  I have never had much luck with capability
bits.  They are not particularly flexible, and are hard to get rid of
permanently any suid root app gains them all.

But it isn't a particularly easy problem and I don't think we have any
solutions that have lasted the test of time for this kind of thing other
than seccomp.

Eric
Peter Zijlstra Aug. 4, 2016, 3:37 p.m. UTC | #27
On Thu, Aug 04, 2016 at 10:13:29AM -0500, Eric W. Biederman wrote:

> The bits useful to the perf situation are:
> - user namespaces nest.
> - anyone can create a user namespace.
> - a sysctl can be bound to the userns that takes local privilege to
>   change so you can't override it arbitrarily.
> 
> Which is a long way of saying a user namespace is one way of marking
> processes that may or may not use perf.
> 
> It was given in this case as an example of something that has been
> looked at that appears to solve peoples concerns.

> What is attractive to me semantically about something like this is
> applications that have perf_event disabled can still be traced with perf.

> > So far I'm still liking the new capability bit better, assuming I
> > understood those right.
> 
> Your subsystem your call.  I have never had much luck with capability
> bits.  They are not particularly flexible, and are hard to get rid of
> permanently any suid root app gains them all.

Right, so I've no experience with any of this.

But from what I understood amluto recently made capabilities much more
useful with: 58319057b784 ("capabilities: ambient capabilities").

And the thing I like is having file capabilities, so even though the
user cannot in general create perf events, we could mark the
/usr/bin/perf executable as having CAP_PERF and allow it to create them,
because its a 'trusted' executable.

Can something like that be done with userns? Afaiu once you create a
userns with perf disabled, even a nested one cannot re-enable it,
otherwise you cannot create sandboxes.
Daniel Micay Aug. 4, 2016, 3:44 p.m. UTC | #28
On Thu, 2016-08-04 at 16:11 +0200, Peter Zijlstra wrote:
> On Thu, Aug 04, 2016 at 09:45:23AM -0400, Daniel Micay wrote:
> > 
> > Qualcomm's perf driver is out-of-tree along with most of their other
> > drivers. 
> 
> 
> So you're asking us to maim upstream perf for some out of tree junk?
> Srously? *plonk*

This feature doesn't come from Android. The perf events subsystem in the
mainline kernel is packed full of vulnerabilities too. The problem is so
bad that pointing one of the public fuzzers at it for a short period of
time is all that's required to start finding them.

Qualcomm's drivers might be lower quality than core kernel code, but
they're way above the baseline set by mainline kernel drivers...

Shining the same light on mainline drivers wouldn't be pretty. The work
going into hardening the Qualcomm drivers isn't happening upstream to
any comparable extent.
Peter Zijlstra Aug. 4, 2016, 3:55 p.m. UTC | #29
On Thu, Aug 04, 2016 at 11:44:28AM -0400, Daniel Micay wrote:

> This feature doesn't come from Android. The perf events subsystem in the
> mainline kernel is packed full of vulnerabilities too. 

Uhh, not so much. I spend a _lot_ of time a while back to get the core
and x86 solid. I could run the fuzzers for hours on end at some point.

> The problem is so bad that pointing one of the public fuzzers at it
> for a short period of time is all that's required to start finding
> them.

If you know of any that reproduce on x86 I'll go fix. For anything else
you need to complain elsewhere as I don't have hardware nor bandwidth.
Mark Rutland Aug. 4, 2016, 4:10 p.m. UTC | #30
On Thu, Aug 04, 2016 at 11:44:28AM -0400, Daniel Micay wrote:
> Qualcomm's drivers might be lower quality than core kernel code, but
> they're way above the baseline set by mainline kernel drivers...

I don't think that's true for the arm/arm64 perf code.

I think we've done a reasonable job of testing and fixing those, along
with core infrastructure issues. The perf fuzzer runs for a very long
time on a mainline kernel without issues, while on my Nexus 5x I get a
hard lockup after ~85 seconds (and prior to the last android update the
lockup was instantaneous).

> Shining the same light on mainline drivers wouldn't be pretty. The work
> going into hardening the Qualcomm drivers isn't happening upstream to
> any comparable extent.

From my personal experience (and as above), and talking specifically
about PMU drivers, I think that the opposite is true. This is not to say
there aren't issues; I would not be surprised if there are. But it's
disingenuous to say that mainline code is worse than that which exists
in a vendor kernel when the latter is demonstrably much easier to break
than the former.

If there are issues you are aware of, please report them. If those
issues only exist in non-upstream code, then the applicable concerns are
somewhat different (though certainly still exist).

But please, let's frame the argument to match reality.

Thanks,
Mark.
Daniel Micay Aug. 4, 2016, 4:32 p.m. UTC | #31
On Thu, 2016-08-04 at 17:10 +0100, Mark Rutland wrote:
> On Thu, Aug 04, 2016 at 11:44:28AM -0400, Daniel Micay wrote:
> > 
> > Qualcomm's drivers might be lower quality than core kernel code, but
> > they're way above the baseline set by mainline kernel drivers...
> 
> I don't think that's true for the arm/arm64 perf code.

The baseline architecture support is essentially core kernel code. I
agree it's much better than the SoC vendor code. You're spending a lot
of time auditing, fuzzing and improving the code in general, which is
not true for most drivers. They don't get that attention.

> I think we've done a reasonable job of testing and fixing those, along
> with core infrastructure issues. The perf fuzzer runs for a very long
> time on a mainline kernel without issues, while on my Nexus 5x I get a
> hard lockup after ~85 seconds (and prior to the last android update
> the
> lockup was instantaneous).
>
> From my personal experience (and as above), and talking specifically
> about PMU drivers, I think that the opposite is true. This is not to
> say
> there aren't issues; I would not be surprised if there are. But it's
> disingenuous to say that mainline code is worse than that which exists
> in a vendor kernel when the latter is demonstrably much easier to
> break
> than the former.

I wasn't talking specifically about perf.

> If there are issues you are aware of, please report them. If those
> issues only exist in non-upstream code, then the applicable concerns
> are
> somewhat different (though certainly still exist).

I'm not going to do volunteer work for a corporation. I've learned that
lesson after spending years doing it.

> But please, let's frame the argument to match reality.

The argument is framed in reality. Stating that it now often takes a few
hours to find a vulnerability with the unaltered, widely known public
perf fuzzer is not impressive. It's really an argument for claiming that
it's a significant security issue.
Mark Rutland Aug. 4, 2016, 5:09 p.m. UTC | #32
On Thu, Aug 04, 2016 at 12:32:32PM -0400, Daniel Micay wrote:
> On Thu, 2016-08-04 at 17:10 +0100, Mark Rutland wrote:
> I wasn't talking specifically about perf.

Then this is irrelevant to a discussion about limiting access to the
perf interface.

Hardening drivers in general is a very interesting topic, but it is a
different topic.

> > But please, let's frame the argument to match reality.
> 
> The argument is framed in reality. Stating that it now often takes a
> few hours to find a vulnerability with the unaltered, widely known
> public perf fuzzer is not impressive. It's really an argument for
> claiming that it's a significant security issue.

My claim was not that the mainline code was impressively perfect, but
rather that the vendor code was worse, countering a prior claim
otherwise. Hence, reality.

There is cetainly much that can be done to improve things, if we discuss
that which is actually applicable.

Thanks,
Mark.
Daniel Micay Aug. 4, 2016, 5:36 p.m. UTC | #33
> My claim was not that the mainline code was impressively perfect, but
> rather that the vendor code was worse, countering a prior claim
> otherwise. Hence, reality.

You're arguing with a straw man.

I was responding to a comment about out-of-tree code, not generic
architecture perf drivers vs. alternative versions by SoC vendors.

Qualcomm and other vendors landing their drivers in mainline would be
nice, but it wouldn't make it inherently higher quality. I don't really
see what it has to do with this, which I why I responded...
Mark Rutland Oct. 17, 2016, 1:44 p.m. UTC | #34
Hi,

Attempt to revive discussions below...

On Wed, Jul 27, 2016 at 07:45:46AM -0700, Jeff Vander Stoep wrote:
> When kernel.perf_event_paranoid is set to 3 (or greater), disallow
> all access to performance events by users without CAP_SYS_ADMIN.
> 
> This new level of restriction is intended to reduce the attack
> surface of the kernel. Perf is a valuable tool for developers but
> is generally unnecessary and unused on production systems. Perf may
> open up an attack vector to vulnerable device-specific drivers as
> recently demonstrated in CVE-2016-0805, CVE-2016-0819,
> CVE-2016-0843, CVE-2016-3768, and CVE-2016-3843. This new level of
> restriction allows for a safe default to be set on production systems
> while leaving a simple means for developers to grant access [1].
> 
> This feature is derived from CONFIG_GRKERNSEC_PERF_HARDEN by Brad
> Spengler. It is based on a patch by Ben Hutchings [2]. Ben's patches
> have been modified and split up to address on-list feedback.
> 
> kernel.perf_event_paranoid=3 is the default on both Debian [2] and
> Android [3].

While people weren't particularly happy with this global toggle
approach, my understanding from face-to-face discussions at LSS2016 was
that people were happy with a more scoped restriction (e.g. using
capabilities or some other access control mechanism), but no-one had the
time to work on that.

Does that match everyone's understanding, or am I mistaken?

It's also my understanding that for Android, perf_event_paranoid is
lowered when the user enables developer mode (rather than only when an
external debugger is attached); is that correct?

Thanks,
Mark.
Daniel Micay Oct. 17, 2016, 2:54 p.m. UTC | #35
On Mon, 2016-10-17 at 14:44 +0100, Mark Rutland wrote:
> Hi,
> 
> Attempt to revive discussions below...
> 
> On Wed, Jul 27, 2016 at 07:45:46AM -0700, Jeff Vander Stoep wrote:
> > When kernel.perf_event_paranoid is set to 3 (or greater), disallow
> > all access to performance events by users without CAP_SYS_ADMIN.
> > 
> > This new level of restriction is intended to reduce the attack
> > surface of the kernel. Perf is a valuable tool for developers but
> > is generally unnecessary and unused on production systems. Perf may
> > open up an attack vector to vulnerable device-specific drivers as
> > recently demonstrated in CVE-2016-0805, CVE-2016-0819,
> > CVE-2016-0843, CVE-2016-3768, and CVE-2016-3843. This new level of
> > restriction allows for a safe default to be set on production
> > systems
> > while leaving a simple means for developers to grant access [1].
> > 
> > This feature is derived from CONFIG_GRKERNSEC_PERF_HARDEN by Brad
> > Spengler. It is based on a patch by Ben Hutchings [2]. Ben's patches
> > have been modified and split up to address on-list feedback.
> > 
> > kernel.perf_event_paranoid=3 is the default on both Debian [2] and
> > Android [3].
> 
> While people weren't particularly happy with this global toggle
> approach, my understanding from face-to-face discussions at LSS2016
> was
> that people were happy with a more scoped restriction (e.g. using
> capabilities or some other access control mechanism), but no-one had
> the
> time to work on that.
> 
> Does that match everyone's understanding, or am I mistaken?
> 
> It's also my understanding that for Android, perf_event_paranoid is
> lowered when the user enables developer mode (rather than only when an
> external debugger is attached); is that correct?

It's exposed as a "system property" marked as writable by the shell
user, so the Android Debug Bridge shell can lower it. The debugging
tools learned how to toggle it off automatically when they're used. It
intentionally isn't a persist. prefixed property so the setting also
goes away on reboot.

ADB (incl. the shell user) isn't available until developer mode is
enabled + ADB is toggled on in the developer settings, and then it still
requires whitelisting keys.
Kees Cook Oct. 18, 2016, 8:48 p.m. UTC | #36
On Mon, Oct 17, 2016 at 6:44 AM, Mark Rutland <mark.rutland@arm.com> wrote:
> Hi,
>
> Attempt to revive discussions below...
>
> On Wed, Jul 27, 2016 at 07:45:46AM -0700, Jeff Vander Stoep wrote:
>> When kernel.perf_event_paranoid is set to 3 (or greater), disallow
>> all access to performance events by users without CAP_SYS_ADMIN.
>>
>> This new level of restriction is intended to reduce the attack
>> surface of the kernel. Perf is a valuable tool for developers but
>> is generally unnecessary and unused on production systems. Perf may
>> open up an attack vector to vulnerable device-specific drivers as
>> recently demonstrated in CVE-2016-0805, CVE-2016-0819,
>> CVE-2016-0843, CVE-2016-3768, and CVE-2016-3843. This new level of
>> restriction allows for a safe default to be set on production systems
>> while leaving a simple means for developers to grant access [1].
>>
>> This feature is derived from CONFIG_GRKERNSEC_PERF_HARDEN by Brad
>> Spengler. It is based on a patch by Ben Hutchings [2]. Ben's patches
>> have been modified and split up to address on-list feedback.
>>
>> kernel.perf_event_paranoid=3 is the default on both Debian [2] and
>> Android [3].
>
> While people weren't particularly happy with this global toggle
> approach, my understanding from face-to-face discussions at LSS2016 was
> that people were happy with a more scoped restriction (e.g. using
> capabilities or some other access control mechanism), but no-one had the
> time to work on that.
>
> Does that match everyone's understanding, or am I mistaken?

That's correct: some kind of finer-grain control would be preferred to
the maintainer, but no one has had time to work on it. (The =3 sysctl
setting present in Android, Debian, and Ubuntu satisfies most people.)

-Kees
Daniel Micay Oct. 18, 2016, 9:15 p.m. UTC | #37
On Tue, 2016-10-18 at 13:48 -0700, Kees Cook wrote:
> On Mon, Oct 17, 2016 at 6:44 AM, Mark Rutland <mark.rutland@arm.com>
> wrote:
> > Hi,
> > 
> > Attempt to revive discussions below...
> > 
> > On Wed, Jul 27, 2016 at 07:45:46AM -0700, Jeff Vander Stoep wrote:
> > > When kernel.perf_event_paranoid is set to 3 (or greater), disallow
> > > all access to performance events by users without CAP_SYS_ADMIN.
> > > 
> > > This new level of restriction is intended to reduce the attack
> > > surface of the kernel. Perf is a valuable tool for developers but
> > > is generally unnecessary and unused on production systems. Perf
> > > may
> > > open up an attack vector to vulnerable device-specific drivers as
> > > recently demonstrated in CVE-2016-0805, CVE-2016-0819,
> > > CVE-2016-0843, CVE-2016-3768, and CVE-2016-3843. This new level of
> > > restriction allows for a safe default to be set on production
> > > systems
> > > while leaving a simple means for developers to grant access [1].
> > > 
> > > This feature is derived from CONFIG_GRKERNSEC_PERF_HARDEN by Brad
> > > Spengler. It is based on a patch by Ben Hutchings [2]. Ben's
> > > patches
> > > have been modified and split up to address on-list feedback.
> > > 
> > > kernel.perf_event_paranoid=3 is the default on both Debian [2] and
> > > Android [3].
> > 
> > While people weren't particularly happy with this global toggle
> > approach, my understanding from face-to-face discussions at LSS2016
> > was
> > that people were happy with a more scoped restriction (e.g. using
> > capabilities or some other access control mechanism), but no-one had
> > the
> > time to work on that.
> > 
> > Does that match everyone's understanding, or am I mistaken?
> 
> That's correct: some kind of finer-grain control would be preferred to
> the maintainer, but no one has had time to work on it. (The =3 sysctl
> setting present in Android, Debian, and Ubuntu satisfies most people.)
> 
> -Kees

It's also worth noting that fine-grained control via a scoped mechanism
would likely only be used to implement *more restrictions* on Android,
not to make the feature less aggressive. It's desirable for perf events
to be disabled by default for non-root across the board on Android. The
part that's imperfect is that when a developer uses a profiling tool,
unprivileged usage is automatically enabled across the board until
reboot. Ideally, it would be enabled only for the scope where it's
needed. It would be very tricky to implement though, especially without
adding friction, and it would only have value for protecting devices
being used for development. It really doesn't seem to be worth the
trouble, especially since it doesn't persist on reboot. It's only a
temporary security hole and only for developer devices.
Mark Rutland Oct. 19, 2016, 9:41 a.m. UTC | #38
On Mon, Oct 17, 2016 at 10:54:33AM -0400, Daniel Micay wrote:
> On Mon, 2016-10-17 at 14:44 +0100, Mark Rutland wrote:
> > It's also my understanding that for Android, perf_event_paranoid is
> > lowered when the user enables developer mode (rather than only when an
> > external debugger is attached); is that correct?
> 
> It's exposed as a "system property" marked as writable by the shell
> user, so the Android Debug Bridge shell can lower it. The debugging
> tools learned how to toggle it off automatically when they're used. It
> intentionally isn't a persist. prefixed property so the setting also
> goes away on reboot.
> 
> ADB (incl. the shell user) isn't available until developer mode is
> enabled + ADB is toggled on in the developer settings, and then it still
> requires whitelisting keys.

Ah; so I'd misunderstood somewhat.

I was under the (clearly mistaken) impression that this was lowered when
developer mode was enabled, rather than only when it was both enabled
and ADB was connected, for example.

Thanks for clearing that up!

Thanks,
Mark.
Mark Rutland Oct. 19, 2016, 9:56 a.m. UTC | #39
On Tue, Oct 18, 2016 at 05:15:01PM -0400, Daniel Micay wrote:
> It's also worth noting that fine-grained control via a scoped
> mechanism would likely only be used to implement *more restrictions*
> on Android, not to make the feature less aggressive. It's desirable
> for perf events to be disabled by default for non-root across the
> board on Android.  The part that's imperfect is that when a developer
> uses a profiling tool, unprivileged usage is automatically enabled
> across the board until reboot. Ideally, it would be enabled only for
> the scope where it's needed. 

Sure; understood.

> It would be very tricky to implement though, especially without adding
> friction, and it would only have value for protecting devices being
> used for development. It really doesn't seem to be worth the trouble,
> especially since it doesn't persist on reboot. It's only a temporary
> security hole and only for developer devices.

I can see that for Android this isn't much of a win. It is beneficial
elsewhere, and covers a larger set of use-cases.

If perf were a filesystem object, we'd only allow access by a given
'perf' group, and that would be sufficient to avoid most of that
friction (IIUC). I wonder what we can do that's similar.

Thanks,
Mark.
Peter Zijlstra Oct. 19, 2016, 10:01 a.m. UTC | #40
On Tue, Oct 18, 2016 at 05:15:01PM -0400, Daniel Micay wrote:
> It's also worth noting that fine-grained control via a scoped mechanism
> would likely only be used to implement *more restrictions* on Android,
> not to make the feature less aggressive.

> It's desirable for perf events to be disabled by default for non-root
> across the board on Android.

Right, but this is Android. The knob seems to now also live in Debian
(and derived) distros. And there it is utter crap.

It completely defeats having perf for a fairly large segment of
corporate developers who do not get to have root on their own machines
(which is stupid policy but whatever).

It similarly defeats development of self profiling JITs and whatnot.

A capability would allow people to run perf (or another sanctioned
binary), even though in general they cannot do sys_perf_event_open().
Arnaldo Carvalho de Melo Oct. 19, 2016, 10:26 a.m. UTC | #41
Em Wed, Oct 19, 2016 at 12:01:26PM +0200, Peter Zijlstra escreveu:
> On Tue, Oct 18, 2016 at 05:15:01PM -0400, Daniel Micay wrote:
> > It's also worth noting that fine-grained control via a scoped mechanism
> > would likely only be used to implement *more restrictions* on Android,
> > not to make the feature less aggressive.
 
> > It's desirable for perf events to be disabled by default for non-root
> > across the board on Android.
 
> Right, but this is Android. The knob seems to now also live in Debian
> (and derived) distros. And there it is utter crap.
 
> It completely defeats having perf for a fairly large segment of
> corporate developers who do not get to have root on their own machines
> (which is stupid policy but whatever).
 
> It similarly defeats development of self profiling JITs and whatnot.
 
> A capability would allow people to run perf (or another sanctioned
> binary), even though in general they cannot do sys_perf_event_open().

But self profiling JITs would be useful for non-developers, on Android
(anywhere, really), and for that it would require being able to at
least, well, self profile, using sys_perf_event_open() by a normal
process, limited to profiling itself, no?

This not being possible, self profiling will use some other means, its
like sys_perf_event_open() never existed for them.

- Arnaldo
Peter Zijlstra Oct. 19, 2016, 10:40 a.m. UTC | #42
On Wed, Oct 19, 2016 at 07:26:02AM -0300, Arnaldo Carvalho de Melo wrote:
> Em Wed, Oct 19, 2016 at 12:01:26PM +0200, Peter Zijlstra escreveu:
> > On Tue, Oct 18, 2016 at 05:15:01PM -0400, Daniel Micay wrote:
> > > It's also worth noting that fine-grained control via a scoped mechanism
> > > would likely only be used to implement *more restrictions* on Android,
> > > not to make the feature less aggressive.
>  
> > > It's desirable for perf events to be disabled by default for non-root
> > > across the board on Android.
>  
> > Right, but this is Android. The knob seems to now also live in Debian
> > (and derived) distros. And there it is utter crap.
>  
> > It completely defeats having perf for a fairly large segment of
> > corporate developers who do not get to have root on their own machines
> > (which is stupid policy but whatever).
>  
> > It similarly defeats development of self profiling JITs and whatnot.
>  
> > A capability would allow people to run perf (or another sanctioned
> > binary), even though in general they cannot do sys_perf_event_open().
> 
> But self profiling JITs would be useful for non-developers, on Android
> (anywhere, really), and for that it would require being able to at
> least, well, self profile, using sys_perf_event_open() by a normal
> process, limited to profiling itself, no?
> 
> This not being possible, self profiling will use some other means, its
> like sys_perf_event_open() never existed for them.

Right, so with capabilities, we could grant the binary the capability to
use sys_perf_event_open().

That would still leave developers of said JIT in a tight spot, because
there'd be no way to set the capability on their freshly compiled
binary.

They'd have to be granted the capability to the user, using pam_cap.
Which would involve corp. IT doing something sensible, ergo, this'll
never happen :-(.
Daniel Micay Oct. 19, 2016, 3:16 p.m. UTC | #43
On Wed, 2016-10-19 at 10:41 +0100, Mark Rutland wrote:
> On Mon, Oct 17, 2016 at 10:54:33AM -0400, Daniel Micay wrote:
> > On Mon, 2016-10-17 at 14:44 +0100, Mark Rutland wrote:
> > > It's also my understanding that for Android, perf_event_paranoid
> > > is
> > > lowered when the user enables developer mode (rather than only
> > > when an
> > > external debugger is attached); is that correct?
> > 
> > It's exposed as a "system property" marked as writable by the shell
> > user, so the Android Debug Bridge shell can lower it. The debugging
> > tools learned how to toggle it off automatically when they're used.
> > It
> > intentionally isn't a persist. prefixed property so the setting also
> > goes away on reboot.
> > 
> > ADB (incl. the shell user) isn't available until developer mode is
> > enabled + ADB is toggled on in the developer settings, and then it
> > still
> > requires whitelisting keys.
> 
> Ah; so I'd misunderstood somewhat.
> 
> I was under the (clearly mistaken) impression that this was lowered
> when
> developer mode was enabled, rather than only when it was both enabled
> and ADB was connected, for example.
> 
> Thanks for clearing that up!

ADB provides a shell as the 'shell' user, and that user has the ability
to toggle the sysctl. So profiling tools were able to be taught to do
that automatically. It's the only way that the 'shell' user is actually
exposed. For example, a terminal app just runs in the untrusted_app
SELinux domain as a unique unprivileged uid/gid pair, not as the much
more privileged ADB 'shell' domain.

So it doesn't actually get toggled off you use ADB to do something else.

ADB itself is pretty much comparable to SSH, but over USB (i.e. key-
based way of getting a shell).

The 'shell' user has tools like 'run-as' to be able to run things as
various apps (if they are marked debuggable), so in theory it could be
finer-grained and act only there, for the app being debugged. It would
be really hard to cover all use cases and maybe things other than apps
though (although in an Android 'user' build, the base system itself
isn't very debuggable, you really need 'userdebug' or 'eng' which isn't
what ships on end user devices).
Daniel Micay Oct. 19, 2016, 3:39 p.m. UTC | #44
On Wed, 2016-10-19 at 07:26 -0300, Arnaldo Carvalho de Melo wrote:
> 
> But self profiling JITs would be useful for non-developers, on Android
> (anywhere, really), and for that it would require being able to at
> least, well, self profile, using sys_perf_event_open() by a normal
> process, limited to profiling itself, no?
> 
> This not being possible, self profiling will use some other means, its
> like sys_perf_event_open() never existed for them.
> 
> - Arnaldo

It would defeat the purpose of the security feature if it was exposed to
apps on Android. There are other ways for Chromium (including WebView)
and the ART JIT to profile. AFAIK, they've never actually considered
using perf for their JIT profiling. It wasn't something that anyone
cared about when this was implemented.

Chromium would *certainly* never use perf for this. They use a much
tighter sandbox than the Android app sandbox. It doesn't have system
calls like open(), let alone perf events. Their seccomp-bpf usage is to
reduce kernel attack surface, since it has proven to be the easiest way
out of a sandbox. They don't even allow futex() without filtering based 
on the parameters anymore. That proved to be a major
issue.

The only case where untrusted apps declaring the privileges they need
actually works out is when the privileges are high-level controls over
access to a user's personal information. That way, they can be exposed
to the user with the ability to reject access when it's first needed or
toggle it off later. The strength of the app sandbox isn't something
that end users can be responsible for. Permissions also don't work if
apps bypass them with local privilege escalation vulnerabilities.

Even for the base system, no perf access is better than having it used
anywhere. The difference is only that the base system can actually be
trusted to declare what it needs, but those components can be exploited
so it's still best if they are trusted as little as possible at runtime.

I could see Android completely removing unprivileged access down the
road too, not as a form of access control (apps already run as unique
uid/gid pairs, etc.) but to remove a non-trivial form of kernel attack
surface. The perf API isn't being singled out here. It just happened to
be the first case where a kernel API was restricted to developer usage
on Android. Should expect more of this.

If you want perf exposed, make it secure. Stop writing kernel code in a
memory unsafe language and start using isolation. Not going to happen in
all likelihood, so kernel code will be increasingly walled off instead
since it's the biggest liability on the system. Fixing bugs on a case by
case basis doesn't work for systems that need even basic security.

Patch
diff mbox

diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
index ffab8b5..fac9798 100644
--- a/Documentation/sysctl/kernel.txt
+++ b/Documentation/sysctl/kernel.txt
@@ -665,6 +665,7 @@  users (without CAP_SYS_ADMIN).  The default value is 2.
 >=0: Disallow raw tracepoint access by users without CAP_IOC_LOCK
 >=1: Disallow CPU event access by users without CAP_SYS_ADMIN
 >=2: Disallow kernel profiling by users without CAP_SYS_ADMIN
+>=3: Disallow all event access by users without CAP_SYS_ADMIN
 
 ==============================================================
 
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 8ed43261..1e2080f 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1156,6 +1156,11 @@  static inline bool perf_paranoid_kernel(void)
 	return sysctl_perf_event_paranoid > 1;
 }
 
+static inline bool perf_paranoid_any(void)
+{
+	return sysctl_perf_event_paranoid > 2;
+}
+
 extern void perf_event_init(void);
 extern void perf_tp_event(u16 event_type, u64 count, void *record,
 			  int entry_size, struct pt_regs *regs,
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 356a6c7..52bd100 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -353,6 +353,7 @@  static struct srcu_struct pmus_srcu;
  *   0 - disallow raw tracepoint access for unpriv
  *   1 - disallow cpu events for unpriv
  *   2 - disallow kernel profiling for unpriv
+ *   3 - disallow all unpriv perf event use
  */
 int sysctl_perf_event_paranoid __read_mostly = 2;
 
@@ -9296,6 +9297,9 @@  SYSCALL_DEFINE5(perf_event_open,
 	if (flags & ~PERF_FLAG_ALL)
 		return -EINVAL;
 
+	if (perf_paranoid_any() && !capable(CAP_SYS_ADMIN))
+				return -EACCES;
+
 	err = perf_copy_attr(attr_uptr, &attr);
 	if (err)
 		return err;