Message ID | 1469630746-32279-1-git-send-email-jeffv@google.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Wed, Jul 27, 2016 at 7:45 AM, Jeff Vander Stoep <jeffv@google.com> wrote: > When kernel.perf_event_paranoid is set to 3 (or greater), disallow > all access to performance events by users without CAP_SYS_ADMIN. > > This new level of restriction is intended to reduce the attack > surface of the kernel. Perf is a valuable tool for developers but > is generally unnecessary and unused on production systems. Perf may > open up an attack vector to vulnerable device-specific drivers as > recently demonstrated in CVE-2016-0805, CVE-2016-0819, > CVE-2016-0843, CVE-2016-3768, and CVE-2016-3843. This new level of > restriction allows for a safe default to be set on production systems > while leaving a simple means for developers to grant access [1]. > > This feature is derived from CONFIG_GRKERNSEC_PERF_HARDEN by Brad > Spengler. It is based on a patch by Ben Hutchings [2]. Ben's patches > have been modified and split up to address on-list feedback. > > kernel.perf_event_paranoid=3 is the default on both Debian [2] and > Android [3]. > > [1] Making perf available to developers on Android: > https://android-review.googlesource.com/#/c/234400/ > [2] Original patch by Ben Hutchings: > https://lkml.org/lkml/2016/1/11/587 > [3] https://android-review.googlesource.com/#/c/234743/ > > Signed-off-by: Jeff Vander Stoep <jeffv@google.com> Thanks for splitting this up! It'll be nice to have this delta out of Debian and Android. Reviewed-by: Kees Cook <keescook@chromium.org> -Kees > --- > Documentation/sysctl/kernel.txt | 1 + > include/linux/perf_event.h | 5 +++++ > kernel/events/core.c | 4 ++++ > 3 files changed, 10 insertions(+) > > diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt > index ffab8b5..fac9798 100644 > --- a/Documentation/sysctl/kernel.txt > +++ b/Documentation/sysctl/kernel.txt > @@ -665,6 +665,7 @@ users (without CAP_SYS_ADMIN). The default value is 2. > >=0: Disallow raw tracepoint access by users without CAP_IOC_LOCK > >=1: Disallow CPU event access by users without CAP_SYS_ADMIN > >=2: Disallow kernel profiling by users without CAP_SYS_ADMIN > +>=3: Disallow all event access by users without CAP_SYS_ADMIN > > ============================================================== > > diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h > index 8ed43261..1e2080f 100644 > --- a/include/linux/perf_event.h > +++ b/include/linux/perf_event.h > @@ -1156,6 +1156,11 @@ static inline bool perf_paranoid_kernel(void) > return sysctl_perf_event_paranoid > 1; > } > > +static inline bool perf_paranoid_any(void) > +{ > + return sysctl_perf_event_paranoid > 2; > +} > + > extern void perf_event_init(void); > extern void perf_tp_event(u16 event_type, u64 count, void *record, > int entry_size, struct pt_regs *regs, > diff --git a/kernel/events/core.c b/kernel/events/core.c > index 356a6c7..52bd100 100644 > --- a/kernel/events/core.c > +++ b/kernel/events/core.c > @@ -353,6 +353,7 @@ static struct srcu_struct pmus_srcu; > * 0 - disallow raw tracepoint access for unpriv > * 1 - disallow cpu events for unpriv > * 2 - disallow kernel profiling for unpriv > + * 3 - disallow all unpriv perf event use > */ > int sysctl_perf_event_paranoid __read_mostly = 2; > > @@ -9296,6 +9297,9 @@ SYSCALL_DEFINE5(perf_event_open, > if (flags & ~PERF_FLAG_ALL) > return -EINVAL; > > + if (perf_paranoid_any() && !capable(CAP_SYS_ADMIN)) > + return -EACCES; > + > err = perf_copy_attr(attr_uptr, &attr); > if (err) > return err; > -- > 2.8.0.rc3.226.g39d4020 >
On Wed, Jul 27, 2016 at 07:45:46AM -0700, Jeff Vander Stoep wrote: > When kernel.perf_event_paranoid is set to 3 (or greater), disallow > all access to performance events by users without CAP_SYS_ADMIN. > > This new level of restriction is intended to reduce the attack > surface of the kernel. Perf is a valuable tool for developers but > is generally unnecessary and unused on production systems. Perf may > open up an attack vector to vulnerable device-specific drivers as > recently demonstrated in CVE-2016-0805, CVE-2016-0819, > CVE-2016-0843, CVE-2016-3768, and CVE-2016-3843. We have bugs we fix them, we don't kill complete infrastructure because of them. > This new level of > restriction allows for a safe default to be set on production systems > while leaving a simple means for developers to grant access [1]. So the problem I have with this is that it will completely inhibit development of things like JITs that self-profile to re-compile frequently used code. I would much rather have an LSM hook where the security stuff can do more fine grained control of things. Allowing some apps perf usage while denying others.
Em Tue, Aug 02, 2016 at 11:52:43AM +0200, Peter Zijlstra escreveu: > On Wed, Jul 27, 2016 at 07:45:46AM -0700, Jeff Vander Stoep wrote: > > When kernel.perf_event_paranoid is set to 3 (or greater), disallow > > all access to performance events by users without CAP_SYS_ADMIN. > > This new level of restriction is intended to reduce the attack > > surface of the kernel. Perf is a valuable tool for developers but > > is generally unnecessary and unused on production systems. Perf may > > open up an attack vector to vulnerable device-specific drivers as > > recently demonstrated in CVE-2016-0805, CVE-2016-0819, > > CVE-2016-0843, CVE-2016-3768, and CVE-2016-3843. > We have bugs we fix them, we don't kill complete infrastructure because > of them. > > This new level of > > restriction allows for a safe default to be set on production systems > > while leaving a simple means for developers to grant access [1]. > So the problem I have with this is that it will completely inhibit > development of things like JITs that self-profile to re-compile > frequently used code. Or reimplement strace with sys_perf_event_open(), speeding it up greatly by not using ptrace (see 'perf trace', one such attempt), combining it with sys_bpf(), which can run unpriviledged as well, provides lots of possibilities for efficient tooling that would be greatly stiffled by such big hammer restrictions :-( > I would much rather have an LSM hook where the security stuff can do > more fine grained control of things. Allowing some apps perf usage while > denying others. - Arnaldo
> > So the problem I have with this is that it will completely inhibit > > development of things like JITs that self-profile to re-compile > > frequently used code. > > Or reimplement strace with sys_perf_event_open(), speeding it up > greatly > by not using ptrace (see 'perf trace', one such attempt), combining it > with sys_bpf(), which can run unpriviledged as well, provides lots of > possibilities for efficient tooling that would be greatly stiffled by > such big hammer restrictions :-( The usage on Android wouldn't impact strace. It's a debugging tool used over the debugging shell so it could be taught to toggle on unprivileged access to perf events as the other tools using the API were.
On Tue, 2016-08-02 at 11:52 +0200, Peter Zijlstra wrote: > On Wed, Jul 27, 2016 at 07:45:46AM -0700, Jeff Vander Stoep wrote: > > > > When kernel.perf_event_paranoid is set to 3 (or greater), disallow > > all access to performance events by users without CAP_SYS_ADMIN. > > > > This new level of restriction is intended to reduce the attack > > surface of the kernel. Perf is a valuable tool for developers but > > is generally unnecessary and unused on production systems. Perf may > > open up an attack vector to vulnerable device-specific drivers as > > recently demonstrated in CVE-2016-0805, CVE-2016-0819, > > CVE-2016-0843, CVE-2016-3768, and CVE-2016-3843. > > We have bugs we fix them, we don't kill complete infrastructure > because > > of them. It's still accessible to privileged processes either way. Android still allows access from unprivileged processes but it can only be enabled via the debugging shell, which is not enabled by default either. It isn't even possible to disable the perf events infrastructure via kernel configuration for every architecture right now. You're forcing people to have common local privilege escalation and information leak vulnerabilities for something few people actually use. This patch is now a requirement for any Android devices with a security patch level above August 2016. The only thing that not merging it is going to accomplish is preventing a mainline kernel from ever being used on Android devices, unless you provide an alternative it can use for the same use case. https://source.android.com/security/bulletin/2016-08-01.html > > This new level of > > restriction allows for a safe default to be set on production > > systems > > while leaving a simple means for developers to grant access [1]. > > So the problem I have with this is that it will completely inhibit > development of things like JITs that self-profile to re-compile > frequently used code. > > I would much rather have an LSM hook where the security stuff can do more fine grained control of things. Allowing some apps perf usage while > denying others. If the only need was controlling access per-process statically, then using seccomp-bpf works fine. It needs to be dynamic though. I don't think SELinux could be used to provide the functionality so it would have to be a whole new LSM. I doubt anyone will implement that when the necessary functionality is already available. It's already exposed only for developers using profiling tools until they reboot, so finer grained control wouldn't accomplish much.
On Tue, Aug 2, 2016 at 2:52 AM, Peter Zijlstra <peterz@infradead.org> wrote: > On Wed, Jul 27, 2016 at 07:45:46AM -0700, Jeff Vander Stoep wrote: >> When kernel.perf_event_paranoid is set to 3 (or greater), disallow >> all access to performance events by users without CAP_SYS_ADMIN. >> >> This new level of restriction is intended to reduce the attack >> surface of the kernel. Perf is a valuable tool for developers but >> is generally unnecessary and unused on production systems. Perf may >> open up an attack vector to vulnerable device-specific drivers as >> recently demonstrated in CVE-2016-0805, CVE-2016-0819, >> CVE-2016-0843, CVE-2016-3768, and CVE-2016-3843. > > We have bugs we fix them, we don't kill complete infrastructure because > of them. I understand that point of view, but it isn't what things look like for the average end-user of Linux. The lifetime on bugs is very long, even in upstream (see both Jon Corbet and my talks about this: an average of 5 years from introduction to fix), and gets drawn out even further by vendors with slow (or missing) update processes. Being able to remove attack surface is a fundamental first step of security defense, and things like perf, user namespaces, and similar APIs, expose a lot of attack surface when they are enabled. And the evidence for this attack surface being a real-world risk is in the history of security vulnerabilities (that we know about!) in these various APIs. Now, obviously, these API have huge value, otherwise they wouldn't exist in the first place, and they wouldn't be built into end-user kernels if they were universally undesirable. But that's not the situation: the APIs are needed, but they lack the appropriate knobs to control their availability. And this isn't just about Android: regular distro kernels (like Debian, who also uses this patch) tend to build in everything so people can use whatever they want. But for admins that want to reduce their systems' attack surface, there needs to be ways to disable things like this. >> This new level of >> restriction allows for a safe default to be set on production systems >> while leaving a simple means for developers to grant access [1]. > > So the problem I have with this is that it will completely inhibit > development of things like JITs that self-profile to re-compile > frequently used code. This is a good example of a use-case where this knob would be turned down. But for many many other use-cases, when presented with a pre-built kernel, there isn't a way to remove the attack surface. > I would much rather have an LSM hook where the security stuff can do > more fine grained control of things. Allowing some apps perf usage while > denying others. I'm not against an LSM, but I think it's needless complexity when there is already a knob for this but it just doesn't go "high" enough. :) -Kees
On Tue, Aug 02, 2016 at 12:04:34PM -0700, Kees Cook wrote: > Now, obviously, these API have huge value, otherwise they wouldn't > exist in the first place, and they wouldn't be built into end-user > kernels if they were universally undesirable. But that's not the > situation: the APIs are needed, but they lack the appropriate knobs to > control their availability. So far so good, but I take exception with the suggestion that the proposed knob is appropriate. > And this isn't just about Android: regular > distro kernels (like Debian, who also uses this patch) tend to build > in everything so people can use whatever they want. But for admins > that want to reduce their systems' attack surface, there needs to be > ways to disable things like this. And here I think you're overestimating the knowledge of most admins. > > So the problem I have with this is that it will completely inhibit > > development of things like JITs that self-profile to re-compile > > frequently used code. > > This is a good example of a use-case where this knob would be turned > down. But for many many other use-cases, when presented with a > pre-built kernel, there isn't a way to remove the attack surface. No, quite the opposite. Having this knob will completely inhibit development of such applications. Worse it will probably render perf dead for quite a large body of developers. The moment you frame it like: perf or sekjurity, and even default to no-perf-because-sekjurity, a whole bunch of corporate IT departments will not enable this, even for their developers. Have you never had to 'root' your work machine to get work done? I have. Luckily this was pre-secure-boot times so it was trivial since I had physical access to the machine. But it still sucked I had to fight IT over mostly 'trivial' crap. > > I would much rather have an LSM hook where the security stuff can do > > more fine grained control of things. Allowing some apps perf usage while > > denying others. > > I'm not against an LSM, but I think it's needless complexity when > there is already a knob for this but it just doesn't go "high" enough. > :) So what will you to the moment the Google Dalvik guys come to you and say: "Hey, we want to do active profiling to do better on-line code generation?". I see 0 up-sides of this approach and, as per the above, a whole bunch of very serious downsides. A global (esp. default inhibited) knob is too coarse and limiting.
On Tue, Aug 2, 2016 at 1:30 PM, Peter Zijlstra <peterz@infradead.org> wrote: > On Tue, Aug 02, 2016 at 12:04:34PM -0700, Kees Cook wrote: > >> Now, obviously, these API have huge value, otherwise they wouldn't >> exist in the first place, and they wouldn't be built into end-user >> kernels if they were universally undesirable. But that's not the >> situation: the APIs are needed, but they lack the appropriate knobs to >> control their availability. > > So far so good, but I take exception with the suggestion that the > proposed knob is appropriate. > >> And this isn't just about Android: regular >> distro kernels (like Debian, who also uses this patch) tend to build >> in everything so people can use whatever they want. But for admins >> that want to reduce their systems' attack surface, there needs to be >> ways to disable things like this. > > And here I think you're overestimating the knowledge of most admins. Well, I suspect that's why both Android and Debian default this to off right now. :) But, regardless, I think it's weird to block admins who DO understand about attack surface from being able to limit it. >> > So the problem I have with this is that it will completely inhibit >> > development of things like JITs that self-profile to re-compile >> > frequently used code. >> >> This is a good example of a use-case where this knob would be turned >> down. But for many many other use-cases, when presented with a >> pre-built kernel, there isn't a way to remove the attack surface. > > No, quite the opposite. Having this knob will completely inhibit > development of such applications. Worse it will probably render perf > dead for quite a large body of developers. I hear what you're saying, but I think there's a few problems: the proposed self-profiling JIT doesn't exist (so it's pointless to discuss), and the number of developers that don't have access to their own system is impossibly tiny when compared to the hundreds of millions of end-users for whom perf is not needed. > The moment you frame it like: perf or sekjurity, and even default to > no-perf-because-sekjurity, a whole bunch of corporate IT departments > will not enable this, even for their developers. It's not "vs security", it's a risk analysis of attack surface. The vast majority of people running Linux do not use perf (right now). I've never suggested it be default disabled: I'm wanting to upstream the sysctl setting that is already in use on distros where the distro kernel teams have deemed this is needed knob for their end-users. All of the objections you're talking about assume that the knob doesn't exist, but it does already. It's just not in upstream. :) > Have you never had to 'root' your work machine to get work done? I have. > Luckily this was pre-secure-boot times so it was trivial since I had > physical access to the machine. But it still sucked I had to fight IT > over mostly 'trivial' crap. Yeah, I've had to do similar. Frankly most people use VMs or non-corp hardware for doing development work, so I don't think this is a useful comparison. > >> > I would much rather have an LSM hook where the security stuff can do >> > more fine grained control of things. Allowing some apps perf usage while >> > denying others. >> >> I'm not against an LSM, but I think it's needless complexity when >> there is already a knob for this but it just doesn't go "high" enough. >> :) > > So what will you to the moment the Google Dalvik guys come to you and > say: "Hey, we want to do active profiling to do better on-line code > generation?". That hasn't happened yet. When it does, we can revisit this. But right now, today, there is a need for this knob. > I see 0 up-sides of this approach and, as per the above, a whole bunch > of very serious downsides. > > A global (esp. default inhibited) knob is too coarse and limiting. I haven't suggested it be default inhibit in the upstream Kconfig. And having this knob already with the 0, 1, and 2 settings seems incomplete to me without this highest level of restriction that 3 would provide. That seems rather arbitrary to me. :) Let me take this another way instead. What would be a better way to provide a mechanism for system owners to disable perf without an LSM? (Since far fewer folks run with an enforcing "big" LSM: I'm seeking as wide a coverage as possible.) -Kees
Far from trying to kill perf, we want (and require) perf to be available to developers on Android. All that this patch enables us to do is gate it behind developer settings - just like we do with other developer targeted features. On Tue, Aug 2, 2016 at 1:51 PM Kees Cook <keescook@chromium.org> wrote: > On Tue, Aug 2, 2016 at 1:30 PM, Peter Zijlstra <peterz@infradead.org> > wrote: > > On Tue, Aug 02, 2016 at 12:04:34PM -0700, Kees Cook wrote: > > > >> Now, obviously, these API have huge value, otherwise they wouldn't > >> exist in the first place, and they wouldn't be built into end-user > >> kernels if they were universally undesirable. But that's not the > >> situation: the APIs are needed, but they lack the appropriate knobs to > >> control their availability. > > > > So far so good, but I take exception with the suggestion that the > > proposed knob is appropriate. > > > >> And this isn't just about Android: regular > >> distro kernels (like Debian, who also uses this patch) tend to build > >> in everything so people can use whatever they want. But for admins > >> that want to reduce their systems' attack surface, there needs to be > >> ways to disable things like this. > > > > And here I think you're overestimating the knowledge of most admins. > > Well, I suspect that's why both Android and Debian default this to off > right now. :) But, regardless, I think it's weird to block admins who > DO understand about attack surface from being able to limit it. > > >> > So the problem I have with this is that it will completely inhibit > >> > development of things like JITs that self-profile to re-compile > >> > frequently used code. > >> > >> This is a good example of a use-case where this knob would be turned > >> down. But for many many other use-cases, when presented with a > >> pre-built kernel, there isn't a way to remove the attack surface. > > > > No, quite the opposite. Having this knob will completely inhibit > > development of such applications. Worse it will probably render perf > > dead for quite a large body of developers. > > I hear what you're saying, but I think there's a few problems: the > proposed self-profiling JIT doesn't exist (so it's pointless to > discuss), and the number of developers that don't have access to their > own system is impossibly tiny when compared to the hundreds of > millions of end-users for whom perf is not needed. > > > The moment you frame it like: perf or sekjurity, and even default to > > no-perf-because-sekjurity, a whole bunch of corporate IT departments > > will not enable this, even for their developers. > > It's not "vs security", it's a risk analysis of attack surface. The > vast majority of people running Linux do not use perf (right now). > I've never suggested it be default disabled: I'm wanting to upstream > the sysctl setting that is already in use on distros where the distro > kernel teams have deemed this is needed knob for their end-users. All > of the objections you're talking about assume that the knob doesn't > exist, but it does already. It's just not in upstream. :) > > > Have you never had to 'root' your work machine to get work done? I have. > > Luckily this was pre-secure-boot times so it was trivial since I had > > physical access to the machine. But it still sucked I had to fight IT > > over mostly 'trivial' crap. > > Yeah, I've had to do similar. Frankly most people use VMs or non-corp > hardware for doing development work, so I don't think this is a useful > comparison. > > > > >> > I would much rather have an LSM hook where the security stuff can do > >> > more fine grained control of things. Allowing some apps perf usage > while > >> > denying others. > >> > >> I'm not against an LSM, but I think it's needless complexity when > >> there is already a knob for this but it just doesn't go "high" enough. > >> :) > > > > So what will you to the moment the Google Dalvik guys come to you and > > say: "Hey, we want to do active profiling to do better on-line code > > generation?". > > That hasn't happened yet. When it does, we can revisit this. But right > now, today, there is a need for this knob. > > > I see 0 up-sides of this approach and, as per the above, a whole bunch > > of very serious downsides. > > > > A global (esp. default inhibited) knob is too coarse and limiting. > > I haven't suggested it be default inhibit in the upstream Kconfig. And > having this knob already with the 0, 1, and 2 settings seems > incomplete to me without this highest level of restriction that 3 > would provide. That seems rather arbitrary to me. :) > > Let me take this another way instead. What would be a better way to > provide a mechanism for system owners to disable perf without an LSM? > (Since far fewer folks run with an enforcing "big" LSM: I'm seeking as > wide a coverage as possible.) > > -Kees > > -- > Kees Cook > Chrome OS & Brillo Security >
Far from trying to kill perf, we want (and require) perf to be available to developers on Android. All that this patch enables us to do is gate it behind developer settings - just like we do with other developer targeted features. (apologies for the dup, bounced due to non-plaintext) On Tue, Aug 2, 2016 at 1:30 PM, Peter Zijlstra <peterz@infradead.org> wrote: > On Tue, Aug 02, 2016 at 12:04:34PM -0700, Kees Cook wrote: > >> Now, obviously, these API have huge value, otherwise they wouldn't >> exist in the first place, and they wouldn't be built into end-user >> kernels if they were universally undesirable. But that's not the >> situation: the APIs are needed, but they lack the appropriate knobs to >> control their availability. > > So far so good, but I take exception with the suggestion that the > proposed knob is appropriate. > >> And this isn't just about Android: regular >> distro kernels (like Debian, who also uses this patch) tend to build >> in everything so people can use whatever they want. But for admins >> that want to reduce their systems' attack surface, there needs to be >> ways to disable things like this. > > And here I think you're overestimating the knowledge of most admins. > >> > So the problem I have with this is that it will completely inhibit >> > development of things like JITs that self-profile to re-compile >> > frequently used code. >> >> This is a good example of a use-case where this knob would be turned >> down. But for many many other use-cases, when presented with a >> pre-built kernel, there isn't a way to remove the attack surface. > > No, quite the opposite. Having this knob will completely inhibit > development of such applications. Worse it will probably render perf > dead for quite a large body of developers. > > The moment you frame it like: perf or sekjurity, and even default to > no-perf-because-sekjurity, a whole bunch of corporate IT departments > will not enable this, even for their developers. > > Have you never had to 'root' your work machine to get work done? I have. > Luckily this was pre-secure-boot times so it was trivial since I had > physical access to the machine. But it still sucked I had to fight IT > over mostly 'trivial' crap. > >> > I would much rather have an LSM hook where the security stuff can do >> > more fine grained control of things. Allowing some apps perf usage while >> > denying others. >> >> I'm not against an LSM, but I think it's needless complexity when >> there is already a knob for this but it just doesn't go "high" enough. >> :) > > So what will you to the moment the Google Dalvik guys come to you and > say: "Hey, we want to do active profiling to do better on-line code > generation?". > > I see 0 up-sides of this approach and, as per the above, a whole bunch > of very serious downsides. > > A global (esp. default inhibited) knob is too coarse and limiting. > >
* Kees Cook <keescook@chromium.org> wrote: > > I see 0 up-sides of this approach and, as per the above, a whole bunch of very > > serious downsides. > > > > A global (esp. default inhibited) knob is too coarse and limiting. > > I haven't suggested it be default inhibit in the upstream Kconfig. And > having this knob already with the 0, 1, and 2 settings seems > incomplete to me without this highest level of restriction that 3 > would provide. That seems rather arbitrary to me. :) The default has no impact on the "it's too coarse and limiting" negative property of this patch, which is the show-stopper aspect. Please fix that aspect instead of trying to argue around it. This isn't some narrow debugging mechanism we can turn on/off globally and forget about, this is a wide scope performance measurement and event logging infrastructure that is being utilized not just by developers but by apps and runtimes as well. > Let me take this another way instead. What would be a better way to provide a > mechanism for system owners to disable perf without an LSM? (Since far fewer > folks run with an enforcing "big" LSM: I'm seeking as wide a coverage as > possible.) Because in practice what will happen is that if the only option is to do something drastic for sekjurity, IT departments will do it - while if there's a more flexible mechanism that does not throw out the baby with the bath water that is going to be used. This is as if 20 years ago you had submitted a patch to the early Linux TCP/IP networking code to be on/off via a global sysctl switch and told people that "in developer mode you can have networking, talk to your admin". We'd have told you: "this switch is too coarse and limiting, please implement something better, like a list of routes which defines which IP ranges are accessible, and a privileged range of listen sockets ports and some flexible kernel side filtering mechanism to inhibit outgoing/incoming connections". Global sysctls are way too coarse. Thanks, Ingo
> The default has no impact on the "it's too coarse and limiting" > negative property > of this patch, which is the show-stopper aspect. Please fix that > aspect instead of > trying to argue around it. Disabling perf events in the kernel configuration is even more limiting, and is currently the alternative to doing it this way. It makes sense for debugging features to be disabled in production releases of products unless there's a way to control them where risk isn't increased. Having a way to toggle it dynamically will allow it to be remain available. > This isn't some narrow debugging mechanism we can turn on/off globally > and forget > about, this is a wide scope performance measurement and event logging > infrastructure that is being utilized not just by developers but by > apps and > runtimes as well. The incredibly wide scope is why it's such a big security problem. If it wasn't such a frequent source of vulnerabilities, it wouldn't have been disabled for unprivileged users in grsecurity, Debian and then Android. It's extended by lots of vendor code to specific to platforms too, so it isn't just some core kernel code that's properly reviewed. I don't think there are runtimes using this for JIT tracing. Perhaps it doesn't actually suit their needs. It's a theoretical use case. > > Let me take this another way instead. What would be a better way to > > provide a > > mechanism for system owners to disable perf without an LSM? (Since > > far fewer > > folks run with an enforcing "big" LSM: I'm seeking as wide a > > coverage as > > possible.) > > Because in practice what will happen is that if the only option is to > do something > drastic for sekjurity, IT departments will do it - while if there's a > more > flexible mechanism that does not throw out the baby with the bath > water that is > going to be used. It's already done: Android and Debian disable it for unprivileged users by default. It's already done for most of desktop and mobile Linux and perhaps even most servers. Not providing functionality desired by downstream doesn't mean it won't be provided there. They'll keep doing it whether or not this lands. If it doesn't land, it will only mean that mainline kernels aren't usable for making Android devices. They need to pass the compatibility test suite, which means having this. The mechanism could change but I don't see why the actual requirement would. > This is as if 20 years ago you had submitted a patch to the early > Linux TCP/IP > networking code to be on/off via a global sysctl switch and told > people that > "in developer mode you can have networking, talk to your admin". > > We'd have told you: "this switch is too coarse and limiting, please > implement > something better, like a list of routes which defines which IP ranges > are > accessible, and a privileged range of listen sockets ports and some > flexible > kernel side filtering mechanism to inhibit outgoing/incoming > connections". > > Global sysctls are way too coarse. The end result of submitting an LSM hook instead of using this would probably come down to having a global sysctl toggle in Yama. There would be two sysctl controls for perf restrictions rather than one which is needless complexity for the interface. Android and Debian would be using a fine-grained perf LSM to accomplish the same thing: globally disabling it for unprivileged users when not doing development. The nice thing about fine-grained control would be implementing a *more* restrictive policy for the case when it's toggled on. It wouldn't necessarily have to be globally enabled, only enabled for the relevant user or even a specific process, but that would be complicated to implement.
Having this in Yama would also make it probable that there would be a security-centric default. It would end up wiping out unprivileged perf events access on distributions using Yama for ptrace_scope unless they make the explicit decision to disable it. Having the perf subsystem extend the existing perf_event_paranoid sysctl leaves the control over the upstream default in the hands of the perf subsystem, not LSMs.
On Wed, Aug 03, 2016 at 08:28:10AM -0400, Daniel Micay wrote: > I don't think there are runtimes using this for JIT tracing. Perhaps it > doesn't actually suit their needs. It's a theoretical use case. I know there are compiler teams using perf for FDO, see for example: https://gcc.gnu.org/wiki/AutoFDO/Tutorial LLVM also has AutoFDO support AFAIU. There is no reason JITs could not also do this, and IIRC there's JITs build on top of LLVM, so it shouldn't be too hard to imagine an AutoFDO enabled JIT.
On Tue, Aug 02, 2016 at 01:51:47PM -0700, Kees Cook wrote: > Let me take this another way instead. What would be a better way to > provide a mechanism for system owners to disable perf without an LSM? > (Since far fewer folks run with an enforcing "big" LSM: I'm seeking as > wide a coverage as possible.) Could something like a new capability bit work? I'm thinking that applications that have network connections already drop all possible capabilities (I know, unlikely to be true, but should be true for most stuff I hope). This would disable perf for remote code execution exploits, including web-browsers and the lot. It would keep perf working for local stuff by default, although obviously with pam_cap you can limit this when and where needed. For Android this could mean the JVM explicitly dropping the cap for its 'children' while retaining the use itself. And this would also keep perf working on the ADB shell stuff. And, I think this would allow a JIT executable to gain the cap using file caps, even when the user using it doesn't have it, which would keep things usable even in restricted environments. Or am I misunderstanding capabilities -- which is entirely possible.
> -----Original Message----- > From: Peter Zijlstra [mailto:peterz@infradead.org] > Sent: Wednesday, August 03, 2016 7:41 AM > To: Kees Cook <keescook@chromium.org> > Cc: Jeff Vander Stoep <jeffv@google.com>; Ingo Molnar <mingo@redhat.com>; > Arnaldo Carvalho de Melo <acme@kernel.org>; Alexander Shishkin > <alexander.shishkin@linux.intel.com>; linux-doc@vger.kernel.org; kernel- > hardening@lists.openwall.com; LKML <linux-kernel@vger.kernel.org>; > Jonathan Corbet <corbet@lwn.net>; Eric W. Biederman > <ebiederm@xmission.com> > Subject: Re: [kernel-hardening] Re: [PATCH 1/2] security, perf: allow further > restriction of perf_event_open > > On Tue, Aug 02, 2016 at 01:51:47PM -0700, Kees Cook wrote: > > Let me take this another way instead. What would be a better way to > > provide a mechanism for system owners to disable perf without an LSM? > > (Since far fewer folks run with an enforcing "big" LSM: I'm seeking as > > wide a coverage as possible.) > > Could something like a new capability bit work? Yes, it could, but we don't have a real good experience with the advanced use of capabilities. The big distros and the "OS"s have yet to adopt them seriously. It's the same sort of issue, really. The granularity gets to you. > I'm thinking that applications that have network connections already > drop all possible capabilities (I know, unlikely to be true, but should > be true for most stuff I hope). This would disable perf for remote code > execution exploits, including web-browsers and the lot. > > It would keep perf working for local stuff by default, although > obviously with pam_cap you can limit this when and where needed. > > For Android this could mean the JVM explicitly dropping the cap for its > 'children' while retaining the use itself. And this would also keep perf > working on the ADB shell stuff. > > > And, I think this would allow a JIT executable to gain the cap using > file caps, even when the user using it doesn't have it, which would keep > things usable even in restricted environments. > > > Or am I misunderstanding capabilities -- which is entirely possible.
Sigh. Kees we have already had this conversation about user namespaces and apparently you missed the point. As I have said before the problem with a system wide off switch is what happens when you have a single application that needs to use the feature. Without care your system wide protection disappears. That is very brittle design. What I see as much more palatable is a design that allows for features to be turned off in sandboxes. So please if you are going to worry about disabling large swaths of the kernel to reduce the attack surface please come up with designs that are not brittle in allowing users to use a feature nor are they brittle in keeping the feature disabled where you want it disabled. One of the strengths of linux is applications of features the authors of the software had not imagined. Your proposals seem to be trying to put the world a tiny little box where if someone had not imagined and preapproved a use of a feature it should not happen. Let's please avoid implementing totalitarianism to avoid malicious code exploiting bugs in the kernel. I am not interested in that future. Especially when dealing with disabling code to reduce attack surface, when then are no known attacks what we are actually dealing with is a small percentage probability reduction that a malicious attacker will be able to exploit the attack. Remember security is as much about availability as it is about integrity. You keep imagining features that are great big denial of service attacks on legitimate users. Kees Cook <keescook@chromium.org> writes: > On Tue, Aug 2, 2016 at 1:30 PM, Peter Zijlstra <peterz@infradead.org> wrote: > Let me take this another way instead. What would be a better way to > provide a mechanism for system owners to disable perf without an LSM? > (Since far fewer folks run with an enforcing "big" LSM: I'm seeking as > wide a coverage as possible.) I vote for sandboxes. Perhaps seccomp. Perhaps a per userns sysctl. Perhaps something else. Eric
On Wed, Aug 3, 2016 at 10:25 AM, Eric W. Biederman <ebiederm@xmission.com> wrote: > > Sigh. > > Kees we have already had this conversation about user namespaces and > apparently you missed the point. Well, I didn't miss the point: that's why I CCed you. :) This is nearly the same discussion (though in this case there is already a sysctl, hence this thread). > As I have said before the problem with a system wide off switch is what > happens when you have a single application that needs to use the > feature. Without care your system wide protection disappears. > That is very brittle design. Yeah, the use-cases tend to be "never", "always", "this process", and "this user". The way Android would like to be handling the permission, though, gets back to another thing we talked about briefly when discussing userns: revocation. Android wants to turn the entire permission on and off across the entire system, so things like capabilities or privileged fds or something don't quite fit that either. > What I see as much more palatable is a design that allows for > features to be turned off in sandboxes. > > So please if you are going to worry about disabling large swaths of > the kernel to reduce the attack surface please come up with designs > that are not brittle in allowing users to use a feature nor are they > brittle in keeping the feature disabled where you want it disabled. Yup, Linus asked for this too. I haven't come up with anything particularly great yet. :P > One of the strengths of linux is applications of features the authors of > the software had not imagined. Your proposals seem to be trying to put > the world a tiny little box where if someone had not imagined and > preapproved a use of a feature it should not happen. Let's please > avoid implementing totalitarianism to avoid malicious code exploiting > bugs in the kernel. I am not interested in that future. To be clear: I'm interested in giving system owners greater control over what's exposed. That's not about limiting access everywhere. And I'm interested in making sure that the upstream kernel actually provides what end-users want. In the most extreme version of this is when distros carry kernel patches to get it done (this was true with userns and is true again here with perf). This IS a desired features, and it exists in the world. I want to avoid the confusion that arises from people running patched kernels: upstream developers don't realize what state their features are in when they reach end users, documentation doesn't match, etc etc. However, I do accept that the existing mechanisms that have served Linux over the years (sysctls, capabilities, etc) are woefully insufficient in many of these cases. > Especially when dealing with disabling code to reduce attack surface, > when then are no known attacks what we are actually dealing with > is a small percentage probability reduction that a malicious attacker > will be able to exploit the attack. But there are bugs. We just haven't found them. Based on the past research by Jon Corbet (and again recently by me) we're sitting on at least 1 new critical bug that we won't find for the next 5 years (and we've got at least 1 more each that are on average 1, 2, 3, and 4 years old that we haven't found yet either). > Remember security is as much about availability as it is about > integrity. You keep imagining features that are great big denial of > service attacks on legitimate users. That's not my goal: legitimate users should have access. That's up to system owners. But I'd like to provide ways for system owners to keep illegitimate users from having access. :) > Kees Cook <keescook@chromium.org> writes: > >> On Tue, Aug 2, 2016 at 1:30 PM, Peter Zijlstra <peterz@infradead.org> wrote: >> Let me take this another way instead. What would be a better way to >> provide a mechanism for system owners to disable perf without an LSM? >> (Since far fewer folks run with an enforcing "big" LSM: I'm seeking as >> wide a coverage as possible.) > > I vote for sandboxes. Perhaps seccomp. Perhaps a per userns sysctl. > Perhaps something else. Peter, did you happen to see Eric's solution to this problem for namespaces? Basically, a per-userns sysctl instead of a global sysctl. Is that something that would be acceptable here? -Kees
> One of the strengths of linux is applications of features the authors > of > the software had not imagined. Your proposals seem to be trying to > put > the world a tiny little box where if someone had not imagined and > preapproved a use of a feature it should not happen. Let's please > avoid implementing totalitarianism to avoid malicious code exploiting > bugs in the kernel. I am not interested in that future. You're describing operating systems like Android, ChromeOS and iOS. That future is already here and the Linux kernel is the major weak point in the attempts to build those systems based on Linux. Even for the very restricted Chrome sandbox, it's the easiest way out. Android similarly allows near zero access to /sys for apps and little access to /proc beyond the /proc/PID directories belonging to an app. > Especially when dealing with disabling code to reduce attack surface, > when then are no known attacks what we are actually dealing with > is a small percentage probability reduction that a malicious attacker > will be able to exploit the attack. There are perf events vulnerabilities being exploited in the wild to gain root on Android. It's not a theoretical attack vector. They're used in both malware and rooting tools. Local privilege escalation bugs in the kernel are common so there are a lot of alternatives but it's one of the major sources for vulnerabilities. There's a lot of architecture and vendor specific perf events code and lots of bleeding edge features. On Android, a lot of the perf events vulnerabilities have been specific to the Qualcomm SoC platform. Other platforms are likely just receiving a lot less attention. > Remember security is as much about availability as it is about > integrity. You keep imagining features that are great big denial of > service attacks on legitimate users. Only developers care about perf events and they still have access to it. JIT compilers have other ways to do tracing and even if they consider this to be the ideal API, it's not particularly important if they have to settle for something else. In reality, it's a small compromise. > I vote for sandboxes. Perhaps seccomp. Perhaps a per userns sysctl. > Perhaps something else. It's not possible to use the current incarnation of seccomp for this since it can't be dynamically granted/revoked. Perhaps it would be possible to support adding/removing or at least toggling seccomp filters for groups of processes. That would be good enough to take care of user ns, ptrace, perf events, etc.
On Wed, Aug 03, 2016 at 11:53:41AM -0700, Kees Cook wrote: > > Kees Cook <keescook@chromium.org> writes: > > > >> On Tue, Aug 2, 2016 at 1:30 PM, Peter Zijlstra <peterz@infradead.org> wrote: > >> Let me take this another way instead. What would be a better way to > >> provide a mechanism for system owners to disable perf without an LSM? > >> (Since far fewer folks run with an enforcing "big" LSM: I'm seeking as > >> wide a coverage as possible.) > > > > I vote for sandboxes. Perhaps seccomp. Perhaps a per userns sysctl. > > Perhaps something else. > > Peter, did you happen to see Eric's solution to this problem for > namespaces? Basically, a per-userns sysctl instead of a global sysctl. > Is that something that would be acceptable here? Someone would have to educate me on what a userns is and how that would help here.
Peter Zijlstra <peterz@infradead.org> writes: > On Wed, Aug 03, 2016 at 11:53:41AM -0700, Kees Cook wrote: >> > Kees Cook <keescook@chromium.org> writes: >> > >> >> On Tue, Aug 2, 2016 at 1:30 PM, Peter Zijlstra <peterz@infradead.org> wrote: >> >> Let me take this another way instead. What would be a better way to >> >> provide a mechanism for system owners to disable perf without an LSM? >> >> (Since far fewer folks run with an enforcing "big" LSM: I'm seeking as >> >> wide a coverage as possible.) >> > >> > I vote for sandboxes. Perhaps seccomp. Perhaps a per userns sysctl. >> > Perhaps something else. >> >> Peter, did you happen to see Eric's solution to this problem for >> namespaces? Basically, a per-userns sysctl instead of a global sysctl. >> Is that something that would be acceptable here? > > Someone would have to educate me on what a userns is and how that would > help here. userns is an abbreviation for user namespace. How it might help is that it is an easy unescapable context for processes. Essentialy the idea is to limit the scope of the sysctl to a container. User namespaces run into flack because while tremendously simple in themselves the code takes advantage of the fact that suid root executables in a user namespace do not have privileges on anything outside of the user namespace. Which means that it is semantically safe to allow operations like creating mount namespaces, mount filesystems, creating network namespaces, manipulating the network stack etc. All of which allows unprivileged users (that can create network namespaces) to exercise more kernel code and exercise those bugs. Fundamentally user namespaces as objects you can create need limits on the maximum number of user namespaces you can create to cawtch run away processes. Set the limit you can create to 0 and you get what Kees wants. In my pending patches that were not quite ready for the merge window, I added a sysctl that described the maximum number of user namepaces that could be created (default value threads-max), and implemented the sysctl in a per user way. Such that counts and limits were kept for every user namespace. In a nested user namespace (which are all of them except for the initial user namspace) the count and limit would be checked in the current user namepsace, then the count would be incremented in the parent and verified the count was below the limit in the parent user namespace. What this means in practice is user namespaces can be enabled by default on a system, and yet you can easily disable them in a sandbox that was built with a user namespace. I named the new sysctls in my patch: /proc/sys/userns/max_user_namespaces /proc/sys/userns/max_pid_namespaces /proc/sys/userns/max_net_namespaces /proc/sys/userns/max_uts_namespaces /proc/sys/userns/max_ipc_namespaces /proc/sys/userns/max_cgroup_namespaces /proc/sys/userns/max_mnt_namespaces What Kees was suggesting was to add a similar sysctl say: /proc/sys/userns/perf_event_enabled And have the ability to disable perf events in each user namespaces. While still being able to leave usage perf events enabled by default. I don't know if any of that is a good fit for perf events. For purposes of this discussion I assume we are limiting ourselves to discussing userspace tracing, which semantically is 100% fine for access by userspace. Eric
On Wed, Aug 03, 2016 at 09:50:37PM -0500, Eric W. Biederman wrote: > What this means in practice is user namespaces can be enabled by default > on a system, and yet you can easily disable them in a sandbox that was > built with a user namespace. > > I named the new sysctls in my patch: > /proc/sys/userns/max_user_namespaces > /proc/sys/userns/max_pid_namespaces > /proc/sys/userns/max_net_namespaces > /proc/sys/userns/max_uts_namespaces > /proc/sys/userns/max_ipc_namespaces > /proc/sys/userns/max_cgroup_namespaces > /proc/sys/userns/max_mnt_namespaces > > What Kees was suggesting was to add a similar sysctl say: > /proc/sys/userns/perf_event_enabled > > And have the ability to disable perf events in each user namespaces. > While still being able to leave usage perf events enabled by default. > > I don't know if any of that is a good fit for perf events. > > For purposes of this discussion I assume we are limiting ourselves to > discussing userspace tracing, which semantically is 100% fine for > access by userspace. Right, so its basically a 'root' namespace. Not sure how this would help, or cover the use-cases with perf through. Do they really only care about the sandbox? I can imagine this being sufficient for Android as that could do these userns thingies for each app or whatnot. But does this cover the case Debian disabled perf for? I'm not sure I've ever seen it described _why_ they did it. So far I'm still liking the new capability bit better, assuming I understood those right.
On Wed, Aug 03, 2016 at 03:36:16PM -0400, Daniel Micay wrote: > There's a lot of architecture and vendor specific perf events code and > lots of bleeding edge features. On Android, a lot of the perf events > vulnerabilities have been specific to the Qualcomm SoC platform. Other > platforms are likely just receiving a lot less attention. Are the relevant perf drivers for those platforms upstream? I've seen no patches addressing security issues in the ARMv7 krait+Scorpion PMU driver since it was added, and there's no ARMv8 QCOM PMU driver. If there are outstanding issues, please report them upstream. FWIW, I've used Vince Weaver's perf fuzzer to test the ARM PMU code (both the framework and drivers), so other platforms are seeing some attention. That said, I haven't done that recently. Thanks, Mark.
On Thu, 2016-08-04 at 11:28 +0100, Mark Rutland wrote: > On Wed, Aug 03, 2016 at 03:36:16PM -0400, Daniel Micay wrote: > > > > There's a lot of architecture and vendor specific perf events code > > and > > lots of bleeding edge features. On Android, a lot of the perf events > > vulnerabilities have been specific to the Qualcomm SoC platform. > > Other > > platforms are likely just receiving a lot less attention. > > Are the relevant perf drivers for those platforms upstream? I've seen > no > patches addressing security issues in the ARMv7 krait+Scorpion PMU > driver since it was added, and there's no ARMv8 QCOM PMU driver. > > If there are outstanding issues, please report them upstream. > > FWIW, I've used Vince Weaver's perf fuzzer to test the ARM PMU code > (both the framework and drivers), so other platforms are seeing some > attention. That said, I haven't done that recently. Qualcomm's perf driver is out-of-tree along with most of their other drivers. Their drivers add up to a LOT of code shared across over a billion mobile devices, leading to the focus on them. It also helps that there are bounties for Nexus devices, so there are multi thousand dollar rewards for bugs in the Qualcomm drivers compared to nothing for other platforms / drivers. Now that perf is only available via ADB debugging, further perf bugs no longer technically qualify for their bounties (but they might still pay, I don't know).
On Thu, Aug 04, 2016 at 09:45:23AM -0400, Daniel Micay wrote: > Qualcomm's perf driver is out-of-tree along with most of their other > drivers. So you're asking us to maim upstream perf for some out of tree junk? Srously? *plonk*
Peter Zijlstra <peterz@infradead.org> writes: > On Wed, Aug 03, 2016 at 09:50:37PM -0500, Eric W. Biederman wrote: > >> What this means in practice is user namespaces can be enabled by default >> on a system, and yet you can easily disable them in a sandbox that was >> built with a user namespace. >> >> I named the new sysctls in my patch: >> /proc/sys/userns/max_user_namespaces >> /proc/sys/userns/max_pid_namespaces >> /proc/sys/userns/max_net_namespaces >> /proc/sys/userns/max_uts_namespaces >> /proc/sys/userns/max_ipc_namespaces >> /proc/sys/userns/max_cgroup_namespaces >> /proc/sys/userns/max_mnt_namespaces >> >> What Kees was suggesting was to add a similar sysctl say: >> /proc/sys/userns/perf_event_enabled >> >> And have the ability to disable perf events in each user namespaces. >> While still being able to leave usage perf events enabled by default. >> >> I don't know if any of that is a good fit for perf events. >> >> For purposes of this discussion I assume we are limiting ourselves to >> discussing userspace tracing, which semantically is 100% fine for >> access by userspace. > > Right, so its basically a 'root' namespace. Not sure how this would > help, or cover the use-cases with perf through. The bits useful to the perf situation are: - user namespaces nest. - anyone can create a user namespace. - a sysctl can be bound to the userns that takes local privilege to change so you can't override it arbitrarily. Which is a long way of saying a user namespace is one way of marking processes that may or may not use perf. It was given in this case as an example of something that has been looked at that appears to solve peoples concerns. Another way to achieve a similar effect is to build something like an rlimit. What is attractive to me semantically about something like this is applications that have perf_event disabled can still be traced with perf. > Do they really only care about the sandbox? I can imagine this being > sufficient for Android as that could do these userns thingies for each > app or whatnot. So the question is how do we want to apply policy in this case. If the only concern is that there might be some bug somewhere in the code that is undiscovered and people who don't use a feature don't want to have to worry about it, disabling things at the application level makes sense. In my mind a sandbox is policy like this that I apply to my application, in contrast a sandbox approach with a global disable or some other specific poicy that the system administrator applies. The really important property that I think needs to exist is a less than system granularity. So a solution that doesn't disable it for everyone and doesn't disable something by default can be deployed while still allowing the feature to be disabled where people don't want to take the chance (such as in network facing daemons like apache). > But does this cover the case Debian disabled perf for? > I'm not sure I've ever seen it described _why_ they did it. Good question. I suspect someone should ask. Especially since debian defaults to 3. perf event is disabled for everyone. > So far I'm still liking the new capability bit better, assuming I > understood those right. Your subsystem your call. I have never had much luck with capability bits. They are not particularly flexible, and are hard to get rid of permanently any suid root app gains them all. But it isn't a particularly easy problem and I don't think we have any solutions that have lasted the test of time for this kind of thing other than seccomp. Eric
On Thu, Aug 04, 2016 at 10:13:29AM -0500, Eric W. Biederman wrote: > The bits useful to the perf situation are: > - user namespaces nest. > - anyone can create a user namespace. > - a sysctl can be bound to the userns that takes local privilege to > change so you can't override it arbitrarily. > > Which is a long way of saying a user namespace is one way of marking > processes that may or may not use perf. > > It was given in this case as an example of something that has been > looked at that appears to solve peoples concerns. > What is attractive to me semantically about something like this is > applications that have perf_event disabled can still be traced with perf. > > So far I'm still liking the new capability bit better, assuming I > > understood those right. > > Your subsystem your call. I have never had much luck with capability > bits. They are not particularly flexible, and are hard to get rid of > permanently any suid root app gains them all. Right, so I've no experience with any of this. But from what I understood amluto recently made capabilities much more useful with: 58319057b784 ("capabilities: ambient capabilities"). And the thing I like is having file capabilities, so even though the user cannot in general create perf events, we could mark the /usr/bin/perf executable as having CAP_PERF and allow it to create them, because its a 'trusted' executable. Can something like that be done with userns? Afaiu once you create a userns with perf disabled, even a nested one cannot re-enable it, otherwise you cannot create sandboxes.
On Thu, 2016-08-04 at 16:11 +0200, Peter Zijlstra wrote: > On Thu, Aug 04, 2016 at 09:45:23AM -0400, Daniel Micay wrote: > > > > Qualcomm's perf driver is out-of-tree along with most of their other > > drivers. > > > So you're asking us to maim upstream perf for some out of tree junk? > Srously? *plonk* This feature doesn't come from Android. The perf events subsystem in the mainline kernel is packed full of vulnerabilities too. The problem is so bad that pointing one of the public fuzzers at it for a short period of time is all that's required to start finding them. Qualcomm's drivers might be lower quality than core kernel code, but they're way above the baseline set by mainline kernel drivers... Shining the same light on mainline drivers wouldn't be pretty. The work going into hardening the Qualcomm drivers isn't happening upstream to any comparable extent.
On Thu, Aug 04, 2016 at 11:44:28AM -0400, Daniel Micay wrote: > This feature doesn't come from Android. The perf events subsystem in the > mainline kernel is packed full of vulnerabilities too. Uhh, not so much. I spend a _lot_ of time a while back to get the core and x86 solid. I could run the fuzzers for hours on end at some point. > The problem is so bad that pointing one of the public fuzzers at it > for a short period of time is all that's required to start finding > them. If you know of any that reproduce on x86 I'll go fix. For anything else you need to complain elsewhere as I don't have hardware nor bandwidth.
On Thu, Aug 04, 2016 at 11:44:28AM -0400, Daniel Micay wrote: > Qualcomm's drivers might be lower quality than core kernel code, but > they're way above the baseline set by mainline kernel drivers... I don't think that's true for the arm/arm64 perf code. I think we've done a reasonable job of testing and fixing those, along with core infrastructure issues. The perf fuzzer runs for a very long time on a mainline kernel without issues, while on my Nexus 5x I get a hard lockup after ~85 seconds (and prior to the last android update the lockup was instantaneous). > Shining the same light on mainline drivers wouldn't be pretty. The work > going into hardening the Qualcomm drivers isn't happening upstream to > any comparable extent. From my personal experience (and as above), and talking specifically about PMU drivers, I think that the opposite is true. This is not to say there aren't issues; I would not be surprised if there are. But it's disingenuous to say that mainline code is worse than that which exists in a vendor kernel when the latter is demonstrably much easier to break than the former. If there are issues you are aware of, please report them. If those issues only exist in non-upstream code, then the applicable concerns are somewhat different (though certainly still exist). But please, let's frame the argument to match reality. Thanks, Mark.
On Thu, 2016-08-04 at 17:10 +0100, Mark Rutland wrote: > On Thu, Aug 04, 2016 at 11:44:28AM -0400, Daniel Micay wrote: > > > > Qualcomm's drivers might be lower quality than core kernel code, but > > they're way above the baseline set by mainline kernel drivers... > > I don't think that's true for the arm/arm64 perf code. The baseline architecture support is essentially core kernel code. I agree it's much better than the SoC vendor code. You're spending a lot of time auditing, fuzzing and improving the code in general, which is not true for most drivers. They don't get that attention. > I think we've done a reasonable job of testing and fixing those, along > with core infrastructure issues. The perf fuzzer runs for a very long > time on a mainline kernel without issues, while on my Nexus 5x I get a > hard lockup after ~85 seconds (and prior to the last android update > the > lockup was instantaneous). > > From my personal experience (and as above), and talking specifically > about PMU drivers, I think that the opposite is true. This is not to > say > there aren't issues; I would not be surprised if there are. But it's > disingenuous to say that mainline code is worse than that which exists > in a vendor kernel when the latter is demonstrably much easier to > break > than the former. I wasn't talking specifically about perf. > If there are issues you are aware of, please report them. If those > issues only exist in non-upstream code, then the applicable concerns > are > somewhat different (though certainly still exist). I'm not going to do volunteer work for a corporation. I've learned that lesson after spending years doing it. > But please, let's frame the argument to match reality. The argument is framed in reality. Stating that it now often takes a few hours to find a vulnerability with the unaltered, widely known public perf fuzzer is not impressive. It's really an argument for claiming that it's a significant security issue.
On Thu, Aug 04, 2016 at 12:32:32PM -0400, Daniel Micay wrote: > On Thu, 2016-08-04 at 17:10 +0100, Mark Rutland wrote: > I wasn't talking specifically about perf. Then this is irrelevant to a discussion about limiting access to the perf interface. Hardening drivers in general is a very interesting topic, but it is a different topic. > > But please, let's frame the argument to match reality. > > The argument is framed in reality. Stating that it now often takes a > few hours to find a vulnerability with the unaltered, widely known > public perf fuzzer is not impressive. It's really an argument for > claiming that it's a significant security issue. My claim was not that the mainline code was impressively perfect, but rather that the vendor code was worse, countering a prior claim otherwise. Hence, reality. There is cetainly much that can be done to improve things, if we discuss that which is actually applicable. Thanks, Mark.
> My claim was not that the mainline code was impressively perfect, but > rather that the vendor code was worse, countering a prior claim > otherwise. Hence, reality. You're arguing with a straw man. I was responding to a comment about out-of-tree code, not generic architecture perf drivers vs. alternative versions by SoC vendors. Qualcomm and other vendors landing their drivers in mainline would be nice, but it wouldn't make it inherently higher quality. I don't really see what it has to do with this, which I why I responded...
Hi, Attempt to revive discussions below... On Wed, Jul 27, 2016 at 07:45:46AM -0700, Jeff Vander Stoep wrote: > When kernel.perf_event_paranoid is set to 3 (or greater), disallow > all access to performance events by users without CAP_SYS_ADMIN. > > This new level of restriction is intended to reduce the attack > surface of the kernel. Perf is a valuable tool for developers but > is generally unnecessary and unused on production systems. Perf may > open up an attack vector to vulnerable device-specific drivers as > recently demonstrated in CVE-2016-0805, CVE-2016-0819, > CVE-2016-0843, CVE-2016-3768, and CVE-2016-3843. This new level of > restriction allows for a safe default to be set on production systems > while leaving a simple means for developers to grant access [1]. > > This feature is derived from CONFIG_GRKERNSEC_PERF_HARDEN by Brad > Spengler. It is based on a patch by Ben Hutchings [2]. Ben's patches > have been modified and split up to address on-list feedback. > > kernel.perf_event_paranoid=3 is the default on both Debian [2] and > Android [3]. While people weren't particularly happy with this global toggle approach, my understanding from face-to-face discussions at LSS2016 was that people were happy with a more scoped restriction (e.g. using capabilities or some other access control mechanism), but no-one had the time to work on that. Does that match everyone's understanding, or am I mistaken? It's also my understanding that for Android, perf_event_paranoid is lowered when the user enables developer mode (rather than only when an external debugger is attached); is that correct? Thanks, Mark.
On Mon, 2016-10-17 at 14:44 +0100, Mark Rutland wrote: > Hi, > > Attempt to revive discussions below... > > On Wed, Jul 27, 2016 at 07:45:46AM -0700, Jeff Vander Stoep wrote: > > When kernel.perf_event_paranoid is set to 3 (or greater), disallow > > all access to performance events by users without CAP_SYS_ADMIN. > > > > This new level of restriction is intended to reduce the attack > > surface of the kernel. Perf is a valuable tool for developers but > > is generally unnecessary and unused on production systems. Perf may > > open up an attack vector to vulnerable device-specific drivers as > > recently demonstrated in CVE-2016-0805, CVE-2016-0819, > > CVE-2016-0843, CVE-2016-3768, and CVE-2016-3843. This new level of > > restriction allows for a safe default to be set on production > > systems > > while leaving a simple means for developers to grant access [1]. > > > > This feature is derived from CONFIG_GRKERNSEC_PERF_HARDEN by Brad > > Spengler. It is based on a patch by Ben Hutchings [2]. Ben's patches > > have been modified and split up to address on-list feedback. > > > > kernel.perf_event_paranoid=3 is the default on both Debian [2] and > > Android [3]. > > While people weren't particularly happy with this global toggle > approach, my understanding from face-to-face discussions at LSS2016 > was > that people were happy with a more scoped restriction (e.g. using > capabilities or some other access control mechanism), but no-one had > the > time to work on that. > > Does that match everyone's understanding, or am I mistaken? > > It's also my understanding that for Android, perf_event_paranoid is > lowered when the user enables developer mode (rather than only when an > external debugger is attached); is that correct? It's exposed as a "system property" marked as writable by the shell user, so the Android Debug Bridge shell can lower it. The debugging tools learned how to toggle it off automatically when they're used. It intentionally isn't a persist. prefixed property so the setting also goes away on reboot. ADB (incl. the shell user) isn't available until developer mode is enabled + ADB is toggled on in the developer settings, and then it still requires whitelisting keys.
On Mon, Oct 17, 2016 at 6:44 AM, Mark Rutland <mark.rutland@arm.com> wrote: > Hi, > > Attempt to revive discussions below... > > On Wed, Jul 27, 2016 at 07:45:46AM -0700, Jeff Vander Stoep wrote: >> When kernel.perf_event_paranoid is set to 3 (or greater), disallow >> all access to performance events by users without CAP_SYS_ADMIN. >> >> This new level of restriction is intended to reduce the attack >> surface of the kernel. Perf is a valuable tool for developers but >> is generally unnecessary and unused on production systems. Perf may >> open up an attack vector to vulnerable device-specific drivers as >> recently demonstrated in CVE-2016-0805, CVE-2016-0819, >> CVE-2016-0843, CVE-2016-3768, and CVE-2016-3843. This new level of >> restriction allows for a safe default to be set on production systems >> while leaving a simple means for developers to grant access [1]. >> >> This feature is derived from CONFIG_GRKERNSEC_PERF_HARDEN by Brad >> Spengler. It is based on a patch by Ben Hutchings [2]. Ben's patches >> have been modified and split up to address on-list feedback. >> >> kernel.perf_event_paranoid=3 is the default on both Debian [2] and >> Android [3]. > > While people weren't particularly happy with this global toggle > approach, my understanding from face-to-face discussions at LSS2016 was > that people were happy with a more scoped restriction (e.g. using > capabilities or some other access control mechanism), but no-one had the > time to work on that. > > Does that match everyone's understanding, or am I mistaken? That's correct: some kind of finer-grain control would be preferred to the maintainer, but no one has had time to work on it. (The =3 sysctl setting present in Android, Debian, and Ubuntu satisfies most people.) -Kees
On Tue, 2016-10-18 at 13:48 -0700, Kees Cook wrote: > On Mon, Oct 17, 2016 at 6:44 AM, Mark Rutland <mark.rutland@arm.com> > wrote: > > Hi, > > > > Attempt to revive discussions below... > > > > On Wed, Jul 27, 2016 at 07:45:46AM -0700, Jeff Vander Stoep wrote: > > > When kernel.perf_event_paranoid is set to 3 (or greater), disallow > > > all access to performance events by users without CAP_SYS_ADMIN. > > > > > > This new level of restriction is intended to reduce the attack > > > surface of the kernel. Perf is a valuable tool for developers but > > > is generally unnecessary and unused on production systems. Perf > > > may > > > open up an attack vector to vulnerable device-specific drivers as > > > recently demonstrated in CVE-2016-0805, CVE-2016-0819, > > > CVE-2016-0843, CVE-2016-3768, and CVE-2016-3843. This new level of > > > restriction allows for a safe default to be set on production > > > systems > > > while leaving a simple means for developers to grant access [1]. > > > > > > This feature is derived from CONFIG_GRKERNSEC_PERF_HARDEN by Brad > > > Spengler. It is based on a patch by Ben Hutchings [2]. Ben's > > > patches > > > have been modified and split up to address on-list feedback. > > > > > > kernel.perf_event_paranoid=3 is the default on both Debian [2] and > > > Android [3]. > > > > While people weren't particularly happy with this global toggle > > approach, my understanding from face-to-face discussions at LSS2016 > > was > > that people were happy with a more scoped restriction (e.g. using > > capabilities or some other access control mechanism), but no-one had > > the > > time to work on that. > > > > Does that match everyone's understanding, or am I mistaken? > > That's correct: some kind of finer-grain control would be preferred to > the maintainer, but no one has had time to work on it. (The =3 sysctl > setting present in Android, Debian, and Ubuntu satisfies most people.) > > -Kees It's also worth noting that fine-grained control via a scoped mechanism would likely only be used to implement *more restrictions* on Android, not to make the feature less aggressive. It's desirable for perf events to be disabled by default for non-root across the board on Android. The part that's imperfect is that when a developer uses a profiling tool, unprivileged usage is automatically enabled across the board until reboot. Ideally, it would be enabled only for the scope where it's needed. It would be very tricky to implement though, especially without adding friction, and it would only have value for protecting devices being used for development. It really doesn't seem to be worth the trouble, especially since it doesn't persist on reboot. It's only a temporary security hole and only for developer devices.
On Mon, Oct 17, 2016 at 10:54:33AM -0400, Daniel Micay wrote: > On Mon, 2016-10-17 at 14:44 +0100, Mark Rutland wrote: > > It's also my understanding that for Android, perf_event_paranoid is > > lowered when the user enables developer mode (rather than only when an > > external debugger is attached); is that correct? > > It's exposed as a "system property" marked as writable by the shell > user, so the Android Debug Bridge shell can lower it. The debugging > tools learned how to toggle it off automatically when they're used. It > intentionally isn't a persist. prefixed property so the setting also > goes away on reboot. > > ADB (incl. the shell user) isn't available until developer mode is > enabled + ADB is toggled on in the developer settings, and then it still > requires whitelisting keys. Ah; so I'd misunderstood somewhat. I was under the (clearly mistaken) impression that this was lowered when developer mode was enabled, rather than only when it was both enabled and ADB was connected, for example. Thanks for clearing that up! Thanks, Mark.
On Tue, Oct 18, 2016 at 05:15:01PM -0400, Daniel Micay wrote: > It's also worth noting that fine-grained control via a scoped > mechanism would likely only be used to implement *more restrictions* > on Android, not to make the feature less aggressive. It's desirable > for perf events to be disabled by default for non-root across the > board on Android. The part that's imperfect is that when a developer > uses a profiling tool, unprivileged usage is automatically enabled > across the board until reboot. Ideally, it would be enabled only for > the scope where it's needed. Sure; understood. > It would be very tricky to implement though, especially without adding > friction, and it would only have value for protecting devices being > used for development. It really doesn't seem to be worth the trouble, > especially since it doesn't persist on reboot. It's only a temporary > security hole and only for developer devices. I can see that for Android this isn't much of a win. It is beneficial elsewhere, and covers a larger set of use-cases. If perf were a filesystem object, we'd only allow access by a given 'perf' group, and that would be sufficient to avoid most of that friction (IIUC). I wonder what we can do that's similar. Thanks, Mark.
On Tue, Oct 18, 2016 at 05:15:01PM -0400, Daniel Micay wrote: > It's also worth noting that fine-grained control via a scoped mechanism > would likely only be used to implement *more restrictions* on Android, > not to make the feature less aggressive. > It's desirable for perf events to be disabled by default for non-root > across the board on Android. Right, but this is Android. The knob seems to now also live in Debian (and derived) distros. And there it is utter crap. It completely defeats having perf for a fairly large segment of corporate developers who do not get to have root on their own machines (which is stupid policy but whatever). It similarly defeats development of self profiling JITs and whatnot. A capability would allow people to run perf (or another sanctioned binary), even though in general they cannot do sys_perf_event_open().
Em Wed, Oct 19, 2016 at 12:01:26PM +0200, Peter Zijlstra escreveu: > On Tue, Oct 18, 2016 at 05:15:01PM -0400, Daniel Micay wrote: > > It's also worth noting that fine-grained control via a scoped mechanism > > would likely only be used to implement *more restrictions* on Android, > > not to make the feature less aggressive. > > It's desirable for perf events to be disabled by default for non-root > > across the board on Android. > Right, but this is Android. The knob seems to now also live in Debian > (and derived) distros. And there it is utter crap. > It completely defeats having perf for a fairly large segment of > corporate developers who do not get to have root on their own machines > (which is stupid policy but whatever). > It similarly defeats development of self profiling JITs and whatnot. > A capability would allow people to run perf (or another sanctioned > binary), even though in general they cannot do sys_perf_event_open(). But self profiling JITs would be useful for non-developers, on Android (anywhere, really), and for that it would require being able to at least, well, self profile, using sys_perf_event_open() by a normal process, limited to profiling itself, no? This not being possible, self profiling will use some other means, its like sys_perf_event_open() never existed for them. - Arnaldo
On Wed, Oct 19, 2016 at 07:26:02AM -0300, Arnaldo Carvalho de Melo wrote: > Em Wed, Oct 19, 2016 at 12:01:26PM +0200, Peter Zijlstra escreveu: > > On Tue, Oct 18, 2016 at 05:15:01PM -0400, Daniel Micay wrote: > > > It's also worth noting that fine-grained control via a scoped mechanism > > > would likely only be used to implement *more restrictions* on Android, > > > not to make the feature less aggressive. > > > > It's desirable for perf events to be disabled by default for non-root > > > across the board on Android. > > > Right, but this is Android. The knob seems to now also live in Debian > > (and derived) distros. And there it is utter crap. > > > It completely defeats having perf for a fairly large segment of > > corporate developers who do not get to have root on their own machines > > (which is stupid policy but whatever). > > > It similarly defeats development of self profiling JITs and whatnot. > > > A capability would allow people to run perf (or another sanctioned > > binary), even though in general they cannot do sys_perf_event_open(). > > But self profiling JITs would be useful for non-developers, on Android > (anywhere, really), and for that it would require being able to at > least, well, self profile, using sys_perf_event_open() by a normal > process, limited to profiling itself, no? > > This not being possible, self profiling will use some other means, its > like sys_perf_event_open() never existed for them. Right, so with capabilities, we could grant the binary the capability to use sys_perf_event_open(). That would still leave developers of said JIT in a tight spot, because there'd be no way to set the capability on their freshly compiled binary. They'd have to be granted the capability to the user, using pam_cap. Which would involve corp. IT doing something sensible, ergo, this'll never happen :-(.
On Wed, 2016-10-19 at 10:41 +0100, Mark Rutland wrote: > On Mon, Oct 17, 2016 at 10:54:33AM -0400, Daniel Micay wrote: > > On Mon, 2016-10-17 at 14:44 +0100, Mark Rutland wrote: > > > It's also my understanding that for Android, perf_event_paranoid > > > is > > > lowered when the user enables developer mode (rather than only > > > when an > > > external debugger is attached); is that correct? > > > > It's exposed as a "system property" marked as writable by the shell > > user, so the Android Debug Bridge shell can lower it. The debugging > > tools learned how to toggle it off automatically when they're used. > > It > > intentionally isn't a persist. prefixed property so the setting also > > goes away on reboot. > > > > ADB (incl. the shell user) isn't available until developer mode is > > enabled + ADB is toggled on in the developer settings, and then it > > still > > requires whitelisting keys. > > Ah; so I'd misunderstood somewhat. > > I was under the (clearly mistaken) impression that this was lowered > when > developer mode was enabled, rather than only when it was both enabled > and ADB was connected, for example. > > Thanks for clearing that up! ADB provides a shell as the 'shell' user, and that user has the ability to toggle the sysctl. So profiling tools were able to be taught to do that automatically. It's the only way that the 'shell' user is actually exposed. For example, a terminal app just runs in the untrusted_app SELinux domain as a unique unprivileged uid/gid pair, not as the much more privileged ADB 'shell' domain. So it doesn't actually get toggled off you use ADB to do something else. ADB itself is pretty much comparable to SSH, but over USB (i.e. key- based way of getting a shell). The 'shell' user has tools like 'run-as' to be able to run things as various apps (if they are marked debuggable), so in theory it could be finer-grained and act only there, for the app being debugged. It would be really hard to cover all use cases and maybe things other than apps though (although in an Android 'user' build, the base system itself isn't very debuggable, you really need 'userdebug' or 'eng' which isn't what ships on end user devices).
On Wed, 2016-10-19 at 07:26 -0300, Arnaldo Carvalho de Melo wrote: > > But self profiling JITs would be useful for non-developers, on Android > (anywhere, really), and for that it would require being able to at > least, well, self profile, using sys_perf_event_open() by a normal > process, limited to profiling itself, no? > > This not being possible, self profiling will use some other means, its > like sys_perf_event_open() never existed for them. > > - Arnaldo It would defeat the purpose of the security feature if it was exposed to apps on Android. There are other ways for Chromium (including WebView) and the ART JIT to profile. AFAIK, they've never actually considered using perf for their JIT profiling. It wasn't something that anyone cared about when this was implemented. Chromium would *certainly* never use perf for this. They use a much tighter sandbox than the Android app sandbox. It doesn't have system calls like open(), let alone perf events. Their seccomp-bpf usage is to reduce kernel attack surface, since it has proven to be the easiest way out of a sandbox. They don't even allow futex() without filtering based on the parameters anymore. That proved to be a major issue. The only case where untrusted apps declaring the privileges they need actually works out is when the privileges are high-level controls over access to a user's personal information. That way, they can be exposed to the user with the ability to reject access when it's first needed or toggle it off later. The strength of the app sandbox isn't something that end users can be responsible for. Permissions also don't work if apps bypass them with local privilege escalation vulnerabilities. Even for the base system, no perf access is better than having it used anywhere. The difference is only that the base system can actually be trusted to declare what it needs, but those components can be exploited so it's still best if they are trusted as little as possible at runtime. I could see Android completely removing unprivileged access down the road too, not as a form of access control (apps already run as unique uid/gid pairs, etc.) but to remove a non-trivial form of kernel attack surface. The perf API isn't being singled out here. It just happened to be the first case where a kernel API was restricted to developer usage on Android. Should expect more of this. If you want perf exposed, make it secure. Stop writing kernel code in a memory unsafe language and start using isolation. Not going to happen in all likelihood, so kernel code will be increasingly walled off instead since it's the biggest liability on the system. Fixing bugs on a case by case basis doesn't work for systems that need even basic security.
diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt index ffab8b5..fac9798 100644 --- a/Documentation/sysctl/kernel.txt +++ b/Documentation/sysctl/kernel.txt @@ -665,6 +665,7 @@ users (without CAP_SYS_ADMIN). The default value is 2. >=0: Disallow raw tracepoint access by users without CAP_IOC_LOCK >=1: Disallow CPU event access by users without CAP_SYS_ADMIN >=2: Disallow kernel profiling by users without CAP_SYS_ADMIN +>=3: Disallow all event access by users without CAP_SYS_ADMIN ============================================================== diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h index 8ed43261..1e2080f 100644 --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -1156,6 +1156,11 @@ static inline bool perf_paranoid_kernel(void) return sysctl_perf_event_paranoid > 1; } +static inline bool perf_paranoid_any(void) +{ + return sysctl_perf_event_paranoid > 2; +} + extern void perf_event_init(void); extern void perf_tp_event(u16 event_type, u64 count, void *record, int entry_size, struct pt_regs *regs, diff --git a/kernel/events/core.c b/kernel/events/core.c index 356a6c7..52bd100 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -353,6 +353,7 @@ static struct srcu_struct pmus_srcu; * 0 - disallow raw tracepoint access for unpriv * 1 - disallow cpu events for unpriv * 2 - disallow kernel profiling for unpriv + * 3 - disallow all unpriv perf event use */ int sysctl_perf_event_paranoid __read_mostly = 2; @@ -9296,6 +9297,9 @@ SYSCALL_DEFINE5(perf_event_open, if (flags & ~PERF_FLAG_ALL) return -EINVAL; + if (perf_paranoid_any() && !capable(CAP_SYS_ADMIN)) + return -EACCES; + err = perf_copy_attr(attr_uptr, &attr); if (err) return err;
When kernel.perf_event_paranoid is set to 3 (or greater), disallow all access to performance events by users without CAP_SYS_ADMIN. This new level of restriction is intended to reduce the attack surface of the kernel. Perf is a valuable tool for developers but is generally unnecessary and unused on production systems. Perf may open up an attack vector to vulnerable device-specific drivers as recently demonstrated in CVE-2016-0805, CVE-2016-0819, CVE-2016-0843, CVE-2016-3768, and CVE-2016-3843. This new level of restriction allows for a safe default to be set on production systems while leaving a simple means for developers to grant access [1]. This feature is derived from CONFIG_GRKERNSEC_PERF_HARDEN by Brad Spengler. It is based on a patch by Ben Hutchings [2]. Ben's patches have been modified and split up to address on-list feedback. kernel.perf_event_paranoid=3 is the default on both Debian [2] and Android [3]. [1] Making perf available to developers on Android: https://android-review.googlesource.com/#/c/234400/ [2] Original patch by Ben Hutchings: https://lkml.org/lkml/2016/1/11/587 [3] https://android-review.googlesource.com/#/c/234743/ Signed-off-by: Jeff Vander Stoep <jeffv@google.com> --- Documentation/sysctl/kernel.txt | 1 + include/linux/perf_event.h | 5 +++++ kernel/events/core.c | 4 ++++ 3 files changed, 10 insertions(+)